CN116486107B - Optical flow calculation method, system, equipment and medium - Google Patents
Optical flow calculation method, system, equipment and medium
- Publication number
- CN116486107B CN116486107B CN202310735464.6A CN202310735464A CN116486107B CN 116486107 B CN116486107 B CN 116486107B CN 202310735464 A CN202310735464 A CN 202310735464A CN 116486107 B CN116486107 B CN 116486107B
- Authority
- CN
- China
- Prior art keywords
- image
- optical flow
- global
- motion information
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention discloses an optical flow calculation method, system, equipment and medium, and relates to the field of optical flow processing. The method comprises the following steps: acquiring a target image, the target image including two consecutive frames, namely a first image and a second image; extracting the motion features of the target image with a motion feature extraction network; determining a feature map of the first image and a feature map of the second image according to the motion features of the target image and the number of feature extraction channels, and calculating the matching cost volume of the two feature maps; extracting a context feature of the first image using a context encoder; and, based on the matching cost volume and the context features, performing a cyclic iterative solution with a global-local loop optical flow decoder to obtain the optical flow field of the target image. The global-local loop optical flow decoder is constructed based on a depth separable residual block, a multi-layer perceptron block, a depth separable convolution module and a multi-head attention module. The invention can improve the accuracy and robustness of optical flow estimation.
Description
Technical Field
The present invention relates to the field of optical flow processing, and in particular, to a method, a system, an apparatus, and a medium for optical flow calculation.
Background
Optical flow refers to the two-dimensional motion vectors of pixels on moving objects and scene surfaces in an image sequence; it not only provides the motion vectors of objects and the scene in the image, but also carries rich shape and structure information. Optical flow estimation is therefore a research hotspot in image processing and computer vision. In many advanced visual tasks, such as action recognition, video interpolation, video segmentation and object tracking, it provides valuable motion cues as a basis.
In recent years, with the advent of deep learning, optical flow estimation models based on convolutional neural networks (Convolutional Neural Network, CNN) have been highly successful. Such methods first apply a data-driven, learning-based optimization strategy and use a modeled feature encoder to extract image features. They then compute the similarity of all feature vectors between the feature maps, take the pair of feature vectors with the highest similarity as matching points, and finally decode the displacement field between the consecutive frames. Since the encoding and decoding processes require features of sufficient resolution to reduce matching errors caused by large-displacement motion and local ambiguity (occlusion, weak texture, illumination variation, etc.), how to efficiently and accurately decode motion features becomes the key to improving the accuracy and robustness of optical flow estimation. However, existing deep learning optical flow models generally perform optical flow decoding with local convolution operations of limited receptive field, which leaves the model's feature extraction and expression capability insufficient and thereby degrades the overall performance of optical flow estimation.
Disclosure of Invention
Based on the above, the embodiment of the invention provides an optical flow calculation method, an optical flow calculation system, an optical flow calculation device and an optical flow calculation medium, so as to improve the accuracy and the robustness of optical flow estimation.
In order to achieve the above object, the embodiment of the present invention provides the following solutions:
an optical flow calculation method, comprising:
acquiring a target image; the target image includes: a first image and a second image; the first image and the second image are two consecutive frames of images;
extracting the motion characteristics of the target image by adopting a motion characteristic extraction network; the motion feature extraction network comprises a plurality of convolution layers with different sizes;
determining a feature map of the first image and a feature map of the second image according to the motion feature and the feature extraction channel number of the target image, and calculating the matching cost volume of the feature map of the first image and the feature map of the second image;
extracting a context feature of the first image using a context encoder; the structure of the context encoder is the same as that of the motion feature extraction network;
based on the matching cost volume and the context characteristics, adopting a global-local loop optical flow decoder to carry out loop iteration solution to obtain an optical flow field of the target image;
wherein the global-local loop optical flow decoder comprises: a local motion information encoder, a global motion information encoder and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder;
the local motion information encoder and the global-local motion information decoder each include: the depth separable residual block and the multi-layer perceptron block are connected in sequence; the global motion information encoder includes: a depth separable convolution module and a multi-head attention module which are connected in sequence;
the local motion information encoder is used for encoding according to the matching cost volume and the residual optical flow of the previous iteration to obtain local motion features;
the global motion information encoder is used for encoding according to the local motion characteristics and the context characteristics to obtain global motion information;
the global-local motion information decoder is used for decoding according to the local motion features, the global motion information and the context features to obtain the residual optical flow of the current iteration; the residual optical flow of the final iteration is used to determine the optical flow field of the target image.
Optionally, the motion feature extraction network specifically includes: the first convolution layer, the convolution residual block and the second convolution layer are sequentially connected;
The convolution kernel size of the first convolution layer is 7×7. The convolution residual block includes a third convolution layer and a fourth convolution layer connected in sequence. The convolution kernel size of the second convolution layer is 1×1; the convolution kernel size of the third convolution layer is 3×3 with a stride of 2; and the convolution kernel size of the fourth convolution layer is 3×3 with a stride of 1.
Optionally, determining a feature map of the first image and a feature map of the second image according to the motion feature of the target image and the feature extraction channel number, and calculating a matching cost volume of the feature map of the first image and the feature map of the second image, which specifically includes:
determining the first half of the motion characteristics of the target image as a characteristic diagram of the first image, and determining the second half of the motion characteristics of the target image as a characteristic diagram of the second image;
performing dot product similarity operation on the feature images of the first image and the feature images of the second image to obtain matching cost information of the feature images of the first image and the feature images of the second image;
and downsampling the matching cost information by adopting pooling operation to obtain the matching cost volumes of the feature map of the first image and the feature map of the second image.
Optionally, the depth separable residual block specifically includes: the first depth separable convolution layer, the first activation function, the second depth separable convolution layer and the second activation function are connected in sequence;
The convolution kernel size of the first depth separable convolution layer is 7×7; the second depth separable convolution layer is densely connected, with a convolution kernel size of 15×15; the first activation function and the second activation function are both GELU activation functions.
Optionally, the multi-layer perceptron block specifically includes: the fifth convolution layer, the third depth separable convolution layer, the third activation function and the sixth convolution layer are sequentially connected;
the convolution kernel sizes of the fifth convolution layer and the sixth convolution layer are 1×1; the convolution kernel size of the third depth separable convolution layer is 3×3; the third activation function is a GELU activation function.
The present invention also provides an optical flow computing system comprising:
the image acquisition module is used for acquiring a target image; the target image includes: a first image and a second image; the first image and the second image are two consecutive frames of images;
the motion feature extraction module is used for extracting the motion features of the target image by adopting a motion feature extraction network; the motion feature extraction network comprises a plurality of convolution layers with different sizes;
the matching cost calculation module is used for determining a feature map of the first image and a feature map of the second image according to the motion feature and the feature extraction channel number of the target image, and calculating the matching cost volume of the feature map of the first image and the feature map of the second image;
a context feature extraction module for extracting context features of the first image using a context encoder; the structure of the context encoder is the same as that of the motion feature extraction network;
the optical flow field solving module is used for carrying out loop iteration solving by adopting a global-local loop optical flow decoder based on the matching cost volume and the context characteristics to obtain an optical flow field of the target image;
wherein the global-local loop optical flow decoder comprises: a local motion information encoder, a global motion information encoder and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder;
the local motion information encoder and the global-local motion information decoder each include: the depth separable residual block and the multi-layer perceptron block are connected in sequence; the global motion information encoder includes: a depth separable convolution module and a multi-head attention module which are connected in sequence;
the local motion information encoder is used for encoding according to the matching cost volume and the residual optical flow of the previous iteration to obtain local motion features;
the global motion information encoder is used for encoding according to the local motion characteristics and the context characteristics to obtain global motion information;
the global-local motion information decoder is used for decoding according to the local motion features, the global motion information and the context features to obtain the residual optical flow of the current iteration; the residual optical flow of the final iteration is used to determine the optical flow field of the target image.
The invention also provides an electronic device, which comprises a memory and a processor, wherein the memory is used for storing a computer program, and the processor runs the computer program to enable the electronic device to execute the optical flow calculation method.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the optical flow calculation method described above.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
Aiming at the insufficient feature extraction capability of existing optical flow estimation models, the embodiment of the invention introduces a depth separable residual block and a multi-layer perceptron block to enlarge the receptive field, and constructs a global-local loop optical flow decoder from the local perception of the local motion information encoder and the global perception of the global motion information encoder. As an optical flow estimation model that associates global and local motion information, it can improve the accuracy and robustness of optical flow estimation in large-displacement and weak-texture image areas.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an optical flow calculation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a first image according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a second image according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a global-local loop optical flow decoder according to an embodiment of the present invention;
FIG. 5 is a visual image of an optical flow field provided by an embodiment of the present invention;
FIG. 6 is a block diagram of an optical flow computing system according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Example 1
Referring to fig. 1, the optical flow calculation method of the present embodiment includes:
step 101: acquiring a target image; the target image includes: a first image and a second image; the first image and the second image are two consecutive frames of images.
In the present embodiment, two consecutive frames, the thirtieth and thirty-first frames of the bamboo_3 image sequence, are selected as input: the thirtieth frame serves as the first image I1, as shown in fig. 2, and the thirty-first frame serves as the second image I2, as shown in fig. 3.
Step 102: extracting the motion characteristics of the target image by adopting a motion characteristic extraction network; the motion feature extraction network includes a plurality of different sized convolutional layers.
Specifically, this step first constructs the motion feature extraction network: a number of stacked consecutive convolutions are combined to perform motion feature extraction on the first image I1 and the second image I2. The input of the motion feature extraction network is the stacked image F obtained by stacking I1 and I2, where H denotes the height of the input image and W denotes the width of the input image; the output is the motion feature FM of the two consecutive frames.
The motion feature extraction network specifically comprises: the first convolution layer, the convolution residual block and the second convolution layer connected in sequence. The convolution kernel size of the first convolution layer is 7×7; the convolution residual block includes a third convolution layer and a fourth convolution layer connected in sequence; the convolution kernel size of the second convolution layer is 1×1; the convolution kernel size of the third convolution layer is 3×3 with a stride of 2; and the convolution kernel size of the fourth convolution layer is 3×3 with a stride of 1.
The motion feature extraction network is divided into 3 stages (Stage 1, Stage 2 and Stage 3), where Stage 1 is at 1/2 resolution, Stage 2 at 1/4 resolution and Stage 3 at 1/8 resolution. First, the 7×7 first convolution layer of Stage 1 performs downsampling and feature extraction; then the two consecutively stacked 3×3 convolution residual blocks of Stage 2 and Stage 3 each downsample the image by a further factor of two; finally, a 1×1 second convolution layer adjusts the number of channels and outputs the motion feature FM of the two consecutive frames. The specific calculation formula is as follows:
FM = Conv1×1(ConvBlock3×3(ConvBlock3×3(Conv7×7(F))))
The above formula represents the feature extraction process of the motion feature extraction network. Conv7×7(·) and Conv1×1(·) denote feature extraction of the image using the 7×7 first convolution layer and the 1×1 second convolution layer, respectively; ConvBlock3×3(·) denotes feature extraction using a convolution residual block composed of a third convolution layer with stride 2 and kernel size 3×3 and a fourth convolution layer with stride 1 and kernel size 3×3.
wherein:

f1 = relu(Conv3×3,s=2(x1) + x1)

f = relu(Conv3×3,s=1(f1) + f1)
f1 denotes the result of downsampling and feature extraction of the input x1 by the third convolution layer (stride 2, kernel size 3×3), followed by a residual connection and the relu activation function; f denotes the result of feature extraction of f1 by the fourth convolution layer (stride 1, kernel size 3×3), followed by a residual connection and the relu activation function.
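As an illustration only, the three-stage structure described above can be sketched in PyTorch roughly as follows; the channel widths (64/96/128/256) and the 1×1 projection shortcut in the residual block are assumptions added to make the example runnable, not the patent's exact configuration:

```python
import torch
import torch.nn as nn

class ConvResBlock(nn.Module):
    # stride-2 3x3 conv for downsampling, stride-1 3x3 conv, with a
    # 1x1 projection shortcut (assumed) so the residual addition matches shapes
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, c_out, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(c_out, c_out, 3, stride=1, padding=1)
        self.skip = nn.Conv2d(c_in, c_out, 1, stride=2)
        self.relu = nn.ReLU()

    def forward(self, x):
        f1 = self.relu(self.conv1(x) + self.skip(x))   # Stage output f1
        return self.relu(self.conv2(f1) + f1)          # Stage output f

class MotionFeatureNet(nn.Module):
    # Stage 1: 7x7 conv to 1/2 resolution; Stages 2-3: residual blocks
    # to 1/4 and 1/8 resolution; a final 1x1 conv adjusts the channels
    def __init__(self, c_out=256):
        super().__init__()
        self.stage1 = nn.Conv2d(6, 64, 7, stride=2, padding=3)  # I1,I2 stacked: 6 channels
        self.stage2 = ConvResBlock(64, 96)
        self.stage3 = ConvResBlock(96, 128)
        self.head = nn.Conv2d(128, c_out, 1)

    def forward(self, f):
        return self.head(self.stage3(self.stage2(self.stage1(f))))

frames = torch.randn(1, 6, 64, 96)   # two stacked RGB frames
feat = MotionFeatureNet()(frames)
print(feat.shape)  # 1/8 resolution: torch.Size([1, 256, 8, 12])
```

The stride-2 layers account for the three resolution stages (1/2, 1/4, 1/8); only the final channel count is exposed as a parameter.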
Step 103: and determining a feature map of the first image and a feature map of the second image according to the motion feature of the target image and the feature extraction channel number, and calculating the matching cost volume of the feature map of the first image and the feature map of the second image.
The method specifically comprises the following steps:
(1) The motion feature FM is split in two along the feature extraction channel dimension: the first half of FM is determined as the feature map F1 of the first image, and the second half of FM is determined as the feature map F2 of the second image.
(2) A dot-product similarity operation is performed on the feature vectors of the feature map of the first image and the feature map of the second image, yielding the matching cost information between all pairs of corresponding points on the two feature maps.
(3) The matching cost information is downsampled by pooling operations, converting large-displacement matching cost information into small-displacement matching cost information and yielding the matching cost volumes of the feature map of the first image and the feature map of the second image, which take the form of a multi-scale matching cost pyramid. The calculation formula is as follows:
Cost = F1 ⊗ F2

Cost^l = AvgPool(Cost), l = 1, 2, …
where Cost represents the matching cost information; ⊗ represents a matrix multiplication operation; AvgPool represents an average pooling operation; l represents the layer number of the multi-scale matching cost pyramid; and Cost^l represents the matching cost volume of the l-th layer of the pyramid, obtained after downsampling the matching cost information. As l changes, the size of each feature map in Cost changes, the size being determined by the step size of the average pooling operation. This embodiment obtains the matching cost volume of each layer so as to better estimate optical flow under both large and small displacements.
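The dot-product cost volume and its pooled pyramid can be sketched as follows; the √C scaling, the level count and the `cost_volume_pyramid` helper name are illustrative assumptions in the style of RAFT-like models, not the patent's exact formulation:

```python
import torch
import torch.nn.functional as F

def cost_volume_pyramid(f1, f2, levels=3):
    """All-pairs dot-product cost volume plus an average-pooling pyramid.

    f1, f2: feature maps of shape (C, H, W) for the first/second image.
    Returns one (H, W, H_l, W_l) cost volume per pyramid level.
    """
    c, h, w = f1.shape
    # dot-product similarity between every pair of feature vectors
    cost = torch.einsum('chw,cuv->hwuv', f1, f2) / c ** 0.5
    pyramid = [cost]
    # pool over the second image's spatial dims so large displacements
    # become small displacements at the next coarser level
    flat = cost.reshape(h * w, 1, h, w)
    for _ in range(levels - 1):
        flat = F.avg_pool2d(flat, kernel_size=2)
        pyramid.append(flat.reshape(h, w, *flat.shape[-2:]))
    return pyramid

feat1, feat2 = torch.randn(64, 8, 8), torch.randn(64, 8, 8)
pyr = cost_volume_pyramid(feat1, feat2)
print([tuple(p.shape) for p in pyr])  # [(8, 8, 8, 8), (8, 8, 4, 4), (8, 8, 2, 2)]
```

Only the last two dimensions shrink across levels, so each level still covers every pixel of the first image while indexing progressively larger displacements per cell.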
Step 104: extracting a context feature of the first image using a context encoder; the structure of the context encoder is the same as the structure of the motion feature extraction network.
The context encoder in this step and the motion feature extraction network in step 102 are parallel structures. The context encoder combines a number of stacked consecutive convolutions to perform context feature extraction on the first image I1 in the image sequence; the input of the context encoder is the first image I1, and the output is the context feature FC of the first image.
Specifically, the context encoder is divided into 3 stages (Stage 1, Stage 2 and Stage 3), where Stage 1 is at 1/2 resolution, Stage 2 at 1/4 resolution and Stage 3 at 1/8 resolution. First, the 7×7 first convolution layer of Stage 1 performs downsampling and feature extraction; then the two consecutively stacked 3×3 convolution residual blocks of Stage 2 and Stage 3 each downsample the image by a further factor of two; finally, a 1×1 second convolution layer adjusts the number of channels and outputs the context feature FC of the first image. The calculation formula is as follows:
FC = Conv1×1(ConvBlock3×3(ConvBlock3×3(Conv7×7(I1))))
step 105: and based on the matching cost volume and the context characteristics, adopting a global-local loop optical flow decoder to carry out loop iteration solution to obtain an optical flow field of the target image.
Referring to fig. 4, the global-local loop optical flow decoder includes: a local motion information encoder, a global motion information encoder and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder.
The various parts of the global-local loop optical flow decoder described above are described in further detail below in conjunction with fig. 4.
(1) A local motion information encoder.
The local motion information encoder includes: a depth separable residual block and a multi-layer perceptron block (Multilayer Perceptron, MLP) connected in sequence; the global motion information encoder includes: the depth separable convolution module and the multi-head attention module are connected in sequence.
The depth separable residual block specifically comprises: the first depth separable convolution layer, the first activation function, the second depth separable convolution layer and the second activation function, connected in sequence. The convolution kernel size of the first depth separable convolution layer is 7×7; the second depth separable convolution layer is densely connected, with a convolution kernel size of 15×15; the first activation function and the second activation function are both GELU activation functions. The second depth separable convolution layer in this embodiment specifically includes a seventh convolution layer with kernel size 1×1, an eighth convolution layer with kernel size 15×15 and a ninth convolution layer with kernel size 1×1, connected in sequence; a GELU activation function activates the features after each convolution operation, and a residual connection operation is performed on the features after each convolution layer.
The multi-layer perceptron block specifically comprises: the fifth convolution layer, the third depth separable convolution layer, the third activation function, and the sixth convolution layer are connected in sequence. The convolution kernel sizes of the fifth convolution layer and the sixth convolution layer are 1×1; the convolution kernel size of the third depth separable convolution layer is 3×3; the third activation function is a GELU activation function.
The local motion information encoder is used for encoding according to the matching cost volume and the residual optical flow of the previous iteration to obtain local motion features. Specifically, the input of the local motion information encoder is the motion feature formed from the motion information in the matching cost volume and the initial optical flow field; this feature is fed into the local motion information encoder for cyclic iterative encoding of motion features, and the local motion feature F_l^t of the current iteration is finally output. The calculation formula is as follows:
F_f = Conv1×1(Δf)

F_cost = Conv1×1(Cost)

F_m = DLP(Cat(F_cost, F_f))

F_l^t = Cat(F_m, Δf)

DLP(·) = MLP(RDSCBlocks(·))

RDSCBlocks(x2) = GELU(DwConv15×15,d(GELU(DwConv7×7(x2))))

MLP(x3) = Conv1×1(GELU(DwConv3×3(Conv1×1(x3))))
where Δf represents the residual optical flow of each iteration and DLP(·) represents the depth separable MLP block; Cat(·) represents the feature map concatenation operation, i.e., splicing several feature maps of the same resolution along the channel dimension; Conv1×1(·) represents feature extraction of the input by a 1×1 convolution layer; F_f represents the features extracted from the optical flow of the current iteration; F_cost represents the local motion features obtained from the matching cost volume, extracted according to the optical flow of the current iteration; F_m represents the local motion features enhanced by the DLP block; F_l^t represents the local motion features of the current iteration. RDSCBlocks(·) represents the depth separable residual block; MLP(·) represents the multi-layer perceptron block implemented by convolution; DwConv7×7(x2) represents feature extraction of the input x2 using the first depth separable convolution layer with kernel size 7×7; DwConv15×15,d(·) represents feature extraction using the densely connected second depth separable convolution layer with kernel size 15×15; GELU(·) represents the GELU activation function; Conv1×1(x3) represents feature extraction of the input x3 using the fifth convolution layer with kernel size 1×1; DwConv3×3(·) represents feature extraction using the third depth separable convolution layer with kernel size 3×3.
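A rough PyTorch sketch of such a DLP block (depth separable residual block followed by a convolutional MLP block) is given below; the exact placement of the residual connections, the channel expansion ratio, and the omission of the dense 1×1/15×15/1×1 decomposition are simplifying assumptions:

```python
import torch
import torch.nn as nn

def dwconv(ch, k):
    # depthwise convolution: one filter per channel (groups=ch)
    return nn.Conv2d(ch, ch, k, padding=k // 2, groups=ch)

class DLPBlock(nn.Module):
    """Depth separable residual block followed by a convolutional MLP block."""
    def __init__(self, ch, expand=4):
        super().__init__()
        self.dw7 = dwconv(ch, 7)      # first depth separable conv, 7x7
        self.dw15 = dwconv(ch, 15)    # second depth separable conv, 15x15
        self.mlp = nn.Sequential(
            nn.Conv2d(ch, ch * expand, 1),   # fifth conv layer, 1x1
            dwconv(ch * expand, 3),          # third depth separable conv, 3x3
            nn.GELU(),
            nn.Conv2d(ch * expand, ch, 1),   # sixth conv layer, 1x1
        )
        self.act = nn.GELU()

    def forward(self, x):
        x = x + self.act(self.dw7(x))    # residual connections around each
        x = x + self.act(self.dw15(x))   # large-kernel depthwise conv
        return x + self.mlp(x)

f = torch.randn(1, 96, 16, 16)
out = DLPBlock(96)(f)
print(out.shape)  # same resolution and channels: torch.Size([1, 96, 16, 16])
```

The large 7×7 and 15×15 depthwise kernels are what enlarge the receptive field at roughly the cost of ordinary 3×3 convolutions, which is the motivation the disclosure gives for this block.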
(2) Global motion information encoder.
The global motion information encoder includes: the depth separable convolution module and the multi-head attention module are connected in sequence.
The global motion information encoder encodes the local motion features together with the context features to obtain global motion information. Specifically, by constructing a depth-separable convolution module and a multi-head attention module with local position coding, the local motion feature F_l output by the local motion information encoder and the context feature F_C output by the context feature encoder are input to the global motion information encoder, which finally outputs the global motion information F_g. The calculation formula is as follows:
F_g(i, j) = F_l(i, j) + γ·Σ_{u,v} f(q_c(i, j), k_c(u, v))·v_m(u, v);
The global motion information encoder encodes the context feature F_C to obtain the query vector q_c and the key vector k_c. i represents the abscissa of a point in the query vector and j its ordinate; u represents the abscissa of a point in the key vector and v its ordinate. q_c(i, j) represents the feature value of a point in the query vector, and k_c(u, v) the feature value of a point in the key vector. v_m represents the value vector constructed from the local motion feature F_l; v_m(u, v) represents the feature value of a point in the value vector. γ is a learnable factor; f(·) represents a point-wise attention function. F_l(i, j) represents the feature value of a point in the local motion feature.
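A minimal sketch of this point-wise global attention on scalar-valued 2-D grids follows. The patent leaves the attention function f(·) unspecified, so a softmax over query-key products is assumed here, and single-channel features stand in for the real multi-channel ones:

```python
import math

def global_attention(F_l, q_c, k_c, v_m, gamma):
    # For each query position (i, j), attend over every key position (u, v),
    # then add the gamma-scaled weighted value sum back onto the local
    # motion feature (softmax attention is an assumption; the patent only
    # names a point-wise attention function f).
    H, W = len(F_l), len(F_l[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            # Similarity of this query against every key position.
            scores = [q_c[i][j] * k_c[u][v] for u in range(H) for v in range(W)]
            m = max(scores)  # subtract max for numerical stability
            weights = [math.exp(s - m) for s in scores]
            z = sum(weights)
            vals = [v_m[u][v] for u in range(H) for v in range(W)]
            agg = sum(w * val for w, val in zip(weights, vals)) / z
            out[i][j] = F_l[i][j] + gamma * agg
    return out
```

With all-zero queries and keys the weights become uniform, so each output position is simply F_l plus gamma times the mean of v_m, which is a convenient sanity check.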
(3) Global-local motion information decoder.
The structure of the global-local motion information decoder is the same as that of the local motion information encoder, and will not be described again.
The global-local motion information decoder is used for decoding according to the local motion features, the global motion information and the context features to obtain the residual optical flow of the current iteration; the residual optical flow of the last iteration is used to determine the optical flow field of the target image.
Specifically, the input of the global-local motion information decoder is the global and local motion information formed by aggregating the local motion features output by the local motion information encoder with the hidden-state features output by the global motion information encoder and the context feature encoder, and the output is the residual optical flow Δf of the current iteration. The calculation formula is as follows:
F_a = Cat(F_l, F_g, F_C);
F_gl^t = DLP(Cat(F_a, F_gl^{t-1}));
Δf = Conv1×1(F_gl^t);
wherein F_a represents the aggregated features obtained by aggregating the global motion information, the local motion features, and the context features; F_gl^t represents the global-local motion features of the current iteration; F_gl^{t-1} represents the global-local motion features of the previous iteration.
The global-local loop optical flow decoder of this embodiment optimizes the optical flow field through n loop iterations, and upsamples the optical flow after the last iteration to the same resolution as the input image to obtain the final optical flow field. During the initial iteration, the initial optical flow field f is set to 0 and the residual optical flow Δf is set to 0; the optical flow field is then updated after each iteration as f_t = f_{t-1} + Δf, and the residual optical flow of the last iteration determines the final optical flow field. The final optical flow field visualization is shown in fig. 5.
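The iterative refinement described above can be sketched as a simple loop. `decoder_step` and `upsample` are hypothetical stand-ins for the global-local decoder and the final upsampler, and the flow is reduced to a single scalar per field purely for illustration:

```python
def refine_flow(n_iters, decoder_step, upsample):
    # Loop-iterative optical flow refinement: start from a zero flow field,
    # add the decoder's residual flow each iteration, then upsample the
    # final field to the input resolution.
    flow = 0.0      # initial optical flow field f = 0
    residual = 0.0  # initial residual optical flow = 0
    for t in range(n_iters):
        residual = decoder_step(flow, residual, t)  # residual flow of iteration t
        flow = flow + residual                      # f_t = f_{t-1} + residual
    return upsample(flow)
```

For example, a decoder that always emits a residual of 0.5 over four iterations yields a final (identity-upsampled) flow of 2.0.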
This embodiment leverages the local modeling capability of the depth-separable convolution residual block with a large convolution kernel and the long-range modeling capability of the Transformer with local position coding to improve the capture of motion features and capture the motion relations among more pixels, thereby improving the optical flow estimation accuracy in large-displacement and weak-texture image regions, reducing errors caused by purely local information, and ensuring the reliability and robustness of the optical flow estimation.
Example two
To carry out the method of the above embodiment and achieve the corresponding functions and technical effects, an optical flow computing system is provided below.
Referring to fig. 6, the system includes:
an image acquisition module 601, configured to acquire a target image; the target image includes: a first image and a second image; the first image and the second image are two consecutive frames of images.
A motion feature extraction module 602, configured to extract a motion feature of the target image using a motion feature extraction network; the motion feature extraction network includes a plurality of different sized convolutional layers.
The matching cost calculation module 603 is configured to determine a feature map of the first image and a feature map of the second image according to the motion feature and the feature extraction channel number of the target image, and calculate a matching cost volume of the feature map of the first image and the feature map of the second image.
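The dot-product matching cost computed by module 603, followed by pooling-based downsampling, can be sketched as follows. Feature maps are nested H×W×C lists, pooling is applied over the second image's search dimensions, and the pooling factor is an illustrative assumption:

```python
def matching_cost_volume(feat1, feat2, pool=2):
    # 4-D matching cost: dot-product similarity between every position of
    # the first feature map and every position of the second.
    H1, W1 = len(feat1), len(feat1[0])
    H2, W2 = len(feat2), len(feat2[0])
    cost = [[[[sum(a * b for a, b in zip(feat1[i][j], feat2[u][v]))
               for v in range(W2)]
              for u in range(H2)]
             for j in range(W1)]
            for i in range(H1)]
    # Average-pool the two search dimensions to downsample the cost volume.
    pooled = [[[[sum(cost[i][j][u * pool + du][v * pool + dv]
                     for du in range(pool) for dv in range(pool)) / pool ** 2
                 for v in range(W2 // pool)]
                for u in range(H2 // pool)]
               for j in range(W1)]
              for i in range(H1)]
    return pooled
```

A 1×1 first map against a 2×2 second map with dot products 1, 2, 3, 4 pools down to a single averaged cost of 2.5, which matches the claim's "dot-product similarity then pooling" description.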
A context feature extraction module 604 for extracting context features of the first image using a context encoder; the structure of the context encoder is the same as the structure of the motion feature extraction network.
And the optical flow field solving module 605 is configured to perform loop iteration solving by using a global-local loop optical flow decoder based on the matching cost volume and the context feature, so as to obtain an optical flow field of the target image.
Wherein the global-local loop optical flow decoder comprises: a local motion information encoder, a global motion information encoder and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder.
The local motion information encoder and the global-local motion information decoder each include: the depth separable residual block and the multi-layer perceptron block are connected in sequence; the global motion information encoder includes: the depth separable convolution module and the multi-head attention module are connected in sequence.
The local motion information encoder is used for encoding according to the matching cost volume and the residual optical flow of the last iteration to obtain local motion features. The global motion information encoder is used for encoding according to the local motion features and the context features to obtain global motion information. The global-local motion information decoder is used for decoding according to the local motion features, the global motion information and the context features to obtain the residual optical flow of the current iteration; the residual optical flow of the last iteration is used to determine the optical flow field of the target image.
Example III
The present embodiment provides an electronic device, including a memory and a processor, where the memory is configured to store a computer program, and the processor runs the computer program to enable the electronic device to execute the optical flow calculation method of the first embodiment.
Alternatively, the electronic device may be a server.
In addition, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, which when executed by a processor, implements the optical flow calculation method of the first embodiment.
In the present specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the others; for identical or similar parts, reference may be made between embodiments. For the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively brief, and the relevant points may be found in the description of the method.
The principles and embodiments of the present invention have been described herein with reference to specific examples; the above description is intended only to assist in understanding the method of the present invention and its core ideas. Modifications made by those of ordinary skill in the art in light of these teachings likewise fall within the scope of the present invention. In view of the foregoing, this description should not be construed as limiting the invention.
Claims (7)
1. An optical flow calculation method, comprising:
acquiring a target image; the target image includes: a first image and a second image; the first image and the second image are two continuous frames of images;
extracting the motion characteristics of the target image by adopting a motion characteristic extraction network; the motion feature extraction network comprises a plurality of convolution layers with different sizes;
determining a feature map of the first image and a feature map of the second image according to the motion feature and the feature extraction channel number of the target image, and calculating the matching cost volume of the feature map of the first image and the feature map of the second image;
extracting a context feature of the first image using a context encoder; the structure of the context encoder is the same as that of the motion feature extraction network;
based on the matching cost volume and the context characteristics, adopting a global-local loop optical flow decoder to carry out loop iteration solution to obtain an optical flow field of the target image;
wherein the global-local loop optical flow decoder comprises: a local motion information encoder, a global motion information encoder and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder;
the local motion information encoder and the global-local motion information decoder each include: the depth separable residual block and the multi-layer perceptron block are connected in sequence; the global motion information encoder includes: a depth separable convolution module and a multi-head attention module which are connected in sequence;
the local motion information encoder is used for encoding according to the matching cost volume and the residual optical flow of the last iteration to obtain local motion characteristics;
the global motion information encoder is used for encoding according to the local motion characteristics and the context characteristics to obtain global motion information;
the global-local motion information decoder is used for decoding according to the local motion characteristics, the global motion information and the context characteristics to obtain a residual optical flow of the current iteration; the residual optical flow of the last iteration is used to determine the optical flow field of the target image;
according to the motion characteristics and the characteristic extraction channel number of the target image, determining the characteristic diagram of the first image and the characteristic diagram of the second image, and calculating the matching cost volume of the characteristic diagram of the first image and the characteristic diagram of the second image, wherein the method specifically comprises the following steps:
determining the first half of the motion characteristics of the target image as a characteristic diagram of the first image, and determining the second half of the motion characteristics of the target image as a characteristic diagram of the second image;
performing dot product similarity operation on the feature images of the first image and the feature images of the second image to obtain matching cost information of the feature images of the first image and the feature images of the second image;
and downsampling the matching cost information by adopting pooling operation to obtain the matching cost volumes of the feature map of the first image and the feature map of the second image.
2. The optical flow computing method according to claim 1, characterized in that the motion feature extraction network specifically comprises: the first convolution layer, the convolution residual block and the second convolution layer are sequentially connected;
the convolution kernel size of the first convolution layer is 7×7; the convolution residual block includes: the third convolution layer and the fourth convolution layer are sequentially connected; the size of the convolution kernel of the second convolution layer is 1×1; the size of the convolution kernel of the third convolution layer is 3 multiplied by 3, and the step length is 2; the size of the convolution kernel of the fourth convolution layer is 3×3, and the step size is 1.
3. The optical flow computing method according to claim 1, characterized in that the depth separable residual block comprises: the first depth separable convolution layer, the first activation function, the second depth separable convolution layer and the second activation function are connected in sequence;
the convolution kernel size of the first depth separable convolution layer is 7 x 7; the second depth separable convolution layers are connected in a dense manner, and the convolution kernel size is 15 multiplied by 15; the first activation function and the second activation function are both GELU activation functions.
4. The optical flow computing method according to claim 1, wherein the multi-layer perceptron block specifically comprises: the fifth convolution layer, the third depth separable convolution layer, the third activation function and the sixth convolution layer are sequentially connected;
the convolution kernel sizes of the fifth convolution layer and the sixth convolution layer are 1×1; the convolution kernel size of the third depth separable convolution layer is 3×3; the third activation function is a GELU activation function.
5. An optical flow computing system, comprising:
the image acquisition module is used for acquiring a target image; the target image includes: a first image and a second image; the first image and the second image are two continuous frames of images;
the motion feature extraction module is used for extracting the motion features of the target image by adopting a motion feature extraction network; the motion feature extraction network comprises a plurality of convolution layers with different sizes;
the matching cost calculation module is used for determining a feature map of the first image and a feature map of the second image according to the motion feature and the feature extraction channel number of the target image, and calculating the matching cost volume of the feature map of the first image and the feature map of the second image;
a context feature extraction module for extracting context features of the first image using a context encoder; the structure of the context encoder is the same as that of the motion feature extraction network;
the optical flow field solving module is used for carrying out loop iteration solving by adopting a global-local loop optical flow decoder based on the matching cost volume and the context characteristics to obtain an optical flow field of the target image;
wherein the global-local loop optical flow decoder comprises: a local motion information encoder, a global motion information encoder and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder;
the local motion information encoder and the global-local motion information decoder each include: the depth separable residual block and the multi-layer perceptron block are connected in sequence; the global motion information encoder includes: a depth separable convolution module and a multi-head attention module which are connected in sequence;
the local motion information encoder is used for encoding according to the matching cost volume and the residual optical flow of the last iteration to obtain local motion characteristics;
the global motion information encoder is used for encoding according to the local motion characteristics and the context characteristics to obtain global motion information;
the global-local motion information decoder is used for decoding according to the local motion characteristics, the global motion information and the context characteristics to obtain a residual optical flow of the current iteration; the residual optical flow of the last iteration is used to determine the optical flow field of the target image;
according to the motion characteristics and the characteristic extraction channel number of the target image, determining the characteristic diagram of the first image and the characteristic diagram of the second image, and calculating the matching cost volume of the characteristic diagram of the first image and the characteristic diagram of the second image, wherein the method specifically comprises the following steps:
determining the first half of the motion characteristics of the target image as a characteristic diagram of the first image, and determining the second half of the motion characteristics of the target image as a characteristic diagram of the second image;
performing dot product similarity operation on the feature images of the first image and the feature images of the second image to obtain matching cost information of the feature images of the first image and the feature images of the second image;
and downsampling the matching cost information by adopting pooling operation to obtain the matching cost volumes of the feature map of the first image and the feature map of the second image.
6. An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the optical flow calculation method of any one of claims 1 to 4.
7. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the optical flow calculation method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310735464.6A CN116486107B (en) | 2023-06-21 | 2023-06-21 | Optical flow calculation method, system, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116486107A CN116486107A (en) | 2023-07-25 |
CN116486107B true CN116486107B (en) | 2023-09-05 |
Family
ID=87219922
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310735464.6A Active CN116486107B (en) | 2023-06-21 | 2023-06-21 | Optical flow calculation method, system, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116486107B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118381927B (en) * | 2024-06-24 | 2024-08-23 | 杭州宇泛智能科技股份有限公司 | Dynamic point cloud compression method, system, storage medium and device based on multi-mode bidirectional circulating scene flow |
Citations (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101713986B1 (en) * | 2016-02-17 | 2017-03-08 | 한국항공대학교산학협력단 | Optical flow estimator for moving object detection and method thereof |
CN106973293A (en) * | 2017-04-21 | 2017-07-21 | 中国科学技术大学 | The light field image coding method predicted based on parallax |
CN111626308A (en) * | 2020-04-22 | 2020-09-04 | 上海交通大学 | Real-time optical flow estimation method based on lightweight convolutional neural network |
WO2021035807A1 (en) * | 2019-08-23 | 2021-03-04 | 深圳大学 | Target tracking method and device fusing optical flow information and siamese framework |
CN112686952A (en) * | 2020-12-10 | 2021-04-20 | 中国科学院深圳先进技术研究院 | Image optical flow computing system, method and application |
WO2021201438A1 (en) * | 2020-04-01 | 2021-10-07 | Samsung Electronics Co., Ltd. | System and method for motion warping using multi-exposure frames |
CN113554039A (en) * | 2021-07-27 | 2021-10-26 | 广东工业大学 | Method and system for generating optical flow graph of dynamic image based on multi-attention machine system |
CN114299105A (en) * | 2021-08-04 | 2022-04-08 | 腾讯科技(深圳)有限公司 | Image processing method, image processing device, computer equipment and storage medium |
CN114565880A (en) * | 2022-04-28 | 2022-05-31 | 武汉大学 | Method, system and equipment for detecting counterfeit video based on optical flow tracking |
CN114677412A (en) * | 2022-03-18 | 2022-06-28 | 苏州大学 | Method, device and equipment for estimating optical flow |
CN114913196A (en) * | 2021-12-28 | 2022-08-16 | 天翼数字生活科技有限公司 | Attention-based dense optical flow calculation method |
CN115018888A (en) * | 2022-07-04 | 2022-09-06 | 东南大学 | Optical flow unsupervised estimation method based on Transformer |
CN115170826A (en) * | 2022-07-08 | 2022-10-11 | 杭州电子科技大学 | Local search-based fast optical flow estimation method for small moving target and storage medium |
CN115272423A (en) * | 2022-09-19 | 2022-11-01 | 深圳比特微电子科技有限公司 | Method and device for training optical flow estimation model and readable storage medium |
CN115690170A (en) * | 2022-10-08 | 2023-02-03 | 苏州大学 | Method and system for self-adaptive optical flow estimation aiming at different-scale targets |
CN115731263A (en) * | 2022-10-28 | 2023-03-03 | 苏州工业园区服务外包职业学院 | Optical flow calculation method, system, device and medium fusing shift window attention |
CN115830090A (en) * | 2022-12-01 | 2023-03-21 | 大连理工大学 | Self-supervision monocular depth prediction training method for predicting camera attitude based on pixel matching |
CN115861384A (en) * | 2023-02-27 | 2023-03-28 | 广东工业大学 | Optical flow estimation method and system based on generation of countermeasure and attention mechanism |
WO2023056730A1 (en) * | 2021-10-09 | 2023-04-13 | 深圳市中兴微电子技术有限公司 | Video image augmentation method, network training method, electronic device and storage medium |
CN116091793A (en) * | 2023-02-27 | 2023-05-09 | 南京邮电大学 | Light field significance detection method based on optical flow fusion |
CN116205953A (en) * | 2023-04-12 | 2023-06-02 | 华中科技大学 | Optical flow estimation method and device based on hierarchical total-correlation cost body aggregation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11620328B2 (en) * | 2020-06-22 | 2023-04-04 | International Business Machines Corporation | Speech to media translation |
Non-Patent Citations (1)
Title |
---|
SS-SF: Piecewise 3D Scene Flow Estimation With Semantic Segmentation; Cheng Feng et al.; IEEE Access; full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220114750A1 (en) | Map constructing method, positioning method and wireless communication terminal | |
CN111985343A (en) | Method for constructing behavior recognition deep network model and behavior recognition method | |
CN109389667B (en) | High-efficiency global illumination drawing method based on deep learning | |
TWI791405B (en) | Method for depth estimation for variable focus camera, computer system and computer-readable storage medium | |
CN112232134B (en) | Human body posture estimation method based on hourglass network and attention mechanism | |
CN114677412B (en) | Optical flow estimation method, device and equipment | |
CN116486107B (en) | Optical flow calculation method, system, equipment and medium | |
CN113850900B (en) | Method and system for recovering depth map based on image and geometric clues in three-dimensional reconstruction | |
CN112084849A (en) | Image recognition method and device | |
CN115761594B (en) | Optical flow calculation method based on global and local coupling | |
CN114723787A (en) | Optical flow calculation method and system | |
CN111612825A (en) | Image sequence motion occlusion detection method based on optical flow and multi-scale context | |
CN111294614B (en) | Method and apparatus for digital image, audio or video data processing | |
CN114708436B (en) | Training method of semantic segmentation model, semantic segmentation method, semantic segmentation device and semantic segmentation medium | |
CN103208109B (en) | A kind of unreal structure method of face embedded based on local restriction iteration neighborhood | |
CN111738092A (en) | Method for recovering shielded human body posture sequence based on deep learning | |
CN115861384B (en) | Optical flow estimation method and system based on countermeasure and attention mechanism generation | |
CN115035173B (en) | Monocular depth estimation method and system based on inter-frame correlation | |
CN116416649A (en) | Video pedestrian re-identification method based on multi-scale resolution alignment | |
CN113222016B (en) | Change detection method and device based on cross enhancement of high-level and low-level features | |
CN114399648A (en) | Behavior recognition method and apparatus, storage medium, and electronic device | |
CN113239771A (en) | Attitude estimation method, system and application thereof | |
CN115661929B (en) | Time sequence feature coding method and device, electronic equipment and storage medium | |
CN115082295B (en) | Image editing method and device based on self-attention mechanism | |
Wang et al. | E-HANet: Event-based hybrid attention network for optical flow estimation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||