CN116486107B - Optical flow calculation method, system, equipment and medium - Google Patents

Optical flow calculation method, system, equipment and medium

Info

Publication number
CN116486107B
CN116486107B (application CN202310735464.6A)
Authority
CN
China
Prior art keywords
image
optical flow
global
motion information
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310735464.6A
Other languages
Chinese (zh)
Other versions
CN116486107A (en)
Inventor
王子旭
葛利跃
陈震
张聪炫
卢锋
吕科
胡卫明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang Hangkong University
Original Assignee
Nanchang Hangkong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang Hangkong University
Priority to CN202310735464.6A
Publication of CN116486107A
Application granted
Publication of CN116486107B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an optical flow calculation method, system, device and medium, relating to the field of optical flow processing. The method comprises: acquiring a target image, where the target image comprises two consecutive frames, a first image and a second image; extracting motion features of the target image with a motion feature extraction network; determining a feature map of the first image and a feature map of the second image from the motion features and the number of feature extraction channels, and computing the matching cost volume of the two feature maps; extracting context features of the first image with a context encoder; and, based on the matching cost volume and the context features, solving by loop iteration with a global-local loop optical flow decoder to obtain the optical flow field of the target image. The global-local loop optical flow decoder is constructed from a depth separable residual block, a multi-layer perceptron block, a depth separable convolution module, and a multi-head attention module. The invention improves the accuracy and robustness of optical flow estimation.

Description

Optical flow calculation method, system, equipment and medium
Technical Field
The present invention relates to the field of optical flow processing, and in particular, to a method, a system, an apparatus, and a medium for optical flow calculation.
Background
Optical flow refers to the two-dimensional motion vectors of pixels on moving objects and scene surfaces in an image sequence; it not only provides the motion vectors of objects and the scene in the image, but also carries rich shape and structure information. Optical flow estimation is therefore a research hotspot in image processing and computer vision, and it provides valuable motion cues for many high-level vision tasks such as action recognition, video interpolation, video segmentation, and object tracking.
In recent years, with the rise of deep learning, optical flow estimation models based on convolutional neural networks (Convolutional Neural Network, CNN) have been highly successful. Such methods first use a data-driven, learning-based optimization strategy and extract image features with a modeled feature encoder. They then compute the similarity of all feature vectors between the feature maps, take the pair of feature vectors with the highest similarity as matching points, and finally decode the displacement field between consecutive frames. Because encoding and decoding require features of sufficient resolution to reduce matching errors caused by large-displacement motion and local ambiguity (occlusion, weak texture, illumination variation, and the like), how to decode motion features efficiently and accurately becomes the key to improving the accuracy and robustness of optical flow estimation. However, existing deep learning optical flow models generally perform optical flow decoding with local convolution operations of limited receptive field, which leaves the model's feature extraction and expression capability insufficient and thereby limits the overall performance of optical flow estimation.
Disclosure of Invention
Based on the above, the embodiment of the invention provides an optical flow calculation method, an optical flow calculation system, an optical flow calculation device and an optical flow calculation medium, so as to improve the accuracy and the robustness of optical flow estimation.
In order to achieve the above object, the embodiment of the present invention provides the following solutions:
an optical flow calculation method, comprising:
acquiring a target image; the target image includes: a first image and a second image; the first image and the second image are two continuous frames of images;
extracting the motion characteristics of the target image by adopting a motion characteristic extraction network; the motion feature extraction network comprises a plurality of convolution layers with different sizes;
determining a feature map of the first image and a feature map of the second image according to the motion feature and the feature extraction channel number of the target image, and calculating the matching cost volume of the feature map of the first image and the feature map of the second image;
extracting a context feature of the first image using a context encoder; the structure of the context encoder is the same as that of the motion feature extraction network;
based on the matching cost volume and the context characteristics, adopting a global-local loop optical flow decoder to carry out loop iteration solution to obtain an optical flow field of the target image;
wherein the global-local loop optical flow decoder comprises: a local motion information encoder, a global motion information encoder and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder;
the local motion information encoder and the global-local motion information decoder each include: the depth separable residual block and the multi-layer perceptron block are connected in sequence; the global motion information encoder includes: a depth separable convolution module and a multi-head attention module which are connected in sequence;
the local motion information encoder is used for encoding according to the matching cost volume and the residual optical flow of the last iteration to obtain local motion characteristics;
the global motion information encoder is used for encoding according to the local motion characteristics and the context characteristics to obtain global motion information;
the global-local motion information decoder is used for decoding according to the local motion characteristics, the global motion information and the context characteristics to obtain the residual optical flow of the current iteration; the residual optical flow of the last iteration is used to determine the optical flow field of the target image.
Optionally, the motion feature extraction network specifically includes: the first convolution layer, the convolution residual block and the second convolution layer are sequentially connected;
the convolution kernel size of the first convolution layer is 7×7; the convolution residual block includes: the third convolution layer and the fourth convolution layer are sequentially connected; the size of the convolution kernel of the second convolution layer is 1×1; the size of the convolution kernel of the third convolution layer is 3 multiplied by 3, and the step length is 2; the size of the convolution kernel of the fourth convolution layer is 3×3, and the step size is 1.
Optionally, determining a feature map of the first image and a feature map of the second image according to the motion feature of the target image and the feature extraction channel number, and calculating a matching cost volume of the feature map of the first image and the feature map of the second image, which specifically includes:
determining the first half of the motion features of the target image as the feature map of the first image, and determining the second half of the motion features of the target image as the feature map of the second image;
performing dot product similarity operation on the feature images of the first image and the feature images of the second image to obtain matching cost information of the feature images of the first image and the feature images of the second image;
and downsampling the matching cost information by adopting pooling operation to obtain the matching cost volumes of the feature map of the first image and the feature map of the second image.
Optionally, the depth separable residual block specifically includes: the first depth separable convolution layer, the first activation function, the second depth separable convolution layer and the second activation function are connected in sequence;
the convolution kernel size of the first depth separable convolution layer is 7×7; the second depth separable convolution layer is densely connected, with a convolution kernel size of 15×15; the first activation function and the second activation function are both GELU activation functions.
Optionally, the multi-layer perceptron block specifically includes: the fifth convolution layer, the third depth separable convolution layer, the third activation function and the sixth convolution layer are sequentially connected;
the convolution kernel sizes of the fifth convolution layer and the sixth convolution layer are 1×1; the convolution kernel size of the third depth separable convolution layer is 3×3; the third activation function is a GELU activation function.
The present invention also provides an optical flow computing system comprising:
the image acquisition module is used for acquiring a target image; the target image includes: a first image and a second image; the first image and the second image are two continuous frames of images;
the motion feature extraction module is used for extracting the motion features of the target image by adopting a motion feature extraction network; the motion feature extraction network comprises a plurality of convolution layers with different sizes;
the matching cost calculation module is used for determining a feature map of the first image and a feature map of the second image according to the motion feature and the feature extraction channel number of the target image, and calculating the matching cost volume of the feature map of the first image and the feature map of the second image;
a context feature extraction module for extracting context features of the first image using a context encoder; the structure of the context encoder is the same as that of the motion feature extraction network;
the optical flow field solving module is used for carrying out loop iteration solving by adopting a global-local loop optical flow decoder based on the matching cost volume and the context characteristics to obtain an optical flow field of the target image;
wherein the global-local loop optical flow decoder comprises: a local motion information encoder, a global motion information encoder and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder;
the local motion information encoder and the global-local motion information decoder each include: the depth separable residual block and the multi-layer perceptron block are connected in sequence; the global motion information encoder includes: a depth separable convolution module and a multi-head attention module which are connected in sequence;
the local motion information encoder is used for encoding according to the matching cost volume and the residual optical flow of the last iteration to obtain local motion characteristics;
the global motion information encoder is used for encoding according to the local motion characteristics and the context characteristics to obtain global motion information;
the global-local motion information decoder is used for decoding according to the local motion characteristics, the global motion information and the context characteristics to obtain the residual optical flow of the current iteration; the residual optical flow of the last iteration is used to determine the optical flow field of the target image.
The invention also provides an electronic device, which comprises a memory and a processor, wherein the memory is used for storing a computer program, and the processor runs the computer program to enable the electronic device to execute the optical flow calculation method.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the optical flow calculation method described above.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
aiming at the problem of insufficient feature extraction capability in the existing optical flow estimation model, the embodiment of the invention introduces a depth separable residual block and a multi-layer perceptron block to increase the receptive field, and constructs a global-local circulation optical flow decoder by means of the local characteristics of a depth local motion information encoder and the global characteristics of a global motion information encoder, wherein the global-local circulation optical flow decoder is used as an optical flow estimation model related to global and local motion information, and can improve the accuracy and the robustness of optical flow estimation of a large-displacement image area and a weak texture area.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an optical flow calculation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a first image according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a second image according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a global-local loop optical flow decoder according to an embodiment of the present invention;
FIG. 5 is a visual image of an optical flow field provided by an embodiment of the present invention;
FIG. 6 is a block diagram of an optical flow computing system according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Example 1
Referring to fig. 1, the optical flow calculation method of the present embodiment includes:
step 101: acquiring a target image; the target image includes: a first image and a second image; the first image and the second image are two consecutive frames of images.
In the present embodiment, two consecutive frames, the thirtieth and thirty-first frames of the bamboo_3 image sequence, are selected as input: the thirtieth frame serves as the first image I_1, as shown in fig. 2, and the thirty-first frame serves as the second image I_2, as shown in fig. 3.
Step 102: extracting the motion characteristics of the target image by adopting a motion characteristic extraction network; the motion feature extraction network includes a plurality of different sized convolutional layers.
Specifically, this step first constructs the motion feature extraction network: a stack of successive convolutions extracts motion features from the first image I_1 and the second image I_2. The input of the motion feature extraction network is the stacked image F obtained by stacking the first image I_1 and the second image I_2, where H denotes the height and W the width of the input images; the output is the motion feature F_M of the two consecutive frames.
The motion feature extraction network specifically comprises: a first convolution layer, a convolution residual block, and a second convolution layer connected in sequence. The convolution kernel size of the first convolution layer is 7×7; the convolution residual block includes a third convolution layer and a fourth convolution layer connected in sequence; the convolution kernel size of the second convolution layer is 1×1; the convolution kernel size of the third convolution layer is 3×3 with a stride of 2; the convolution kernel size of the fourth convolution layer is 3×3 with a stride of 1.
The motion feature extraction network is divided into 3 stages (Stage 1, Stage 2 and Stage 3), at 1/2, 1/4 and 1/8 resolution respectively. The input first passes through the 7×7 first convolution layer of Stage 1, which downsamples and extracts features; it then passes through the two successively stacked 3×3 convolution residual blocks of Stage 2 and Stage 3, each of which downsamples the image by a further factor of 2; finally, a 1×1 second convolution layer adjusts the channel number and outputs the motion feature F_M of the two consecutive frames. The specific calculation formula is as follows:
F_M = Conv_1×1(ConvBlock_3×3(ConvBlock_3×3(Conv_7×7(F))))
This formula represents the feature extraction process of the motion feature extraction network. Conv_7×7(·) and Conv_1×1(·) denote feature extraction of the image with the 7×7 first convolution layer and the 1×1 second convolution layer, respectively; ConvBlock_3×3(·) denotes feature extraction with a convolution residual block composed of the third convolution layer (stride 2, kernel 3×3) and the fourth convolution layer (stride 1, kernel 3×3), that is
f_1 = relu(Conv_3×3,s=2(x_1) + R(x_1)),    f = relu(Conv_3×3,s=1(f_1) + f_1)
where f_1 denotes the result of downsampling and feature extraction of the input x_1 by the stride-2 3×3 third convolution layer, followed by the residual connection and the relu activation function; f denotes the result of further feature extraction of f_1 by the stride-1 3×3 fourth convolution layer, followed by the residual connection and the relu activation function; and R(·) denotes the residual (shortcut) branch that matches the resolution of x_1 to that of f_1.
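As a concrete illustration, the following PyTorch sketch assembles the three stages described above. It is a minimal reconstruction under stated assumptions: the channel widths (64, 96, 128, 256), the projection shortcut in the stride-2 residual branch, and the stacking of the two frames along the batch axis are ours, not taken from the patent.

# A minimal sketch of the motion feature extraction network (assumptions noted above).
import torch
import torch.nn as nn

class ConvResidualBlock(nn.Module):
    """Stride-2 3x3 conv (downsampling) + stride-1 3x3 conv, each with a residual connection."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.down = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)   # third convolution layer
        self.conv = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1)  # fourth convolution layer
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=2)              # shortcut R(x_1), an assumption
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        f1 = self.relu(self.down(x) + self.skip(x))  # f_1 in the formula above
        return self.relu(self.conv(f1) + f1)         # f in the formula above

class MotionFeatureExtractor(nn.Module):
    def __init__(self, in_ch=3, dim=256):
        super().__init__()
        self.stage1 = nn.Conv2d(in_ch, 64, 7, stride=2, padding=3)  # Stage 1, 1/2 resolution
        self.stage2 = ConvResidualBlock(64, 96)                     # Stage 2, 1/4 resolution
        self.stage3 = ConvResidualBlock(96, 128)                    # Stage 3, 1/8 resolution
        self.out = nn.Conv2d(128, dim, 1)                           # 1x1 channel adjustment

    def forward(self, stacked):
        return self.out(self.stage3(self.stage2(self.stage1(stacked))))

frames = torch.randn(2, 3, 256, 512)    # I1 and I2 stacked along the batch axis
f_m = MotionFeatureExtractor()(frames)  # (2, 256, 32, 64): motion features F_M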
Step 103: and determining a feature map of the first image and a feature map of the second image according to the motion feature of the target image and the feature extraction channel number, and calculating the matching cost volume of the feature map of the first image and the feature map of the second image.
The method specifically comprises the following steps:
(1) The motion feature F_M is split into two halves along the feature extraction channel dimension: the first half of F_M is determined as the feature map F_1 of the first image, and the second half as the feature map F_2 of the second image.
(2) Dot-product similarity is computed between the feature vectors of the feature map of the first image and those of the feature map of the second image, yielding the matching cost information between all pairs of relevant points on the two feature maps.
(3) The matching cost information is downsampled by pooling, which converts large-displacement matching cost information into small-displacement matching cost information and yields the matching cost volumes of the two feature maps in the form of a multi-scale matching cost pyramid. The calculation formula is as follows:
Cost = F_1 ⊗ F_2,    Cost_l = AvgPool(Cost, l)
where Cost denotes the matching cost information; ⊗ denotes a matrix multiplication operation; AvgPool denotes an average pooling operation; l denotes the layer number of the multi-scale matching cost pyramid; and Cost_l denotes the matching cost volume of layer l of the pyramid, obtained by downsampling the matching cost information. As l changes, the size of each feature map in Cost changes, with the change determined by the stride of the average pooling operation. This embodiment obtains the matching cost volume of every layer so that optical flow can be estimated well under both large and small displacements.
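The pyramid construction can be sketched as follows, assuming RAFT-style all-pairs correlation: dot-product similarity between every feature vector of F_1 and every feature vector of F_2, then repeated average pooling over the dimensions of the second image so that large displacements become small displacements at coarse levels. The number of levels and the demo sizes are illustrative assumptions.

import torch
import torch.nn.functional as F

def cost_volume_pyramid(f1, f2, levels=4):
    """f1, f2: (C, H, W) feature maps; returns cost volumes of shape (H, W, H/2^l, W/2^l)."""
    c, h, w = f1.shape
    cost = torch.einsum("chw,cuv->hwuv", f1, f2)  # Cost = F_1 (x) F_2: all-pairs dot product
    pyramid = [cost]
    for _ in range(levels - 1):
        u, v = cost.shape[2], cost.shape[3]
        cost = F.avg_pool2d(cost.reshape(h * w, 1, u, v), 2)  # AvgPool over the (u, v) axes only
        cost = cost.reshape(h, w, u // 2, v // 2)
        pyramid.append(cost)
    return pyramid

pyr = cost_volume_pyramid(torch.randn(256, 32, 64), torch.randn(256, 32, 64))
print([tuple(p.shape) for p in pyr])
# [(32, 64, 32, 64), (32, 64, 16, 32), (32, 64, 8, 16), (32, 64, 4, 8)]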
Step 104: extracting a context feature of the first image using a context encoder; the structure of the context encoder is the same as the structure of the motion feature extraction network.
The context encoder in this step and the motion feature extraction network of step 102 form two parallel branches. The context encoder combines a stack of successive convolutions to extract context features from the first image I_1 of the image sequence; its input is the first image I_1 and its output is the context feature F_C of the first image.
Specifically, the context encoder is divided into 3 stages (Stage 1, Stage 2 and Stage 3), at 1/2, 1/4 and 1/8 resolution respectively. The input first passes through the 7×7 first convolution layer of Stage 1 for downsampling and feature extraction, then through the two successively stacked 3×3 convolution residual blocks of Stage 2 and Stage 3, each downsampling the image by a further factor of 2; finally a 1×1 second convolution layer adjusts the channel number and outputs the context feature F_C of the first image. The calculation formula is as follows:
F_C = Conv_1×1(ConvBlock_3×3(ConvBlock_3×3(Conv_7×7(I_1))))
step 105: and based on the matching cost volume and the context characteristics, adopting a global-local loop optical flow decoder to carry out loop iteration solution to obtain an optical flow field of the target image.
Referring to fig. 4, the global-local loop optical flow decoder includes: a local motion information encoder, a global motion information encoder and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder.
The various parts of the global-local loop optical flow decoder described above are described in further detail below in conjunction with fig. 4.
(1) A local motion information encoder.
The local motion information encoder includes: a depth separable residual block and a multi-layer perceptron block (Multilayer Perceptron, MLP) connected in sequence; the global motion information encoder includes: the depth separable convolution module and the multi-head attention module are connected in sequence.
The depth separable residual block specifically comprises: a first depth separable convolution layer, a first activation function, a second depth separable convolution layer, and a second activation function connected in sequence. The convolution kernel size of the first depth separable convolution layer is 7×7; the second depth separable convolution layer is densely connected, with a convolution kernel size of 15×15; the first activation function and the second activation function are both GELU activation functions. In this embodiment the second depth separable convolution layer specifically comprises: a seventh convolution layer with a 1×1 kernel, an eighth convolution layer with a 15×15 kernel, and a ninth convolution layer with a 1×1 kernel, connected in sequence; the features are activated with a GELU activation function after each convolution, and a residual connection operation is applied to the features after each convolution.
The multi-layer perceptron block specifically comprises: the fifth convolution layer, the third depth separable convolution layer, the third activation function, and the sixth convolution layer are connected in sequence. The convolution kernel sizes of the fifth convolution layer and the sixth convolution layer are 1×1; the convolution kernel size of the third depth separable convolution layer is 3×3; the third activation function is a GELU activation function.
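A minimal sketch of these two building blocks follows. The 1×1 / depthwise-15×15 / 1×1 composition of the second layer follows the description above; the hidden width of the perceptron block and the exact placement of the residual additions are assumptions.

import torch
import torch.nn as nn

class DepthSeparableResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dw7 = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)    # first depth separable layer, 7x7
        self.pw_in = nn.Conv2d(dim, dim, 1)                         # seventh layer, 1x1
        self.dw15 = nn.Conv2d(dim, dim, 15, padding=7, groups=dim)  # eighth layer, depthwise 15x15
        self.pw_out = nn.Conv2d(dim, dim, 1)                        # ninth layer, 1x1
        self.act = nn.GELU()

    def forward(self, x):
        x = x + self.act(self.dw7(x))         # residual connection after each convolution stage
        y = self.act(self.pw_in(x))
        y = y + self.act(self.dw15(y))
        return x + self.act(self.pw_out(y))

class MLPBlock(nn.Module):
    """Fifth 1x1 conv -> third depthwise 3x3 conv -> GELU -> sixth 1x1 conv."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(dim, hidden, 1),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden),
            nn.GELU(),
            nn.Conv2d(hidden, dim, 1),
        )

    def forward(self, x):
        return x + self.net(x)

x = torch.randn(1, 128, 32, 64)
y = MLPBlock(128, 256)(DepthSeparableResidualBlock(128)(x))  # the "DLP" unit; shape preserved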
The local motion information encoder is used for encoding according to the matching cost volume and the residual optical flow of the last iteration to obtain local motion features. Specifically, its input consists of the motion information in the matching cost volume together with the current optical flow field; these features are encoded in a loop-iterative fashion, and the encoder finally outputs the local motion features F_l of the current iteration. The calculation formula is as follows:
F_f = Conv_1×1(Δf),    F_cost = Conv_1×1(Lookup(Cost, f)),    F_m = DLP(Cat(F_cost, F_f))
where Δf denotes the residual optical flow of each iteration; Lookup(Cost, f) denotes retrieving matching costs from the cost volume at the positions given by the current optical flow f (a notation introduced here for readability); DLP(·) denotes the depth separable MLP block, DLP(·) = MLP(RDSCBlocks(·)); Cat(·) denotes the feature map concatenation operation, i.e. splicing several feature maps of the same resolution along the channel dimension; Conv_1×1(·) denotes feature extraction of the input with a 1×1 convolution layer; F_f denotes the features extracted from the optical flow of the current iteration; F_cost denotes the local motion features retrieved from the matching cost volume by the optical flow of the current iteration; F_m denotes the local motion features enhanced by the DLP block; and F_l denotes the local motion features of the current iteration. RDSCBlocks(·) denotes the depth separable residual block; MLP(·) denotes the multi-layer perceptron block implemented by convolution; DwConv_7×7(x_2) denotes feature extraction of the input x_2 with the first depth separable convolution layer of kernel size 7×7; DwConv_15×15,d(·) denotes feature extraction with the densely connected second depth separable convolution layer of kernel size 15×15; GELU(·) denotes the GELU activation function; Conv_1×1(x_3) denotes feature extraction of the input x_3 with the fifth convolution layer of kernel size 1×1; and DwConv_3×3(·) denotes feature extraction with the third depth separable convolution layer of kernel size 3×3, so that (with residual connections)
RDSCBlocks(x_2) = GELU(DwConv_15×15,d(GELU(DwConv_7×7(x_2)))),    MLP(x_3) = Conv_1×1(GELU(DwConv_3×3(Conv_1×1(x_3))))
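Continuing the previous sketch, a possible forward pass of the local motion information encoder follows: features of the current residual flow and of the cost values looked up by that flow are concatenated and enhanced by the DLP unit. The channel split and the lookup-channel count are assumptions.

import torch
import torch.nn as nn

class LocalMotionEncoder(nn.Module):
    def __init__(self, cost_ch, dim=128):
        super().__init__()
        # e.g. cost_ch = levels * (2 * radius + 1) ** 2 for a windowed lookup (assumption)
        self.flow_conv = nn.Conv2d(2, dim // 2, 1)        # F_f = Conv_1x1(residual flow)
        self.cost_conv = nn.Conv2d(cost_ch, dim // 2, 1)  # F_cost = Conv_1x1(cost lookup)
        self.dlp = nn.Sequential(DepthSeparableResidualBlock(dim), MLPBlock(dim, 2 * dim))

    def forward(self, flow, cost_lookup):
        f = torch.cat([self.cost_conv(cost_lookup), self.flow_conv(flow)], dim=1)
        return self.dlp(f)  # F_m = DLP(Cat(F_cost, F_f)): local motion features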
(2) Global motion information encoder.
The global motion information encoder includes: the depth separable convolution module and the multi-head attention module are connected in sequence.
The global motion information encoder is used for encoding according to the local motion features and the context features to obtain global motion information. Specifically, a depth separable convolution module and a multi-head attention module with local position coding are constructed; the local motion features F_l output by the local motion information encoder and the context feature F_C output by the context encoder are fed into the global motion information encoder, which finally outputs the global motion information F_g. The calculation formula is as follows:
F_g(i,j) = F_l(i,j) + γ Σ_{u,v} f(q_c(i,j), k_c(u,v)) · v_m(u,v)
The global motion information encoder encodes the context feature F_C to obtain the query vector q_c and the key vector k_c. Here i and j denote the abscissa and ordinate of a point in the query vector, and u and v denote the abscissa and ordinate of a point in the key vector; q_c(i,j) and k_c(u,v) denote the feature values of points in the query and key vectors; v_m denotes the value vector constructed from the local motion features F_l, and v_m(u,v) denotes the feature value of a point in the value vector; γ is a learnable factor; f(·) denotes a point-by-point attention function; and F_l(i,j) denotes the feature value of a point in the local motion features.
(3) Global-local motion information decoder.
The structure of the global-local motion information decoder is the same as that of the local motion information encoder, and will not be described again.
The global-local motion information decoder is used for decoding according to the local motion features, the global motion information and the context features to obtain the residual optical flow of the current iteration; the residual optical flow of the last iteration is used to determine the optical flow field of the target image.
Specifically, the input of the global-local motion information decoder is the global and local motion information formed by aggregating the local motion features output by the local motion information encoder, the global motion information output by the global motion information encoder, and the hidden-state features from the context encoder; the output is the residual optical flow Δf of the current iteration. The calculation formula is as follows:
F_a = Cat(F_l, F_g, F_C),    F_gl^t = DLP(Cat(F_a, F_gl^(t-1))),    Δf = Conv(F_gl^t)
where F_a denotes the aggregated features obtained from the global motion information, the local motion features and the context features; F_gl^t denotes the global-local motion features of the current iteration; F_gl^(t-1) denotes the global-local motion features of the last iteration; and Conv(·) denotes the convolution head that maps the decoded features to the 2-channel residual optical flow (a notation introduced here for readability).
The global-local loop optical flow decoder of this embodiment optimizes the optical flow field through n loop iterations, and upsamples the optical flow after the last iteration to the same resolution as the input image to obtain the final optical flow field. At the initial iteration, the initial optical flow field f is set to 0 and the residual optical flow Δf is set to 0; after each iteration the optical flow field is updated as f ← f + Δf, and the residual optical flow of the last iteration determines the final optical flow field. The final optical flow field visualization is shown in fig. 5.
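Putting the pieces together, a sketch of the decoder's outer loop is given below: starting from a zero flow field, each iteration looks up the cost pyramid at the current flow, runs the local encoder, the global aggregator and the global-local decoder to produce a residual flow, and accumulates it; the final 1/8-resolution flow is upsampled to the input resolution. `lookup`, `decoder` and the iteration count are stand-ins for the modules sketched above and are assumptions, not the patent's exact interfaces.

import torch

def solve_flow(pyramid, f_c, local_enc, global_enc, decoder, lookup, n_iters=12, scale=8):
    b, _, h, w = f_c.shape
    flow = torch.zeros(b, 2, h, w, device=f_c.device)  # initial optical flow field f = 0
    for _ in range(n_iters):
        f_l = local_enc(flow, lookup(pyramid, flow))        # local motion features
        f_g = global_enc(f_c, f_l)                          # global motion information
        delta = decoder(torch.cat([f_l, f_g, f_c], dim=1))  # residual flow of this iteration
        flow = flow + delta                                 # f <- f + delta_f
    # Upsample the final 1/8-resolution flow to the input resolution.
    return scale * torch.nn.functional.interpolate(
        flow, scale_factor=scale, mode="bilinear", align_corners=False)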
This embodiment uses the local modeling capability of the large-kernel depth separable convolution residual block and the long-range modeling capability of the locally position-encoded Transformer to improve the capture of motion features and model the motion relations among more pixels, thereby improving optical flow estimation accuracy in large-displacement and weak-texture image regions, reducing errors caused by purely local information, and ensuring the reliability and robustness of optical flow estimation.
Example two
In order to perform a corresponding method of the above embodiment to achieve the corresponding functions and technical effects, an optical flow computing system is provided below.
Referring to fig. 6, the system includes:
an image acquisition module 601, configured to acquire a target image; the target image includes: a first image and a second image; the first image and the second image are two consecutive frames of images.
A motion feature extraction module 602, configured to extract a motion feature of the target image using a motion feature extraction network; the motion feature extraction network includes a plurality of different sized convolutional layers.
The matching cost calculation module 603 is configured to determine a feature map of the first image and a feature map of the second image according to the motion feature and the feature extraction channel number of the target image, and calculate a matching cost volume of the feature map of the first image and the feature map of the second image.
A context feature extraction module 604 for extracting context features of the first image using a context encoder; the structure of the context encoder is the same as the structure of the motion feature extraction network.
And the optical flow field solving module 605 is configured to perform loop iteration solving by using a global-local loop optical flow decoder based on the matching cost volume and the context feature, so as to obtain an optical flow field of the target image.
Wherein the global-local loop optical flow decoder comprises: a local motion information encoder, a global motion information encoder and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder.
The local motion information encoder and the global-local motion information decoder each include: the depth separable residual block and the multi-layer perceptron block are connected in sequence; the global motion information encoder includes: the depth separable convolution module and the multi-head attention module are connected in sequence.
The local motion information encoder is used for encoding according to the matching cost volume and the residual light stream of the last iteration to obtain local motion characteristics. The global motion information encoder is used for encoding according to the local motion characteristics and the context characteristics to obtain global motion information. The global-local motion information decoder is used for decoding according to the local motion characteristics, the global motion information and the context characteristics to obtain a residual light stream of the current iteration; the residual optical flow of the last iteration is used to determine the optical flow field of the target image.
Example III
The present embodiment provides an electronic device, including a memory and a processor, where the memory is configured to store a computer program, and the processor runs the computer program to enable the electronic device to execute the optical flow calculation method of the first embodiment.
Alternatively, the electronic device may be a server.
In addition, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, which when executed by a processor, implements the optical flow calculation method of the first embodiment.
In the present specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the others; for identical or similar parts, reference may be made between the embodiments. Since the disclosed system corresponds to the disclosed method, its description is relatively brief, and the relevant points can be found in the description of the method.
The principles and embodiments of the present invention are described herein with specific examples, which are intended only to help in understanding the method of the present invention and its core ideas. At the same time, those of ordinary skill in the art may, in light of the ideas of the present invention, make modifications to the specific embodiments and their scope of application. In summary, the contents of this specification should not be construed as limiting the invention.

Claims (7)

1. An optical flow calculation method, comprising:
acquiring a target image; the target image includes: a first image and a second image; the first image and the second image are two continuous frames of images;
extracting the motion characteristics of the target image by adopting a motion characteristic extraction network; the motion feature extraction network comprises a plurality of convolution layers with different sizes;
determining a feature map of the first image and a feature map of the second image according to the motion feature and the feature extraction channel number of the target image, and calculating the matching cost volume of the feature map of the first image and the feature map of the second image;
extracting a context feature of the first image using a context encoder; the structure of the context encoder is the same as that of the motion feature extraction network;
based on the matching cost volume and the context characteristics, adopting a global-local loop optical flow decoder to carry out loop iteration solution to obtain an optical flow field of the target image;
wherein the global-local loop optical flow decoder comprises: a local motion information encoder, a global motion information encoder and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder;
the local motion information encoder and the global-local motion information decoder each include: the depth separable residual block and the multi-layer perceptron block are connected in sequence; the global motion information encoder includes: a depth separable convolution module and a multi-head attention module which are connected in sequence;
the local motion information encoder is used for encoding according to the matching cost volume and the residual optical flow of the last iteration to obtain local motion characteristics;
the global motion information encoder is used for encoding according to the local motion characteristics and the context characteristics to obtain global motion information;
the global-local motion information decoder is used for decoding according to the local motion characteristics, the global motion information and the context characteristics to obtain the residual optical flow of the current iteration; the residual optical flow of the last iteration is used to determine the optical flow field of the target image;
according to the motion characteristics and the characteristic extraction channel number of the target image, determining the characteristic diagram of the first image and the characteristic diagram of the second image, and calculating the matching cost volume of the characteristic diagram of the first image and the characteristic diagram of the second image, wherein the method specifically comprises the following steps:
determining the first half of the motion characteristics of the target image as a characteristic diagram of the first image, and determining the second half of the motion characteristics of the target image as a characteristic diagram of the second image;
performing dot product similarity operation on the feature images of the first image and the feature images of the second image to obtain matching cost information of the feature images of the first image and the feature images of the second image;
and downsampling the matching cost information by adopting pooling operation to obtain the matching cost volumes of the feature map of the first image and the feature map of the second image.
2. The optical flow computing method according to claim 1, characterized in that the motion feature extraction network specifically comprises: the first convolution layer, the convolution residual block and the second convolution layer are sequentially connected;
the convolution kernel size of the first convolution layer is 7×7; the convolution residual block includes: the third convolution layer and the fourth convolution layer are sequentially connected; the size of the convolution kernel of the second convolution layer is 1×1; the size of the convolution kernel of the third convolution layer is 3 multiplied by 3, and the step length is 2; the size of the convolution kernel of the fourth convolution layer is 3×3, and the step size is 1.
3. The optical flow computing method according to claim 1, characterized in that the depth separable residual block comprises: the first depth separable convolution layer, the first activation function, the second depth separable convolution layer and the second activation function are connected in sequence;
the convolution kernel size of the first depth separable convolution layer is 7 x 7; the second depth separable convolution layers are connected in a dense manner, and the convolution kernel size is 15 multiplied by 15; the first activation function and the second activation function are both GELU activation functions.
4. The optical flow computing method according to claim 1, wherein the multi-layer perceptron block specifically comprises: the fifth convolution layer, the third depth separable convolution layer, the third activation function and the sixth convolution layer are sequentially connected;
the convolution kernel sizes of the fifth convolution layer and the sixth convolution layer are 1×1; the convolution kernel size of the third depth separable convolution layer is 3×3; the third activation function is a GELU activation function.
5. An optical flow computing system, comprising:
the image acquisition module is used for acquiring a target image; the target image includes: a first image and a second image; the first image and the second image are two continuous frames of images;
the motion feature extraction module is used for extracting the motion features of the target image by adopting a motion feature extraction network; the motion feature extraction network comprises a plurality of convolution layers with different sizes;
the matching cost calculation module is used for determining a feature map of the first image and a feature map of the second image according to the motion feature and the feature extraction channel number of the target image, and calculating the matching cost volume of the feature map of the first image and the feature map of the second image;
a context feature extraction module for extracting context features of the first image using a context encoder; the structure of the context encoder is the same as that of the motion feature extraction network;
the optical flow field solving module is used for carrying out loop iteration solving by adopting a global-local loop optical flow decoder based on the matching cost volume and the context characteristics to obtain an optical flow field of the target image;
wherein the global-local loop optical flow decoder comprises: a local motion information encoder, a global motion information encoder and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder;
the local motion information encoder and the global-local motion information decoder each include: the depth separable residual block and the multi-layer perceptron block are connected in sequence; the global motion information encoder includes: a depth separable convolution module and a multi-head attention module which are connected in sequence;
the local motion information encoder is used for encoding according to the matching cost volume and the residual optical flow of the last iteration to obtain local motion characteristics;
the global motion information encoder is used for encoding according to the local motion characteristics and the context characteristics to obtain global motion information;
the global-local motion information decoder is used for decoding according to the local motion characteristics, the global motion information and the context characteristics to obtain the residual optical flow of the current iteration; the residual optical flow of the last iteration is used to determine the optical flow field of the target image;
according to the motion characteristics and the characteristic extraction channel number of the target image, determining the characteristic diagram of the first image and the characteristic diagram of the second image, and calculating the matching cost volume of the characteristic diagram of the first image and the characteristic diagram of the second image, wherein the method specifically comprises the following steps:
determining the first half of the motion characteristics of the target image as a characteristic diagram of the first image, and determining the second half of the motion characteristics of the target image as a characteristic diagram of the second image;
performing dot product similarity operation on the feature images of the first image and the feature images of the second image to obtain matching cost information of the feature images of the first image and the feature images of the second image;
and downsampling the matching cost information by adopting pooling operation to obtain the matching cost volumes of the feature map of the first image and the feature map of the second image.
6. An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the optical flow calculation method of any one of claims 1 to 4.
7. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the optical flow calculation method according to any one of claims 1 to 4.
CN202310735464.6A 2023-06-21 2023-06-21 Optical flow calculation method, system, equipment and medium Active CN116486107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310735464.6A CN116486107B (en) 2023-06-21 2023-06-21 Optical flow calculation method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310735464.6A CN116486107B (en) 2023-06-21 2023-06-21 Optical flow calculation method, system, equipment and medium

Publications (2)

Publication Number Publication Date
CN116486107A CN116486107A (en) 2023-07-25
CN116486107B (en) 2023-09-05

Family

ID=87219922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310735464.6A Active CN116486107B (en) 2023-06-21 2023-06-21 Optical flow calculation method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN116486107B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118381927B (en) * 2024-06-24 2024-08-23 杭州宇泛智能科技股份有限公司 Dynamic point cloud compression method, system, storage medium and device based on multi-mode bidirectional circulating scene flow


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11620328B2 (en) * 2020-06-22 2023-04-04 International Business Machines Corporation Speech to media translation

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101713986B1 (en) * 2016-02-17 2017-03-08 한국항공대학교산학협력단 Optical flow estimator for moving object detection and method thereof
CN106973293A (en) * 2017-04-21 2017-07-21 中国科学技术大学 The light field image coding method predicted based on parallax
WO2021035807A1 (en) * 2019-08-23 2021-03-04 深圳大学 Target tracking method and device fusing optical flow information and siamese framework
WO2021201438A1 (en) * 2020-04-01 2021-10-07 Samsung Electronics Co., Ltd. System and method for motion warping using multi-exposure frames
CN111626308A (en) * 2020-04-22 2020-09-04 上海交通大学 Real-time optical flow estimation method based on lightweight convolutional neural network
CN112686952A (en) * 2020-12-10 2021-04-20 中国科学院深圳先进技术研究院 Image optical flow computing system, method and application
CN113554039A (en) * 2021-07-27 2021-10-26 广东工业大学 Method and system for generating optical flow graph of dynamic image based on multi-attention machine system
CN114299105A (en) * 2021-08-04 2022-04-08 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
WO2023056730A1 (en) * 2021-10-09 2023-04-13 深圳市中兴微电子技术有限公司 Video image augmentation method, network training method, electronic device and storage medium
CN114913196A (en) * 2021-12-28 2022-08-16 天翼数字生活科技有限公司 Attention-based dense optical flow calculation method
CN114677412A (en) * 2022-03-18 2022-06-28 苏州大学 Method, device and equipment for estimating optical flow
CN114565880A (en) * 2022-04-28 2022-05-31 武汉大学 Method, system and equipment for detecting counterfeit video based on optical flow tracking
CN115018888A (en) * 2022-07-04 2022-09-06 东南大学 Optical flow unsupervised estimation method based on Transformer
CN115170826A (en) * 2022-07-08 2022-10-11 杭州电子科技大学 Local search-based fast optical flow estimation method for small moving target and storage medium
CN115272423A (en) * 2022-09-19 2022-11-01 深圳比特微电子科技有限公司 Method and device for training optical flow estimation model and readable storage medium
CN115690170A (en) * 2022-10-08 2023-02-03 苏州大学 Method and system for self-adaptive optical flow estimation aiming at different-scale targets
CN115731263A (en) * 2022-10-28 2023-03-03 苏州工业园区服务外包职业学院 Optical flow calculation method, system, device and medium fusing shift window attention
CN115830090A (en) * 2022-12-01 2023-03-21 大连理工大学 Self-supervision monocular depth prediction training method for predicting camera attitude based on pixel matching
CN115861384A (en) * 2023-02-27 2023-03-28 广东工业大学 Optical flow estimation method and system based on generation of countermeasure and attention mechanism
CN116091793A (en) * 2023-02-27 2023-05-09 南京邮电大学 Light field significance detection method based on optical flow fusion
CN116205953A (en) * 2023-04-12 2023-06-02 华中科技大学 Optical flow estimation method and device based on hierarchical total-correlation cost body aggregation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SS-SF: Piecewise 3D Scene Flow Estimation With Semantic Segmentation; Cheng Feng et al.; IEEE Access; full text *

Also Published As

Publication number Publication date
CN116486107A (en) 2023-07-25

Similar Documents

Publication Publication Date Title
US20220114750A1 (en) Map constructing method, positioning method and wireless communication terminal
CN111985343A (en) Method for constructing behavior recognition deep network model and behavior recognition method
CN109389667B (en) High-efficiency global illumination drawing method based on deep learning
TWI791405B (en) Method for depth estimation for variable focus camera, computer system and computer-readable storage medium
CN112232134B (en) Human body posture estimation method based on hourglass network and attention mechanism
CN114677412B (en) Optical flow estimation method, device and equipment
CN116486107B (en) Optical flow calculation method, system, equipment and medium
CN113850900B (en) Method and system for recovering depth map based on image and geometric clues in three-dimensional reconstruction
CN112084849A (en) Image recognition method and device
CN115761594B (en) Optical flow calculation method based on global and local coupling
CN114723787A (en) Optical flow calculation method and system
CN111612825A (en) Image sequence motion occlusion detection method based on optical flow and multi-scale context
CN111294614B (en) Method and apparatus for digital image, audio or video data processing
CN114708436B (en) Training method of semantic segmentation model, semantic segmentation method, semantic segmentation device and semantic segmentation medium
CN103208109B (en) A kind of unreal structure method of face embedded based on local restriction iteration neighborhood
CN111738092A (en) Method for recovering shielded human body posture sequence based on deep learning
CN115861384B (en) Optical flow estimation method and system based on countermeasure and attention mechanism generation
CN115035173B (en) Monocular depth estimation method and system based on inter-frame correlation
CN116416649A (en) Video pedestrian re-identification method based on multi-scale resolution alignment
CN113222016B (en) Change detection method and device based on cross enhancement of high-level and low-level features
CN114399648A (en) Behavior recognition method and apparatus, storage medium, and electronic device
CN113239771A (en) Attitude estimation method, system and application thereof
CN115661929B (en) Time sequence feature coding method and device, electronic equipment and storage medium
CN115082295B (en) Image editing method and device based on self-attention mechanism
Wang et al. E-HANet: Event-based hybrid attention network for optical flow estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant