CN115690170A - Method and system for adaptive optical flow estimation for targets of different scales

Method and system for adaptive optical flow estimation for targets of different scales

Info

Publication number
CN115690170A
Authority
CN
China
Prior art keywords
scale
features
images
frames
optical flow
Prior art date
Legal status
Granted
Application number
CN202211221511.7A
Other languages
Chinese (zh)
Other versions
CN115690170B (en)
Inventor
钟宝江
李牧
Current Assignee
Suzhou University
Original Assignee
Suzhou University
Priority date
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN202211221511.7A priority Critical patent/CN115690170B/en
Priority claimed from CN202211221511.7A external-priority patent/CN115690170B/en
Publication of CN115690170A publication Critical patent/CN115690170A/en
Application granted granted Critical
Publication of CN115690170B publication Critical patent/CN115690170B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a method and a system for adaptive optical flow estimation for targets of different scales. The method comprises: inputting two adjacent frames of images into a convolutional neural network and extracting features of the two frames to obtain shallow features of the two frames; processing the shallow features of the two frames to obtain multi-scale features of the two frames; obtaining a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames; performing context encoding on the first frame of the two frames and computing an optical flow estimation result by combining it with the multi-scale cost volume; and fitting the optical flow estimation result using the endpoint error of the optical flow as the loss function. The method solves the problem of poor estimation performance caused by a single cost volume losing the fine details of objects at different scales, and improves the accuracy of optical flow estimation.

Description

Method and system for adaptive optical flow estimation for targets of different scales
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method and a system for adaptive optical flow estimation for targets of different scales.
Background
Optical flow estimation, the task of estimating per-pixel motion between video frames, is a fundamental technique for a wide range of computer vision applications such as motion segmentation, action recognition, and autonomous driving. Optical flow estimation has traditionally been treated as a knowledge-driven technique: conventional methods formulate it as the optimization of an energy function whose constraints encode prior knowledge (e.g., corner points). However, optimizing such constrained objectives is typically too slow to be applied in real-time systems, and it is difficult to design such hand-crafted cues and turn them into a robust optimization objective.
In recent years, optical flow estimation has advanced significantly with the development of convolutional neural networks, which provide a powerful ability to learn from large amounts of data; compared with knowledge-driven methods, these techniques follow a data-driven strategy. To learn optical flow, many methods use encoder-decoder or spatial pyramid structures. A pioneering work is FlowNet, proposed by Dosovitskiy et al. in 2015, which introduced two models, FlowNetS and FlowNetC. SpyNet introduced a feature pyramid module that uses a spatial pyramid network to warp images at each level and decompose large displacements into small ones, so that only a small displacement needs to be computed at each pyramid level, greatly reducing the amount of computation. Teed and Deng proposed RAFT, in which a lightweight recurrent module built around a GRU serves as the update operator.
In the above networks, the receptive fields of the artificial neurons in each layer are generally designed to have the same size during feature extraction, and because a single network structure is used, the cost volume is generated in a single way. However, the cost volume represents the similarity between two adjacent frames, and an accurate cost volume is the key to obtaining an accurate optical flow estimate; unfortunately, a single-scale cost volume may lose the fine details of objects at different scales, resulting in poor estimation performance.
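For reference, such a cost volume is commonly built as the dot-product correlation between the feature vectors of every pixel pair of the two frames; the following is a minimal sketch of the standard all-pairs form, given as an illustration of the general concept rather than the construction claimed by this invention.

```python
import torch

def all_pairs_cost_volume(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
    """Dot-product similarity between every pixel of frame 1 and frame 2.

    f1, f2: feature maps (B, C, H, W) from a shared encoder.
    Returns a cost volume of shape (B, H, W, H, W).
    """
    b, c, h, w = f1.shape
    corr = torch.einsum('bcn,bcm->bnm', f1.view(b, c, -1), f2.view(b, c, -1))
    return (corr / c ** 0.5).view(b, h, w, h, w)   # scale by feature dimension
```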
Disclosure of Invention
The embodiment of the invention provides a method and a system for adaptive optical flow estimation for targets of different scales, which are used to solve the problem in the prior art that a single cost volume loses the fine details of objects at different scales, resulting in poor estimation performance.
The embodiment of the invention provides a method for adaptive optical flow estimation for targets of different scales, which comprises the following steps:
S1: inputting two adjacent frames of images into a convolutional neural network and extracting features of the two frames to obtain shallow features of the two frames;
S2: processing the shallow features of the two frames to obtain multi-scale features of the two frames, wherein the multi-scale features comprise coarse-scale, medium-scale and fine-scale features;
S3: obtaining a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames;
S4: performing context encoding on the first frame of the two frames and computing an optical flow estimation result by combining it with the multi-scale cost volume;
S5: fitting the optical flow estimation result using the endpoint error of the optical flow as the loss function.
Preferably, the convolutional neural network employs a downsampling structure.
Preferably, the feature extraction used to obtain the shallow features of the two frames comprises convolution, pooling and normalization.
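For illustration, a minimal sketch of such a shallow, downsampling feature encoder is given below. The layer count, channel widths and the choice of instance normalization are assumptions made for the example and are not specified by the patent; the shared encoder is simply applied to both frames.

```python
import torch
import torch.nn as nn

class ShallowEncoder(nn.Module):
    """Minimal downsampling encoder built from convolution, pooling and
    normalization layers (illustrative sketch, not the patented network)."""
    def __init__(self, in_ch: int = 3, out_ch: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=7, stride=2, padding=3),  # 1/2 resolution
            nn.InstanceNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),                     # pooling: 1/4 resolution
            nn.Conv2d(64, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Shallow features of two adjacent frames, extracted with shared weights.
encoder = ShallowEncoder()
frame1, frame2 = torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256)
f1, f2 = encoder(frame1), encoder(frame2)   # each (1, 128, 64, 64)
```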
Preferably, the processing of the shallow features of the two frames comprises a segmentation operation, a fusion operation and a selection operation.
Preferably, the segmentation operation specifically comprises:
given an intermediate feature map $M \in \mathbb{R}^{C \times H \times W}$ as input, two convolutional layers with kernel sizes of 3 and 5, respectively, split the intermediate feature map $M$ into image features at two different scales, $\widetilde{M}_{fine}$ and $\widetilde{M}_{coarse}$; here the 5 × 5 convolution kernel is replaced by a kernel of size 3 × 3 configured as a dilated convolution with a dilation coefficient of 2.
Preferably, the fusion operation specifically comprises:
first, the multi-scale information of the two different branches is fused through an element-wise summation operation to obtain the fused two-scale feature:
$$M_{fuse} = \widetilde{M}_{fine} + \widetilde{M}_{coarse}$$
then, for $M_{fuse}$, global information in the spatial dimension is captured using global average pooling:
$$s = F_{gap}(M_{fuse}) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} M_{fuse}(i, j)$$
where $F_{gap}(\cdot)$ denotes the global average pooling operation, and H and W are the height and width of the feature map, respectively;
finally, the features are aggregated with a fully connected layer, and a batch normalization layer and an activation function are added after the fully connected layer:
$$t = \delta\left(\mathcal{B}\left(F_{fc}(s)\right)\right)$$
where $F_{fc}(\cdot)$ denotes the fully connected layer, $\delta$ denotes the ReLU activation function, and $\mathcal{B}(\cdot)$ denotes the batch normalization layer.
Preferably, the selection operation specifically comprises:
the feature matrix t is used to guide soft attention across channels to adaptively select different spatial scales of information; the dimension of t is first expanded, and a softmax operator is then applied along the channel dimension to obtain the attention weights:
$$a_c = \frac{e^{A_c t}}{e^{A_c t} + e^{B_c t}}, \qquad b_c = \frac{e^{B_c t}}{e^{A_c t} + e^{B_c t}}$$
where the above is the softmax formula and A and B denote the learned matrices that expand the dimension of t for the fine and coarse branches;
the final feature maps $M_{fine}$ and $M_{coarse}$ are generated using the obtained attention weights, that is, the corresponding weighting coefficients are applied to the segmented features:
$$M_{fine} = a_c \cdot \widetilde{M}_{fine}, \qquad M_{coarse} = b_c \cdot \widetilde{M}_{coarse}$$
where $\widetilde{M}_{fine}$ and $\widetilde{M}_{coarse}$ denote the features at the two scales obtained after segmenting the input feature M, and adding them gives the fused feature $M_{fuse}$.
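Taken together, the segmentation, fusion and selection operations form a selective, multi-receptive-field block. Below is a minimal sketch of one plausible realization; the channel widths, the reduction ratio r of the fully connected layer and the module and variable names are assumptions for illustration rather than the patent's exact design, and the module returns the weighted fine- and coarse-scale maps.

```python
import torch
import torch.nn as nn

class FeatureSelectModule(nn.Module):
    """Split -> fuse -> select over two receptive-field branches (sketch).

    The 5x5 branch is realized as a 3x3 convolution with dilation 2, as the
    description states; the reduction ratio r is an illustrative assumption.
    """
    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        # Segmentation: two branches with different receptive fields.
        self.branch_fine = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch_coarse = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)
        # Fusion: global average pooling + fully connected + BN + ReLU.
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.BatchNorm1d(channels // r),
            nn.ReLU(inplace=True),
        )
        # Selection: expand t to per-branch channel logits for the softmax.
        self.expand = nn.Linear(channels // r, 2 * channels)

    def forward(self, m: torch.Tensor):
        b, c, _, _ = m.shape
        m_fine, m_coarse = self.branch_fine(m), self.branch_coarse(m)   # split
        m_fuse = m_fine + m_coarse                                      # element-wise sum
        s = self.gap(m_fuse).flatten(1)                                 # global descriptor (B, C)
        t = self.fc(s)                                                  # compact feature t
        logits = self.expand(t).view(b, 2, c)                           # expanded dimension of t
        weights = torch.softmax(logits, dim=1)                          # softmax across the two branches
        a = weights[:, 0].view(b, c, 1, 1)
        bb = weights[:, 1].view(b, c, 1, 1)
        return a * m_fine, bb * m_coarse                                # weighted M_fine, M_coarse
```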
Preferably, the method for obtaining the multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames comprises the following steps:
generating an initial cost volume from the fine-scale features, and then reinforcing the fine-scale features with the medium-scale features to obtain cost volume 1;
pooling cost volume 1 to obtain cost volume 2, where cost volume 2 fuses the medium-scale and coarse-scale features;
pooling cost volume 2 to obtain cost volume 3.
The invention also provides a system for adaptive optical flow estimation for targets of different scales, the system comprising:
a shallow feature extraction module, configured to input two adjacent frames of images into a convolutional neural network and extract features of the two frames to obtain shallow features of the two frames;
a multi-scale feature extraction module, configured to process the shallow features of the two frames to obtain multi-scale features of the two frames, wherein the multi-scale features comprise coarse-scale, medium-scale and fine-scale features;
a multi-scale cost volume generation module, configured to obtain a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames;
an optical flow estimation calculation module, configured to perform context encoding on the first frame of the two frames and compute an optical flow estimation result by combining it with the multi-scale cost volume;
and an optical flow estimation fitting module, configured to fit the optical flow estimation result using the endpoint error of the optical flow as the loss function.
An embodiment of the present invention provides a network device, including a processor, a memory, and a bus system, where the processor and the memory are connected via the bus system, the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory, so as to implement any one of the above methods.
According to the technical scheme, the invention has the following advantages:
the invention provides a method and a system for adaptively estimating optical flow for different scales of objects, firstly, a characteristic selectable module is introduced into the field of optical flow estimation and integrated into a network, which is beneficial to the network to generate multi-scale characteristic information, so that more accurate optical flow estimation results are learned for objects of different scales; secondly, a multi-scale cost generation module is introduced, and the multi-scale cost enhances the similarity characterization capability; finally, the optical flow estimation method utilizes the characteristic selectable module to enhance the generation of the multi-scale cost quantity, and jointly learns the multi-scale cost quantity and the context codes, thereby solving the problem of poor estimation performance caused by losing fine details of objects with different scales due to single cost quantity and improving the accuracy of optical flow estimation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the embodiments are briefly described below. The features and advantages of the present invention will be more clearly understood by reference to these drawings, which are schematic and should not be construed as limiting the present invention in any way; for those skilled in the art, other drawings can be obtained from these drawings without creative effort. In the drawings:
FIG. 1 is a schematic diagram of an adaptive optical flow estimation method for different scale targets according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an adaptive optical-flow estimation system for different scale targets according to an embodiment of the invention;
FIG. 3 is a schematic block diagram of a network device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more apparent and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. The described embodiments are only a part of the embodiments of the present invention, rather than all of them; all other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present invention.
Embodiment One
As shown in FIG. 1, an embodiment of the present invention provides a method for adaptive optical flow estimation for targets of different scales, where the method comprises:
S101: inputting two adjacent frames of images into a convolutional neural network and extracting features of the two frames to obtain shallow features of the two frames;
S102: processing the shallow features of the two frames to obtain multi-scale features of the two frames, wherein the multi-scale features comprise coarse-scale, medium-scale and fine-scale features;
S103: obtaining a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames;
S104: performing context encoding on the first frame of the two frames and computing an optical flow estimation result by combining it with the multi-scale cost volume;
S105: fitting the optical flow estimation result using the endpoint error of the optical flow as the loss function.
The invention introduces the feature-selectable module into the field of optical flow estimation and integrates it into the network, which helps the network generate multi-scale feature information, so that more accurate optical flow estimation results are learned for objects of different scales; second, a multi-scale cost volume generation module is introduced, and the multi-scale cost volume enhances the ability to characterize similarity; finally, the optical flow estimation method of the invention uses the feature-selectable module to enhance the generation of the multi-scale cost volume and jointly learns the multi-scale cost volume and the context encoding, thereby improving the accuracy of optical flow estimation.
Further, step S101 comprises:
inputting two adjacent frames of images into a convolutional neural network and extracting features of the two frames to obtain shallow features of the two frames; the convolutional neural network adopts a downsampling structure, and the feature extraction comprises convolution, pooling and normalization.
Further, step S102 comprises:
processing the shallow features of the two frames to obtain multi-scale features of the two frames, so that the convolutional neural network can selectively use the generated multi-scale features; the multi-scale features comprise coarse-scale, medium-scale and fine-scale features, which allows the convolutional neural network to capture objects of different sizes in the images;
the processing of the shallow features of the two frames comprises a segmentation operation, a fusion operation and a selection operation;
the segmentation operation specifically comprises: given an intermediate feature map $M \in \mathbb{R}^{C \times H \times W}$ as input, two convolutional layers with kernel sizes of 3 and 5, respectively, split the intermediate feature map $M$ into image features at two different scales, $\widetilde{M}_{fine}$ and $\widetilde{M}_{coarse}$; here the 5 × 5 convolution kernel is replaced by a kernel of size 3 × 3 configured as a dilated convolution with a dilation coefficient of 2;
the fusion operation specifically comprises the following steps:
first, the multi-scale information of the two different branches is fused through an element-wise summation operation to obtain the fused two-scale feature:
$$M_{fuse} = \widetilde{M}_{fine} + \widetilde{M}_{coarse}$$
then, for $M_{fuse}$, global information in the spatial dimension is captured using global average pooling:
$$s = F_{gap}(M_{fuse}) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} M_{fuse}(i, j)$$
where $F_{gap}(\cdot)$ denotes the global average pooling operation, and H and W are the height and width of the feature map, respectively;
finally, the features are aggregated with a fully connected layer, and a batch normalization layer and an activation function are added after the fully connected layer:
$$t = \delta\left(\mathcal{B}\left(F_{fc}(s)\right)\right)$$
where $F_{fc}(\cdot)$ denotes the fully connected layer, $\delta$ denotes the ReLU activation function, and $\mathcal{B}(\cdot)$ denotes the batch normalization layer;
the selection operation specifically comprises:
the feature matrix t is used to guide soft attention across channels to adaptively select different spatial scales of information; the dimension of t is first expanded, and a softmax operator is then applied along the channel dimension to obtain the attention weights:
$$a_c = \frac{e^{A_c t}}{e^{A_c t} + e^{B_c t}}, \qquad b_c = \frac{e^{B_c t}}{e^{A_c t} + e^{B_c t}}$$
where the above is the softmax formula and A and B denote the learned matrices that expand the dimension of t for the fine and coarse branches;
the final feature maps $M_{fine}$ and $M_{coarse}$ are generated using the obtained attention weights, that is, the corresponding weighting coefficients are applied to the segmented features:
$$M_{fine} = a_c \cdot \widetilde{M}_{fine}, \qquad M_{coarse} = b_c \cdot \widetilde{M}_{coarse}$$
where $\widetilde{M}_{fine}$ and $\widetilde{M}_{coarse}$ denote the features at the two scales obtained after segmenting the input feature M, and adding them gives the fused feature $M_{fuse}$.
Further, step S103 comprises:
obtaining a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames, which specifically comprises: generating an initial cost volume from the fine-scale features, and then reinforcing the fine-scale features with the medium-scale features to obtain cost volume 1; pooling cost volume 1 to obtain cost volume 2, where cost volume 2 fuses the medium-scale and coarse-scale features; and pooling cost volume 2 to obtain cost volume 3.
In previous approaches, the cost volume was generated simply by a set of global average pooling operations, which leads to the loss of fine details; the invention provides a new cost volume generation process that combines the extracted multi-scale image features to strengthen the interaction between information at different scales, thereby taking the receptive field of each scale into account.
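A rough sketch of this multi-scale cost volume generation is given below. It is only one plausible realization: it assumes the medium-scale maps share the channel width of the fine-scale maps and have half their resolution, realizes the "reinforcement" of the fine-scale features by adding the upsampled medium-scale features before correlation, and produces cost volumes 2 and 3 by average pooling over the target dimensions; the patent does not fix these operators.

```python
import torch
import torch.nn.functional as F

def correlation(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
    """All-pairs dot-product correlation of two (B, C, H, W) feature maps,
    giving a cost volume of shape (B, H, W, H, W)."""
    b, c, h, w = f1.shape
    return torch.einsum('bchw,bcuv->bhwuv', f1, f2) / c ** 0.5

def multiscale_cost_pyramid(fine1, fine2, mid1, mid2):
    """Cost volumes 1, 2 and 3 (illustrative sketch).

    fine*, mid*: fine- and medium-scale feature maps of the two frames;
    H and W of the fine scale are assumed divisible by 4.
    """
    b, _, h, w = fine1.shape
    up = lambda x: F.interpolate(x, size=(h, w), mode='bilinear', align_corners=False)
    f1 = fine1 + up(mid1)   # fine-scale features reinforced by the medium scale
    f2 = fine2 + up(mid2)

    cost1 = correlation(f1, f2)                                   # cost volume 1
    cost2 = F.avg_pool2d(cost1.reshape(b * h * w, 1, h, w), 2)    # pool the target dimensions
    cost2 = cost2.reshape(b, h, w, h // 2, w // 2)                # cost volume 2
    cost3 = F.avg_pool2d(cost2.reshape(b * h * w, 1, h // 2, w // 2), 2)
    cost3 = cost3.reshape(b, h, w, h // 4, w // 4)                # cost volume 3
    return cost1, cost2, cost3
```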
Further, step S104 comprises:
the optical flow records the positional offset of each pixel between the two frames, so a context network is used to obtain the context encoding of the first frame, assisting the optical flow network in learning positional information between the two frames; the correspondence between the two frames is then looked up on the cost volume using the context information of the first frame, so that an accurate optical flow estimation result is computed.
Further, step S105 comprises:
supervising the learning of the optical flow using the endpoint error of the optical flow as the loss function, and fitting the estimated optical flow result. The method of the invention yields good optical flow estimation accuracy and can be applied to fields such as autonomous driving and robotics.
Embodiment Two
As shown in FIG. 2, the present invention also provides a system for adaptive optical flow estimation for targets of different scales, the system comprising:
a shallow feature extraction module 201, configured to input two adjacent frames of images into a convolutional neural network and extract features of the two frames to obtain shallow features of the two frames;
a multi-scale feature extraction module 202, configured to process the shallow features of the two frames to obtain multi-scale features of the two frames, where the multi-scale features comprise coarse-scale, medium-scale and fine-scale features;
a multi-scale cost volume generation module 203, configured to obtain a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames;
an optical flow estimation calculation module 204, configured to perform context encoding on the first frame of the two frames and compute an optical flow estimation result by combining it with the multi-scale cost volume;
and an optical flow estimation fitting module 205, configured to fit the optical flow estimation result using the endpoint error of the optical flow as the loss function.
The system is configured to implement the method for adaptive optical flow estimation for targets of different scales according to Embodiment One, and details are not repeated here in order to avoid redundancy.
Embodiment Three
As shown in FIG. 3, an embodiment of the present invention further provides a network device, where the device includes a processor 301, a memory 302 and a bus system 303, the processor 301 and the memory 302 are connected via the bus system 303, the memory 302 is configured to store instructions, and the processor 301 is configured to execute the instructions stored in the memory 302;
wherein the processor 301 is configured to: input two adjacent frames of images into a convolutional neural network and extract features of the two frames to obtain shallow features of the two frames; process the shallow features of the two frames to obtain multi-scale features of the two frames, wherein the multi-scale features comprise coarse-scale, medium-scale and fine-scale features; obtain a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames; perform context encoding on the first frame of the two frames and compute an optical flow estimation result by combining it with the multi-scale cost volume; and fit the optical flow estimation result using the endpoint error of the optical flow as the loss function.
The network device introduces the feature-selectable module into the field of optical flow estimation and integrates it into the network, which helps the network generate multi-scale feature information, so that more accurate optical flow estimation results are learned for objects of different scales; second, a multi-scale cost volume generation module is introduced, and the multi-scale cost volume enhances the ability to characterize similarity; finally, the feature-selectable module is used to enhance the generation of the multi-scale cost volume, and the multi-scale cost volume and the context encoding are learned jointly, thereby improving the accuracy of optical flow estimation.
Optionally, as an embodiment, an unmanned vehicle comprises the above network device comprising a processor 301, a memory 302 and a bus system 303, which are not described in detail herein to avoid redundancy.
Optionally, as an embodiment, a robot comprises the above network device, where the network device comprises a processor 301, a memory 302 and a bus system 303; to avoid repetition, details are not described here again.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Various other modifications and alterations will occur to those skilled in the art upon reading the foregoing description; it is neither necessary nor possible to exhaustively list all embodiments here, and obvious variations or modifications of the invention may be made without departing from its spirit or scope.

Claims (10)

1. A method for adaptive optical flow estimation for targets of different scales, comprising:
S1: inputting two adjacent frames of images into a convolutional neural network and extracting features of the two frames to obtain shallow features of the two frames;
S2: processing the shallow features of the two frames to obtain multi-scale features of the two frames, wherein the multi-scale features comprise coarse-scale, medium-scale and fine-scale features;
S3: obtaining a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames;
S4: performing context encoding on the first frame of the two frames and computing an optical flow estimation result by combining it with the multi-scale cost volume;
S5: fitting the optical flow estimation result using the endpoint error of the optical flow as the loss function.
2. The method for adaptive optical flow estimation for targets of different scales according to claim 1, wherein the convolutional neural network employs a downsampling structure.
3. The method for adaptive optical flow estimation for targets of different scales according to claim 1, wherein the feature extraction used to obtain the shallow features of the two frames comprises convolution, pooling and normalization.
4. The method for adaptive optical flow estimation for targets of different scales according to claim 1, wherein the processing of the shallow features of the two frames comprises a segmentation operation, a fusion operation and a selection operation.
5. The method for adaptive optical flow estimation for targets of different scales according to claim 4, wherein the segmentation operation specifically comprises:
given an intermediate feature map $M \in \mathbb{R}^{C \times H \times W}$ as input, two convolutional layers with kernel sizes of 3 and 5, respectively, split the intermediate feature map $M$ into image features at two different scales, $\widetilde{M}_{fine}$ and $\widetilde{M}_{coarse}$; here the 5 × 5 convolution kernel is replaced by a kernel of size 3 × 3 configured as a dilated convolution with a dilation coefficient of 2.
6. The method for adaptive optical flow estimation for targets of different scales according to claim 4, wherein the fusion operation specifically comprises:
first, the multi-scale information of the two different branches is fused through an element-wise summation operation to obtain the fused two-scale feature:
$$M_{fuse} = \widetilde{M}_{fine} + \widetilde{M}_{coarse}$$
then, for $M_{fuse}$, global information in the spatial dimension is captured using global average pooling:
$$s = F_{gap}(M_{fuse}) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} M_{fuse}(i, j)$$
where $F_{gap}(\cdot)$ denotes the global average pooling operation, and H and W are the height and width of the feature map, respectively;
finally, the features are aggregated with a fully connected layer, and a batch normalization layer and an activation function are added after the fully connected layer:
$$t = \delta\left(\mathcal{B}\left(F_{fc}(s)\right)\right)$$
where $F_{fc}(\cdot)$ denotes the fully connected layer, $\delta$ denotes the ReLU activation function, and $\mathcal{B}(\cdot)$ denotes the batch normalization layer.
7. The method for adaptive optical flow estimation for targets of different scales according to claim 4, wherein the selection operation specifically comprises:
the feature matrix t is used to guide soft attention across channels to adaptively select different spatial scales of information; the dimension of t is first expanded, and a softmax operator is then applied along the channel dimension to obtain the attention weights:
$$a_c = \frac{e^{A_c t}}{e^{A_c t} + e^{B_c t}}, \qquad b_c = \frac{e^{B_c t}}{e^{A_c t} + e^{B_c t}}$$
where the above is the softmax formula and A and B denote the learned matrices that expand the dimension of t for the fine and coarse branches;
the final feature maps $M_{fine}$ and $M_{coarse}$ are generated using the obtained attention weights, that is, the corresponding weighting coefficients are applied to the segmented features:
$$M_{fine} = a_c \cdot \widetilde{M}_{fine}, \qquad M_{coarse} = b_c \cdot \widetilde{M}_{coarse}$$
where $\widetilde{M}_{fine}$ and $\widetilde{M}_{coarse}$ denote the features at the two scales obtained after segmenting the input feature M, and adding them gives the fused feature $M_{fuse}$.
8. The method for adaptive optical flow estimation for targets of different scales according to claim 1, wherein obtaining the multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames comprises:
generating an initial cost volume from the fine-scale features, and then reinforcing the fine-scale features with the medium-scale features to obtain cost volume 1;
pooling cost volume 1 to obtain cost volume 2, where cost volume 2 fuses the medium-scale and coarse-scale features;
pooling cost volume 2 to obtain cost volume 3.
9. A system for adaptive optical flow estimation for targets of different scales, comprising:
a shallow feature extraction module, configured to input two adjacent frames of images into a convolutional neural network and extract features of the two frames to obtain shallow features of the two frames;
a multi-scale feature extraction module, configured to process the shallow features of the two frames to obtain multi-scale features of the two frames, wherein the multi-scale features comprise coarse-scale, medium-scale and fine-scale features;
a multi-scale cost volume generation module, configured to obtain a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames;
an optical flow estimation calculation module, configured to perform context encoding on the first frame of the two frames and compute an optical flow estimation result by combining it with the multi-scale cost volume;
and an optical flow estimation fitting module, configured to fit the optical flow estimation result using the endpoint error of the optical flow as the loss function.
10. A network device comprising a processor, a memory and a bus system, the processor and the memory being connected via the bus system, the memory being adapted to store instructions, and the processor being adapted to execute the instructions stored by the memory to implement the method of any one of claims 1 to 8.
CN202211221511.7A 2022-10-08 Method and system for adaptive optical flow estimation for targets of different scales Active CN115690170B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211221511.7A CN115690170B (en) 2022-10-08 Method and system for adaptive optical flow estimation for targets of different scales

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211221511.7A CN115690170B (en) 2022-10-08 Method and system for adaptive optical flow estimation for targets of different scales

Publications (2)

Publication Number Publication Date
CN115690170A true CN115690170A (en) 2023-02-03
CN115690170B CN115690170B (en) 2024-10-15


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116486107A (en) * 2023-06-21 2023-07-25 南昌航空大学 Optical flow calculation method, system, equipment and medium
CN118397038A (en) * 2024-06-24 2024-07-26 中南大学 Moving object segmentation method, system, equipment and medium based on deep learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110111366A (en) * 2019-05-06 2019-08-09 北京理工大学 A kind of end-to-end light stream estimation method based on multistage loss amount
CN111291647A (en) * 2020-01-21 2020-06-16 陕西师范大学 Single-stage action positioning method based on multi-scale convolution kernel and superevent module
CN111340844A (en) * 2020-02-24 2020-06-26 南昌航空大学 Multi-scale feature optical flow learning calculation method based on self-attention mechanism
CN111582483A (en) * 2020-05-14 2020-08-25 哈尔滨工程大学 Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism
CN114677412A (en) * 2022-03-18 2022-06-28 苏州大学 Method, device and equipment for estimating optical flow
CN114943747A (en) * 2022-04-08 2022-08-26 浙江商汤科技开发有限公司 Image analysis method and device, video editing method and device, and medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110111366A (en) * 2019-05-06 2019-08-09 北京理工大学 A kind of end-to-end light stream estimation method based on multistage loss amount
CN111291647A (en) * 2020-01-21 2020-06-16 陕西师范大学 Single-stage action positioning method based on multi-scale convolution kernel and superevent module
CN111340844A (en) * 2020-02-24 2020-06-26 南昌航空大学 Multi-scale feature optical flow learning calculation method based on self-attention mechanism
CN111582483A (en) * 2020-05-14 2020-08-25 哈尔滨工程大学 Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism
CN114677412A (en) * 2022-03-18 2022-06-28 苏州大学 Method, device and equipment for estimating optical flow
CN114943747A (en) * 2022-04-08 2022-08-26 浙江商汤科技开发有限公司 Image analysis method and device, video editing method and device, and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zachary Teed and Jia Deng: "RAFT: Recurrent All-Pairs Field Transforms for Optical Flow", arXiv:2003.12039v3, 25 August 2020 (2020-08-25), pages 1-21 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116486107A (en) * 2023-06-21 2023-07-25 南昌航空大学 Optical flow calculation method, system, equipment and medium
CN116486107B (en) * 2023-06-21 2023-09-05 南昌航空大学 Optical flow calculation method, system, equipment and medium
CN118397038A (en) * 2024-06-24 2024-07-26 中南大学 Moving object segmentation method, system, equipment and medium based on deep learning

Similar Documents

Publication Publication Date Title
Guen et al. Disentangling physical dynamics from unknown factors for unsupervised video prediction
Guizilini et al. 3d packing for self-supervised monocular depth estimation
Guizilini et al. Robust semi-supervised monocular depth estimation with reprojected distances
CN107369166B (en) Target tracking method and system based on multi-resolution neural network
CN113657560B (en) Weak supervision image semantic segmentation method and system based on node classification
CN111667535B (en) Six-degree-of-freedom pose estimation method for occlusion scene
CN102722697B (en) Unmanned aerial vehicle autonomous navigation landing visual target tracking method
CN108550162B (en) Object detection method based on deep reinforcement learning
CN113158862A (en) Lightweight real-time face detection method based on multiple tasks
CN115063445A (en) Target tracking method and system based on multi-scale hierarchical feature representation
CN109685830B (en) Target tracking method, device and equipment and computer storage medium
CN111696110A (en) Scene segmentation method and system
CN111191739B (en) Wall surface defect detection method based on attention mechanism
CN113160278A (en) Scene flow estimation and training method and device of scene flow estimation model
US20230020713A1 (en) Image processing system and method
CN114677412A (en) Method, device and equipment for estimating optical flow
CN112507943A (en) Visual positioning navigation method, system and medium based on multitask neural network
CN111260660A (en) 3D point cloud semantic segmentation migration method based on meta-learning
CN113420590A (en) Robot positioning method, device, equipment and medium in weak texture environment
CN109493370B (en) Target tracking method based on space offset learning
CN107798329A (en) Adaptive particle filter method for tracking target based on CNN
CN117710841A (en) Small target detection method and device for aerial image of unmanned aerial vehicle
Shukla et al. UBOL: User-Behavior-aware one-shot learning for safe autonomous driving
CN111738092A (en) Method for recovering shielded human body posture sequence based on deep learning
CN112232126A (en) Dimension reduction expression method for improving variable scene positioning robustness

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant