CN115690170A - Method and system for self-adaptive optical flow estimation aiming at different-scale targets - Google Patents
- Publication number: CN115690170A
- Application number: CN202211221511.7A
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Abstract
The embodiments of the invention provide a method and system for adaptive optical flow estimation for targets of different scales. The method comprises: inputting two adjacent frames of images into a convolutional neural network and performing feature extraction to obtain shallow features of the two frames; processing the shallow features to obtain multi-scale features of the two frames; obtaining a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames; performing context coding on the first frame of the two frames and computing an optical flow estimation result in combination with the multi-scale cost volume; and fitting the optical flow estimation result using the endpoint error of the optical flow as a loss function. The method addresses the poor estimation performance caused by a single-scale cost volume losing the fine details of objects at different scales, and improves the accuracy of optical flow estimation.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method and system for adaptively estimating optical flow for targets of different scales.
Background
Optical flow estimation, the task of estimating per-pixel motion between video frames, is a fundamental technique for a wide range of computer vision applications, such as motion segmentation, action recognition, and autonomous driving. Optical flow estimation was traditionally treated as a knowledge-driven problem: conventional methods formulate optical flow as the optimization of an energy function whose constraints encode prior knowledge (e.g., corner points). However, optimizing such constrained objectives usually takes too long and runs too slowly to be applied in real-time systems; moreover, hand-designing such constraints and turning them into a robust optimization objective is difficult.
In recent years, optical flow estimation has advanced significantly with the development of convolutional neural networks; compared with knowledge-driven methods, these data-driven strategies can learn from large amounts of data. To learn optical flow, many methods use encoder-decoder or spatial pyramid structures. A pioneering work is FlowNet, proposed by Dosovitskiy et al. in 2015, which introduced two models, FlowNetS and FlowNetC. SPyNet introduced a spatial pyramid network that warps images at each level and decomposes large displacements into small ones, so that only a small displacement needs to be estimated at each pyramid level, greatly reducing computation. Teed and Deng proposed RAFT, which couples a lightweight recurrent module based on a GRU as the update operator.
In the above networks, the receptive fields of the artificial neurons in each layer of the feature extraction stage are typically designed with the same size, and because a single network structure is used, the cost volume is generated in a single-scale manner. The cost volume represents the similarity between two adjacent frames, and an accurate cost volume is the key to an accurate optical flow estimate; unfortunately, single-scale generation can lose the fine details of objects at different scales, resulting in poor estimation performance.
Disclosure of Invention
The embodiments of the invention provide a method and system for adaptively estimating optical flow for targets of different scales, which address the problem in the prior art that a single-scale cost volume loses the fine details of objects at different scales, resulting in poor estimation performance.
The embodiment of the invention provides a method for estimating optical flow in a self-adaptive way aiming at different scale targets, which comprises the following steps:
s1: inputting two adjacent frames of images into a convolutional neural network, and extracting the characteristics of the two frames of images to obtain shallow layer characteristics of the two frames of images;
s2: processing shallow features of the two frames of images to obtain multi-scale features of the two frames of images, wherein the multi-scale features comprise coarse scale features, medium scale features and fine scale features;
s3: obtaining a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames of images;
s4: performing context coding on the first frame image of the two frames of images, and calculating an optical flow estimation result in combination with the multi-scale cost volume;
s5: and fitting the optical flow estimation result by using the endpoint error of the optical flow as a loss function.
Preferably, the convolutional neural network employs a downsampling structure.
Preferably, the feature extraction performed on the two frames of images to obtain their shallow features comprises convolution, pooling and normalization operations.
Preferably, the method for processing the shallow feature of the two frames of images comprises a segmentation operation, a fusion operation and a selection operation.
Preferably, the segmentation operation specifically includes:
given an intermediate feature map M as input, two convolutional layers with kernel sizes of 3 × 3 and 5 × 5 are used to segment the intermediate feature map M into image features at two different scales, M̃_fine and M̃_coarse; here, to reduce computation, the 5 × 5 convolution kernel is replaced with a 3 × 3 kernel using a dilated convolution with dilation coefficient 2.
Preferably, the fusion operation specifically comprises:
firstly, the multi-scale information of the two different branches is fused through an element-wise summation operation, obtaining the two-scale fused feature:

M_fuse = M̃_fine + M̃_coarse

then, global information in the spatial dimension of M_fuse is captured using global average pooling:

s = F_gap(M_fuse) = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} M_fuse(i, j)

wherein F_gap(·) represents the global average pooling operation, and H and W are respectively the height and width of the feature map;

finally, a fully connected layer aggregates the feature, with a batch normalization layer and an activation function added after the fully connected layer:

t = B(δ(F_fc(s)))

wherein F_fc(·) represents the fully connected layer, δ represents the ReLU activation function, and B(·) represents the batch normalization layer.
Preferably, the selecting operation specifically includes:
the guide feature vector t is used to adaptively select different spatial scales of information via soft attention across channels; the dimension of t is first expanded, and a softmax operator is then applied along the channel dimension to obtain the attention weights:

a_c = e^{(A t)_c} / (e^{(A t)_c} + e^{(B t)_c}),  b_c = e^{(B t)_c} / (e^{(A t)_c} + e^{(B t)_c})

wherein the above is the softmax formula, A and B are learned projection matrices, a_c and b_c are the attention weights of the c-th channel for the fine and coarse branches, and a_c + b_c = 1;

the final feature maps M_fine and M_coarse are generated using the derived attention weights, that is, the corresponding weighting coefficients are applied to the segmented features:

M_fine = a · M̃_fine,  M_coarse = b · M̃_coarse

wherein M̃_fine and M̃_coarse represent the two-scale features obtained after segmenting the input feature M, whose element-wise sum gives the fused feature M_fuse.
Preferably, the method for obtaining the multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames of images comprises:
generating an initial cost volume from the fine-scale features, and then enhancing it with the medium-scale features to obtain cost volume 1;
pooling cost volume 1 to obtain cost volume 2, which fuses the medium-scale and coarse-scale features;
pooling cost volume 2 to obtain cost volume 3.
The invention also provides a system for adaptive optical flow estimation for targets of different scales, comprising:
the shallow layer feature extraction module is used for inputting two adjacent frames of images into a convolutional neural network and extracting features of the two frames of images to obtain shallow layer features of the two frames of images;
the multi-scale feature extraction module is used for processing the shallow features of the two frames of images to obtain multi-scale features of the two frames of images, wherein the multi-scale features comprise coarse scale features, medium scale features and fine scale features;
the multi-scale cost volume generation module, used for obtaining a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames of images;
the optical flow estimation calculation module, used for performing context coding on the first frame image of the two frames of images and calculating an optical flow estimation result in combination with the multi-scale cost volume;
and the optical flow estimation fitting module is used for fitting the optical flow estimation result by using the endpoint error of the optical flow as a loss function.
An embodiment of the present invention provides a network device, including a processor, a memory, and a bus system, where the processor and the memory are connected via the bus system, the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory, so as to implement any one of the above methods.
According to the technical scheme, the invention has the following advantages:
the invention provides a method and a system for adaptively estimating optical flow for different scales of objects, firstly, a characteristic selectable module is introduced into the field of optical flow estimation and integrated into a network, which is beneficial to the network to generate multi-scale characteristic information, so that more accurate optical flow estimation results are learned for objects of different scales; secondly, a multi-scale cost generation module is introduced, and the multi-scale cost enhances the similarity characterization capability; finally, the optical flow estimation method utilizes the characteristic selectable module to enhance the generation of the multi-scale cost quantity, and jointly learns the multi-scale cost quantity and the context codes, thereby solving the problem of poor estimation performance caused by losing fine details of objects with different scales due to single cost quantity and improving the accuracy of optical flow estimation.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the embodiments are briefly described below. The features and advantages of the present invention will be understood more clearly by reference to the drawings, which are schematic and should not be understood as limiting the invention in any way; for those skilled in the art, other drawings can be obtained from these drawings without creative effort. In the drawings:
FIG. 1 is a schematic diagram of an adaptive optical flow estimation method for different scale targets according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an adaptive optical-flow estimation system for different scale targets according to an embodiment of the invention;
fig. 3 is a schematic block diagram of a network device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
As shown in fig. 1, an embodiment of the present invention provides an optical flow estimation method adaptive to objects with different scales, where the method includes:
s101: inputting two adjacent frames of images into a convolutional neural network, and extracting the characteristics of the two frames of images to obtain shallow layer characteristics of the two frames of images;
s102: processing shallow features of the two frames of images to obtain multi-scale features of the two frames of images, wherein the multi-scale features comprise coarse scale features, medium scale features and fine scale features;
s103: obtaining a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames of images;
s104: performing context coding on the first frame image of the two frames of images, and calculating an optical flow estimation result in combination with the multi-scale cost volume;
s105: and fitting the optical flow estimation result by using the endpoint error of the optical flow as a loss function.
The invention introduces the selectable-feature module into the field of optical flow estimation and integrates it into the network, which helps the network generate multi-scale feature information and thus learn more accurate optical flow estimation results for objects of different scales; secondly, a multi-scale cost volume generation module is introduced, and the multi-scale cost volume strengthens the ability to represent similarity; finally, the optical flow estimation method of the invention uses the selectable-feature module to enhance the generation of the multi-scale cost volume and jointly learns the multi-scale cost volume and the context codes, thereby improving the accuracy of optical flow estimation.
Further, step S101 includes:
inputting two adjacent frames of images into a convolutional neural network, and extracting the characteristics of the two frames of images to obtain shallow layer characteristics of the two frames of images; the convolutional neural network adopts a downsampling structure, wherein the characteristic extraction method comprises convolution, pooling and normalization.
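The shallow feature extraction described above can be sketched numerically. The following is a minimal illustration only: the patent does not fix a specific stem architecture, so the three stride-2 stages, kernel sizes, channel widths, and per-channel normalization used here are all assumptions.

```python
import numpy as np

def conv2d(x, w, stride=2, pad=1):
    """Naive strided 2-D convolution: x (C_in,H,W), w (C_out,C_in,k,k)."""
    c_out, c_in, k, _ = w.shape
    x = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    ho, wo = (h - k) // stride + 1, (wd - k) // stride + 1
    out = np.zeros((c_out, ho, wo))
    for i in range(ho):
        for j in range(wo):
            patch = x[:, i*stride:i*stride+k, j*stride:j*stride+k]
            out[:, i, j] = np.tensordot(w, patch, axes=3)  # contract C_in,k,k
    return out

def normalize(x, eps=1e-5):
    """Per-channel normalization of the feature map."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    return (x - mu) / (sigma + eps)

def shallow_features(img, weights):
    """Downsampling stem: each stage is convolution, normalization, ReLU."""
    f = img
    for w in weights:
        f = np.maximum(normalize(conv2d(f, w)), 0.0)
    return f

rng = np.random.default_rng(0)
img = rng.standard_normal((3, 64, 64))        # one input frame (C,H,W)
weights = [rng.standard_normal((16, 3, 3, 3)) * 0.1,
           rng.standard_normal((32, 16, 3, 3)) * 0.1,
           rng.standard_normal((64, 32, 3, 3)) * 0.1]
feat = shallow_features(img, weights)
print(feat.shape)  # (64, 8, 8): each stride-2 stage halves the resolution
```

Both adjacent frames would be passed through the same (weight-shared) stem to obtain their shallow features.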
Further, step S102 includes:
processing the shallow features of the two frames of images to obtain their multi-scale features, so that the convolutional neural network can selectively use the generated multi-scale features; the multi-scale features comprise coarse-scale, medium-scale and fine-scale features, enabling the network to capture objects of different sizes in the images;
the method for processing the shallow features of the two frames of images comprises the operations of segmentation, fusion and selection;
the segmentation operation specifically includes: given an intermediate feature map M as input, two convolutional layers with kernel sizes of 3 × 3 and 5 × 5 are used to segment the intermediate feature map M into image features at two different scales, M̃_fine and M̃_coarse; here, to reduce computation, the 5 × 5 convolution kernel is replaced with a 3 × 3 kernel using a dilated convolution with dilation coefficient 2;
the fusion operation specifically comprises the following steps:
firstly, the multi-scale information of the two different branches is fused through an element-wise summation operation, obtaining the two-scale fused feature:

M_fuse = M̃_fine + M̃_coarse

then, global information in the spatial dimension of M_fuse is captured using global average pooling:

s = F_gap(M_fuse) = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} M_fuse(i, j)

wherein F_gap(·) represents the global average pooling operation, and H and W are respectively the height and width of the feature map;

finally, a fully connected layer aggregates the feature, with a batch normalization layer and an activation function added after the fully connected layer:

t = B(δ(F_fc(s)))

wherein F_fc(·) represents the fully connected layer, δ represents the ReLU activation function, and B(·) represents the batch normalization layer.
The selecting operation specifically includes:
the guide feature vector t is used to adaptively select different spatial scales of information via soft attention across channels; the dimension of t is first expanded, and a softmax operator is then applied along the channel dimension to obtain the attention weights:

a_c = e^{(A t)_c} / (e^{(A t)_c} + e^{(B t)_c}),  b_c = e^{(B t)_c} / (e^{(A t)_c} + e^{(B t)_c})

wherein the above is the softmax formula, A and B are learned projection matrices, a_c and b_c are the attention weights of the c-th channel for the fine and coarse branches, and a_c + b_c = 1;

the final feature maps M_fine and M_coarse are generated using the derived attention weights, that is, the corresponding weighting coefficients are applied to the segmented features:

M_fine = a · M̃_fine,  M_coarse = b · M̃_coarse

wherein M̃_fine and M̃_coarse represent the two-scale features obtained after segmenting the input feature M, whose element-wise sum gives the fused feature M_fuse.
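The fuse-and-select steps above can be sketched numerically. This is a simplified, selective-kernel-style illustration under assumptions: the two branch features are taken as given (the segmentation convolutions are omitted), the projection weights are random, and batch normalization is folded into a plain ReLU for brevity.

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_and_select(m_fine, m_coarse, w_fc, w_a, w_b):
    """m_fine, m_coarse: (C,H,W) branch features; returns reweighted branches."""
    # Fusion: element-wise sum, then global average pooling over H and W.
    m_fuse = m_fine + m_coarse
    s = m_fuse.mean(axis=(1, 2))              # (C,) global descriptor
    t = np.maximum(w_fc @ s, 0.0)             # FC + ReLU guide vector
    # Selection: two per-channel logits, softmax across the branch axis.
    logits = np.stack([w_a @ t, w_b @ t])     # (2, C)
    a, b = softmax(logits, axis=0)            # a + b == 1 for every channel
    return a[:, None, None] * m_fine, b[:, None, None] * m_coarse

rng = np.random.default_rng(1)
C, H, W = 8, 6, 6
m_fine = rng.standard_normal((C, H, W))
m_coarse = rng.standard_normal((C, H, W))
w_fc = rng.standard_normal((4, C))            # reduced guide dimension (assumed)
w_a = rng.standard_normal((C, 4))
w_b = rng.standard_normal((C, 4))
out_fine, out_coarse = fuse_and_select(m_fine, m_coarse, w_fc, w_a, w_b)
print(out_fine.shape, out_coarse.shape)       # (8, 6, 6) (8, 6, 6)
```

Because the softmax weights lie strictly between 0 and 1, each output branch is a channel-wise attenuated copy of its input, with the attenuation chosen adaptively from the fused global descriptor.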
Further, step S103 includes:
obtaining a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames of images, specifically comprising: generating an initial cost volume from the fine-scale features, then enhancing it with the medium-scale features to obtain cost volume 1; pooling cost volume 1 to obtain cost volume 2, which fuses the medium-scale and coarse-scale features; and pooling cost volume 2 to obtain cost volume 3.
In previous approaches, cost volume generation was carried out simply with a set of global average pooling operations, which loses fine detail. The invention provides a new cost volume generation process that combines the extracted multi-scale image features to strengthen the interaction between information at different scales, so that the receptive field of each scale is taken into account.
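A minimal numerical sketch of a cost-volume pyramid follows. It is illustrative only: the inner-product correlation and 2×2 average pooling over the second frame's spatial dimensions follow common practice for all-pairs cost volumes, not the patent's exact multi-scale enhancement module.

```python
import numpy as np

def all_pairs_cost(f1, f2):
    """Correlate every pixel of frame 1 with every pixel of frame 2.
    f1, f2: (D,H,W) feature maps -> cost volume of shape (H,W,H,W)."""
    d = f1.shape[0]
    return np.einsum('dij,dkl->ijkl', f1, f2) / np.sqrt(d)

def pool_cost(cost):
    """2x2 average pooling over the last two (frame-2) dimensions."""
    h1, w1, h2, w2 = cost.shape
    return cost.reshape(h1, w1, h2 // 2, 2, w2 // 2, 2).mean(axis=(3, 5))

rng = np.random.default_rng(2)
f1 = rng.standard_normal((16, 8, 8))  # fine-scale features of frame 1
f2 = rng.standard_normal((16, 8, 8))  # fine-scale features of frame 2
cost1 = all_pairs_cost(f1, f2)        # finest level
cost2 = pool_cost(cost1)              # pooled once: coarser matching context
cost3 = pool_cost(cost2)              # pooled twice: coarsest level
print(cost1.shape, cost2.shape, cost3.shape)
# (8, 8, 8, 8) (8, 8, 4, 4) (8, 8, 2, 2)
```

Each pooling step keeps the frame-1 resolution intact while coarsening the search space in frame 2, which is what allows large and small displacements to be read from the same pyramid.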
Further, step S104 includes:
the optical flow records the position offset of each pixel point between two frames, so that a context network is used for obtaining the context code of the first frame image, and the learning of the optical flow network on the position information between the two frames is assisted; and searching the corresponding relation between the two frames on the cost quantity by utilizing the context information of the first frame image, thereby calculating an accurate optical flow estimation result.
Further, in step S105, the method includes:
and monitoring the learning of the optical flow by using the endpoint error of the optical flow as a loss function, and fitting the estimated optical flow estimation result. The method of the invention generates good optical flow estimation precision and can be applied to the fields of unmanned driving, robots and the like.
Example two
As shown in FIG. 2, the present invention also provides a system for adaptive optical flow estimation for different scale objects, the system comprising:
the shallow feature extraction module 201 is configured to input two adjacent frames of images into a convolutional neural network, and perform feature extraction on the two frames of images to obtain shallow features of the two frames of images;
a multi-scale feature extraction module 202, configured to process shallow features of the two frames of images to obtain multi-scale features of the two frames of images, where the multi-scale features include a coarse-scale feature, a medium-scale feature, and a fine-scale feature;
a multi-scale cost volume generation module 203, used for obtaining a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames of images;
an optical flow estimation calculation module 204, used for performing context coding on the first frame image of the two frames of images and calculating an optical flow estimation result in combination with the multi-scale cost volume;
and an optical flow estimation fitting module 205, configured to fit the optical flow estimation result by using the endpoint error of the optical flow as a loss function.
The system is configured to implement the method for adaptive optical flow estimation for different scale targets according to the first embodiment, and details are not repeated herein in order to avoid redundancy.
EXAMPLE III
As shown in fig. 3, an embodiment of the present invention further provides a network apparatus, where the apparatus includes a processor 301, a memory 302, and a bus system 303, where the processor 301 and the memory 302 are connected via the bus system 303, the memory 302 is configured to store instructions, and the processor 301 is configured to execute the instructions stored in the memory 302;
wherein the processor 301 is configured to: input two adjacent frames of images into a convolutional neural network and perform feature extraction to obtain shallow features of the two frames; process the shallow features of the two frames to obtain their multi-scale features, comprising coarse-scale, medium-scale and fine-scale features; obtain a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames; perform context coding on the first frame image of the two frames and calculate an optical flow estimation result in combination with the multi-scale cost volume; and fit the optical flow estimation result using the endpoint error of the optical flow as a loss function.
The network device introduces the selectable-feature module into the field of optical flow estimation and integrates it into the network, which helps the network generate multi-scale feature information and thus learn more accurate optical flow estimation results for objects of different scales; secondly, a multi-scale cost volume generation module is introduced, and the multi-scale cost volume strengthens the ability to represent similarity; finally, the optical flow estimation method of the invention uses the selectable-feature module to enhance the generation of the multi-scale cost volume and jointly learns the multi-scale cost volume and the context codes, thereby improving the accuracy of optical flow estimation.
Optionally, as an embodiment, an unmanned vehicle comprises the above network device comprising a processor 301, a memory 302 and a bus system 303, which are not described in detail herein to avoid redundancy.
Optionally, as an embodiment, a robot includes the above network device, where the network device includes a processor 301, a memory 302, and a bus system 303; to avoid repetition, details are not described here again.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Various other modifications and alterations will occur to those skilled in the art upon reading the foregoing description; it is neither necessary nor possible to be exhaustive of all embodiments. Obvious variations or modifications may be made without departing from the spirit or scope of the invention.
Claims (10)
1. A method of adaptive optical flow estimation for different scale objects, comprising:
s1: inputting two adjacent frames of images into a convolutional neural network, and extracting the characteristics of the two frames of images to obtain shallow layer characteristics of the two frames of images;
s2: processing shallow features of the two frames of images to obtain multi-scale features of the two frames of images, wherein the multi-scale features comprise coarse scale features, medium scale features and fine scale features;
s3: obtaining a multi-scale cost volume through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames of images;
s4: performing context coding on the first frame image of the two frames of images, and calculating an optical flow estimation result in combination with the multi-scale cost volume;
s5: and fitting the optical flow estimation result by using the endpoint error of the optical flow as a loss function.
2. The method of adaptive optical flow estimation for different scale objects as claimed in claim 1, wherein the convolutional neural network employs a downsampling structure.
3. The method of claim 1, wherein the two frames of images are subjected to feature extraction, and the method of obtaining shallow features of the two frames of images comprises convolution, pooling and normalization.
4. The method of claim 1, wherein the method of processing the shallow features of the two frames of images comprises a segmentation operation, a fusion operation, and a selection operation.
5. The method for adaptive optical flow estimation for different scale objects according to claim 4, wherein the segmentation specifically comprises:
given an intermediate feature map M as input, two convolutional layers with kernel sizes of 3 × 3 and 5 × 5 are used to segment the intermediate feature map M into image features at two different scales, M̃_fine and M̃_coarse; here, to reduce computation, the 5 × 5 convolution kernel is replaced with a 3 × 3 kernel using a dilated convolution with dilation coefficient 2.
6. The method for adaptive optical flow estimation for different scale targets according to claim 4, wherein the fusion operation specifically comprises:
firstly, the multi-scale information of the two branches is fused through an element-wise summation operation, obtaining the two-scale fused feature:
M_fuse = M_fine + M_coarse
then, for M_fuse, global information in the spatial dimension is captured using global average pooling:
s = F_gap(M_fuse) = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} M_fuse(i, j)
where F_gap(·) represents the global average pooling operation, and H and W are respectively the height and width of the feature map;
finally, the pooled feature is aggregated with a fully connected layer, and a batch normalization layer and an activation function are added after the fully connected layer:
t = δ(BN(W_fc s))
where W_fc denotes the weights of the fully connected layer, BN(·) batch normalization, and δ(·) the activation function.
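A minimal NumPy sketch of this fusion path (element-wise sum, global average pool, fully connected layer with batch-norm-style standardization and ReLU); the random weight matrix and the per-vector statistics are illustrative stand-ins for trained parameters.

```python
import numpy as np

def fuse(M_fine, M_coarse, W_fc, eps=1e-5):
    M_fuse = M_fine + M_coarse            # element-wise sum, (C, H, W)
    s = M_fuse.mean(axis=(1, 2))          # global average pool -> (C,)
    t = W_fc @ s                          # fully connected layer
    t = (t - t.mean()) / (t.std() + eps)  # batch-norm-style standardization
    return np.maximum(t, 0.0)             # ReLU activation

rng = np.random.default_rng(1)
C = 4
t = fuse(rng.random((C, 5, 5)), rng.random((C, 5, 5)), rng.random((C, C)))
```

The output t is the channel descriptor that guides the selection operation of claim 7.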
7. The method according to claim 4, wherein the selection operation specifically comprises:
guided by the feature vector t, soft attention across channels is used to adaptively select among the different spatial scales of information; the dimension of t is first expanded, and the softmax operator is then applied along the channel dimension to obtain the attention weights:
a_c = e^{A_c t} / (e^{A_c t} + e^{B_c t}), b_c = e^{B_c t} / (e^{A_c t} + e^{B_c t})
where the above is the softmax formula, A_c and B_c being the rows of the expansion matrices for the two branches, so that a_c + b_c = 1 for each channel c;
the obtained attention weights are then used to generate the final feature maps M_fine and M_coarse, that is, the corresponding weighting coefficients are applied to the segmented features.
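The channel-wise soft attention above can be sketched as follows: a and b form a two-way softmax per channel and are applied as weights to the two segmented branches. The projection matrices A and B stand in for the trained expansion weights and are illustrative.

```python
import numpy as np

def select(M_fine, M_coarse, t, A, B):
    # two logits per channel, softmax across the two branches
    logits = np.stack([A @ t, B @ t])          # (2, C)
    e = np.exp(logits - logits.max(axis=0))    # numerically stable softmax
    a, b = e / e.sum(axis=0)                   # a_c + b_c == 1 for each channel
    # apply the weights to the segmented features (broadcast over H, W)
    return a[:, None, None] * M_fine, b[:, None, None] * M_coarse

rng = np.random.default_rng(2)
C = 4
t = rng.random(C)
F_fine = rng.random((C, 3, 3))
F_coarse = rng.random((C, 3, 3))
Mf, Mc = select(F_fine, F_coarse, t, rng.random((C, C)), rng.random((C, C)))
```

Since each weight lies strictly between 0 and 1, the reweighted features never exceed the originals for non-negative inputs.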
8. The method of claim 1, wherein obtaining the multi-scale cost volumes through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames of images comprises:
generating an initial cost volume from the fine-scale features, and then reinforcing the fine-scale features with the medium-scale features to obtain cost volume 1;
pooling cost volume 1 to obtain cost volume 2, wherein cost volume 2 fuses the medium-scale features and the coarse-scale features;
and pooling cost volume 2 to obtain cost volume 3.
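The cost-volume cascade can be illustrated with a local correlation volume followed by average pooling. The search radius and pooling stride are illustrative, and the patent's reinforcement of each level with medium- and coarse-scale features is omitted from this sketch.

```python
import numpy as np

def cost_volume(f1, f2, r=1):
    # local correlation: for each pixel of f1, channel-wise dot products
    # with pixels of f2 displaced by up to r in each direction
    C, H, W = f1.shape
    f2p = np.pad(f2, ((0, 0), (r, r), (r, r)))
    vols = []
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            vols.append((f1 * f2p[:, dy:dy + H, dx:dx + W]).sum(axis=0))
    return np.stack(vols)                      # ((2r+1)^2, H, W)

def pool_cost(cv, s=2):
    # spatial average pooling of the cost volume
    D, H, W = cv.shape
    H, W = (H // s) * s, (W // s) * s
    return cv[:, :H, :W].reshape(D, H // s, s, W // s, s).mean(axis=(2, 4))

rng = np.random.default_rng(3)
f1, f2 = rng.random((3, 8, 8)), rng.random((3, 8, 8))
cv1 = cost_volume(f1, f2, r=1)   # cost volume 1
cv2 = pool_cost(cv1)             # cost volume 2
cv3 = pool_cost(cv2)             # cost volume 3
```

Each pooling step halves the spatial resolution while keeping the displacement dimension, matching the coarse-to-fine cascade of claim 8.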
9. A system for adaptive optical flow estimation for different-scale targets, comprising:
a shallow feature extraction module, configured to input two adjacent frames of images into a convolutional neural network and perform feature extraction on them to obtain shallow features of the two frames of images;
a multi-scale feature extraction module, configured to process the shallow features of the two frames of images to obtain multi-scale features comprising coarse-scale, medium-scale and fine-scale features;
a multi-scale cost volume generation module, configured to obtain multi-scale cost volumes through information interaction among the coarse-scale, medium-scale and fine-scale features of the two frames of images;
an optical flow estimation calculation module, configured to perform context encoding on the first frame of the two frames of images and compute an optical flow estimation result in combination with the multi-scale cost volumes;
and an optical flow estimation fitting module, configured to fit the optical flow estimation result using the endpoint error of the optical flow as the loss function.
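The fitting module's loss is the standard endpoint error (EPE): the mean Euclidean distance between predicted and ground-truth flow vectors. A one-line NumPy sketch:

```python
import numpy as np

def epe_loss(flow_pred, flow_gt):
    # endpoint error: mean Euclidean distance between predicted and
    # ground-truth flow vectors; both flows have shape (H, W, 2)
    return np.sqrt(((flow_pred - flow_gt) ** 2).sum(axis=-1)).mean()

gt = np.zeros((4, 4, 2))
pred = gt + np.array([3.0, 4.0])   # every vector off by (3, 4)
err = epe_loss(pred, gt)           # -> 5.0 by the Pythagorean theorem
```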
10. A network device comprising a processor, a memory and a bus system, the processor and the memory being connected via the bus system, the memory being adapted to store instructions, and the processor being adapted to execute the instructions stored by the memory to implement the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211221511.7A CN115690170B (en) | 2022-10-08 | Method and system for adaptive optical flow estimation for different-scale targets |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115690170A true CN115690170A (en) | 2023-02-03 |
CN115690170B CN115690170B (en) | 2024-10-15 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110111366A (en) * | 2019-05-06 | 2019-08-09 | 北京理工大学 | A kind of end-to-end light stream estimation method based on multistage loss amount |
CN111291647A (en) * | 2020-01-21 | 2020-06-16 | 陕西师范大学 | Single-stage action positioning method based on multi-scale convolution kernel and superevent module |
CN111340844A (en) * | 2020-02-24 | 2020-06-26 | 南昌航空大学 | Multi-scale feature optical flow learning calculation method based on self-attention mechanism |
CN111582483A (en) * | 2020-05-14 | 2020-08-25 | 哈尔滨工程大学 | Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism |
CN114677412A (en) * | 2022-03-18 | 2022-06-28 | 苏州大学 | Method, device and equipment for estimating optical flow |
CN114943747A (en) * | 2022-04-08 | 2022-08-26 | 浙江商汤科技开发有限公司 | Image analysis method and device, video editing method and device, and medium |
Non-Patent Citations (1)
Title |
---|
Zachary Teed and Jia Deng: "RAFT: Recurrent All-Pairs Field Transforms for Optical Flow", arXiv:2003.12039v3, 25 August 2020 (2020-08-25), pages 1-21 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116486107A (en) * | 2023-06-21 | 2023-07-25 | 南昌航空大学 | Optical flow calculation method, system, equipment and medium |
CN116486107B (en) * | 2023-06-21 | 2023-09-05 | 南昌航空大学 | Optical flow calculation method, system, equipment and medium |
CN118397038A (en) * | 2024-06-24 | 2024-07-26 | 中南大学 | Moving object segmentation method, system, equipment and medium based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Guen et al. | Disentangling physical dynamics from unknown factors for unsupervised video prediction | |
Guizilini et al. | 3d packing for self-supervised monocular depth estimation | |
Guizilini et al. | Robust semi-supervised monocular depth estimation with reprojected distances | |
CN107369166B (en) | Target tracking method and system based on multi-resolution neural network | |
CN113657560B (en) | Weak supervision image semantic segmentation method and system based on node classification | |
CN111667535B (en) | Six-degree-of-freedom pose estimation method for occlusion scene | |
CN102722697B (en) | Unmanned aerial vehicle autonomous navigation landing visual target tracking method | |
CN108550162B (en) | Object detection method based on deep reinforcement learning | |
CN113158862A (en) | Lightweight real-time face detection method based on multiple tasks | |
CN115063445A (en) | Target tracking method and system based on multi-scale hierarchical feature representation | |
CN109685830B (en) | Target tracking method, device and equipment and computer storage medium | |
CN111696110A (en) | Scene segmentation method and system | |
CN111191739B (en) | Wall surface defect detection method based on attention mechanism | |
CN113160278A (en) | Scene flow estimation and training method and device of scene flow estimation model | |
US20230020713A1 (en) | Image processing system and method | |
CN114677412A (en) | Method, device and equipment for estimating optical flow | |
CN112507943A (en) | Visual positioning navigation method, system and medium based on multitask neural network | |
CN111260660A (en) | 3D point cloud semantic segmentation migration method based on meta-learning | |
CN113420590A (en) | Robot positioning method, device, equipment and medium in weak texture environment | |
CN109493370B (en) | Target tracking method based on space offset learning | |
CN107798329A (en) | Adaptive particle filter method for tracking target based on CNN | |
CN117710841A (en) | Small target detection method and device for aerial image of unmanned aerial vehicle | |
Shukla et al. | UBOL: User-Behavior-aware one-shot learning for safe autonomous driving | |
CN111738092A (en) | Method for recovering shielded human body posture sequence based on deep learning | |
CN112232126A (en) | Dimension reduction expression method for improving variable scene positioning robustness |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |