CN113269699B - Optical flow estimation method and system based on fusion of asynchronous event flow and gray level image - Google Patents

Optical flow estimation method and system based on fusion of asynchronous event flow and gray level image

Info

Publication number
CN113269699B
CN113269699B (application CN202110436248.2A)
Authority
CN
China
Prior art keywords
image
event
gray level
frame
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110436248.2A
Other languages
Chinese (zh)
Other versions
CN113269699A (en)
Inventor
史殿习
刘聪
苏雅倩文
金松昌
杨烟台
景罗希
李雪辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin (binhai) Intelligence Military-Civil Integration Innovation Center
National Defense Technology Innovation Institute PLA Academy of Military Science
Original Assignee
Tianjin (binhai) Intelligence Military-Civil Integration Innovation Center
National Defense Technology Innovation Institute PLA Academy of Military Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin (binhai) Intelligence Military-Civil Integration Innovation Center, National Defense Technology Innovation Institute PLA Academy of Military Science filed Critical Tianjin (binhai) Intelligence Military-Civil Integration Innovation Center
Priority to CN202110436248.2A priority Critical patent/CN113269699B/en
Publication of CN113269699A publication Critical patent/CN113269699A/en
Application granted granted Critical
Publication of CN113269699B publication Critical patent/CN113269699B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T 5/50: Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06N 3/045: Combinations of networks
    • G06N 3/048: Activation functions
    • G06N 3/08: Learning methods
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/269: Analysis of motion using gradient-based methods
    • G06T 2207/10016: Video; Image sequence
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/20221: Image fusion; Image merging

Abstract

The application relates to the technical field of optical flow estimation in computer vision, and in particular to an optical flow estimation method and system based on the fusion of asynchronous event streams and grayscale images. The method comprises the following steps: acquiring an asynchronous event stream and synchronous grayscale images; preprocessing the asynchronous event stream and the synchronous grayscale images to obtain event frames and grayscale images; performing channel-wise stacking of the event frames and the grayscale images according to time alignment to obtain a multi-channel composite image, pooling the composite image to obtain region feature matrices, stacking the region feature matrices into a corresponding tensor, and inputting the tensor into a weight-adaptive extraction network for fusion; and inputting the fused image obtained after fusion into a trained optical flow estimation deep neural network to obtain the final optical flow estimation result. The method and the system organically fuse the event stream and the grayscale image, and can improve the robustness and generalization of the optical flow estimation algorithm.

Description

Optical flow estimation method and system based on fusion of asynchronous event flow and gray level image
Technical Field
The application relates to the technical field of optical flow estimation in computer vision, and in particular to an optical flow estimation method and system based on the fusion of asynchronous event streams and grayscale images.
Background
For the visual task of optical flow estimation, a clear and stable input is a basic premise for ensuring the performance of the algorithm. In ordinary scenes, a conventional camera can capture clear images, but under complex conditions such as high-speed motion and strong illumination changes, the performance of optical flow estimation methods based on conventional cameras degrades. The event camera is a novel bio-inspired visual sensor with a high dynamic range, high temporal resolution and low latency, and is particularly suitable for high-speed dynamic scenes. Existing optical flow estimation methods use either only an event camera or only a conventional camera; combining the two complementary sensors helps improve the robustness of optical flow estimation algorithms.
An event camera outputs an asynchronous event stream. Because of its unique imaging principle, when the illumination change is not pronounced enough or the relative motion between the event camera and the environment is small, the output data becomes unstable and unreliable and contains considerable noise. In such cases, a conventional camera can complementarily capture clear images. Many optical flow estimation algorithms target either asynchronous event streams or conventional cameras, but there is no optical flow estimation method that combines the two.
The output of an event camera is an asynchronous event stream, whose data format is essentially different from that of a conventional grayscale image; the first step in fusing the two is therefore to process the event stream into data with the same format as the grayscale image. Common representations are the event count image and the latest timestamp image of the event stream. In other visual task fields, the most common way of fusing event streams with conventional images is direct merging based on channel expansion. Such fusion does not consider that the imaging quality of event frames and conventional images differs under different conditions, but fuses them in fixed proportion. Moreover, the difference between the two quality distributions also varies across regions of a single image.
Therefore, the present application proposes an optical flow estimation method and system based on asynchronous event stream and grayscale image fusion to at least partially solve the above technical problems.
Disclosure of Invention
The method of the invention converts the asynchronous event stream into the same format as the synchronous grayscale image and then fuses the two, thereby improving the robustness and generalization of the optical flow estimation method.
In order to achieve the above technical purpose, the application provides an optical flow estimation method based on the fusion of asynchronous event streams and grayscale images, which comprises the following steps:
acquiring an asynchronous event stream and synchronous grayscale images;
preprocessing the asynchronous event stream and the synchronous grayscale images to obtain event frames and grayscale images;
performing channel-wise stacking of the event frames and the grayscale images according to time alignment to obtain a multi-channel composite image, pooling the composite image to obtain region feature matrices, stacking the region feature matrices into a corresponding tensor, and inputting the tensor into a weight-adaptive extraction network for fusion;
and inputting the fused image obtained after fusion into a trained optical flow estimation deep neural network to obtain the final optical flow estimation result.
Preferably, the asynchronous event stream and the synchronous grayscale images are acquired through an event camera together with scene construction and motion control.
Preferably, the multiple channels are five channels.
Further, preprocessing the asynchronous event stream and the synchronous grayscale images to obtain event frames and grayscale images specifically comprises:
dividing the discrete events into two topics according to the left and right binocular cameras, corresponding to the events and grayscale images respectively;
packaging all events into Rosbag-format file packages;
storing the grayscale images in jpg format and labelling them in time order to obtain a grayscale image sequence;
and receiving the Rosbag-format file packages and the grayscale image sequence, and accumulating the asynchronous event stream in the Rosbag-format file packages into synchronous frames of the same size as the grayscale image, denoted as event frames.
Further, the event frames are represented as event count images or latest timestamp images.
Further, performing channel-wise stacking of the event frames and the grayscale image according to time alignment to obtain a five-channel composite image, pooling the five-channel composite image to obtain region feature matrices, stacking the region feature matrices into a corresponding tensor, and inputting the tensor into the weight-adaptive extraction network for fusion specifically comprises:
receiving five frames of event frames and a grayscale image of the same size, the five frames comprising one count image of positive events and its latest timestamp image, one count image of negative events and its latest timestamp image, and one grayscale image;
performing channel-wise stacking of the five frames according to time alignment to obtain a composite image containing five channels;
dividing each channel of the composite image proportionally into 16 sub-regions in a 4 × 4 grid, then performing region average pooling on each sub-region and replacing all pixels in each pooled region with a single value, to obtain a region feature matrix of size 4 × 4;
and stacking the region feature matrices to obtain a tensor of size 4 × 4 × 5, and inputting the tensor into the weight-adaptive extraction network for fusion.
Preferably, the weight-adaptive extraction network is implemented with two fully connected layers: the activation function of the first fully connected layer is a ReLU and that of the second is a Sigmoid, and the output of the weight-adaptive extraction network is a 4 × 4 × 5 tensor of the same size as the input.
Preferably, the training of the optical flow estimation deep neural network comprises:
acquiring a fused image of the event frames and the grayscale image;
performing convolutional encoding and upsampling decoding on the fused image;
calculating the self-supervised loss, where the loss function comprises a smoothness error and a photometric error;
and updating the network connection parameters.
Preferably, the optical flow estimation deep neural network employs four encoding layers, two residual layers, and four decoding layers.
In a second aspect, the present invention further provides an optical flow estimation system based on the fusion of asynchronous event streams and grayscale images, comprising:
a system bottom-layer module, used for acquiring an asynchronous event stream and synchronous grayscale images;
an event frame generation module, used for preprocessing the asynchronous event stream and the synchronous grayscale images to obtain event frames and grayscale images;
an event frame and grayscale image fusion module, used for performing channel-wise stacking of the event frames and the grayscale images according to time alignment to obtain a multi-channel composite image, pooling the composite image to obtain region feature matrices, stacking the region feature matrices into a corresponding tensor, and inputting the tensor into a weight-adaptive extraction network for fusion;
and an optical flow estimation deep neural network module, used for inputting the fused image obtained after fusion into the trained optical flow estimation deep neural network to obtain the final optical flow estimation result.
The beneficial effects of this application are as follows:
The optical flow estimation method and system based on the fusion of asynchronous event streams and grayscale images achieve an effective fusion of the originally three-dimensional asynchronous event stream and the grayscale image, yielding a clear two-dimensional image even though the two data formats are completely different; by organically fusing the two, the robustness and generalization of the optical flow estimation algorithm are improved.
Drawings
FIG. 1 shows a schematic flow chart of the method of embodiment 1 of the present application;
FIG. 2 is a schematic diagram showing the fusion process in embodiment 1 of the present application;
FIG. 3 is a schematic diagram showing the overall process of optical flow estimation in embodiment 1 of the present application;
FIG. 4 is a view showing a system configuration employed in embodiment 2 of the present application;
FIG. 5 shows a schematic diagram of the system structure and processing flow in embodiment 3 of the present application.
Detailed Description
Hereinafter, embodiments of the present application will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present application. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present application. It will be apparent to one skilled in the art that the present application may be practiced without one or more of these details. In other instances, well-known features have not been described in order to avoid obscuring the present application.
It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the application. As used herein, the singular is intended to include the plural unless the context clearly dictates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Exemplary embodiments according to the present application will now be described in more detail with reference to the accompanying drawings. These exemplary embodiments may, however, be embodied in many different forms and should not be construed as limited to only the embodiments set forth herein. The figures are not drawn to scale, wherein certain details may be exaggerated and omitted for clarity. The shapes of the various regions, layers and their relative sizes, positional relationships are shown in the drawings as examples only, and in practice deviations due to manufacturing tolerances or technical limitations are possible, and a person skilled in the art may additionally design regions/layers with different shapes, sizes, relative positions according to the actual needs.
Embodiment 1:
This embodiment implements an optical flow estimation method based on the fusion of asynchronous event streams and grayscale images, as shown in FIG. 1, comprising the following steps:
S1, acquiring an asynchronous event stream and synchronous grayscale images;
S2, preprocessing the asynchronous event stream and the synchronous grayscale images to obtain event frames and grayscale images;
S3, performing channel-wise stacking of the event frames and the grayscale images according to time alignment to obtain a multi-channel composite image, pooling the composite image to obtain region feature matrices, stacking the region feature matrices into a corresponding tensor, and inputting the tensor into a weight-adaptive extraction network for fusion;
and S4, inputting the fused image obtained after fusion into the trained optical flow estimation deep neural network to obtain the final optical flow estimation result.
Preferably, the asynchronous event stream and the synchronous grayscale images are acquired through an event camera together with scene construction and motion control.
Preferably, the multiple channels are five channels.
In S2, preprocessing the asynchronous event stream and the synchronous grayscale images to obtain event frames and grayscale images specifically comprises:
dividing the discrete events into two topics according to the left and right binocular cameras, corresponding to the events and grayscale images respectively;
packaging all events into Rosbag-format file packages, where the duration of each Rosbag-format file package is set to 10 seconds in order to limit the size of the Rosbag files and reduce the time the system needs to read the data;
storing the grayscale images in jpg format and labelling them in time order to obtain a grayscale image sequence;
and receiving the Rosbag-format file packages and the grayscale image sequence, and accumulating the asynchronous event stream in the Rosbag-format file packages into synchronous frames of the same size as the grayscale image, taking the inter-frame time interval of the grayscale images as the accumulation time span; the results are denoted as event frames.
All of the above operations prior to obtaining the event frames constitute the preprocessing.
Further, the event frames are represented as event count images or latest timestamp images.
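For illustration only, the following sketch shows one way an event slice could be accumulated into the event count images and latest timestamp images named above; the event tuple layout (x, y, t, polarity), the function name events_to_frames and all variable names are assumptions and are not taken from the patent text.

    # Hypothetical sketch: accumulate an asynchronous event slice into the four
    # event-frame channels (positive/negative count images and their latest
    # timestamp images). The (x, y, t, polarity) tuple layout is an assumption.
    import numpy as np

    def events_to_frames(events, height, width, t_start, t_end):
        count_pos = np.zeros((height, width), dtype=np.float32)
        count_neg = np.zeros((height, width), dtype=np.float32)
        ts_pos = np.zeros((height, width), dtype=np.float32)
        ts_neg = np.zeros((height, width), dtype=np.float32)
        for x, y, t, polarity in events:
            if not (t_start <= t < t_end):
                continue  # keep only events inside the accumulation time span
            t_norm = (t - t_start) / (t_end - t_start)  # normalise time to [0, 1)
            if polarity > 0:
                count_pos[y, x] += 1.0                    # event count image
                ts_pos[y, x] = max(ts_pos[y, x], t_norm)  # latest timestamp image
            else:
                count_neg[y, x] += 1.0
                ts_neg[y, x] = max(ts_neg[y, x], t_norm)
        return count_pos, ts_pos, count_neg, ts_neg

Stacking these four frames with the grayscale image then yields the five-channel composite image used in S3.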
In S3, performing channel-wise stacking of the event frames and the grayscale image according to time alignment to obtain a five-channel composite image, pooling the five-channel composite image to obtain region feature matrices, stacking the region feature matrices into a corresponding tensor, and inputting the tensor into the weight-adaptive extraction network for fusion specifically comprises:
receiving five frames of event frames and a grayscale image of the same size, the five frames comprising one count image of positive events and its latest timestamp image, one count image of negative events and its latest timestamp image, and one grayscale image;
performing channel-wise stacking of the five frames according to time alignment to obtain a composite image containing five channels;
in order to capture the characteristics of each region while keeping the amount of data small, dividing each channel of the composite image proportionally into 16 sub-regions in a 4 × 4 grid, then performing region average pooling on each sub-region and replacing all pixels in each pooled region with a single value, to obtain a region feature matrix of size 4 × 4;
and stacking the region feature matrices to obtain a tensor of size 4 × 4 × 5, and inputting the tensor into the weight-adaptive extraction network for fusion.
Preferably, the weight-adaptive extraction network is implemented with two fully connected layers: the activation function of the first fully connected layer is a ReLU and that of the second is a Sigmoid, and the output of the weight-adaptive extraction network is a 4 × 4 × 5 tensor of the same size as the input. FIG. 2 is a schematic diagram of the fusion process; in FIG. 2, F stands for function, lsq is the abbreviation of local squeeze, and ex is the abbreviation of Extraction-Net. The asynchronous event tensor x is originally three-dimensional and is expanded into five channels (c stands for channel) according to time alignment, namely the five frames described above: one count image of positive events and its latest timestamp image, one count image of negative events and its latest timestamp image, and one grayscale image. Frame and channel are used synonymously here. Each channel is divided proportionally into 16 sub-regions in a 4 × 4 grid; each sub-region is then region-average-pooled, with all pixels in the pooled region replaced by a single value, yielding a region feature matrix of size 4 × 4. The region feature matrices are stacked into a tensor of size 4 × 4 × 5, which is input into the weight-adaptive extraction network for adaptive weight calculation, so that the asynchronous events are fused with the grayscale image and x is transformed into the fused representation (given as a formula image, BDA0003033216710000091, in the original publication).
The weight-adaptive extraction network thus realizes the fusion of the event frames and the grayscale image: if a given channel is clearer than the others (for example, relative to the event frames), the network assigns it a larger weight, so that the adaptive weight calculation allows the asynchronous event stream and the synchronous grayscale image to be fused effectively.
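For illustration, a minimal PyTorch-style sketch of such a weight-adaptive extraction network is given below. It follows the description above (region average pooling to 4 × 4, two fully connected layers with ReLU and Sigmoid activations, a 4 × 4 × 5 weight tensor); the final step that broadcasts the weights back onto the five-channel composite image, the class name and all variable names are assumptions rather than details taken from the patent.

    # Sketch of the weight-adaptive extraction network (local squeeze + Extraction-Net).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class WeightAdaptiveExtractionNet(nn.Module):
        def __init__(self, channels=5, grid=4):
            super().__init__()
            self.channels, self.grid = channels, grid
            n = channels * grid * grid                 # 5 x 4 x 4 = 80 region features
            self.fc1 = nn.Linear(n, n)                 # first fully connected layer (ReLU)
            self.fc2 = nn.Linear(n, n)                 # second fully connected layer (Sigmoid)

        def forward(self, composite):                  # composite: (B, 5, H, W)
            b, c, h, w = composite.shape
            # Region average pooling: each channel becomes a 4 x 4 feature matrix.
            region = F.adaptive_avg_pool2d(composite, self.grid)     # (B, 5, 4, 4)
            z = region.flatten(1)                                    # (B, 80)
            weights = torch.sigmoid(self.fc2(F.relu(self.fc1(z))))   # (B, 80), in (0, 1)
            weights = weights.view(b, c, self.grid, self.grid)       # (B, 5, 4, 4)
            # Assumed fusion step: broadcast each region weight back to full
            # resolution and reweight the composite image channel by channel.
            weights_full = F.interpolate(weights, size=(h, w), mode="nearest")
            return composite * weights_full

Under this reading, a region of a channel that the network judges more reliable receives a weight closer to 1 and therefore contributes more to the fused image passed on to the optical flow network.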
Preferably, the training of the optical flow estimation deep neural network in S4 comprises:
acquiring a fused image of the event frames and the grayscale image;
performing convolutional encoding and upsampling decoding on the fused image;
calculating the self-supervised loss, where the loss function comprises a smoothness error and a photometric error (a sketch of such a loss is given after this list);
and updating the network connection parameters.
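A hedged sketch of such a self-supervised loss follows; the patent states only that the loss combines a photometric error and a smoothness error, so the Charbonnier penalty, the backward-warping formulation and the weighting factor used here are assumptions.

    # Sketch of a self-supervised loss with photometric and smoothness terms.
    import torch
    import torch.nn.functional as F

    def charbonnier(x, eps=1e-3):
        return torch.sqrt(x * x + eps * eps)

    def warp(img, flow):
        # Backward-warp img (B, 1, H, W) by a pixel-unit flow (B, 2, H, W).
        b, _, h, w = img.shape
        ys, xs = torch.meshgrid(torch.arange(h, device=img.device),
                                torch.arange(w, device=img.device), indexing="ij")
        grid_x = (xs.float() + flow[:, 0]) / (w - 1) * 2 - 1   # normalise to [-1, 1]
        grid_y = (ys.float() + flow[:, 1]) / (h - 1) * 2 - 1
        grid = torch.stack((grid_x, grid_y), dim=-1)           # (B, H, W, 2)
        return F.grid_sample(img, grid, align_corners=True)

    def self_supervised_loss(flow, gray_prev, gray_next, smooth_weight=0.5):
        # Photometric error: the next grayscale frame warped by the estimated
        # flow should match the previous frame.
        photometric = charbonnier(gray_prev - warp(gray_next, flow)).mean()
        # Smoothness error: penalise large spatial gradients of the flow field.
        smoothness = (charbonnier(flow[:, :, :, 1:] - flow[:, :, :, :-1]).mean()
                      + charbonnier(flow[:, :, 1:, :] - flow[:, :, :-1, :]).mean())
        return photometric + smooth_weight * smoothness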
An overall schematic diagram of the optical flow estimation process based on the fusion of the asynchronous event stream and the grayscale image is shown in FIG. 3; in FIG. 3, the weight-adaptive extraction network is abbreviated as LSENet. Further preferably, the optical flow estimation deep neural network employs four encoding layers, two residual layers, and four decoding layers. Each encoding layer is implemented as a convolution with a stride of 2; after each encoding layer, the spatial size of the image is halved and the number of channels is doubled. To enhance the adaptability of the system to images of different sizes, the output of each encoding layer is also stored for a skip connection with the corresponding decoding layer. The residual layers are introduced to prevent overfitting. Each decoding layer is implemented by bilinear-interpolation upsampling; after each decoding layer, the spatial size of the data is doubled and the number of channels is halved. The decoded data is fed into the next decoding layer together with the data from the corresponding encoding layer. The network is trained in a self-supervised manner, and the loss function comprises a smoothness error and a photometric error.
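The compact PyTorch-style sketch below mirrors only the layout stated above (four stride-2 encoding layers, two residual layers, four bilinear-upsampling decoding layers with skip connections from the encoder); the channel counts, kernel sizes, activation choices and the two-channel flow head are illustrative assumptions.

    # Sketch of the optical flow estimation network: 4 encoders, 2 residual blocks, 4 decoders.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualBlock(nn.Module):
        def __init__(self, c):
            super().__init__()
            self.conv1 = nn.Conv2d(c, c, 3, padding=1)
            self.conv2 = nn.Conv2d(c, c, 3, padding=1)

        def forward(self, x):
            return x + self.conv2(F.relu(self.conv1(x)))   # residual connection

    class FlowEstimationNet(nn.Module):
        def __init__(self, in_channels=5, base=32):
            super().__init__()
            c1, c2, c3, c4 = base, base * 2, base * 4, base * 8
            # Encoding layers: stride-2 convolutions halve H, W and double the channels.
            self.enc1 = nn.Conv2d(in_channels, c1, 3, stride=2, padding=1)
            self.enc2 = nn.Conv2d(c1, c2, 3, stride=2, padding=1)
            self.enc3 = nn.Conv2d(c2, c3, 3, stride=2, padding=1)
            self.enc4 = nn.Conv2d(c3, c4, 3, stride=2, padding=1)
            # Residual layers at the bottleneck to limit overfitting.
            self.res = nn.Sequential(ResidualBlock(c4), ResidualBlock(c4))
            # Decoding layers: bilinear upsampling doubles H, W; encoder features
            # are concatenated through skip connections; channels are halved.
            self.dec4 = nn.Conv2d(c4 + c3, c3, 3, padding=1)
            self.dec3 = nn.Conv2d(c3 + c2, c2, 3, padding=1)
            self.dec2 = nn.Conv2d(c2 + c1, c1, 3, padding=1)
            self.dec1 = nn.Conv2d(c1, c1 // 2, 3, padding=1)       # last level: no skip assumed
            self.flow_head = nn.Conv2d(c1 // 2, 2, 3, padding=1)   # 2-channel optical flow

        @staticmethod
        def _up(x):
            return F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)

        def forward(self, fused):                 # fused: (B, 5, H, W), H, W divisible by 16
            e1 = F.relu(self.enc1(fused))         # H/2
            e2 = F.relu(self.enc2(e1))            # H/4
            e3 = F.relu(self.enc3(e2))            # H/8
            e4 = F.relu(self.enc4(e3))            # H/16
            x = self.res(e4)
            x = F.relu(self.dec4(torch.cat([self._up(x), e3], dim=1)))   # H/8
            x = F.relu(self.dec3(torch.cat([self._up(x), e2], dim=1)))   # H/4
            x = F.relu(self.dec2(torch.cat([self._up(x), e1], dim=1)))   # H/2
            x = F.relu(self.dec1(self._up(x)))                            # H
            return self.flow_head(x)

Under these assumptions the network maps the fused five-channel input to a two-channel flow field of the same spatial size, and the loss sketched above can be applied directly to its output.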
Embodiment 2:
This embodiment implements an optical flow estimation system based on the fusion of asynchronous event streams and grayscale images, as shown in FIG. 4, comprising:
a system bottom-layer module 501, used for acquiring an asynchronous event stream and synchronous grayscale images;
an event frame generation module 502, used for preprocessing the asynchronous event stream and the synchronous grayscale images to obtain event frames and grayscale images;
an event frame and grayscale image fusion module 503, used for performing channel-wise stacking of the event frames and the grayscale images according to time alignment to obtain a multi-channel composite image, pooling the composite image to obtain region feature matrices, stacking the region feature matrices into a corresponding tensor, and inputting the tensor into a weight-adaptive extraction network for fusion;
and an optical flow estimation deep neural network module 504, used for inputting the fused image obtained after fusion into the trained optical flow estimation deep neural network to obtain the final optical flow estimation result.
Embodiment 3:
This embodiment implements an optical flow estimation system based on the fusion of asynchronous event streams and grayscale images, as shown in FIG. 5, comprising: a system bottom-layer module, an event frame generation module, an event frame and grayscale image fusion module, and an optical flow estimation deep neural network module.
The system bottom-layer module is deployed on an agent equipped with an event camera, can synchronously acquire asynchronous event streams and conventional grayscale images, and generally also comprises the motion control module and the scene construction module carried by the agent. Running ROS on the Ubuntu operating system, the module receives asynchronous event streams from the DAVIS 346 event camera, divides the discrete, independent event points into two topics according to the left and right binocular cameras, and then packs all events in each topic into Rosbag-format file packages, which facilitates subsequent processing and screening of the events. The grayscale images are stored in jpg format and labelled in time order to obtain a grayscale image sequence. The module receives the Rosbag-format file packages and the grayscale image sequence, and accumulates the asynchronous event stream in the Rosbag-format file packages into synchronous frames of the same size as the grayscale image, taking the inter-frame time interval of the grayscale images as the accumulation time span; the results are denoted as event frames.
The event frame generation module is used for preprocessing the asynchronous event stream and the synchronous grayscale images to obtain event frames and grayscale images; all operations before the event frames and grayscale images are obtained constitute the preprocessing.
The event frame and grayscale image fusion module is used for performing channel-wise stacking of the event frames and the grayscale images according to time alignment to obtain a multi-channel composite image, pooling the composite image to obtain region feature matrices, stacking the region feature matrices into a corresponding tensor, and inputting the tensor into the weight-adaptive extraction network for fusion. That is, the tensor is input into the extraction network (Extraction-Net) for adaptive weight learning; the network is implemented with two fully connected layers, where the activation function of the first fully connected layer is a ReLU and that of the second is a Sigmoid, and the final output is a 4 × 4 × 5 tensor of the same size as the input.
The optical flow estimation deep neural network module is used for receiving the fused image output by the event frame and grayscale image fusion module and passing it through the trained optical flow estimation deep neural network to obtain the final optical flow estimation result. The optical flow estimation deep neural network employs four encoding layers, two residual layers, and four decoding layers. Each encoding layer is implemented as a convolution with a stride of 2; after each encoding layer, the spatial size of the image is halved and the number of channels is doubled. To enhance the adaptability of the system to images of different sizes, the output of each encoding layer is also stored for a skip connection with the corresponding decoding layer. The residual layers are introduced to prevent overfitting. Each decoding layer is implemented by bilinear-interpolation upsampling; after each decoding layer, the spatial size of the data is doubled and the number of channels is halved. The decoded data is fed into the next decoding layer together with the data from the corresponding encoding layer. The network is trained in a self-supervised manner, and the loss function comprises a smoothness error and a photometric error.
As an alternative embodiment, as shown in FIG. 5, the optical flow estimation deep neural network module in this embodiment outputs a color-coded optical flow estimation map, which can be fed into downstream applications such as emergency obstacle avoidance, visual odometry, video analysis, and weather prediction.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. An optical flow estimation method based on the fusion of asynchronous event streams and grayscale images, characterized by comprising the following steps:
acquiring an asynchronous event stream and synchronous grayscale images;
preprocessing the asynchronous event stream and the synchronous grayscale images to obtain event frames and grayscale images;
performing channel-wise stacking of the event frames and the grayscale images according to time alignment to obtain a multi-channel composite image, pooling the composite image to obtain region feature matrices, stacking the region feature matrices into a corresponding tensor, and inputting the tensor into a weight-adaptive extraction network for fusion;
inputting the fused image obtained after fusion into a trained optical flow estimation deep neural network to obtain the final optical flow estimation result;
wherein the multiple channels are five channels;
and performing channel-wise stacking of the event frames and the grayscale images according to time alignment to obtain a five-channel composite image, pooling the five-channel composite image to obtain region feature matrices, stacking the region feature matrices into a corresponding tensor, and inputting the tensor into the adaptive extraction network for fusion specifically comprises:
receiving five frames of event frames and a grayscale image of the same size, the five frames comprising one count image of positive events and its latest timestamp image, one count image of negative events and its latest timestamp image, and one grayscale image;
performing channel-wise stacking of the five frames according to time alignment to obtain a composite image containing five channels;
dividing each channel of the composite image proportionally into 16 sub-regions in a 4 × 4 grid, then performing region average pooling on each sub-region and replacing all pixels in each pooled region with a single value, to obtain a region feature matrix of size 4 × 4;
and stacking the region feature matrices to obtain a tensor of size 4 × 4 × 5, and inputting the tensor into the weight-adaptive extraction network for fusion.
2. The method of claim 1, wherein the asynchronous event stream and the synchronous grayscale images are acquired through an event camera together with scene construction and motion control.
3. The optical flow estimation method based on the fusion of asynchronous event streams and grayscale images as claimed in claim 1, wherein preprocessing the asynchronous event stream and the synchronous grayscale images to obtain event frames and grayscale images specifically comprises:
dividing the discrete events into two topics according to the left and right binocular cameras, corresponding to the events and grayscale images respectively;
packaging all events into Rosbag-format file packages;
storing the grayscale images in jpg format and labelling them in time order to obtain a grayscale image sequence;
and receiving the Rosbag-format file packages and the grayscale image sequence, and accumulating the asynchronous event stream in the Rosbag-format file packages into synchronous frames of the same size as the grayscale image, denoted as event frames.
4. The optical flow estimation method based on the fusion of asynchronous event streams and grayscale images as claimed in claim 3, wherein the event frames are represented as event count images or latest timestamp images.
5. The optical flow estimation method based on the fusion of asynchronous event streams and grayscale images as claimed in claim 1, wherein the weight-adaptive extraction network is implemented with two fully connected layers, the activation function of the first layer is a ReLU, the activation function of the second layer is a Sigmoid, and the output of the weight-adaptive extraction network is a 4 × 4 × 5 tensor of the same size as the input.
6. The optical flow estimation method based on the fusion of asynchronous event streams and grayscale images as claimed in claim 1, wherein the training of the optical flow estimation deep neural network comprises:
acquiring a fused image of the event frames and the grayscale image;
performing convolutional encoding and upsampling decoding on the fused image;
calculating the self-supervised loss, where the loss function comprises a smoothness error and a photometric error;
and updating the network connection parameters.
7. The method of claim 6, wherein the deep neural network employs four encoding layers, two residual layers, and four decoding layers.
8. An optical flow estimation system based on the fusion of asynchronous event streams and grayscale images, comprising:
a system bottom-layer module, used for acquiring an asynchronous event stream and synchronous grayscale images;
an event frame generation module, used for preprocessing the asynchronous event stream and the synchronous grayscale images to obtain event frames and grayscale images;
an event frame and grayscale image fusion module, used for performing channel-wise stacking of the event frames and the grayscale images according to time alignment to obtain a multi-channel composite image, pooling the composite image to obtain region feature matrices, stacking the region feature matrices into a corresponding tensor, and inputting the tensor into a weight-adaptive extraction network for fusion;
an optical flow estimation deep neural network module, used for inputting the fused image obtained after fusion into a trained optical flow estimation deep neural network to obtain the final optical flow estimation result;
wherein the multiple channels are five channels;
and performing channel-wise stacking of the event frames and the grayscale images according to time alignment to obtain a five-channel composite image, pooling the composite image to obtain region feature matrices, stacking the region feature matrices into a corresponding tensor, and inputting the tensor into the adaptive extraction network for fusion specifically comprises:
receiving five frames of event frames and a grayscale image of the same size, the five frames comprising one count image of positive events and its latest timestamp image, one count image of negative events and its latest timestamp image, and one grayscale image;
performing channel-wise stacking of the five frames according to time alignment to obtain a composite image containing five channels;
dividing each channel of the composite image proportionally into 16 sub-regions in a 4 × 4 grid, then performing region average pooling on each sub-region and replacing all pixels in each pooled region with a single value, to obtain a region feature matrix of size 4 × 4;
and stacking the region feature matrices to obtain a tensor of size 4 × 4 × 5, and inputting the tensor into the weight-adaptive extraction network for fusion.
CN202110436248.2A 2021-04-22 2021-04-22 Optical flow estimation method and system based on fusion of asynchronous event flow and gray level image Active CN113269699B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110436248.2A CN113269699B (en) 2021-04-22 2021-04-22 Optical flow estimation method and system based on fusion of asynchronous event flow and gray level image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110436248.2A CN113269699B (en) 2021-04-22 2021-04-22 Optical flow estimation method and system based on fusion of asynchronous event flow and gray level image

Publications (2)

Publication Number Publication Date
CN113269699A CN113269699A (en) 2021-08-17
CN113269699B true CN113269699B (en) 2023-01-03

Family

ID=77229092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110436248.2A Active CN113269699B (en) 2021-04-22 2021-04-22 Optical flow estimation method and system based on fusion of asynchronous event flow and gray level image

Country Status (1)

Country Link
CN (1) CN113269699B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114723009B (en) * 2022-04-12 2023-04-25 重庆大学 Data representation method and system based on asynchronous event stream

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798370A (en) * 2020-06-30 2020-10-20 武汉大学 Manifold constraint-based event camera image reconstruction method and system
CN111798395A (en) * 2020-06-30 2020-10-20 武汉大学 Event camera image reconstruction method and system based on TV constraint
WO2021035807A1 (en) * 2019-08-23 2021-03-04 深圳大学 Target tracking method and device fusing optical flow information and siamese framework
CN112529944A (en) * 2020-12-05 2021-03-19 东南大学 End-to-end unsupervised optical flow estimation method based on event camera
CN112686928A (en) * 2021-01-07 2021-04-20 大连理工大学 Moving target visual tracking method based on multi-source information fusion

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10733428B2 (en) * 2017-02-01 2020-08-04 The Government Of The United States Of America, As Represented By The Secretary Of The Navy Recognition actions on event based cameras with motion event features
US11288818B2 (en) * 2019-02-19 2022-03-29 The Trustees Of The University Of Pennsylvania Methods, systems, and computer readable media for estimation of optical flow, depth, and egomotion using neural network trained using event-based learning
CN110111366B (en) * 2019-05-06 2021-04-30 北京理工大学 End-to-end optical flow estimation method based on multistage loss
CN111667442B (en) * 2020-05-21 2022-04-01 武汉大学 High-quality high-frame-rate image reconstruction method based on event camera

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021035807A1 (en) * 2019-08-23 2021-03-04 深圳大学 Target tracking method and device fusing optical flow information and siamese framework
CN111798370A (en) * 2020-06-30 2020-10-20 武汉大学 Manifold constraint-based event camera image reconstruction method and system
CN111798395A (en) * 2020-06-30 2020-10-20 武汉大学 Event camera image reconstruction method and system based on TV constraint
CN112529944A (en) * 2020-12-05 2021-03-19 东南大学 End-to-end unsupervised optical flow estimation method based on event camera
CN112686928A (en) * 2021-01-07 2021-04-20 大连理工大学 Moving target visual tracking method based on multi-source information fusion

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
EV-FlowNet: Self-Supervised Optical Flow Estimation for Event-based Cameras; Alex Zihao Zhu et al.; arXiv; 2018-08-13; full text *
Research on a depth estimation neural network method with dual-path fusion (双路融合的深度估计神经网络方法研究); 刘春 et al.; Computer Engineering and Applications (计算机工程与应用); 2019-10-10 (No. 20); full text *
Research on target tracking algorithms based on dynamic vision sensors (基于动态视觉传感器的目标跟踪算法研究); 赵园良; China Masters' Theses Full-text Database (中国优秀博硕士学位论文全文数据库(硕士)); 2021-01-15; full text *
Light-field image depth estimation based on a multi-stream epipolar convolutional neural network (基于多流对极卷积神经网络的光场图像深度估计); 王硕 et al.; Computer Applications and Software (计算机应用与软件); 2020-08-12 (No. 08); full text *

Also Published As

Publication number Publication date
CN113269699A (en) 2021-08-17

Similar Documents

Publication Publication Date Title
CN109377530B (en) Binocular depth estimation method based on depth neural network
WO2021179820A1 (en) Image processing method and apparatus, storage medium and electronic device
KR101947782B1 (en) Apparatus and method for depth estimation based on thermal image, and neural network learning method
CN112529944B (en) End-to-end unsupervised optical flow estimation method based on event camera
CN112580545B (en) Crowd counting method and system based on multi-scale self-adaptive context network
CN113269699B (en) Optical flow estimation method and system based on fusion of asynchronous event flow and gray level image
CN113034413A (en) Low-illumination image enhancement method based on multi-scale fusion residual error codec
CN115294282A (en) Monocular depth estimation system and method for enhancing feature fusion in three-dimensional scene reconstruction
CN115035171A (en) Self-supervision monocular depth estimation method based on self-attention-guidance feature fusion
CN114119694A (en) Improved U-Net based self-supervision monocular depth estimation algorithm
CN113660486A (en) Image coding, decoding, reconstructing and analyzing method, system and electronic equipment
TWI826160B (en) Image encoding and decoding method and apparatus
CN116071412A (en) Unsupervised monocular depth estimation method integrating full-scale and adjacent frame characteristic information
CN113191301B (en) Video dense crowd counting method and system integrating time sequence and spatial information
CN112801912B (en) Face image restoration method, system, device and storage medium
CN116091337A (en) Image enhancement method and device based on event signal nerve coding mode
CN115564664A (en) Motion blur removing method of two-stage transform coder/decoder based on fusion bilateral recombination attention
CN112164078B (en) RGB-D multi-scale semantic segmentation method based on encoder-decoder
CN111950496B (en) Mask person identity recognition method
CN115131414A (en) Unmanned aerial vehicle image alignment method based on deep learning, electronic equipment and storage medium
CN113762032A (en) Image processing method, image processing device, electronic equipment and storage medium
CN115456903B (en) Deep learning-based full-color night vision enhancement method and system
CN114140363B (en) Video deblurring method and device and video deblurring model training method and device
EP4174757B1 (en) Method and system for embedding information in a video signal
KR20220020560A (en) Method and Apparatus for Frame Rate Up Conversion Using Convolutional Neural Networks with Recurrent Neural Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant