CN118052985A - Low-light video target segmentation method based on event signal driving - Google Patents

Low-light video target segmentation method based on event signal driving

Info

Publication number
CN118052985A
Authority
CN
China
Prior art keywords
event
low
image
light
moment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410215980.0A
Other languages
Chinese (zh)
Inventor
孙晓艳
李和倍
张越一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202410215980.0A
Publication of CN118052985A

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a low-light video target segmentation method driven by event signals, which comprises the following steps: 1. prepare video data of a low-light scene, the target masks and the corresponding event sequence; 2. construct a low-light video target segmentation model; 3. train the constructed low-light video target segmentation neural network model offline; 4. predict masks in the low-light scene with the trained model, thereby achieving low-light video target segmentation. By driving the segmentation with event data, the method improves video target segmentation in low-light scenes and generates accurate target masks.

Description

Low-light video target segmentation method based on event signal driving
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a low-light video target segmentation method based on event signal driving.
Background
Video object segmentation occupies a central position in computer vision research; its main task is to accurately identify and track one or more target objects in a video sequence. The technology has a very wide range of applications, from environment perception in autonomous driving systems and video surveillance for urban safety to video editing software offering innovative interaction modes. With the rapid development and application of deep learning, video object segmentation has taken a qualitative leap: when processing high-definition video input, existing methods can not only segment objects with higher precision but also track their motion trajectories more stably.
However, despite significant achievements under standard lighting conditions, video object segmentation still faces major challenges under low light. In such environments the video frames suffer severe quality degradation: noise increases markedly, scene details are lost on a large scale, and color information is badly distorted, all of which directly affect the accuracy of segmentation and the stability of tracking. More importantly, most current video object segmentation techniques depend heavily on clear, high-quality video input, a condition that is often hard to satisfy in practical scenarios such as nighttime surveillance or low-illumination autonomous driving. This dependence on high-quality input greatly limits the applicability and practical effect of video object segmentation in low-light environments.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a low-light video target segmentation method based on event signal driving. By exploiting the high dynamic range of event data and its ability to capture high-speed object motion, the method aims to improve the robustness of video target segmentation under low light and the segmentation of moving objects, thereby improving the video target segmentation effect in low-light scenes.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
The invention discloses a low-light video target segmentation method based on event signal driving, which is characterized by comprising the following steps:
Step 1, acquiring a video image set I under a low light scene, a target mask set Y and a corresponding event sequence E:
Step 2, constructing a low-light video target segmentation neural network, which comprises a multi-modal encoder and an event-guided memory matching module:
Step 2.1, the multi-modal encoder is used to extract features from I and E to obtain mixed features;
Step 2.2, the event-guided memory matching module is used to process the mixed features to obtain a predicted target mask;
Step 3, constructing a total loss function based on the predicted target mask and the target mask set Y;
Step 4, training the low-light video target segmentation neural network using gradient descent and computing the total loss function L to update the network parameters; training stops when the number of training iterations reaches the set number or the total loss function L converges, yielding the optimal low-light video target segmentation neural network, which processes low-light video images to obtain the corresponding predicted masks.
The low-light video target segmentation method based on event signal driving is also characterized in that the step 1 is performed according to the following steps:
Step 1.1.1, acquiring a video image set I = {I_1, I_2, ..., I_t, ..., I_T} of a low light scene and a video image set N = {N_1, N_2, ..., N_t, ..., N_T} of the corresponding normal light scene, wherein I_t represents the low light image at time t and N_t represents the normal light image at time t; T represents the number of frames;
Step 1.1.2, labeling the target masks of the video image set of the normal light scene with a labeling tool to obtain the target mask set Y = {y_1, y_2, ..., y_t, ..., y_T} shared by the video image sets of the low light and normal light scenes, wherein y_t represents the target mask of the low light image I_t and the normal light image N_t at time t;
Step 1.1.3, obtaining the event sequence of the video image set I of the low light scene, denoted E = {E_{0,1}, E_{1,2}, ..., E_{t-1,t}, ..., E_{T-1,T}}, where E_{t-1,t} represents the low-light events recorded between the low light image I_{t-1} at time t-1 and the low light image I_t at time t.
The multi-modal encoder in step 2.1 comprises: an image encoder, an event encoder and an adaptive cross-modal fusion module;
Step 2.1.1, the image encoder consists of m residual modules and n downsampling layers;
The low-light image I_t at time t is input into the image encoder for feature extraction to obtain the multi-scale image feature F_t^Img of I_t;
Step 2.1.2, the event encoder consists of m residual modules and n downsampling layers;
The low-light events E_{t-1,t} from time t-1 to time t are input into the event encoder for feature extraction to obtain the multi-scale event feature F_t^Evt of E_{t-1,t};
Step 2.1.3, the adaptive cross-modal fusion module concatenates the multi-scale image feature F_t^Img and the multi-scale event feature F_t^Evt along the channel dimension, then applies convolution and average pooling to obtain the multi-scale mixed feature F_t^Cat at time t;
F_t^Cat is dot-multiplied with the multi-scale image feature F_t^Img and the multi-scale event feature F_t^Evt respectively to obtain the filtered multi-scale image feature and the filtered multi-scale event feature at time t;
The filtered multi-scale event feature undergoes a channel attention operation followed by a spatial attention operation to obtain the multi-scale event attention feature at time t; the multi-scale event attention feature is summed with the filtered multi-scale image feature to obtain the multi-scale image feature fused with event information at time t;
The event-fused multi-scale image feature and the filtered multi-scale event feature are each convolved and then summed to obtain the mixed feature F_t at time t.
The event-guided memory matching module in step 2.2 comprises: a memory storage module, an event guidance module, an attention matching module and a mask decoder;
Step 2.2.1, the memory storage module applies a linear transformation to the mixed feature F_t at time t to obtain the key K_t and the value V_t at time t; the mask at time t is denoted Mask_t; when t = 1, Mask_1 is initialized with the given first-frame target mask;
Step 2.2.2, the event guidance module concatenates Mask_t with F_t^Evt along the channel dimension and extracts multi-scale information with convolution kernels of different sizes and pooling to obtain the filter signal SE_t at time t; SE_t is dot-multiplied with F_t^Evt and with Mask_t respectively and the results are summed, finally outputting the strengthened guide signal G_t at time t;
Step 2.2.3, the attention matching module obtains the filtered key K_t' at time t using equation (1):
K_t' = K_t · G_t   (1)
The attention matrix A_{t+1} at time t+1 is obtained using equation (2):
A_{t+1} = Softmax(Q_{t+1} · (K_t')^Tr / √d_k)   (2)
In equation (2), Q_{t+1} represents the query obtained from F_{t+1} after a linear transformation, d_k represents the channel dimension of Q_{t+1} and K_t', Softmax represents the activation function, and Tr represents the matrix transpose;
The matching result R_{t+1} at time t+1 is obtained using equation (3):
R_{t+1} = A_{t+1}(G_t + V_t)   (3)
Step 2.2.4, the mask decoder consists of convolutional and upsampling layers;
The matching result R_{t+1} and the mixed feature F_{t+1} are concatenated along the channel dimension and input into the mask decoder, which outputs the predicted target mask Mask_{t+1} at time t+1.
Step 3 is carried out as follows:
Step 3.1, constructing the cross-entropy loss function L_ce^t at time t using equation (4), which measures the per-pixel cross entropy between the predicted mask Mask_t and the ground-truth mask y_t;
Step 3.2, constructing the soft Jaccard loss function L_jac^t at time t using equation (5), which measures the overlap between the predicted mask Mask_t and the ground-truth mask y_t;
Step 3.3, constructing the total loss function L at time t using equation (6):
L = α·L_ce^t + β·L_jac^t   (6)
In equation (6), α and β are two weighting coefficients.
The electronic device of the present invention includes a memory and a processor, wherein the memory is configured to store a program for supporting the processor to execute the low-light video object segmentation method, and the processor is configured to execute the program stored in the memory.
The invention relates to a computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, performs the steps of the low-light video object segmentation method.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention proposes an innovative, event-signal-driven video object segmentation network design. By combining event signal data with conventional video segmentation techniques, it applies the unique advantages of event signals to the low-light video object segmentation task. The method markedly improves the robustness of video object segmentation in low-illumination environments and, compared with existing mainstream video object segmentation techniques, offers clear advantages in the segmentation precision and stability of fast-moving objects.
2. The invention develops an adaptive cross-modal fusion module. The module adopts a multi-scale fusion strategy that enhances the efficiency of information fusion between image frames and event data and effectively exploits the illumination robustness of event data under low-light conditions, significantly improving video object segmentation performance under a variety of lighting conditions.
3. The method fuses the event signal with target mask features to generate a signal that guides the segmentation network, effectively improving mask matching when the network processes low-illumination video sequences. This addresses the drop in segmentation performance caused by low matching accuracy under low illumination and further strengthens the system's applicability in complex environments.
4. The invention adopts supervised training and deeply embeds event information into the video target segmentation network, thereby improving the quality of the output masks.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a block diagram of an adaptive cross-modality fusion module of the present invention;
FIG. 3 is a diagram illustrating an event guided memory matching module according to the present invention.
Detailed Description
In this embodiment, facing the series of challenges brought by low-illumination environments, a low-light video target segmentation method driven by event signals is provided. It constructs an adaptive cross-modal fusion module and an event-guided target matching module by exploiting the high dynamic range of event data and its ability to capture high-speed object motion. The method adapts to the video quality degradation encountered under low illumination and reduces the dependence on high-quality video input, thereby effectively improving target segmentation and tracking performance under such conditions, broadening the application range of video target segmentation technology, and enhancing its practicality and reliability in complex environments to meet wider practical application requirements. As shown in fig. 1, the method specifically comprises the following steps:
step 1, obtaining video data in a low light scene, a target mask and a corresponding event sequence:
Step 1.1.1, acquiring a video image set I = {I_1, I_2, ..., I_t, ..., I_T} of a low light scene and a video image set N = {N_1, N_2, ..., N_t, ..., N_T} of the corresponding normal light scene, wherein I_t represents the low light image at time t and N_t represents the normal light image at time t; T represents the number of frames; in this example, the number of image frames used during neural network training is T = 5.
Step 1.1.2, labeling the target masks of the video image set of the normal light scene with a labeling tool to obtain the target mask set Y = {y_1, y_2, ..., y_t, ..., y_T} shared by the video image sets of the low light and normal light scenes, wherein y_t represents the target mask of the low light image I_t and the normal light image N_t at time t.
Step 1.1.3, obtaining the event sequence of the video image set I of the low light scene, denoted E = {E_{0,1}, E_{1,2}, ..., E_{t-1,t}, ..., E_{T-1,T}}, where E_{t-1,t} represents the low-light events recorded between the low light image I_{t-1} at time t-1 and the low light image I_t at time t.
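The patent does not fix a tensor representation for E_{t-1,t}; a common choice for event-camera data is to bin the events that fall between two frame timestamps into a small number of temporal channels. The following Python sketch illustrates only that assumption: the function names, the five-bin voxel grid, and the (timestamp, x, y, polarity) event layout are illustrative and not taken from the patent.

```python
import numpy as np

def events_to_voxel(events, t_start, t_end, H, W, num_bins=5):
    """Bin events recorded in [t_start, t_end) into a (num_bins, H, W) grid.

    `events` is assumed to be an (N, 4) array of (timestamp, x, y, polarity)
    with polarity in {-1, +1}. This layout is an assumption; the patent only
    states that E_{t-1,t} collects the events between consecutive frames.
    """
    voxel = np.zeros((num_bins, H, W), dtype=np.float32)
    if len(events) == 0:
        return voxel
    ts = events[:, 0]
    xs = events[:, 1].astype(int)
    ys = events[:, 2].astype(int)
    ps = events[:, 3].astype(np.float32)
    # Normalize timestamps into [0, num_bins) and accumulate signed counts.
    norm_t = (ts - t_start) / max(t_end - t_start, 1e-9) * num_bins
    bins = np.clip(norm_t.astype(int), 0, num_bins - 1)
    np.add.at(voxel, (bins, ys, xs), ps)
    return voxel

def slice_event_stream(events, frame_ts, H, W):
    """Slice a full event stream into per-interval tensors E_{t-1,t},
    given frame timestamps `frame_ts` of length T+1 (including time 0)."""
    chunks = []
    for t in range(1, len(frame_ts)):
        in_interval = (events[:, 0] >= frame_ts[t - 1]) & (events[:, 0] < frame_ts[t])
        chunks.append(events_to_voxel(events[in_interval],
                                      frame_ts[t - 1], frame_ts[t], H, W))
    return chunks  # E = [E_{0,1}, E_{1,2}, ..., E_{T-1,T}]
```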
Step 2, constructing a low-light video target segmentation neural network, as shown in fig. 1, comprising: a multi-modal encoder and an event-guided memory matching module:
Step 2.1, the multi-modal encoder comprises: an image encoder, an event encoder and an adaptive cross-modal fusion module;
Step 2.1.1, the image encoder consists of m residual modules and n downsampling layers; in this example, m = 4, n = 3.
The low-light image I_t at time t is input into the image encoder for feature extraction to obtain the multi-scale image feature F_t^Img of I_t.
Step 2.1.2, the event encoder consists of m residual modules and n downsampling layers; in this example, m = 4, n = 3.
The low-light events E_{t-1,t} from time t-1 to time t are input into the event encoder for feature extraction to obtain the multi-scale event feature F_t^Evt of E_{t-1,t}.
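As a rough illustration of the encoders in steps 2.1.1 and 2.1.2 (m = 4 residual modules, n = 3 downsampling layers), the PyTorch sketch below builds one multi-scale encoder template that can be instantiated for the image branch (3 input channels) and the event branch (one channel per event bin). The channel widths, the use of stride-2 convolutions for downsampling, and the batch-norm/ReLU layout are assumptions; the patent only fixes m and n.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return torch.relu(x + self.body(x))

class MultiScaleEncoder(nn.Module):
    """Template for the image encoder (in_ch=3) and the event encoder
    (in_ch=num_bins): m=4 residual modules interleaved with n=3 stride-2
    downsampling convolutions, returning one feature per scale."""
    def __init__(self, in_ch, base_ch=64):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, base_ch, 3, padding=1)
        chs = [base_ch, base_ch * 2, base_ch * 4, base_ch * 8]
        self.res = nn.ModuleList([ResidualBlock(c) for c in chs])              # m = 4
        self.down = nn.ModuleList([nn.Conv2d(chs[i], chs[i + 1], 3,
                                             stride=2, padding=1)
                                   for i in range(3)])                          # n = 3

    def forward(self, x):
        feats = []
        x = self.stem(x)
        for i, res in enumerate(self.res):
            x = res(x)
            feats.append(x)                 # collect a feature at every scale
            if i < len(self.down):
                x = self.down[i](x)
        return feats                        # multi-scale F_t^Img or F_t^Evt
```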
Step 2.1.3, as shown in fig. 2, the adaptive cross-modal fusion module concatenates the multi-scale image feature F_t^Img and the multi-scale event feature F_t^Evt along the channel dimension, then applies convolution and average pooling to obtain the multi-scale mixed feature F_t^Cat at time t;
F_t^Cat is dot-multiplied with the multi-scale image feature F_t^Img and the multi-scale event feature F_t^Evt respectively to obtain the filtered multi-scale image feature and the filtered multi-scale event feature at time t;
The filtered multi-scale event feature undergoes a channel attention operation followed by a spatial attention operation to obtain the multi-scale event attention feature at time t; the multi-scale event attention feature is summed with the filtered multi-scale image feature to obtain the multi-scale image feature fused with event information at time t;
The event-fused multi-scale image feature and the filtered multi-scale event feature are each convolved and then summed to obtain the mixed feature F_t at time t.
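The following single-scale PyTorch sketch mirrors the data flow of step 2.1.3: concatenate F_t^Img and F_t^Evt, derive F_t^Cat by convolution and average pooling, filter both modalities with it, run channel and spatial attention on the filtered event feature, add the result to the filtered image feature, and fuse by convolution and summation. The sigmoid gating and the concrete attention blocks (a squeeze-and-excitation-style channel attention and a 7×7 spatial attention) are assumptions filling in details the patent leaves open.

```python
import torch
import torch.nn as nn

class AdaptiveCrossModalFusion(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # Convolution + average pooling producing the mixed gate F_t^Cat.
        self.mix = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1),
                                 nn.AdaptiveAvgPool2d(1), nn.Sigmoid())
        self.channel_att = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                         nn.Conv2d(ch, ch // 4, 1), nn.ReLU(inplace=True),
                                         nn.Conv2d(ch // 4, ch, 1), nn.Sigmoid())
        self.spatial_att = nn.Sequential(nn.Conv2d(1, 1, 7, padding=3), nn.Sigmoid())
        self.out_img = nn.Conv2d(ch, ch, 3, padding=1)
        self.out_evt = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, f_img, f_evt):
        gate = self.mix(torch.cat([f_img, f_evt], dim=1))    # F_t^Cat as a per-channel gate
        img_f = f_img * gate                                  # filtered image feature
        evt_f = f_evt * gate                                  # filtered event feature
        evt_a = evt_f * self.channel_att(evt_f)               # channel attention
        evt_a = evt_a * self.spatial_att(evt_a.mean(dim=1, keepdim=True))  # spatial attention
        img_fused = img_f + evt_a                             # image feature fused with events
        return self.out_img(img_fused) + self.out_evt(evt_f)  # mixed feature F_t
```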
Step 2.2, the event-guided memory matching module comprises: a memory storage module, an event guidance module, an attention matching module and a mask decoder;
Step 2.2.1, the memory storage module applies a linear transformation to the mixed feature F_t at time t to obtain the key K_t and the value V_t at time t; the mask at time t is denoted Mask_t; when t = 1, Mask_1 is initialized with the given first-frame target mask.
Step 2.2.2, as shown in fig. 3, the event guidance module concatenates Mask_t with F_t^Evt along the channel dimension and extracts multi-scale information with convolution kernels of different sizes and pooling to obtain the filter signal SE_t at time t; SE_t is dot-multiplied with F_t^Evt and with Mask_t respectively and the results are summed, finally outputting the strengthened guide signal G_t at time t.
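A minimal sketch of the event guidance module of step 2.2.2, assuming the mask estimate enters as a one-channel map at feature resolution and that "different convolution kernel sizes and pooling" means parallel 3×3, 5×5 and pooled 1×1 branches. These branch choices, the mask-embedding convolution, and the sigmoid used to form SE_t are assumptions rather than details stated in the patent.

```python
import torch
import torch.nn as nn

class EventGuidance(nn.Module):
    """Produces the guide signal G_t from the event feature F_t^Evt and the
    current mask estimate (assumed to be a one-channel map already resized
    to the feature resolution)."""
    def __init__(self, ch):
        super().__init__()
        self.embed_mask = nn.Conv2d(1, ch, 3, padding=1)      # lift mask to feature channels
        self.k3 = nn.Conv2d(2 * ch, ch, 3, padding=1)          # multi-scale branches:
        self.k5 = nn.Conv2d(2 * ch, ch, 5, padding=2)          # different kernel sizes
        self.pool = nn.Sequential(nn.AvgPool2d(3, stride=1, padding=1),
                                  nn.Conv2d(2 * ch, ch, 1))    # pooling branch
        self.squash = nn.Sigmoid()

    def forward(self, f_evt, mask):
        m = self.embed_mask(mask)                              # mask feature
        x = torch.cat([m, f_evt], dim=1)                       # channel-wise combination
        se = self.squash(self.k3(x) + self.k5(x) + self.pool(x))   # filter signal SE_t
        return se * f_evt + se * m                             # strengthened guide signal G_t
```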
Step 2.2.3, the attention matching module obtains the filtered key K_t' at time t using equation (1):
K_t' = K_t · G_t   (1)
The attention matrix A_{t+1} at time t+1 is obtained using equation (2):
A_{t+1} = Softmax(Q_{t+1} · (K_t')^Tr / √d_k)   (2)
In equation (2), Q_{t+1} represents the query vector obtained from F_{t+1} after a linear transformation, d_k represents the channel dimension of Q_{t+1} and K_t', Softmax represents the activation function, and Tr represents the matrix transpose.
The matching result R_{t+1} at time t+1 is obtained using equation (3):
R_{t+1} = A_{t+1}(G_t + V_t)   (3)
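Equations (1)-(3) can be read as guided scaled dot-product attention over spatial positions. The sketch below assumes all inputs share a (B, C, H, W) layout, that the spatial dimensions are flattened for the matrix products, and that d_k equals the channel count C; those layout choices are assumptions.

```python
import torch

def attention_matching(K_t, V_t, G_t, Q_t1):
    """Equations (1)-(3). All inputs are (B, C, H, W); attention is computed
    over flattened spatial positions."""
    B, C, H, W = K_t.shape
    K_f = (K_t * G_t).flatten(2)                      # (1) filtered key K_t', (B, C, HW)
    Q_f = Q_t1.flatten(2)                             # query from F_{t+1},    (B, C, HW)
    att = torch.softmax(Q_f.transpose(1, 2) @ K_f / C ** 0.5, dim=-1)   # (2) A_{t+1}, (B, HW, HW)
    mem = (G_t + V_t).flatten(2)                      # memory enriched by the guide signal
    R = att @ mem.transpose(1, 2)                     # (3) R_{t+1}, (B, HW, C)
    return R.transpose(1, 2).reshape(B, C, H, W)
```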
Step 2.2.4, the mask decoder consists of convolutional and upsampling layers;
The matching result R_{t+1} and the mixed feature F_{t+1} are concatenated along the channel dimension and input into the mask decoder, which outputs the predicted target mask Mask_{t+1} at time t+1.
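A minimal sketch of the mask decoder of step 2.2.4, assuming three 2× upsampling stages to undo the n = 3 downsampling steps of the encoder, halving channels at each stage and ending with a one-channel sigmoid mask; the stage count and channel schedule are assumptions.

```python
import torch
import torch.nn as nn

class MaskDecoder(nn.Module):
    """Concatenates R_{t+1} with F_{t+1} and predicts a one-channel mask."""
    def __init__(self, ch):
        super().__init__()
        layers, c = [nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True)], ch
        for _ in range(3):                                  # undo n = 3 downsamplings
            layers += [nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
                       nn.Conv2d(c, c // 2, 3, padding=1), nn.ReLU(inplace=True)]
            c //= 2
        layers.append(nn.Conv2d(c, 1, 3, padding=1))        # mask logits
        self.net = nn.Sequential(*layers)

    def forward(self, r_t1, f_t1):
        return torch.sigmoid(self.net(torch.cat([r_t1, f_t1], dim=1)))   # Mask_{t+1}
```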
step 3, training of a low-light video target segmentation neural network:
Step 3.1, constructing the cross-entropy loss function L_ce^t at time t using equation (4), which measures the per-pixel cross entropy between the predicted mask Mask_t and the ground-truth mask y_t;
Step 3.2, constructing the soft Jaccard loss function L_jac^t at time t using equation (5), which measures the overlap between the predicted mask Mask_t and the ground-truth mask y_t;
Step 3.3, constructing the total loss function L at time t using equation (6):
L = α·L_ce^t + β·L_jac^t   (6)
In equation (6), α and β are two weighting coefficients; in this example, α and β are both 0.5.
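The loss of step 3 can be sketched as follows, with α = β = 0.5 as in this embodiment. Equations (4) and (5) are not reproduced in the text above, so the per-pixel binary cross entropy and the soft-Jaccard form used here are standard choices for illustration, not quotations of the patent's exact formulas.

```python
import torch

def segmentation_loss(pred, gt, alpha=0.5, beta=0.5, eps=1e-6):
    """Total loss of equation (6): alpha * cross-entropy + beta * soft Jaccard.
    `pred` is the predicted mask in (0, 1); `gt` is the binary ground truth."""
    pred = pred.clamp(eps, 1 - eps)
    ce = -(gt * pred.log() + (1 - gt) * (1 - pred).log()).mean()        # cf. equation (4)
    inter = (pred * gt).sum(dim=(-2, -1))
    union = (pred + gt - pred * gt).sum(dim=(-2, -1))
    jaccard = (1 - (inter + eps) / (union + eps)).mean()                # cf. equation (5)
    return alpha * ce + beta * jaccard                                  # equation (6)
```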
Step 4, training the low-light video target segmentation neural network using gradient descent and computing the total loss function L to update the network parameters; training stops when the number of training iterations reaches the set number or the total loss function L converges, yielding the optimal low-light video target segmentation neural network, which processes low-light video images to obtain the corresponding predicted masks.
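A minimal training-loop sketch for step 4, reusing segmentation_loss from the previous sketch. The data-loader contract (clips of T frames with events and masks, first-frame mask given to the model), the model's forward interface, the Adam optimizer, and the learning rate are assumptions; the patent only specifies gradient descent with a fixed iteration budget or loss convergence as the stopping rule.

```python
import torch

def train(model, loader, epochs=100, lr=1e-4, device='cuda'):
    """Gradient-descent training loop. `loader` is assumed to yield
    (frames, events, masks) with frames of shape (B, T, 3, H, W)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.to(device).train()
    for epoch in range(epochs):
        for frames, events, masks in loader:
            frames, events, masks = frames.to(device), events.to(device), masks.to(device)
            # The model is assumed to return predicted masks for frames 2..T,
            # given the first-frame mask as initialization.
            preds = model(frames, events, first_mask=masks[:, 0])
            loss = sum(segmentation_loss(p, masks[:, t + 1])
                       for t, p in enumerate(preds)) / len(preds)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Stop when the iteration budget is reached or the loss converges
        # (convergence check omitted in this sketch).
    return model
```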
In this embodiment, an electronic device includes a memory for storing a program supporting the processor to execute the above method, and a processor configured to execute the program stored in the memory.
In this embodiment, a computer-readable storage medium stores a computer program that, when executed by a processor, performs the steps of the method described above.

Claims (7)

1. The low-light video target segmentation method based on event signal driving is characterized by comprising the following steps of:
Step 1, acquiring a video image set I under a low light scene, a target mask set Y and a corresponding event sequence E:
Step 2, constructing a low-light video target segmentation neural network, which comprises a multi-modal encoder and an event-guided memory matching module:
Step 2.1, the multi-modal encoder is used to extract features from I and E to obtain mixed features;
Step 2.2, the event-guided memory matching module is used to process the mixed features to obtain a predicted target mask;
Step 3, constructing a total loss function based on the predicted target mask and the target mask set Y;
Step 4, training the low-light video target segmentation neural network using gradient descent and computing the total loss function L to update the network parameters; training stops when the number of training iterations reaches the set number or the total loss function L converges, yielding the optimal low-light video target segmentation neural network, which processes low-light video images to obtain the corresponding predicted masks.
2. The method for splitting a low-light video object based on event signal driving according to claim 1, wherein the step 1 is performed as follows:
Step 1.1.1, acquiring a video image set I = {I_1, I_2, ..., I_t, ..., I_T} of a low light scene and a video image set N = {N_1, N_2, ..., N_t, ..., N_T} of the corresponding normal light scene, wherein I_t represents the low light image at time t and N_t represents the normal light image at time t; T represents the number of frames;
Step 1.1.2, labeling the target masks of the video image set of the normal light scene with a labeling tool to obtain the target mask set Y = {y_1, y_2, ..., y_t, ..., y_T} shared by the video image sets of the low light and normal light scenes, wherein y_t represents the target mask of the low light image I_t and the normal light image N_t at time t;
Step 1.1.3, obtaining the event sequence of the video image set I of the low light scene, denoted E = {E_{0,1}, E_{1,2}, ..., E_{t-1,t}, ..., E_{T-1,T}}, where E_{t-1,t} represents the low-light events recorded between the low light image I_{t-1} at time t-1 and the low light image I_t at time t.
3. The method for event signal driven low-light video object segmentation according to claim 2, wherein the multi-modal encoder in step 2.1 comprises: an image encoder, an event encoder and an adaptive cross-modal fusion module;
Step 2.1.1, the image encoder consists of m residual modules and n downsampling layers;
The low-light image I_t at time t is input into the image encoder for feature extraction to obtain the multi-scale image feature F_t^Img of I_t;
Step 2.1.2, the event encoder consists of m residual modules and n downsampling layers;
The low-light events E_{t-1,t} from time t-1 to time t are input into the event encoder for feature extraction to obtain the multi-scale event feature F_t^Evt of E_{t-1,t};
Step 2.1.3, the adaptive cross-modal fusion module concatenates the multi-scale image feature F_t^Img and the multi-scale event feature F_t^Evt along the channel dimension, then applies convolution and average pooling to obtain the multi-scale mixed feature F_t^Cat at time t;
F_t^Cat is dot-multiplied with the multi-scale image feature F_t^Img and the multi-scale event feature F_t^Evt respectively to obtain the filtered multi-scale image feature and the filtered multi-scale event feature at time t;
The filtered multi-scale event feature undergoes a channel attention operation followed by a spatial attention operation to obtain the multi-scale event attention feature at time t; the multi-scale event attention feature is summed with the filtered multi-scale image feature to obtain the multi-scale image feature fused with event information at time t;
The event-fused multi-scale image feature and the filtered multi-scale event feature are each convolved and then summed to obtain the mixed feature F_t at time t.
4. The method for event signal driven low-light video object segmentation according to claim 3, wherein the event-guided memory matching module in step 2.2 comprises: a memory storage module, an event guidance module, an attention matching module and a mask decoder;
Step 2.2.1, the memory storage module applies a linear transformation to the mixed feature F_t at time t to obtain the key K_t and the value V_t at time t; the mask at time t is denoted Mask_t; when t = 1, Mask_1 is initialized with the given first-frame target mask;
Step 2.2.2, the event guidance module concatenates Mask_t with F_t^Evt along the channel dimension and extracts multi-scale information with convolution kernels of different sizes and pooling to obtain the filter signal SE_t at time t; SE_t is dot-multiplied with F_t^Evt and with Mask_t respectively and the results are summed, finally outputting the strengthened guide signal G_t at time t;
Step 2.2.3, the attention matching module obtains the filtered key K_t' at time t using equation (1):
K_t' = K_t · G_t   (1)
The attention matrix A_{t+1} at time t+1 is obtained using equation (2):
A_{t+1} = Softmax(Q_{t+1} · (K_t')^Tr / √d_k)   (2)
In equation (2), Q_{t+1} represents the query obtained from F_{t+1} after a linear transformation, d_k represents the channel dimension of Q_{t+1} and K_t', Softmax represents the activation function, and Tr represents the matrix transpose;
The matching result R_{t+1} at time t+1 is obtained using equation (3):
R_{t+1} = A_{t+1}(G_t + V_t)   (3)
Step 2.2.4, the mask decoder consists of convolutional and upsampling layers;
The matching result R_{t+1} and the mixed feature F_{t+1} are concatenated along the channel dimension and input into the mask decoder, which outputs the predicted target mask Mask_{t+1} at time t+1.
5. The method for splitting a low-light video object based on event signal driving according to claim 4, wherein said step 3 is performed as follows:
Step 3.1, constructing the cross-entropy loss function L_ce^t at time t using equation (4), which measures the per-pixel cross entropy between the predicted mask Mask_t and the ground-truth mask y_t;
Step 3.2, constructing the soft Jaccard loss function L_jac^t at time t using equation (5), which measures the overlap between the predicted mask Mask_t and the ground-truth mask y_t;
Step 3.3, constructing the total loss function L at time t using equation (6):
L = α·L_ce^t + β·L_jac^t   (6)
In equation (6), α and β are two weighting coefficients.
6. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program that supports the processor to perform the low-light video object segmentation method of any one of claims 1-5, the processor being configured to execute the program stored in the memory.
7. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor performs the steps of the low-light video object segmentation method according to any of claims 1-5.
CN202410215980.0A 2024-02-27 2024-02-27 Low-light video target segmentation method based on event signal driving Pending CN118052985A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410215980.0A CN118052985A (en) 2024-02-27 2024-02-27 Low-light video target segmentation method based on event signal driving


Publications (1)

Publication Number Publication Date
CN118052985A true CN118052985A (en) 2024-05-17

Family

ID=91044365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410215980.0A Pending CN118052985A (en) 2024-02-27 2024-02-27 Low-light video target segmentation method based on event signal driving

Country Status (1)

Country Link
CN (1) CN118052985A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination