CN118052985A - Low-light video target segmentation method based on event signal driving - Google Patents
Low-light video target segmentation method based on event signal driving
- Publication number: CN118052985A
- Application number: CN202410215980.0A
- Authority: CN (China)
- Legal status: Pending
Abstract
The invention discloses a low-light video target segmentation method driven by event signals, comprising the following steps: 1. prepare video data in a low-light scene, the target masks, and the corresponding event sequence; 2. construct a low-light video target segmentation model; 3. train the constructed low-light video target segmentation neural network offline; 4. use the trained model to predict masks in the low-light scene, thereby achieving low-light video target segmentation. By using event data as a driving signal, the method improves video target segmentation in low-light scenes and generates accurate target masks.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a low-light video target segmentation method based on event signal driving.
Background
Video object segmentation is a central research topic in computer vision; its main task is to accurately identify and track one or more target objects in a video sequence. The technology has a very wide range of applications, spanning environment perception for autonomous driving systems, video surveillance for urban safety, and video editing software offering innovative interaction modes. With the rapid development and adoption of deep learning, video object segmentation has made a qualitative leap: when processing high-definition video input, existing methods can both segment objects with high precision and track their motion trajectories stably.
However, despite these achievements under standard lighting conditions, video object segmentation still faces significant challenges in low light. In such environments the video frames suffer serious quality degradation, including a marked increase in noise, massive loss of scene detail, and severe color distortion, all of which directly affect the accuracy of segmentation and the stability of tracking. More importantly, most current video object segmentation techniques rely on clear, high-quality video input, a condition that is difficult to meet in practical scenarios such as night-time surveillance or low-illumination autonomous driving. This dependence on high-quality input greatly limits the practical potential of video object segmentation in low-light environments.
Disclosure of Invention
To overcome these shortcomings of the prior art, the invention provides a low-light video target segmentation method driven by event signals. By exploiting the high dynamic range of event data and its ability to capture high-speed motion, the method aims to improve the robustness of video target segmentation in low light, the segmentation of moving objects, and the overall segmentation quality in low-light scenes.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
The invention discloses a low-light video target segmentation method based on event signal driving, which is characterized by comprising the following steps:
Step 1, acquiring a video image set I in a low-light scene, a target mask set Y, and the corresponding event sequence E;
Step 2, constructing a low-light video target segmentation neural network comprising a multi-modal encoder and an event-guided memory matching module:
Step 2.1, the multi-modal encoder extracts features from I and E to obtain mixed features;
Step 2.2, the event-guided memory matching module processes the mixed features to obtain a predicted target mask;
Step 3, constructing a total loss function based on the predicted target mask and the target mask set Y;
Step 4, training the low-light video target segmentation neural network by a gradient descent method, computing the total loss function L to update the network parameters, and stopping when the number of training iterations reaches a set value or L converges, thereby obtaining the optimal low-light video target segmentation neural network for processing low-light video images into the corresponding predicted masks.
The low-light video target segmentation method based on event signal driving is also characterized in that the step 1 is performed according to the following steps:
Step 1.1.1, acquiring a video image set I = {I_1, I_2, …, I_t, …, I_T} of the low-light scene and a video image set N = {N_1, N_2, …, N_t, …, N_T} of the corresponding normal-light scene, where I_t denotes the low-light image at time t, N_t denotes the normal-light image at time t, and T is the number of frames;
Step 1.1.2, labeling the target masks of the normal-light video image set with a labeling tool to obtain the target mask set Y = {y_1, y_2, …, y_t, …, y_T} shared by the low-light and normal-light video image sets, where y_t denotes the target mask of the low-light image I_t and the normal-light image N_t at time t;
Step 1.1.3, obtaining the event sequence of the low-light video image set I, denoted E = {E_{0,1}, E_{1,2}, …, E_{t-1,t}, …, E_{T-1,T}}, where E_{t-1,t} denotes the low-light events between the low-light image I_{t-1} at time t-1 and the low-light image I_t at time t.
The multi-modal encoder in step 2.1 comprises an image encoder, an event encoder, and an adaptive cross-modal fusion module;
Step 2.1.1, the image encoder consists of m residual modules and n downsampling modules;
the low-light image I_t at time t is fed into the image encoder for feature extraction, giving the multi-scale image feature F_t^Img of I_t;
Step 2.1.2, the event encoder consists of m residual modules and n downsampling modules;
the low-light events E_{t-1,t} from time t-1 to time t are fed into the event encoder for feature extraction, giving the multi-scale event feature F_t^Evt of E_{t-1,t};
Step 2.1.3, the adaptive cross-modal fusion module concatenates the multi-scale image feature F_t^Img and the multi-scale event feature F_t^Evt along the channel dimension, then applies convolution and average pooling to obtain the multi-scale mixed feature F_t^Cat at time t;
F_t^Cat is multiplied element-wise with F_t^Img and with F_t^Evt, yielding the screened multi-scale image feature F̂_t^Img and the screened multi-scale event feature F̂_t^Evt at time t;
the screened multi-scale event feature F̂_t^Evt undergoes a channel attention operation followed by a spatial attention operation, giving the multi-scale event attention feature Â_t^Evt at time t; Â_t^Evt is summed with the screened multi-scale image feature F̂_t^Img to obtain the multi-scale image feature F̃_t^Img fusing the event information at time t;
F̃_t^Img and the screened multi-scale event feature F̂_t^Evt are convolved and summed to obtain the mixed feature F_t at time t.
The event-guided memory matching module in step 2.2 comprises a memory storage module, an event guidance module, an attention matching module, and a mask decoder;
Step 2.2.1, the memory storage module applies linear transformations to the mixed feature F_t at time t to obtain the key K_t and the value V_t at time t; the mask at time t is denoted Mask_t, initialized at t = 1;
Step 2.2.2, the event guidance module concatenates Mask_t with F_t^Evt along the channel dimension and extracts multi-scale information through convolutions of different kernel sizes and pooling to obtain the filter signal SE_t at time t; SE_t is point-multiplied with F_t^Evt and with Mask_t respectively and the results are summed, finally outputting the strengthened guide signal G_t at time t;
Step 2.2.3, the attention matching module obtains the filtered key K_t′ at time t using formula (1):
K_t′ = K_t · G_t (1)
the attention matrix A_{t+1} at time t+1 is obtained using formula (2):
A_{t+1} = Softmax(Q_{t+1} · (K_t′)^Tr / √d_k) (2)
In formula (2), Q_{t+1} denotes the query obtained from F_{t+1} by linear transformation, d_k denotes the channel dimension of Q_{t+1} and K_t′, Softmax denotes the activation function, and Tr denotes matrix transposition;
the matching result R_{t+1} at time t+1 is obtained using formula (3):
R_{t+1} = A_{t+1}(G_t + V_t) (3)
Step 2.2.4, the mask decoder consists of convolution layers and upsampling layers;
the matching result R_{t+1} and the mixed feature F_{t+1} are concatenated along the channel dimension and fed into the mask decoder, which outputs the predicted target mask Mask_{t+1} at time t+1.
Step 3 is carried out as follows:
Step 3.1, constructing the cross-entropy loss function L_ce^t at time t using formula (4):
L_ce^t = −Σ_p [ y_t(p) log Mask_t(p) + (1 − y_t(p)) log(1 − Mask_t(p)) ] (4)
Step 3.2, constructing the Soft Jaccard loss function L_jac^t at time t using formula (5):
L_jac^t = 1 − Σ_p (y_t(p) · Mask_t(p)) / Σ_p (y_t(p) + Mask_t(p) − y_t(p) · Mask_t(p)) (5)
Step 3.3, constructing the total loss function L at time t using formula (6):
L = α · L_ce^t + β · L_jac^t (6)
In formula (6), α and β are two weighting coefficients, and p indexes the pixels.
The electronic device of the present invention includes a memory and a processor, wherein the memory is configured to store a program for supporting the processor to execute the low-light video object segmentation method, and the processor is configured to execute the program stored in the memory.
The invention also relates to a computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, performs the steps of the low-light video object segmentation method.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention proposes an innovative event-signal-driven video object segmentation network that combines event signal data with conventional video segmentation techniques, applying the unique advantages of event signals to the low-light video object segmentation task. The method markedly improves the robustness of video object segmentation in low-illumination environments, and shows clear advantages over mainstream video object segmentation techniques in the segmentation accuracy and stability of fast-moving objects.
2. The invention develops an adaptive cross-modal fusion module. The module adopts a multi-scale fusion strategy that strengthens the information fusion of image frames and event data and exploits the illumination robustness of event data under low-light conditions, significantly improving video object segmentation performance across illumination conditions.
3. The method creatively fuses the event signal with the target mask feature to generate a signal that guides the segmentation network, effectively improving mask matching when the network processes low-illumination video sequences. This addresses the segmentation degradation caused by low matching accuracy under low illumination and enhances the system's applicability in complex environments.
4. The invention trains in a supervised manner and embeds event information deeply into the video target segmentation network, improving the quality of the output masks.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a block diagram of an adaptive cross-modality fusion module of the present invention;
FIG. 3 is a diagram illustrating an event guided memory matching module according to the present invention.
Detailed Description
In this embodiment, facing the challenges posed by low-illumination environments, a low-light video target segmentation method driven by event signals is provided. It builds an adaptive cross-modal fusion module and an event-guided memory matching module by exploiting the high dynamic range of event data and its capture of high-speed motion. The method adapts to video quality degradation under low illumination and reduces the dependence on high-quality video input, thereby improving target segmentation and tracking under such conditions, expanding the application range of video target segmentation technology, and improving its practicality and reliability in complex environments. As shown in fig. 1, the method comprises the following steps:
Step 1, obtaining video data in a low-light scene, the target masks, and the corresponding event sequence:
Step 1.1.1, acquiring a video image set I = {I_1, I_2, …, I_t, …, I_T} of the low-light scene and a video image set N = {N_1, N_2, …, N_t, …, N_T} of the corresponding normal-light scene, where I_t denotes the low-light image at time t, N_t denotes the normal-light image at time t, and T is the number of frames; in this example, T = 5 frames are used during neural network training.
Step 1.1.2, labeling the target masks of the normal-light video image set with a labeling tool to obtain the target mask set Y = {y_1, y_2, …, y_t, …, y_T} shared by the low-light and normal-light video image sets, where y_t denotes the target mask of the low-light image I_t and the normal-light image N_t at time t.
Step 1.1.3, obtaining the event sequence of the low-light video image set I, denoted E = {E_{0,1}, E_{1,2}, …, E_{t-1,t}, …, E_{T-1,T}}, where E_{t-1,t} denotes the low-light events between the low-light image I_{t-1} at time t-1 and the low-light image I_t at time t.
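For concreteness, the sketch below shows one common way to turn an event packet E_{t-1,t} into a dense tensor that an event encoder can consume. The patent does not fix a representation; the voxel-grid form, the bin count, and the sensor resolution used here are assumptions.

```python
# Hedged sketch: accumulate an event packet E_{t-1,t} into a voxel grid.
# The voxel-grid representation, bins=5, and the 480x640 resolution are
# assumptions, not specified by the patent.
import numpy as np

def events_to_voxel_grid(xs, ys, ts, ps, bins=5, height=480, width=640):
    """Accumulate events (x, y, timestamp, polarity) into a (bins, H, W) grid."""
    grid = np.zeros((bins, height, width), dtype=np.float32)
    if len(ts) == 0:
        return grid
    # Normalize this packet's timestamps to [0, bins - 1].
    t_norm = (np.asarray(ts) - ts[0]) / max(ts[-1] - ts[0], 1e-9) * (bins - 1)
    for x, y, t, p in zip(xs, ys, t_norm, ps):
        b = int(t)  # temporal bin index
        grid[b, int(y), int(x)] += 1.0 if p > 0 else -1.0
    return grid
```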
Step 2, constructing the low-light video target segmentation neural network, which, as shown in fig. 1, comprises a multi-modal encoder and an event-guided memory matching module:
Step 2.1, the multi-modal encoder comprises an image encoder, an event encoder, and an adaptive cross-modal fusion module;
Step 2.1.1, the image encoder consists of m residual modules and n downsampling modules; in this example, m = 4 and n = 3.
The low-light image I_t at time t is fed into the image encoder for feature extraction, giving the multi-scale image feature F_t^Img of I_t.
Step 2.1.2, the event encoder consists of m residual modules and n downsampling modules; in this example, m = 4 and n = 3.
The low-light events E_{t-1,t} from time t-1 to time t are fed into the event encoder for feature extraction, giving the multi-scale event feature F_t^Evt of E_{t-1,t}.
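A minimal PyTorch sketch of the twin encoders of steps 2.1.1 and 2.1.2, with m = 4 residual modules and n = 3 downsampling modules as in this embodiment. The channel widths, the placement of downsampling between residual stages, and returning only the final scale (the patent uses multi-scale features from every stage) are assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return torch.relu(x + self.body(x))

class Encoder(nn.Module):
    """m residual modules interleaved with n stride-2 downsampling modules."""
    def __init__(self, in_ch, base=64, m=4, n=3):
        super().__init__()
        layers, ch = [nn.Conv2d(in_ch, base, 3, padding=1)], base
        for i in range(m):
            layers.append(ResidualBlock(ch))
            if i < n:  # assumed placement: downsample after the first n stages
                layers.append(nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1))
                ch *= 2
        self.stages = nn.Sequential(*layers)
    def forward(self, x):
        return self.stages(x)

image_encoder = Encoder(in_ch=3)  # RGB low-light frame I_t
event_encoder = Encoder(in_ch=5)  # e.g., a 5-bin event voxel grid for E_{t-1,t}
```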
Step 2.1.3, as shown in fig. 2, the adaptive cross-modal fusion module concatenates the multi-scale image feature F_t^Img and the multi-scale event feature F_t^Evt along the channel dimension, then applies convolution and average pooling to obtain the multi-scale mixed feature F_t^Cat at time t;
F_t^Cat is multiplied element-wise with F_t^Img and with F_t^Evt, yielding the screened multi-scale image feature F̂_t^Img and the screened multi-scale event feature F̂_t^Evt at time t;
the screened multi-scale event feature F̂_t^Evt undergoes a channel attention operation followed by a spatial attention operation, giving the multi-scale event attention feature Â_t^Evt at time t; Â_t^Evt is summed with the screened multi-scale image feature F̂_t^Img to obtain the multi-scale image feature F̃_t^Img fusing the event information at time t;
F̃_t^Img and the screened multi-scale event feature F̂_t^Evt are convolved and summed to obtain the mixed feature F_t at time t.
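A hedged single-scale sketch of the adaptive cross-modal fusion module of step 2.1.3 (the patent applies it per scale). The exact channel/spatial attention forms and the final "convolve and sum" combination are assumptions; squeeze-and-excitation-style channel attention and a 7×7 spatial attention are used here.

```python
import torch
import torch.nn as nn

class AdaptiveCrossModalFusion(nn.Module):
    """Sketch of step 2.1.3 at a single scale; attention forms are assumed."""
    def __init__(self, ch):
        super().__init__()
        self.mix = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1),
                                 nn.AvgPool2d(3, stride=1, padding=1))
        # Channel attention (squeeze-and-excitation style) -- assumed form.
        self.ca = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        # Spatial attention -- assumed form.
        self.sa = nn.Sequential(nn.Conv2d(ch, 1, 7, padding=3), nn.Sigmoid())
        self.out = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, f_img, f_evt):
        f_cat = self.mix(torch.cat([f_img, f_evt], dim=1))  # F_t^Cat
        img_s = f_cat * f_img                               # screened image feature
        evt_s = f_cat * f_evt                               # screened event feature
        evt_att = evt_s * self.ca(evt_s)                    # channel attention
        evt_att = evt_att * self.sa(evt_att)                # spatial attention
        fused = evt_att + img_s                             # image feature fused with events
        return self.out(fused) + evt_s                      # mixed feature F_t (assumed combination)
```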
Step 2.2, the event-guided memory matching module comprises a memory storage module, an event guidance module, an attention matching module, and a mask decoder;
Step 2.2.1, the memory storage module applies linear transformations to the mixed feature F_t at time t to obtain the key K_t and the value V_t at time t; the mask at time t is denoted Mask_t, initialized at t = 1;
Step 2.2.2, as shown in fig. 3, the event guidance module concatenates Mask_t with F_t^Evt along the channel dimension and extracts multi-scale information through convolutions of different kernel sizes and pooling to obtain the filter signal SE_t at time t; SE_t is point-multiplied with F_t^Evt and with Mask_t respectively and the results are summed, finally outputting the strengthened guide signal G_t at time t.
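A sketch of the event guidance module of step 2.2.2 under stated assumptions: the specific kernel sizes (3 and 5), the pooling choice, and the sigmoid gating that produces SE_t are not fixed by the patent.

```python
import torch
import torch.nn as nn

class EventGuidance(nn.Module):
    """Sketch of step 2.2.2: derive the guide signal G_t from Mask_t and F_t^Evt.
    Kernel sizes, pooling, and the sigmoid gate are assumptions."""
    def __init__(self, ch):
        super().__init__()
        self.reduce = nn.Conv2d(ch + 1, ch, 1)  # mask concatenated as one channel
        # Multi-scale extraction with different kernel sizes plus pooling.
        self.k3 = nn.Conv2d(ch, ch, 3, padding=1)
        self.k5 = nn.Conv2d(ch, ch, 5, padding=2)
        self.pool = nn.AvgPool2d(3, stride=1, padding=1)
        self.gate = nn.Sigmoid()

    def forward(self, mask_t, f_evt):
        # mask_t: (B, 1, H, W); f_evt: (B, C, H, W)
        x = self.reduce(torch.cat([mask_t, f_evt], dim=1))
        se_t = self.gate(self.k3(x) + self.k5(x) + self.pool(x))  # filter signal SE_t
        # SE_t gates the event feature and the mask, then the two are summed.
        return se_t * f_evt + se_t * mask_t.expand_as(f_evt)      # guide signal G_t
```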
Step 2.2.3, the attention matching module obtains the filtered key K_t′ at time t using formula (1):
K_t′ = K_t · G_t (1)
the attention matrix A_{t+1} at time t+1 is obtained using formula (2):
A_{t+1} = Softmax(Q_{t+1} · (K_t′)^Tr / √d_k) (2)
In formula (2), Q_{t+1} denotes the query vector obtained from F_{t+1} by linear transformation, d_k denotes the channel dimension of the vectors Q_{t+1} and K_t′, Softmax denotes the activation function, and Tr denotes matrix transposition.
The matching result R_{t+1} at time t+1 is obtained using formula (3):
R_{t+1} = A_{t+1}(G_t + V_t) (3)
Step 2.2.4, the mask decoder consists of convolution layers and upsampling layers;
the matching result R_{t+1} and the mixed feature F_{t+1} are concatenated along the channel dimension and fed into the mask decoder, which outputs the predicted target mask Mask_{t+1} at time t+1.
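The attention matching of formulas (1)-(3) can be sketched as a single function. Flattening the spatial maps into tokens and using the channel dimension as d_k are assumptions, as is applying the guide signal by element-wise multiplication.

```python
import torch

def event_guided_matching(q_next, k_t, v_t, g_t):
    """Sketch of formulas (1)-(3); all inputs are (B, C, H, W) tensors
    produced by the modules above."""
    B, C, H, W = k_t.shape
    k_f = (k_t * g_t).flatten(2)               # (1) filtered key K_t' = K_t . G_t
    q_f = q_next.flatten(2)                    # query tokens from F_{t+1}
    # (2) A_{t+1} = Softmax(Q_{t+1} (K_t')^Tr / sqrt(d_k)), d_k = C assumed
    attn = torch.softmax(q_f.transpose(1, 2) @ k_f / C ** 0.5, dim=-1)
    gv = (g_t + v_t).flatten(2)                # (3) read out G_t + V_t
    r = (gv @ attn.transpose(1, 2)).view(B, C, H, W)
    return r                                   # matching result R_{t+1}
```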
Step 3, constructing the loss functions for training the low-light video target segmentation neural network:
Step 3.1, constructing the cross-entropy loss function L_ce^t at time t using formula (4):
L_ce^t = −Σ_p [ y_t(p) log Mask_t(p) + (1 − y_t(p)) log(1 − Mask_t(p)) ] (4)
Step 3.2, constructing the Soft Jaccard loss function L_jac^t at time t using formula (5):
L_jac^t = 1 − Σ_p (y_t(p) · Mask_t(p)) / Σ_p (y_t(p) + Mask_t(p) − y_t(p) · Mask_t(p)) (5)
Step 3.3, constructing the total loss function L at time t using formula (6):
L = α · L_ce^t + β · L_jac^t (6)
In formula (6), α and β are two weighting coefficients, and p indexes the pixels; in this example, α and β are both 0.5.
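A sketch of the training objective of formulas (4)-(6). The per-pixel binary cross-entropy and the product/sum soft-Jaccard variant are the common forms and are assumptions to the extent that the original formula images are not reproduced here.

```python
import torch

def total_loss(pred_mask, gt_mask, alpha=0.5, beta=0.5, eps=1e-6):
    """Sketch of formulas (4)-(6); pred_mask in (0, 1), gt_mask in {0, 1}."""
    p, y = pred_mask.flatten(1), gt_mask.flatten(1)
    # (4) per-pixel binary cross-entropy, averaged
    l_ce = -(y * torch.log(p + eps) + (1 - y) * torch.log(1 - p + eps)).mean()
    # (5) soft Jaccard: 1 - intersection / union, per sample then averaged
    inter = (p * y).sum(dim=1)
    union = p.sum(dim=1) + y.sum(dim=1) - inter
    l_jac = (1 - (inter + eps) / (union + eps)).mean()
    return alpha * l_ce + beta * l_jac  # (6) with alpha = beta = 0.5 here
```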
Step 4, training the low-light video target segmentation neural network by a gradient descent method, computing the total loss function L to update the network parameters, and stopping when the number of training iterations reaches a set value or L converges, thereby obtaining the optimal low-light video target segmentation neural network for processing low-light video images into the corresponding predicted masks.
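Step 4 then reduces to a standard supervised loop; a minimal sketch follows, reusing the total_loss sketch above. The optimizer choice (Adam as the gradient-descent method), the learning rate, the epoch count, the model signature, and initializing with the first-frame mask are all assumptions.

```python
import torch

def train(model, loader, epochs=100, lr=1e-4):
    """Minimal training-loop sketch for step 4; `model` is assumed to map
    (frames, events, initial mask) to the predicted masks of a clip."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # a gradient-descent method
    for epoch in range(epochs):
        for frames, events, masks in loader:           # e.g., T = 5 frames per clip
            pred = model(frames, events, masks[:, 0])  # first-frame mask as init (assumed)
            loss = total_loss(pred, masks)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # In practice, training also stops early once the total loss converges.
```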
In this embodiment, an electronic device includes a memory for storing a program supporting the processor to execute the above method, and a processor configured to execute the program stored in the memory.
In this embodiment, a computer-readable storage medium stores a computer program that, when executed by a processor, performs the steps of the method described above.
Claims (7)
1. The low-light video target segmentation method based on event signal driving is characterized by comprising the following steps of:
Step 1, acquiring a video image set I in a low-light scene, a target mask set Y, and the corresponding event sequence E;
Step 2, constructing a low-light video target segmentation neural network comprising a multi-modal encoder and an event-guided memory matching module:
Step 2.1, the multi-modal encoder extracts features from I and E to obtain mixed features;
Step 2.2, the event-guided memory matching module processes the mixed features to obtain a predicted target mask;
Step 3, constructing a total loss function based on the predicted target mask and the target mask set Y;
Step 4, training the low-light video target segmentation neural network by a gradient descent method, computing the total loss function L to update the network parameters, and stopping when the number of training iterations reaches a set value or L converges, thereby obtaining the optimal low-light video target segmentation neural network for processing low-light video images into the corresponding predicted masks.
2. The method for splitting a low-light video object based on event signal driving according to claim 1, wherein the step 1 is performed as follows:
Step 1.1.1, acquiring a video image set I = {I_1, I_2, …, I_t, …, I_T} of the low-light scene and a video image set N = {N_1, N_2, …, N_t, …, N_T} of the corresponding normal-light scene, where I_t denotes the low-light image at time t, N_t denotes the normal-light image at time t, and T is the number of frames;
Step 1.1.2, labeling the target masks of the normal-light video image set with a labeling tool to obtain the target mask set Y = {y_1, y_2, …, y_t, …, y_T} shared by the low-light and normal-light video image sets, where y_t denotes the target mask of the low-light image I_t and the normal-light image N_t at time t;
Step 1.1.3, obtaining the event sequence of the low-light video image set I, denoted E = {E_{0,1}, E_{1,2}, …, E_{t-1,t}, …, E_{T-1,T}}, where E_{t-1,t} denotes the low-light events between the low-light image I_{t-1} at time t-1 and the low-light image I_t at time t.
3. The method for event signal driven low-light video object segmentation according to claim 2, wherein the multi-modal encoder in step 2.1 comprises an image encoder, an event encoder, and an adaptive cross-modal fusion module;
Step 2.1.1, the image encoder consists of m residual modules and n downsampling modules;
the low-light image I_t at time t is fed into the image encoder for feature extraction, giving the multi-scale image feature F_t^Img of I_t;
Step 2.1.2, the event encoder consists of m residual modules and n downsampling modules;
the low-light events E_{t-1,t} from time t-1 to time t are fed into the event encoder for feature extraction, giving the multi-scale event feature F_t^Evt of E_{t-1,t};
Step 2.1.3, the adaptive cross-modal fusion module concatenates the multi-scale image feature F_t^Img and the multi-scale event feature F_t^Evt along the channel dimension, then applies convolution and average pooling to obtain the multi-scale mixed feature F_t^Cat at time t;
F_t^Cat is multiplied element-wise with F_t^Img and with F_t^Evt, yielding the screened multi-scale image feature F̂_t^Img and the screened multi-scale event feature F̂_t^Evt at time t;
the screened multi-scale event feature F̂_t^Evt undergoes a channel attention operation followed by a spatial attention operation, giving the multi-scale event attention feature Â_t^Evt at time t; Â_t^Evt is summed with the screened multi-scale image feature F̂_t^Img to obtain the multi-scale image feature F̃_t^Img fusing the event information at time t;
F̃_t^Img and the screened multi-scale event feature F̂_t^Evt are convolved and summed to obtain the mixed feature F_t at time t.
4. The method for event signal driven low-light video object segmentation according to claim 3, wherein the event-guided memory matching module in step 2.2 comprises a memory storage module, an event guidance module, an attention matching module, and a mask decoder;
Step 2.2.1, the memory storage module applies linear transformations to the mixed feature F_t at time t to obtain the key K_t and the value V_t at time t; the mask at time t is denoted Mask_t, initialized at t = 1;
Step 2.2.2, the event guidance module concatenates Mask_t with F_t^Evt along the channel dimension and extracts multi-scale information through convolutions of different kernel sizes and pooling to obtain the filter signal SE_t at time t; SE_t is point-multiplied with F_t^Evt and with Mask_t respectively and the results are summed, finally outputting the strengthened guide signal G_t at time t;
Step 2.2.3, the attention matching module obtains the filtered key K_t′ at time t using formula (1):
K_t′ = K_t · G_t (1)
the attention matrix A_{t+1} at time t+1 is obtained using formula (2):
A_{t+1} = Softmax(Q_{t+1} · (K_t′)^Tr / √d_k) (2)
In formula (2), Q_{t+1} denotes the query obtained from F_{t+1} by linear transformation, d_k denotes the channel dimension of Q_{t+1} and K_t′, Softmax denotes the activation function, and Tr denotes matrix transposition;
the matching result R_{t+1} at time t+1 is obtained using formula (3):
R_{t+1} = A_{t+1}(G_t + V_t) (3)
Step 2.2.4, the mask decoder consists of convolution layers and upsampling layers;
the matching result R_{t+1} and the mixed feature F_{t+1} are concatenated along the channel dimension and fed into the mask decoder, which outputs the predicted target mask Mask_{t+1} at time t+1.
5. The method for splitting a low-light video object based on event signal driving according to claim 4, wherein said step 3 is performed as follows:
Step 3.1, constructing the cross-entropy loss function L_ce^t at time t using formula (4):
L_ce^t = −Σ_p [ y_t(p) log Mask_t(p) + (1 − y_t(p)) log(1 − Mask_t(p)) ] (4)
Step 3.2, constructing the Soft Jaccard loss function L_jac^t at time t using formula (5):
L_jac^t = 1 − Σ_p (y_t(p) · Mask_t(p)) / Σ_p (y_t(p) + Mask_t(p) − y_t(p) · Mask_t(p)) (5)
Step 3.3, constructing the total loss function L at time t using formula (6):
L = α · L_ce^t + β · L_jac^t (6)
In formula (6), α and β are two weighting coefficients, and p indexes the pixels.
6. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program that supports the processor to perform the low-light video object segmentation method of any one of claims 1-5, the processor being configured to execute the program stored in the memory.
7. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor performs the steps of the low-light video object segmentation method according to any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410215980.0A | 2024-02-27 | 2024-02-27 | Low-light video target segmentation method based on event signal driving |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118052985A | 2024-05-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |