CN116402874A - Spacecraft depth completion method based on time-series optical images and lidar data - Google Patents
Spacecraft depth completion method based on time-series optical images and lidar data
- Publication number
- CN116402874A CN116402874A CN202310393175.2A CN202310393175A CN116402874A CN 116402874 A CN116402874 A CN 116402874A CN 202310393175 A CN202310393175 A CN 202310393175A CN 116402874 A CN116402874 A CN 116402874A
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/521 — Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
- G06N3/0442 — Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T7/11 — Region-based segmentation
- G06T7/194 — Segmentation; Edge detection involving foreground-background segmentation
- G06T7/593 — Depth or shape recovery from multiple images from stereo images
- Y02A90/10 — Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
A spacecraft depth completion method based on time-series optical images and lidar data belongs to the technical field of space-target three-dimensional structure perception. The method addresses the problem that existing single-frame depth completion methods ignore the temporal correlation of consecutive frames, which makes their time-series completion results inconsistent. The method obtains a target grayscale image sequence of a monocular visible-light camera and a sparse depth image sequence of a lidar by simulation from three-dimensional models of space targets, and generates dense ground-truth depth labels for the images; the grayscale image sequence and the sparse depth image sequence are then used as samples to train a time-series spacecraft depth completion neural network model. The trained model is embedded in a satellite-borne platform and predicts the dense depth of a space target from real-time sensing data acquired by the monocular visible-light camera and the lidar. The method is used for depth completion in spacecraft sensing.
Description
Technical Field
The invention relates to a spacecraft depth completion method based on time-series optical images and lidar data, and belongs to the technical field of space-target three-dimensional structure perception.
Background
With the rapid development of space technology, on-orbit missions have become increasingly diverse. Perceiving the three-dimensional structure of a spacecraft and acquiring its point cloud data are prerequisites for the smooth execution of many on-orbit missions, such as debris removal, on-orbit servicing, and rendezvous and docking.
Currently, mainstream three-dimensional structure perception schemes for space targets fall into three categories: stereo vision systems, time-of-flight (TOF) cameras, and the combination of a monocular camera with a lidar. A stereo vision system recovers the depth of extracted feature points by triangulation and performs poorly on smooth surfaces or objects with repetitive textures; in addition, the baseline of the binocular camera severely limits the working distance of the system, making it difficult to meet the requirements of on-orbit missions. A TOF camera calculates the exact depth of a target by measuring the time delay between emitting and receiving a laser pulse. Although it can acquire accurate depth at high density, its working distance is typically less than 10 m due to on-orbit power limits, which prevents its use in practical applications. The combination of a monocular camera and a lidar has a long working distance and recovers the dense depth of a spacecraft from the optical image and sparse ranging information. Compared with a binocular system or a TOF camera, it effectively increases the working distance of the system, reduces sensitivity to illumination conditions and materials, and is better suited for practical use in space.
Since dense depth recovery of targets based on a monocular camera and a lidar has important applications in many scenarios, a large number of deep-learning-based depth completion algorithms have been proposed in recent years to meet various depth-based application requirements. Although target depth completion from single-frame sensor data has made important progress, the data to be processed during actual on-orbit operation are sensor sequences, and existing single-frame depth completion methods ignore the temporal correlation of consecutive frames, so their time-series completion results are inconsistent. The present invention therefore studies a spacecraft depth completion method based on sequence data.
Disclosure of Invention
Aiming at the problem that existing single-frame depth completion methods ignore the temporal correlation of consecutive frames, which makes their time-series completion results inconsistent, the invention provides a spacecraft depth completion method based on time-series optical images and lidar data.
The spacecraft depth completion method based on time-series optical images and lidar data according to the invention comprises the following steps:
collecting a plurality of three-dimensional models of space targets, and setting the simulation conditions of the models and the sensor parameters of a monocular visible-light camera and a lidar; obtaining a target grayscale image sequence of the monocular visible-light camera and a sparse depth image sequence of the lidar by simulation based on the three-dimensional models, and generating dense ground-truth depth labels for the images;
constructing a time-series spacecraft depth completion neural network model comprising a plurality of target depth prediction branches cascaded in chronological order;
each target depth prediction branch includes an encoding stage and a decoding stage:
the encoding stage comprises a foreground segmentation module, a grayscale image feature extraction module, a morphological preprocessing module and a depth image feature extraction module; the decoding stage comprises LSTM modules and deconvolution layers;
the prediction process for the spatial target depth at the time t comprises the following steps:
the sparse depth image I_{s,t} at time t is preprocessed by the morphological preprocessing module to obtain a preprocessed depth image, which is input to the depth image feature extraction module;
the target grayscale image I_{g,t} at time t is passed through the grayscale image feature extraction module to extract multi-scale grayscale feature maps of different semantic levels and resolutions;
the sparse depth image I_{s,t} and the target grayscale image I_{g,t} at time t are concatenated and input to the foreground segmentation module for target foreground segmentation, yielding a foreground-segmented image;
the depth image feature extraction module first processes the preprocessed depth image through a convolution layer and then fuses it with the grayscale feature maps level by level, from the largest scale to the smallest, to obtain the multi-modal data fusion feature F_t at time t;
the multi-modal data fusion feature F_t at time t and the corresponding level feature states at time t-1 are modelled temporally level by level through the LSTM modules and decoded through the deconvolution layers; finally, the first-level feature decoding result is combined with the foreground-segmented image to obtain the target depth prediction result at time t;
the target grayscale image sequence of the monocular visible-light camera and the sparse depth image sequence of the lidar are used as sample images to train the time-series spacecraft depth completion neural network model, yielding a trained time-series spacecraft depth completion neural network model;
the trained time-series spacecraft depth completion neural network model is embedded in a satellite-borne platform, and prediction of the dense depth of the space target is realized based on real-time sensing data acquired by the monocular visible-light camera and the lidar.
According to the spacecraft depth completion method based on time-series optical images and lidar data, the grayscale image feature extraction module extracts from the target grayscale image I_{g,t} feature maps at five scales, namely 1/2, 1/4, 1/8, 1/16 and 1/32 of the size of I_{g,t}.
According to the spacecraft depth completion method based on time-series optical images and lidar data, the process by which the depth image feature extraction module obtains the multi-modal data fusion feature F_t at time t comprises:
the depth image feature extraction module first processes the preprocessed depth image through a convolution layer; the result is processed by the first-level residual module and fused with the 1/2-scale grayscale feature map by the first-level feature fusion module to obtain the first-level data fusion feature; the first-level data fusion feature is processed by the second-level residual module and fused with the 1/4-scale grayscale feature map by the second-level feature fusion module to obtain the second-level data fusion feature; the second-level data fusion feature is processed by the third-level residual module and fused with the 1/8-scale grayscale feature map by the third-level feature fusion module to obtain the third-level data fusion feature; the third-level data fusion feature is processed by the fourth-level residual module and fused with the 1/16-scale grayscale feature map by the fourth-level feature fusion module to obtain the fourth-level data fusion feature; the fourth-level data fusion feature is processed by the fifth-level residual module and added element-wise to the 1/32-scale grayscale feature map to obtain the fifth-level data fusion feature, which serves as the multi-modal data fusion feature F_t at time t.
According to the spacecraft depth completion method based on time-series optical images and lidar data, the method for obtaining the target depth prediction result at time t in the decoding stage comprises:
the multi-modal data fusion feature F_t at time t and the fifth-level feature state s_{t-1}^5 at time t-1 are passed through the fifth-level LSTM module to model their temporal relation, yielding the fifth-level temporally enhanced feature h_t^5 and the fifth-level feature memory state c_t^5, which together form the fifth-level feature state s_t^5 at time t; h_t^5 is decoded by the fifth-level deconvolution layer to obtain the fifth-level feature decoding result; the fifth-level feature decoding result and the fourth-level feature state s_{t-1}^4 at time t-1 are passed through the fourth-level LSTM module, yielding the fourth-level temporally enhanced feature h_t^4 and feature memory state c_t^4, which together form the fourth-level feature state s_t^4; h_t^4 is decoded by the fourth-level deconvolution layer to obtain the fourth-level feature decoding result; the fourth-level feature decoding result and the third-level feature state s_{t-1}^3 at time t-1 are passed through the third-level LSTM module, yielding the third-level temporally enhanced feature h_t^3 and feature memory state c_t^3, which together form s_t^3; h_t^3 is decoded by the third-level deconvolution layer to obtain the third-level feature decoding result; the third-level feature decoding result and the second-level feature state s_{t-1}^2 at time t-1 are passed through the second-level LSTM module, yielding the second-level temporally enhanced feature h_t^2 and feature memory state c_t^2, which together form s_t^2; h_t^2 is decoded by the second-level deconvolution layer to obtain the second-level feature decoding result; the second-level feature decoding result and the first-level feature state s_{t-1}^1 at time t-1 are passed through the first-level LSTM module, yielding the first-level temporally enhanced feature h_t^1 and feature memory state c_t^1, which together form s_t^1; h_t^1 is decoded by the first-level deconvolution layer to obtain the first-level feature decoding result;
the first-level feature decoding result is combined with the foreground-segmented image to obtain the target depth prediction result at time t.
According to the spacecraft depth completion method based on time-series optical images and lidar data, the foreground segmentation module takes the concatenation of the sparse depth image I_{s,t} and the target grayscale image I_{g,t} as input, predicts the probability that each pixel of the concatenated image belongs to the target through an encoding-decoding structure with skip connections, and sets the depth prediction of pixels whose probability is below the target threshold to 0, thereby obtaining the foreground-segmented image.
According to the spacecraft depth completion method based on time-series optical images and lidar data, the simulation conditions set for the three-dimensional models of the space targets include three-dimensional model material parameters, texture maps, illumination, the Earth background, the starry-sky background, the relative position and relative attitude of the target and the observation platform, and the output nodes.
According to the spacecraft depth completion method based on time-series optical images and lidar data, during the training of the time-series spacecraft depth completion neural network model, the gradient of the network weight parameters is computed at each iteration from the error between the target depth prediction result and the dense ground-truth depth label of the image, thereby updating the network parameters.
According to the spacecraft depth completion method based on time-series optical images and lidar data, after the network parameters of the time-series spacecraft depth completion neural network model have been updated a set number of times, model performance is verified on validation-set sample data, and the network parameters corresponding to the best performance are taken as the parameters of the trained time-series spacecraft depth completion neural network model.
According to the spacecraft depth completion method based on time-series optical images and lidar data, the target grayscale image sequence of the monocular visible-light camera and the sparse depth image sequence of the lidar are obtained by simulating the three-dimensional models of the space targets with three-dimensional rendering software.
According to the spacecraft depth completion method based on time-series optical images and lidar data, prior knowledge of the spacecraft size is introduced in the decoding stage, and pixel depths whose deviation from the lidar average ranging result exceeds a preset deviation threshold are filtered out of the target depth prediction result.
The invention has the following beneficial effects: the LSTM modules in the decoding stage make full use of the spatio-temporal coherence contained in consecutive frames, thereby obtaining depth completion results for the space target that are both accurate and temporally consistent. First, three-dimensional structure models of space targets are collected, and three-dimensional software is used to simulate sensor imaging data and automatically generate depth labels under different operating conditions and camera parameters, thus constructing a space-target time-series depth completion data set; then the time-series spacecraft depth completion neural network model is trained and its parameters are updated; finally, test data are fed into the trained model, enabling completion of the target depth data and evaluation of completion accuracy.
The method fully mines the correlation of target information between image frames, improves the accurate recovery of the dense depth of the space target, and produces temporally stable depth predictions; it solves the inconsistency of inter-frame completion results caused by existing single-frame methods ignoring temporal correlations in sequence data, has the advantages of small memory footprint, high accuracy, high speed and temporally consistent predictions, and achieves accurate recovery of the fine three-dimensional structure of the space target.
The method introduces a recurrent neural network into a standard decoder structure, so that the network can perceive changes of the target's time-series features; the recurrent neural network is embedded at multiple levels, so that the network adapts to scenes with different rates of change and is more robust. The recurrent neural network accumulates past knowledge of target features, so the network predictions become increasingly accurate over time.
Drawings
FIG. 1 is a schematic flow chart of the spacecraft depth completion method based on time-series optical images and lidar data;
FIG. 2 is an overall framework diagram of the time-series spacecraft depth completion neural network model; in the figure, the subscript t-2 denotes the feature state corresponding to time t-2;
FIG. 3 is a schematic flow diagram of a feature fusion module within the encoding stage depth image feature extraction module;
FIG. 4 is a block diagram of the long short-term memory network (LSTM module).
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The invention is further described below with reference to the drawings and specific examples, which are not intended to be limiting.
The invention provides a spacecraft depth completion method based on time-series optical images and lidar data, as shown in FIGS. 1 to 3, comprising:
collecting a plurality of three-dimensional models of space targets, and setting the on-orbit imaging simulation conditions of the models and the sensor parameters of a monocular visible-light camera and a lidar; obtaining a target grayscale image sequence of the monocular visible-light camera and a sparse depth image sequence of the lidar by simulation based on the three-dimensional models, and generating dense ground-truth depth labels for the images, thereby constructing a space-target time-series depth completion data set;
constructing a time-series spacecraft depth completion neural network model and performing network parameter initialization and hyper-parameter setting; the network model comprises a plurality of target depth prediction branches cascaded in chronological order;
each target depth prediction branch includes an encoding stage and a decoding stage:
the encoding stage comprises a foreground segmentation module, a grayscale image feature extraction module, a morphological preprocessing module and a depth image feature extraction module; the decoding stage comprises LSTM modules and deconvolution layers; the depth image feature extraction module comprises four feature fusion modules;
the prediction process for the spatial target depth at the time t comprises the following steps:
the sparse depth image I_{s,t} at time t is preprocessed by the morphological preprocessing module to obtain a preprocessed depth image, which is input to the depth image feature extraction module;
the target grayscale image I_{g,t} at time t is passed through the grayscale image feature extraction module to extract multi-scale grayscale feature maps of different semantic levels and resolutions;
the sparse depth image I_{s,t} and the target grayscale image I_{g,t} at time t are concatenated and input to the foreground segmentation module for target foreground segmentation, yielding a foreground-segmented image;
the depth image feature extraction module first processes the preprocessed depth image through a convolution layer and then fuses it with the grayscale feature maps level by level, from the largest scale to the smallest, to obtain the multi-modal data fusion feature F_t at time t;
the multi-modal data fusion feature F_t at time t and the corresponding level feature states at time t-1 are modelled temporally level by level through the LSTM modules and decoded through the deconvolution layers; finally, the first-level feature decoding result is combined with the foreground-segmented image to obtain the target depth prediction result at time t;
the target grayscale image sequence of the monocular visible-light camera and the sparse depth image sequence of the lidar are used as sample images to train the time-series spacecraft depth completion neural network model, yielding a trained time-series spacecraft depth completion neural network model;
the trained time-series spacecraft depth completion neural network model is embedded in a satellite-borne platform, and prediction of the dense depth of the space target and recovery of its three-dimensional structure are realized based on real-time sensing data acquired by the monocular visible-light camera and the lidar.
The three-dimensional models of space targets in this embodiment may be collected or purchased.
The time-series spacecraft depth completion neural network model adopts an encoder-decoder structure as its backbone, yielding an end-to-end trainable neural network model.
When encoding the target grayscale image and the sparse depth image, pseudo-dense depth data are obtained by morphological preprocessing and used as the input of the depth image feature extraction module, in order to prevent overly sparse data from corrupting the convolution operations.
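The patent does not specify the exact morphological operations used to densify the sparse lidar data; the following is a minimal sketch of such a preprocessing step, assuming OpenCV-style dilation and closing with illustrative kernel sizes that are not taken from the patent:

```python
import cv2
import numpy as np

def morphological_preprocess(sparse_depth: np.ndarray) -> np.ndarray:
    """Densify a sparse lidar depth map so that convolutions see fewer holes.

    sparse_depth: HxW array, 0 where no lidar return was received.
    Kernel sizes below are illustrative assumptions, not values from the patent.
    """
    depth = sparse_depth.astype(np.float32)
    # Dilate valid measurements so isolated points spread to their neighbours.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    dilated = cv2.dilate(depth, kernel)
    # Keep original measurements where present, fill holes with dilated values.
    depth = np.where(depth > 0, depth, dilated)
    # Close remaining small gaps to obtain a pseudo-dense depth image.
    depth = cv2.morphologyEx(depth, cv2.MORPH_CLOSE,
                             cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9)))
    return depth
```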
The LSTM modules in the decoding stage of this embodiment are a form of recurrent neural network (RNN); they are embedded level by level in a standard decoder structure to capture the inter-frame feature deviations of feature maps at different levels, so that the temporal correlation between adjacent frames is fully exploited to generate temporally stable dense depth maps.
In this embodiment, in order to filter out the interference of irrelevant background on spacecraft depth completion, the foreground segmentation network of the single-frame spacecraft depth completion network (SDCNet) is used to filter out the sky background.
In this embodiment, in the feature decoding stage, the LSTM modules model the temporal relation between the multi-modal data fusion feature F_t at the current time and the feature state s_{t-1} from time t-1, and output the temporally enhanced feature h_t; the temporally enhanced feature h_t is then decoded by the deconvolution layer.
Still further, as shown in FIG. 2, the grayscale image feature extraction module in this embodiment extracts from the target grayscale image I_{g,t} feature maps at five scales, namely 1/2, 1/4, 1/8, 1/16 and 1/32 of the size of I_{g,t}.
Further, referring to FIG. 2, the process by which the depth image feature extraction module obtains the multi-modal data fusion feature F_t at time t comprises:
the depth image feature extraction module first processes the preprocessed depth image through a convolution layer; the result is processed by the first-level residual module and fused with the 1/2-scale grayscale feature map by the first-level feature fusion module to obtain the first-level data fusion feature; the first-level data fusion feature is processed by the second-level residual module and fused with the 1/4-scale grayscale feature map by the second-level feature fusion module to obtain the second-level data fusion feature; the second-level data fusion feature is processed by the third-level residual module and fused with the 1/8-scale grayscale feature map by the third-level feature fusion module to obtain the third-level data fusion feature; the third-level data fusion feature is processed by the fourth-level residual module and fused with the 1/16-scale grayscale feature map by the fourth-level feature fusion module to obtain the fourth-level data fusion feature; the fourth-level data fusion feature is processed by the fifth-level residual module and added element-wise to the 1/32-scale grayscale feature map to obtain the fifth-level data fusion feature, which serves as the multi-modal data fusion feature F_t at time t.
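The patent names residual modules and attention-based fusion modules but does not fix their layer configuration; the following PyTorch sketch of the level-by-level encoder assumes simple conv-BN-ReLU stages, assumed channel widths, and a placeholder fusion block (the attention-based fusion module itself is sketched later alongside FIG. 3):

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_c, out_c, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_c, out_c, 3, stride=stride, padding=1),
        nn.BatchNorm2d(out_c), nn.ReLU(inplace=True))

class FusionBlock(nn.Module):
    """Placeholder for the attention-based feature fusion module (FIG. 3);
    approximated here by concatenation followed by a convolution."""
    def __init__(self, c):
        super().__init__()
        self.merge = conv_bn_relu(2 * c, c)
    def forward(self, depth_feat, gray_feat):
        return self.merge(torch.cat([depth_feat, gray_feat], dim=1))

class Encoder(nn.Module):
    def __init__(self, chans=(32, 64, 128, 256, 512)):  # assumed channel widths
        super().__init__()
        in_c = 1
        self.gray_stages = nn.ModuleList()
        for c in chans:                                   # grayscale pyramid: 1/2 .. 1/32
            self.gray_stages.append(conv_bn_relu(in_c, c, stride=2))
            in_c = c
        self.depth_conv = conv_bn_relu(1, chans[0], stride=2)     # initial conv, 1/2 scale
        self.res_stages = nn.ModuleList(
            [conv_bn_relu(chans[0], chans[0])] +                   # level-1 "residual" stage
            [conv_bn_relu(chans[i - 1], chans[i], stride=2) for i in range(1, 5)])
        self.fusions = nn.ModuleList([FusionBlock(chans[i]) for i in range(4)])

    def forward(self, gray, sparse_depth):
        gray_feats, x = [], gray
        for stage in self.gray_stages:
            x = stage(x)
            gray_feats.append(x)                          # 1/2, 1/4, 1/8, 1/16, 1/32
        d = self.depth_conv(sparse_depth)
        for i in range(4):                                # levels 1..4: residual stage + fusion
            d = self.fusions[i](self.res_stages[i](d), gray_feats[i])
        d = self.res_stages[4](d)                         # level 5: residual stage
        return d + gray_feats[4]                          # element-wise addition -> F_t (1/32)
```

Input height and width are assumed to be divisible by 32 so that the five pyramid levels align.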
Still further, as shown in FIGS. 2 and 3, the method for obtaining the target depth prediction result at time t in the decoding stage comprises:
the multi-modal data fusion feature F_t at time t and the fifth-level feature state s_{t-1}^5 at time t-1 are passed through the fifth-level LSTM module to model their temporal relation, yielding the fifth-level temporally enhanced feature h_t^5 and the fifth-level feature memory state c_t^5, which together form the fifth-level feature state s_t^5 at time t; h_t^5 is decoded by the fifth-level deconvolution layer to obtain the fifth-level feature decoding result; the fifth-level feature decoding result and the fourth-level feature state s_{t-1}^4 at time t-1 are passed through the fourth-level LSTM module, yielding the fourth-level temporally enhanced feature h_t^4 and feature memory state c_t^4, which together form the fourth-level feature state s_t^4; h_t^4 is decoded by the fourth-level deconvolution layer to obtain the fourth-level feature decoding result; the fourth-level feature decoding result and the third-level feature state s_{t-1}^3 at time t-1 are passed through the third-level LSTM module, yielding the third-level temporally enhanced feature h_t^3 and feature memory state c_t^3, which together form s_t^3; h_t^3 is decoded by the third-level deconvolution layer to obtain the third-level feature decoding result; the third-level feature decoding result and the second-level feature state s_{t-1}^2 at time t-1 are passed through the second-level LSTM module, yielding the second-level temporally enhanced feature h_t^2 and feature memory state c_t^2, which together form s_t^2; h_t^2 is decoded by the second-level deconvolution layer to obtain the second-level feature decoding result; the second-level feature decoding result and the first-level feature state s_{t-1}^1 at time t-1 are passed through the first-level LSTM module, yielding the first-level temporally enhanced feature h_t^1 and feature memory state c_t^1, which together form s_t^1; h_t^1 is decoded by the first-level deconvolution layer to obtain the first-level feature decoding result;
the first-level feature decoding result is combined with the foreground-segmented image to obtain the target depth prediction result at time t.
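A minimal sketch of this five-level decoding loop is given below; it assumes per-level channel widths and a ConvLSTMCell like the one sketched after the gate equations further below, so it is a schematic of the data flow rather than the exact architecture:

```python
import torch.nn as nn

class TemporalDecoder(nn.Module):
    """Sketch: one ConvLSTM cell plus one deconvolution per level, from level 5 to level 1.
    ConvLSTMCell is the cell sketched later with the LSTM gate equations."""

    def __init__(self, chans=(512, 256, 128, 64, 32)):   # assumed widths, level 5 .. level 1
        super().__init__()
        self.cells = nn.ModuleList(ConvLSTMCell(c, c) for c in chans)
        self.deconvs = nn.ModuleList(
            nn.ConvTranspose2d(chans[i], chans[i + 1] if i + 1 < len(chans) else 1,
                               4, stride=2, padding=1)
            for i in range(len(chans)))

    def forward(self, fusion_feat, prev_states):
        """fusion_feat: F_t; prev_states: per-level (h, c) from time t-1, or Nones at t=0."""
        x, new_states = fusion_feat, []
        for cell, deconv, state in zip(self.cells, self.deconvs, prev_states):
            h, c = cell(x, state)      # temporal modelling with the state from t-1
            new_states.append((h, c))
            x = deconv(h)              # feature decoding, doubling the resolution
        return x, new_states           # x: 1-channel depth map at full resolution
```

The returned depth map would then be multiplied by the foreground mask described next to obtain the final prediction at time t.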
In this embodiment, the foreground segmentation module takes the concatenation of the sparse depth image I_{s,t} and the target grayscale image I_{g,t} as input, predicts the probability that each pixel of the concatenated image belongs to the target through an encoding-decoding structure with skip connections, and sets the depth prediction of pixels whose probability is below the target threshold to 0, thereby obtaining the foreground-segmented image.
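The segmentation backbone itself follows SDCNet and is not reproduced here; the sketch below only shows how its per-pixel probability map could be applied to the predicted depth. The threshold value 0.5 is a hypothetical default, since the patent only speaks of a "target threshold":

```python
import torch

def apply_foreground_mask(pred_depth: torch.Tensor,
                          target_prob: torch.Tensor,
                          threshold: float = 0.5) -> torch.Tensor:
    """Zero out depth predictions at pixels the segmentation network deems background.

    pred_depth:  (B, 1, H, W) depth prediction from the decoder.
    target_prob: (B, 1, H, W) probability that each pixel belongs to the target.
    """
    mask = (target_prob >= threshold).to(pred_depth.dtype)
    return pred_depth * mask
```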
As an example, the simulation conditions for setting the three-dimensional model of the spatial target include three-dimensional model material parameter setting, texture map setting, illumination adding, earth background adding, starry sky background adding, setting of relative positions and relative postures of the target and the observation platform, and setting of output nodes.
Still further, during the training of the time-series spacecraft depth completion neural network model, the gradient of the network weight parameters is computed at each iteration from the error between the target depth prediction result and the dense ground-truth depth label of the image, thereby updating the network parameters.
In this embodiment, after the network parameters of the time-series spacecraft depth completion neural network model have been updated a set number of times, model performance is verified on validation-set sample data, and the network parameters corresponding to the best performance are stored as the parameters of the trained time-series spacecraft depth completion neural network model.
The stored network parameters are used as the final network weights; test data are then fed to the network to obtain depth completion results, which are compared with the ground-truth labels to evaluate network accuracy.
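A minimal training and validation loop consistent with this description is sketched below. The loss function (L1 on valid pixels), optimizer, validation interval and the model interface `model(gray, sparse, states) -> (pred, states)` are assumptions, not choices stated in the patent:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def validate(model, val_loader):
    """Hypothetical helper: mean absolute depth error over the validation set."""
    errs = []
    for gray_seq, sparse_seq, gt_seq in val_loader:        # each tensor: (B, T, 1, H, W)
        states, err = None, 0.0
        for t in range(gray_seq.shape[1]):
            pred, states = model(gray_seq[:, t], sparse_seq[:, t], states)
            valid = gt_seq[:, t] > 0
            err += (pred[valid] - gt_seq[:, t][valid]).abs().mean().item()
        errs.append(err / gray_seq.shape[1])
    return sum(errs) / len(errs)

def train(model, train_loader, val_loader, epochs=20, val_interval=500, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_err, best_weights, step = float("inf"), None, 0
    for _ in range(epochs):
        for gray_seq, sparse_seq, gt_seq in train_loader:
            states, loss = None, 0.0                        # LSTM states reset per sequence
            for t in range(gray_seq.shape[1]):
                pred, states = model(gray_seq[:, t], sparse_seq[:, t], states)
                valid = gt_seq[:, t] > 0
                loss = loss + F.l1_loss(pred[valid], gt_seq[:, t][valid])
            opt.zero_grad()
            loss.backward()
            opt.step()
            step += 1
            if step % val_interval == 0:                    # periodic validation
                err = validate(model, val_loader)
                if err < best_err:                          # keep best-performing parameters
                    best_err = err
                    best_weights = {k: v.clone() for k, v in model.state_dict().items()}
    if best_weights is not None:
        model.load_state_dict(best_weights)
    return model
```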
As an example, the target grayscale image sequence of the monocular visible-light camera and the sparse depth image sequence of the lidar are obtained by simulating the three-dimensional models of the space targets with three-dimensional rendering software.
In this embodiment, prior knowledge of the spacecraft size is introduced in the decoding stage, and pixel depths whose deviation from the lidar average ranging result exceeds a preset deviation threshold are filtered out, so as to obtain a high-quality depth prediction result.
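A sketch of this size-prior filter is shown below; the deviation threshold is left to the caller because the patent only calls it a preset value tied to the known spacecraft size:

```python
import torch

def filter_with_size_prior(pred_depth: torch.Tensor,
                           sparse_depth: torch.Tensor,
                           max_deviation: float) -> torch.Tensor:
    """Remove depth predictions that deviate too far from the lidar mean range."""
    valid = sparse_depth > 0
    mean_range = sparse_depth[valid].mean()                 # lidar average ranging result
    keep = (pred_depth - mean_range).abs() <= max_deviation
    return pred_depth * keep.to(pred_depth.dtype)
```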
The working process of the time-series spacecraft depth completion neural network model is described in detail below:
Referring to FIG. 2, at time t the target grayscale image I_{g,t} and the sparse depth image I_{s,t} are input, and the multi-source depth image feature extraction module aggregates the features of the different sensor data to obtain a multi-modal data fusion feature F_t with high representational power, which can be expressed as:

F_t = f_encoder([I_{g,t}, I_{s,t}], θ_encoder),

where f_encoder is the encoder feature extraction function and θ_encoder denotes the network parameters that the encoder structure needs to learn.
As shown in FIG. 3, the feature fusion module in the encoding stage fuses the target grayscale image features and the depth image features based on an attention mechanism, providing features with strong representational capability for the subsequent target depth decoding.
The feature fusion module mainly consists of a feature embedding layer, a cross-channel fusion layer and a spatial attention layer; it is described in detail below using the first-level feature fusion module as an example, with the corresponding first-level variables carrying the superscript 1:
The first-level feature fusion module takes the first-level grayscale image feature F_g^1 ∈ R^{C×H×W} and the first-level depth image feature F_s^1 ∈ R^{C×H×W} (C is the number of feature channels, H and W are the height and width of the feature map) as input and outputs the first-level data fusion feature F_f^1.
The feature embedding layer encodes the feature maps of the different channels to generate the corresponding feature vectors. It decomposes the grayscale feature F_g^1 and the depth feature F_s^1 into M mutually non-overlapping regions (each feature block of size S×S) and extracts the regional features with a depthwise separable convolution whose kernel size is S×S and whose stride is S. In addition, max-pooling and average-pooling operations are applied to the region blocks of F_g^1 and F_s^1 to extract regional global features; finally, the three features are concatenated to output the first-level grayscale embedded feature E_g^1 and the first-level depth embedded feature E_s^1, whose per-channel dimension is d_k = 3×H×W/S².
The cross channel attention layer takes the characteristic encoding result of the characteristic embedding layer as input, firstly adopts linear transformation to respectively calculate gray image primary embedded characteristic query vectors and depth image primary embedded characteristic key vectors of n attention heads, and can be specifically expressed as:
in the middle ofLinear mapping weight matrix for computing query vector and key vector, respectively, < >>Respectively embedding a feature query vector and a depth image embedded feature key vector for the gray level image of the ith attention head; n is the number of attention headers.
The gray image feature and depth image feature channel association weight matrix of the ith attention head is further calculated by using the scaling dot product attention, and can be specifically expressed as:
w in i The weights are associated for the feature channels of the ith attention head, softmax (·) is a normalization function.
Finally, the first-level grayscale feature F_g^1 and the first-level depth feature F_s^1 are fused channel-wise according to the association weight matrix: with h_g and h_s denoting the feature vectors obtained by flattening F_g^1 and F_s^1 row by row, the association weight w_i of the i-th attention head re-weights the flattened features, and reshape(·) transforms the result back to the spatial dimensions, giving the first-level channel fusion feature F̃_i^1 of the i-th attention head.
Finally, the channel fusion features computed by the n attention heads are concatenated, and a convolution further realizes the multi-head attention fusion to obtain the first-level channel fusion feature F_c^1, which can be expressed as:

F_c^1 = Conv([F̃_1^1; F̃_2^1; …; F̃_n^1]),

where Conv(·) is the convolution operation and [·;·] is the feature concatenation operation.
The spatial attention layer takes the first-level channel fusion feature F_c^1 and the first-level grayscale feature F_g^1 as input. Features at different spatial positions are characterized by channel average-pooling and channel max-pooling, and the pooled features are concatenated and fed to a convolution layer to obtain the first-level spatial attention weight, which can be expressed as:

W_sp^1 = σ(Conv([max_c(F_c^1); avg_c(F_c^1); max_c(F_g^1); avg_c(F_g^1)])),

where W_sp^1 is the first-level spatial attention weight; max_c(·) and avg_c(·) are the channel max-pooling and average-pooling operations, respectively; σ(·) is the Sigmoid normalization function; and Conv(·) and [·;·] are the convolution and feature concatenation operations, respectively.
Based on the first-level spatial attention weight, the features at the different spatial positions are weighted and summed to obtain the first-level fusion feature F_f^1, which can be expressed as:

F_f^1 = W_sp^1 ⊙ F_c^1 + (1 − W_sp^1) ⊙ F_g^1,

where ⊙ denotes element-wise multiplication.
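A compact PyTorch sketch of this fusion module under the reconstructions above is given below; the patch size S, head count n, per-head projection width and the exact channel-fusion formula are assumptions, and such a module could replace the placeholder FusionBlock used in the earlier encoder sketch:

```python
import torch
import torch.nn as nn

class FeatureFusionModule(nn.Module):
    """Sketch: feature embedding, cross-channel attention between grayscale and depth
    features, multi-head merge, then spatial attention. Inputs are (B, C, H, W)."""

    def __init__(self, channels, patch=4, heads=4, d_head=64):
        super().__init__()
        self.patch, self.heads = patch, heads
        # Depthwise separable conv (kernel S, stride S) used by the embedding layer.
        self.region_g = nn.Conv2d(channels, channels, patch, stride=patch, groups=channels)
        self.region_s = nn.Conv2d(channels, channels, patch, stride=patch, groups=channels)
        # Per-head query/key projections; input size 3*H*W/S^2 is resolved lazily.
        self.q_proj = nn.ModuleList(nn.LazyLinear(d_head) for _ in range(heads))
        self.k_proj = nn.ModuleList(nn.LazyLinear(d_head) for _ in range(heads))
        self.merge = nn.Conv2d(heads * channels, channels, 1)       # multi-head merge
        self.spatial = nn.Conv2d(4, 1, 7, padding=3)                # spatial attention conv

    def embed(self, feat, region_conv):
        b, c, _, _ = feat.shape
        p = self.patch
        regions = region_conv(feat).flatten(2)                      # (B, C, HW/S^2)
        patches = feat.unfold(2, p, p).unfold(3, p, p).reshape(b, c, -1, p * p)
        pooled_max = patches.max(dim=-1).values                     # region max pooling
        pooled_avg = patches.mean(dim=-1)                           # region average pooling
        return torch.cat([regions, pooled_max, pooled_avg], dim=-1) # (B, C, d_k)

    def forward(self, f_g, f_s):
        b, c, h, w = f_g.shape
        e_g, e_s = self.embed(f_g, self.region_g), self.embed(f_s, self.region_s)
        h_s = f_s.flatten(2)                                        # depth channels, flattened
        fused_heads = []
        for q_proj, k_proj in zip(self.q_proj, self.k_proj):
            q, k = q_proj(e_g), k_proj(e_s)                         # (B, C, d_head)
            w_i = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
            fused_heads.append((w_i @ h_s).view(b, c, h, w))        # re-weighted depth channels
        f_c = self.merge(torch.cat(fused_heads, dim=1))             # channel fusion feature
        pooled = torch.cat([f_c.max(1, keepdim=True).values, f_c.mean(1, keepdim=True),
                            f_g.max(1, keepdim=True).values, f_g.mean(1, keepdim=True)], dim=1)
        w_sp = torch.sigmoid(self.spatial(pooled))                  # spatial attention weight
        return w_sp * f_c + (1 - w_sp) * f_g                        # first-level fusion feature
```

Feature map height and width are assumed to be divisible by the patch size S.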
Still further, in the feature encoding stage, the fourth-level fusion feature F_f^4 is processed by the fifth-level residual module and added element-wise to the 1/32-scale grayscale feature map, yielding the fifth-level data fusion feature, which serves as the multi-modal data fusion feature F_t at time t.
In the decoding stage, a long short-term memory network (LSTM module) is introduced before each deconvolution operation, as shown in FIG. 4. The LSTM module controls the flow of information between frames mainly through an input gate i_t, a forget gate f_t and an output gate o_t. The internal data processing is described in detail using the fifth-level LSTM module as an example, with the corresponding variables carrying the superscript 5:
The fifth-level LSTM module takes the same-level feature memory state c_{t-1}^5 of the previous frame, the same-level temporally enhanced feature h_{t-1}^5 of the previous frame, and the fusion feature F_t of the current frame as input, and outputs the feature memory state c_t^5 and the temporally enhanced feature h_t^5 of the current frame; c_t^5 and h_t^5 together form the feature state s_t^5 of the current frame, and h_t^5 serves as the input of the subsequent deconvolution operation. The computation of c_t^5 and h_t^5 can be expressed as:

f_t^5 = σ(W_f * [F_t; h_{t-1}^5] + b_f),
i_t^5 = σ(W_i * [F_t; h_{t-1}^5] + b_i),
o_t^5 = σ(W_o * [F_t; h_{t-1}^5] + b_o),
c_t^5 = f_t^5 ⊙ c_{t-1}^5 + i_t^5 ⊙ tanh(W_c * [F_t; h_{t-1}^5] + b_c),
h_t^5 = o_t^5 ⊙ tanh(c_t^5),

where [·;·] denotes the feature concatenation operation, * denotes the convolution operation, ⊙ denotes element-wise multiplication, σ(·) is the Sigmoid function, and W_f, W_i, W_o, W_c, b_f, b_i, b_o and b_c are the parameters of the long short-term memory network to be learned. Conv in FIG. 4 denotes a convolution operation and tanh the hyperbolic tangent function.
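These gate equations correspond to a convolutional LSTM cell; a minimal PyTorch sketch is given below (the kernel size is an assumption, and the first frame, where no previous state exists, is handled by zero initialization):

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Convolutional LSTM cell implementing the gate equations above."""

    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # One convolution produces all four gate pre-activations (f, i, o, g) at once.
        self.gates = nn.Conv2d(in_channels + hidden_channels, 4 * hidden_channels,
                               kernel_size, padding=pad)
        self.hidden_channels = hidden_channels

    def forward(self, x, state=None):
        if state is None:  # first frame: initialize h and c with zeros
            b, _, h, w = x.shape
            zeros = x.new_zeros(b, self.hidden_channels, h, w)
            state = (zeros, zeros)
        h_prev, c_prev = state
        f, i, o, g = torch.chunk(self.gates(torch.cat([x, h_prev], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c_prev + torch.sigmoid(i) * torch.tanh(g)   # memory state
        h = torch.sigmoid(o) * torch.tanh(c)        # temporally enhanced feature
        return h, c
```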
In summary, the method of the invention introduces a recurrent neural network into the decoder structure so that the network can mine changes in the target's time-series features; a multi-scale mechanism embeds the recurrent neural network level by level into the deconvolution operations of the different levels, so that the network adapts to target feature changes at different motion speeds and in different motion modes; finally, by deeply mining the temporal correlation of the sequence data, the network predictions become more stable and the depth completion accuracy continuously improves over time.
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that the different dependent claims and the features described herein may be combined in ways other than as described in the original claims. It is also to be understood that features described in connection with separate embodiments may be used in other described embodiments.
Claims (10)
1. A spacecraft depth completion method based on time-series optical images and lidar data, characterized by comprising the following steps:
collecting a plurality of three-dimensional models of space targets, and setting the simulation conditions of the models and the sensor parameters of a monocular visible-light camera and a lidar; obtaining a target grayscale image sequence of the monocular visible-light camera and a sparse depth image sequence of the lidar by simulation based on the three-dimensional models, and generating dense ground-truth depth labels for the images;
constructing a time-series spacecraft depth completion neural network model comprising a plurality of target depth prediction branches cascaded in chronological order;
each target depth prediction branch includes an encoding stage and a decoding stage:
the encoding stage comprises a foreground segmentation module, a grayscale image feature extraction module, a morphological preprocessing module and a depth image feature extraction module; the decoding stage comprises LSTM modules and deconvolution layers;
the prediction process for the spatial target depth at the time t comprises the following steps:
the sparse depth image I_{s,t} at time t is preprocessed by the morphological preprocessing module to obtain a preprocessed depth image, which is input to the depth image feature extraction module;
the target grayscale image I_{g,t} at time t is passed through the grayscale image feature extraction module to extract multi-scale grayscale feature maps of different semantic levels and resolutions;
the sparse depth image I_{s,t} and the target grayscale image I_{g,t} at time t are concatenated and input to the foreground segmentation module for target foreground segmentation, yielding a foreground-segmented image;
the depth image feature extraction module first processes the preprocessed depth image through a convolution layer and then fuses it with the grayscale feature maps level by level, from the largest scale to the smallest, to obtain the multi-modal data fusion feature F_t at time t;
the multi-modal data fusion feature F_t at time t and the corresponding level feature states at time t-1 are modelled temporally level by level through the LSTM modules and decoded through the deconvolution layers; finally, the first-level feature decoding result is combined with the foreground-segmented image to obtain the target depth prediction result at time t;
the target grayscale image sequence of the monocular visible-light camera and the sparse depth image sequence of the lidar are used as sample images to train the time-series spacecraft depth completion neural network model, yielding a trained time-series spacecraft depth completion neural network model;
the trained time-series spacecraft depth completion neural network model is embedded in a satellite-borne platform, and prediction of the dense depth of the space target is realized based on real-time sensing data acquired by the monocular visible-light camera and the lidar.
2. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 1, characterized in that the grayscale image feature extraction module extracts from the target grayscale image I_{g,t} feature maps at five scales, namely 1/2, 1/4, 1/8, 1/16 and 1/32 of the size of I_{g,t}.
3. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 2, characterized in that
the process by which the depth image feature extraction module obtains the multi-modal data fusion feature F_t at time t comprises:
the depth image feature extraction module first processes the preprocessed depth image through a convolution layer; the result is processed by the first-level residual module and fused with the 1/2-scale grayscale feature map by the first-level feature fusion module to obtain the first-level data fusion feature; the first-level data fusion feature is processed by the second-level residual module and fused with the 1/4-scale grayscale feature map by the second-level feature fusion module to obtain the second-level data fusion feature; the second-level data fusion feature is processed by the third-level residual module and fused with the 1/8-scale grayscale feature map by the third-level feature fusion module to obtain the third-level data fusion feature; the third-level data fusion feature is processed by the fourth-level residual module and fused with the 1/16-scale grayscale feature map by the fourth-level feature fusion module to obtain the fourth-level data fusion feature; the fourth-level data fusion feature is processed by the fifth-level residual module and added element-wise to the 1/32-scale grayscale feature map to obtain the fifth-level data fusion feature, which serves as the multi-modal data fusion feature F_t at time t.
4. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 3, characterized in that
the method for obtaining the target depth prediction result at time t in the decoding stage comprises:
the multi-modal data fusion feature F_t at time t and the fifth-level feature state s_{t-1}^5 at time t-1 are passed through the fifth-level LSTM module to model their temporal relation, yielding the fifth-level temporally enhanced feature h_t^5 and the fifth-level feature memory state c_t^5, which together form the fifth-level feature state s_t^5 at time t; h_t^5 is decoded by the fifth-level deconvolution layer to obtain the fifth-level feature decoding result; the fifth-level feature decoding result and the fourth-level feature state s_{t-1}^4 at time t-1 are passed through the fourth-level LSTM module, yielding the fourth-level temporally enhanced feature h_t^4 and feature memory state c_t^4, which together form the fourth-level feature state s_t^4; h_t^4 is decoded by the fourth-level deconvolution layer to obtain the fourth-level feature decoding result; the fourth-level feature decoding result and the third-level feature state s_{t-1}^3 at time t-1 are passed through the third-level LSTM module, yielding the third-level temporally enhanced feature h_t^3 and feature memory state c_t^3, which together form s_t^3; h_t^3 is decoded by the third-level deconvolution layer to obtain the third-level feature decoding result; the third-level feature decoding result and the second-level feature state s_{t-1}^2 at time t-1 are passed through the second-level LSTM module, yielding the second-level temporally enhanced feature h_t^2 and feature memory state c_t^2, which together form s_t^2; h_t^2 is decoded by the second-level deconvolution layer to obtain the second-level feature decoding result; the second-level feature decoding result and the first-level feature state s_{t-1}^1 at time t-1 are passed through the first-level LSTM module, yielding the first-level temporally enhanced feature h_t^1 and feature memory state c_t^1, which together form s_t^1; h_t^1 is decoded by the first-level deconvolution layer to obtain the first-level feature decoding result;
the first-level feature decoding result is combined with the foreground-segmented image to obtain the target depth prediction result at time t.
5. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 4, characterized in that the foreground segmentation module takes the concatenation of the sparse depth image I_{s,t} and the target grayscale image I_{g,t} as input, predicts the probability that each pixel of the concatenated image belongs to the target through an encoding-decoding structure with skip connections, and sets the depth prediction of pixels whose probability is below the target threshold to 0, thereby obtaining the foreground-segmented image.
6. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 1, characterized in that
the simulation conditions set for the three-dimensional models of the space targets include three-dimensional model material parameters, texture maps, illumination, the Earth background, the starry-sky background, the relative position and relative attitude of the target and the observation platform, and the output nodes.
7. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 1, characterized in that, during the training of the time-series spacecraft depth completion neural network model, the gradient of the network weight parameters is computed at each iteration from the error between the target depth prediction result and the dense ground-truth depth label of the image, thereby updating the network parameters.
8. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 7, characterized in that, after the network parameters of the time-series spacecraft depth completion neural network model have been updated a set number of times, model performance is verified on validation-set sample data, and the network parameters corresponding to the best performance are taken as the parameters of the trained time-series spacecraft depth completion neural network model.
9. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 1, characterized in that
the target grayscale image sequence of the monocular visible-light camera and the sparse depth image sequence of the lidar are obtained by simulating the three-dimensional models of the space targets with three-dimensional rendering software.
10. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 1, characterized in that
prior knowledge of the spacecraft size is introduced in the decoding stage, and pixel depths whose deviation from the lidar average ranging result exceeds a preset deviation threshold are filtered out of the target depth prediction result.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202310393175.2A | 2023-04-13 | 2023-04-13 | Spacecraft depth completion method based on time-series optical images and lidar data
Publications (1)

Publication Number | Publication Date
---|---
CN116402874A | 2023-07-07

Family ID: 87019671

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202310393175.2A | Spacecraft depth completion method based on time-series optical images and lidar data | 2023-04-13 | 2023-04-13

Country Status (1)

Country | Link
---|---
CN | CN116402874A (en)
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117009750A (en) * | 2023-09-28 | 2023-11-07 | | Methane concentration data complement method and device for machine learning
CN117009750B (en) * | 2023-09-28 | 2024-01-02 | | Methane concentration data complement method and device for machine learning
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |