CN116402874A - Spacecraft depth completion method based on time-series optical images and lidar data - Google Patents
Spacecraft depth completion method based on time-series optical images and lidar data
- Publication number
- CN116402874A CN116402874A CN202310393175.2A CN202310393175A CN116402874A CN 116402874 A CN116402874 A CN 116402874A CN 202310393175 A CN202310393175 A CN 202310393175A CN 116402874 A CN116402874 A CN 116402874A
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/521 — Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
- G06N3/0442 — Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T7/11 — Region-based segmentation
- G06T7/194 — Segmentation; Edge detection involving foreground-background segmentation
- G06T7/593 — Depth or shape recovery from multiple images from stereo images
- Y02A90/10 — Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
A spacecraft depth completion method based on time-series optical images and lidar data belongs to the technical field of space-target three-dimensional structure perception. The method addresses the problem that existing single-frame depth completion methods ignore the temporal correlation of consecutive frames, which makes their time-series completion results inconsistent. The method obtains a target grayscale image sequence of a monocular visible-light camera and a sparse depth image sequence of a lidar by simulation from three-dimensional models of space targets, and generates dense ground-truth depth labels for the images; the grayscale image sequence and the sparse depth image sequence are then used as samples to train a time-series spacecraft depth completion neural network model. The trained model is embedded in a satellite-borne platform and predicts the dense depth of a space target from real-time sensing data acquired by the monocular visible-light camera and the lidar. The method is used for depth completion in spacecraft sensing.
Description
Technical Field
The invention relates to a spacecraft depth completion method based on time-series optical images and lidar data, and belongs to the technical field of space-target three-dimensional structure perception.
Background
With the rapid development of space technology, on-orbit missions have become increasingly diverse. Perceiving the three-dimensional structure of a spacecraft and acquiring its point cloud data are prerequisites for the smooth execution of many on-orbit missions, such as debris removal, on-orbit servicing, and rendezvous and docking.
Currently, mainstream three-dimensional structure perception schemes for space targets fall into three categories: stereo vision systems, time-of-flight (TOF) cameras, and the combination of a monocular camera with a lidar. A stereo vision system recovers the depth of extracted feature points by triangulation and performs poorly on smooth surfaces or objects with repetitive textures; in addition, the baseline of the binocular camera severely limits the working distance of the system, making it difficult to meet the requirements of on-orbit missions. A TOF camera calculates the exact depth of a target by measuring the time delay between emitting and receiving a laser pulse. Although it can acquire accurate depth at high density, its working distance is typically less than 10 m due to on-orbit power limits, which prevents its use in practical applications. The combination of a monocular camera and a lidar has a long working distance and recovers the dense depth of a spacecraft from the optical image and sparse ranging information. Compared with a binocular system or a TOF camera, it effectively increases the working distance of the system, reduces sensitivity to illumination conditions and materials, and is better suited for practical use in space.
Since dense depth recovery of targets based on a monocular camera and a lidar has important applications in many scenarios, a large number of deep-learning-based depth completion algorithms have been proposed in recent years to meet various depth-based application requirements. Although target depth completion from single-frame sensor data has made important progress, the data to be processed during actual on-orbit operation are sensor sequences, and existing single-frame depth completion methods ignore the temporal correlation of consecutive frames, so their time-series completion results are inconsistent. The present invention therefore studies a spacecraft depth completion method based on sequence data.
Disclosure of Invention
Aiming at the problem that existing single-frame depth completion methods ignore the temporal correlation of consecutive frames, which makes their time-series completion results inconsistent, the invention provides a spacecraft depth completion method based on time-series optical images and lidar data.
The spacecraft depth completion method based on time-series optical images and lidar data according to the invention comprises the following steps:
collecting a plurality of three-dimensional models of space targets, and setting the simulation conditions of the models and the sensor parameters of a monocular visible-light camera and a lidar; obtaining a target grayscale image sequence of the monocular visible-light camera and a sparse depth image sequence of the lidar by simulation based on the three-dimensional models, and generating dense ground-truth depth labels for the images;
constructing a time-series spacecraft depth completion neural network model comprising a plurality of target depth prediction branches cascaded in chronological order;
each target depth prediction branch includes an encoding stage and a decoding stage:
the encoding stage comprises a foreground segmentation module, a grayscale image feature extraction module, a morphological preprocessing module and a depth image feature extraction module; the decoding stage comprises LSTM modules and deconvolution layers;
the prediction process for the spatial target depth at the time t comprises the following steps:
the sparse depth image I_{s,t} at time t is preprocessed by the morphological preprocessing module to obtain a preprocessed depth image, which is input to the depth image feature extraction module;
the target grayscale image I_{g,t} at time t is passed through the grayscale image feature extraction module to extract multi-scale grayscale feature maps of different semantic levels and resolutions;
the sparse depth image I_{s,t} and the target grayscale image I_{g,t} at time t are concatenated and input to the foreground segmentation module for target foreground segmentation, yielding a foreground-segmented image;
the depth image feature extraction module first processes the preprocessed depth image through a convolution layer and then fuses it with the grayscale feature maps level by level, from the largest scale to the smallest, to obtain the multi-modal data fusion feature F_t at time t;
the multi-modal data fusion feature F_t at time t and the corresponding level feature states at time t-1 are modelled temporally level by level through the LSTM modules and decoded through the deconvolution layers; finally, the first-level feature decoding result is combined with the foreground-segmented image to obtain the target depth prediction result at time t;
the target grayscale image sequence of the monocular visible-light camera and the sparse depth image sequence of the lidar are used as sample images to train the time-series spacecraft depth completion neural network model, yielding a trained time-series spacecraft depth completion neural network model;
the trained time-series spacecraft depth completion neural network model is embedded in a satellite-borne platform, and prediction of the dense depth of the space target is realized based on real-time sensing data acquired by the monocular visible-light camera and the lidar.
According to the spacecraft depth completion method based on time-series optical images and lidar data, the grayscale image feature extraction module extracts from the target grayscale image I_{g,t} feature maps at five scales, namely 1/2, 1/4, 1/8, 1/16 and 1/32 of the size of I_{g,t}.
According to the spacecraft depth completion method based on time-series optical images and lidar data, the process by which the depth image feature extraction module obtains the multi-modal data fusion feature F_t at time t comprises:
the depth image feature extraction module first processes the preprocessed depth image through a convolution layer; the result is processed by the first-level residual module and fused with the 1/2-scale grayscale feature map by the first-level feature fusion module to obtain the first-level data fusion feature; the first-level data fusion feature is processed by the second-level residual module and fused with the 1/4-scale grayscale feature map by the second-level feature fusion module to obtain the second-level data fusion feature; the second-level data fusion feature is processed by the third-level residual module and fused with the 1/8-scale grayscale feature map by the third-level feature fusion module to obtain the third-level data fusion feature; the third-level data fusion feature is processed by the fourth-level residual module and fused with the 1/16-scale grayscale feature map by the fourth-level feature fusion module to obtain the fourth-level data fusion feature; the fourth-level data fusion feature is processed by the fifth-level residual module and added element-wise to the 1/32-scale grayscale feature map to obtain the fifth-level data fusion feature, which serves as the multi-modal data fusion feature F_t at time t.
According to the spacecraft depth completion method based on time-series optical images and lidar data, the method for obtaining the target depth prediction result at time t in the decoding stage comprises:
the multi-modal data fusion feature F_t at time t and the fifth-level feature state s_{t-1}^5 at time t-1 are passed through the fifth-level LSTM module to model their temporal relation, yielding the fifth-level temporally enhanced feature h_t^5 and the fifth-level feature memory state c_t^5, which together form the fifth-level feature state s_t^5 at time t; h_t^5 is decoded by the fifth-level deconvolution layer to obtain the fifth-level feature decoding result; the fifth-level feature decoding result and the fourth-level feature state s_{t-1}^4 at time t-1 are passed through the fourth-level LSTM module, yielding the fourth-level temporally enhanced feature h_t^4 and feature memory state c_t^4, which together form the fourth-level feature state s_t^4; h_t^4 is decoded by the fourth-level deconvolution layer to obtain the fourth-level feature decoding result; the fourth-level feature decoding result and the third-level feature state s_{t-1}^3 at time t-1 are passed through the third-level LSTM module, yielding the third-level temporally enhanced feature h_t^3 and feature memory state c_t^3, which together form s_t^3; h_t^3 is decoded by the third-level deconvolution layer to obtain the third-level feature decoding result; the third-level feature decoding result and the second-level feature state s_{t-1}^2 at time t-1 are passed through the second-level LSTM module, yielding the second-level temporally enhanced feature h_t^2 and feature memory state c_t^2, which together form s_t^2; h_t^2 is decoded by the second-level deconvolution layer to obtain the second-level feature decoding result; the second-level feature decoding result and the first-level feature state s_{t-1}^1 at time t-1 are passed through the first-level LSTM module, yielding the first-level temporally enhanced feature h_t^1 and feature memory state c_t^1, which together form s_t^1; h_t^1 is decoded by the first-level deconvolution layer to obtain the first-level feature decoding result;
the first-level feature decoding result is combined with the foreground-segmented image to obtain the target depth prediction result at time t.
According to the spacecraft depth completion method based on time-series optical images and lidar data, the foreground segmentation module takes the concatenation of the sparse depth image I_{s,t} and the target grayscale image I_{g,t} as input, predicts the probability that each pixel of the concatenated image belongs to the target through an encoding-decoding structure with skip connections, and sets the depth prediction of pixels whose probability is below the target threshold to 0, thereby obtaining the foreground-segmented image.
According to the spacecraft depth completion method based on time-series optical images and lidar data, the simulation conditions set for the three-dimensional models of the space targets include three-dimensional model material parameters, texture maps, illumination, the Earth background, the starry-sky background, the relative position and relative attitude of the target and the observation platform, and the output nodes.
According to the spacecraft depth completion method based on time-series optical images and lidar data, during the training of the time-series spacecraft depth completion neural network model, the gradient of the network weight parameters is computed at each iteration from the error between the target depth prediction result and the dense ground-truth depth label of the image, thereby updating the network parameters.
According to the spacecraft depth completion method based on time-series optical images and lidar data, after the network parameters of the time-series spacecraft depth completion neural network model have been updated a set number of times, model performance is verified on validation-set sample data, and the network parameters corresponding to the best performance are taken as the parameters of the trained time-series spacecraft depth completion neural network model.
According to the spacecraft depth completion method based on time-series optical images and lidar data, the target grayscale image sequence of the monocular visible-light camera and the sparse depth image sequence of the lidar are obtained by simulating the three-dimensional models of the space targets with three-dimensional rendering software.
According to the spacecraft depth completion method based on time-series optical images and lidar data, prior knowledge of the spacecraft size is introduced in the decoding stage, and pixel depths whose deviation from the lidar average ranging result exceeds a preset deviation threshold are filtered out of the target depth prediction result.
The invention has the following beneficial effects: the LSTM modules in the decoding stage make full use of the spatio-temporal coherence contained in consecutive frames, thereby obtaining depth completion results for the space target that are both accurate and temporally consistent. First, three-dimensional structure models of space targets are collected, and three-dimensional software is used to simulate sensor imaging data and automatically generate depth labels under different operating conditions and camera parameters, thus constructing a space-target time-series depth completion data set; then the time-series spacecraft depth completion neural network model is trained and its parameters are updated; finally, test data are fed into the trained model, enabling completion of the target depth data and evaluation of completion accuracy.
The method fully mines the correlation of target information between image frames, improves the accurate recovery of the dense depth of the space target, and produces temporally stable depth predictions; it solves the inconsistency of inter-frame completion results caused by existing single-frame methods ignoring temporal correlations in sequence data, has the advantages of small memory footprint, high accuracy, high speed and temporally consistent predictions, and achieves accurate recovery of the fine three-dimensional structure of the space target.
The method introduces a recurrent neural network into a standard decoder structure, so that the network can perceive changes of the target's time-series features; the recurrent neural network is embedded at multiple levels, so that the network adapts to scenes with different rates of change and is more robust. The recurrent neural network accumulates past knowledge of target features, so the network predictions become increasingly accurate over time.
Drawings
FIG. 1 is a schematic flow chart of the spacecraft depth completion method based on time-series optical images and lidar data;
FIG. 2 is an overall framework diagram of the time-series spacecraft depth completion neural network model; in the figure, the subscript t-2 denotes the feature state corresponding to time t-2;
FIG. 3 is a schematic flow diagram of a feature fusion module within the encoding stage depth image feature extraction module;
FIG. 4 is a block diagram of the long short-term memory network (LSTM module).
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The invention is further described below with reference to the drawings and specific examples, which are not intended to be limiting.
The invention provides a spacecraft depth completion method based on time-series optical images and lidar data, as shown in FIGS. 1 to 3, comprising:
collecting a plurality of three-dimensional models of space targets, and setting the on-orbit imaging simulation conditions of the models and the sensor parameters of a monocular visible-light camera and a lidar; obtaining a target grayscale image sequence of the monocular visible-light camera and a sparse depth image sequence of the lidar by simulation based on the three-dimensional models, and generating dense ground-truth depth labels for the images, thereby constructing a space-target time-series depth completion data set;
constructing a time-series spacecraft depth completion neural network model and performing network parameter initialization and hyper-parameter setting; the network model comprises a plurality of target depth prediction branches cascaded in chronological order;
each target depth prediction branch includes an encoding stage and a decoding stage:
the encoding stage comprises a foreground segmentation module, a grayscale image feature extraction module, a morphological preprocessing module and a depth image feature extraction module; the decoding stage comprises LSTM modules and deconvolution layers; the depth image feature extraction module comprises four feature fusion modules;
the prediction process for the spatial target depth at the time t comprises the following steps:
the sparse depth image I_{s,t} at time t is preprocessed by the morphological preprocessing module to obtain a preprocessed depth image, which is input to the depth image feature extraction module;
the target grayscale image I_{g,t} at time t is passed through the grayscale image feature extraction module to extract multi-scale grayscale feature maps of different semantic levels and resolutions;
the sparse depth image I_{s,t} and the target grayscale image I_{g,t} at time t are concatenated and input to the foreground segmentation module for target foreground segmentation, yielding a foreground-segmented image;
the depth image feature extraction module first processes the preprocessed depth image through a convolution layer and then fuses it with the grayscale feature maps level by level, from the largest scale to the smallest, to obtain the multi-modal data fusion feature F_t at time t;
the multi-modal data fusion feature F_t at time t and the corresponding level feature states at time t-1 are modelled temporally level by level through the LSTM modules and decoded through the deconvolution layers; finally, the first-level feature decoding result is combined with the foreground-segmented image to obtain the target depth prediction result at time t;
the target grayscale image sequence of the monocular visible-light camera and the sparse depth image sequence of the lidar are used as sample images to train the time-series spacecraft depth completion neural network model, yielding a trained time-series spacecraft depth completion neural network model;
the trained time-series spacecraft depth completion neural network model is embedded in a satellite-borne platform, and prediction of the dense depth of the space target and recovery of its three-dimensional structure are realized based on real-time sensing data acquired by the monocular visible-light camera and the lidar.
The three-dimensional models of space targets in this embodiment may be collected or purchased.
The time-series spacecraft depth completion neural network model adopts an encoder-decoder structure as its backbone, yielding an end-to-end trainable neural network model.
When encoding the target grayscale image and the sparse depth image, pseudo-dense depth data are obtained by morphological preprocessing and used as the input of the depth image feature extraction module, in order to prevent overly sparse data from corrupting the convolution operations.
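The patent does not specify the exact morphological operations used to densify the sparse lidar data; the following is a minimal sketch of such a preprocessing step, assuming OpenCV-style dilation and closing with illustrative kernel sizes that are not taken from the patent:

```python
import cv2
import numpy as np

def morphological_preprocess(sparse_depth: np.ndarray) -> np.ndarray:
    """Densify a sparse lidar depth map so that convolutions see fewer holes.

    sparse_depth: HxW array, 0 where no lidar return was received.
    Kernel sizes below are illustrative assumptions, not values from the patent.
    """
    depth = sparse_depth.astype(np.float32)
    # Dilate valid measurements so isolated points spread to their neighbours.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    dilated = cv2.dilate(depth, kernel)
    # Keep original measurements where present, fill holes with dilated values.
    depth = np.where(depth > 0, depth, dilated)
    # Close remaining small gaps to obtain a pseudo-dense depth image.
    depth = cv2.morphologyEx(depth, cv2.MORPH_CLOSE,
                             cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9)))
    return depth
```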
The LSTM modules in the decoding stage of this embodiment are a form of recurrent neural network (RNN); they are embedded level by level in a standard decoder structure to capture the inter-frame feature deviations of feature maps at different levels, so that the temporal correlation between adjacent frames is fully exploited to generate temporally stable dense depth maps.
In this embodiment, in order to filter out the interference of irrelevant background on spacecraft depth completion, the foreground segmentation network of the single-frame spacecraft depth completion network (SDCNet) is used to filter out the sky background.
In this embodiment, in the feature decoding stage, the LSTM modules model the temporal relation between the multi-modal data fusion feature F_t at the current time and the feature state s_{t-1} from time t-1, and output the temporally enhanced feature h_t; the temporally enhanced feature h_t is then decoded by the deconvolution layer.
Still further, as shown in FIG. 2, the grayscale image feature extraction module in this embodiment extracts from the target grayscale image I_{g,t} feature maps at five scales, namely 1/2, 1/4, 1/8, 1/16 and 1/32 of the size of I_{g,t}.
Further, referring to FIG. 2, the process by which the depth image feature extraction module obtains the multi-modal data fusion feature F_t at time t comprises:
the depth image feature extraction module first processes the preprocessed depth image through a convolution layer; the result is processed by the first-level residual module and fused with the 1/2-scale grayscale feature map by the first-level feature fusion module to obtain the first-level data fusion feature; the first-level data fusion feature is processed by the second-level residual module and fused with the 1/4-scale grayscale feature map by the second-level feature fusion module to obtain the second-level data fusion feature; the second-level data fusion feature is processed by the third-level residual module and fused with the 1/8-scale grayscale feature map by the third-level feature fusion module to obtain the third-level data fusion feature; the third-level data fusion feature is processed by the fourth-level residual module and fused with the 1/16-scale grayscale feature map by the fourth-level feature fusion module to obtain the fourth-level data fusion feature; the fourth-level data fusion feature is processed by the fifth-level residual module and added element-wise to the 1/32-scale grayscale feature map to obtain the fifth-level data fusion feature, which serves as the multi-modal data fusion feature F_t at time t.
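The patent names residual modules and attention-based fusion modules but does not fix their layer configuration; the following PyTorch sketch of the level-by-level encoder assumes simple conv-BN-ReLU stages, assumed channel widths, and a placeholder fusion block (the attention-based fusion module itself is sketched later alongside FIG. 3):

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_c, out_c, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_c, out_c, 3, stride=stride, padding=1),
        nn.BatchNorm2d(out_c), nn.ReLU(inplace=True))

class FusionBlock(nn.Module):
    """Placeholder for the attention-based feature fusion module (FIG. 3);
    approximated here by concatenation followed by a convolution."""
    def __init__(self, c):
        super().__init__()
        self.merge = conv_bn_relu(2 * c, c)
    def forward(self, depth_feat, gray_feat):
        return self.merge(torch.cat([depth_feat, gray_feat], dim=1))

class Encoder(nn.Module):
    def __init__(self, chans=(32, 64, 128, 256, 512)):  # assumed channel widths
        super().__init__()
        in_c = 1
        self.gray_stages = nn.ModuleList()
        for c in chans:                                   # grayscale pyramid: 1/2 .. 1/32
            self.gray_stages.append(conv_bn_relu(in_c, c, stride=2))
            in_c = c
        self.depth_conv = conv_bn_relu(1, chans[0], stride=2)     # initial conv, 1/2 scale
        self.res_stages = nn.ModuleList(
            [conv_bn_relu(chans[0], chans[0])] +                   # level-1 "residual" stage
            [conv_bn_relu(chans[i - 1], chans[i], stride=2) for i in range(1, 5)])
        self.fusions = nn.ModuleList([FusionBlock(chans[i]) for i in range(4)])

    def forward(self, gray, sparse_depth):
        gray_feats, x = [], gray
        for stage in self.gray_stages:
            x = stage(x)
            gray_feats.append(x)                          # 1/2, 1/4, 1/8, 1/16, 1/32
        d = self.depth_conv(sparse_depth)
        for i in range(4):                                # levels 1..4: residual stage + fusion
            d = self.fusions[i](self.res_stages[i](d), gray_feats[i])
        d = self.res_stages[4](d)                         # level 5: residual stage
        return d + gray_feats[4]                          # element-wise addition -> F_t (1/32)
```

Input height and width are assumed to be divisible by 32 so that the five pyramid levels align.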
Still further, as shown in FIGS. 2 and 3, the method for obtaining the target depth prediction result at time t in the decoding stage comprises:
the multi-modal data fusion feature F_t at time t and the fifth-level feature state s_{t-1}^5 at time t-1 are passed through the fifth-level LSTM module to model their temporal relation, yielding the fifth-level temporally enhanced feature h_t^5 and the fifth-level feature memory state c_t^5, which together form the fifth-level feature state s_t^5 at time t; h_t^5 is decoded by the fifth-level deconvolution layer to obtain the fifth-level feature decoding result; the fifth-level feature decoding result and the fourth-level feature state s_{t-1}^4 at time t-1 are passed through the fourth-level LSTM module, yielding the fourth-level temporally enhanced feature h_t^4 and feature memory state c_t^4, which together form the fourth-level feature state s_t^4; h_t^4 is decoded by the fourth-level deconvolution layer to obtain the fourth-level feature decoding result; the fourth-level feature decoding result and the third-level feature state s_{t-1}^3 at time t-1 are passed through the third-level LSTM module, yielding the third-level temporally enhanced feature h_t^3 and feature memory state c_t^3, which together form s_t^3; h_t^3 is decoded by the third-level deconvolution layer to obtain the third-level feature decoding result; the third-level feature decoding result and the second-level feature state s_{t-1}^2 at time t-1 are passed through the second-level LSTM module, yielding the second-level temporally enhanced feature h_t^2 and feature memory state c_t^2, which together form s_t^2; h_t^2 is decoded by the second-level deconvolution layer to obtain the second-level feature decoding result; the second-level feature decoding result and the first-level feature state s_{t-1}^1 at time t-1 are passed through the first-level LSTM module, yielding the first-level temporally enhanced feature h_t^1 and feature memory state c_t^1, which together form s_t^1; h_t^1 is decoded by the first-level deconvolution layer to obtain the first-level feature decoding result;
the first-level feature decoding result is combined with the foreground-segmented image to obtain the target depth prediction result at time t.
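A minimal sketch of this five-level decoding loop is given below; it assumes per-level channel widths and a ConvLSTMCell like the one sketched after the gate equations further below, so it is a schematic of the data flow rather than the exact architecture:

```python
import torch.nn as nn

class TemporalDecoder(nn.Module):
    """Sketch: one ConvLSTM cell plus one deconvolution per level, from level 5 to level 1.
    ConvLSTMCell is the cell sketched later with the LSTM gate equations."""

    def __init__(self, chans=(512, 256, 128, 64, 32)):   # assumed widths, level 5 .. level 1
        super().__init__()
        self.cells = nn.ModuleList(ConvLSTMCell(c, c) for c in chans)
        self.deconvs = nn.ModuleList(
            nn.ConvTranspose2d(chans[i], chans[i + 1] if i + 1 < len(chans) else 1,
                               4, stride=2, padding=1)
            for i in range(len(chans)))

    def forward(self, fusion_feat, prev_states):
        """fusion_feat: F_t; prev_states: per-level (h, c) from time t-1, or Nones at t=0."""
        x, new_states = fusion_feat, []
        for cell, deconv, state in zip(self.cells, self.deconvs, prev_states):
            h, c = cell(x, state)      # temporal modelling with the state from t-1
            new_states.append((h, c))
            x = deconv(h)              # feature decoding, doubling the resolution
        return x, new_states           # x: 1-channel depth map at full resolution
```

The returned depth map would then be multiplied by the foreground mask described next to obtain the final prediction at time t.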
In this embodiment, the foreground segmentation module takes the concatenation of the sparse depth image I_{s,t} and the target grayscale image I_{g,t} as input, predicts the probability that each pixel of the concatenated image belongs to the target through an encoding-decoding structure with skip connections, and sets the depth prediction of pixels whose probability is below the target threshold to 0, thereby obtaining the foreground-segmented image.
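The segmentation backbone itself follows SDCNet and is not reproduced here; the sketch below only shows how its per-pixel probability map could be applied to the predicted depth. The threshold value 0.5 is a hypothetical default, since the patent only speaks of a "target threshold":

```python
import torch

def apply_foreground_mask(pred_depth: torch.Tensor,
                          target_prob: torch.Tensor,
                          threshold: float = 0.5) -> torch.Tensor:
    """Zero out depth predictions at pixels the segmentation network deems background.

    pred_depth:  (B, 1, H, W) depth prediction from the decoder.
    target_prob: (B, 1, H, W) probability that each pixel belongs to the target.
    """
    mask = (target_prob >= threshold).to(pred_depth.dtype)
    return pred_depth * mask
```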
As an example, the simulation conditions for setting the three-dimensional model of the spatial target include three-dimensional model material parameter setting, texture map setting, illumination adding, earth background adding, starry sky background adding, setting of relative positions and relative postures of the target and the observation platform, and setting of output nodes.
Still further, during the training of the time-series spacecraft depth completion neural network model, the gradient of the network weight parameters is computed at each iteration from the error between the target depth prediction result and the dense ground-truth depth label of the image, thereby updating the network parameters.
In this embodiment, after the network parameters of the time-series spacecraft depth completion neural network model have been updated a set number of times, model performance is verified on validation-set sample data, and the network parameters corresponding to the best performance are stored as the parameters of the trained time-series spacecraft depth completion neural network model.
The stored network parameters are used as the final network weights; test data are then fed to the network to obtain depth completion results, which are compared with the ground-truth labels to evaluate network accuracy.
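A minimal training and validation loop consistent with this description is sketched below. The loss function (L1 on valid pixels), optimizer, validation interval and the model interface `model(gray, sparse, states) -> (pred, states)` are assumptions, not choices stated in the patent:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def validate(model, val_loader):
    """Hypothetical helper: mean absolute depth error over the validation set."""
    errs = []
    for gray_seq, sparse_seq, gt_seq in val_loader:        # each tensor: (B, T, 1, H, W)
        states, err = None, 0.0
        for t in range(gray_seq.shape[1]):
            pred, states = model(gray_seq[:, t], sparse_seq[:, t], states)
            valid = gt_seq[:, t] > 0
            err += (pred[valid] - gt_seq[:, t][valid]).abs().mean().item()
        errs.append(err / gray_seq.shape[1])
    return sum(errs) / len(errs)

def train(model, train_loader, val_loader, epochs=20, val_interval=500, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_err, best_weights, step = float("inf"), None, 0
    for _ in range(epochs):
        for gray_seq, sparse_seq, gt_seq in train_loader:
            states, loss = None, 0.0                        # LSTM states reset per sequence
            for t in range(gray_seq.shape[1]):
                pred, states = model(gray_seq[:, t], sparse_seq[:, t], states)
                valid = gt_seq[:, t] > 0
                loss = loss + F.l1_loss(pred[valid], gt_seq[:, t][valid])
            opt.zero_grad()
            loss.backward()
            opt.step()
            step += 1
            if step % val_interval == 0:                    # periodic validation
                err = validate(model, val_loader)
                if err < best_err:                          # keep best-performing parameters
                    best_err = err
                    best_weights = {k: v.clone() for k, v in model.state_dict().items()}
    if best_weights is not None:
        model.load_state_dict(best_weights)
    return model
```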
As an example, the target grayscale image sequence of the monocular visible-light camera and the sparse depth image sequence of the lidar are obtained by simulating the three-dimensional models of the space targets with three-dimensional rendering software.
In this embodiment, prior knowledge of the spacecraft size is introduced in the decoding stage, and pixel depths whose deviation from the lidar average ranging result exceeds a preset deviation threshold are filtered out, so as to obtain a high-quality depth prediction result.
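A sketch of this size-prior filter is shown below; the deviation threshold is left to the caller because the patent only calls it a preset value tied to the known spacecraft size:

```python
import torch

def filter_with_size_prior(pred_depth: torch.Tensor,
                           sparse_depth: torch.Tensor,
                           max_deviation: float) -> torch.Tensor:
    """Remove depth predictions that deviate too far from the lidar mean range."""
    valid = sparse_depth > 0
    mean_range = sparse_depth[valid].mean()                 # lidar average ranging result
    keep = (pred_depth - mean_range).abs() <= max_deviation
    return pred_depth * keep.to(pred_depth.dtype)
```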
The working process of the time-series spacecraft depth completion neural network model is described in detail below:
Referring to FIG. 2, at time t the target grayscale image I_{g,t} and the sparse depth image I_{s,t} are input, and the multi-source depth image feature extraction module aggregates the features of the different sensor data to obtain a multi-modal data fusion feature F_t with high representational power, which can be expressed as:

F_t = f_encoder([I_{g,t}, I_{s,t}], θ_encoder),

where f_encoder is the encoder feature extraction function and θ_encoder denotes the network parameters that the encoder structure needs to learn.
As shown in FIG. 3, the feature fusion module in the encoding stage fuses the target grayscale image features and the depth image features based on an attention mechanism, providing features with strong representational capability for the subsequent target depth decoding.
The feature fusion module mainly consists of a feature embedding layer, a cross-channel fusion layer and a spatial attention layer; it is described in detail below using the first-level feature fusion module as an example, with the corresponding first-level variables carrying the superscript 1:
The first-level feature fusion module takes the first-level grayscale image feature F_g^1 ∈ R^{C×H×W} and the first-level depth image feature F_s^1 ∈ R^{C×H×W} (C is the number of feature channels, H and W are the height and width of the feature map) as input and outputs the first-level data fusion feature F_f^1.
The feature embedding layer encodes the feature maps of the different channels to generate the corresponding feature vectors. It decomposes the grayscale feature F_g^1 and the depth feature F_s^1 into M mutually non-overlapping regions (each feature block of size S×S) and extracts the regional features with a depthwise separable convolution whose kernel size is S×S and whose stride is S. In addition, max-pooling and average-pooling operations are applied to the region blocks of F_g^1 and F_s^1 to extract regional global features; finally, the three features are concatenated to output the first-level grayscale embedded feature E_g^1 and the first-level depth embedded feature E_s^1, whose per-channel dimension is d_k = 3×H×W/S².
The cross channel attention layer takes the characteristic encoding result of the characteristic embedding layer as input, firstly adopts linear transformation to respectively calculate gray image primary embedded characteristic query vectors and depth image primary embedded characteristic key vectors of n attention heads, and can be specifically expressed as:
in the middle ofLinear mapping weight matrix for computing query vector and key vector, respectively, < >>Respectively embedding a feature query vector and a depth image embedded feature key vector for the gray level image of the ith attention head; n is the number of attention headers.
The gray image feature and depth image feature channel association weight matrix of the ith attention head is further calculated by using the scaling dot product attention, and can be specifically expressed as:
w in i The weights are associated for the feature channels of the ith attention head, softmax (·) is a normalization function.
Finally, the first-level grayscale feature F_g^1 and the first-level depth feature F_s^1 are fused channel-wise according to the association weight matrix: with h_g and h_s denoting the feature vectors obtained by flattening F_g^1 and F_s^1 row by row, the association weight w_i of the i-th attention head re-weights the flattened features, and reshape(·) transforms the result back to the spatial dimensions, giving the first-level channel fusion feature F̃_i^1 of the i-th attention head.
Finally, the channel fusion features computed by the n attention heads are concatenated, and a convolution further realizes the multi-head attention fusion to obtain the first-level channel fusion feature F_c^1, which can be expressed as:

F_c^1 = Conv([F̃_1^1; F̃_2^1; …; F̃_n^1]),

where Conv(·) is the convolution operation and [·;·] is the feature concatenation operation.
The spatial attention layer takes the first-level channel fusion feature F_c^1 and the first-level grayscale feature F_g^1 as input. Features at different spatial positions are characterized by channel average-pooling and channel max-pooling, and the pooled features are concatenated and fed to a convolution layer to obtain the first-level spatial attention weight, which can be expressed as:

W_sp^1 = σ(Conv([max_c(F_c^1); avg_c(F_c^1); max_c(F_g^1); avg_c(F_g^1)])),

where W_sp^1 is the first-level spatial attention weight; max_c(·) and avg_c(·) are the channel max-pooling and average-pooling operations, respectively; σ(·) is the Sigmoid normalization function; and Conv(·) and [·;·] are the convolution and feature concatenation operations, respectively.
Based on the first-level spatial attention weight, the features at the different spatial positions are weighted and summed to obtain the first-level fusion feature F_f^1, which can be expressed as:

F_f^1 = W_sp^1 ⊙ F_c^1 + (1 − W_sp^1) ⊙ F_g^1,

where ⊙ denotes element-wise multiplication.
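A compact PyTorch sketch of this fusion module under the reconstructions above is given below; the patch size S, head count n, per-head projection width and the exact channel-fusion formula are assumptions, and such a module could replace the placeholder FusionBlock used in the earlier encoder sketch:

```python
import torch
import torch.nn as nn

class FeatureFusionModule(nn.Module):
    """Sketch: feature embedding, cross-channel attention between grayscale and depth
    features, multi-head merge, then spatial attention. Inputs are (B, C, H, W)."""

    def __init__(self, channels, patch=4, heads=4, d_head=64):
        super().__init__()
        self.patch, self.heads = patch, heads
        # Depthwise separable conv (kernel S, stride S) used by the embedding layer.
        self.region_g = nn.Conv2d(channels, channels, patch, stride=patch, groups=channels)
        self.region_s = nn.Conv2d(channels, channels, patch, stride=patch, groups=channels)
        # Per-head query/key projections; input size 3*H*W/S^2 is resolved lazily.
        self.q_proj = nn.ModuleList(nn.LazyLinear(d_head) for _ in range(heads))
        self.k_proj = nn.ModuleList(nn.LazyLinear(d_head) for _ in range(heads))
        self.merge = nn.Conv2d(heads * channels, channels, 1)       # multi-head merge
        self.spatial = nn.Conv2d(4, 1, 7, padding=3)                # spatial attention conv

    def embed(self, feat, region_conv):
        b, c, _, _ = feat.shape
        p = self.patch
        regions = region_conv(feat).flatten(2)                      # (B, C, HW/S^2)
        patches = feat.unfold(2, p, p).unfold(3, p, p).reshape(b, c, -1, p * p)
        pooled_max = patches.max(dim=-1).values                     # region max pooling
        pooled_avg = patches.mean(dim=-1)                           # region average pooling
        return torch.cat([regions, pooled_max, pooled_avg], dim=-1) # (B, C, d_k)

    def forward(self, f_g, f_s):
        b, c, h, w = f_g.shape
        e_g, e_s = self.embed(f_g, self.region_g), self.embed(f_s, self.region_s)
        h_s = f_s.flatten(2)                                        # depth channels, flattened
        fused_heads = []
        for q_proj, k_proj in zip(self.q_proj, self.k_proj):
            q, k = q_proj(e_g), k_proj(e_s)                         # (B, C, d_head)
            w_i = torch.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
            fused_heads.append((w_i @ h_s).view(b, c, h, w))        # re-weighted depth channels
        f_c = self.merge(torch.cat(fused_heads, dim=1))             # channel fusion feature
        pooled = torch.cat([f_c.max(1, keepdim=True).values, f_c.mean(1, keepdim=True),
                            f_g.max(1, keepdim=True).values, f_g.mean(1, keepdim=True)], dim=1)
        w_sp = torch.sigmoid(self.spatial(pooled))                  # spatial attention weight
        return w_sp * f_c + (1 - w_sp) * f_g                        # first-level fusion feature
```

Feature map height and width are assumed to be divisible by the patch size S.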
Still further, in the feature encoding stage, the fourth-level fusion feature F_f^4 is processed by the fifth-level residual module and added element-wise to the 1/32-scale grayscale feature map, yielding the fifth-level data fusion feature, which serves as the multi-modal data fusion feature F_t at time t.
In the decoding stage, a long short-term memory network (LSTM module) is introduced before each deconvolution operation, as shown in FIG. 4. The LSTM module controls the flow of information between frames mainly through an input gate i_t, a forget gate f_t and an output gate o_t. The internal data processing is described in detail using the fifth-level LSTM module as an example, with the corresponding variables carrying the superscript 5:
The fifth-level LSTM module takes the same-level feature memory state c_{t-1}^5 of the previous frame, the same-level temporally enhanced feature h_{t-1}^5 of the previous frame, and the fusion feature F_t of the current frame as input, and outputs the feature memory state c_t^5 and the temporally enhanced feature h_t^5 of the current frame; c_t^5 and h_t^5 together form the feature state s_t^5 of the current frame, and h_t^5 serves as the input of the subsequent deconvolution operation. The computation of c_t^5 and h_t^5 can be expressed as:

f_t^5 = σ(W_f * [F_t; h_{t-1}^5] + b_f),
i_t^5 = σ(W_i * [F_t; h_{t-1}^5] + b_i),
o_t^5 = σ(W_o * [F_t; h_{t-1}^5] + b_o),
c_t^5 = f_t^5 ⊙ c_{t-1}^5 + i_t^5 ⊙ tanh(W_c * [F_t; h_{t-1}^5] + b_c),
h_t^5 = o_t^5 ⊙ tanh(c_t^5),

where [·;·] denotes the feature concatenation operation, * denotes the convolution operation, ⊙ denotes element-wise multiplication, σ(·) is the Sigmoid function, and W_f, W_i, W_o, W_c, b_f, b_i, b_o and b_c are the parameters of the long short-term memory network to be learned. Conv in FIG. 4 denotes a convolution operation and tanh the hyperbolic tangent function.
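These gate equations correspond to a convolutional LSTM cell; a minimal PyTorch sketch is given below (the kernel size is an assumption, and the first frame, where no previous state exists, is handled by zero initialization):

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Convolutional LSTM cell implementing the gate equations above."""

    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # One convolution produces all four gate pre-activations (f, i, o, g) at once.
        self.gates = nn.Conv2d(in_channels + hidden_channels, 4 * hidden_channels,
                               kernel_size, padding=pad)
        self.hidden_channels = hidden_channels

    def forward(self, x, state=None):
        if state is None:  # first frame: initialize h and c with zeros
            b, _, h, w = x.shape
            zeros = x.new_zeros(b, self.hidden_channels, h, w)
            state = (zeros, zeros)
        h_prev, c_prev = state
        f, i, o, g = torch.chunk(self.gates(torch.cat([x, h_prev], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c_prev + torch.sigmoid(i) * torch.tanh(g)   # memory state
        h = torch.sigmoid(o) * torch.tanh(c)        # temporally enhanced feature
        return h, c
```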
In summary, the method of the invention introduces a recurrent neural network into the decoder structure so that the network can mine changes in the target's time-series features; a multi-scale mechanism embeds the recurrent neural network level by level into the deconvolution operations of the different levels, so that the network adapts to target feature changes at different motion speeds and in different motion modes; finally, by deeply mining the temporal correlation of the sequence data, the network predictions become more stable and the depth completion accuracy continuously improves over time.
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that the different dependent claims and the features described herein may be combined in ways other than as described in the original claims. It is also to be understood that features described in connection with separate embodiments may be used in other described embodiments.
Claims (10)
1. A spacecraft depth completion method based on time-series optical images and lidar data, characterized by comprising the following steps:
collecting a plurality of three-dimensional models of space targets, and setting the simulation conditions of the models and the sensor parameters of a monocular visible-light camera and a lidar; obtaining a target grayscale image sequence of the monocular visible-light camera and a sparse depth image sequence of the lidar by simulation based on the three-dimensional models, and generating dense ground-truth depth labels for the images;
constructing a time-series spacecraft depth completion neural network model comprising a plurality of target depth prediction branches cascaded in chronological order;
each target depth prediction branch includes an encoding stage and a decoding stage:
the encoding stage comprises a foreground segmentation module, a grayscale image feature extraction module, a morphological preprocessing module and a depth image feature extraction module; the decoding stage comprises LSTM modules and deconvolution layers;
the prediction process for the spatial target depth at the time t comprises the following steps:
the sparse depth image I_{s,t} at time t is preprocessed by the morphological preprocessing module to obtain a preprocessed depth image, which is input to the depth image feature extraction module;
the target grayscale image I_{g,t} at time t is passed through the grayscale image feature extraction module to extract multi-scale grayscale feature maps of different semantic levels and resolutions;
the sparse depth image I_{s,t} and the target grayscale image I_{g,t} at time t are concatenated and input to the foreground segmentation module for target foreground segmentation, yielding a foreground-segmented image;
the depth image feature extraction module first processes the preprocessed depth image through a convolution layer and then fuses it with the grayscale feature maps level by level, from the largest scale to the smallest, to obtain the multi-modal data fusion feature F_t at time t;
the multi-modal data fusion feature F_t at time t and the corresponding level feature states at time t-1 are modelled temporally level by level through the LSTM modules and decoded through the deconvolution layers; finally, the first-level feature decoding result is combined with the foreground-segmented image to obtain the target depth prediction result at time t;
the target grayscale image sequence of the monocular visible-light camera and the sparse depth image sequence of the lidar are used as sample images to train the time-series spacecraft depth completion neural network model, yielding a trained time-series spacecraft depth completion neural network model;
the trained time-series spacecraft depth completion neural network model is embedded in a satellite-borne platform, and prediction of the dense depth of the space target is realized based on real-time sensing data acquired by the monocular visible-light camera and the lidar.
2. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 1, characterized in that the grayscale image feature extraction module extracts from the target grayscale image I_{g,t} feature maps at five scales, namely 1/2, 1/4, 1/8, 1/16 and 1/32 of the size of I_{g,t}.
3. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 2, characterized in that
the process by which the depth image feature extraction module obtains the multi-modal data fusion feature F_t at time t comprises:
the depth image feature extraction module first processes the preprocessed depth image through a convolution layer; the result is processed by the first-level residual module and fused with the 1/2-scale grayscale feature map by the first-level feature fusion module to obtain the first-level data fusion feature; the first-level data fusion feature is processed by the second-level residual module and fused with the 1/4-scale grayscale feature map by the second-level feature fusion module to obtain the second-level data fusion feature; the second-level data fusion feature is processed by the third-level residual module and fused with the 1/8-scale grayscale feature map by the third-level feature fusion module to obtain the third-level data fusion feature; the third-level data fusion feature is processed by the fourth-level residual module and fused with the 1/16-scale grayscale feature map by the fourth-level feature fusion module to obtain the fourth-level data fusion feature; the fourth-level data fusion feature is processed by the fifth-level residual module and added element-wise to the 1/32-scale grayscale feature map to obtain the fifth-level data fusion feature, which serves as the multi-modal data fusion feature F_t at time t.
4. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 3, characterized in that
the method for obtaining the target depth prediction result at time t in the decoding stage comprises:
the multi-modal data fusion feature F_t at time t and the fifth-level feature state s_{t-1}^5 at time t-1 are passed through the fifth-level LSTM module to model their temporal relation, yielding the fifth-level temporally enhanced feature h_t^5 and the fifth-level feature memory state c_t^5, which together form the fifth-level feature state s_t^5 at time t; h_t^5 is decoded by the fifth-level deconvolution layer to obtain the fifth-level feature decoding result; the fifth-level feature decoding result and the fourth-level feature state s_{t-1}^4 at time t-1 are passed through the fourth-level LSTM module, yielding the fourth-level temporally enhanced feature h_t^4 and feature memory state c_t^4, which together form the fourth-level feature state s_t^4; h_t^4 is decoded by the fourth-level deconvolution layer to obtain the fourth-level feature decoding result; the fourth-level feature decoding result and the third-level feature state s_{t-1}^3 at time t-1 are passed through the third-level LSTM module, yielding the third-level temporally enhanced feature h_t^3 and feature memory state c_t^3, which together form s_t^3; h_t^3 is decoded by the third-level deconvolution layer to obtain the third-level feature decoding result; the third-level feature decoding result and the second-level feature state s_{t-1}^2 at time t-1 are passed through the second-level LSTM module, yielding the second-level temporally enhanced feature h_t^2 and feature memory state c_t^2, which together form s_t^2; h_t^2 is decoded by the second-level deconvolution layer to obtain the second-level feature decoding result; the second-level feature decoding result and the first-level feature state s_{t-1}^1 at time t-1 are passed through the first-level LSTM module, yielding the first-level temporally enhanced feature h_t^1 and feature memory state c_t^1, which together form s_t^1; h_t^1 is decoded by the first-level deconvolution layer to obtain the first-level feature decoding result;
the first-level feature decoding result is combined with the foreground-segmented image to obtain the target depth prediction result at time t.
5. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 4, characterized in that the foreground segmentation module takes the concatenation of the sparse depth image I_{s,t} and the target grayscale image I_{g,t} as input, predicts the probability that each pixel of the concatenated image belongs to the target through an encoding-decoding structure with skip connections, and sets the depth prediction of pixels whose probability is below the target threshold to 0, thereby obtaining the foreground-segmented image.
6. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 1, characterized in that
the simulation conditions set for the three-dimensional models of the space targets include three-dimensional model material parameters, texture maps, illumination, the Earth background, the starry-sky background, the relative position and relative attitude of the target and the observation platform, and the output nodes.
7. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 1, characterized in that, during the training of the time-series spacecraft depth completion neural network model, the gradient of the network weight parameters is computed at each iteration from the error between the target depth prediction result and the dense ground-truth depth label of the image, thereby updating the network parameters.
8. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 7, characterized in that, after the network parameters of the time-series spacecraft depth completion neural network model have been updated a set number of times, model performance is verified on validation-set sample data, and the network parameters corresponding to the best performance are taken as the parameters of the trained time-series spacecraft depth completion neural network model.
9. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 1, characterized in that
the target grayscale image sequence of the monocular visible-light camera and the sparse depth image sequence of the lidar are obtained by simulating the three-dimensional models of the space targets with three-dimensional rendering software.
10. The spacecraft depth completion method based on time-series optical images and lidar data according to claim 1, characterized in that
prior knowledge of the spacecraft size is introduced in the decoding stage, and pixel depths whose deviation from the lidar average ranging result exceeds a preset deviation threshold are filtered out of the target depth prediction result.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202310393175.2A | 2023-04-13 | 2023-04-13 | Spacecraft depth completion method based on time-series optical images and lidar data
Publications (1)

Publication Number | Publication Date
---|---
CN116402874A | 2023-07-07

Family ID: 87019671

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202310393175.2A | Spacecraft depth completion method based on time-series optical images and lidar data | 2023-04-13 | 2023-04-13

Country Status (1)

Country | Link
---|---
CN | CN116402874A (en)
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117009750A (en) * | 2023-09-28 | 2023-11-07 | | Methane concentration data complement method and device for machine learning
CN117009750B (en) * | 2023-09-28 | 2024-01-02 | | Methane concentration data complement method and device for machine learning
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |