CN111598951A - Method, device and storage medium for identifying space target - Google Patents

Method, device and storage medium for identifying space target

Info

Publication number
CN111598951A
Authority
CN
China
Prior art keywords
image frame
identified
recognition model
pose
target
Prior art date
Legal status
Granted
Application number
CN202010417159.9A
Other languages
Chinese (zh)
Other versions
CN111598951B (en)
Inventor
张涛
李少朋
李林泽
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202010417159.9A
Publication of CN111598951A
Application granted
Publication of CN111598951B
Legal status: Active
Anticipated expiration

Classifications

    • G06T 7/73: Image analysis; determining position or orientation of objects or cameras using feature-based methods
    • G06F 18/2415: Pattern recognition; classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. likelihood ratio or false acceptance rate versus false rejection rate
    • G06N 3/045: Neural networks; architecture; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06V 10/40: Image or video recognition or understanding; extraction of image or video features
    • G06V 20/10: Scenes; scene-specific elements; terrestrial scenes

Abstract

The application discloses a method, a device and a storage medium for identifying a space target. Specifically, a first image frame containing a space target moving in real time is received and input into a pre-trained pose recognition model to generate the pose relation between the space target and an image acquisition device; according to this pose relation, the image acquisition device is operated to approach the space target, and an image frame to be identified, containing a characteristic part of the space target to be identified, is acquired; the image frame to be identified is then input into a pre-trained characteristic part recognition model, which generates the information of the characteristic part to be identified in that frame. By applying a pose recognition model and a characteristic part recognition model to pose recognition and characteristic part detection of a space target, the method and device offer better universality for different targets and scenes, and their accuracy and real-time performance meet the perception requirements of space targets.

Description

Method, device and storage medium for identifying space target
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a method, an apparatus, and a storage medium for identifying a spatial target.
Background
With the continuous development of space technology, space robots have become a research hotspot. A space robot generally refers to a service satellite equipped with a robotic arm. Owing to the dexterity of the manipulator, a service satellite's ability to complete various complex space tasks, such as de-tumbling, capture, docking and fine manipulation, can be improved. The planning and control of space robots in various tasks has therefore become a hot research issue. Space robots have complex dynamics and, when free-floating, are nonholonomic systems, so their planning and control problems are more complex than those of ground robots.
With the progress of artificial intelligence in recent years, its application to space robot tasks has become a new research hotspot. Artificial intelligence can improve the autonomy of a space robot to a certain extent; it is an important step in the transition from human-in-the-loop planning and control, such as teleoperation, to autonomous planning and control without human participation, and it provides a theoretical basis for research on space intelligence. The emergence of deep learning has further improved the processing capability of reinforcement learning, yielding deep reinforcement learning: under the optimization framework of reinforcement learning, deep learning is used to extract features from the experience samples obtained by interacting with the environment, which greatly improves the representation capability of reinforcement learning and gives it a promising outlook for robot planning and control. However, in the field of robot planning and control, reinforcement learning still faces problems such as high computational complexity, low sample utilization, difficulty in acquiring samples, sparse or hard-to-design reward functions, large model estimation errors, and limited real-time performance and accuracy; moreover, it imposes requirements on the type of space robot it can be applied to and lacks universality.
Disclosure of Invention
The embodiment of the application provides a method for identifying a space target, which overcomes the poor universality, weak real-time performance and low accuracy of traditional perception methods when identifying a space target in a complex space environment, and improves the universality, real-time performance and accuracy of identifying a space target in real time.
The method comprises the following steps:
receiving a first image frame containing a real-time moving spatial target;
inputting the first image frame into a pre-trained pose recognition model and generating a pose relation between the space target and an image acquisition device, wherein the pose recognition model is used for representing the pose relation between the space target and the image acquisition device;
according to the pose relation, operating the image acquisition equipment to approach the space target, and acquiring an image frame to be identified containing the characteristic part to be identified of the space target;
inputting the image frame to be recognized into a pre-trained characteristic part recognition model, and generating the characteristic part information to be recognized of the space target contained in the image frame to be recognized through the characteristic part recognition model.
Optionally, the method further comprises a training step of the pose recognition model:
acquiring a first sample image frame containing the space target moving in real time, and taking a pose relation between the space target and image acquisition equipment when the first sample image frame is acquired as first label information;
inputting the first sample image frame and the first label information into the pose recognition model to be trained, and optimizing the pose recognition model based on a first loss function generated during training, wherein the pose recognition model is a regression network model with a softmax layer as an affine regression layer.
Optionally, the method further comprises a training step of the feature recognition model:
acquiring a second sample image frame containing the identified part of the space target, and taking a binary mask image corresponding to the identified part, category information corresponding to the identified part and coordinate information of a detection frame where the identified part is located, which are contained in the second sample image frame, as second label information;
and inputting the second sample image frame and the second label information into the feature part recognition model to be trained, and optimizing the feature part recognition model based on a second loss function generated during training.
Optionally, collecting image frames of a space target model generated in a virtual environment in a simulated manner, coloring the identified part on the space target model, and respectively generating a first two-dimensional image frame projected by a coloring model corresponding to the identified part at different angles and a second two-dimensional image frame projected by an original model of the identified part before coloring;
generating a two-dimensional mask image corresponding to the identified part, category information corresponding to the identified part and coordinate information of a detection frame where the identified part is located based on the first two-dimensional image frame;
and taking the two-dimensional mask image corresponding to the identified part, the category information corresponding to the identified part and the coordinate information of the detection frame where the identified part is located as the second label information, and taking the second two-dimensional image frame as the second sample image frame.
Optionally, the second loss function is generated based on a classification layer, a regression layer and a binary mask image extraction layer which are included in a softmax layer in the feature recognition model, and based on a classification loss function generated by the classification layer, a regression loss function generated by the regression layer and a mask loss function generated by the binary mask image extraction layer;
and optimizing the characteristic part recognition model according to the second loss function.
Optionally, extracting image features in the image frame to be recognized through a convolutional neural network in the feature part recognition model, and generating an image feature map corresponding to the image frame to be recognized;
extracting coordinate information of a detection frame where the characteristic part to be identified is located in the image feature map through a region proposal network in the characteristic part identification model;
correcting the coordinate information of the detection frame where the characteristic part to be recognized is located through a full-connection layer network in the characteristic part recognition model, and determining the category information corresponding to the characteristic part to be recognized;
determining the binary mask image corresponding to the characteristic part to be identified through a full convolution layer network in the characteristic part identification model;
and determining the coordinate information of the detection frame where the characteristic part to be recognized is located, the category information corresponding to the characteristic part to be recognized and the binary mask image corresponding to the characteristic part to be recognized as the characteristic part information to be recognized.
In another embodiment of the present invention, there is provided an apparatus for identifying a spatial target, the apparatus including:
a receiving module for receiving a first image frame containing a real-time moving spatial target;
a first generation module, configured to input the first image frame into a pre-trained pose recognition model, and generate a pose relationship between the spatial target and an image acquisition device, where the pose recognition model is used to represent the pose relationship between the spatial target and the image acquisition device;
the acquisition module is used for operating the image acquisition equipment to approach the space target according to the pose relation and acquiring an image frame to be identified containing a characteristic part to be identified of the space target;
and the second generation module is used for inputting the image frame to be recognized into a pre-trained characteristic part recognition model and generating the characteristic part information to be recognized of the space target contained in the image frame to be recognized through the characteristic part recognition model.
Optionally, the apparatus further comprises a first training module, which comprises:
the acquisition unit is used for acquiring a first sample image frame containing the space target moving in real time, and taking the pose relation between the space target and the image acquisition equipment when the first sample image frame is acquired as first label information;
and the optimizing unit is used for inputting the first sample image frame and the first label information into the pose recognition model to be trained, and optimizing the pose recognition model based on a first loss function generated in the training process, wherein the pose recognition model is a regression network model of which the softmax layer is an affine regression layer.
In another embodiment of the invention, a non-transitory computer readable storage medium is provided, storing instructions that, when executed by a processor, cause the processor to perform the steps of a method of identifying a spatial target as described above.
In another embodiment of the present invention, a terminal device is provided, which includes a processor for executing the steps of the method for identifying a spatial target.
Based on the above embodiment, a first image frame containing a space target moving in real time is first received; the first image frame is then input into a pre-trained pose recognition model to generate the pose relation between the space target and the image acquisition device, the pose recognition model being used to represent that pose relation; further, according to the pose relation, the image acquisition device is operated to approach the space target, and an image frame to be identified containing the characteristic part to be identified of the space target is acquired; finally, the image frame to be identified is input into the pre-trained characteristic part recognition model, which generates the information of the characteristic part to be identified of the space target contained in the image frame to be identified. By applying the pose recognition model and the characteristic part recognition model to pose recognition and characteristic part detection of the space target, the embodiments of the application achieve better universality for different targets and scenes than traditional perception methods, and their accuracy and real-time performance meet the perception requirements of space targets.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic flow chart illustrating a method for identifying a spatial target according to an embodiment 100 of the present application;
fig. 2 is a schematic diagram illustrating a specific flow of a method for identifying a spatial target according to an embodiment 200 of the present application;
FIG. 3 is a schematic diagram illustrating a training step of a pose recognition model provided by an embodiment 300 of the present application;
FIG. 4a is a schematic diagram of a spatial object model provided by an embodiment of the present application;
FIG. 4b is a schematic diagram illustrating image frames of a corresponding spatial target at various viewing angles according to an embodiment of the present application;
FIG. 4c shows a schematic diagram of a second sample image frame and second label information provided for embodiments of the present application;
FIG. 4d is a model diagram of a feature recognition model provided for embodiments of the present application;
Fig. 5 is a schematic diagram illustrating an apparatus for identifying a spatial target according to an embodiment 500 of the present application;
fig. 6 shows a schematic diagram of a terminal device provided in embodiment 600 of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.
With the development of space robot technology, teleoperation of a space manipulator assisted by a perception system is gradually replacing manual operation. However, teleoperation is limited by the large time delay of space-to-ground information transmission, so the applicable scenarios of on-orbit teleoperation are limited and its stability is low. With continuing breakthroughs in space on-orbit servicing technology, autonomous operation control for space on-orbit service has become a new growth point in the development of aerospace technology. Detection and identification of space targets are a precondition for on-orbit service operations, so target feature detection and identification is a key technology in autonomous operation control for space on-orbit service.
Based on the problems in the prior art, the embodiment of the application provides a method for identifying a space target, mainly applicable to the technical field of computer vision. Since the detection and identification of space targets are a precondition for on-orbit service operations, target feature detection and identification is a key technology in autonomous operation control for space on-orbit service. For non-cooperative targets (space targets whose form is unknown, which cannot transmit relative motion state information to the servicing spacecraft, and which carry no cooperative markers or known grasping fixtures for a manipulator), whose features are unknown and whose state changes rapidly, the method has good generalization capability, adapts better to different targets and scenes, and meets the requirements of space target detection and identification in terms of accuracy and real-time performance. The technical solution is described in detail below with specific embodiments. Several of the following embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Fig. 1 is a schematic flowchart of a method for identifying a space target according to embodiment 100 of the present application. The detailed steps are as follows:
S11, a first image frame containing a real-time moving spatial object is received.
In this step, the received image frame is acquired by an image acquisition device, which may be a camera, a video camera, a virtual reality (VR) device or a visual sensor. Visual sensors are simple in structure, applicable to many scenes, inexpensive and highly real-time, and have become the main sensor of the space manipulator. The image acquisition equipment in the embodiment of the application mainly captures space targets, such as satellites, moving in real time on orbit in the complex and particular space environment. In addition, a space manipulator can be used to control the image acquisition equipment to capture the space target moving in real time. Meanwhile, the processing system receives the first image frame.
And S12, inputting the first image frame into a pre-trained pose recognition model and generating a pose relation between the space target and the image acquisition equipment, wherein the pose recognition model is used for representing the pose relation between the space target and the image acquisition equipment.
In this step, a regression network model is first trained through supervised learning using pre-acquired first sample image frames and first label information representing the pose relation between the space target and the image acquisition equipment, which produces the pose recognition model. The pose relation between the space target and the image acquisition equipment is then generated by this pre-trained pose recognition model. The pose relation comprises a pose vector representing the translation and rotation between the image acquisition equipment and the space target.
And S13, operating the image acquisition equipment to approach the space target according to the pose relation, and acquiring the image frame to be identified containing the characteristic part to be identified of the space target.
In this step, based on the translation and rotation in the obtained pose vector, the space manipulator is operated to bring the image acquisition equipment close to the space target, and the image acquisition equipment acquires the image frame to be identified containing the characteristic part to be identified of the space target.
And S14, inputting the image frame to be recognized into a pre-trained characteristic part recognition model, and generating the characteristic part information to be recognized of the space target contained in the image frame to be recognized through the characteristic part recognition model.
In this step, the characteristic parts of the space target in the image frame to be identified are predicted by a characteristic part recognition model trained on second sample image frames of identified parts of the space target acquired in a virtual environment, together with second label information comprising the binary mask image corresponding to the identified part, the category information corresponding to the identified part, and the coordinate information of the detection frame in which the identified part is located; this yields the information of the characteristic part to be identified. This information comprises the binary mask image corresponding to the identified characteristic part of the space target, its category information, and the coordinate information of the detection frame in which it is located.
As described above, according to this embodiment, a first image frame containing a space target moving in real time is first received; the first image frame is then input into a pre-trained pose recognition model to generate the pose relation between the space target and the image acquisition device, the pose recognition model representing that pose relation; further, according to the pose relation, the image acquisition device is operated to approach the space target, and an image frame to be identified containing the characteristic part to be identified of the space target is acquired; finally, the image frame to be identified is input into a pre-trained characteristic part recognition model, which generates the information of the characteristic part to be identified of the space target contained in the frame. By applying the pose recognition model and the characteristic part recognition model to pose recognition and characteristic part detection of the space target, the embodiments of the application achieve better universality for different targets and scenes than traditional perception methods, and their accuracy and real-time performance meet the perception requirements of space targets.
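For illustration only, the flow of S11 to S14 can be sketched in Python roughly as follows; the pose and characteristic part recognition models are assumed to behave like the sketches given later in this description, the manipulator and camera interfaces are hypothetical placeholders, and the 224 x 224 input size is an assumed choice, not a value prescribed by the embodiment.

import torch
from torchvision import transforms

# Assumed preprocessing for the pose recognition model; not prescribed by the embodiment.
to_tensor = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

def identify_space_target(first_frame, pose_model, feature_model, manipulator, camera):
    pose_model.eval()
    feature_model.eval()

    # S11: receive a first image frame (a PIL image here) containing the moving space target
    frame = to_tensor(first_frame.convert("RGB")).unsqueeze(0)

    # S12: the pose recognition model outputs a 7-D pose vector (3-D position + quaternion)
    with torch.no_grad():
        pose = pose_model(frame).squeeze(0)
    position, quaternion = pose[:3], pose[3:]

    # S13: operate the manipulator so the image acquisition device approaches the target,
    # then acquire the image frame to be identified (manipulator/camera APIs are hypothetical)
    manipulator.approach(position, quaternion)
    frame_to_identify = to_tensor(camera.capture().convert("RGB"))

    # S14: the characteristic part recognition model returns detection frames, categories and masks
    with torch.no_grad():
        detections = feature_model([frame_to_identify])[0]
    return detections["boxes"], detections["labels"], detections["masks"]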
Fig. 2 is a schematic diagram illustrating a specific flow of a method for identifying a spatial target according to an embodiment 200 of the present application. The application scenario of the embodiment of the application is mainly a scenario for identifying a space target to perform space on-orbit service. The detailed process of the specific flow is as follows:
s201, receiving a first image frame containing a space target moving in real time.
S202, inputting the first image frame into a pre-trained pose recognition model, and generating a pose relation between the space target and the image acquisition equipment.
Here, specific steps of training the pose recognition model are described in embodiment 300 of the present application.
And S203, operating the image acquisition equipment to approach the space target according to the pose relation.
Here, the processing system operates the space manipulator to bring an image capture device carried by the space manipulator into proximity with the space object based on the pose vector in the pose relationship.
And S204, acquiring an image frame to be recognized containing the characteristic part to be recognized of the space target.
Here, the characteristic part to be identified is a specific component of the space target, such as a satellite's solar panel. The image frame to be identified is likewise acquired by the image acquisition equipment controlled by the space manipulator and sent to the processing system.
S205, building a space target model in the virtual environment.
Here, a large amount of training data is the basis of deep learning methods, but building a sample data set is difficult for space targets, mainly because: images of space targets are hard to obtain, and for the same target the features change with scale, attitude and working conditions (such as shooting angle, distance and illumination), so the data set needs to contain images of the same target at different scales, attitudes and working conditions, which are difficult to collect; and for a large amount of sample data the labeling process is complex, especially for the target segmentation task: if a traditional labeling tool is used, the outline of the part to be captured must be traced, the number of labeled points is large, and the process is extremely time- and labor-consuming.
In order to solve the above problems, the embodiment of the application proposes to build a space target model with 3ds Max and to add textures to each main feature of the model in 3ds Max. Fig. 4a is a schematic diagram of a space target model provided in an embodiment of the present application. After the space target model is built and converted into the FBX format, it can be imported into Unity to set environmental conditions, and scripts can be written to simulate the motion state and to collect images. The Unity platform treats the imported object file as a GameObject, and C# scripts can be written to control its motion state. Lighting conditions can be set in Unity, and a virtual image acquisition device can be placed to capture image frames. From the configured pose relation between the image acquisition device and the space target model, image frames of the model under the corresponding viewing angles are obtained and used as the data sample source.
And S206, labeling the samples acquired in the space target model.
In this step, after the image frame is collected from the space target model generated in the virtual environment, the sample annotation of the image frame is performed. The specific process of sample labeling is as follows: and coloring the identified part on the space target model, and respectively generating a first two-dimensional image frame projected by a coloring model corresponding to the identified part under different angles and a second two-dimensional image frame projected by an original model before coloring the identified part. And generating a two-dimensional mask image corresponding to the identified part, category information corresponding to the identified part and coordinate information of a detection frame where the identified part is located based on the first two-dimensional image frame.
Specifically, 3ds Max is used to import the space target model and add textures; the model is then loaded into the virtual environment, the position and motion rule of the virtual image acquisition device and of the space target are set, coordinates are recorded according to the configured positions of the image acquisition device and the space target, and image frames of the space target under the corresponding viewing angles are obtained as the sample source. Fig. 4b is a schematic diagram of image frames of the space target at various viewing angles according to the embodiment of the present application.
Further, the labeling process for such samples is complex: if a traditional labeling tool were used, the outline of the part to be captured would have to be traced, the number of labeled points would be large, and the process would be extremely time- and labor-consuming. The embodiment of the application therefore uses the virtual environment to color the identified part of the space target model that is to be captured, and then simultaneously generates, for each viewing angle, a first two-dimensional image frame projected from the coloring model corresponding to the identified part and a second two-dimensional image frame projected from the original, uncolored model.
S207, a second sample image frame and second label information are generated.
In this step, the binary mask image corresponding to the identified part, the category information corresponding to the identified part, and the coordinate information of the detection frame in which the identified part is located are used as second label information, and the second two-dimensional image frame is used as the second sample image frame. Specifically, from the coloring model, information such as the mask of the identified part is extracted using traditional image filtering and operations such as erosion, dilation and hole filling, which generates the label information of the capturable identified part in the coloring model. The binary mask image of the identified part, its category information and the coordinate information of its detection frame, generated from the coloring model, then serve as the second label information for the identified part in the original, uncolored model. Fig. 4c is a schematic diagram of a second sample image frame and second label information according to an embodiment of the present application. The category information is output as text.
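As an illustration of this automatic labeling step, the sketch below thresholds the colored render around the painted color, cleans the mask with erosion, dilation and hole filling, and derives the detection frame; OpenCV is assumed, and the painted color, tolerance and kernel size are illustrative values, not values given by the embodiment.

import cv2
import numpy as np

def labels_from_colored_render(colored_render_path, part_color_bgr=(0, 0, 255), tol=40):
    colored = cv2.imread(colored_render_path)

    # Threshold around the color painted onto the identified part (assumed pure red here),
    # then clean the mask with erosion/dilation (opening) and hole filling (closing).
    lower = np.clip(np.array(part_color_bgr) - tol, 0, 255).astype(np.uint8)
    upper = np.clip(np.array(part_color_bgr) + tol, 0, 255).astype(np.uint8)
    mask = cv2.inRange(colored, lower, upper)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    # Bounding box of the identified part, i.e. the coordinate information of the detection frame
    ys, xs = np.nonzero(mask)
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())) if xs.size else None

    binary_mask = (mask > 0).astype(np.uint8)  # the binary mask image used as label
    return binary_mask, box

The binary mask and box returned here, together with the class name of the colored part, correspond to the second label information described above.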
And S208, constructing a characteristic part recognition model to be trained.
Here, deep learning algorithms usable for target detection fall into two broad categories: (1) two-stage detection algorithms, which split detection into two stages, first generating candidate regions (region proposals) and then classifying (and usually refining the position of) those candidates; typical representatives are the region-proposal-based R-CNN family, such as R-CNN, Fast R-CNN and Faster R-CNN, and Mask R-CNN, which additionally performs pixel-level detection and segmentation; (2) one-stage detection algorithms, which need no region proposal stage and directly regress the class probabilities and position coordinates of objects; typical algorithms include YOLO and SSD. The embodiment of the application uses an improved Mask R-CNN structure and increases the detection speed by optimizing the network head. The "head" of the network here refers to the second stage of the two-stage process, i.e. the detection and recognition part after the candidate regions have been generated.
Fig. 4d is a schematic model diagram of the characteristic part recognition model according to the embodiment of the present application. The input image frame passes through a convolutional neural network (CNN) to extract a feature map, a region proposal network (RPN) generates candidate boxes for the characteristic part to be identified, and the feature map with candidate boxes is converted to a fixed size and sent to the fully connected layer (FC layer) and the fully convolutional network (FCN) at the end of the network: the fully connected layer outputs the precise position of the characteristic part and classifies it, while the fully convolutional network decides the class of each pixel and completes the segmentation of the characteristic part. The backbone convolutional neural network used for feature extraction is a deep residual network (ResNet), composed of basic convolutional layers, pooling layers and other neural network structures.
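For reference, a model of this kind (CNN backbone, RPN, fully connected box/class head and fully convolutional mask head) can be instantiated with torchvision's Mask R-CNN implementation; the sketch below is illustrative only, and the number of classes is an assumed example rather than a value from the embodiment.

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

NUM_CLASSES = 4  # assumed: e.g. solar panel, antenna, docking ring + background

def build_feature_part_recognition_model(num_classes=NUM_CLASSES):
    # ResNet-50 FPN backbone + region proposal network + box/class and mask heads
    # (weights="DEFAULT" requires torchvision >= 0.13)
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

    # Replace the box classification/regression head for the chosen class count
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # Replace the fully convolutional mask head for the chosen class count
    in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)
    return model

torchvision's backbone is an FPN-enhanced ResNet, which differs in detail from Fig. 4d but realizes the same CNN + RPN + FC/FCN head arrangement.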
S209, inputting the second sample image frame and the second label information into the feature part recognition model to be trained, and generating a corresponding second loss function.
Here, the acquired second sample image frame containing the identified part of the space target, together with the second label information comprising the binary mask image corresponding to the identified part, the category information corresponding to the identified part and the coordinate information of the detection frame in which the identified part is located, is input into the characteristic part recognition model to be trained. Specifically, the softmax layer in the characteristic part recognition model is replaced by a classification layer, a regression layer and a binary mask image extraction layer. Based on the classification loss function L_{cls} generated by the classification layer, the regression loss function L_{reg} generated by the regression layer and the mask loss function L_{mask} generated by the binary mask image extraction layer, the second loss function Loss is

Loss = L_{cls} + L_{reg} + L_{mask}
Further, the classification loss accounts for two processes: the RPN extracting the target region (a binary foreground/background classification) and the judgment of the target region's category (a multi-class classification). Both use the cross-entropy loss, which can be written as

L_{cls} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=0}^{K} y_{i,k}\,\log p_{i,k}

where N is the number of samples, K is the number of object classes in the category information (K object classes plus background give K + 1 labels), y_{i,k} indicates whether k is the true class label of sample i, and p_{i,k} is the predicted probability that sample i belongs to class k.
The regression loss function likewise covers both processes and is

L_{reg} = \frac{1}{N_{reg}}\sum_{i} p_i\,\mathrm{smooth}_{L_1}(t_i - t_i^{*})

where t_i denotes the translation and scaling parameters of the real detection box (relative to the anchor), t_i^{*} denotes the predicted translation and scaling parameters, and p_i marks foreground or background; no loss is computed when the region is background (i.e. no object is detected). The smooth L1 term is

\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5\,x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}
The mask loss is the average binary cross-entropy loss: a region of interest (ROI) extracted by the RPN is taken as input, and K binary masks of resolution m x m are output, i.e. one binary mask for each of the K classes; a sigmoid function is applied to each pixel, which avoids competition between classes.
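A minimal sketch of combining the three terms into the second loss function is given below; the tensor shapes, the foreground indicator convention and the use of PyTorch's built-in loss functions are assumptions for illustration (the RPN's own classification and regression terms are omitted), not the embodiment's exact implementation.

import torch
import torch.nn.functional as F

def second_loss(class_logits, class_targets,       # [N, K+1], [N]
                box_deltas, box_targets, fg_mask,  # [N, 4], [N, 4], [N] bool
                mask_logits, mask_targets):        # [N, m, m] logits/targets for the true class
    # L_cls: cross-entropy over K object classes plus background
    l_cls = F.cross_entropy(class_logits, class_targets)

    # L_reg: smooth L1 on box parameters, foreground samples only (no penalty for background)
    if fg_mask.any():
        l_reg = F.smooth_l1_loss(box_deltas[fg_mask], box_targets[fg_mask])
    else:
        l_reg = box_deltas.sum() * 0.0

    # L_mask: average per-pixel binary cross-entropy (sigmoid per pixel)
    l_mask = F.binary_cross_entropy_with_logits(mask_logits, mask_targets.float())

    return l_cls + l_reg + l_mask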
And S210, optimizing the characteristic part recognition model based on the second loss function generated in the training process, and generating the optimized characteristic part recognition model.
S211, inputting the image frame to be recognized into the feature part recognition model, and generating the feature part information to be recognized of the space target contained in the image frame to be recognized.
Extracting image features in the image frame to be identified through the convolutional neural network in the characteristic part recognition model and generating the corresponding image feature map; extracting the coordinate information of the detection frame in which the characteristic part to be identified is located from the image feature map through the region proposal network in the model; correcting that coordinate information and determining the category information corresponding to the characteristic part through the fully connected layer network in the model; determining the binary mask image corresponding to the characteristic part through the fully convolutional network in the model; and taking the coordinate information of the detection frame, the category information and the binary mask image as the information of the characteristic part to be identified.
Specifically, detecting the important features of a space target requires pixel-level segmentation of the characteristic part, so it is more suitable to extend a region-based convolutional neural network (R-CNN) algorithm for this function. The general idea is as follows: first a backbone convolutional neural network (CNN) extracts image features; then the coordinate information of detection frames is extracted from the image feature map (the detection frames are generated by a region proposal network (RPN)); finally the fully connected layer of the head network classifies and corrects the detection frames, and the fully convolutional layer of the head network predicts the binary mask image (judging whether each pixel is foreground or background).
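For illustration, the sketch below turns the raw outputs of a torchvision-style Mask R-CNN into the detection frame coordinates, category information and binary mask images described above; the 0.5 score and mask thresholds are assumed values.

import torch

@torch.no_grad()
def extract_feature_part_info(model, image_tensor, score_thresh=0.5, mask_thresh=0.5):
    model.eval()
    out = model([image_tensor])[0]  # torchvision detection models return one dict per image

    keep = out["scores"] >= score_thresh
    boxes = out["boxes"][keep]    # detection frame coordinates (x1, y1, x2, y2)
    labels = out["labels"][keep]  # category information
    masks = (out["masks"][keep, 0] >= mask_thresh).to(torch.uint8)  # binary mask images
    return boxes, labels, masks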
The embodiment of the application is based on deep learning: deep neural networks for characteristic part perception and for target pose recognition are designed and built respectively, and visual information is fused through these deep neural networks to complete intelligent pose recognition and important feature detection of space targets. On the basis of the characteristic part recognition model, a virtual environment is built to address the fact that deep learning needs large-scale data samples while sample data of space targets is difficult to collect; and to address the time and labor cost of traditional labeling, the mask of the part to be identified is extracted by coloring the characteristic part of the model and applying traditional image filtering, thereby completing automatic labeling of the samples. The method has better universality for different targets and scenes, and its accuracy and real-time performance also meet the perception requirements of space targets.
Fig. 3 is a schematic diagram illustrating a training procedure of a pose recognition model provided in embodiment 300 of the present application. The detailed process of the specific flow is as follows:
s301, a first image frame containing a real-time moving space target is acquired.
S302, recording the pose relationship between the space target and the image acquisition equipment at the current moment when the first graphic frame is acquired, and taking the pose relationship as first label information.
The pose relation comprises a pose vector representing the translation and rotation between the image acquisition device and the space target. The pose vector contains the three-dimensional position x of the image acquisition device (three spatial coordinates) and its orientation q represented by a quaternion (four components). After a first sample image frame containing the real-time moving space target has been acquired, the pose relation between the space target and the image acquisition device at the time of acquisition is used as the first label information.
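A minimal sketch of pairing each first sample image frame with its 7-dimensional pose label [x, y, z, qw, qx, qy, qz] is given below; the CSV layout, file paths and 224 x 224 input size are assumptions for illustration.

import csv
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class PoseSampleDataset(Dataset):
    def __init__(self, csv_path, transform=None):
        # Each CSV row is assumed to be: image_path, x, y, z, qw, qx, qy, qz
        with open(csv_path) as f:
            self.rows = list(csv.reader(f))
        self.transform = transform or transforms.Compose(
            [transforms.Resize((224, 224)), transforms.ToTensor()])

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        path, *pose = self.rows[idx]
        image = self.transform(Image.open(path).convert("RGB"))
        label = torch.tensor([float(v) for v in pose])  # the first label information
        return image, label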
S303, the first sample image frame and the first label information are input into the pose recognition model to be trained, and a first loss function corresponding to the pose recognition model is generated based on the output result.
Here, the network used for supervised learning is GoogLeNet, and the pose recognition model to be trained is a regression network model in which the softmax layer is replaced by an affine regression layer. Specifically, the regression network model modifies the softmax layer of the network into an affine regression layer and outputs the pose vector contained in the pose relation (7-dimensional: 3 position components and 4 orientation components), i.e. the position x of the image acquisition device and its orientation q represented by a quaternion. The first loss function for model training is

L_{1} = \lVert \hat{x} - x \rVert_{2} + \beta\,\lVert \hat{q} - q \rVert_{2}

where (\hat{x}, \hat{q}) is the actual pose relationship and β is a scale factor balancing the position and orientation terms.
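As an illustration of the affine regression head and the loss above, the following PoseNet-style sketch replaces the final classification layer of a GoogLeNet backbone with a 7-dimensional regression layer; the pretrained weights option, the quaternion normalization and the value of β are assumptions, not values prescribed by the embodiment.

import torch
import torch.nn as nn
import torchvision

class PoseRegressionNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.googlenet(weights="DEFAULT")
        # Replace the softmax/classification head with a 7-D affine regression layer
        backbone.fc = nn.Linear(backbone.fc.in_features, 7)  # 3 position + 4 quaternion
        self.backbone = backbone

    def forward(self, x):
        out = self.backbone(x)
        # In training mode torchvision's GoogLeNet returns a namedtuple with auxiliary outputs
        return out.logits if hasattr(out, "logits") else out

def pose_loss(pred, target, beta=500.0):
    x, q = pred[:, :3], nn.functional.normalize(pred[:, 3:], dim=1)  # unit quaternion (assumed)
    x_gt, q_gt = target[:, :3], target[:, 3:]
    return torch.norm(x - x_gt, dim=1).mean() + beta * torch.norm(q - q_gt, dim=1).mean()

In a training loop, pose_loss(PoseRegressionNet()(images), labels) would simply be back-propagated over batches drawn from a dataset such as the one sketched under S302.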
S304, optimizing the pose recognition model based on the first loss function generated in the training process, and generating a final pose recognition model.
The method for identifying a space target is realized based on the above steps. Applying this intelligent method to feature detection and pose recognition of space targets overcomes the poor universality, poor real-time performance and low accuracy that traditional perception methods exhibit for non-cooperative targets in a complex space environment. Further, for the difficulty of collecting and labeling data samples of space targets, a method is provided for building a virtual environment to generate a large amount of sample data and for automatic batch labeling by means of traditional image processing. For an actual task scenario, the construction of the virtual environment is similar to the process above; to learn the features of the target to be identified more fully, more kinds of spacecraft models can be introduced to produce a more complete space target data set, or a dedicated recognition network can be trained for a specific target and its parameters uploaded to the on-board equipment, so that target feature detection and identification tasks can be realized flexibly. In the characteristic part recognition model for important feature detection, besides target localization and classification, instance segmentation is introduced to facilitate subsequent further operations on the characteristic part.
Based on the same inventive concept, the embodiment 500 of the present application further provides an apparatus for identifying a spatial object, wherein as shown in fig. 5, the apparatus includes:
a receiving module 51 for receiving a first image frame containing a real-time moving spatial object;
a first generating module 52, configured to input the first image frame into a pre-trained pose recognition model, and generate a pose relationship between the spatial target and the image capturing device, where the pose recognition model is used to represent the pose relationship between the spatial target and the image capturing device;
the obtaining module 53 is configured to operate the image acquisition device to approach the space target according to the pose relation, and to obtain an image frame to be identified containing a characteristic part to be identified of the space target;
and a second generating module 54, configured to input the image frame to be recognized into a feature recognition model trained in advance, and generate feature information to be recognized of the spatial target included in the image frame to be recognized through the feature recognition model.
In this embodiment, specific functions and interaction manners of the receiving module 51, the first generating module 52, the obtaining module 53 and the second generating module 54 may refer to the description of the embodiment corresponding to fig. 1, and are not described herein again.
Optionally, the apparatus further comprises a first training module 55:
the acquisition unit is used for acquiring a first sample image frame containing the real-time moving space target and taking the pose relation between the space target and the image acquisition equipment when the first sample image frame is acquired as first label information;
and the optimizing unit is used for inputting the first sample image frame and the first label information into a pose recognition model to be trained, and optimizing the pose recognition model based on a first loss function generated during training, wherein the pose recognition model is a regression network model with a softmax layer as an affine regression layer.
As shown in fig. 6, another embodiment 600 of the present application further provides a terminal device, which includes a processor 601, where the processor 601 is configured to execute the steps of the method for identifying a spatial target. As can also be seen from fig. 6, the terminal device provided by the above embodiment further comprises a non-transitory computer readable storage medium 602, the non-transitory computer readable storage medium 602 having stored thereon a computer program, which when executed by the processor 601, performs the above steps of a method for identifying a spatial target. In practice, the terminal device may be one or more computers, as long as the computer-readable medium and the processor are included.
In particular, the storage medium can be a general-purpose storage medium, such as a removable disk, a hard disk, a FLASH, etc., and when executed, the computer program on the storage medium can perform the steps of the above-mentioned method for identifying a spatial object. In practical applications, the computer readable medium may be included in the apparatus/device/system described in the above embodiments, or may exist alone without being assembled into the apparatus/device/system. The computer readable storage medium carries one or more programs which, when executed, enable performing the steps of a method of identifying a spatial target as described above.
According to embodiments disclosed herein, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example and without limitation: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing, without limiting the scope of the present disclosure. In the embodiments disclosed herein, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The flowchart and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments disclosed herein. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or combinations of features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or combinations are not explicitly recited in the present application. In particular, the features recited in the various embodiments and/or claims of the present application may be combined and/or coupled in various ways, all of which fall within the scope of the present disclosure, without departing from the spirit and teachings of the present application.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present application, and are used for illustrating the technical solutions of the present application, but not limiting the same, and the scope of the present application is not limited thereto, and although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can still change or easily conceive of the technical solutions described in the foregoing embodiments or equivalent replacement of some technical features thereof within the technical scope disclosed in the present application; such changes, variations and substitutions do not depart from the spirit and scope of the exemplary embodiments of the present application and are intended to be covered by the appended claims. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of identifying a spatial target, comprising:
receiving a first image frame containing a real-time moving spatial target;
inputting the first image frame into a pre-trained pose recognition model and generating a pose relation between the space target and an image acquisition device, wherein the pose recognition model is used for representing the pose relation between the space target and the image acquisition device;
according to the pose relation, operating the image acquisition equipment to approach the space target, and acquiring an image frame to be identified containing the characteristic part to be identified of the space target;
inputting the image frame to be recognized into a pre-trained characteristic part recognition model, and generating the characteristic part information to be recognized of the space target contained in the image frame to be recognized through the characteristic part recognition model.
2. The method of claim 1, wherein prior to the step of inputting the first image frame into a pre-trained pose recognition model, the method further comprises a training step of the pose recognition model:
acquiring a first sample image frame containing the space target moving in real time, and taking a pose relation between the space target and image acquisition equipment when the first sample image frame is acquired as first label information;
inputting the first sample image frame and the first label information into the pose recognition model to be trained, and optimizing the pose recognition model based on a first loss function generated during training, wherein the pose recognition model is a regression network model with a softmax layer as an affine regression layer.
3. The method according to claim 1, wherein the step of inputting the image frame to be recognized into a pre-trained feature recognition model is preceded by a training step of the feature recognition model, comprising:
acquiring a second sample image frame containing the identified part of the space target, and taking a binary mask image corresponding to the identified part, category information corresponding to the identified part and coordinate information of a detection frame where the identified part is located, which are contained in the second sample image frame, as second label information;
and inputting the second sample image frame and the second label information into the feature part recognition model to be trained, and optimizing the feature part recognition model based on a second loss function generated during training.
4. The method according to claim 3, wherein the step of acquiring a second sample image frame containing the identified part of the spatial target, and taking, as the second label information, the binary mask image corresponding to the identified part, the category information corresponding to the identified part, and the coordinate information of the detection frame in which the identified part is located comprises:
acquiring, by simulation, image frames of a spatial target model generated in a virtual environment, coloring the identified part on the spatial target model, and generating, at different viewing angles, a first two-dimensional image frame projected from the colored model corresponding to the identified part and a second two-dimensional image frame projected from the original model before the identified part is colored;
generating, based on the first two-dimensional image frame, the binary mask image corresponding to the identified part, the category information corresponding to the identified part, and the coordinate information of the detection frame in which the identified part is located;
taking the binary mask image corresponding to the identified part, the category information corresponding to the identified part, and the coordinate information of the detection frame in which the identified part is located as the second label information, and taking the second two-dimensional image frame as the second sample image frame.
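A minimal sketch of how the second label information could be derived from the two renders described in claim 4: pixels that differ between the colored and original projections give the binary mask of the identified part, and the detection frame follows from the mask. The pixel-difference threshold and the label layout are assumptions:

```python
import numpy as np

def make_second_label(colored_render, original_render, category_id, tol=30):
    """colored_render, original_render: (H, W, 3) uint8 renders at the same viewing angle."""
    # Pixels that changed after coloring belong to the identified part.
    diff = np.abs(colored_render.astype(int) - original_render.astype(int)).sum(axis=2)
    binary_mask = (diff > tol).astype(np.uint8)

    ys, xs = np.nonzero(binary_mask)
    if xs.size == 0:
        return None  # the identified part is not visible at this viewing angle
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))  # detection frame (x1, y1, x2, y2)

    # Second label information; the un-colored render serves as the second sample image frame.
    return {"mask": binary_mask, "category": category_id, "box": box,
            "sample_frame": original_render}
```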
5. The method of claim 3, wherein the step of optimizing the feature part recognition model based on the second loss function generated during training comprises:
generating the second loss function based on the classification loss function generated by the classification layer, the regression loss function generated by the regression layer, and the mask loss function generated by the binary mask image extraction layer, wherein the classification layer, the regression layer and the binary mask image extraction layer are included in the softmax layer of the feature part recognition model;
optimizing the feature part recognition model according to the second loss function.
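A minimal sketch of such a second loss function, composed of the three head losses; the unweighted sum and the particular loss choices (cross-entropy, smooth L1, per-pixel binary cross-entropy) are assumptions consistent with common detection heads:

```python
import torch.nn.functional as F

def second_loss(class_logits, class_labels, box_pred, box_targets, mask_logits, mask_targets):
    cls_loss = F.cross_entropy(class_logits, class_labels)                     # classification layer
    reg_loss = F.smooth_l1_loss(box_pred, box_targets)                         # regression layer
    mask_loss = F.binary_cross_entropy_with_logits(mask_logits, mask_targets)  # binary mask extraction layer
    return cls_loss + reg_loss + mask_loss
```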
6. The method according to claim 3, wherein the step of generating, through the feature part recognition model, the feature part information to be identified of the spatial target contained in the image frame to be identified comprises:
extracting image features from the image frame to be identified through a convolutional neural network in the feature part recognition model, and generating an image feature map corresponding to the image frame to be identified;
extracting, through a region generation network in the feature part recognition model, the coordinate information of the detection frame in which the feature part to be identified is located in the image feature map;
correcting, through a fully connected layer network in the feature part recognition model, the coordinate information of the detection frame in which the feature part to be identified is located, and determining the category information corresponding to the feature part to be identified;
determining, through a fully convolutional layer network in the feature part recognition model, the binary mask image corresponding to the feature part to be identified;
determining the coordinate information of the detection frame in which the feature part to be identified is located, the category information corresponding to the feature part to be identified, and the binary mask image corresponding to the feature part to be identified as the feature part information to be identified.
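The structure recited in claim 6 (convolutional backbone, region generation network, fully connected head for box correction and classification, fully convolutional mask head) matches a Mask R-CNN-style detector. A sketch using torchvision's maskrcnn_resnet50_fpn as a stand-in for the feature part recognition model; the number of classes and the score threshold are assumptions:

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Background plus four assumed feature-part categories.
model = maskrcnn_resnet50_fpn(weights=None, num_classes=5)
model.eval()

def identify_feature_parts(frame, score_thresh=0.5):
    """frame: (3, H, W) float tensor in [0, 1]; returns detection frames, categories and binary masks."""
    with torch.no_grad():
        output = model([frame])[0]  # dict with 'boxes', 'labels', 'scores', 'masks'
    keep = output["scores"] > score_thresh
    boxes = output["boxes"][keep]                              # detection-frame coordinates
    labels = output["labels"][keep]                            # category information
    masks = (output["masks"][keep, 0] > 0.5).to(torch.uint8)   # binary mask images
    return boxes, labels, masks
```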
7. An apparatus for identifying a spatial target, the apparatus comprising:
a receiving module, configured to receive a first image frame containing a spatial target moving in real time;
a first generation module, configured to input the first image frame into a pre-trained pose recognition model and to generate a pose relation between the spatial target and an image acquisition device, wherein the pose recognition model is used for representing the pose relation between the spatial target and the image acquisition device;
an acquisition module, configured to operate the image acquisition device to approach the spatial target according to the pose relation, and to acquire an image frame to be identified containing a feature part to be identified of the spatial target;
a second generation module, configured to input the image frame to be identified into a pre-trained feature part recognition model and to generate, through the feature part recognition model, the feature part information to be identified of the spatial target contained in the image frame to be identified.
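A minimal sketch mapping the four modules of the claimed apparatus onto one class; `camera` and `controller` are assumed injected dependencies standing in for the image acquisition device and its operation, not names from the claim:

```python
import torch

class SpatialTargetIdentifier:
    def __init__(self, pose_model, feature_model, camera, controller):
        self.pose_model = pose_model        # used by the first generation module
        self.feature_model = feature_model  # used by the second generation module
        self.camera = camera                # image acquisition device
        self.controller = controller        # operates the device toward the target

    def receive_frame(self):                # receiving module
        return self.camera.capture()

    def estimate_pose(self, first_frame):   # first generation module
        with torch.no_grad():
            return self.pose_model(first_frame.unsqueeze(0)).squeeze(0)

    def acquire_frame(self, pose):          # acquisition module
        self.controller.approach(pose)
        return self.camera.capture()

    def identify(self, frame):              # second generation module
        with torch.no_grad():
            return self.feature_model(frame.unsqueeze(0))
```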
8. The apparatus of claim 7, further comprising a first training module comprising:
an acquisition unit, configured to acquire a first sample image frame containing the spatial target moving in real time, and to take, as first label information, the pose relation between the spatial target and the image acquisition device at the time the first sample image frame is acquired;
an optimization unit, configured to input the first sample image frame and the first label information into the pose recognition model to be trained, and to optimize the pose recognition model based on a first loss function generated during training, wherein the pose recognition model is a regression network model whose softmax layer is replaced by an affine regression layer.
9. A non-transitory computer readable storage medium storing instructions which, when executed by a processor, cause the processor to perform the steps of a method of identifying a spatial target as claimed in any one of claims 1 to 6.
10. A terminal device, characterized in that it comprises a processor configured to carry out the steps of the method of identifying a spatial target according to any one of claims 1 to 6.
CN202010417159.9A 2020-05-18 2020-05-18 Method, device and storage medium for identifying space target Active CN111598951B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010417159.9A CN111598951B (en) 2020-05-18 2020-05-18 Method, device and storage medium for identifying space target

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010417159.9A CN111598951B (en) 2020-05-18 2020-05-18 Method, device and storage medium for identifying space target

Publications (2)

Publication Number Publication Date
CN111598951A true CN111598951A (en) 2020-08-28
CN111598951B CN111598951B (en) 2022-09-30

Family

ID=72182858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010417159.9A Active CN111598951B (en) 2020-05-18 2020-05-18 Method, device and storage medium for identifying space target

Country Status (1)

Country Link
CN (1) CN111598951B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108399362A (en) * 2018-01-24 2018-08-14 中山大学 A kind of rapid pedestrian detection method and device
CN109087315A (en) * 2018-08-22 2018-12-25 中国科学院电子学研究所 A kind of image recognition localization method based on convolutional neural networks
US20200065976A1 (en) * 2018-08-23 2020-02-27 Seoul National University R&Db Foundation Method and system for real-time target tracking based on deep learning
CN110378325A (en) * 2019-06-20 2019-10-25 西北工业大学 A kind of object pose recognition methods during robot crawl
CN111046880A (en) * 2019-11-28 2020-04-21 中国船舶重工集团公司第七一七研究所 Infrared target image segmentation method and system, electronic device and storage medium
CN111127557A (en) * 2019-12-13 2020-05-08 中国电子科技集团公司第二十研究所 Visual SLAM front-end attitude estimation method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZEMING LI et al.: "Light-Head R-CNN: In Defense of Two-Stage Object Detector", arXiv:1711.07264v2 *
LI Shaopeng et al.: "A Review of the Application of Deep Learning in Visual SLAM", Aerospace Control and Application *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465035A (en) * 2020-11-30 2021-03-09 上海寻梦信息技术有限公司 Logistics distribution task allocation method, system, equipment and storage medium
CN112525145A (en) * 2020-11-30 2021-03-19 北京航空航天大学 Aircraft landing relative attitude dynamic vision measurement method and system
CN112525145B (en) * 2020-11-30 2022-05-17 北京航空航天大学 Aircraft landing relative attitude dynamic vision measurement method and system
CN112731522A (en) * 2020-12-14 2021-04-30 中国地质大学(武汉) Intelligent recognition method, device and equipment for seismic stratum and storage medium
CN112927291A (en) * 2021-03-03 2021-06-08 联想(北京)有限公司 Pose determination method and device of three-dimensional object, electronic equipment and storage medium
CN112927291B (en) * 2021-03-03 2024-03-01 联想(北京)有限公司 Pose determining method and device of three-dimensional object, electronic equipment and storage medium
CN113139470A (en) * 2021-04-25 2021-07-20 安徽工业大学 Glass identification method based on Transformer
CN113134839A (en) * 2021-04-26 2021-07-20 湘潭大学 Robot precision flexible assembly method based on vision and force position image learning
CN113989619A (en) * 2021-11-05 2022-01-28 中科三清科技有限公司 Storage tank prediction method and device based on deep learning recognition model
CN113989255A (en) * 2021-11-05 2022-01-28 中国地质大学(北京) Subway tunnel lining shedding recognition model training method and recognition method based on Mask-RCNN
CN114419327A (en) * 2022-01-18 2022-04-29 北京百度网讯科技有限公司 Image detection method and training method and device of image detection model
CN114758422A (en) * 2022-06-15 2022-07-15 清华大学 Real-time intelligent identification method and device for actions of construction machinery equipment

Also Published As

Publication number Publication date
CN111598951B (en) 2022-09-30

Similar Documents

Publication Publication Date Title
CN111598951B (en) Method, device and storage medium for identifying space target
CN107239728B (en) Unmanned aerial vehicle interaction device and method based on deep learning attitude estimation
CN111414797B (en) System and method for estimating pose and pose information of an object
Miclea et al. Monocular depth estimation with improved long-range accuracy for UAV environment perception
Liu et al. Using unsupervised deep learning technique for monocular visual odometry
CN111062263B (en) Method, apparatus, computer apparatus and storage medium for hand gesture estimation
CN114912287A (en) Robot autonomous grabbing simulation system and method based on target 6D pose estimation
CN112949452B (en) Robot low-light environment grabbing detection method based on multitask shared network
CN113392584B (en) Visual navigation method based on deep reinforcement learning and direction estimation
CN112340063B (en) Satellite despinning method based on deep reinforcement learning
Billings et al. SilhoNet-fisheye: Adaptation of a ROI based object pose estimation network to monocular fisheye images
Yusefi et al. Orb-slam-based 2d reconstruction of environment for indoor autonomous navigation of uavs
CN111531546B (en) Robot pose estimation method, device, equipment and storage medium
CN113222961A (en) Intelligent ship body detection system and method
Dang et al. Perfc: An efficient 2d and 3d perception software-hardware framework for mobile cobot
CN115131407B (en) Robot target tracking method, device and equipment oriented to digital simulation environment
CN112045680A (en) Cloth stacking robot control system and control method based on behavior cloning
CN116518973A (en) Robot vision language navigation method suitable for real indoor environment
CN113551661A (en) Pose identification and track planning method, device and system, storage medium and equipment
CN115219492B (en) Appearance image acquisition method and device for three-dimensional object
CN116852347A (en) State estimation and decision control method for non-cooperative target autonomous grabbing
Parente et al. Conformalized multimodal uncertainty regression and reasoning
Tang et al. Online camera-gimbal-odometry system extrinsic calibration for fixed-wing UAV swarms
Hu et al. A minimal dataset construction method based on similar training for capture position recognition of space robot
CN114640785A (en) Site model updating method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant