CN116363217A - Method, device, computer equipment and medium for measuring pose of space non-cooperative target - Google Patents

Method, device, computer equipment and medium for measuring pose of space non-cooperative target Download PDF

Info

Publication number
CN116363217A
CN116363217A (application number CN202310638984.5A; granted publication CN116363217B)
Authority
CN
China
Prior art keywords
semantic key
key point
cooperative target
semantic
coordinate system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310638984.5A
Other languages
Chinese (zh)
Other versions
CN116363217B (en)
Inventor
王梓
余英建
李璋
苏昂
于起峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202310638984.5A priority Critical patent/CN116363217B/en
Publication of CN116363217A publication Critical patent/CN116363217A/en
Application granted granted Critical
Publication of CN116363217B publication Critical patent/CN116363217B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/24Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for cosmonautical navigation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes

Abstract

The invention provides a method, a device, computer equipment and a medium for measuring the pose of a spatial non-cooperative target, relating to relative navigation and positioning in space and to non-cooperative target pose measurement in the field of computer vision. The method selects semantic key points on the spatial non-cooperative target, constructs a training data set, builds a deep neural network that predicts a semantic key point set, and trains the network on the training data set. After training is complete, the trained network predicts the semantic key point set in an input image; a weighted perspective-n-point problem is then established from the prediction and solved to obtain the position and attitude of the non-cooperative target in the input image under the camera coordinate system. The method adapts to the complex, uncontrolled space environment and achieves reliable prediction of the position and attitude of the spatial non-cooperative target in the camera coordinate system.

Description

Method, device, computer equipment and medium for measuring pose of space non-cooperative target
Technical Field
The invention mainly relates to the technical field of computer vision measurement for spacecraft, in particular to a method, a device, computer equipment and a medium for measuring the pose of a spatial non-cooperative target.
Background
With the rapid development of space technology, tasks such as formation flying, servicing defunct satellites and removing space debris require measuring the position and attitude of a target spacecraft relative to a service spacecraft. Existing methods obtain the relative position and attitude by predicting the image positions of semantic key points defined on the target spacecraft and then solving a perspective-n-point problem.
However, existing methods treat each key point as an independent target and train a deep neural network to predict each key point's image position separately. They lack holistic modeling of the spacecraft and adapt poorly to the complex, uncontrolled space environment.
Disclosure of Invention
Aiming at the technical problems existing in the prior art, the invention provides a method, a device, computer equipment and a medium for measuring the pose of a space non-cooperative target.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
in one aspect, the present invention provides a method for measuring pose of a spatial non-cooperative target, including:
acquiring coordinates of each semantic key point on the space non-cooperative target under a space non-cooperative target body coordinate system;
acquiring sample images containing space non-cooperative targets, projecting coordinates of all semantic key points under a space non-cooperative target body coordinate system to an image coordinate system to obtain coordinates of all semantic key points in each sample image, obtaining a semantic key point true value set of the space non-cooperative targets on each sample image, and constructing a training data set;
constructing a deep neural network for predicting a semantic key point set;
training the deep neural network by using the training data set until training converges;
predicting a semantic key point set of a space non-cooperative target in an input image by using the trained deep neural network to obtain a corresponding relation between coordinates of each semantic key point in the predicted semantic key point set in an image coordinate system and coordinates of each semantic key point in the space non-cooperative target body coordinate system;
and solving the position and the gesture of the spatial non-cooperative target in the input image under a camera coordinate system based on the corresponding relation.
Further, the number of semantic key points on the spatial non-cooperative target is at least 4.
Further, the semantic key point true value set of the spatial non-cooperative target on each sample image consists of all semantic key point elements. The $i$-th semantic key point element is composed of an index classification item $c_i$ describing the correspondence between the element and a semantic key point in the spatial non-cooperative target body coordinate system, an X-axis image coordinate classification item $u_i$ describing the element's coordinate on the image X axis, and a Y-axis image coordinate classification item $v_i$ describing the element's coordinate on the image Y axis.
Further, the deep neural network comprises a feature extraction network, a feature encoder, a feature decoder and three prediction heads, the three prediction heads being an index classification item prediction head, an X-axis image coordinate classification item prediction head and a Y-axis image coordinate classification item prediction head;
the feature extraction network extracts a feature map from the input image; the feature encoder encodes the extracted feature map to obtain a feature map encoding global information; the feature decoder takes key point query vectors as input and queries the encoded feature map to obtain the decoded feature corresponding to each predicted element; the index classification item prediction head, the X-axis image coordinate classification item prediction head and the Y-axis image coordinate classification item prediction head receive the decoded features output by the feature decoder and predict, respectively, the index classification item, the X-axis image coordinate classification item and the Y-axis image coordinate classification item of each semantic key point element.
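Although the patent discloses no source code, the following is a minimal PyTorch-style sketch of such a query-based key point network, assuming a toy convolutional backbone, stacked transformer encoder and decoder layers, and the three prediction heads described above; all module names, layer sizes and the query count are illustrative assumptions, not the patented implementation:

```python
import torch
import torch.nn as nn

class KeypointSetNet(nn.Module):
    """Hypothetical sketch: backbone -> encoder -> decoder -> three heads."""
    def __init__(self, num_keypoints=11, num_queries=20, d_model=256,
                 img_w=512, img_h=512, res_coeff=1):
        super().__init__()
        self.backbone = nn.Sequential(          # feature extraction network
            nn.Conv2d(3, d_model, 7, stride=16, padding=3), nn.ReLU())
        enc = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        dec = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=2)  # stacked encoders
        self.decoder = nn.TransformerDecoder(dec, num_layers=2)  # stacked decoders
        self.queries = nn.Embedding(num_queries, d_model)  # key point query vectors
        self.head_index = nn.Linear(d_model, num_keypoints + 1)  # +1 background class
        self.head_x = nn.Linear(d_model, img_w * res_coeff)  # X coordinate item
        self.head_y = nn.Linear(d_model, img_h * res_coeff)  # Y coordinate item

    def forward(self, img):                      # img: (B, 3, H, W)
        f = self.backbone(img)                   # (B, C, h, w) feature map
        f = f.flatten(2).transpose(1, 2)         # (B, h*w, C) tokens
        mem = self.encoder(f)                    # globally encoded features
        q = self.queries.weight.unsqueeze(0).expand(img.size(0), -1, -1)
        dec_feat = self.decoder(q, mem)          # (B, num_queries, C)
        return self.head_index(dec_feat), self.head_x(dec_feat), self.head_y(dec_feat)
```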
Further, the invention trains the deep neural network on the training data set using stochastic gradient descent.
Further, training the deep neural network comprises:
denoting the semantic key point true value set of the spatial non-cooperative target in the input image as $\mathcal{G} = \{g_i\}$, where the $i$-th semantic key point element is $g_i = (c_i, u_i, v_i)$ and the number of elements in the truth set equals the number $N$ of semantic key points on the spatial non-cooperative target;
denoting the semantic key point predicted value set produced by the deep neural network for the input image as $\hat{\mathcal{G}} = \{\hat{g}_j\}$, where the $j$-th semantic key point element is $\hat{g}_j = (\hat{c}_j, \hat{u}_j, \hat{v}_j)$, the number of elements in the predicted set is $M$, and $M \ge N$;
padding the truth set $\mathcal{G}$ with zero (background) elements to obtain a set $\bar{\mathcal{G}}$ whose number of semantic key point elements equals the number $M$ of elements in the predicted set $\hat{\mathcal{G}}$;
defining an index function $\sigma$ and obtaining the optimal index function $\hat{\sigma}$ by minimizing the bipartite matching loss, as follows:

$$\hat{\sigma} = \arg\min_{\sigma} \sum_{j=1}^{M} \Big[ \mathcal{L}_{\mathrm{ce}}\big(\hat{c}_j, c_{\sigma(j)}\big) + \lambda \big( \mathcal{L}\big(\hat{u}_j, u_{\sigma(j)}\big) + \mathcal{L}\big(\hat{v}_j, v_{\sigma(j)}\big) \big) \Big]$$

where $\hat{c}_j$, $\hat{u}_j$ and $\hat{v}_j$ are the index classification item, X-axis image coordinate classification item and Y-axis image coordinate classification item of the $j$-th predicted element; $c_{\sigma(j)}$, $u_{\sigma(j)}$ and $v_{\sigma(j)}$ are the corresponding items of the matched truth element; $\sigma(j)$ denotes the index, in the truth set, assigned to the $j$-th predicted element; $\lambda$ is a balance parameter; $\mathcal{L}_{\mathrm{ce}}$ is the cross-entropy loss; $\mathcal{L}$ is the KL divergence loss when the X-axis and Y-axis image coordinate classification items are Gaussian distributions, and the cross-entropy loss when they are one-hot encoded;
using the optimal index function, pairing each semantic key point element in the predicted set with a semantic key point element in $\bar{\mathcal{G}}$.
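A common realization of the bipartite matching step (the patent does not name an algorithm) is the Hungarian method; a sketch using SciPy, with the cross-entropy index cost and the KL coordinate cost assumed above, where the truth arrays are already padded with background elements to length M:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(pred_index_prob, pred_x, pred_y, gt_index, gt_x, gt_y, lam=1.0):
    """Pair each predicted element with a padded truth element.

    pred_index_prob: (M, K+1) softmax probabilities; gt_index: (M,) class ids.
    pred_x/pred_y, gt_x/gt_y: (M, D) coordinate classification items.
    """
    # cost[i, j]: cost of assigning prediction i to truth element j
    ce = -np.log(pred_index_prob[:, gt_index] + 1e-9)        # index cost (M, M)
    # KL divergence between truth and predicted coordinate distributions
    kl_x = (gt_x[None] * (np.log(gt_x[None] + 1e-9)
                          - np.log(pred_x[:, None] + 1e-9))).sum(-1)
    kl_y = (gt_y[None] * (np.log(gt_y[None] + 1e-9)
                          - np.log(pred_y[:, None] + 1e-9))).sum(-1)
    cost = ce + lam * (kl_x + kl_y)
    row, col = linear_sum_assignment(cost)   # Hungarian algorithm, rows sorted
    return col                               # sigma(j): truth index for prediction j
```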
Further, training the deep neural network further comprises constructing a loss function of the deep neural network to supervise training. The loss function of the deep neural network is:

$$\mathcal{L}_{\mathrm{net}} = \sum_{j=1}^{M} \Big[ \mathcal{L}_{\mathrm{ce}}\big(\hat{c}_j, c_{\hat{\sigma}(j)}\big) + \lambda \big( \mathcal{L}_{\mathrm{reg}}\big(\hat{u}_j, u_{\hat{\sigma}(j)}\big) + \mathcal{L}_{\mathrm{reg}}\big(\hat{v}_j, v_{\hat{\sigma}(j)}\big) \big) \Big]$$

where $\mathcal{L}_{\mathrm{reg}}$ is the coordinate term loss function; $c_{\hat{\sigma}(j)}$, $u_{\hat{\sigma}(j)}$ and $v_{\hat{\sigma}(j)}$ are the index classification item, X-axis image coordinate classification item and Y-axis image coordinate classification item of the truth element matched to the $j$-th predicted element; and $\hat{\sigma}(j)$ denotes the optimal index, in the semantic key point truth set, of the $j$-th semantic key point predicted element.
Further, the coordinate term loss function $\mathcal{L}_{\mathrm{reg}}$ is:

$$\mathcal{L}_{\mathrm{reg}}(\hat{u}, u) = \frac{\big(\mu(\hat{u}) - \mu(u)\big)^2}{2\,s^2(\hat{u})} + \frac{1}{2}\log s^2(\hat{u})$$

where $\mu(\hat{u})$ and $\mu(u)$ are the means of the predicted and true coordinate classification items, respectively:

$$\mu(\hat{u}) = \sum_{k=1}^{D} k\,\hat{u}[k], \qquad \mu(u) = \sum_{k=1}^{D} k\,u[k]$$

where $D$ is the prediction dimension of the coordinate classification item, and $s^2(\hat{u})$ is the prediction variance of the coordinate classification item:

$$s^2(\hat{u}) = \sum_{k=1}^{D} \big(k - \mu(\hat{u})\big)^2\,\hat{u}[k]$$
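Under the reconstruction above, the coordinate term loss can be computed from the mean and variance of the predicted classification item; a numpy sketch whose exact loss form is an assumption consistent with the uncertainty weighting used later:

```python
import numpy as np

def coord_loss(pred_dist, gt_mean, eps=1e-9):
    """Gaussian-NLL-style coordinate term loss for one classification item.

    pred_dist: (D,) predicted probabilities over D discrete positions.
    gt_mean: scalar true coordinate expressed on the same D-bin grid.
    """
    k = np.arange(pred_dist.shape[0])
    mu = (k * pred_dist).sum()                   # mean of the predicted item
    var = (((k - mu) ** 2) * pred_dist).sum()    # prediction variance
    # penalize deviation of the predicted mean, weighted by its variance
    return 0.5 * (gt_mean - mu) ** 2 / (var + eps) + 0.5 * np.log(var + eps)
```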
further, when the deep neural network is trained, the conditions for training convergence are as follows:
setting the maximum iteration number, and ending training when the iteration number exceeds the maximum iteration number;
or, setting a loss function threshold, and ending training when the loss function value obtained by current calculation is smaller than the loss function threshold;
or, when the currently calculated loss function value is no longer reduced, the training is ended.
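The three convergence criteria can be combined in one training loop; a schematic sketch in which step_fn, the thresholds and the patience value are placeholders, not values from the patent:

```python
def train_until_converged(step_fn, max_iters=100000, loss_thresh=1e-4, patience=10):
    """step_fn() runs one SGD step and returns the current loss value."""
    best, stale = float("inf"), 0
    for it in range(max_iters):                  # criterion 1: iteration cap
        loss = step_fn()
        if loss < loss_thresh:                   # criterion 2: loss threshold
            break
        if loss < best - 1e-6:
            best, stale = loss, 0
        else:
            stale += 1
        if stale >= patience:                    # criterion 3: loss plateaus
            break
```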
Further, the position and attitude of the spatial non-cooperative target in the camera coordinate system are obtained through the following steps:
from the X-axis and Y-axis image coordinate classification items $\hat{u}_j$ and $\hat{v}_j$ of the $j$-th semantic key point element in the predicted set for the input image, obtaining the coordinate position $(\bar{u}_j, \bar{v}_j)$ of the element on the X and Y axes of the image coordinate system:

$$\bar{u}_j = \frac{1}{r}\sum_{k=1}^{rW} k\,\hat{u}_j[k], \qquad \bar{v}_j = \frac{1}{r}\sum_{k=1}^{rH} k\,\hat{v}_j[k]$$

where $\hat{u}_j[k]$ and $\hat{v}_j[k]$ are the probabilities of the X-axis and Y-axis image coordinate classification items of the $j$-th element at the $k$-th position, $W$ and $H$ are the width and height of the input image, and $r$ is a coefficient equal to the ratio of the resolution of the coordinate classification items to the image scale;
obtaining the uncertainty of the position of the $j$-th semantic key point element on the X and Y axes of the image coordinate system:

$$s_{u,j}^2 = \frac{1}{r^2}\sum_{k=1}^{rW} \big(k - r\bar{u}_j\big)^2\,\hat{u}_j[k], \qquad s_{v,j}^2 = \frac{1}{r^2}\sum_{k=1}^{rH} \big(k - r\bar{v}_j\big)^2\,\hat{v}_j[k]$$

from the index classification item $\hat{c}_j$ of the $j$-th element in the predicted set, obtaining the coordinates $P_{\kappa(j)}$ of the corresponding semantic key point in the body coordinate system:

$$\kappa(j) = \arg\max_{k} \hat{c}_j[k]$$

where $\kappa(j)$ denotes the index of the $j$-th semantic key point element to the predefined semantic key points;
constructing a weighted N-point perspective model, and obtaining the position and attitude of the non-cooperative target in the camera coordinate system by solving it, the weighted N-point perspective model being:

$$\hat{R}, \hat{t} = \arg\min_{R,\,t} \sum_{j=1}^{M} \mathbb{1}\big[\kappa(j) \le N\big]\;\rho\big(e_j(R, t)\big)$$

where $\hat{R}$ and $\hat{t}$ are the optimal estimates of the rotation matrix and translation vector of the spatial non-cooperative target in the camera coordinate system, together called the pose of the spatial non-cooperative target; $\mathbb{1}[\cdot]$ is an indicator function equal to 1 if and only if the condition in brackets holds, and 0 otherwise; $\rho(\cdot)$ is a robust estimation function; $e_j$ is the weighted reprojection residual, expressed as:

$$e_j(R, t) = \frac{1}{w_j}\left\| \begin{bmatrix} \bar{u}_j \\ \bar{v}_j \end{bmatrix} - \frac{1}{d_j}\big[K(R P_{\kappa(j)} + t)\big]_{1:2} \right\|, \qquad d_j = \big[K(R P_{\kappa(j)} + t)\big]_{3}$$

where $K$ is the camera intrinsic parameter matrix; $(R, t)$ is the pose of the spatial non-cooperative target, with $R$ the rotation matrix and $t$ the translation vector of the target in the camera coordinate system; $P_{\kappa(j)}$ is the coordinates in the spatial non-cooperative target body coordinate system corresponding to the $j$-th semantic key point element; $d_j$ is the photographic depth of the $j$-th semantic key point element; and $w_j$ is the prediction uncertainty of the image coordinates of the $j$-th element, obtained from the positional uncertainties above.
In another aspect, the present invention provides a spatial non-cooperative target pose measurement apparatus, comprising:
the first module is used for acquiring coordinates of each semantic key point on the space non-cooperative target under a space non-cooperative target body coordinate system;
the second module is used for acquiring sample images containing the space non-cooperative targets, projecting the coordinates of the semantic key points under the space non-cooperative target body coordinate system to the image coordinate system to obtain the coordinates of the semantic key points in each sample image, obtaining a semantic key point true value set of the space non-cooperative targets on each sample image, and constructing a training data set;
the third module is used for constructing a deep neural network for predicting the semantic key point set;
a fourth module for training the deep neural network using the training data set until training converges;
a fifth module, configured to predict a semantic key point set of a spatial non-cooperative target in the input image by using the trained deep neural network, to obtain a correspondence between coordinates of each semantic key point in the predicted semantic key point set in an image coordinate system and coordinates thereof in a spatial non-cooperative target object coordinate system;
and a sixth module, configured to solve, based on the correspondence, a position and an attitude of the spatial non-cooperative target in the input image under a camera coordinate system.
In another aspect, the present invention provides a computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring coordinates of each semantic key point on the space non-cooperative target under a space non-cooperative target body coordinate system;
acquiring sample images containing space non-cooperative targets, projecting coordinates of all semantic key points under a space non-cooperative target body coordinate system to an image coordinate system to obtain coordinates of all semantic key points in each sample image, obtaining a semantic key point true value set of the space non-cooperative targets on each sample image, and constructing a training data set;
constructing a deep neural network for predicting a semantic key point set;
training the deep neural network by using the training data set until training converges;
predicting a semantic key point set of a space non-cooperative target in an input image by using the trained deep neural network to obtain a corresponding relation between coordinates of each semantic key point in the predicted semantic key point set in an image coordinate system and coordinates of each semantic key point in the space non-cooperative target body coordinate system;
and solving the position and the gesture of the spatial non-cooperative target in the input image under a camera coordinate system based on the corresponding relation.
In another aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring coordinates of each semantic key point on the space non-cooperative target under a space non-cooperative target body coordinate system;
acquiring sample images containing space non-cooperative targets, projecting coordinates of all semantic key points under a space non-cooperative target body coordinate system to an image coordinate system to obtain coordinates of all semantic key points in each sample image, obtaining a semantic key point true value set of the space non-cooperative targets on each sample image, and constructing a training data set;
constructing a deep neural network for predicting a semantic key point set;
training the deep neural network by using the training data set until training converges;
predicting a semantic key point set of a space non-cooperative target in an input image by using the trained deep neural network to obtain a corresponding relation between coordinates of each semantic key point in the predicted semantic key point set in an image coordinate system and coordinates of each semantic key point in the space non-cooperative target body coordinate system;
and solving the position and the gesture of the spatial non-cooperative target in the input image under a camera coordinate system based on the corresponding relation.
Compared with the prior art, the invention has the technical effects that:
according to the method, the position of each semantic key point in the spatial non-cooperative target in the image coordinate system is predicted by training the deep neural network, so that the position and the gesture of the spatial non-cooperative target in the camera coordinate system are solved. The method can adapt to a complex space non-control environment and realize reliable position and posture prediction of the space non-cooperative target under a camera coordinate system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an embodiment;
FIG. 2 is a schematic diagram of a semantic keypoint truth value set representation of spatial non-cooperative targets in an embodiment;
FIG. 3 is a block diagram of a deep neural network used in one embodiment;
FIG. 4 is a training flow diagram of a deep neural network in one embodiment;
FIG. 5 is a flow diagram of target position and pose estimation based on a deep neural network in one embodiment.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, in one embodiment, a method for measuring pose of a spatial non-cooperative target is provided, including the following steps:
(1) Constructing a training data set;
acquiring coordinates of each semantic key point on the space non-cooperative target under a space non-cooperative target body coordinate system;
the method comprises the steps of obtaining sample images containing space non-cooperative targets, projecting coordinates of semantic key points under a space non-cooperative target body coordinate system to an image coordinate system to obtain coordinates of the semantic key points in each sample image, obtaining a semantic key point true value set of the space non-cooperative targets on each sample image, and constructing a training data set.
(2) Constructing a deep neural network for predicting a semantic key point set;
(3) Training a deep neural network using the training dataset;
and training the deep neural network based on the training data set by using an optimization algorithm until training converges.
(4) Predicting an input image by using the trained deep neural network;
predicting a semantic key point set of a space non-cooperative target in an input image by using the trained deep neural network to obtain a corresponding relation between coordinates of each semantic key point in the predicted semantic key point set in an image coordinate system and coordinates of each semantic key point in the space non-cooperative target body coordinate system;
(5) And solving the position and the posture of the spatial non-cooperative target in the input image under a camera coordinate system.
And solving the position and the gesture of the spatial non-cooperative target in the input image under a camera coordinate system based on the corresponding relation.
It is understood that the number of semantic key points on the spatial non-cooperative target is at least 4.
The semantic key point true value set of the spatial non-cooperative target on each sample image consists of all semantic key point elements. FIG. 2 shows a schematic diagram of the semantic key point truth set representation in one embodiment: 11 semantic key points are selected on the spatial non-cooperative target, and all semantic key points form the truth set. The $i$-th semantic key point element in the truth set is denoted $g_i = (c_i, u_i, v_i)$ and consists of three items: an index classification item $c_i$ describing the correspondence between the element and a semantic key point in the spatial non-cooperative target body coordinate system; an X-axis image coordinate classification item $u_i \in \mathbb{R}^{rW}$ describing the element's coordinate on the image X axis; and a Y-axis image coordinate classification item $v_i \in \mathbb{R}^{rH}$ describing the element's coordinate on the image Y axis. Here $W$ and $H$ are the width and height of the input image, and $r$ is the resolution coefficient of the coordinate classification items.

$u_i$ and $v_i$ may adopt a one-dimensional Gaussian distribution representation:

$$u_i[k] \propto \exp\left(-\frac{(k - r\,u_i^{*})^2}{2\tau^2}\right), \qquad v_i[k] \propto \exp\left(-\frac{(k - r\,v_i^{*})^2}{2\tau^2}\right)$$

where $u_i^{*}$ and $v_i^{*}$ are the true coordinates of the semantic key point element in the image coordinate system, and $\tau^2$ is a fixed spatial variance whose value may be adjusted according to the image resolution $W$, $H$ and the resolution coefficient $r$ of the coordinate classification items. Alternatively, a one-hot encoded representation may be used:

$$u_i[k] = \mathbb{1}\big[k = \lfloor r\,u_i^{*} \rceil\big], \qquad v_i[k] = \mathbb{1}\big[k = \lfloor r\,v_i^{*} \rceil\big]$$

The index classification item $c_i$ describes the correspondence between the $i$-th semantic key point element and the semantic key points in the spatial non-cooperative target body coordinate system, and is expressed with one-hot encoding. If the $i$-th element $g_i$ corresponds to the $k$-th semantic key point, then:

$$c_i[j] = \mathbb{1}\big[j = k\big]$$

In addition, a background element $g_{\varnothing}$ is introduced to describe pixels on the image that are not semantic key points; its X-axis and Y-axis image coordinate classification items are all zero, and its index classification item points to a dedicated background class:

$$g_{\varnothing} = \big(c_{\varnothing}, \mathbf{0}, \mathbf{0}\big), \qquad c_{\varnothing}[j] = \mathbb{1}\big[j = N + 1\big]$$

The element $g_{\varnothing}$ may also be called a null element. Because the background is set as a separate class, the dimension of the index classification item is the number of semantic key points plus 1, with the added dimension representing the background element.
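To make the representation concrete, a sketch of building the three items of one truth element and of the background element; the grid sizes, variance and key point count are illustrative assumptions:

```python
import numpy as np

def make_truth_element(u_true, v_true, kp_id, num_kp=11, W=512, H=512, r=1, tau=2.0):
    """Build (index item, X item, Y item) for one semantic key point element."""
    kx, ky = np.arange(r * W), np.arange(r * H)
    x_item = np.exp(-(kx - r * u_true) ** 2 / (2 * tau ** 2))   # Gaussian X item
    y_item = np.exp(-(ky - r * v_true) ** 2 / (2 * tau ** 2))   # Gaussian Y item
    x_item, y_item = x_item / x_item.sum(), y_item / y_item.sum()
    c_item = np.zeros(num_kp + 1)      # +1 dimension for the background class
    c_item[kp_id] = 1.0                # one-hot correspondence index
    return c_item, x_item, y_item

def make_background_element(num_kp=11, W=512, H=512, r=1):
    c_item = np.zeros(num_kp + 1)
    c_item[num_kp] = 1.0                                 # background class
    return c_item, np.zeros(r * W), np.zeros(r * H)      # zero coordinate items
```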
There are two situations when constructing the training data set:
1) When a three-dimensional model of the spatial non-cooperative target is available, three-dimensional model editing software can be used to select semantic key points on the surface of the spatial non-cooperative target and record their coordinates in the spatial non-cooperative target body coordinate system (also called the body coordinate system for short);
2) When only sample images with pose labels are available, several semantic key points can be selected manually, and the coordinates of each semantic key point in the body coordinate system can be computed from multiple images using multi-view triangulation. The number of semantic key points is denoted $N$ ($N \ge 4$), and the coordinates of the $k$-th semantic key point in the body coordinate system are $P_k$. The coordinates of the $k$-th semantic key point in a sample image are obtained through the pinhole imaging model:

$$d_k\,\tilde{p}_k = K\big(R P_k + t\big)$$

where $\tilde{p}_k$ are the homogeneous coordinates of the $k$-th semantic key point in the image coordinate system, $K$ is the camera intrinsic parameter matrix, $(R, t)$ is the labeled pose of the target, and $d_k$ is the photographic depth. In this way the image coordinates of each semantic key point on each image are obtained, and then the semantic key point truth set of the spatial non-cooperative target on each sample image is obtained.
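The projection of body-frame key points into a labeled sample image follows the pinhole model above; a numpy sketch in which K, R and t stand for the annotated intrinsics and pose:

```python
import numpy as np

def project_keypoints(P_body, K, R, t):
    """P_body: (N, 3) key point coordinates in the body frame.
    Returns (N, 2) image coordinates used as training labels."""
    P_cam = P_body @ R.T + t            # transform into the camera frame
    uv = P_cam @ K.T                    # apply intrinsics: d * [u, v, 1]
    return uv[:, :2] / uv[:, 2:3]       # divide by photographic depth d
```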
The deep neural network adopted in one embodiment comprises a feature extraction network (TZ), a feature encoder, a feature decoder and three prediction heads: an index classification item prediction head, an X-axis image coordinate classification item prediction head and a Y-axis image coordinate classification item prediction head.
The feature extraction network extracts a feature map from the input image. The feature encoder encodes the extracted feature map to obtain a feature map encoding global information, providing global feature fusion and interaction. The feature decoder takes key point query vectors as input and queries the feature map encoded by the feature encoder to obtain the decoded feature corresponding to each predicted element. The index classification item prediction head, the X-axis image coordinate classification item prediction head and the Y-axis image coordinate classification item prediction head receive the decoded features output by the feature decoder and predict, respectively, the index classification item, the X-axis image coordinate classification item and the Y-axis image coordinate classification item of each semantic key point element.
The deep neural network used in one embodiment is shown in FIG. 3. In FIG. 3, the input image is fed into the feature extraction network (TZ) to extract a feature map. K feature encoders are stacked, e.g., a first feature encoder (BM1) and a second feature encoder (BM2). By stacking feature encoders and feature decoders, the encoding and decoding capability is strengthened and the prediction performance of the neural network is improved.
The optimization algorithm adopted by the invention for training the deep neural network is not limited, and a person skilled in the art can select the optimization algorithm in the prior art according to the situation.
In one embodiment of the invention, the deep neural network is trained using a stochastic gradient descent method based on the training dataset.
As shown in fig. 4, in one embodiment, a method for training the deep neural network is provided, including:
(1) Inputting a prediction set and a truth value set;
the semantic key point true value set of the space non-cooperative target in the input image is
Figure SMS_142
Wherein->
Figure SMS_143
The semantic key element is marked as +.>
Figure SMS_144
The number of semantic key point elements in the semantic key point truth value set is equal to the number of semantic key points on the space non-cooperative target +.>
Figure SMS_145
The semantic key point predicted value set predicted by the deep neural network based on the input image is as follows
Figure SMS_146
Wherein the first
Figure SMS_147
The semantic key element is marked as +.>
Figure SMS_148
The number of semantic key point elements in the semantic key point predicted value set is +.>
Figure SMS_149
And->
Figure SMS_150
(2) Supplementing background elements to the truth set so that the prediction set and the truth set have equal numbers of elements.
Zero (background) elements are appended to the truth set $\mathcal{G}$ to obtain a padded set $\bar{\mathcal{G}}$ whose number of semantic key point elements equals the number $M$ of elements in the predicted set $\hat{\mathcal{G}}$.
(3) Minimizing the matching loss function results in an optimal index function.
An index function is defined, and the optimal index function $\hat{\sigma}$ is obtained by minimizing the bipartite matching loss, as follows:

$$\hat{\sigma} = \arg\min_{\sigma} \sum_{j=1}^{M} \Big[ \mathcal{L}_{\mathrm{ce}}\big(\hat{c}_j, c_{\sigma(j)}\big) + \lambda \big( \mathcal{L}\big(\hat{u}_j, u_{\sigma(j)}\big) + \mathcal{L}\big(\hat{v}_j, v_{\sigma(j)}\big) \big) \Big]$$

where $\hat{c}_j$, $\hat{u}_j$ and $\hat{v}_j$ are the index classification item, X-axis image coordinate classification item and Y-axis image coordinate classification item of the $j$-th predicted element; $c_{\sigma(j)}$, $u_{\sigma(j)}$ and $v_{\sigma(j)}$ are the corresponding items of the matched truth element; $\sigma(j)$ denotes the index, in the truth set, assigned to the $j$-th predicted element; $\lambda$ is a balance parameter; $\mathcal{L}_{\mathrm{ce}}$ is the cross-entropy loss; $\mathcal{L}$ is the KL divergence loss when the X-axis and Y-axis image coordinate classification items are Gaussian distributions, and the cross-entropy loss when they are one-hot encoded.
(4) The elements of a truth set are paired for each element of the prediction set by the best index function.
Each semantic key point element in the predicted set is paired, via the optimal index function, with a semantic key point element in the padded truth set $\bar{\mathcal{G}}$.
(5) The loss function value of the deep neural network prediction is calculated.
On the basis of the above method for training the deep neural network, the method further comprises constructing a loss function of the deep neural network to supervise training. The loss function of the deep neural network is:

$$\mathcal{L}_{\mathrm{net}} = \sum_{j=1}^{M} \Big[ \mathcal{L}_{\mathrm{ce}}\big(\hat{c}_j, c_{\hat{\sigma}(j)}\big) + \lambda \big( \mathcal{L}_{\mathrm{reg}}\big(\hat{u}_j, u_{\hat{\sigma}(j)}\big) + \mathcal{L}_{\mathrm{reg}}\big(\hat{v}_j, v_{\hat{\sigma}(j)}\big) \big) \Big]$$

where $\mathcal{L}_{\mathrm{reg}}$ is the coordinate term loss function; $c_{\hat{\sigma}(j)}$, $u_{\hat{\sigma}(j)}$ and $v_{\hat{\sigma}(j)}$ are the index classification item, X-axis image coordinate classification item and Y-axis image coordinate classification item of the truth element matched to the $j$-th predicted element; and $\hat{\sigma}(j)$ denotes the optimal index, in the semantic key point truth set, of the $j$-th semantic key point predicted element.
The coordinate term loss function $\mathcal{L}_{\mathrm{reg}}$ is:

$$\mathcal{L}_{\mathrm{reg}}(\hat{u}, u) = \frac{\big(\mu(\hat{u}) - \mu(u)\big)^2}{2\,s^2(\hat{u})} + \frac{1}{2}\log s^2(\hat{u})$$

where $\mu(\hat{u})$ and $\mu(u)$ are the means of the predicted and true coordinate classification items, respectively:

$$\mu(\hat{u}) = \sum_{k=1}^{D} k\,\hat{u}[k], \qquad \mu(u) = \sum_{k=1}^{D} k\,u[k]$$

where $D$ is the prediction dimension of the coordinate classification item, and $s^2(\hat{u})$ is the prediction variance of the coordinate classification item:

$$s^2(\hat{u}) = \sum_{k=1}^{D} \big(k - \mu(\hat{u})\big)^2\,\hat{u}[k]$$
it will be appreciated that the present invention is not limited to the end conditions for terminating training of the model, and those skilled in the art can make reasonable settings based on methods known in the art or based on empirical, conventional means, including but not limited to setting the maximum number of iterations, etc. As in training the deep neural network, the conditions for training convergence are any one of the following three:
(a) Setting the maximum iteration number, and ending training when the iteration number exceeds the maximum iteration number;
(b) Setting a loss function threshold, and ending training when the loss function value obtained by current calculation is smaller than the loss function threshold;
(c) And ending training when the currently calculated loss function value is not reduced any more.
In one embodiment, given an image of a spatially non-cooperative target, the estimation process of the position and the pose of the spatially non-cooperative target in the camera coordinate system is shown in fig. 5, and includes the following steps:
(1) An image of the spatial non-cooperative target is input.
(2) The key point set is predicted by the deep neural network.
A set of semantic keypoints of the spatial non-cooperative targets in the input image is predicted using the deep neural network that has completed training.
(3) The position and uncertainty of the element on the image coordinate system are obtained through the X-axis and Y-axis classification items of the image.
From the X-axis and Y-axis image coordinate classification items $\hat{u}_j$ and $\hat{v}_j$ of the $j$-th semantic key point element in the predicted set for the input image, the coordinate position $(\bar{u}_j, \bar{v}_j)$ of the element on the X and Y axes of the image coordinate system is obtained:

$$\bar{u}_j = \frac{1}{r}\sum_{k=1}^{rW} k\,\hat{u}_j[k], \qquad \bar{v}_j = \frac{1}{r}\sum_{k=1}^{rH} k\,\hat{v}_j[k]$$

Here $\hat{u}_j[k]$ and $\hat{v}_j[k]$ are the probabilities of the X-axis and Y-axis image coordinate classification items of the $j$-th element at the $k$-th position, $W$ and $H$ are the width and height of the input image, and $r$ is a coefficient equal to the ratio of the resolution of the coordinate classification items to the image scale.
The uncertainty of the position of the $j$-th semantic key point element on the X and Y axes of the image coordinate system is obtained as:

$$s_{u,j}^2 = \frac{1}{r^2}\sum_{k=1}^{rW} \big(k - r\bar{u}_j\big)^2\,\hat{u}_j[k], \qquad s_{v,j}^2 = \frac{1}{r^2}\sum_{k=1}^{rH} \big(k - r\bar{v}_j\big)^2\,\hat{v}_j[k]$$
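Step (3) amounts to taking an expectation and a variance over each coordinate classification item; a numpy sketch consistent with the reconstruction used here:

```python
import numpy as np

def decode_coordinate(item, r=1):
    """item: (D,) probabilities of one coordinate classification item.
    Returns the expected image coordinate and its positional uncertainty."""
    k = np.arange(item.shape[0])
    pos = (k * item).sum() / r                          # expected image coordinate
    var = (((k - r * pos) ** 2) * item).sum() / r ** 2  # positional uncertainty
    return pos, var
```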
(4) The correspondence between key point image coordinates and body coordinate system coordinates is established through the index classification item.
From the index classification item $\hat{c}_j$ of the $j$-th semantic key point element in the predicted set, the coordinates $P_{\kappa(j)}$ of the corresponding semantic key point in the body coordinate system are obtained:

$$\kappa(j) = \arg\max_{k} \hat{c}_j[k]$$

where $\kappa(j)$ denotes the index of the $j$-th semantic key point element to the predefined semantic key points.
(5) A weighted N-point perspective model is constructed and solved.
A weighted N-point perspective model is constructed, and the position and attitude of the non-cooperative target in the camera coordinate system are obtained by solving it. The weighted N-point perspective model is:

$$\hat{R}, \hat{t} = \arg\min_{R,\,t} \sum_{j=1}^{M} \mathbb{1}\big[\kappa(j) \le N\big]\;\rho\big(e_j(R, t)\big)$$

where $\hat{R}$ and $\hat{t}$ are the optimal estimates of the rotation matrix and translation vector of the spatial non-cooperative target in the camera coordinate system, together called the pose of the spatial non-cooperative target. $\mathbb{1}[\cdot]$ is an indicator function equal to 1 if and only if the condition in brackets holds, and 0 otherwise; its role is to exclude the background elements predicted by the network when estimating the pose. $\rho(\cdot)$ is a robust estimation function, e.g. the Huber loss, which is common in robust estimation and is not described further here. $e_j$ is the weighted reprojection residual, expressed as:

$$e_j(R, t) = \frac{1}{w_j}\left\| \begin{bmatrix} \bar{u}_j \\ \bar{v}_j \end{bmatrix} - \frac{1}{d_j}\big[K(R P_{\kappa(j)} + t)\big]_{1:2} \right\|, \qquad d_j = \big[K(R P_{\kappa(j)} + t)\big]_{3}$$

where $w_j$ is the prediction uncertainty of the image coordinates of the $j$-th semantic key point element:

$$w_j = \sqrt{s_{u,j}^2 + s_{v,j}^2}$$

$K$ is the camera intrinsic parameter matrix; $(R, t)$ is the pose of the spatial non-cooperative target, with $R$ the rotation matrix and $t$ the translation vector of the target in the camera coordinate system; $P_{\kappa(j)}$ is the coordinates in the spatial non-cooperative target body coordinate system corresponding to the $j$-th semantic key point element; and $d_j$ is the photographic depth of the $j$-th semantic key point element.
The weighted N-point perspective model can be solved with a general optimization library such as g2o or Ceres to obtain the attitude and position of the spatial non-cooperative target in the camera coordinate system, i.e., $\hat{R}$ and $\hat{t}$.
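Besides g2o or Ceres, the weighted model can be solved with any robust least-squares routine; a SciPy sketch with Huber weighting, where the axis-angle parameterization and the initial guess are implementation choices, not part of the patent:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def solve_weighted_pnp(uv, P_body, w, K, x0=None):
    """uv: (N, 2) decoded key point positions; w: (N,) uncertainties;
    P_body: (N, 3) matched body-frame coordinates; K: (3, 3) intrinsics."""
    def residual(x):
        R = Rotation.from_rotvec(x[:3]).as_matrix()
        p = (P_body @ R.T + x[3:]) @ K.T         # d * [u, v, 1] per point
        p = p[:, :2] / p[:, 2:3]                 # perspective division by depth
        return ((p - uv) / w[:, None]).ravel()   # weighted reprojection residual
    if x0 is None:
        x0 = np.array([0.0, 0, 0, 0, 0, 10.0])   # start in front of the camera
    sol = least_squares(residual, x0, loss="huber")  # robust estimation rho(.)
    return Rotation.from_rotvec(sol.x[:3]).as_matrix(), sol.x[3:]
```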
In one embodiment, a spatial non-cooperative target pose measurement apparatus is provided, including:
the first module is used for acquiring coordinates of each semantic key point on the space non-cooperative target under a space non-cooperative target body coordinate system;
the second module is used for acquiring sample images containing the space non-cooperative targets, projecting the coordinates of the semantic key points under the space non-cooperative target body coordinate system to the image coordinate system to obtain the coordinates of the semantic key points in each sample image, obtaining a semantic key point true value set of the space non-cooperative targets on each sample image, and constructing a training data set;
the third module is used for constructing a deep neural network for predicting the semantic key point set;
a fourth module for training the deep neural network using the training data set until training converges;
a fifth module, configured to predict a semantic key point set of a spatial non-cooperative target in the input image by using the trained deep neural network, to obtain a correspondence between coordinates of each semantic key point in the predicted semantic key point set in an image coordinate system and coordinates thereof in a spatial non-cooperative target object coordinate system;
and a sixth module, configured to solve, based on the correspondence, a position and an attitude of the spatial non-cooperative target in the input image under a camera coordinate system.
The implementation method of each module and the construction of the model can be the method described in any of the foregoing embodiments, which is not described herein.
In another aspect, the present invention provides a computer device, including a memory and a processor, the memory storing a computer program, the processor implementing the steps of the spatial non-cooperative target pose measurement method provided in any of the embodiments described above when executing the computer program. The computer device may be a server. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing sample data. The network interface of the computer device is used for communicating with an external terminal through a network connection.
In another aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the spatial non-cooperative target pose measurement method provided in any of the embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
Parts of the invention not described in detail are known technology to those skilled in the art.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples merely represent a few embodiments of the present application; although they are described in relative detail, they are not to be construed as limiting the scope of the invention. It should be noted that various modifications and improvements can be made by those skilled in the art without departing from the concept of the present application, and these all fall within the protection scope of the present application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. A method for measuring the pose of a spatial non-cooperative target, characterized by comprising the following steps:
acquiring coordinates of each semantic key point on the space non-cooperative target under a space non-cooperative target body coordinate system;
acquiring sample images containing space non-cooperative targets, projecting coordinates of all semantic key points under a space non-cooperative target body coordinate system to an image coordinate system to obtain coordinates of all semantic key points in each sample image, obtaining a semantic key point true value set of the space non-cooperative targets on each sample image, and constructing a training data set;
constructing a deep neural network for predicting a semantic key point set;
training the deep neural network by using the training data set until training converges;
predicting a semantic key point set of a space non-cooperative target in an input image by using the trained deep neural network to obtain a corresponding relation between coordinates of each semantic key point in the predicted semantic key point set in an image coordinate system and coordinates of each semantic key point in the space non-cooperative target body coordinate system;
and solving the position and the gesture of the spatial non-cooperative target in the input image under a camera coordinate system based on the corresponding relation.
2. The method for measuring the pose of a spatial non-cooperative target according to claim 1, wherein the number of semantic key points on the spatial non-cooperative target is at least 4.
3. The method for measuring the pose of a spatial non-cooperative target according to claim 1 or 2, wherein the semantic key point true value set of the spatial non-cooperative target on each sample image consists of all semantic key point elements, the $i$-th semantic key point element being composed of an index classification item $c_i$ describing the correspondence between the element and a semantic key point in the spatial non-cooperative target body coordinate system, an X-axis image coordinate classification item $u_i$ describing the element's coordinate on the image X axis, and a Y-axis image coordinate classification item $v_i$ describing the element's coordinate on the image Y axis.
4. The method for measuring the pose of a spatial non-cooperative target according to claim 3, wherein the deep neural network comprises a feature extraction network, a feature encoder, a feature decoder and three prediction heads, the three prediction heads being an index classification item prediction head, an X-axis image coordinate classification item prediction head and a Y-axis image coordinate classification item prediction head;
the feature extraction network extracts a feature map from the input image; the feature encoder encodes the extracted feature map to obtain a feature map encoding global information; the feature decoder takes key point query vectors as input and queries the encoded feature map to obtain the decoded feature corresponding to each predicted element; the index classification item prediction head, the X-axis image coordinate classification item prediction head and the Y-axis image coordinate classification item prediction head receive the decoded features output by the feature decoder and predict, respectively, the index classification item, the X-axis image coordinate classification item and the Y-axis image coordinate classification item of each semantic key point element.
5. The method for measuring the pose of a spatial non-cooperative target according to claim 3, wherein the training data set is used to train the deep neural network using a stochastic gradient descent method.
6. The method for measuring the pose of a spatial non-cooperative target according to claim 5, wherein training the deep neural network comprises:
the semantic key point true value set of the spatial non-cooperative target in the input image is $\mathcal{G} = \{g_i\}$, where the $i$-th semantic key point element is denoted $g_i = (c_i, u_i, v_i)$ and the number of elements in the truth set equals the number $N$ of semantic key points on the spatial non-cooperative target;
the semantic key point predicted value set predicted by the deep neural network from the input image is $\hat{\mathcal{G}} = \{\hat{g}_j\}$, where the $j$-th semantic key point element is denoted $\hat{g}_j = (\hat{c}_j, \hat{u}_j, \hat{v}_j)$, the number of elements in the predicted set is $M$, and $M \ge N$;
zero (background) elements are appended to the truth set $\mathcal{G}$ to obtain a padded set $\bar{\mathcal{G}}$ whose number of semantic key point elements equals the number $M$ of elements in the predicted set $\hat{\mathcal{G}}$;
an index function is defined, and the optimal index function $\hat{\sigma}$ is obtained by minimizing the bipartite matching loss, as follows:

$$\hat{\sigma} = \arg\min_{\sigma} \sum_{j=1}^{M} \Big[ \mathcal{L}_{\mathrm{ce}}\big(\hat{c}_j, c_{\sigma(j)}\big) + \lambda \big( \mathcal{L}\big(\hat{u}_j, u_{\sigma(j)}\big) + \mathcal{L}\big(\hat{v}_j, v_{\sigma(j)}\big) \big) \Big]$$

where $\hat{c}_j$, $\hat{u}_j$ and $\hat{v}_j$ are the index classification item, X-axis image coordinate classification item and Y-axis image coordinate classification item of the $j$-th predicted element; $c_{\sigma(j)}$, $u_{\sigma(j)}$ and $v_{\sigma(j)}$ are the corresponding items of the matched truth element; $\sigma(j)$ denotes the index, in the truth set, assigned to the $j$-th predicted element; $\lambda$ is a balance parameter; $\mathcal{L}_{\mathrm{ce}}$ is the cross-entropy loss; $\mathcal{L}$ is the KL divergence loss when the X-axis and Y-axis image coordinate classification items are Gaussian distributions, and the cross-entropy loss when they are one-hot encoded;
using the optimal index function, each semantic key point element in the predicted set is paired with a semantic key point element in $\bar{\mathcal{G}}$.
7. The method of claim 6, further comprising constructing a loss function of the deep neural network to supervise training of the deep neural network, wherein the loss function of the deep neural network is as follows:

$$\mathcal{L}_{\mathrm{net}} = \sum_{j=1}^{M} \Big[ \mathcal{L}_{\mathrm{ce}}\big(\hat{c}_j, c_{\hat{\sigma}(j)}\big) + \lambda \big( \mathcal{L}_{\mathrm{reg}}\big(\hat{u}_j, u_{\hat{\sigma}(j)}\big) + \mathcal{L}_{\mathrm{reg}}\big(\hat{v}_j, v_{\hat{\sigma}(j)}\big) \big) \Big]$$

where $\mathcal{L}_{\mathrm{reg}}$ is a coordinate term loss function; $c_{\hat{\sigma}(j)}$, $u_{\hat{\sigma}(j)}$ and $v_{\hat{\sigma}(j)}$ are the index classification item, X-axis image coordinate classification item and Y-axis image coordinate classification item of the truth element matched to the $j$-th predicted element; and $\hat{\sigma}(j)$ denotes the optimal index, in the semantic key point truth set, of the $j$-th semantic key point predicted element.
8. The method of claim 7, wherein the coordinate term loss function $\mathcal{L}_{\mathrm{reg}}$ is:

$$\mathcal{L}_{\mathrm{reg}}(\hat{u}, u) = \frac{\big(\mu(\hat{u}) - \mu(u)\big)^2}{2\,s^2(\hat{u})} + \frac{1}{2}\log s^2(\hat{u})$$

where $\mu(\hat{u})$ and $\mu(u)$ are the means of the predicted and true coordinate classification items, respectively:

$$\mu(\hat{u}) = \sum_{k=1}^{D} k\,\hat{u}[k], \qquad \mu(u) = \sum_{k=1}^{D} k\,u[k]$$

where $D$ is the prediction dimension of the coordinate classification item, and $s^2(\hat{u})$ is the prediction variance of the coordinate classification item:

$$s^2(\hat{u}) = \sum_{k=1}^{D} \big(k - \mu(\hat{u})\big)^2\,\hat{u}[k]$$
9. the method for measuring pose of spatial non-cooperative target according to claim 7 or 8, wherein when training the deep neural network, conditions for training convergence are:
setting the maximum iteration number, and ending training when the iteration number exceeds the maximum iteration number;
or, setting a loss function threshold, and ending training when the loss function value obtained by current calculation is smaller than the loss function threshold;
or, when the currently calculated loss function value is no longer reduced, the training is ended.
10. The method for measuring the pose of a spatial non-cooperative target according to any one of claims 4 to 8, wherein the position and the attitude of the spatial non-cooperative target in the camera coordinate system are obtained by:

from the X-axis and Y-axis image coordinate classification items $\hat{x}_{k}$ and $\hat{y}_{k}$ of the $k$-th semantic key point element in the predicted semantic key point set of the input image, obtaining the coordinate positions $u_{k}$ and $v_{k}$ of the $k$-th semantic key point element on the X axis and the Y axis of the image coordinate system:

$$u_{k}=\frac{1}{s}\sum_{j=1}^{sW}j\,\hat{x}_{k}(j),\qquad v_{k}=\frac{1}{s}\sum_{j=1}^{sH}j\,\hat{y}_{k}(j)$$

wherein $\hat{x}_{k}(j)$ and $\hat{y}_{k}(j)$ are respectively the probability of the X-axis and Y-axis image coordinate classification items of the $k$-th semantic key point element at the $j$-th position, $W$ and $H$ are respectively the width and the height of the input image, and $s$ is the ratio coefficient of the resolution of the coordinate classification item to the image scale;

obtaining the uncertainty of the position of the $k$-th semantic key point element in the predicted semantic key point set on the X axis and the Y axis of the image coordinate system:

$$\sigma_{u,k}^{2}=\frac{1}{s^{2}}\sum_{j=1}^{sW}\left(j-s\,u_{k}\right)^{2}\hat{x}_{k}(j),\qquad\sigma_{v,k}^{2}=\frac{1}{s^{2}}\sum_{j=1}^{sH}\left(j-s\,v_{k}\right)^{2}\hat{y}_{k}(j)$$

from the index classification item $\hat{c}_{k}$ of the $k$-th semantic key point element in the predicted semantic key point set, obtaining the index $c_{k}$ of the $k$-th semantic key point element into the predefined semantic key points, and hence its coordinates $P_{c_{k}}$ in the spatial non-cooperative target object coordinate system:

$$c_{k}=\arg\max_{m}\hat{c}_{k}(m)$$

constructing a weighted N-point perspective (PnP) model, and obtaining the position and the attitude of the non-cooperative target in the camera coordinate system by solving the weighted N-point perspective model:

$$\left(\hat{R},\hat{t}\right)=\arg\min_{R,t}\sum_{k}\mathbb{1}\left[c_{k}\leq N\right]\,\rho\left(r_{k}\right)$$

wherein $\hat{R}$ and $\hat{t}$ are respectively the optimal estimates of the rotation matrix and the translation vector of the spatial non-cooperative target in the camera coordinate system, together called the pose of the spatial non-cooperative target; $\mathbb{1}[\cdot]$ is an indicator function equal to 1 if and only if the condition in brackets holds, and 0 otherwise, with $N$ the number of predefined semantic key points; $\rho$ is a robust estimation function; $r_{k}$ is the weighted re-projection residual, expressed as:

$$r_{k}=\left\|\begin{bmatrix}\left(u_{k}-\tilde{u}_{k}\right)/\sigma_{u,k}\\\left(v_{k}-\tilde{v}_{k}\right)/\sigma_{v,k}\end{bmatrix}\right\|^{2},\qquad z_{k}\begin{bmatrix}\tilde{u}_{k}\\\tilde{v}_{k}\\1\end{bmatrix}=K\left(R\,P_{c_{k}}+t\right)$$

wherein $K$ is the internal parameter matrix of the camera; $(R,t)$ is the pose of the spatial non-cooperative target, with $R$ the rotation matrix and $t$ the translation vector of the spatial non-cooperative target in the camera coordinate system; $P_{c_{k}}$ is the coordinates in the spatial non-cooperative target object coordinate system corresponding to the $k$-th semantic key point element; and $z_{k}$ is the photographic depth of the $k$-th semantic key point element.
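A compact sketch of this pose-recovery step is given below: soft-argmax decoding of the coordinate distributions, uncertainty weights from their variances, and an uncertainty-weighted PnP solved by nonlinear least squares. The Rodrigues rotation parametrization and SciPy's Huber loss are implementation choices standing in for the claim's unspecified robust estimation function $\rho$, not the patented solver.

```python
# Sketch: decode keypoints and solve a weighted, robust PnP.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def decode(p, s=1.0):
    """Soft-argmax coordinate and variance of a coordinate distribution p."""
    bins = np.arange(1, p.shape[-1] + 1)
    mu = (bins * p).sum(-1) / s                          # decoded coordinate
    var = (((bins - s * mu[..., None]) ** 2) * p).sum(-1) / s**2  # uncertainty
    return mu, var

def weighted_pnp(uv, sigma2, P_obj, K):
    """uv: (N,2) image points, sigma2: (N,2) variances, P_obj: (N,3), K: (3,3)."""
    def residual(x):
        R = Rotation.from_rotvec(x[:3]).as_matrix()
        proj = (K @ (R @ P_obj.T + x[3:, None])).T       # project object points
        proj = proj[:, :2] / proj[:, 2:3]                # divide by photographic depth
        return ((uv - proj) / np.sqrt(sigma2)).ravel()   # uncertainty-weighted residual
    x0 = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 10.0])       # initial guess: target ahead
    sol = least_squares(residual, x0, loss="huber")      # robust estimation
    return Rotation.from_rotvec(sol.x[:3]).as_matrix(), sol.x[3:]
```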
11. A spatial non-cooperative target pose measurement device, characterized in that it comprises:

a first module, configured to acquire the coordinates of each semantic key point on the spatial non-cooperative target in the spatial non-cooperative target body coordinate system;

a second module, configured to acquire sample images containing the spatial non-cooperative target, project the coordinates of the semantic key points from the spatial non-cooperative target body coordinate system to the image coordinate system to obtain the coordinates of the semantic key points in each sample image, obtain the semantic key point true value set of the spatial non-cooperative target on each sample image, and construct a training data set;

a third module, configured to construct a deep neural network for predicting the semantic key point set;

a fourth module, configured to train the deep neural network using the training data set until training converges;

a fifth module, configured to predict the semantic key point set of the spatial non-cooperative target in the input image using the trained deep neural network, obtaining the correspondence between the coordinates of each semantic key point in the predicted semantic key point set in the image coordinate system and its coordinates in the spatial non-cooperative target object coordinate system;

and a sixth module, configured to solve, based on the correspondence, the position and the attitude of the spatial non-cooperative target in the input image in the camera coordinate system.
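Purely as a structural illustration, the six modules map naturally onto one pipeline object; the class and method names below are hypothetical and the bodies are elided.

```python
# Structural sketch of the six-module device of claim 11 (names hypothetical).
class PoseMeasurementDevice:
    def acquire_object_keypoints(self, target_model): ...     # first module
    def build_training_set(self, images, keypoints): ...      # second module
    def build_network(self): ...                              # third module
    def train_network(self, dataset): ...                     # fourth module
    def predict_keypoints(self, image): ...                   # fifth module
    def solve_pose(self, correspondences): ...                # sixth module
```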
12. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the spatial non-cooperative target pose measurement method according to claim 1.
13. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the spatial non-cooperative target pose measurement method according to claim 1.
CN202310638984.5A 2023-06-01 2023-06-01 Method, device, computer equipment and medium for measuring pose of space non-cooperative target Active CN116363217B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310638984.5A CN116363217B (en) 2023-06-01 2023-06-01 Method, device, computer equipment and medium for measuring pose of space non-cooperative target


Publications (2)

Publication Number Publication Date
CN116363217A true CN116363217A (en) 2023-06-30
CN116363217B CN116363217B (en) 2023-08-11

Family

ID=86939989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310638984.5A Active CN116363217B (en) 2023-06-01 2023-06-01 Method, device, computer equipment and medium for measuring pose of space non-cooperative target

Country Status (1)

Country Link
CN (1) CN116363217B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287873A (en) * 2019-06-25 2019-09-27 清华大学深圳研究生院 Noncooperative target pose measuring method, system and terminal device based on deep neural network
EP3905194A1 (en) * 2020-04-30 2021-11-03 Siemens Aktiengesellschaft Pose estimation method and apparatus
CN111862126A (en) * 2020-07-09 2020-10-30 北京航空航天大学 Non-cooperative target relative pose estimation method combining deep learning and geometric algorithm
CN111862201A (en) * 2020-07-17 2020-10-30 北京航空航天大学 Deep learning-based spatial non-cooperative target relative pose estimation method
CN112053373A (en) * 2020-08-11 2020-12-08 北京控制工程研究所 Spatial non-cooperative target posture evaluation method with image scale transformation
CN112651437A (en) * 2020-12-24 2021-04-13 北京理工大学 Spatial non-cooperative target pose estimation method based on deep learning
CN115035260A (en) * 2022-05-27 2022-09-09 哈尔滨工程大学 Indoor mobile robot three-dimensional semantic map construction method
CN115018876A (en) * 2022-06-08 2022-09-06 哈尔滨理工大学 Non-cooperative target grabbing control system based on ROS
CN115661246A (en) * 2022-10-25 2023-01-31 中山大学 Attitude estimation method based on self-supervision learning
CN116109706A (en) * 2023-04-13 2023-05-12 中国人民解放军国防科技大学 Space target inversion method, device and equipment based on priori geometric constraint

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
Sharma S et al.: "Pose estimation for non-cooperative spacecraft rendezvous using convolutional neural networks", The 21st IEEE Aerospace Conference *
Zhang Shijie; Tan Xiaona; Cao Xibin: "Robust vision-based determination of relative pose for non-cooperative spacecraft", Journal of Harbin Institute of Technology, no. 07
Xu Yunfei; Zhang Duzhou; Wang Li; Hua Baocheng: "Lightweight feature-fusion network design for local feature recognition of non-cooperative targets", Infrared and Laser Engineering, no. 07
Xu Yunfei; Zhang Duzhou; Wang Li; Hua Baocheng; Shi Yongqiang; He Yingbo: "A convolutional neural network method for attitude measurement of non-cooperative targets", Journal of Astronautics, no. 05
Zeng Zhankui; Wei Xiangquan; Huang Jianming; Chen Feng; Cao Xibin: "Research on ultra-close-range pose measurement technology for spatial non-cooperative targets", Aerospace Shanghai, no. 06
Lin Tingting; Jiang Sheng; Li Ronghua; Ge Yanjun; Zhou Ying: "Visual pose measurement and ground verification method for non-cooperative targets", Journal of Dalian Jiaotong University, no. 03
Wang Zi et al.: "Satellite monocular pose estimation method based on the Transformer model", Acta Aeronautica et Astronautica Sinica *
Ma Junjie; Huang Daqing; Qiu Nanhao; Gong Yongfu: "Relative pose estimation of UAVs based on cooperative target recognition", Electronic Design Engineering, no. 10


Similar Documents

Publication Publication Date Title
US11629965B2 (en) Methods, apparatus, and systems for localization and mapping
Huang et al. A comprehensive survey on point cloud registration
CN109974693B (en) Unmanned aerial vehicle positioning method and device, computer equipment and storage medium
KR102126724B1 (en) Method and apparatus for restoring point cloud data
CN107230225B (en) Method and apparatus for three-dimensional reconstruction
US10636168B2 (en) Image processing apparatus, method, and program
CN109099915B (en) Mobile robot positioning method, mobile robot positioning device, computer equipment and storage medium
US8437501B1 (en) Using image and laser constraints to obtain consistent and improved pose estimates in vehicle pose databases
US8792726B2 (en) Geometric feature extracting device, geometric feature extracting method, storage medium, three-dimensional measurement apparatus, and object recognition apparatus
KR102095842B1 (en) Apparatus for Building Grid Map and Method there of
CN112219087A (en) Pose prediction method, map construction method, movable platform and storage medium
WO2021052283A1 (en) Method for processing three-dimensional point cloud data and computing device
EP2960859B1 (en) Constructing a 3d structure
CN112528974B (en) Distance measuring method and device, electronic equipment and readable storage medium
US11790661B2 (en) Image prediction system
CN111179433A (en) Three-dimensional modeling method and device for target object, electronic device and storage medium
KR20220076398A (en) Object recognition processing apparatus and method for ar device
CN111292377B (en) Target detection method, device, computer equipment and storage medium
JP2010072700A (en) Image processor, image processing method, and image pickup system
CN113793251A (en) Pose determination method and device, electronic equipment and readable storage medium
CN116363217B (en) Method, device, computer equipment and medium for measuring pose of space non-cooperative target
CN111582013A (en) Ship retrieval method and device based on gray level co-occurrence matrix characteristics
CN114926536B (en) Semantic-based positioning and mapping method and system and intelligent robot
CN116206302A (en) Three-dimensional object detection method, three-dimensional object detection device, computer equipment and storage medium
CN111811501B (en) Trunk feature-based unmanned aerial vehicle positioning method, unmanned aerial vehicle and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant