CN117953057A - Pose estimation method and equipment for indoor article flight transfer robot

Info

Publication number
CN117953057A
Authority
CN
China
Prior art keywords
article
pose
image frame
indoor
information
Prior art date
Legal status
Pending
Application number
CN202410051336.4A
Other languages
Chinese (zh)
Inventor
林必毅
任晓波
王志敏
Current Assignee
Shenzhen Huasairuifei Intelligent Technology Co ltd
Original Assignee
Shenzhen Huasairuifei Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Huasairuifei Intelligent Technology Co ltd
Priority to CN202410051336.4A
Publication of CN117953057A

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A pose estimation method and equipment for an indoor article flying and carrying robot. In the method, pose estimation of the indoor article flight transfer robot is performed with a pre-constructed convolutional neural network from a first image frame and a second image frame, where the first image frame is an image frame of the robot before article placement/before article removal and the second image frame is an image frame of the robot after article placement/after article removal. Because at least three pieces of multi-modal prior information are integrated in the output channel of the convolutional neural network, the accuracy of pose estimation can be ensured both when the indoor GPS signal is lost and at the instant an article is placed or taken away.

Description

Pose estimation method and equipment for indoor article flight transfer robot
Technical Field
The invention relates to the technical field of pose estimation, in particular to a pose estimation method and equipment of an indoor article flying and carrying robot.
Background
A flying robot is a special intelligent aerial robot that can fly autonomously along a pre-planned path or perform flight actions in response to wireless instructions, without an on-board pilot controlling its attitude. The flight attitude is realized by a microprocessor that converts the corresponding instruction actions into PWM electrical signals, which drive the rotor motors and change their rotational speeds. Compared with conventional robots, a flying robot has a smaller volume, more flexible maneuverability and a wider operating area, and can operate normally in harsh environments, especially in dangerous areas. In the civil field, unmanned aerial vehicles can perform indoor inspection, article handling, positioning and navigation, and other tasks, so they have gradually become a current research hotspot. To realize intelligent flight of the indoor article flight transfer robot, estimation of the robot's own pose is the foundation and key guarantee of indoor transfer applications. Unlike a ground robot, which commonly needs only two-degree-of-freedom position estimation, or at most three degrees of freedom when the ground is uneven, with its height bounded by certain constraints, the pose estimation of a flying robot involves six degrees of freedom: three-degree-of-freedom position estimation and three-degree-of-freedom orientation estimation. Because the flying robot is highly maneuverable, six-degree-of-freedom pose estimation is a very complicated problem; to ensure that the pose information calculated by the flying robot is accurate, various advanced devices such as a global satellite positioning system, an altimeter, a speedometer, an inertial measurement unit, a high-definition camera, a laser detection device and a radar scanning device are mounted on the flying robot.
At present, researchers have carried out extensive studies on the pose estimation of flying robots from different angles. For example, the patent with application number CN202310488557.3 discloses a mask-based unmanned aerial vehicle pose estimation data enhancement method, in which an RGB camera is used to produce an unmanned aerial vehicle pose estimation data set; after the data set is acquired, a mask method is used to enhance it, thereby expanding the data set. The pose estimation method suitable for indoor autonomous landing of an unmanned aerial vehicle disclosed in the patent with application number CN201610933178.0 adopts a relative pose estimation method based on a novel cooperative marker measured by a monocular camera. It can provide the rolling angle, relative height and other information for tasks such as autonomous landing, and designs a novel cooperative marker that does not need color information, has directivity, and is suitable for measurement at different distances. Imaging the cooperative marker with a camera reduces system complexity, the algorithm takes little time to solve, and the method can be used in occasions requiring directional positioning of the unmanned aerial vehicle.
However, the pose estimation methods proposed at the present stage cannot be applied well to the indoor article flying and carrying robot, mainly because: the attitude control system of the robot suffers from poor stability, low positioning precision and other difficulties when the indoor GPS signal is lost; and at the instant the robot places or takes away an article, loading or unloading the article deforms the attitude of the body, so the attitude control system alone can hardly guarantee high precision and stability of the flight attitude and position information.
Disclosure of Invention
The method provided by the invention improves the accuracy of pose estimation of the indoor article flying and carrying robot when articles are placed/taken away.
In a first aspect, an embodiment of the present invention provides a method for estimating a pose of an indoor article flight transfer robot, including: acquiring a first image frame and a second image frame, wherein the first image frame is an image frame of the indoor article flying and carrying robot before article placement/before article removal, and the second image frame is an image frame of the indoor article flying and carrying robot after article placement/after article removal; according to the first image frame and the second image frame, estimating the pose of the indoor article flight transfer robot based on a pre-constructed convolutional neural network to obtain the actual pose of the indoor article flight transfer robot when the article is placed/taken away, wherein at least three pieces of multi-mode prior information are integrated in an output channel of the convolutional neural network, and the three pieces of multi-mode prior information respectively correspond to preset initial time, article placement time and article taking time.
In some embodiments, before integrating the multimodal prior information in an output channel of the convolutional neural network, uncertainty estimation is performed on the multimodal prior information based on the convolutional neural network; the uncertainty estimation for the multi-modal prior information based on the convolutional neural network comprises the following steps: acquiring the multi-mode priori information; extracting an uncertainty estimated object from the multi-modal prior information through the convolutional neural network, the uncertainty estimated object comprising at least one of a pose of the indoor item flight handling robot and a spatial feature of an image; and performing uncertainty estimation on the uncertainty estimated object.
In some embodiments, the output channel of the convolutional neural network is expressed as a function of p1, p2 and p3, wherein p1 is the multi-modal prior information corresponding to the initial moment, p2 is the multi-modal prior information corresponding to the article placement moment, p3 is the multi-modal prior information corresponding to the article removal moment, and p is the output channel of the convolutional neural network.
In some embodiments, the method for obtaining the multi-modal prior information includes: performing matrix conversion on a pre-acquired RGB image frame to obtain RGB information, wherein the RGB information is expressed as a 13×13 matrix and the RGB image frame is an image frame at the initial moment/the article placement moment/the article removal moment; converting the prior information of the multiple modalities at the initial moment/article placement moment/article removal moment into corresponding pose matrices, wherein the multiple modalities at least comprise RGB image information and text, and each pose matrix is a 13×13 matrix; fusing the RGB information at a moment with each pose matrix at that moment to obtain fused prior information corresponding to each modality at that moment; selecting the minimum value among the fused prior information as the prior result information at that moment; acquiring an image frame of the indoor article flying and carrying robot before the initial moment, before article placement or before article removal; performing perspective transformation on the image frame before the initial moment/before article placement/before article removal based on the convolutional neural network to obtain a perspective-transformed image frame; converting the perspective-transformed image frame into a perspective transformation matrix; and obtaining the multi-modal prior information at that moment according to the RGB information, the prior result information and the perspective transformation matrix.
In some embodiments, the multi-modal prior information is expressed in terms of M, N_RGB and N_prior, wherein M is the perspective transformation matrix, N_RGB is the RGB information, N_prior is the prior result information, and p is the multi-modal prior information.
In some embodiments, the estimating the pose of the indoor article flight handling robot based on the pre-constructed convolutional neural network includes: constructing a least squares model, based on the convolutional neural network, with the sum of squared position differences of corresponding feature points between the first image frame and the second image frame as the objective function; estimating the pose of the indoor article flight transfer robot according to the least squares model to obtain a rotation matrix and a translation matrix, wherein the translation matrix describes the translation transformation of the indoor article flight transfer robot in three-dimensional space and the rotation matrix describes its angular transformation in three-dimensional space; obtaining translation coefficients according to the translation matrix; and predicting the actual pose of the indoor article flight transfer robot according to the translation coefficients, the rotation matrix and a preset pose sample set.
In some embodiments, the least squares model is expressed in terms of R, t, R1, t1, the initial pose and p, wherein R is the rotation matrix, t is the translation matrix, R1 is the rotation matrix of the indoor article flight transfer robot before article placement/article removal, t1 is the translation matrix of the indoor article flight transfer robot before article placement/article removal, the initial pose is generated based on the output channel of the convolutional neural network, and p is the multi-modal prior information.
In some embodiments, the method further comprises: and optimizing the actual pose to obtain the optimized pose.
In a second aspect, another embodiment of the present invention provides a pose estimation apparatus of an indoor article flight transfer robot, including: a memory, a processor and a program stored on the memory and executable on the processor; the processor implements the method described above when executing the program.
In a third aspect, another embodiment of the present invention provides a computer storage medium having a program stored thereon, the program being executable by a processor to implement a method as described above.
According to the method of the embodiment, pose estimation of the indoor article flight transfer robot is performed based on the pre-constructed convolutional neural network from the first image frame and the second image frame, where the first image frame is an image frame of the robot before article placement/before article removal and the second image frame is an image frame of the robot after article placement/after article removal. Because at least three pieces of multi-modal prior information are integrated in the output channel of the convolutional neural network, the accuracy of pose estimation can be ensured both when the indoor GPS signal is lost and at the instant an article is placed or taken away.
Drawings
FIG. 1 is a flow chart of the pose estimation method of an indoor article flight transfer robot provided by the invention;
FIG. 2 is a flow chart of uncertainty estimation for multi-modal prior information based on a convolutional neural network, according to one embodiment;
FIG. 3 is a flow chart of a method for obtaining multi-modal prior information according to one embodiment;
FIG. 4 is a flow chart of pose estimation of an indoor article flight transfer robot based on a pre-constructed convolutional neural network according to one embodiment;
FIG. 5 is a block diagram of the pose estimation apparatus provided by the invention;
FIG. 6 is a block diagram of a computer storage medium according to the invention.
Detailed Description
The application will be described in further detail below with reference to the drawings by means of specific embodiments, wherein like elements in different embodiments are given associated like numerals. In the following embodiments, numerous specific details are set forth in order to provide a better understanding of the present application. However, one skilled in the art will readily recognize that, in different situations, some of these features may be omitted or replaced by other elements, materials or methods. In some instances, operations related to the present application are not shown or described in the specification in order to avoid obscuring the core of the present application; a detailed description of these operations is unnecessary for persons skilled in the art, who can fully understand them from the description herein and from their general knowledge.
Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.
The numbering of the components itself, e.g. "first", "second", etc., is used herein merely to distinguish between the described objects and does not have any sequential or technical meaning.
Referring to fig. 1, in an embodiment of the present invention, a pose estimation method of an indoor article flight transfer robot is provided, including:
S10: and acquiring a first image frame and a second image frame, wherein the first image frame is an image frame of the indoor article flying and carrying robot before the article is placed/before the article is taken away, and the second image frame is an image frame of the indoor article flying and carrying robot after the article is placed/after the article is taken away.
In some embodiments, during the flight of the flying robot, attitude information is obtained through attitude detection sensors such as a gyroscope and an accelerometer, attitude estimation is performed on this information to obtain the attitude angles of the robot, and the flight control system issues flight attitude adjustment instructions according to those attitude angles. Attitude estimation is therefore crucial to whether the flying robot can keep a stable attitude in flight and to reducing the influence of attitude changes on its flight performance. In this embodiment, pose estimation of the indoor article flying and carrying robot is performed from the acquired first image frame and second image frame, in order to solve the problem that existing pose estimation methods are inaccurate at estimating the instantaneous pose of the robot when an article is placed or taken away.
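For illustration only, the sketch below shows a generic complementary-filter fusion of gyroscope and accelerometer readings into a single attitude angle. It is a common textbook approach, not the attitude estimator of the flight control system described here, and the filter weight and sensor readings are invented values.

```python
def complementary_filter(pitch_prev, gyro_rate, accel_pitch, dt, alpha=0.98):
    """Fuse a gyroscope rate and an accelerometer-derived angle into one pitch estimate.

    pitch_prev  : previous pitch estimate (rad)
    gyro_rate   : angular rate about the pitch axis from the gyroscope (rad/s)
    accel_pitch : pitch recovered from the accelerometer's gravity vector (rad)
    alpha       : weight of the integrated gyroscope path (assumed value)
    """
    # Integrate the gyro for short-term accuracy; lean on the accelerometer
    # to correct long-term drift.
    return alpha * (pitch_prev + gyro_rate * dt) + (1.0 - alpha) * accel_pitch

# Example with made-up readings sampled at 100 Hz.
pitch = 0.0
for gyro, acc in [(0.02, 0.010), (0.03, 0.015), (0.01, 0.012)]:
    pitch = complementary_filter(pitch, gyro, acc, dt=0.01)
```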
S20: according to the first image frame and the second image frame, estimating the pose of the indoor article flight transfer robot based on a pre-constructed convolutional neural network to obtain the actual pose of the indoor article flight transfer robot when the article is placed/taken away, wherein at least three pieces of multi-mode prior information are integrated in an output channel of the convolutional neural network, and correspond to the preset initial moment, the article placement moment and the article taking moment respectively.
In an indoor or unknown environment without GPS signals, a mainstream GPS integrated navigation system cannot be used normally; in that case, the most important link in realizing autonomous navigation is to accurately estimate the pose of the flying robot.
In some embodiments, the convolutional neural network includes five convolutional layers, three pooling layers and three fully connected layers; the construction of such a convolutional neural network belongs to the prior art and is not described in detail herein.
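As a rough sketch of such a network, the PyTorch model below stacks five convolutional layers, three pooling layers and three fully connected layers. The channel counts, kernel sizes and the six-degree-of-freedom output head are assumptions, since the description does not specify them.

```python
import torch
import torch.nn as nn

class PoseCNN(nn.Module):
    """Five convolutional layers, three pooling layers, three fully connected layers.
    All hyperparameters here are illustrative assumptions."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),    # conv 1
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),   # conv 2
            nn.MaxPool2d(2),                              # pool 1
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),   # conv 3
            nn.MaxPool2d(2),                              # pool 2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),  # conv 4
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(), # conv 5
            nn.MaxPool2d(2),                              # pool 3
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),                # fully connected 1
            nn.Linear(256, 64), nn.ReLU(),                # fully connected 2
            nn.Linear(64, 6),                             # fully connected 3: 6-DoF pose
        )

    def forward(self, x):
        return self.head(self.features(x))

# Example: a batch of two RGB frames.
pose = PoseCNN()(torch.randn(2, 3, 104, 104))
```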
In some embodiments, the initial time may be any time during the flight of the indoor item flight transfer robot.
In some embodiments, before integrating the multi-modal prior information in the output channel of the convolutional neural network, uncertainty estimation is performed on the multi-modal prior information based on the convolutional neural network; uncertainty estimation is performed on multi-modal prior information based on a convolutional neural network, as shown in fig. 2, including:
S210: and acquiring multi-mode prior information.
S211: an uncertainty estimated object is extracted from the multimodal apriori information through a convolutional neural network, the uncertainty estimated object including at least one of a pose of the indoor item flight handling robot and a spatial feature of the image.
In some embodiments, the spatial features of the image include the underlying features of gray scale, color, texture, shape, location, etc. of the image.
S212: uncertainty estimation is performed on the uncertainty estimated object.
In some embodiments, the multi-modal prior information may contain uncertainty errors caused by jitter, noise, algorithm defects and the like; if the pose of the indoor article flight transfer robot were predicted directly from this prior information, the accuracy of pose estimation could be affected. Therefore, in this embodiment, uncertainty estimation is performed on the specific contents of the multi-modal prior information: the output of the convolutional neural network to which the uncertainty estimation function has been added is distributed according to the uncertainty estimation result, and the uncertainty of each specific content of the multi-modal prior information can be estimated by analyzing that output distribution.
In some embodiments, uncertainty estimation of the specific content of the multi-modal prior information uses a Gaussian process classification method. Gaussian process classification models uncertainty based on Bayesian inference and the theory of the Gaussian distribution, so it has good flexibility and uncertainty estimation capability and is widely applied to classification problems.
In some embodiments, uncertainty estimation of the specific content of the multi-modal prior information at least includes modeling the uncertainty in object class, object pose and image space as 2D Gaussians; the class of the output variable is predicted by modeling the uncertainty of object class, object pose and image space, and the posterior distribution is calculated from the prior distribution of the Gaussian process and the conditional probability distribution of the observed data, so as to classify unknown data.
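A minimal scikit-learn sketch of Gaussian process classification is shown below; the feature vectors, labels and the uncertainty score derived from the predicted class probabilities are placeholders rather than the actual contents of the multi-modal prior information.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Placeholder features (e.g. descriptors of object class / pose / image space)
# and placeholder labels; real inputs would come from the prior information.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0)).fit(X, y)

X_new = rng.normal(size=(5, 4))
proba = gpc.predict_proba(X_new)          # posterior class probabilities
uncertainty = 1.0 - proba.max(axis=1)     # simple per-sample uncertainty score
```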
In some embodiments, the output channel of the convolutional neural network is expressed as a function of p1, p2 and p3, wherein p1 is the multi-modal prior information corresponding to the initial moment, p2 is the multi-modal prior information corresponding to the article placement moment, p3 is the multi-modal prior information corresponding to the article removal moment, and p is the output channel of the convolutional neural network.
In some embodiments, as shown in fig. 3, the method for obtaining multi-mode prior information includes:
S220: the RGB image frames acquired in advance are subjected to matrix conversion to obtain RGB information, the RGB information is expressed as a matrix of 13×13, and the RGB image is an image frame at the initial time/an image frame at the article placement time/an image frame at the article removal time.
In some embodiments, the R, G and B components of the RGB image are each described by gray levels: the RGB image comprises three two-dimensional matrices (R, G and B) with values between 0 and 255; in this embodiment the RGB image information is aggregated into a 13×13 matrix.
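One plausible way to aggregate an RGB frame into a single 13×13 matrix is sketched below with OpenCV; averaging the three channels and area-resizing to 13×13 is an assumed aggregation, since the description only states the target size, and the file name is hypothetical.

```python
import cv2
import numpy as np

def rgb_to_13x13(frame_bgr):
    """Collapse an RGB (OpenCV: BGR) image frame into one 13x13 matrix.
    The channel averaging and INTER_AREA resize are assumptions."""
    gray = frame_bgr.mean(axis=2).astype(np.float32)    # fuse the three 0-255 channels
    return cv2.resize(gray, (13, 13), interpolation=cv2.INTER_AREA)

frame = cv2.imread("frame_at_placement_moment.png")      # hypothetical file name
if frame is not None:
    n_rgb = rgb_to_13x13(frame)                          # 13x13 RGB information matrix
```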
S221: the prior information of multiple modes of the initial moment, the article placing moment and the article taking moment is respectively converted into corresponding pose matrixes, wherein the multiple modes at least comprise RGB image information and characters, and the pose matrixes are 13 multiplied by 13 matrixes.
In some embodiments, the prior information of the multiple modalities refers to prior information in different forms and from different sources, including pictures, text, video, voice, etc.
In some embodiments, the prior information is converted to a pose matrix in order to fuse the prior information with the RGB information.
S222: and fusing the RGB information at one moment with each pose matrix at the moment to obtain fused prior information corresponding to each mode at the moment.
S223: and selecting the minimum value in the fused prior information as prior result information at the moment.
In some embodiments, the minimum value in the fused prior information is used as the prior result information of the moment, so that the influence of the current pose can be reduced as much as possible in the subsequently predicted poses.
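Steps S222 and S223 might look like the sketch below, where element-wise addition as the fusion operator and a matrix norm as the "minimum" criterion are assumptions the description does not spell out.

```python
import numpy as np

def fuse_and_select(n_rgb, pose_matrices):
    """Fuse the 13x13 RGB matrix with each modality's 13x13 pose matrix and keep
    the 'smallest' fused result as the prior result information."""
    fused = [n_rgb + m for m in pose_matrices]        # assumed fusion: element-wise sum
    norms = [np.linalg.norm(f) for f in fused]        # assumed minimum criterion
    return fused[int(np.argmin(norms))]

# pose_matrices would hold the converted priors (image, text, ...) for one moment.
pose_matrices = [np.random.rand(13, 13) for _ in range(2)]
n_prior = fuse_and_select(np.random.rand(13, 13), pose_matrices)
```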
S224: the method comprises the steps of acquiring image frames before an indoor article flying and carrying robot is started, before articles are placed, and before the articles are taken away.
S225: and performing perspective transformation on the image frames before the initial moment, before the article is placed or before the article is taken away based on the convolutional neural network, so as to obtain perspective transformed image frames.
In some embodiments, perspective transformation is needed because the camera mounted on the indoor article flight transfer robot views the ground at an inclination angle rather than pointing vertically downward (orthographic projection); this may cause the image frames collected by the robot to be inclined or distorted, so the image frames must be corrected into the orthographic projection form.
S226: the perspective transformed image frame is converted into a perspective transformation matrix.
In some embodiments, the perspective transformation matrix is a 13×13 matrix.
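A sketch of the perspective correction with OpenCV follows. The corner correspondences, image size and file name are illustrative, and resampling the corrected frame into a 13×13 matrix is an assumed step to match the size of the other matrices.

```python
import cv2
import numpy as np

# Four corners of the tilted view and where they should land in the corrected
# (orthographic-style) view; the pixel coordinates are invented for illustration.
src = np.float32([[40, 60], [600, 40], [620, 470], [20, 450]])
dst = np.float32([[0, 0], [640, 0], [640, 480], [0, 480]])

H = cv2.getPerspectiveTransform(src, dst)        # 3x3 perspective transformation
tilted = cv2.imread("tilted_frame.png")           # hypothetical file name
if tilted is not None:
    corrected = cv2.warpPerspective(tilted, H, (640, 480))
    # Resample the corrected frame to 13x13 so it can be combined with the
    # RGB and prior-result matrices (assumed resampling step).
    gray = cv2.cvtColor(corrected, cv2.COLOR_BGR2GRAY).astype(np.float32)
    M = cv2.resize(gray, (13, 13), interpolation=cv2.INTER_AREA)
```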
S227: and obtaining multi-mode prior information at the moment according to the RGB information, the prior result information and the perspective transformation matrix.
In some embodiments, the multi-modal prior information is expressed in terms of M, N_RGB and N_prior, wherein M is the perspective transformation matrix, N_RGB is the RGB information, N_prior is the prior result information, and p is the multi-modal prior information.
In some embodiments, pose estimation of the indoor article flight handling robot is performed based on a pre-constructed convolutional neural network, as shown in fig. 4, including:
S230: based on the convolutional neural network, a least square model is built by taking the square sum of the positions of any feature points between the first image frame and the second image frame as an objective function.
In some embodiments, the feature points refer to points where the gray value of the image changes drastically or points with a larger curvature on the edges of the image (i.e. the intersection point of two edges), and the feature points in the image frame can reflect the essential features of the image and can identify the target object in the image.
In some embodiments, the least squares model finds the best functional match to the data by minimizing the sum of squared errors; that is, the least squares method estimates the unknown quantities so that the sum of squared errors between the estimated data and the actual data is minimal.
In some embodiments, the least squares model is expressed in terms of R, t, R1, t1, the initial pose and p, wherein R is the rotation matrix, t is the translation matrix, R1 is the rotation matrix at the previous moment, t1 is the translation matrix at the previous moment, the initial pose is generated based on the output channel of the convolutional neural network, and p is the multi-modal prior information.
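Independent of the exact objective used here, the standard closed-form least-squares solution for a rotation matrix R and translation matrix t from matched feature points (the Kabsch/SVD solution) is sketched below; the prior term p and the previous-moment matrices R1, t1 are not modelled in this sketch.

```python
import numpy as np

def estimate_rt(points_a, points_b):
    """Least-squares rigid alignment: find R, t with points_b ~ R @ points_a + t.

    points_a, points_b : (N, 3) arrays of matched feature-point coordinates
                         from the first and second image frames."""
    ca, cb = points_a.mean(axis=0), points_b.mean(axis=0)
    A, B = points_a - ca, points_b - cb
    U, _, Vt = np.linalg.svd(A.T @ B)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca
    return R, t

# Toy check: a known 10-degree rotation about the z-axis plus a known translation.
theta = np.deg2rad(10)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
pts = np.random.rand(20, 3)
R_est, t_est = estimate_rt(pts, pts @ R_true.T + np.array([0.1, -0.2, 0.05]))
```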
S231: according to the least square model, estimating the pose of the indoor article flight transfer robot to obtain a translation matrix and a rotation matrix, wherein the translation matrix is used for describing translation transformation of the indoor article flight transfer robot in a three-dimensional space, and the rotation matrix is used for describing angle transformation of the indoor article flight transfer robot in the three-dimensional space.
S232: and obtaining a translation coefficient according to the translation matrix.
In some embodiments, the translation matrix describes the translation transformation of the indoor article flight transfer robot in three-dimensional space. It contains the translation coefficients and the coordinates of a first feature point in the first image frame, where the translation coefficients are the distances the robot translates along the X-axis, Y-axis and Z-axis, and the first feature point is any feature point in the first image frame. The coordinates of the corresponding second feature point in the second image frame can be obtained from the translation coefficients and the coordinates of the first feature point; conversely, when the coordinates of the first feature point and of the second feature point are both known, the translation coefficients can be solved.
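A tiny numeric sketch of recovering the translation coefficients from one pair of corresponding feature points, assuming the rotation matrix has already been estimated; the coordinates are invented for illustration.

```python
import numpy as np

p1 = np.array([0.42, 1.10, 2.35])    # first feature point (first image frame)
p2 = np.array([0.55, 0.98, 2.31])    # corresponding feature point (second image frame)
R_est = np.eye(3)                    # rotation obtained from the least squares step

# Translation coefficients: the shifts along the X, Y and Z axes.
t_xyz = p2 - R_est @ p1
```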
S233: and predicting the actual pose of the indoor article flying and carrying robot corresponding to the initial pose according to the translation coefficient, the rotation matrix and a preset pose sample set.
In some embodiments, the pose of the indoor article flight transfer robot includes a position and a pose corresponding to the translation coefficient and the rotation matrix, respectively, so that the actual pose of the indoor article flight transfer robot corresponding to the initial pose can be predicted according to the translation coefficient, the rotation matrix and a preset pose sample set.
In some embodiments, the actual pose comprises an estimate of the six degrees of freedom of the indoor article flight transfer robot, namely translational degrees of freedom along the x, y and z coordinate axes and rotational degrees of freedom about those three axes; the position and attitude of the indoor article flight transfer robot can be completely determined from the actual pose.
In some embodiments, the method further comprises: and optimizing the actual pose to obtain the optimized pose.
In some embodiments, in order to further improve the accuracy of pose estimation of the indoor article flying and carrying robot, the actual pose is optimized as follows: the spatial information of objects in the image is recovered based on the actual pose, and pose optimization is then performed according to that spatial information; other existing methods may also be adopted, and the optimization method is not limited herein.
Another embodiment of the present invention provides a pose estimation apparatus of an indoor article flight transfer robot, as shown in fig. 5, including: a memory 100, a processor 200, and a program stored on the memory 100 and executable on the processor; the processor 200 implements the method described above when executing the program.
In another embodiment of the present invention, a computer storage medium 300 is provided, as shown in fig. 6, on which a program is stored; the program can be executed by the processor 200 to implement the method described above.
The foregoing description of the invention has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the invention pertains, based on the idea of the invention.

Claims (10)

1. A pose estimation method of an indoor article flying and carrying robot, characterized by comprising the following steps:
Acquiring a first image frame and a second image frame, wherein the first image frame is an image frame of the indoor article flying and carrying robot before article placement/before article removal, and the second image frame is an image frame of the indoor article flying and carrying robot after article placement/after article removal;
According to the first image frame and the second image frame, estimating the pose of the indoor article flight transfer robot based on a pre-constructed convolutional neural network to obtain the actual pose of the indoor article flight transfer robot when the article is placed/taken away, wherein at least three pieces of multi-mode prior information are integrated in an output channel of the convolutional neural network, and the three pieces of multi-mode prior information respectively correspond to preset initial time, article placement time and article taking time.
2. The method of claim 1, wherein the multi-modal prior information is uncertainty estimated based on the convolutional neural network prior to integrating the multi-modal prior information in an output channel of the convolutional neural network; the uncertainty estimation for the multi-modal prior information based on the convolutional neural network comprises the following steps:
Acquiring the multi-mode priori information;
Extracting an uncertainty estimated object from the multi-modal prior information through the convolutional neural network, the uncertainty estimated object comprising at least one of a pose of the indoor item flight handling robot and a spatial feature of an image;
And performing uncertainty estimation on the uncertainty estimated object.
3. The method of claim 2, wherein the output channel of the convolutional neural network is expressed as a function of p1, p2 and p3,
wherein p1 is the multi-modal prior information corresponding to the initial moment, p2 is the multi-modal prior information corresponding to the article placement moment, p3 is the multi-modal prior information corresponding to the article removal moment, and p is the output channel of the convolutional neural network.
4. The method of claim 1, wherein the method for obtaining the multi-modal prior information comprises:
Performing matrix conversion on a pre-acquired RGB image frame to obtain RGB information, wherein the RGB information is expressed as a 13×13 matrix and the RGB image frame is an image frame at the initial moment/the article placement moment/the article removal moment;
Converting the prior information of the multiple modalities at the initial moment/article placement moment/article removal moment into corresponding pose matrices, wherein the multiple modalities at least comprise RGB image information and text, and each pose matrix is a 13×13 matrix;
Fusing the RGB information at a moment with each pose matrix at the moment to obtain fused prior information corresponding to each mode at the moment;
selecting the minimum value in the fused prior information as prior result information at the moment;
Acquiring an image frame of the indoor article flying and carrying robot before an initial moment, before article placement or before article removal;
performing perspective transformation on the image frames before the initial moment/before the article is placed/before the article is taken out based on the convolutional neural network to obtain perspective transformed image frames;
converting the perspective transformed image frame into a perspective transformation matrix;
And obtaining the multi-mode prior information at the moment according to the RGB information, the prior result information and the perspective transformation matrix.
5. The method of claim 4, wherein the multi-modal prior information is expressed in terms of M, N_RGB and N_prior,
wherein M is the perspective transformation matrix, N_RGB is the RGB information, N_prior is the prior result information, and p is the multi-modal prior information.
6. The method of claim 1, wherein the estimating the pose of the indoor item flight handling robot based on a pre-constructed convolutional neural network comprises:
constructing a least squares model, based on the convolutional neural network, with the sum of squared position differences of corresponding feature points between the first image frame and the second image frame as an objective function;
According to the least square model, estimating the pose of the indoor article flight transfer robot to obtain a rotation matrix and a translation matrix, wherein the translation matrix is used for describing translation transformation of the indoor article flight transfer robot in a three-dimensional space, and the rotation matrix is used for describing angle transformation of the indoor article flight transfer robot in the three-dimensional space;
obtaining a translation coefficient according to the translation matrix;
And predicting the actual pose of the indoor article flight transfer robot according to the translation coefficient, the rotation matrix and a preset pose sample set.
7. The method of claim 6, wherein the least squares model is expressed in terms of R, t, R1, t1, the initial pose and p,
wherein R is the rotation matrix, t is the translation matrix, R1 is the rotation matrix of the indoor article flight transfer robot before article placement/before article removal, t1 is the translation matrix of the indoor article flight transfer robot before article placement/before article removal, the initial pose is generated based on the output channel of the convolutional neural network, and p is the multi-modal prior information.
8. The method of claim 1, wherein the method further comprises: and optimizing the actual pose to obtain the optimized pose.
9. A pose estimation device of an indoor article flight transfer robot, characterized by comprising: memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the method according to any one of claims 1-8 when executing the program.
10. A computer storage medium having stored thereon a program executable by a processor to implement the method of any of claims 1-8.
CN202410051336.4A 2024-01-12 2024-01-12 Pose estimation method and equipment for indoor article flight transfer robot Pending CN117953057A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410051336.4A CN117953057A (en) 2024-01-12 2024-01-12 Pose estimation method and equipment for indoor article flight transfer robot


Publications (1)

Publication Number Publication Date
CN117953057A 2024-04-30

Family

ID=90793952



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination