CN116793345A - Posture estimation method and device of self-mobile equipment and readable storage medium


Info

Publication number
CN116793345A
Authority
CN
China
Prior art keywords: dynamic, depth, determining, information, target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310497766.4A
Other languages
Chinese (zh)
Inventor
丘润
贺颖
于非
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen
Original Assignee
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen filed Critical Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen
Priority to CN202310497766.4A priority Critical patent/CN116793345A/en
Publication of CN116793345A publication Critical patent/CN116793345A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method and a device for estimating the posture of a self-mobile device, and a readable storage medium. The method comprises the following steps: acquiring a first depth image in the advancing direction of the self-mobile device; acquiring feature points of the first depth image; determining semantic information of the first depth image and a detection frame of a first dynamic target; for each feature point, determining the dynamic probability of the feature point according to the detection frame of the first dynamic target, the semantic information and the first depth information of the feature point, wherein the dynamic probability is used for describing the probability that the feature point is located on a dynamic target; removing target feature points, wherein the target feature points are feature points whose dynamic probability is larger than a preset probability value; and determining the gesture of the self-mobile device according to the remaining feature points. The method and the device can accurately remove the feature points of dynamic targets, reduce the influence caused by these feature points, accurately determine the gesture of the self-mobile device, and improve the gesture estimation precision.

Description

Posture estimation method and device of self-mobile equipment and readable storage medium
Technical Field
The application belongs to the technical field of self-mobile equipment, and particularly relates to a method and a device for estimating the posture of the self-mobile equipment and a readable storage medium.
Background
Simultaneous localization and mapping (SLAM) creates a map in a completely unknown environment while the position of the robot is uncertain, and simultaneously uses the map for autonomous positioning and navigation. The self-mobile device realizes autonomous positioning through a gesture detection technology based on data acquired by sensors.
However, the gesture detection technology assumes that the environment is static and determines the gesture from static feature points. In a real scenario, moving objects (e.g., moving cars, people, etc.) are often present, which introduce erroneous data and thereby affect the accuracy of the pose estimation of the self-mobile device.
Disclosure of Invention
The embodiment of the application provides a method and a device for estimating the posture of self-mobile equipment, the self-mobile equipment, a readable storage medium and a computer program product, which can solve the problem of low precision of posture estimation.
In a first aspect, an embodiment of the present application provides a method for estimating a pose of a self-mobile device, including:
acquiring a first depth image in the advancing direction of the self-mobile device, wherein the first depth image is a current frame image;
acquiring characteristic points of the first depth image;
Determining semantic information of the first depth image and a detection frame of a first dynamic target;
for each feature point, determining the dynamic probability of the feature point according to a detection frame of the first dynamic target, the semantic information and the first depth information of the feature point, wherein the dynamic probability is used for describing the probability that the feature point is positioned on the dynamic target, the first depth information is used for describing the depth value of a pixel point of a second depth image, and the second depth image is the previous frame image;
removing target feature points, wherein the target feature points are feature points with the dynamic probability larger than a preset probability value;
and determining the gesture of the self-mobile device according to the remaining feature points.
In one embodiment, determining a detection box of a first dynamic object in the first depth image comprises:
when the first depth image has a missed detection target, acquiring IMU information of the self-mobile device and a detection frame of a second dynamic target of the second depth image;
and determining the detection frame of the first dynamic target according to the IMU information and the detection frame of the second dynamic target.
In one embodiment, the determining the detection frame of the first dynamic target according to the IMU information and the detection frame of the second dynamic target includes:
Determining a transformation matrix between the first depth image and the second depth image according to the IMU information;
determining motion information of a corresponding dynamic target according to a detection frame of a third dynamic target and a detection frame of the second dynamic target, wherein the detection frame of the third dynamic target is obtained by carrying out target prediction on the first depth image;
and determining the detection frame of the first dynamic target according to the transformation matrix and the corresponding motion information based on the detection frame of the second dynamic target.
In one embodiment, determining semantic information for the first depth image comprises:
and determining semantic information of the first depth image according to second depth information corresponding to the detection frame of the first dynamic target, wherein the second depth information is used for describing the depth value of the pixel point of the first depth image.
In one embodiment, the determining, for each of the feature points, the dynamic probability of the feature point according to the detection frame of the first dynamic target, the semantic information, and the first depth information of the feature point includes:
for each feature point, determining an image area where the feature point is located according to the semantic information and a detection frame of the first dynamic target;
Determining a first dynamic probability of the feature points according to the image area where the feature points are located;
projecting the first depth information of the feature points to the first depth image to obtain third depth information of the feature points;
determining a second dynamic probability of the feature point according to the third depth information and the first depth information of the feature point;
and determining the current dynamic probability of the feature point according to the first dynamic probability, the second dynamic probability and the historical dynamic probability of the feature point in the second depth image.
In one embodiment, the determining, according to the semantic information and the detection frame of the first dynamic object, an image area where the feature point is located includes:
when the second depth information of the feature points is smaller than the semantic information, determining that the feature points are located in an image area covered by the first dynamic target, wherein the semantic information is used for describing the semantic mask value of the pixel points of the first depth image;
and when the second depth information of the feature points is in the depth range of the detection frame of the first dynamic target, determining that the feature points are positioned in the image area in the detection frame of the first dynamic target and are not positioned in the image area covered by the first dynamic target.
In one embodiment, the determining the second dynamic probability of the feature point according to the first depth information and the first depth information of the feature point includes:
determining difference information according to the third depth information and the first depth information of the feature points;
and determining the second dynamic probability of the characteristic point according to the difference information.
In a second aspect, an embodiment of the present application provides a posture estimation apparatus of a self-mobile device, including:
the acquisition module is used for acquiring a first depth image in the advancing direction of the self-mobile device, wherein the first depth image is a current frame image; and acquiring feature points of the first depth image;
the determining module is used for determining semantic information of the first depth image and a detection frame of the first dynamic target; for each feature point, determining the dynamic probability of the feature point according to a detection frame of the first dynamic target, the semantic information and the first depth information of the feature point, wherein the dynamic probability is used for describing the probability that the feature point is positioned on the dynamic target, the first depth information is used for describing the depth value of a pixel point of a second depth image, and the second depth image is the previous frame image;
The removing module is used for removing target feature points, wherein the target feature points are feature points with the dynamic probability larger than a preset probability value;
and the gesture estimation module is used for determining the gesture of the self-mobile device according to the remaining feature points.
In a third aspect, an embodiment of the present application provides a self-mobile device, including:
a vehicle, including a vehicle body and wheels; and
a control module for performing the method according to any of the first aspects above.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which, when executed by a processor, implements a method as in any of the first aspects above.
In a fifth aspect, an embodiment of the application provides a computer program product for, when run on a self-mobile device, causing the self-mobile device to perform the method of any one of the first aspects above.
Compared with the prior art, the embodiment of the application has the beneficial effects that:
According to the embodiment of the application, semantic information of the first depth image and a detection frame of the first dynamic target are determined; for each feature point, the dynamic probability of the feature point is determined according to the detection frame of the first dynamic target, the semantic information and the first depth information of the feature point, wherein the dynamic probability is used for describing the probability that the feature point is located on a dynamic target; and target feature points, namely feature points whose dynamic probability is larger than a preset probability value, are removed. In this way, the feature points of the dynamic target can be accurately removed, the influence caused by the feature points of the dynamic target is reduced, the gesture of the self-mobile device can be accurately determined, and the gesture estimation precision is improved.
It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not described here again.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a first method for estimating a pose of a self-mobile device according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a second flow chart of a method for estimating a pose of a self-mobile device according to an embodiment of the present application;
FIG. 3 is an exemplary diagram of a first dynamic target detection box provided by an embodiment of the present application;
FIG. 4 is a third flow chart of a method for estimating a pose of a self-mobile device according to an embodiment of the present application;
FIG. 5 is a diagram illustrating a scenario of feature points provided by an embodiment of the present application;
fig. 6 is a schematic structural diagram of a gesture estimation apparatus of a self-mobile device according to an embodiment of the present application;
Fig. 7 is a schematic system architecture of a self-mobile device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted as "when..once" or "in response to a determination" or "in response to detection" depending on the context. Similarly, the phrase "if a determination" or "if a [ described condition or event ] is detected" may be interpreted in the context of meaning "upon determination" or "in response to determination" or "upon detection of a [ described condition or event ]" or "in response to detection of a [ described condition or event ]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Fig. 1 is a schematic flow chart of a first method for estimating a pose of a self-mobile device according to an embodiment of the present application. As shown in fig. 1, the method includes:
s11: a first depth image is acquired from a direction of travel of the mobile device.
Wherein the first depth image is a current frame image.
In one possible implementation, the first depth image is acquired by a visual sensor (e.g., an RGB sensor) mounted to the self-mobile device. Specifically, the first depth image is obtained by capturing an ambient image from the forward direction of the mobile device by the RGB sensor.
S12: feature points of the first depth image are acquired.
In the application, the feature points of the first depth image are acquired through an optical flow method. Specifically, based on the feature points of the second depth image, the positions of those feature points in the first depth image are determined according to the assumptions of brightness constancy, temporal persistence and spatial consistency, so that the feature points of the first depth image are obtained.
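By way of illustration, this tracking step can be sketched with the pyramidal Lucas-Kanade implementation in OpenCV. This is only a minimal sketch under the stated assumptions (brightness constancy, temporal persistence, spatial consistency); the patent does not prescribe a specific detector or parameter set, so the window size, pyramid level and the commented corner detector below are illustrative choices.

```python
import cv2

def track_feature_points(prev_gray, curr_gray, prev_points):
    """Track feature points of the second (previous) depth image into the
    first (current) depth image with pyramidal Lucas-Kanade optical flow."""
    curr_points, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_points, None,
        winSize=(21, 21), maxLevel=3)
    status = status.reshape(-1).astype(bool)
    # Keep only the points that were successfully tracked into the current frame.
    return prev_points[status], curr_points[status]

# Illustrative seeding of the previous frame (any corner detector would do):
# prev_points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
#                                       qualityLevel=0.01, minDistance=10)
```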
S13: and determining semantic information of the first depth image and a detection frame of the first dynamic target.
In one possible implementation, the dynamic target of the first depth image may be detected by a target detection algorithm, to obtain a detection frame of the first dynamic target.
In one possible implementation, the first depth image is identified by a semantic segmentation network, and semantic information of the first depth image is obtained.
S14: and determining the dynamic probability of the feature points according to the detection frame of the first dynamic target, the semantic information and the first depth information of the feature points aiming at each feature point.
The dynamic probability is used for describing the probability of the feature point to be located on the dynamic target, the first depth information is used for describing the depth value of the pixel point of the second depth image, and the second depth image is the previous frame image. The depth value is the pixel coordinates of the pixel point.
In the application, based on semantic constraint and multi-view geometric constraint, the prior motion probability of each feature point is determined according to the detection frame of the first dynamic target, semantic information and first depth information of the feature point. And then obtaining the dynamic probability of each feature point based on the probability propagation principle.
Under multiple constraints, the dynamic probability of each feature point can be accurately determined, and a basis is provided for removing the feature point corresponding to the first dynamic target in the first depth image.
S15: and removing target feature points, wherein the target feature points are feature points with dynamic probability larger than a preset probability value.
In the application, the preset probability value may be determined in advance according to an actual application scenario. When the dynamic probability of the feature point is larger than a preset probability value, the feature point is the feature point on the first dynamic target and needs to be removed.
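A minimal sketch of this removal step is given below, assuming the feature points and their dynamic probabilities are held in parallel lists and that the preset probability value is supplied by the caller (the threshold value itself is application-dependent):

```python
def remove_dynamic_feature_points(feature_points, dynamic_probs, preset_prob):
    """Drop the target feature points (dynamic probability above the preset
    probability value) and keep the remaining feature points."""
    remaining = [pt for pt, prob in zip(feature_points, dynamic_probs)
                 if prob <= preset_prob]
    return remaining
```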
S16: and determining the gesture of the self-mobile device according to the rest characteristic points.
In the application, the remaining feature points do not include feature points on the first dynamic target. According to the remaining feature points, the gesture of the self-mobile device can be accurately determined, and the erroneous data brought by dynamic targets to the static-environment gesture detection technology are reduced.
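One common way to realise this step is a PnP solve with RANSAC over the remaining feature points. The patent does not name a particular solver, so the OpenCV call and its parameters below are only an illustrative choice, and the 3D coordinates of the remaining feature points are assumed to have been recovered beforehand (e.g., from the depth image of the previous frame).

```python
import cv2
import numpy as np

def estimate_pose(points_3d, points_2d, camera_matrix):
    """Estimate the pose of the self-mobile device from the remaining
    (static) feature points: points_3d are their 3D positions, points_2d
    their pixel positions in the current frame."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        camera_matrix, None,
        reprojectionError=3.0, iterationsCount=100)
    if not ok:
        raise RuntimeError("pose estimation failed")
    rotation, _ = cv2.Rodrigues(rvec)   # rotation of the self-mobile device
    return rotation, tvec               # translation of the self-mobile device
```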
In an actual scene, when a geometric method is used to detect dynamic targets, the correspondence between two frames of images is obtained according to epipolar geometry, and whether a target is dynamic is judged according to the depth change. When the number of dynamic targets in the image is large and the area they occupy is large, this method may fail to accurately determine the positions of the dynamic targets, which affects the accuracy of the posture estimation of the self-mobile device. In the application, the first dynamic target is obtained according to its detection frame and the semantic information, so that the semantic information is used to improve the accuracy of target detection, the dynamic target is accurately obtained, and the accuracy of the posture estimation of the self-mobile device is improved.
According to the embodiment of the application, semantic information of the first depth image and a detection frame of the first dynamic target are determined; for each feature point, the dynamic probability of the feature point is determined according to the detection frame of the first dynamic target, the semantic information and the first depth information of the feature point, wherein the dynamic probability is used for describing the probability that the feature point is located on a dynamic target; and target feature points, namely feature points whose dynamic probability is larger than a preset probability value, are removed. In this way, the feature points of the dynamic target can be accurately removed, the influence caused by the feature points of the dynamic target is reduced, the gesture of the self-mobile device can be accurately determined, and the gesture estimation precision is improved.
Fig. 2 is a schematic flow chart of a second method for estimating a pose of a self-mobile device according to an embodiment of the present application. As shown in fig. 2, step S13 includes:
s131: and when the first depth image has a missed detection target, acquiring IMU information of the mobile device and a detection frame of a second dynamic target of the second depth image.
In the application, when the first depth image has a missed detection target, all dynamic targets of the first depth image are obtained with the aid of IMU information. The IMU information may be obtained through a gyroscope mounted on the self-mobile device and includes the attitude angle, angular velocity or acceleration of the device.
Specifically, the detection frame of the first dynamic target of the first depth image may be determined based on the detection frame of the second dynamic target of the second depth image according to the IMU (Inertial Measurement Unit) information. That is, based on the detection frame of the second dynamic target of the second depth image, the missed detection target is supplemented according to the IMU information, so that the detection frames of the first dynamic targets of the first depth image are obtained.
In one possible implementation, whether the first depth image has a missed detection target is determined according to the intersection over union (IoU), wherein the second depth image is taken to have no missed detection target.
Specifically, a dynamic target of the first depth image is predicted based on the second dynamic target of the second depth image to obtain a prediction detection frame of the dynamic target of the first depth image. Target detection is performed on the first depth image with a target detection algorithm to obtain an algorithm detection frame of the dynamic target of the first depth image. The intersection over union between the prediction detection frame and the algorithm detection frame is calculated, and if it is smaller than an IoU threshold, it is determined that the first depth image has a missed detection target.
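A sketch of this missed-detection check follows, assuming axis-aligned boxes given as (x1, y1, x2, y2); the IoU threshold value is not specified in the text and is an illustrative parameter here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def has_missed_detection(prediction_boxes, algorithm_boxes, iou_threshold=0.5):
    """The first depth image is considered to have a missed detection target
    when some prediction detection frame matches no algorithm detection frame
    above the IoU threshold."""
    for pred in prediction_boxes:
        if all(iou(pred, det) < iou_threshold for det in algorithm_boxes):
            return True
    return False
```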
S132: and determining a detection frame of the first dynamic target according to the IMU information and the detection frame of the second dynamic target.
In one possible implementation, the pose of the dynamic object may be extrapolated based on the dynamic object being in constant motion between two adjacent frames of images. According to IMU information and a detection frame of a second dynamic target, determining the detection frame of the first dynamic target comprises the following steps:
s21: and determining a transformation matrix between the first depth image and the second depth image according to the IMU information.
In application, the IMU information is used to predict the pose state, and a transformation matrix T_{k-1}^{k} between the first depth image and the second depth image is determined, where k represents the sequence number of the first depth image and k-1 represents the sequence number of the second depth image.
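Assuming the IMU propagation yields a world pose for each frame as a 4x4 homogeneous matrix, the relative transformation between the two frames is the composition shown below; how the poses themselves are propagated from raw gyroscope/accelerometer readings is not shown here.

```python
import numpy as np

def relative_transform(pose_prev, pose_curr):
    """Transformation matrix between the second (previous) and first (current)
    depth images, given IMU-propagated 4x4 world poses of the camera."""
    return np.linalg.inv(pose_prev) @ pose_curr
```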
S22: and determining the motion information of the corresponding dynamic target according to the detection frame of the third dynamic target and the detection frame of the second dynamic target.
The detection frame of the third dynamic target is obtained by carrying out target prediction on the first depth image. Specifically, the prediction detection frame of the dynamic target of the first depth image is the detection frame of the third dynamic target.
In the application, the motion information of the corresponding dynamic target is determined according to the pixel coordinates of the center of the detection frame of the third dynamic target and the pixel coordinates of the center of the detection frame of the second dynamic target.
The formula for calculating the motion information is Δc_i = c_i^{pred} − c_i^{k-1}, where Δc_i is the motion information of the i-th dynamic target, c_i^{pred} is the pixel coordinate of the center of the detection frame of the i-th third dynamic target, and c_i^{k-1} is the pixel coordinate of the center of the detection frame of the i-th second dynamic target.
S23: and determining the detection frame of the first dynamic target according to the transformation matrix and the corresponding motion information based on the detection frame of the second dynamic target.
In application, the detection frame of the first dynamic target is determined as follows: based on the transformation matrix and the corresponding motion information, the pixel coordinates of the first corner and the second corner of the detection frame of the i-th first dynamic target are obtained from the pixel coordinates of the first corner and the second corner of the detection frame of the i-th second dynamic target. The pixel coordinates of the third corner and the fourth corner of the detection frame of the i-th first dynamic target are then determined according to the pixel coordinates of the first corner and the second corner of the detection frame of the i-th first dynamic target.
Fig. 3 is an exemplary diagram of a detection frame of a first dynamic target according to an embodiment of the present application. As shown in FIG. 3, the detection frame determination formula yields the upper-left corner A(x1, y1) and the lower-right corner C(x2, y2) of the detection frame of the first dynamic target; the lower-left corner D(x1, y2) and the upper-right corner B(x2, y1) are then determined according to the upper-left corner A(x1, y1) and the lower-right corner C(x2, y2).
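A simplified sketch of this box prediction is given below: it shifts the two known corners of the previous-frame detection frame by the centre displacement (the motion information) and then fills in the remaining corners as in FIG. 3. The full method additionally warps the corners with the IMU-derived transformation matrix, which is omitted here for brevity.

```python
def predict_detection_frame(prev_box, motion):
    """prev_box: (x1, y1, x2, y2) of the i-th second dynamic target;
    motion: (dx, dy) centre displacement of the i-th dynamic target."""
    x1, y1, x2, y2 = prev_box
    dx, dy = motion
    A = (x1 + dx, y1 + dy)          # upper-left corner of the predicted frame
    C = (x2 + dx, y2 + dy)          # lower-right corner
    D = (A[0], C[1])                # lower-left corner, derived from A and C
    B = (C[0], A[1])                # upper-right corner, derived from A and C
    return A, B, C, D
```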
According to the embodiment, when the first depth image has the missed detection target, the IMU information of the mobile device and the detection frame of the second dynamic target of the second depth image are acquired, the second depth image is the previous frame image, the detection frame of the first dynamic target is determined according to the IMU information and the detection frame of the second dynamic target, and the detection frames of the missed detection targets of the first depth image are acquired, so that the detection frames of all the first dynamic targets of the first depth image are acquired, the problem of inaccurate detection caused by abnormal sensors or overlarge sensor detection data deviation is solved, and a basis is provided for reducing the probability of feature point error classification.
Fig. 4 is a third flowchart of a method for estimating a pose of a self-mobile device according to an embodiment of the present application. As shown in fig. 4, step S13 includes:
s31: and determining semantic information of the first depth image according to the second depth information corresponding to the detection frame of the first dynamic target.
The second depth information is used for describing the depth value of the pixel point of the first depth image. The depth value is the pixel coordinates of the pixel point.
In one possible implementation, the semantic information of the first depth image is determined according to the second depth information of the four corners of the detection frame of the first dynamic object and the second depth information of the center of the detection frame. Wherein the semantic information is used to describe semantic mask values for pixels of the first depth image.
By way of example, the semantic mask value d_seg is determined from the second depth information corresponding to the detection frame, where d_max = max(d_A, d_B, d_C, d_D) is the maximum value of the second depth information of the four corners of the detection frame, d_cen is the second depth information of the center of the detection frame, and ε is a constant.
When a semantic segmentation network is used to detect dynamic targets, more computing resources are needed; feature points may be wrongly identified as feature points on a dynamic target, so that the number of key points in the tracking thread tends to decrease; and the network can only identify targets of specific categories and cannot handle unknown dynamic targets, which affects the further processing of the key points and the stability. According to the embodiment of the application, the semantic information of the first depth image is determined according to the second depth information corresponding to the detection frame of the first dynamic target, so that the dynamic target can be accurately identified; detecting the dynamic target in this way instead of with a semantic segmentation network reduces the required computing resources and improves the utilization rate of computing resources.
Correspondingly, step S14 includes:
s32: and determining an image area where the feature points are located according to the semantic information and the detection frame of the first dynamic target for each feature point.
In one possible implementation, step S32, based on semantic constraints, includes:
s321: and when the second depth information of the feature points is smaller than the semantic information, determining that the feature points are positioned in the image area covered by the first dynamic target.
Typically, when the second depth information of the feature point is smaller than the semantic information, the feature point lies within the semantic mask, i.e., it is a feature point on a dynamic target.
Specifically, a difference value between the semantic mask value and the depth value of the feature point is calculated, and when the difference value is greater than zero, the feature point is determined to be located in the image area covered by the first dynamic target.
S322: and when the second depth information of the feature points is in the depth range of the detection frame of the first dynamic target, determining that the feature points are positioned in the image area in the detection frame of the first dynamic target and are not positioned in the image area covered by the first dynamic target.
In general, when the second depth information of the feature point is within the depth range of the detection frame of the first dynamic target, the feature point is likely to be a feature point on the boundary of the dynamic target.
For example, the depth value (x, y) of the feature point is compared with the upper-left corner A(x1, y1) and the lower-right corner C(x2, y2) of the detection frame. When x1 ≤ x ≤ x2 and y1 ≤ y ≤ y2, that is, when the second depth information of the feature point is within the depth range of the detection frame of the first dynamic target, it is determined that the feature point is located in the image area within the detection frame of the first dynamic target and is not located in the image area covered by the first dynamic target.
Fig. 5 is a view illustrating a scenario of feature points according to an embodiment of the present application. As shown in fig. 5, the feature points whose second depth information is smaller than the semantic information are feature points located in the image area of the person. The feature points whose second depth information is within the depth range of the detection frame of the first dynamic target are feature points located in the image area of the person's detection frame but not in the image area covered by the person.
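The semantic-constraint classification of a feature point into one of the three image areas can be sketched as follows; the region labels are illustrative names rather than terms from the disclosure, and the depth/coordinate conventions follow the description above.

```python
def classify_feature_point(x, y, depth, semantic_mask_value, box):
    """(x, y): pixel coordinates of the feature point; depth: its second depth
    information; semantic_mask_value: d_seg for that pixel; box: detection
    frame (x1, y1, x2, y2) of the first dynamic target."""
    x1, y1, x2, y2 = box
    if depth < semantic_mask_value:
        return "covered_by_dynamic_target"   # inside the semantic mask
    if x1 <= x <= x2 and y1 <= y <= y2:
        return "in_frame_not_covered"        # inside the frame, off the mask
    return "outside_frame"                   # background / static area
```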
S33: and determining the first dynamic probability of the feature points according to the image area where the feature points are located.
Illustratively, the first dynamic probability is determined as follows: P(S) = 1 when the feature point is located in the image area covered by the first dynamic target, P(S) = 0.5 when the feature point is located in the image area within the detection frame of the first dynamic target but not covered by the first dynamic target, and P(S) = 0 otherwise; wherein P(S) is the first dynamic probability, which is a priori probability. The values of the first dynamic probability may be set according to the actual application scenario and are not limited to 1, 0.5 and 0.
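Using the illustrative region labels from the sketch above, the prior P(S) can be tabulated directly; the values 1, 0.5 and 0 follow the example in the text and may be tuned per application scenario.

```python
def first_dynamic_probability(region):
    """Semantic-constraint prior P(S) of a feature point."""
    return {"covered_by_dynamic_target": 1.0,
            "in_frame_not_covered": 0.5,
            "outside_frame": 0.0}[region]
```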
S34: and projecting the first depth information of the feature points to the first depth image to obtain third depth information of the feature points.
In general, based on geometric constraints, the feature points on the static object have small changes in depth values on two adjacent frames of images, while the feature points on the dynamic object have large changes in depth values on two adjacent frames of images.
Specifically, the depth value of the feature point in the second depth image is projected onto the first depth image, and the depth value of the re-projection, namely third depth information, is obtained.
S35: and determining the second dynamic probability of the feature point according to the third depth information and the first depth information of the feature point.
In one possible implementation, step S35 includes:
s351: and determining difference information according to the third depth information and the first depth information of the feature points.
Specifically, the formula for determining the difference information is Δz = |z − z′|, where Δz is the difference information, z is the first depth information of the feature point, and z′ is the third depth information.
S352: and determining the second dynamic probability of the feature points according to the difference information.
Illustratively, the second dynamic probability is determined as follows: P(G) = 1 when Δz > t1, P(G) = 0.5 when t2 < Δz ≤ t1, and P(G) = 0 when Δz ≤ t2; wherein P(G) is the second dynamic probability, which is also a priori probability, and t1 ≥ t2. Illustratively, t1 is set to 0.2 and t2 is set to 0.1, but this is not limiting. The values of the second dynamic probability may be set according to the actual application scenario and are not limited to 1, 0.5 and 0.
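A sketch of the geometric-constraint prior P(G) follows; the thresholds t1 = 0.2 and t2 = 0.1 are the example values given above and are not limiting.

```python
def second_dynamic_probability(first_depth, third_depth, t1=0.2, t2=0.1):
    """first_depth: depth of the feature point in the previous frame;
    third_depth: its depth re-projected into the current frame."""
    dz = abs(first_depth - third_depth)   # difference information
    if dz > t1:
        return 1.0      # large depth change: likely on a dynamic target
    if dz > t2:
        return 0.5      # ambiguous
    return 0.0          # depth stable: likely static
```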
S36: and determining the current dynamic probability of the feature point according to the first dynamic probability, the second dynamic probability and the historical dynamic probability of the feature point in the second depth image.
In one possible implementation, the corresponding dynamic probabilities are determined based on semantic constraints and geometric constraints, i.e. the third dynamic probability is determined from the first dynamic probability and the second dynamic probability. And then determining the current dynamic probability of the feature point according to the third dynamic probability and the historical dynamic probability by using the probability propagation model.
The third dynamic probability is determined as P(X) = wP(S) + (1 − w)P(G), where P(X) is the third dynamic probability, w is a weight, and w = N_seg / (N_seg + N_geo); N_seg is the number of feature points whose first dynamic probability is greater than a first preset threshold under the semantic constraint, and N_geo is the number of feature points whose second dynamic probability is greater than a second preset threshold under the geometric constraint. The first preset threshold and the second preset threshold may be the same or different.
The current dynamic probability is determined as P(D_t) = ηP(D_{t-1}) + (1 − η)P(X), where P(D_t) is the current dynamic probability, P(D_{t-1}) is the historical dynamic probability, and η is a balance weight whose value is set according to the actual application scenario.
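The fusion and propagation of the probabilities can be sketched as below; the value of the balance weight η is application-dependent, and 0.5 is only an assumed default.

```python
def current_dynamic_probability(p_s, p_g, n_seg, n_geo, p_hist, eta=0.5):
    """P(X) = w*P(S) + (1-w)*P(G) with w = N_seg / (N_seg + N_geo),
    then P(D_t) = eta*P(D_{t-1}) + (1-eta)*P(X)."""
    total = n_seg + n_geo
    w = n_seg / total if total > 0 else 0.5   # fall back when no counts yet
    p_x = w * p_s + (1.0 - w) * p_g
    return eta * p_hist + (1.0 - eta) * p_x
```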
According to the embodiment of the application, by aiming at each feature point, the image area where the feature point is located is determined according to semantic information and a detection frame of a first dynamic target, and the first dynamic probability of the feature point is determined according to the image area where the feature point is located; projecting the first depth information of the feature points to the first depth image to obtain third depth information of the feature points; determining a second dynamic probability of the feature point according to the third depth information and the first depth information of the feature point; and determining the current dynamic probability of the feature points according to the first dynamic probability, the second dynamic probability and the historical dynamic probability of the feature points in the second depth image, so as to accurately determine the dynamic probability of each feature point and accurately obtain the feature points of the dynamic target.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Corresponding to the methods described in the above embodiments, an apparatus embodiment is provided below; for convenience of explanation, only the parts relevant to the embodiments of the present application are shown.
Fig. 6 is a schematic structural diagram of a posture estimating apparatus of a self-mobile device according to an embodiment of the present application.
As shown in fig. 6, the apparatus includes:
an acquiring module 10, configured to acquire a first depth image in the advancing direction of the self-mobile device, where the first depth image is a current frame image, and to acquire feature points of the first depth image;
a determining module 11, configured to determine semantic information of a first depth image and a detection frame of a first dynamic target; for each feature point, determining the dynamic probability of the feature point according to a detection frame of the first dynamic target, semantic information and first depth information of the feature point, wherein the dynamic probability is used for describing the probability of the feature point in the dynamic target, the first depth information is used for describing the depth value of a pixel point of a second depth image, and the second depth image is a previous frame image;
the removing module 12 is configured to remove a target feature point, where the target feature point is a feature point with a dynamic probability greater than a preset probability value;
the pose estimation module 13 is configured to determine a pose of the self-mobile device according to the remaining feature points.
In one embodiment, the determining module is specifically configured to obtain IMU information of the mobile device and a detection frame of a second dynamic target of a second depth image when the first depth image has a missed detection target, where the second depth image is a previous frame image; and determining a detection frame of the first dynamic target according to the IMU information and the detection frame of the second dynamic target.
In one embodiment, the determining module is specifically configured to determine a transformation matrix between the first depth image and the second depth image according to IMU information; determining motion information of a corresponding dynamic target according to a detection frame of the third dynamic target and a detection frame of the second dynamic target, wherein the detection frame of the third dynamic target is obtained by carrying out target prediction on the first depth image; and determining the detection frame of the first dynamic target according to the transformation matrix and the corresponding motion information based on the detection frame of the second dynamic target.
In one embodiment, the determining module is specifically configured to determine semantic information of the first depth image according to second depth information corresponding to the detection frame of the first dynamic target, where the second depth information is used to describe a depth value of a pixel point of the first depth image.
In one embodiment, the determining module is specifically configured to determine, for each feature point, an image area in which the feature point is located according to the semantic information and a detection frame of the first dynamic target; determining a first dynamic probability of the feature points according to the image area where the feature points are located; projecting the first depth information of the feature points to the first depth image to obtain third depth information of the feature points; determining a second dynamic probability of the feature point according to the third depth information and the first depth information of the feature point; and determining the current dynamic probability of the feature point according to the first dynamic probability, the second dynamic probability and the historical dynamic probability of the feature point in the second depth image.
In one embodiment, the determining module is specifically configured to determine that the feature point is located in an image area covered by the first dynamic target when the second depth information of the feature point is smaller than semantic information, where the semantic information is used to describe a semantic mask value of a pixel point of the first depth image; and when the second depth information of the feature points is in the depth range of the detection frame of the first dynamic target, determining that the feature points are positioned in the image area in the detection frame of the first dynamic target and are not positioned in the image area covered by the first dynamic target.
In one embodiment, the determining module is specifically configured to determine difference information according to the third depth information and the first depth information of the feature point; and determining the second dynamic probability of the feature points according to the difference information.
Fig. 7 is a schematic system structure of a self-mobile device according to an embodiment of the application. As shown in fig. 7, the self-mobile device 2 of this embodiment includes: at least one processor 20 (only one is shown in fig. 7), a memory 21 and a computer program 22 stored in the memory 21 and executable on the at least one processor 20, the processor 20 implementing the steps in any of the various method embodiments described above when executing the computer program 22.
The self-mobile device 2 may include, but is not limited to, a processor 20, a memory 21. It will be appreciated by those skilled in the art that fig. 7 is merely an example of the self-mobile device 2 and is not limiting of the self-mobile device 2, and may include more or fewer components than shown, or may combine certain components, or different components, such as may also include input-output devices, network access devices, etc.
The processor 20 may be a central processing unit (Central Processing Unit, CPU), and the processor 20 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 21 may in some embodiments be an internal storage unit of the self-mobile device 2, such as a hard disk or a memory of the self-mobile device 2. The memory 21 may in other embodiments also be an external storage device of the self-mobile device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the self-mobile device 2. Further, the memory 21 may also include both an internal storage unit and an external storage device of the self-mobile device 2. The memory 21 is used for storing an operating system, application programs, boot loader (BootLoader), data, other programs, etc., such as program codes of the computer program. The memory 21 may also be used for temporarily storing data that has been output or is to be output.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The embodiment of the application provides a self-mobile device, which comprises: a vehicle, including a vehicle body and wheels; and a control module for performing the steps in the various method embodiments described above.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, performs the steps of the respective method embodiments described above.
Embodiments of the present application provide a computer program product which, when run on an electronic device, causes the electronic device to perform the steps of the method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the method of the above embodiments, and may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the computer program may implement the steps of each of the method embodiments described above. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include at least: any entity or device capable of carrying computer program code to a photographing device/terminal apparatus, recording medium, computer Memory, read-Only Memory (ROM), random access Memory (RAM, random Access Memory), electrical carrier signals, telecommunications signals, and software distribution media. Such as a U-disk, removable hard disk, magnetic or optical disk, etc. In some jurisdictions, computer readable media may not be electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not described or detailed in a certain embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method for estimating a pose of a self-mobile device, comprising:
acquiring a first depth image in the advancing direction of the self-mobile device, wherein the first depth image is a current frame image;
acquiring characteristic points of the first depth image;
Determining semantic information of the first depth image and a detection frame of a first dynamic target;
for each feature point, determining the dynamic probability of the feature point according to a detection frame of the first dynamic target, the semantic information and the first depth information of the feature point, wherein the dynamic probability is used for describing the probability that the feature point is positioned on the dynamic target, the first depth information is used for describing the depth value of a pixel point of a second depth image, and the second depth image is the previous frame image;
removing target feature points, wherein the target feature points are feature points with the dynamic probability larger than a preset probability value;
and determining the gesture of the self-mobile device according to the remaining feature points.
2. The method of claim 1, wherein determining a detection box of a first dynamic object in the first depth image comprises:
when the first depth image has a missed detection target, acquiring IMU information of the self-mobile device and a detection frame of a second dynamic target of the second depth image;
and determining the detection frame of the first dynamic target according to the IMU information and the detection frame of the second dynamic target.
3. The method of claim 2, wherein the determining the detection box of the first dynamic object from the IMU information and the detection box of the second dynamic object comprises:
Determining a transformation matrix between the first depth image and the second depth image according to the IMU information;
determining motion information of a corresponding dynamic target according to a detection frame of a third dynamic target and a detection frame of the second dynamic target, wherein the detection frame of the third dynamic target is obtained by carrying out target prediction on the first depth image;
and determining the detection frame of the first dynamic target according to the transformation matrix and the corresponding motion information based on the detection frame of the second dynamic target.
4. A method according to any of claims 1 to 3, wherein determining semantic information of the first depth image comprises:
and determining semantic information of the first depth image according to second depth information corresponding to the detection frame of the first dynamic target, wherein the second depth information is used for describing the depth value of the pixel point of the first depth image.
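A possible interpretation of claim 4 is that a foreground depth is estimated inside each detection frame and written out as a per-pixel semantic mask value; the median-depth heuristic and the box format in the sketch below are illustrative assumptions.

import numpy as np

def semantic_mask_from_detection_frame(depth, box):
    """Build per-pixel semantic mask values from the second depth information
    inside the detection frame of the first dynamic target (box = (u, v, w, h))."""
    u, v, w, h = [int(x) for x in box]
    roi = depth[v:v + h, u:u + w]
    valid = roi[roi > 0]
    # Take a robust foreground depth for the dynamic target inside the box.
    fg_depth = float(np.median(valid)) if valid.size else 0.0

    # Outside any detection frame the mask value is +inf, so no feature point
    # can be closer than the dynamic target there (compare claim 6).
    mask = np.full(depth.shape, np.inf, dtype=np.float32)
    mask[v:v + h, u:u + w] = fg_depth
    return mask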
5. The method of claim 4, wherein determining, for each of the feature points, the dynamic probability of the feature point according to the detection frame of the first dynamic target, the semantic information, and the first depth information of the feature point comprises:
for each feature point, determining an image area where the feature point is located according to the semantic information and the detection frame of the first dynamic target;
determining a first dynamic probability of the feature points according to the image area where the feature points are located;
projecting the first depth information of the feature points to the first depth image to obtain third depth information of the feature points;
determining a second dynamic probability of the feature point according to the third depth information and the first depth information of the feature point;
and determining the current dynamic probability of the feature point according to the first dynamic probability, the second dynamic probability and the historical dynamic probability of the feature point in the second depth image.
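The sketch below fuses, in the spirit of claim 5, a region-based first dynamic probability, a depth-consistency second dynamic probability and the historical probability from the previous frame; the region priors and the blending weight alpha are invented for illustration and are not given by the patent.

def fuse_dynamic_probability(region, second_prob, hist_prob, alpha=0.5):
    """region: image area of the feature point ("covered", "in_frame" or "outside");
    second_prob: depth-consistency probability; hist_prob: historical dynamic probability."""
    # Region-based priors (illustrative values, not from the patent).
    priors = {"covered": 0.9, "in_frame": 0.6, "outside": 0.1}
    first_prob = priors[region]                    # first dynamic probability
    current = max(first_prob, second_prob)         # strongest current-frame cue
    # Blend with the historical dynamic probability from the previous frame.
    return alpha * hist_prob + (1.0 - alpha) * current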
6. The method of claim 5, wherein determining the image area in which the feature point is located according to the semantic information and the detection frame of the first dynamic target comprises:
when the second depth information of the feature point is smaller than the semantic mask value indicated by the semantic information, determining that the feature point is located in an image area covered by the first dynamic target, wherein the semantic information is used for describing semantic mask values of pixel points of the first depth image;
and when the second depth information of the feature point is within the depth range of the detection frame of the first dynamic target, determining that the feature point is located in the image area inside the detection frame of the first dynamic target and is not located in the image area covered by the first dynamic target.
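Read literally, claim 6 separates the image area covered by the dynamic target from the rest of the detection frame by depth comparisons; the hypothetical helper below assumes the mask of the claim 4 sketch and a precomputed depth range for the detection frame.

def classify_image_area(pt_uv, pt_depth, mask, box, box_depth_range):
    """Return the image area of a feature point: "covered", "in_frame" or "outside"."""
    u, v = int(pt_uv[0]), int(pt_uv[1])
    in_frame = box[0] <= u < box[0] + box[2] and box[1] <= v < box[1] + box[3]

    # Second depth information smaller than the semantic mask value:
    # the feature point lies in the image area covered by the first dynamic target.
    if in_frame and pt_depth < mask[v, u]:
        return "covered"
    # Depth within the detection frame's depth range, but not on the target itself.
    lo, hi = box_depth_range
    if in_frame and lo <= pt_depth <= hi:
        return "in_frame"
    return "outside"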
7. The method of claim 5, wherein determining the second dynamic probability of the feature point according to the third depth information and the first depth information of the feature point comprises:
determining difference information according to the third depth information and the first depth information of the feature points;
and determining the second dynamic probability of the characteristic point according to the difference information.
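As one possible reading of claim 7, the difference between the projected depth and the observed depth is mapped to a probability; the Gaussian-style mapping and the sigma value below are assumptions, not the patent's formula.

import math

def second_dynamic_probability(projected_depth, observed_depth, sigma=0.1):
    """Map the depth difference of a feature point to its second dynamic probability."""
    diff = abs(projected_depth - observed_depth)
    # A static point reprojects to a consistent depth, so a large difference
    # suggests the point sits on a moving object.
    return 1.0 - math.exp(-(diff ** 2) / (2.0 * sigma ** 2))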
8. A posture estimation apparatus of a self-mobile device, characterized by comprising:
an acquisition module, configured to acquire a first depth image in the advancing direction of the self-mobile device, wherein the first depth image is a current frame image, and to acquire feature points of the first depth image;
a determining module, configured to determine semantic information of the first depth image and a detection frame of a first dynamic target, and, for each feature point, determine a dynamic probability of the feature point according to the detection frame of the first dynamic target, the semantic information and first depth information of the feature point, wherein the dynamic probability describes the probability that the feature point is located on a dynamic target, the first depth information describes the depth value of a pixel point of a second depth image, and the second depth image is the previous frame image;
a removing module, configured to remove target feature points, wherein the target feature points are feature points whose dynamic probability is greater than a preset probability value;
and a posture estimation module, configured to determine the posture of the self-mobile device according to the remaining feature points.
9. A self-mobile device, comprising:
a machine body including a vehicle body and wheels; and
a control module for performing the method of any one of claims 1 to 7.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 7.
CN202310497766.4A 2023-05-05 2023-05-05 Posture estimation method and device of self-mobile equipment and readable storage medium Pending CN116793345A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310497766.4A CN116793345A (en) 2023-05-05 2023-05-05 Posture estimation method and device of self-mobile equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310497766.4A CN116793345A (en) 2023-05-05 2023-05-05 Posture estimation method and device of self-mobile equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN116793345A true CN116793345A (en) 2023-09-22

Family

ID=88047167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310497766.4A Pending CN116793345A (en) 2023-05-05 2023-05-05 Posture estimation method and device of self-mobile equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116793345A (en)

Similar Documents

Publication Publication Date Title
CN110322500B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
US20210190513A1 (en) Navigation map updating method and apparatus and robot using the same
CN110763251B (en) Method and system for optimizing visual inertial odometer
EP3627181A1 (en) Multi-sensor calibration method, multi-sensor calibration device, computer device, medium and vehicle
CN108805934B (en) External parameter calibration method and device for vehicle-mounted camera
EP3236424B1 (en) Information processing apparatus and method of controlling the same
CN110349212B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
CN115063454B (en) Multi-target tracking matching method, device, terminal and storage medium
CN113052907B (en) Positioning method of mobile robot in dynamic environment
CN112966654A (en) Lip movement detection method and device, terminal equipment and computer readable storage medium
Jang et al. Camera orientation estimation using motion-based vanishing point detection for advanced driver-assistance systems
CN110673607A (en) Feature point extraction method and device in dynamic scene and terminal equipment
CN116977671A (en) Target tracking method, device, equipment and storage medium based on image space positioning
CN114037977B (en) Road vanishing point detection method, device, equipment and storage medium
CN116358528A (en) Map updating method, map updating device, self-mobile device and storage medium
US20240127567A1 (en) Detection-frame position-accuracy improving system and detection-frame position correction method
CN115993132A (en) Visual inertial odometer initialization method and device and aerocar
CN116793345A (en) Posture estimation method and device of self-mobile equipment and readable storage medium
CN114510031A (en) Robot visual navigation method and device, robot and storage medium
CN110660134B (en) Three-dimensional map construction method, three-dimensional map construction device and terminal equipment
CN112614181B (en) Robot positioning method and device based on highlight target
CN116863124B (en) Vehicle attitude determination method, controller and storage medium
CN114881908B (en) Abnormal pixel identification method, device and equipment and computer storage medium
CN117746053A (en) Point cloud data processing method and device, robot and robot control method
CN115690148A (en) Electronic device, optical flow tracking method, optical flow tracking device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination