CN111427373A - Pose determination method, device, medium and equipment - Google Patents


Info

Publication number
CN111427373A
Authority
CN
China
Prior art keywords
semantic
target
pose
determining
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010213807.9A
Other languages
Chinese (zh)
Other versions
CN111427373B (en)
Inventor
唐庆
刘余钱
陆潇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Lingang Intelligent Technology Co Ltd filed Critical Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Priority to CN202010213807.9A priority Critical patent/CN111427373B/en
Publication of CN111427373A publication Critical patent/CN111427373A/en
Priority to JP2022504292A priority patent/JP2022542082A/en
Priority to PCT/CN2021/075016 priority patent/WO2021190167A1/en
Application granted granted Critical
Publication of CN111427373B publication Critical patent/CN111427373B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/08 Control of attitude, i.e. control of roll, pitch, or yaw
    • G05D1/0891 Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for land vehicles
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C11/00 Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
    • G01C11/04 Interpretation of pictures
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0246 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means

Abstract

The embodiments of the present disclosure provide a pose determination method, apparatus, medium, and device. The embodiments comprehensively utilize semantic information of multiple categories, improving the accuracy and robustness of pose determination.

Description

Pose determination method, device, medium and equipment
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a pose determination method, apparatus, medium, and device.
Background
Mobile device positioning is one of the key technologies of autonomous driving systems. Vision cameras are low-cost, and wide-angle, high-resolution cameras provide large-range, high-precision observation data, so positioning technology that combines a vision camera with a high-precision map is increasingly favored by the autonomous driving industry. However, conventional positioning techniques have low positioning accuracy.
Disclosure of Invention
The present disclosure provides a pose determination method, apparatus, medium, and device.
According to a first aspect of the embodiments of the present disclosure, there is provided a pose determination method, the method including: acquiring first semantic objects of a plurality of categories in a first image of an area where a movable device is located, and acquiring second semantic objects in a semantic map, where the first semantic objects of at least two of the plurality of categories are located in different spatial dimensions; determining, among the second semantic objects, a matching semantic object that matches the first semantic object of each of the plurality of categories; and determining a pose of the movable device according to the matching semantic objects.
In some embodiments, the determining a matching semantic object in the second semantic object that matches the first semantic object of each of the plurality of classes of first semantic objects comprises: generating a second image according to the pixel values of the first semantic objects of the plurality of categories; determining, from the second image, a matching semantic object of the second semantic objects that matches the first semantic object of each of the plurality of categories.
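As an illustrative sketch (not part of the claimed method), generating such a second image from the pixel values of the first semantic targets of a plurality of categories can be pictured as rendering each detected target into a single-channel image, one pixel value per category; the category names and pixel codes below are assumptions introduced for the example.

```python
import numpy as np

# Hypothetical category-to-pixel-value codes (assumption, not from the disclosure).
CATEGORY_VALUES = {"lane_line": 50, "traffic_sign": 120, "pole": 200}

def render_second_image(height, width, first_semantic_targets):
    """first_semantic_targets: list of (category, [(row, col), ...]) detections."""
    second_image = np.zeros((height, width), dtype=np.uint8)
    for category, pixels in first_semantic_targets:
        value = CATEGORY_VALUES[category]
        for r, c in pixels:
            second_image[r, c] = value  # paint each detected pixel with its category code
    return second_image

img = render_second_image(4, 4, [("lane_line", [(0, 0), (1, 1)]), ("pole", [(3, 3)])])
```

Encoding every category into one image lets the later matching step compare all categories against the map projection at once.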
In some embodiments, the acquiring a second semantic object in the semantic map includes: acquiring a pose estimation value of the movable device, and determining, according to the acquired pose estimation value, a target search range for searching the semantic map; and searching for the second semantic target within the target search range.
In some embodiments, the acquiring the pose estimate for the movable device comprises: acquiring a first pose of the movable device at a first time; and determining, from the first pose, a pose estimate for the movable device at a second time, the first time being prior to the second time.
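The embodiments do not fix a particular model for deriving the second-time estimate from the first pose; the sketch below assumes, purely for illustration, a constant-velocity, constant-yaw-rate motion model.

```python
import math

def predict_pose(first_pose, speed, yaw_rate, dt):
    """Estimate the pose at the second time (first time + dt) from the first pose."""
    x, y, yaw = first_pose
    x += speed * math.cos(yaw) * dt   # advance along the current heading
    y += speed * math.sin(yaw) * dt
    yaw += yaw_rate * dt              # integrate the turn rate
    return (x, y, yaw)

est = predict_pose((0.0, 0.0, 0.0), speed=10.0, yaw_rate=0.0, dt=0.5)  # → (5.0, 0.0, 0.0)
```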
In some embodiments, the pose estimate comprises an estimate of position and an estimate of orientation, and the target search range is an area that lies in the estimated orientation and within a preset distance range of the estimated position.
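One possible realization of such a target search range (a sketch under assumed parameters, not the patent's definition) is a rectangle that extends a preset distance ahead of the estimated position along the estimated orientation:

```python
import math

def target_search_range(x, y, yaw, max_dist=50.0, half_width=10.0):
    """Return the four corners of a search rectangle in map coordinates."""
    fx, fy = math.cos(yaw), math.sin(yaw)   # forward unit vector (estimated orientation)
    lx, ly = -fy, fx                        # left unit vector
    corners = []
    for d, s in [(0, +1), (0, -1), (max_dist, -1), (max_dist, +1)]:
        corners.append((x + d * fx + s * half_width * lx,
                        y + d * fy + s * half_width * ly))
    return corners

corners = target_search_range(0.0, 0.0, 0.0)   # device at the origin, facing +x
```

Restricting the map query to this area keeps the number of second semantic targets to be matched small.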
In some embodiments, the determining a matching semantic object of the second semantic objects that matches the first semantic object of each of the plurality of classes comprises: determining a second semantic target with the shortest distance to the first semantic target of each category as the matching semantic target of the first semantic target of that category.
In some embodiments, the method further comprises: before determining a second semantic target with the shortest distance to the first semantic target of each category as a matching semantic target of the first semantic target of each category, acquiring a projection semantic target of the second semantic target in a second image; a position of the second semantic object in the semantic map corresponds to a position of the projected semantic object in the second image, the second image generated based on pixel values of the first semantic object of the plurality of classes; determining a first distance between the projected semantic object and a first semantic object of each category, where the first distance is a distance between the second semantic object and the first semantic object of each category.
In some embodiments, the determining the second semantic object with the shortest distance to the first semantic object of each category as the matching semantic object of the first semantic objects of each category includes: determining a second semantic target that has the shortest distance to, and the same shape as, the first semantic target of each category as the matching semantic target of the first semantic target of that category.
In some embodiments, the shape of the second semantic object is determined based on contour information of the projected semantic object of the second semantic object in the second image.
In some embodiments, the method further comprises: determining a first position of each of at least one first target point of the first semantic target; the first target point is determined based on contour information of the first semantic object; determining a second position of each of at least one second target point of the projected semantic target; the second target point is determined based on contour information of the projected semantic target; and determining the distance between the projection semantic target and the first semantic target according to the first position of each first target point and the second position of each second target point.
In some embodiments, the determining the distance of the projected semantic object from the first semantic object according to the first position of each first object point and the second position of each second object point comprises: determining the distance between each first target point and each corresponding second target point according to the first position of each first target point and the second position of each second target point; and determining the average value of the distance between each first target point and each corresponding second target point as the distance between the first semantic target and the projection semantic target.
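The distance computation described above can be sketched as pairing each first target point with its corresponding second target point, averaging the point-wise Euclidean distances, and selecting the candidate with the shortest average distance; the point data and candidate names below are illustrative.

```python
import math

def target_distance(first_points, projected_points):
    """Average Euclidean distance between corresponding target points."""
    dists = [math.hypot(x1 - x2, y1 - y2)
             for (x1, y1), (x2, y2) in zip(first_points, projected_points)]
    return sum(dists) / len(dists)

def best_match(first_points, candidates):
    """candidates: {candidate_id: target points of a projected semantic target}."""
    return min(candidates,
               key=lambda cid: target_distance(first_points, candidates[cid]))

first = [(0, 0), (4, 0)]                              # e.g. endpoints of a strip target
cands = {"a": [(0, 3), (4, 3)], "b": [(1, 0), (5, 0)]}
match = best_match(first, cands)                      # "b": average distance 1.0 vs 3.0
```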
In some embodiments, in a case that the outline of the first semantic target is a polygon, the first target point is a vertex of a bounding box of the first semantic target; and/or, in a case that the outline of the first semantic target is an elongated strip, the first target point is an endpoint of the line segment corresponding to the first semantic target.
In some embodiments, the determining the pose of the movable device from the matching semantic objects comprises: establishing a pose constraint condition according to the matched semantic target; determining a pose of the movable device according to the pose constraint condition.
In some embodiments, the pose constraint condition is determined according to a pose change relationship of the movable device within a preset time period.
In some embodiments, the establishing a pose constraint according to the matching semantic object includes: determining a plane where the movable equipment is located and a normal vector of the plane according to the position of the matched semantic target; determining a pose distribution error of the movable device from at least one of the plane, the normal vector, and a distance between the matching semantic target and the first semantic target; and establishing the pose constraint condition according to the pose distribution error.
In some embodiments, the determining a pose distribution error of the movable device from at least one of the plane, the normal vector, and a distance between the matching semantic target and the first semantic target comprises: determining a height distribution error of a target point of the movable device according to the plane; determining a pitch angle distribution error and a roll angle distribution error of the movable equipment according to the normal vector; determining a reprojection distance error according to the distance between the matching semantic target and the first semantic target; establishing the pose constraint condition according to the pose distribution error comprises the following steps: establishing the pose constraint condition according to at least one of the height distribution error, the pitch angle distribution error, the roll angle distribution error and the reprojection distance error.
In some embodiments, the pose constraints are: the sum of the height distribution error, the pitch angle distribution error, the roll angle distribution error, and the reprojection distance error is minimized.
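A minimal sketch of this constraint, with simplified stand-in error terms rather than the disclosure's actual formulations, selects the candidate pose whose summed height, pitch-angle, roll-angle, and reprojection distance errors are smallest:

```python
def pose_cost(pose, plane_height, expected_pitch, expected_roll, reproj_error_fn):
    """Sum of the four error terms; the pose minimizing this sum is selected."""
    height_err = abs(pose["z"] - plane_height)        # height error against the road plane
    pitch_err = abs(pose["pitch"] - expected_pitch)   # angles derived from the plane normal
    roll_err = abs(pose["roll"] - expected_roll)
    reproj_err = reproj_error_fn(pose)                # matched-target reprojection distance
    return height_err + pitch_err + roll_err + reproj_err

candidates = [{"z": 0.1, "pitch": 0.0, "roll": 0.0},
              {"z": 0.0, "pitch": 0.2, "roll": 0.1}]
best = min(candidates, key=lambda p: pose_cost(p, 0.0, 0.0, 0.0, lambda _: 0.0))
```

In practice the minimization would run over a continuous pose space with a nonlinear solver rather than over a discrete candidate list; the discrete form is used here only to keep the sketch self-contained.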
In some embodiments, the determining the pose of the movable device from the matching semantic objects comprises: determining a three-dimensional pose of the movable device according to the matching semantic target; the method further comprises the following steps: and controlling the running state of the movable equipment according to the three-dimensional pose.
According to the pose determination method and device of the embodiments of the present disclosure, first semantic targets of multiple categories are acquired from a first image of the area where the movable device is located, matching semantic targets that match the first semantic targets are determined from the second semantic targets in the semantic map, and the pose of the movable device is then determined according to the matching semantic targets. The embodiments of the present disclosure comprehensively utilize semantic information of multiple categories. In some areas, semantic targets of a single category are too few to determine the pose, and in other areas, semantic targets of a single category may be too close to one another to distinguish. By adopting the technical solution of the present disclosure, the influence of these situations on the accuracy of the pose determination process is reduced, and the accuracy and robustness of pose determination are improved.
According to a second aspect of the embodiments of the present disclosure, there is provided a pose determination method, the method including: acquiring a first semantic target in a first image of an area where the movable device is located, and acquiring a second semantic target in a semantic map; determining, among the second semantic targets, a matching semantic target that matches the first semantic target; and establishing a pose constraint condition according to the matching semantic target, and determining a three-dimensional pose of the movable device according to the pose constraint condition.
In some embodiments, the pose constraint condition is determined according to a pose change relationship of the movable device within a preset time period.
In some embodiments, the establishing a pose constraint according to the matching semantic object includes: determining a plane where the movable equipment is located and a normal vector of the plane according to the position of the matched semantic target; determining a pose distribution error of the movable device from at least one of the plane, the normal vector, and a distance between the matching semantic target and the first semantic target; and establishing the pose constraint condition according to the pose distribution error.
In some embodiments, the determining a pose distribution error of the movable device from at least one of the plane, the normal vector, and a distance between the matching semantic target and the first semantic target comprises: determining a height distribution error of a target point of the movable device according to the plane; determining a pitch angle distribution error and a roll angle distribution error of the movable equipment according to the normal vector; determining a reprojection distance error according to the distance between the matching semantic target and the first semantic target; establishing the pose constraint condition according to the pose distribution error comprises the following steps: establishing the pose constraint condition according to at least one of the height distribution error, the pitch angle distribution error, the roll angle distribution error and the reprojection distance error.
In some embodiments, the pose constraints are: the sum of the height distribution error, the pitch angle distribution error, the roll angle distribution error, and the reprojection distance error is minimized.
According to the embodiments of the present disclosure, a first semantic target in a first image of the area where the movable device is located and a second semantic target in a semantic map are acquired, a matching semantic target that matches the first semantic target is determined among the second semantic targets, a pose constraint condition is established according to the matching semantic target, and the three-dimensional pose of the movable device is determined according to the pose constraint condition. Because a constraint condition is introduced into the pose determination process, a solved pose is taken as the pose of the movable device only when it satisfies the constraint condition. Determining the three-dimensional pose of the movable device through the constraint condition thus improves the accuracy of pose determination.
According to a third aspect of the embodiments of the present disclosure, there is provided a pose determination apparatus, the apparatus including: a first acquiring module, configured to acquire first semantic targets of a plurality of categories in a first image of an area where the movable device is located, and to acquire second semantic targets in a semantic map, where the first semantic targets of at least two of the plurality of categories are located in different spatial dimensions; a first determining module, configured to determine, among the second semantic targets, a matching semantic target that matches the first semantic target of each of the plurality of categories; and a second determining module, configured to determine a pose of the movable device according to the matching semantic targets.
In some embodiments, the first determining module comprises: an image generation unit for generating a second image according to the pixel values of the first semantic object of the plurality of categories; a first determining unit configured to determine, from the second image, a matching semantic object that matches the first semantic object of each of the plurality of classes in the second semantic object.
In some embodiments, the first obtaining module comprises: the first acquisition unit is used for acquiring a pose estimation value of the movable equipment and determining a target search range for searching the semantic map according to the acquired pose estimation value; and the searching unit is used for searching the second semantic target from the target searching range.
In some embodiments, the first obtaining unit includes: the acquiring subunit is used for acquiring a first pose of the movable device at a first moment; a first determining subunit, configured to determine, according to the first pose, a pose estimate for the movable device at a second time, the first time being before the second time.
In some embodiments, the pose estimates comprise estimates of position and estimates of orientation; the target search range is an area in the direction and within a preset distance range of the position.
In some embodiments, the first determination module is to: and determining a second semantic target with the shortest distance to the first semantic target of each category as a matching semantic target of the first semantic target of each category.
In some embodiments, the apparatus further comprises: the second acquisition module is used for acquiring a projection semantic target of a second semantic target in a second image before determining the second semantic target with the shortest distance to the first semantic target of each category as a matching semantic target of the first semantic target of each category; a position of the second semantic object in the semantic map corresponds to a position of the projected semantic object in the second image, the second image generated based on pixel values of the first semantic object of the plurality of classes; a third determining module, configured to determine a first distance between the projected semantic object and the first semantic object of each category, where the first distance is a distance between the second semantic object and the first semantic object of each category.
In some embodiments, the first determination module is to: and determining a second semantic target which has the shortest distance and the same shape with the first semantic target of each category as a matching semantic target of the first semantic target of each category.
In some embodiments, the apparatus further comprises: a fourth determining module for determining a first position of each of at least one first target point of the first semantic target; the first target point is determined based on contour information of the first semantic object; a fifth determining module for determining a second position of each of at least one second target point of the projected semantic target; the second target point is determined based on contour information of the projected semantic target; and the sixth determining module is used for determining the distance between the projection semantic target and the first semantic target according to the first position of each first target point and the second position of each second target point.
In some embodiments, the sixth determining module comprises: a second determining unit, configured to determine, according to the first position of each first target point and the second position of each second target point, a distance between each first target point and each corresponding second target point; and the third determining unit is used for determining the average value of the distance between each first target point and each corresponding second target point as the distance between the first semantic target and the projection semantic target.
In some embodiments, in a case that the outline of the first semantic target is a polygon, the first target point is a vertex of a bounding box of the first semantic target. In some embodiments, in a case that the outline of the first semantic target is an elongated strip, the first target point is an endpoint of the line segment corresponding to the first semantic target.
In some embodiments, the second determining module comprises: the first establishing unit is used for establishing a pose constraint condition according to the matched semantic target; a fourth determination unit configured to determine the pose of the movable device according to the pose constraint condition.
In some embodiments, the pose constraint condition is determined according to a pose change relationship of the movable device within a preset time period.
In some embodiments, the first establishing unit comprises: the second determining subunit is used for determining a plane where the movable equipment is located and a normal vector of the plane according to the position of the matched semantic target; a third determining subunit to determine a pose distribution error of the movable device according to at least one of the plane, the normal vector, and a distance between the matching semantic target and the first semantic target; and the establishing subunit is used for establishing the pose constraint condition according to the pose distribution error.
In some embodiments, the third determining subunit is to: determining a height distribution error of a target point of the movable device according to the plane; determining a pitch angle distribution error and a roll angle distribution error of the movable equipment according to the normal vector; determining a reprojection distance error according to the distance between the matching semantic target and the first semantic target; the establishing subunit is configured to: establishing the pose constraint condition according to at least one of the height distribution error, the pitch angle distribution error, the roll angle distribution error and the reprojection distance error.
In some embodiments, the pose constraints are: the sum of the height distribution error, the pitch angle distribution error, the roll angle distribution error, and the reprojection distance error is minimized.
In some embodiments, the second determination module is to: determining a three-dimensional pose of the movable device according to the matching semantic target; the device further comprises: and the control module is used for controlling the running state of the movable equipment according to the three-dimensional pose.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a pose determination apparatus, the apparatus including: a third acquiring module, configured to acquire a first semantic target in a first image of an area where the movable device is located, and to acquire a second semantic target in a semantic map; a seventh determining module, configured to determine, among the second semantic targets, a matching semantic target that matches the first semantic target; and an eighth determining module, configured to establish a pose constraint condition according to the matching semantic target and to determine a three-dimensional pose of the movable device according to the pose constraint condition.
In some embodiments, the pose constraint condition is determined according to a pose change relationship of the movable device within a preset time period.
In some embodiments, the eighth determining module comprises: a fifth determining unit, configured to determine, according to the position of the matching semantic target, a plane where the movable device is located and a normal vector of the plane; a sixth determining unit, configured to determine a pose distribution error of the movable device according to at least one of the plane, the normal vector, and a distance between the matching semantic target and the first semantic target; and a second establishing unit, configured to establish the pose constraint condition according to the pose distribution error.
In some embodiments, the sixth determination unit includes: a fourth determining subunit configured to determine a height distribution error of a target point of the movable device from the plane; a fifth determining subunit, configured to determine a pitch angle distribution error and a roll angle distribution error of the mobile device according to the normal vector; a sixth determining subunit, configured to determine a reprojection distance error according to a distance between the matching semantic target and the first semantic target; the second establishing unit is used for: establishing the pose constraint condition according to at least one of the height distribution error, the pitch angle distribution error, the roll angle distribution error and the reprojection distance error.
In some embodiments, the pose constraints are: the sum of the height distribution error, the pitch angle distribution error, the roll angle distribution error, and the reprojection distance error is minimized.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the method of any of the embodiments.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of the embodiments when executing the program.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flow chart of one implementation of a pose determination method of some embodiments of the present disclosure.
FIG. 2 is a schematic diagram of semantic objects of some embodiments of the present disclosure.
Fig. 3(a) is a schematic diagram of a target search range of some embodiments of the present disclosure.
Fig. 3(b) is a schematic diagram of a target search range of some embodiments of the present disclosure.
Fig. 4(a) is a schematic diagram of a semantic matching process of some embodiments of the present disclosure.
FIG. 4(b) is a schematic diagram of a semantic matching process of further embodiments of the disclosure.
FIG. 5 is a schematic illustration of semantic matching results of some embodiments of the present disclosure.
Fig. 6 is a pose estimation effect schematic diagram of some embodiments of the present disclosure.
Fig. 7 is a schematic diagram of pose determination principles of some embodiments of the present disclosure.
Fig. 8 is a flowchart of another implementation of a pose determination method of some embodiments of the present disclosure.
Fig. 9 is a block diagram of one implementation of a pose determination apparatus of some embodiments of the present disclosure.
Fig. 10 is a block diagram of another implementation of the pose determination apparatus of some embodiments of the present disclosure.
Fig. 11 is a schematic structural diagram of a computer device according to some embodiments of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
In order to make the technical solutions in the embodiments of the present disclosure better understood and make the above objects, features and advantages of the embodiments of the present disclosure more comprehensible, the technical solutions in the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a pose determination method according to an embodiment of the present disclosure. In one possible implementation, the method includes:
Step S101: the method comprises the steps of obtaining a plurality of categories of first semantic objects in a first image of an area where the mobile equipment is located, and obtaining a second semantic object in a semantic map, wherein at least two categories of the plurality of categories of first semantic objects are located in different spatial dimensions;
Step S102: determining a matching semantic object of the second semantic objects that matches the first semantic object of each of the plurality of categories;
Step S103: determining a pose of the movable device according to the matching semantic object.
In the above embodiments, the mobile device may be configured with a visual camera to capture a first image of the area, and may be configured with a positioning device to obtain coarse or accurate position information. The movable device may include, but is not limited to, a vehicle, a movable robot, etc., the vehicle may be any type of vehicle, such as a car, a bus, a truck, etc., and accordingly, the area where the movable device is located may be a road area, and the first image is a road image; the mobile robot may be any type of robot, such as an industrial robot, a sweeping robot, a toy robot, an educational robot, etc., and accordingly, the area where the mobile device is located may be a work area of the mobile robot, and the first image is a work area image. The mobile device and its area may also be other types of devices and areas, and the disclosure is not limited thereto. The following describes a scheme of an embodiment of the present disclosure, taking an area where the mobile device is located as a road area, and taking the first image as a road image as an example.
For step S101, a road image may be acquired in real time by a visual camera on the mobile device, and the road image may include a first semantic object of multiple categories near the current location of the mobile device, where the categories may be pre-divided according to actual needs, for example, according to the function of the first semantic object, the categories may include, but are not limited to, at least one of the following or a combination of more of the following: road surface indication line type, road surface traveling direction sign type, traffic signal light type, street lamp type, and the like.
Specifically, the road surface indication line category may include at least one of indication lines for dividing lanes, such as solid road lines, broken road lines, and double yellow lines, and indication lines with specific meanings, such as stop lines and zebra crossings. The road travel direction sign category may include at least one of a turn sign, a guide line sign, a center circle sign, and the like. The traffic sign category may include at least one of a speed limit sign, a height limit sign, a no-entry sign, a road indication sign, and the like. The traffic light category may include at least one of a traffic light, a flashing warning light, a lane light, and the like. The street lamp category may include various street lamps that provide road lighting or that serve as decorative installations beautifying the road.
Furthermore, the categories may also be divided according to the position of the first semantic object, e.g. the first semantic object on the road surface is divided into one category and the first semantic object above the road is divided into another category. The categories may also be divided according to other dividing conditions, which are not limited by this disclosure.
After the road image is acquired, a plurality of classes of first semantic objects may be acquired from the road image, for example, a class of first semantic objects such as a road surface indicating line class, a class of first semantic objects such as traffic signals, and a class of first semantic objects of a road surface travel direction marking may be acquired. A schematic diagram of semantic objects according to an embodiment of the present disclosure is shown in fig. 2, where the diagram includes a first semantic object of a road surface travel direction sign category, such as a left turn direction sign, a straight direction sign, and a right turn direction sign; a first semantic object of a road surface indicator line category, such as a road solid line and a zebra crossing; and a first semantic object, such as a traffic light, that includes a category of traffic signals.
After acquiring a plurality of classes of first semantic objects in a road image of a movable device, generating a second image according to pixel values of the plurality of classes of first semantic objects; determining, from the second image, a matching semantic object of the second semantic objects that matches the first semantic object of each of the plurality of categories. In the second image, the different pixel values represent different classes of first semantic objects, e.g. the first semantic object of the road surface indicator category has a pixel value of 1, the first semantic object of the traffic signal category has a pixel value of 2, etc. Since the road image may include more background information unrelated to the first semantic object, the efficiency of the determination can be improved by generating the second image and determining the matching semantic object of the first semantic object based on the second image.
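As a minimal sketch of this step (the category names and pixel-value assignments below are illustrative assumptions, not values from the disclosure, which only requires that different categories map to different pixel values), the second image can be generated by writing one distinct pixel value per category:

```python
import numpy as np

# Hypothetical category-to-pixel-value assignment.
CATEGORY_VALUES = {"road_line": 1, "traffic_light": 2, "direction_sign": 3}

def make_second_image(shape, detections):
    """Rasterize detected first semantic objects into a label image.

    detections: iterable of (category, mask) pairs, where mask is a
    boolean array marking the pixels covered by that semantic object.
    """
    second = np.zeros(shape, dtype=np.uint8)  # 0 = background
    for category, mask in detections:
        second[mask] = CATEGORY_VALUES[category]
    return second
```

Matching then operates on this compact label image rather than on the full road image, discarding background unrelated to the first semantic objects.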
The second image may include category information and location information for each first semantic object. The road image may be input to a pre-trained machine learning model, e.g., a deep learning network, to detect the category information and the location information of the first semantic object in the road image. The output of the machine learning model may also be an image (i.e., the second image). Further, the detection result of the machine learning model may also be parsed to obtain geometric information of each first semantic object, for example, shape information, bounding box information, and/or size information of the first semantic object. For a first semantic object with a polygonal shape, such as a zebra crossing, a traffic sign and the like, position information of a plurality of vertexes of a bounding box of the first semantic object can be acquired; for a first semantic object with a strip shape, such as a street lamp, a stop line, a lane line, and the like, straight-line segment sequence information corresponding to the first semantic object may be acquired, including position information of end points of a plurality of straight-line segments in the straight-line segment sequence. The second image may include a plurality of categories of first semantic objects, and the geometric information of each of the first semantic objects may be separately parsed.
In order to improve the accuracy of pose determination, as many first semantic targets of different classes in the road image as possible can be acquired. For example, if 3 kinds of first semantic objects are included in the road image, the 3 kinds of first semantic objects are acquired simultaneously. Wherein the first semantic objects of at least two of the plurality of classes are located in different spatial dimensions. For example, the obtained first semantic objects of the plurality of categories include a first semantic object of a ground-indicator category and a first semantic object of a traffic-light category, wherein the first semantic object of the ground-indicator category is a first semantic object on a ground plane and the first semantic object of the traffic-light category is a first semantic object perpendicular to the ground plane.
The semantic map is used for storing each second semantic object in an area including the area where the mobile device is located, together with the position information corresponding to each second semantic object; in practical applications, the semantic map may be a high-precision semantic map. In order to improve pose determination efficiency, the semantic map may be pre-stored. For example, where the start and end points of the movable device are known, a semantic map covering the navigation path between the start and end points may be obtained and stored. Of course, the above method is only one possible implementation of the present disclosure; in practical applications, a request may also be sent to a map server when the semantic map needs to be acquired, so as to obtain the semantic map returned by the map server.
When obtaining the second semantic targets in the semantic map, all the second semantic targets in the semantic map may be obtained, or the second semantic targets may be searched for within a target search range in the semantic map. By first determining the target search range and then searching for the second semantic targets within it, the search range can be effectively reduced and the search efficiency improved. Optionally, a pose estimation value of the movable device may be obtained, and the target search range in the semantic map may be determined according to the pose estimation value. Optionally, the movable device may also be located based on a positioning device on the movable device, and the target search range in the semantic map may then be determined according to the positioning result.
In some embodiments, a first pose of the movable device at a first time may be acquired, and a pose estimation value of the movable device at a second time may then be determined from the first pose, the first time being prior to the second time. The positioning device may be a Global Positioning System (GPS) receiver, an Inertial Measurement Unit (IMU), or the like. In this way, the pose estimation value of the movable device can be determined quickly, improving pose estimation efficiency.
The second time may be the time when the last frame of the first image is captured (which may be referred to as the current time), and the first time may be the time when at least one earlier frame of image is captured, for example, the time when the first frame of the first image is captured after the positioning device on the movable device is initialized (which may be referred to as the initial time). Accordingly, the first pose may be the pose at the time of capturing the previous frame image (referred to as the previous frame pose) or the pose of the movable device at the initial time (referred to as the initial pose). Further, when the previous frame image of the road image is found, the previous frame pose may be used as the first pose; when the previous frame image of the road image is not found, the initial pose may be adopted as the first pose.
In determining the estimate of the pose of the movable device at the second time from the first pose, the estimate of the pose may be determined based on a motion model of the movable device. The motion model may be a constant velocity model, an acceleration model, or the like. Taking the motion model as a constant velocity model, the first time being an initial time, and the second time being a current time as an example, the travel distance within the time difference can be calculated according to the travel speed of the mobile device and the time difference according to the time difference between the initial time and the current time, and then the estimated value of the current pose of the mobile device can be calculated according to the initial pose of the mobile device and the travel distance. In the actual driving process, the movable equipment is often difficult to keep driving at a constant speed or at a constant acceleration, the driving process can be divided into a plurality of sections, and each section can adopt a motion model respectively.
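The constant-velocity case can be sketched as follows (a hedged illustration using a planar pose (x, y, yaw); the function name and representation are assumptions, not from the disclosure):

```python
import math

def predict_pose(first_pose, speed, dt):
    """Constant-velocity motion model: advance the first pose along its
    heading by the travel distance speed * dt to estimate the pose at the
    second time. A sketch only; a real system would also support an
    acceleration model and propagate uncertainty."""
    x, y, yaw = first_pose
    d = speed * dt  # travel distance within the time difference
    return (x + d * math.cos(yaw), y + d * math.sin(yaw), yaw)
```

Splitting the journey into short stages, as described above, amounts to calling such a model stage by stage with each stage's own speed.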
The pose estimate may comprise an estimate of a location and accordingly the target search range may be an area within a preset distance range of the location in the semantic map, as shown by the circular area in fig. 3 (a). The preset distance may be determined according to the positioning accuracy of the positioning means on the movable device. The positioning accuracy of consumer GPS devices is typically around 10 meters, so the preset distance can be set to a value of around 10 meters. The pose estimate may also include both an estimate of the position and an estimate of the orientation, and accordingly the target finding range may be the area of the semantic map in the orientation and within a preset distance range of the position, as shown by the sector area in fig. 3 (b). And determining the target searching range according to the orientation, so that the searching area can be reduced, and the searching efficiency of the second semantic target is improved.
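The sector-shaped search range can be sketched as a membership test (a hedged illustration; the preset distance and the angular width of the sector are application-dependent assumptions):

```python
import math

def in_search_sector(point, est_pos, est_yaw, radius, half_angle):
    """True if a map point lies within `radius` of the estimated position
    and within `half_angle` radians of the estimated heading, i.e. the
    sector region of Fig. 3(b). Setting half_angle to pi recovers the
    circular region of Fig. 3(a)."""
    dx, dy = point[0] - est_pos[0], point[1] - est_pos[1]
    if math.hypot(dx, dy) > radius:
        return False
    bearing = math.atan2(dy, dx)
    # Wrap the heading difference into (-pi, pi].
    diff = (bearing - est_yaw + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= half_angle
```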
For step S102, a matching semantic object of the first semantic object of each category may be determined from the second semantic objects according to the distance between the second semantic objects and the first semantic object of each category. For example, the second semantic object of the same category as, and at the shortest distance from, the first semantic object of each category may be determined as its matching semantic object. Ideally, a first semantic object and its matching semantic object are at the same location; in practice, due to error in the pose estimation value, there is some positional difference between them. Since the second semantic object closest to a first semantic object is most likely its matching semantic object, this method determines the matching semantic object more accurately.
Since the manner of determining the matching semantic object of the first semantic object of each category is the same, the determination process is now described taking the first semantic object of one category as an example. Under the first matching strategy described above, assume that the first semantic object is O11 and that the second semantic objects obtained from the semantic map include O21, O22, …, O2n, where n is a positive integer. The distances from O21, O22, …, O2n to O11 can be calculated separately and are assumed to be d1, d2, …, dn. The minimum of d1, d2, …, dn is then obtained, assumed to be dk, where k ∈ {1, 2, …, n}, and the second semantic object corresponding to dk is determined to be the matching semantic object of O11.
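The shortest-distance selection can be sketched as follows (objects are reduced to 2-D points purely for illustration; the disclosure computes distances between projected contours instead):

```python
import math

def find_matching_object(first_obj, second_objs):
    """Return the second semantic object with the shortest distance to the
    first semantic object, i.e. the one whose distance is the minimum dk."""
    return min(second_objs, key=lambda obj: math.dist(first_obj, obj))
```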
Further, the matching semantic object of the first semantic object of each category can be determined from the second semantic objects according to the shape of the first semantic object of each category, the shape of the second semantic object and the distance between the second semantic object and the first semantic object of each category. For example, the second semantic object which is the shortest distance and has the same shape as the first semantic object of each category may be determined as the matching semantic object of the first semantic object of each category. By the method, the accuracy of determining the matched semantic targets can be improved, and particularly, the accuracy is higher under the condition that the number of the second semantic targets with the shortest distance to the first semantic target of each category is greater than 1.
For example, assuming that a triangular first semantic object exists at a certain pixel position in the first image, a circular speed limit sign and a triangular "pay attention to danger" sign exist at the corresponding position of the semantic map, and since the positions of the two signs are the same, it cannot be determined which of the two signs is the matching semantic object of the triangular first semantic object according to the distance. At this time, a triangle "pay attention to danger" sign can be determined as a matching semantic object of the first semantic object by detecting the shapes of the two signs.
The first semantic target and the second semantic target respectively correspond to an imaging plane and a three-dimensional physical space of the image acquisition device, so that when the distance between the second semantic target and the first semantic target of each category is determined, the second semantic target can be projected onto the same plane as the first semantic target to obtain a projected semantic target corresponding to the second semantic target, and the distance between the projected semantic target and the first semantic target is determined as the distance between the corresponding second semantic target and the first semantic target. Therefore, the distance between the second semantic object and the first semantic object can be calculated on the same plane, and the calculation complexity is reduced. Specifically, a projection semantic object of the second semantic object in a second image may be obtained; a position of the second semantic object in the semantic map corresponds to a position of the projected semantic object in the second image, the second image generated based on pixel values of the first semantic object of the plurality of classes; determining a first distance between the projected semantic object and a first semantic object of each category, where the first distance is a distance between the second semantic object and the first semantic object of each category.
In particular, a first position of each of at least one first target point of the first semantic target may be determined; the first target point is determined based on contour information of the first semantic object; determining a second position of each of at least one second target point of the projected semantic target; the second target point is determined based on contour information of the projected semantic target; and determining the distance between the projection semantic target and the first semantic target according to the first position of each first target point and the second position of each second target point. By the method, the distance between the semantic objects can be calculated according to the outline of the semantic objects, and the distance calculation accuracy is improved.
For example, the distance between each first target point and each corresponding second target point may be determined according to the first position of each first target point and the second position of each second target point; and determining the average value of the distance between each first target point and each corresponding second target point as the distance between the first semantic target and the projection semantic target. The distance may be an euclidean distance, a chebyshev distance, a mahalanobis distance, a lank distance, or the like, which is not limited by the present disclosure.
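This averaging can be sketched as follows, using the Euclidean distance (any of the listed metrics could be substituted; the function name is illustrative):

```python
import math

def target_point_distance(first_points, projected_points):
    """Mean distance between corresponding target points of the first
    semantic object and the projected semantic object; the two point
    lists are assumed to be in corresponding order."""
    pairs = list(zip(first_points, projected_points))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)
```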
The contour information of a semantic object can be used to characterize the shape of the semantic object. Optionally, in a case that the outline of the first semantic target is a polygon, the first target point is a vertex of a bounding box of the first semantic target. Optionally, when the outline of the first semantic target is a long strip, the first target point is a vertex of a line segment corresponding to the first semantic target. By the method, the matched semantic target can be determined based on points (the top points of the bounding boxes of the semantic target) or lines (the straight line segments corresponding to the semantic target) according to the outlines of different semantic targets, and the accuracy of determining the semantic target is improved.
In practical application, a road image or a second image corresponding to the road image may be input into a deep learning network, a first output image of the deep learning network is obtained, an outline of a graph (i.e., a first semantic object) composed of pixels with the same pixel value is obtained from the first output image, and then the outline is smoothed, so as to obtain a bounding box or a line segment corresponding to the first semantic object. Similarly, the semantic map may be input into a deep learning network, a second output image of the deep learning network may be obtained, an outline of a graph (i.e., a second semantic object) formed by pixels with the same pixel value may be obtained from the second output image, and then the outline may be smoothed, so as to obtain a bounding box or a line segment corresponding to the second semantic object.
As shown in Fig. 4(a), assuming that the bounding box of the first semantic object is a quadrilateral, the coordinates of its four vertices A1, B1, C1, and D1 can be obtained, together with the coordinates of the four corresponding vertices A2, B2, C2, and D2 of the bounding box of the projected semantic object. The distances between corresponding vertices, i.e., between A1 and A2, B1 and B2, C1 and C2, and D1 and D2, are then calculated from the coordinates, denoted d1, d2, d3, and d4, respectively, and the distance between the first semantic object and the projected semantic object is obtained by averaging d1, d2, d3, and d4.
As shown in Fig. 4(b), assuming that the two end points of the straight line segment corresponding to the first semantic object are A1 and B1, the coordinates of A1 and B1 may be obtained, together with the coordinates of the end points A2 and B2 of the straight line segment corresponding to the projected semantic object. The distances between the corresponding end points, A1 and A2, and B1 and B2, are then calculated from the coordinates, denoted d1 and d2, respectively, and the distance between the first semantic object and the projected semantic object is obtained by averaging d1 and d2. In practical applications, the first semantic object may consist of multiple straight line segments, that is, of a straight line segment sequence. In this case, for each straight line segment of the first semantic object, the average distance between its end points and those of the corresponding straight line segment of the projected semantic object is determined, and the average of these per-segment averages is determined as the distance between the first semantic object and the projected semantic object.
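For a line-segment sequence, the two-level averaging can be sketched as follows (segments are given as endpoint pairs, and segment correspondence is assumed to be established already):

```python
import math

def segment_sequence_distance(first_segments, projected_segments):
    """For each pair of corresponding straight line segments, average the
    distances between corresponding endpoints; then average these
    per-segment means over the whole sequence."""
    per_segment = [
        (math.dist(a1, a2) + math.dist(b1, b2)) / 2
        for (a1, b1), (a2, b2) in zip(first_segments, projected_segments)
    ]
    return sum(per_segment) / len(per_segment)
```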
It should be noted that, in the case that the first semantic object and the projected semantic object both include a polygonal first portion and an elongated second portion (for example, a traffic light includes a light body and a light pole), the distance between the first portion of the projected semantic object and the first portion of the first semantic object and the distance between the second portion of the projected semantic object and the second portion of the first semantic object may be determined respectively, and then the distance between the projected semantic object and the first semantic object may be determined according to the distance between the first portion and the distance between the second portion. The distance between each part is calculated according to the previous embodiment, and is not described herein again.
The result of the semantic matching is shown in Fig. 5. O11 and O12 are two first semantic objects, and O21 to O25 are the projected semantic objects corresponding to the second semantic objects. According to the shapes of the first and second semantic objects and the distances between them, O22 can be determined as the matching semantic object of O11, and O23 as the matching semantic object of O12.
With respect to step S103, in some embodiments, pose constraints may be established according to the matching semantic objects; determining a pose of the movable device according to the pose constraint condition. The pose constraint condition may be determined according to a pose change relationship of the movable device within a preset time period. Because the constraint condition is introduced in the pose determining process, when the pose is solved, the pose change condition of the movable equipment can be constrained according to the pose change relation of the movable equipment in a preset time period, and only under the condition that the solved pose meets the constraint condition, the solved pose is taken as the pose of the movable equipment, so that the pose determining accuracy is improved.
Specifically, the traveling process of the movable device can be divided into a number of short stages, and the movable device can be assumed to travel on a plane within each stage. On one hand, this reduces computational complexity: when calculating the pose of the movable device, only the components in three degrees of freedom change (the component in the moving direction of the movable device, the component perpendicular to the moving direction on the horizontal plane, and the yaw angle of the movable device). On the other hand, using the plane as a constraint improves computational accuracy. Accordingly, the plane of the road surface where the movable device is located and the normal vector of that plane are determined according to the position of the matching semantic target; a pose distribution error of the movable device is determined from at least one of the plane, the normal vector, and the distance between the matching semantic target and the first semantic target; and the pose constraint condition is established according to the pose distribution error.
For example, a height distribution error of a target point of the movable device may be determined from the plane. Also for example, a pitch angle distribution error and a roll angle distribution error of the movable device may be determined from the normal vector. As another example, a reprojection distance error may be determined based on a distance between the matching semantic object and the first semantic object. Therefore, the pose constraint condition may be established in accordance with at least one of the height distribution error, the pitch angle distribution error, the roll angle distribution error, and the reprojection distance error.
Where there is a first semantic object with a polygonal shape, the reprojection distance error includes the reprojection distance error of points, that is, the sum of the distances between the vertices of the bounding box of the first semantic object and the corresponding vertices of the bounding box of the projected semantic object; in the embodiment shown in Fig. 4(a), this is the sum of d1, d2, d3, and d4. Where the first semantic object is shaped as a bar, the reprojection distance error includes the reprojection distance error of lines, that is, the sum of the distances between the end points of the straight line segments constituting the first semantic object and the corresponding end points of the straight line segments constituting the projected semantic object; in the embodiment shown in Fig. 4(b), this is the sum of d1 and d2. Where the first semantic object includes both a polygonal portion and a bar-shaped portion, the reprojection distance error is the sum of the reprojection distance error of points and the reprojection distance error of lines.
In other embodiments, a mapping table may also be stored in advance, where the mapping table is used to record each position in the area where the mobile device is located, a plane (e.g., a road surface) corresponding to the position, and a normal vector of the plane. Then, the plane and normal vector where the movable device is located can be searched in the mapping table according to the estimated value of the position of the movable device. Since the height, roll angle, and pitch angle of each point on the movable device (e.g., a center point of the movable device) are very small in variation when the movable device moves on a plane, the height distribution error, pitch angle distribution error, and roll angle distribution error can be determined using the plane as a priori information. Meanwhile, when the first semantic object and the second semantic object are exactly matched, the reprojection distance error between the two semantic objects should be as small as possible. Accordingly, a distance constraint may be established that minimizes the sum of the height distribution error, the pitch angle distribution error, the roll angle distribution error, and the reprojection distance error. The constraints of an embodiment of the present disclosure are as follows:
dk = Σ_{Li∈Q} Dl(π(Li, Tk), li) + Σ_{Pj∈P} Dp(π(Pj, Tk), pj) + α·Dplane + β·Droll + γ·Dpitch

where dk is the optimized target residual term at the current time k; Dl and Dp denote the errors of the straight line segments and of the bounding box vertices, respectively; Q and P are the sets of currently matched straight line segments and bounding box vertices; π is the projection function that projects three-dimensional information (such as the straight line segments Li and the bounding box vertices Pj) onto the road image through the pose Tk at the current time k; the image line feature matched to Li is li, and the image point feature matched to Pj is pj. When the prior plane of the road on which the movable device is located is known, and the height H above the road surface of the camera capturing the road image is known, Dplane denotes the difference between the height z of the pose perpendicular to the road surface under the plane constraint and the camera height H; Droll denotes the roll angle φ of the pose under the plane constraint, with the axis of the yaw angle perpendicular to the road surface; Dpitch denotes the corresponding pitch angle θ; and α, β, and γ are the coefficients of the respective terms.
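Assuming the reprojection errors and the plane-constraint quantities have already been computed, the combined residual can be sketched as follows (the function name, the exact penalty form, and the default coefficients are illustrative assumptions, not from the disclosure):

```python
def pose_residual(line_errors, point_errors, z, H, roll, pitch,
                  alpha=1.0, beta=1.0, gamma=1.0):
    """Residual combining the reprojection errors of the matched straight
    line segments (set Q) and bounding box vertices (set P) with the
    plane-constraint terms: height difference |z - H|, roll angle, and
    pitch angle, weighted by the coefficients alpha, beta, gamma."""
    reprojection = sum(line_errors) + sum(point_errors)
    return (reprojection + alpha * abs(z - H)
            + beta * abs(roll) + gamma * abs(pitch))
```

An optimizer would then search for the pose Tk minimizing this residual over the matched point, line, and plane constraints.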
The pose estimation effect is shown in Fig. 6. Fig. 6 is a view from the right rear of the vehicle during traveling; in front of the vehicle, from bottom to top, are a dashed lane line, a solid road edge line, a road surface turn sign, a zebra crossing, a traffic light, and a large road sign board above the road.
Fig. 7 is a schematic diagram of a pose determination principle according to an embodiment of the present disclosure, and as shown in the drawing, on one hand, a road image may be detected to obtain a plurality of categories of first semantic objects, and then position information of vertices of bounding boxes of the first semantic objects and position information of end points of straight line segments may be obtained. On the other hand, a local semantic map within a preset range can be inquired, the position information of the top point of the bounding box of each second semantic object and the position information of the end point of the straight line segment in the local semantic map are obtained, the plane where the movable equipment is located is obtained, and constraint conditions are constructed according to the point, line and plane information. And finally, carrying out pose estimation on the movable equipment according to the constraint conditions.
In some embodiments, the determined pose of the movable device is a three-dimensional pose, that is, it includes the coordinates of the movable device along its traveling direction, along the direction perpendicular to the traveling direction on the road surface, and along its height direction, as well as the pitch angle, yaw angle and roll angle of the movable device.
The embodiments of the present disclosure use semantic targets of a plurality of categories to construct and solve constraint conditions, thereby solving for the three-dimensional pose in the pose estimation process of the movable device. By comprehensively using multi-category semantic targets, a prefabricated semantic map and a visual camera, a stable and continuous pose estimation result can be obtained even in areas where any single semantic category is sparse; meanwhile, solving the pose based on point, line and plane constraint conditions improves the pose estimation accuracy.
After the pose of the movable device is determined from the matching semantic targets, the traveling state of the movable device may be controlled according to the three-dimensional pose. The traveling state includes the speed, acceleration and angle (at least any one of the pitch angle, yaw angle and roll angle) of the movable device, and the like. The embodiments of the present disclosure can accurately determine the current pose of the movable device in fields such as automatic driving. In an intelligent driving system such as an ADAS (Advanced Driver Assistance System), estimating the pose of the current movable device through the embodiments of the present disclosure improves the accuracy of that estimate and thereby helps the ADAS or similar system to perform more accurate assisted driving (such as emergency danger avoidance and automatic parking). By acquiring the three-dimensional pose, the accuracy of controlling the traveling state of the movable device can be improved.
As shown in fig. 8, this is another possible implementation manner of the pose determination method according to the embodiment of the present disclosure, and in this implementation manner, the method includes:
Step S801: acquiring a first semantic target in a first image of an area where the mobile equipment is located, and acquiring a second semantic target in a semantic map;
Step S802: determining a matching semantic object in the second semantic object that matches the first semantic object;
Step S803: and establishing a pose constraint condition according to the matched semantic object, and determining the three-dimensional pose of the movable equipment according to the pose constraint condition.
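Steps S801 to S803 can be sketched as a skeleton in which every component is injected. All helper roles here are placeholders standing in for the components the disclosure describes, not an API it defines:

```python
def determine_pose(image, semantic_map, initial_guess,
                   detect, query, match, build_constraints, optimize):
    """Skeleton of steps S801-S803; purely illustrative structure."""
    first_targets = detect(image)                        # S801: targets in the image
    second_targets = query(semantic_map, initial_guess)  # S801: targets in the map
    # S802: pair each first semantic target with a matching second semantic target
    matches = [(f, match(f, second_targets)) for f in first_targets]
    # S803: build pose constraints from the matches and solve for the 3-D pose
    return optimize(build_constraints(matches), initial_guess)
```

With concrete detection, query, matching and optimization components plugged in, this returns the three-dimensional pose of the movable device.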
It should be noted that, with respect to the specific determination manner and definition of the first image, the first semantic object, the semantic map, the second semantic object, the matching semantic object, the pose constraint condition, and the like, reference may be made to the foregoing contents, which are not described herein again.
In some embodiments, the first semantic object may include a plurality of categories of first semantic objects.
In some embodiments, the pose constraint condition is determined according to a pose change relationship of the movable device within a preset time period.
In some embodiments, the establishing a pose constraint according to the matching semantic object includes: determining a plane where the movable equipment is located and a normal vector of the plane according to the position of the matched semantic target; determining a pose distribution error of the movable device from at least one of the plane, the normal vector, and a distance between the matching semantic target and the first semantic target; and establishing the pose constraint condition according to the pose distribution error.
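One common way to obtain the plane and its normal vector from the 3-D positions of the matched semantic targets is a least-squares fit. This particular method (SVD of the centered points) is an assumption for illustration, as the disclosure does not specify how the plane is determined:

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through 3-D points: returns (centroid, unit normal).
    The normal is the right singular vector of the centered points with the
    smallest singular value."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)  # singular values in descending order
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)
```

The fitted normal can then feed the pitch and roll distribution errors, and the plane itself the height distribution error.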
In some embodiments, the determining a pose distribution error of the movable device from at least one of the plane, the normal vector, and a distance between the matching semantic target and the first semantic target comprises: determining a height distribution error of a target point of the movable device according to the plane; determining a pitch angle distribution error and a roll angle distribution error of the movable equipment according to the normal vector; determining a reprojection distance error according to the distance between the matching semantic target and the first semantic target; establishing the pose constraint condition according to the pose distribution error comprises the following steps: establishing the pose constraint condition according to at least one of the height distribution error, the pitch angle distribution error, the roll angle distribution error and the reprojection distance error.
In some embodiments, the pose constraints are: the sum of the height distribution error, the pitch angle distribution error, the roll angle distribution error, and the reprojection distance error is minimized.
In the embodiments of the present disclosure, a first semantic target in a first image of the area where the movable device is located and a second semantic target in a semantic map are acquired, a matching semantic target that matches the first semantic target is determined among the second semantic targets, a pose constraint condition is established according to the matching semantic target, and the three-dimensional pose of the movable device is determined according to the pose constraint condition. Determining the three-dimensional pose under this constraint condition improves the accuracy of pose determination.
It will be understood by those skilled in the art that, in the method of the present disclosure, the order in which the steps are written does not imply a strict order of execution or any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible inherent logic.
As shown in fig. 9, an embodiment of the present disclosure further provides a pose determination apparatus, including:
A first obtaining module 901, configured to obtain multiple categories of first semantic objects in a first image of an area where a mobile device is located, and obtain a second semantic object in a semantic map, where at least two categories of the multiple categories of first semantic objects are located in different spatial dimensions;
A first determining module 902 for determining a matching semantic object of the second semantic objects that matches the first semantic object of each of the plurality of categories;
A second determining module 903, configured to determine a pose of the mobile device according to the matching semantic object.
In some embodiments, the first determining module comprises: an image generation unit for generating a second image according to the pixel values of the first semantic object of the plurality of categories; a first determining unit configured to determine, from the second image, a matching semantic object that matches the first semantic object of each of the plurality of classes in the second semantic object.
In some embodiments, the first obtaining module comprises: the first acquisition unit is used for acquiring a pose estimation value of the movable equipment and determining a target search range for searching the semantic map according to the acquired pose estimation value; and the searching unit is used for searching the second semantic target from the target searching range.
In some embodiments, the first obtaining unit includes: the acquiring subunit is used for acquiring a first pose of the movable device at a first moment; a first determining subunit, configured to determine, according to the first pose, a pose estimate for the movable device at a second time, the first time being before the second time.
In some embodiments, the pose estimate comprises an estimate of position and an estimate of direction; the target search range is an area lying in the estimated direction and within a preset distance range of the estimated position.
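Such a target search range might be implemented as a distance-plus-heading filter over the map targets. The sector shape and both thresholds are arbitrary assumptions for illustration:

```python
import math

def in_search_range(target_xy, est_xy, est_heading, max_dist, max_angle):
    """True if the target lies within max_dist of the estimated position and
    within +/- max_angle (radians) of the estimated heading direction."""
    dx, dy = target_xy[0] - est_xy[0], target_xy[1] - est_xy[1]
    dist = math.hypot(dx, dy)
    if dist > max_dist:
        return False
    if dist == 0.0:
        return True  # the estimated position itself is always in range
    bearing = math.atan2(dy, dx)
    # wrap the bearing difference into (-pi, pi]
    diff = (bearing - est_heading + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= max_angle
```

Only second semantic targets passing this filter would be considered for matching, keeping the map query local.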
In some embodiments, the first determination module is to: and determining a second semantic target with the shortest distance to the first semantic target of each category as a matching semantic target of the first semantic target of each category.
In some embodiments, the apparatus further comprises: the second acquisition module is used for acquiring a projection semantic target of a second semantic target in a second image before determining the second semantic target with the shortest distance to the first semantic target of each category as a matching semantic target of the first semantic target of each category; a position of the second semantic object in the semantic map corresponds to a position of the projected semantic object in the second image, the second image generated based on pixel values of the first semantic object of the plurality of classes; a third determining module, configured to determine a first distance between the projected semantic object and the first semantic object of each category, where the first distance is a distance between the second semantic object and the first semantic object of each category.
In some embodiments, the first determination module is to: and determining a second semantic target which has the shortest distance and the same shape with the first semantic target of each category as a matching semantic target of the first semantic target of each category.
In some embodiments, the apparatus further comprises: a fourth determining module for determining a first position of each of at least one first target point of the first semantic target; the first target point is determined based on contour information of the first semantic object; a fifth determining module for determining a second position of each of at least one second target point of the projected semantic target; the second target point is determined based on contour information of the projected semantic target; and the sixth determining module is used for determining the distance between the projection semantic target and the first semantic target according to the first position of each first target point and the second position of each second target point.
In some embodiments, the sixth determining module comprises: a second determining unit, configured to determine, according to the first position of each first target point and the second position of each second target point, a distance between each first target point and each corresponding second target point; and the third determining unit is used for determining the average value of the distance between each first target point and each corresponding second target point as the distance between the first semantic target and the projection semantic target.
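The averaged point-to-point distance could be computed as follows, assuming the target points of the two targets are given in corresponding order (how correspondence is established is not fixed by the disclosure):

```python
import math

def mean_point_distance(points_a, points_b):
    """Average Euclidean distance between corresponding target points,
    e.g. bounding-box vertices of a detected first semantic target and of a
    projected semantic target (assumed given in the same order)."""
    assert len(points_a) == len(points_b)
    total = sum(math.dist(a, b) for a, b in zip(points_a, points_b))
    return total / len(points_a)

def best_match(first_target_pts, candidates):
    """Pick the projected candidate with the smallest mean point distance."""
    return min(candidates, key=lambda c: mean_point_distance(first_target_pts, c))
```

The candidate minimizing this mean distance would be taken as the matching semantic target for the first semantic target.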
In some embodiments, where the outline of the first semantic target is a polygon, the first target point is a vertex of a bounding box of the first semantic target. In some embodiments, in a case that the outline of the first semantic target is a long strip, the first target point is a vertex of a line segment corresponding to the first semantic target.
In some embodiments, the second determining module comprises: the first establishing unit is used for establishing a pose constraint condition according to the matched semantic target; a fourth determination unit configured to determine the pose of the movable device according to the pose constraint condition.
In some embodiments, the pose constraint condition is determined according to a pose change relationship of the movable device within a preset time period.
In some embodiments, the first establishing unit comprises: the second determining subunit is used for determining a plane where the movable equipment is located and a normal vector of the plane according to the position of the matched semantic target; a third determining subunit to determine a pose distribution error of the movable device according to at least one of the plane, the normal vector, and a distance between the matching semantic target and the first semantic target; and the establishing subunit is used for establishing the pose constraint condition according to the pose distribution error.
In some embodiments, the third determining subunit is to: determining a height distribution error of a target point of the movable device according to the plane; determining a pitch angle distribution error and a roll angle distribution error of the movable equipment according to the normal vector; determining a reprojection distance error according to the distance between the matching semantic target and the first semantic target; the establishing subunit is configured to: establishing the pose constraint condition according to at least one of the height distribution error, the pitch angle distribution error, the roll angle distribution error and the reprojection distance error.
In some embodiments, the pose constraints are: the sum of the height distribution error, the pitch angle distribution error, the roll angle distribution error, and the reprojection distance error is minimized.
In some embodiments, the second determination module is to: determining a three-dimensional pose of the movable device according to the matching semantic target; the device further comprises: and the control module is used for controlling the running state of the movable equipment according to the three-dimensional pose.
As shown in fig. 10, an embodiment of the present disclosure also provides a pose determination apparatus, including:
A third obtaining module 1001, configured to obtain a first semantic target in a first image of an area where the mobile device is located, and obtain a second semantic target in a semantic map;
A seventh determining module 1002, configured to determine a matching semantic object matching the first semantic object in the second semantic objects;
An eighth determining module 1003, configured to establish a pose constraint condition according to the matching semantic object, and determine a three-dimensional pose of the mobile device according to the pose constraint condition.
In some embodiments, the pose constraint condition is determined according to a pose change relationship of the movable device within a preset time period.
In some embodiments, the eighth determining module comprises: a fifth determining unit, configured to determine, according to the position of the matching semantic target, a plane where the movable device is located and a normal vector of the plane; a sixth determining unit, configured to determine a pose distribution error of the movable device according to at least one of the plane, the normal vector, and a distance between the matching semantic target and the first semantic target; and a second establishing unit, configured to establish the pose constraint condition according to the pose distribution error.
In some embodiments, the sixth determination unit includes: a fourth determining subunit configured to determine a height distribution error of a target point of the movable device from the plane; a fifth determining subunit, configured to determine a pitch angle distribution error and a roll angle distribution error of the mobile device according to the normal vector; a sixth determining subunit, configured to determine a reprojection distance error according to a distance between the matching semantic target and the first semantic target; the second establishing unit is used for: establishing the pose constraint condition according to at least one of the height distribution error, the pitch angle distribution error, the roll angle distribution error and the reprojection distance error.
In some embodiments, the pose constraints are: the sum of the height distribution error, the pitch angle distribution error, the roll angle distribution error, and the reprojection distance error is minimized.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.
Accordingly, embodiments of the present disclosure also provide a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the method according to any of the embodiments when executing the program.
The apparatus embodiments of the present specification can be applied to a computer device, such as a server or a terminal device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, as a logical apparatus, it is formed by the processor of the computer device in which it is located reading the corresponding computer program instructions from the nonvolatile memory into the memory and running them. In terms of hardware, fig. 11 shows the hardware structure of the computer device in which the apparatus of the present specification is located; besides the processor 1101, the memory 1102, the network interface 1103 and the nonvolatile memory 1104 shown in fig. 11, the server or electronic device in which the apparatus is located may further include other hardware according to the actual functions of the computer device, which will not be described again.
Accordingly, the embodiments of the present disclosure also provide a computer storage medium on which a computer program is stored, which when executed by a processor implements the method according to any of the embodiments.
The present disclosure may take the form of a computer program product embodied on one or more storage media (including, but not limited to, disk storage, CD-ROM and optical storage) having program code embodied therein. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
The above description is only exemplary of the present disclosure and should not be taken as limiting the disclosure, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.

Claims (20)

1. A pose determination method, characterized in that the method comprises:
The method comprises the steps of obtaining a plurality of categories of first semantic objects in a first image of an area where the mobile equipment is located, and obtaining a second semantic object in a semantic map, wherein at least two categories of the plurality of categories of first semantic objects are located in different spatial dimensions;
Determining a matching semantic object of the second semantic objects that matches the first semantic object of each of the plurality of categories;
Determining a pose of the movable device according to the matching semantic object.
2. The method of claim 1, wherein the determining a matching semantic object in the second semantic objects that matches the first semantic object of each of the plurality of categories comprises:
Generating a second image according to the pixel values of the first semantic objects of the plurality of categories;
Determining, from the second image, a matching semantic object of the second semantic objects that matches the first semantic object of each of the plurality of categories.
3. The method according to claim 1 or 2, wherein the obtaining of the second semantic object in the semantic map comprises:
Acquiring a pose estimation value of the movable equipment, and determining a target search range for searching the semantic map according to the acquired pose estimation value;
And searching the second semantic target from the target searching range.
4. The method of claim 3, wherein the obtaining pose estimates for the mobile device comprises:
Acquiring a first pose of the movable equipment at a first moment;
Determining, from the first pose, a pose estimate for the movable device at a second time, the first time being prior to the second time.
5. The method of claim 4, wherein the pose estimates comprise estimates of position and estimates of orientation;
The target search range is an area in the direction and within a preset distance range of the position.
6. The method of any of claims 1 to 5, wherein the determining a matching semantic object in the second semantic object that matches the first semantic object in each of the plurality of classes comprises:
And determining a second semantic target with the shortest distance to the first semantic target of each category as a matching semantic target of the first semantic target of each category.
7. The method of claim 6, further comprising:
Before determining a second semantic target with the shortest distance to the first semantic target of each category as a matching semantic target of the first semantic target of each category, acquiring a projection semantic target of the second semantic target in a second image; a position of the second semantic object in the semantic map corresponds to a position of the projected semantic object in the second image, the second image generated based on pixel values of the first semantic object of the plurality of classes;
Determining a first distance between the projected semantic object and a first semantic object of each category, where the first distance is a distance between the second semantic object and the first semantic object of each category.
8. The method according to claim 6 or 7, wherein the determining the second semantic object with the shortest distance to the first semantic object of each category as the matching semantic object of the first semantic object of each category comprises:
And determining a second semantic target which has the shortest distance and the same shape with the first semantic target of each category as a matching semantic target of the first semantic target of each category.
9. The method according to claim 7 or 8, characterized in that the method further comprises:
Determining a first position of each of at least one first target point of the first semantic target; the first target point is determined based on contour information of the first semantic object;
Determining a second position of each of at least one second target point of the projected semantic target; the second target point is determined based on contour information of the projected semantic target;
And determining the distance between the projection semantic target and the first semantic target according to the first position of each first target point and the second position of each second target point.
10. The method of claim 9, wherein determining the distance between the projected semantic object and the first semantic object according to the first position of each first object point and the second position of each second object point comprises:
Determining the distance between each first target point and each corresponding second target point according to the first position of each first target point and the second position of each second target point;
And determining the average value of the distance between each first target point and each corresponding second target point as the distance between the first semantic target and the projection semantic target.
11. The method according to claim 9 or 10, wherein in case the outline of the first semantic object is a polygon, the first target point is a vertex of a bounding box of the first semantic object; and/or
And under the condition that the outline of the first semantic target is a long strip, the first target point is the vertex of the line segment corresponding to the first semantic target.
12. The method of any of claims 1-11, wherein determining the pose of the movable device from the matching semantic objects comprises:
Establishing a pose constraint condition according to the matched semantic target;
Determining a pose of the movable device according to the pose constraint condition.
13. The method according to claim 12, wherein the pose constraint condition is determined according to a pose change relationship of the movable device within a preset time period.
14. The method according to claim 12 or 13, wherein the establishing pose constraints according to the matching semantic objects comprises:
Determining a plane where the movable equipment is located and a normal vector of the plane according to the position of the matched semantic target;
Determining a pose distribution error of the movable device from at least one of the plane, the normal vector, and a distance between the matching semantic target and the first semantic target;
And establishing the pose constraint condition according to the pose distribution error.
15. The method of claim 14, wherein determining the pose distribution error of the movable device from at least one of the plane, the normal vector, and a distance between the matching semantic object and the first semantic object comprises:
Determining a height distribution error of a target point of the movable device according to the plane;
Determining a pitch angle distribution error and a roll angle distribution error of the movable equipment according to the normal vector;
Determining a reprojection distance error according to the distance between the matching semantic target and the first semantic target;
Establishing the pose constraint condition according to the pose distribution error comprises the following steps:
Establishing the pose constraint condition according to at least one of the height distribution error, the pitch angle distribution error, the roll angle distribution error and the reprojection distance error.
16. The method according to claim 15, characterized in that the pose constraints are: the sum of the height distribution error, the pitch angle distribution error, the roll angle distribution error, and the reprojection distance error is minimized.
17. The method of any of claims 1 to 16, wherein the determining the pose of the movable device from the matching semantic objects comprises:
Determining a three-dimensional pose of the movable device according to the matching semantic target;
The method further comprises the following steps:
And controlling the running state of the movable equipment according to the three-dimensional pose.
18. A pose determination apparatus, characterized by comprising:
The mobile equipment comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a plurality of categories of first semantic targets in a first image of an area where the mobile equipment is located and acquiring a second semantic target in a semantic map, and the first semantic targets of at least two categories of the plurality of categories of first semantic targets are located in different spatial dimensions;
A first determining module for determining a matching semantic object of the second semantic objects that matches the first semantic object of each of the plurality of categories;
A second determination module to determine a pose of the movable device according to the matching semantic object.
19. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 13.
20. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 13 when executing the program.
CN202010213807.9A (filed 2020-03-24, priority 2020-03-24): Pose determining method, pose determining device, medium and pose determining equipment. Granted as CN111427373B (Active).

Publications (2)

CN111427373A, published 2020-07-17
CN111427373B (granted), published 2023-11-24
Family ID: 71549104


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021190167A1 (en) * 2020-03-24 2021-09-30 上海商汤临港智能科技有限公司 Pose determination method and apparatus, and medium and device
CN114543819A (en) * 2021-09-16 2022-05-27 北京小米移动软件有限公司 Vehicle positioning method and device, electronic equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107144285A (en) * 2017-05-08 2017-09-08 深圳地平线机器人科技有限公司 Posture information determines method, device and movable equipment
CN109002837A (en) * 2018-06-21 2018-12-14 网易(杭州)网络有限公司 A kind of image application processing method, medium, device and calculate equipment
US20190034811A1 (en) * 2017-07-25 2019-01-31 General Electric Company Service layer augmentation of response to semantically-informed query of arbitrary external data sources
CN109345574A (en) * 2018-08-31 2019-02-15 西安电子科技大学 Laser radar three-dimensional based on semantic point cloud registering builds drawing method
US20190080467A1 (en) * 2017-09-08 2019-03-14 Qualcomm Incorporated Pose determination with semantic segmentation
CN109544629A (en) * 2018-11-29 2019-03-29 南京人工智能高等研究院有限公司 Camera pose determines method and apparatus and electronic equipment
CN109815847A (en) * 2018-12-30 2019-05-28 中国电子科技集团公司信息科学研究院 A kind of vision SLAM method based on semantic constraint
US20190286915A1 (en) * 2018-03-13 2019-09-19 Honda Motor Co., Ltd. Robust simultaneous localization and mapping via removal of dynamic traffic participants
CN110335319A (en) * 2019-06-26 2019-10-15 华中科技大学 Camera positioning and the map reconstruction method and system of a kind of semantics-driven
JP2019219988A (en) * 2018-06-21 2019-12-26 日本電信電話株式会社 Meaning information providing device, meaning information providing method, and program

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190141481A1 (en) * 2016-04-28 2019-05-09 Nec Corporation Management system, mobile body, management device, velocity notification method, management method, and storage medium having program stored thereon
KR20240005161A (* 2016-12-09 2024-01-11 TomTom Global Content B.V. Method and system for video-based positioning and mapping
US10437252B1 (* 2017-09-08 2019-10-08 PerceptIn Shenzhen Limited High-precision multi-layer visual and semantic map for autonomous driving
CN110147705B (en) * 2018-08-28 2021-05-04 北京初速度科技有限公司 Vehicle positioning method based on visual perception and electronic equipment
CN109870689B (en) * 2019-01-08 2021-06-04 武汉中海庭数据技术有限公司 Lane-level positioning method and system based on matching of millimeter wave radar and high-precision vector map
CN111427373B (en) * 2020-03-24 2023-11-24 上海商汤临港智能科技有限公司 Pose determining method, pose determining device, medium and pose determining equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YANG Ruoyu, HU Jia, CAI Shijie: "Semantics-guided 3D reconstruction of complex architectural structure drawings", Journal of Computer-Aided Design & Computer Graphics, no. 03 *
CHEN Tingjiong; QIN Wei; ZOU Dewei: "Object detection and pose estimation based on semantic segmentation and point cloud registration", no. 01 *

Also Published As

Publication number Publication date
JP2022542082A (en) 2022-09-29
CN111427373B (en) 2023-11-24
WO2021190167A1 (en) 2021-09-30

Similar Documents

Publication Publication Date Title
CN109461211B (en) Semantic vector map construction method and device based on visual point cloud and electronic equipment
US9495602B2 (en) Image and map-based detection of vehicles at intersections
US11113544B2 (en) Method and apparatus providing information for driving vehicle
US11024055B2 (en) Vehicle, vehicle positioning system, and vehicle positioning method
JP6595182B2 (en) Systems and methods for mapping, locating, and attitude correction
Ghallabi et al. LIDAR-Based road signs detection For Vehicle Localization in an HD Map
Barth et al. Estimating the driving state of oncoming vehicles from a moving platform using stereo vision
CN111830953B (en) Vehicle self-positioning method, device and system
KR102091580B1 (en) Method for collecting road signs information using MMS
CN111862672A (en) Parking lot vehicle self-positioning and map construction method based on top view
CN110530372B (en) Positioning method, path determining device, robot and storage medium
CN108021862A Road sign identification
CN111582189B (en) Traffic signal lamp identification method and device, vehicle-mounted control terminal and motor vehicle
CN111220143B (en) Method and device for determining position and posture of imaging equipment
US11158065B2 (en) Localization of a mobile unit by means of a multi hypothesis kalman filter method
CN111862673A (en) Parking lot vehicle self-positioning and map construction method based on top view
JP2023099547A (en) Method and system for determining vehicle position
CN111427373B (en) Pose determining method, pose determining device, medium and pose determining equipment
Christensen et al. Autonomous vehicles for micro-mobility
CN114485698B (en) Intersection guide line generation method and system
Muro et al. Moving-object detection and tracking by scanning LiDAR mounted on motorcycle based on dynamic background subtraction
CN113189610A (en) Map-enhanced autonomous driving multi-target tracking method and related equipment
Deusch Random finite set-based localization and SLAM for highly automated vehicles
WO2022133986A1 (en) Accuracy estimation method and system
JP7337617B2 (en) Estimation device, estimation method and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant