WO2024036984A1 - Target positioning method, related system, and storage medium - Google Patents

Target positioning method, related system, and storage medium Download PDF

Info

Publication number
WO2024036984A1
Authority
WO
WIPO (PCT)
Prior art keywords
pose
marker
target
global
positioning
Prior art date
Application number
PCT/CN2023/086234
Other languages
English (en)
French (fr)
Inventor
Long Yunfei (龙云飞)
Peng Chengtao (彭成涛)
Zhu Senhua (朱森华)
Tu Dandan (涂丹丹)
Original Assignee
Huawei Cloud Computing Technologies Co., Ltd. (华为云计算技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co., Ltd.
Publication of WO2024036984A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10028: Range image; depth image; 3D point clouds
    • G06T2207/20: Special algorithmic details
    • G06T2207/20212: Image combination
    • G06T2207/20221: Image fusion; image merging

Definitions

  • the present application relates to the field of positioning technology, and in particular to a target positioning method and related systems and storage media.
  • Various sensors, such as lidar, vision cameras, inertial measurement units (IMU), wheel odometers, radar, and the Global Positioning System (GPS), each have their own advantages and disadvantages and can be combined in many different ways.
  • the common Lidar+IMU fusion algorithm struggles with loop detection and relocalization; the common Camera+IMU fusion algorithm struggles with accurate depth estimation; the common Lidar+Camera+IMU algorithm does not achieve true multi-sensor tight coupling; and adding Radar sensors to a conventional combination such as Lidar+Camera+IMU to cope with rain, snow, and fog is costly.
  • Certain scenarios may cause certain sensors to fail. For example, strong electromagnetic interference in a substation makes the positioning error of GPS or real-time kinematic (RTK) carrier-phase differential positioning extremely large; bumpy roads cause wheel odometer slippage and wear, which leads to large cumulative error during a robot's long-term inspection; low echo reflection in an open scene may prevent the lidar from detecting effective feature points; and in rain, fog, snow, and similar scenes, sensors such as Lidar and Camera suffer serious performance degradation, leaving the robot unable to recognize its own accurate pose. Therefore, how to develop a high-precision, highly robust, and cost-effective multi-sensor fusion positioning algorithm is a huge challenge faced by the entire industry.
  • This application discloses a target positioning method, related systems, and storage media, which can achieve high-precision, high-robustness, and cost-effective positioning of targets.
  • embodiments of the present application provide a target positioning method, including: performing coarse positioning on a target in a preset area to obtain a coarse pose of the target; acquiring an image of a first marker in the preset area and obtaining the global pose of the first marker according to the coarse pose of the target and the image of the first marker; obtaining the pose of the first marker in the camera coordinate system based on the image of the first marker; obtaining the relative pose between the first marker and the target based on the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system; and obtaining the global pose of the target based on the global pose of the first marker and the relative pose between the first marker and the target.
  • in this way, vehicles, robots, servers, etc. first perform coarse positioning on the target (such as an unmanned vehicle or a robot), then obtain the global pose of the first marker based on the image of the first marker and the coarse pose of the target, obtain the relative pose between the first marker and the target based on the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system, and finally obtain the global pose of the target from the global pose of the first marker and the relative pose between the first marker and the target.
  • with this method, the target is first roughly positioned and then precisely positioned based on the image of the first marker and the coarse pose, which helps to obtain an ultra-high-precision global pose estimate of the target.
  • the above-mentioned rough positioning is based on lidar, and fine positioning is based on visual cameras.
  • This solution first performs coarse positioning based on lidar, and then performs fine positioning based on visual cameras.
  • the accuracy of lidar positioning is 5-10 cm, while visual fine positioning can achieve a positioning accuracy of about 1-2 cm.
  • the combination of lidar coarse positioning and visual fine positioning can meet customer needs and achieve high-precision, high-robustness, and cost-effective positioning of targets.
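  • to make the composition step above concrete, the following is a minimal sketch (an illustration, not the implementation claimed in this application) of assembling the target's global pose from the marker's global pose and the camera-frame observations using 4×4 homogeneous transforms; all function and variable names are illustrative assumptions.

```python
import numpy as np

def inv_se3(T: np.ndarray) -> np.ndarray:
    """Invert a 4x4 homogeneous transform using its block structure [R|t]."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

def target_global_pose(T_world_marker, T_cam_marker, T_cam_target):
    """Assumed inputs (illustrative):
    T_world_marker: global pose of the marker, looked up in the semantic map
    T_cam_marker:   pose of the marker in the camera frame (e.g. from a learned model)
    T_cam_target:   pose of the target body in the camera frame (fixed extrinsics)
    """
    # Relative pose between marker and target: T_marker_target = T_cam_marker^-1 * T_cam_target
    T_marker_target = inv_se3(T_cam_marker) @ T_cam_target
    # Chain the marker's global pose with the relative pose to get the target's global pose.
    return T_world_marker @ T_marker_target
```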
  • obtaining the global pose of the first marker based on the coarse pose of the target and the image of the first marker includes:
  • a semantic positioning local map is obtained according to the local map of the target's location and the semantic positioning global map, where the semantic positioning global map includes the global poses of M markers, the semantic positioning local map includes the global poses of N markers, the N markers are markers among the M markers, and M and N are both positive integers;
  • the global pose of the first marker is then obtained from the semantic positioning local map according to the image of the first marker.
  • by first using lidar for coarse positioning, the minimum operating requirement of visual fine positioning (that the marker occupy more than 1/10 of the image) can be met; if only the visual fine positioning module were used, this minimum startup requirement could not be met.
  • This solution first performs coarse positioning based on lidar, and then performs fine positioning based on visual cameras.
  • the accuracy of lidar positioning is 5-10 cm, which is difficult to meet customers' high-precision positioning requirement of 1-2 cm, while visual fine positioning can achieve a positioning accuracy of about 1-2 cm.
  • the combination of lidar coarse positioning and visual fine positioning can meet customer needs and achieve high-precision, high-robustness, and cost-effective positioning of targets.
  • in addition, the global pose of the marker is found through the semantic positioning local map; since there may be many markers, narrowing the search scope makes the global pose estimation of the marker more accurate and improves the efficiency of target positioning.
  • the method further includes:
  • the textured three-dimensional models of the M landmarks are registered to the global point cloud map to obtain the semantic positioning global map.
  • the semantic positioning global map can be established in offline mode, and then the online mode is used to locate the target.
  • the separation of offline modules and online modules can greatly reduce the computing power consumption of online modules, greatly reducing the hardware costs of vehicles or robots and greatly improving battery life.
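  • as an illustration of this offline registration step, the sketch below aligns a reconstructed textured marker model to the global point cloud map. The text names TEASER++ for registration; here open3d's ICP is used as a widely available stand-in, and the file paths and parameters are placeholder assumptions.

```python
import numpy as np
import open3d as o3d

# Placeholder inputs: a reconstructed marker model and the global point cloud map.
marker = o3d.io.read_point_cloud("marker_model.pcd")    # hypothetical path
global_map = o3d.io.read_point_cloud("global_map.pcd")  # hypothetical path

# Coarse initial guess (identity here; in practice seeded from the mapping run).
T_init = np.eye(4)

# Point-to-point ICP as a stand-in for the TEASER++ registration named in the text.
result = o3d.pipelines.registration.registration_icp(
    marker, global_map, 0.05, T_init,
    o3d.pipelines.registration.TransformationEstimationPointToPoint())

# result.transformation is the marker's global pose; storing it for each of the
# M markers yields the semantic positioning global map described above.
T_world_marker = result.transformation
```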
  • obtaining the pose of the first marker in the camera coordinate system based on the image of the first marker includes:
  • the image of the first marker is input into a preset model for processing to obtain the pose of the first marker in the camera coordinate system, where the training data of the preset model is obtained by applying one or more of replacement, Gaussian blur, translation, cropping, contrast transformation, Gamma transformation, enlargement, and reduction to the background in initial training image data, and/or applying one or more of Gaussian blur, translation, cropping, contrast transformation, Gamma transformation, enlargement, and reduction to the markers in the initial training image data.
  • in this way, the relative pose obtained between the marker and the target is more accurate and more robust.
  • a target positioning device including:
  • a coarse positioning module, used to perform coarse positioning on a target in a preset area and obtain the coarse pose of the target;
  • a first processing module, used to obtain an image of a first marker in the preset area and obtain the global pose of the first marker according to the coarse pose of the target and the image of the first marker;
  • a second processing module, configured to obtain the pose of the first marker in the camera coordinate system based on the image of the first marker;
  • a third processing module, used to obtain the pose of the target in the camera coordinate system and obtain the relative pose between the first marker and the target according to the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system;
  • a positioning module configured to obtain the global pose of the target based on the global pose of the first marker and the relative pose between the first marker and the target.
  • in this way, vehicles, robots, servers, etc. first perform coarse positioning on the target (such as an unmanned vehicle or a robot), then obtain the global pose of the first marker based on the image of the first marker and the coarse pose of the target, obtain the relative pose between the first marker and the target based on the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system, and finally obtain the global pose of the target from the global pose of the first marker and the relative pose between the first marker and the target.
  • with this method, the target is first roughly positioned and then precisely positioned based on the image of the first marker and the coarse pose, which helps to obtain an ultra-high-precision global pose estimate of the target.
  • the above-mentioned rough positioning is based on lidar, and fine positioning is based on visual cameras.
  • This solution first performs coarse positioning based on lidar, and then performs fine positioning based on visual cameras.
  • the accuracy of lidar positioning is 5-10 cm, while visual fine positioning can achieve a positioning accuracy of about 1-2 cm.
  • the combination of lidar coarse positioning and visual fine positioning can meet customer needs and achieve high-precision, high-robustness, and cost-effective positioning of targets.
  • the first processing module is used to:
  • a semantic positioning local map is obtained according to the local map of the target's location and the semantic positioning global map, where the semantic positioning global map includes the global poses of M markers, the semantic positioning local map includes the global poses of N markers, the N markers are markers among the M markers, and M and N are both positive integers;
  • the global pose of the first marker is then obtained from the semantic positioning local map according to the image of the first marker.
  • the device further includes a fourth processing module, used for:
  • the textured three-dimensional models of the M landmarks are registered to the global point cloud map to obtain the semantic positioning global map.
  • the second processing module is also used to:
  • the image of the first marker is input into the preset model for processing to obtain the pose of the first marker in the camera coordinate system, where the training data of the preset model is obtained by applying one or more of replacement, Gaussian blur, translation, cropping, contrast transformation, Gamma transformation, enlargement, and reduction to the background in initial training image data, and/or applying one or more of Gaussian blur, translation, cropping, contrast transformation, Gamma transformation, enlargement, and reduction to the markers in the initial training image data.
  • the present application provides a computing device cluster, including at least one computing device, each computing device including a processor and a memory; the processor of the at least one computing device executes instructions stored in the memory of the at least one computing device, so that the computing device cluster executes the method provided in any possible implementation of the first aspect.
  • the present application provides a computer storage medium that includes computer instructions; when the computer instructions run on an electronic device, they cause the electronic device to execute the method provided in any possible implementation of the first aspect.
  • embodiments of the present application provide a computer program product.
  • when the computer program product runs on a computer, it causes the computer to execute the method provided in any possible implementation of the first aspect.
  • the device described in the second aspect, the computing device cluster described in the third aspect, the computer storage medium described in the fourth aspect, and the computer program product described in the fifth aspect are all used to execute the method provided in any implementation of the first aspect; therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method, which are not repeated here.
  • Figure 1a is a schematic architectural diagram of a target positioning system provided by an embodiment of the present application.
  • Figure 1b is a schematic diagram of the system architecture of a vehicle provided by an embodiment of the present application.
  • Figure 2 is a schematic flow chart of a target positioning method provided by an embodiment of the present application.
  • Figure 3 is a schematic flow chart of another target positioning method provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of target positioning provided by an embodiment of the present application.
  • Figure 5 is a schematic structural diagram of a target positioning device provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of the hardware structure of a computing device provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of the hardware structure of a computing device cluster provided by an embodiment of the present application.
  • Deep learning: a type of machine learning technology based on deep neural network algorithms. Its main feature is the use of multiple nonlinear transformation structures to process and analyze data. It is mainly used in perception and decision-making scenarios in the field of artificial intelligence, such as image and speech recognition, natural language translation, and computer games.
  • Laser-vision fused localization: in mobile robots or autonomous driving, it is very important for the robot to know where it is; this is positioning. However, the accuracy of positioning with a single lidar sensor or a single vision camera is often insufficient. A positioning method that fuses lidar and vision cameras (optionally together with a wheel odometer and an inertial measurement unit) is called laser-vision fusion positioning.
  • Object pose estimation: estimating the position and orientation of a target object in six degrees of freedom (6DoF); a 6DoF pose comprises a 3-dimensional position and a 3-dimensional spatial orientation. Position (translation) is described by variables along the X, Y, and Z coordinate axes, and orientation is described by rotations about the same three axes.
  • Inertial measurement unit (IMU): a device that measures an object's three-axis attitude angles (or angular rates) and accelerations. Generally, an IMU contains three single-axis accelerometers and three single-axis gyroscopes: the accelerometers detect acceleration along three independent axes of the carrier coordinate system, while the gyroscopes detect the carrier's angular velocity relative to the navigation coordinate system. By measuring an object's angular velocity and acceleration in three-dimensional space, the object's pose can be computed, as the toy example below illustrates.
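  • here is one naive strapdown-style integration step showing how such measurements propagate a pose (a sketch under simplifying assumptions: no bias, noise, or gravity handling; not the algorithm of this application):

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix of a 3-vector, so that skew(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def imu_step(R, p, v, gyro, accel, dt):
    """One first-order update: gyro (rad/s) and accel (m/s^2) in the body frame.
    A real system would use exponential maps and subtract gravity."""
    R_next = R @ (np.eye(3) + skew(gyro) * dt)   # integrate orientation
    a_world = R @ accel                           # rotate acceleration to the world frame
    v_next = v + a_world * dt                     # integrate velocity
    p_next = p + v * dt + 0.5 * a_world * dt**2   # integrate position
    return R_next, p_next, v_next
```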
  • this application provides a target positioning method and related systems and storage media, which can achieve high-precision, highly robust, and cost-effective positioning of targets.
  • Figure 1a is a schematic diagram of a target positioning system applicable to the embodiment of the present application.
  • the system includes a vehicle 101 and a server 102.
  • the vehicle 101 in the embodiment of the present application is a device that moves through power drive.
  • the vehicle 101 is a device with communication capabilities and computing capabilities, and can provide mobile travel services to users.
  • the vehicle 101 typically includes a variety of subsystems, including, but not limited to, a travel system, a sensor system, a control system, one or more peripheral devices, a power supply or user interface, and the like.
  • vehicle 101 may also include more or fewer subsystems, and each subsystem may include multiple elements. Additionally, each subsystem and element of vehicle 101 may be interconnected via wires or wirelessly.
  • the vehicle 101 in the embodiment of the present application may be a car or an electric vehicle, a track-running vehicle, an intelligent vehicle (such as an unmanned vehicle) or an intelligent mobile robot, etc.
  • smart vehicles support the function of sensing the road environment through the on-board sensing system, automatically planning the driving route and controlling the vehicle to reach the predetermined target location.
  • Smart cars make intensive use of technologies such as computers, sensing, information fusion, communications, artificial intelligence, or automatic control. They are a high-tech complex that integrates functions such as environmental perception, planning and decision-making, and multi-level assisted driving.
  • the intelligent vehicle may be a car or a wheeled mobile robot equipped with an assisted driving system or a fully autonomous driving system.
  • Server 102 is a device with centralized computing capabilities.
  • the server 102 can be implemented by a server, a virtual machine, a cloud, a roadside device, a robot, or other devices.
  • the type of server includes but is not limited to a general-purpose computer, a dedicated server computer, a blade server, etc.
  • This application does not strictly limit the number of servers included in the server 102. The number may be one or multiple (such as server clusters, etc.).
  • a virtual machine refers to a computing module simulated by software that has complete hardware system functions and runs in a completely isolated environment.
  • the server 102 can also be implemented through other computing instances, such as containers.
  • the cloud is a software platform that uses application virtualization technology, which allows one or more software and applications to be developed and run in an independent virtualized environment.
  • the cloud can be deployed on a public cloud, a private cloud, or a hybrid cloud, etc.
  • Roadside devices are devices installed on the side of the road (or intersection, or roadside, etc.).
  • the road can be an outdoor road (such as a main road, a service road, an elevated road, or a temporary road) or an indoor road (such as a road in an indoor parking lot).
  • Roadside units can provide services to vehicles.
  • the roadside device can be an independent device or it can be integrated into other devices.
  • the roadside device can be integrated into equipment such as smart gas stations, charging piles, smart lights, street lights, telephone poles, or traffic signs.
  • FIG. 1b is a schematic system architecture diagram of an exemplary vehicle 101.
  • the vehicle 101 includes multiple vehicle integration units (VIU), a telematics box (TBOX), a cockpit domain controller (CDC), a mobile data center (MDC), and a vehicle domain controller (VDC).
  • the vehicle 101 also includes various types of sensors arranged on the vehicle body, including lidar, millimeter-wave radar, ultrasonic radar, and cameras; there can be multiple sensors of each type. It should be understood that the sensor number and position layout in Figure 1b are schematic, and those skilled in the art can reasonably select the type, quantity, and layout of sensors as needed. Four VIUs are shown in Figure 1b; their number and locations are likewise only an example, and those skilled in the art can select an appropriate number and placement of VIUs based on actual needs.
  • the vehicle integration unit VIU provides multiple vehicle components with some or all of the data processing functions or control functions required by the vehicle components.
  • a VIU can have one or more of the following functions.
  • electronic control function: the VIU is used to realize the electronic control functions provided by the electronic control unit (ECU) inside some or all vehicle components, for example the control function required by a certain vehicle component or the data processing function required by a certain vehicle component.
  • gateway functions: the VIU can also have some or all of the same functions as a gateway, such as protocol conversion, protocol encapsulation and forwarding, and data format conversion.
  • data processing function: processing and computing data obtained from the actuators of multiple vehicle components.
  • the data involved in the above functions may include operating data of the actuators in the vehicle components, such as the motion parameters and working status of the actuators; the data may also include data collected by a data collection unit (for example, a sensing element) of a vehicle component, such as road information of the road the vehicle is traveling on or weather information collected by the vehicle's sensing elements, which is not specifically limited in the embodiments of the present application.
  • in this way, vehicles, robots, servers, etc. first perform coarse positioning on the target (such as an unmanned vehicle or a robot), then obtain the global pose of the first marker based on the image of the first marker and the coarse pose of the target, obtain the relative pose between the first marker and the target based on the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system, and finally obtain the global pose of the target from the global pose of the first marker and the relative pose between the first marker and the target.
  • with this method, the target is first roughly positioned and then precisely positioned based on the image of the first marker and the coarse pose, which helps to obtain an ultra-high-precision global pose estimate of the target.
  • in some embodiments, coarse positioning is based on lidar and fine positioning is based on vision cameras; the accuracy of lidar positioning is 5-10 cm, while visual fine positioning can achieve a positioning accuracy of about 1-2 cm.
  • the combination of lidar coarse positioning and visual fine positioning can meet customer needs and achieve high-precision, high-robustness, and cost-effective positioning of targets.
  • FIG. 2 is a schematic flowchart of a target positioning method provided by an embodiment of the present application.
  • this method can be applied to the aforementioned target positioning system, such as the target positioning system shown in Figure 1a.
  • the target positioning method shown in Figure 2 may include steps 201-205. It should be understood that, for convenience of description, this application is described through the sequence 201-205, and is not intended to limit the execution to the above sequence. The embodiments of the present application do not limit the execution sequence, execution time, number of executions, etc. of one or more of the above steps.
  • the execution subject of the embodiment of the present application may be a vehicle, for example a vehicle-mounted device (such as an in-vehicle computer), or may be a terminal device such as a mobile phone or a computer.
  • the target positioning method provided by this application can be executed locally, or it can be executed by the cloud by uploading the image of the target and the image of the marker to the cloud.
  • the cloud can be implemented by a server, which can be a virtual server, a physical server, etc., or it can also be other devices, which is not specifically limited in this solution.
  • the following description takes the example in which the execution subject of steps 201-205 of the target positioning method is a vehicle (for example, an unmanned vehicle). This application is also applicable to other execution subjects. Steps 201-205 are as follows:
  • the preset area can be, for example, the whole area of a substation, a certain park, a certain forest, a certain household, a certain road, etc.; this solution does not impose strict restrictions on the area.
  • the target can be, for example, a vehicle, a robot, etc., or it can also be other objects that can move autonomously or cannot move autonomously, etc. This solution does not strictly limit this.
  • This rough positioning can be understood as a rough estimate of the target's pose.
  • Pose estimation is to estimate the position and orientation of the target object in 6DoF.
  • position translation is described by variables on the three coordinate axes of X, Y, and Z
  • orientation is also described by the rotation amount of the three axes of X, Y, and Z.
  • for example, sensors such as the lidar and the inertial measurement unit (IMU) in the vehicle are used to roughly locate the target based on the HDL-localization algorithm to obtain the coarse pose of the target.
  • other algorithms can also be used, such as fast-lio-localization, Monte Carlo positioning and other algorithms. This solution does not impose strict restrictions on this.
  • the marker can be any object in the application scenario corresponding to the preset area, including but not limited to electrical boxes in a substation, telephone poles in a park, fir trees in a forest, coffee tables in a home, roadside devices along a road, etc.; this solution does not impose strict restrictions on this.
  • the marker can be an asymmetric object with a certain texture (for example, a complex texture).
  • for example, the proportion of the marker in the image is 1/10 to 1/2.
  • obtaining the global pose of the first marker based on the coarse pose of the target and the image of the first marker includes steps 2021-2023, specifically as follows:
  • the global point cloud map of the preset area may be obtained by collecting point cloud data of the preset area and establishing the global point cloud map from the point cloud data, for example through simultaneous localization and mapping (SLAM).
  • the approximate position of the vehicle or robot can be known based on the coarse pose.
  • for example, a threshold radius of 10 cm around the vehicle's approximate position is selected and superimposed on the global point cloud map to obtain a local map.
  • the semantic positioning global map includes the global poses of M markers, and the semantic positioning local map includes the global poses of N markers, where the N markers are markers among the M markers and M and N are both positive integers;
  • the semantic positioning global map can be obtained by distinguishing semantic elements corresponding to landmarks from semantic elements corresponding to non-markers.
  • the global pose of the marker is stored in the semantic positioning global map, which can be queried by subsequent algorithm modules to obtain the global pose of the marker.
  • the semantic positioning global map is a map of all elements containing the same semantics in the preset area.
  • for example, if the markers are tables, the semantic positioning global map can be understood as a semantic positioning global map of all tables.
  • the semantic localization local map is a partial map of the semantic localization global map.
  • the N landmarks in the above-mentioned semantic localization local map are the landmarks among the M landmarks in the semantic localization global map.
  • the local map and the semantic positioning global map are superimposed to obtain the semantic positioning local map; by intersecting the local map's range (±10 cm) with the semantic positioning global map, the markers and the target can be placed in one local map, so that subsequent target positioning is more accurate and more efficient. A sketch of this selection step follows.
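  • a minimal sketch of carving the semantic positioning local map out of the global one around the coarse pose, using a KD-tree radius query; the data layout and radius are illustrative assumptions:

```python
import numpy as np
from scipy.spatial import cKDTree

def semantic_local_map(marker_positions, marker_poses, coarse_xyz, radius):
    """Select the N of M markers whose stored global positions fall within
    `radius` of the target's coarse position.

    marker_positions: (M, 3) array of marker positions from the global map
    marker_poses:     list of M 4x4 global poses, in the same order
    """
    tree = cKDTree(marker_positions)
    idx = tree.query_ball_point(coarse_xyz, r=radius)
    return {i: marker_poses[i] for i in idx}  # the semantic positioning local map
```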
  • the semantic positioning global map can be obtained through the following steps A1-A2:
  • A1. Perform three-dimensional reconstruction of the M markers in the image according to the image of the first marker to obtain textured three-dimensional models of the M markers;
  • the image of the first marker may only include the first marker, or may include multiple markers with the same semantics as the first marker. That is, M can be 1 or an integer greater than 1.
  • a textured three-dimensional model of M markers including the first marker can be obtained.
  • for example, the BundleFusion algorithm is used to perform three-dimensional reconstruction, obtaining reconstructed marker models with real texture.
  • other methods can be used for 3D reconstruction, such as Kintinuous, ElasticFusion, etc. This solution does not impose strict restrictions on this.
  • the three-dimensional model of the landmark is registered to the global point cloud map through the point cloud registration method.
  • the point cloud registration method may be, for example, using the TEASER++ method.
  • the global pose of each landmark in the semantic localization global map is known.
  • the global pose of each landmark can be obtained by querying the database corresponding to the semantic positioning global map.
  • the marker can be an electrical box.
  • the semantic positioning global map of the markers can be obtained in advance and reused.
  • the above example only uses one method as an example. It can also be obtained through other methods. This solution does not strictly limit this.
  • a semantic positioning local map is first obtained to screen out a subset of the markers; the first marker is then identified in the semantic positioning local map based on the image of the first marker, and its global pose is obtained.
  • for example, if three markers appear in the camera image, the marker with the largest mask area is determined as the first marker, excluding the other two markers with smaller mask areas and leaving a unique marker; the global pose of the first marker is then obtained from the semantic positioning local map.
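  • a minimal sketch of this largest-mask selection rule (the mask source and shapes are assumptions, e.g. instance masks from a segmentation model):

```python
import numpy as np

def pick_first_marker(masks: np.ndarray) -> int:
    """masks: (K, H, W) boolean array, one instance mask per detected marker.
    Returns the index of the marker with the largest mask area."""
    areas = masks.reshape(masks.shape[0], -1).sum(axis=1)
    return int(np.argmax(areas))
```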
  • the vehicle or robot takes a picture with an on-board camera to obtain the image of the first marker; based on this image, the pose of the marker in the camera coordinate system can be obtained.
  • the pose of the marker in the camera coordinate system can be obtained based on deep learning method processing.
  • for example, the image of the first marker is input into a preset model for processing to obtain the pose of the first marker in the camera coordinate system, where the training data of the preset model is obtained by applying one or more of replacement, Gaussian blur, translation, cropping, contrast transformation, Gamma transformation, enlargement, and reduction to the background in initial training image data, and/or applying one or more of Gaussian blur, translation, cropping, contrast transformation, Gamma transformation, enlargement, and reduction to the markers in the initial training image data.
  • the preset model can be any image processing model, and this solution does not strictly limit it.
  • an initial model is trained based on multiple training image data, and the parameters of the initial model are iteratively adjusted based on a preset stopping condition until the stopping condition is reached, that is, a trained preset model is obtained.
  • the stopping condition can be that the number of training times reaches 100, or that the loss value is less than a preset threshold, etc.
  • in this way, the 6DoF pose of the marker relative to the camera is obtained.
  • the pose of the marker relative to the camera is converted into the pose of the marker relative to the vehicle or robot.
  • the object pose estimation algorithm may be, for example, the PVNet algorithm.
  • other algorithms can also be used, such as DenseFusion, etc. This solution does not impose strict restrictions on this.
  • the pose of the target in the camera coordinate system may be preset.
  • for example, the camera has a fixed pose relative to the center of the vehicle or robot (the camera extrinsics), which gives the pose of the target in the camera coordinate system.
  • the relative pose between the first marker and the target can be obtained through coordinate transformation.
  • the global pose of the vehicle or robot is obtained as the final precise 6DoF pose output; for example, the global pose of the marker is combined with the pose of the marker relative to the vehicle or robot through conventional coordinate transformation, thereby obtaining the global pose of the vehicle or robot.
  • other methods can also be used to process to obtain the global pose of the target, and this solution does not impose strict restrictions on this.
  • in this way, vehicles, robots, servers, etc. first perform coarse positioning on the target (such as an unmanned vehicle or a robot), then obtain the global pose of the first marker based on the image of the first marker and the coarse pose of the target, obtain the relative pose between the first marker and the target based on the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system, and finally obtain the global pose of the target from the global pose of the first marker and the relative pose between the first marker and the target. With this method, the target is first roughly positioned and then precisely positioned based on the image of the first marker and the coarse pose, which helps to obtain an ultra-high-precision global pose estimate of the target.
  • the above-mentioned rough positioning is based on lidar, and fine positioning is based on visual cameras.
  • This solution first performs coarse positioning based on lidar, and then performs fine positioning based on visual cameras.
  • the accuracy of lidar positioning is 5-10 cm, while visual fine positioning can achieve a positioning accuracy of about 1-2 cm.
  • the combination of lidar coarse positioning and visual fine positioning can meet customer needs and achieve high-precision, high-robustness, and cost-effective positioning of targets.
  • FIG. 3 is a schematic flow chart of another target positioning method provided by an embodiment of the present application.
  • this method can be applied to the aforementioned target positioning system, such as the target positioning system shown in Figure 1a.
  • the target positioning method shown in Figure 3 may include steps 301-308.
  • this application describes the sequence 301-308 for convenience of description, and is not intended to limit execution to the above sequence.
  • the embodiments of the present application do not limit the execution sequence, execution time, number of executions, etc. of one or more of the above steps.
  • the following description takes the execution subject of steps 301-308 of the target positioning method as a vehicle as an example. This application is also applicable to other execution subjects such as robots or servers. Steps 301-308 are as follows:
  • the image of the marker may be captured by a camera.
  • step 301 please refer to the records in the foregoing embodiments, and will not be described again here.
  • step 302 please refer to the records in the foregoing embodiments, and will not be described again here.
  • step 303 please refer to the records in the foregoing embodiments, and will not be described again here.
  • FIG. 4 is a schematic diagram of target positioning provided by an embodiment of the present application.
  • the figure shows that semantic localization mapping of landmarks can be performed in offline mode. That is, steps 301-303 may be performed in offline mode.
  • the offline mode runs asynchronously, and the online mode runs in real time. Offline mode will generally be run once before online mode is run.
  • steps 301-303 can be executed once and then directly used again.
  • after the semantic positioning global map is obtained, it can be reused for various other target positioning tasks in the preset area, such as positioning other vehicles, robots, or pedestrians; this solution does not impose strict restrictions on this.
  • this solution selects objects that are already present in the specific application scenario as positioning markers. Compared with manually arranging QR codes as in the prior art, this approach does not alter the specific application scenario, reduces labor costs, and can be widely used in the positioning field.
  • in addition, ultra-high-precision global poses of the markers can be obtained, which helps to improve the accuracy of the target's final precise pose.
  • FIG. 4 shows that steps 304-307 are performed in online mode, as detailed below:
  • the target is roughly positioned through the vehicle's lidar.
  • the introduction of this step please refer to the records in the foregoing embodiments, and will not be described again here.
  • Figure 4 shows the method of coarse positioning of targets based on Lidar and inertial measurement unit IMU.
  • the data obtained by lidar scanning are downsampled and the point cloud is de-distorted; the point cloud is then registered using the data obtained by the inertial measurement unit (IMU) together with the de-distorted point cloud; finally, the registered point cloud is matched against the global map and globally optimized to obtain the coarse pose of the target, as sketched below.
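  • a schematic sketch of this coarse positioning loop; the helper names stand in for the corresponding HDL-localization stages (an assumed structure, not the patented implementation), with open3d supplying the down-sampling and registration primitives:

```python
import numpy as np
import open3d as o3d

def coarse_pose(scan_points, global_map, T_prev, imu_delta, voxel=0.2):
    """One update of lidar coarse positioning.

    scan_points: (N, 3) de-distorted lidar points (motion compensation assumed done)
    global_map:  open3d point cloud of the pre-built global map
    T_prev:      previous 4x4 pose estimate
    imu_delta:   4x4 relative motion predicted from the IMU since the last scan
    """
    scan = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(scan_points))
    scan = scan.voxel_down_sample(voxel_size=voxel)  # downsample the raw scan

    # The IMU provides the initial guess; scan-to-map ICP refines it.
    T_init = T_prev @ imu_delta
    result = o3d.pipelines.registration.registration_icp(
        scan, global_map, 1.0, T_init,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation  # coarse pose (about 5-10 cm accuracy per the text)
```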
  • when the number of markers visible in the vision camera is greater than 1, a single marker is first determined, and the target is then positioned based on that marker.
  • the unique marker is then located in the semantic positioning local map to obtain the global pose of the marker.
  • the image of the first marker is input into a preset model for processing to obtain the pose of the first marker in the camera coordinate system, where the training data of the model is obtained by applying one or more of replacement, Gaussian blur, translation, cropping, contrast transformation, Gamma transformation, enlargement, and reduction to the background in initial training image data, and/or applying one or more of Gaussian blur, translation, cropping, contrast transformation, Gamma transformation, enlargement, and reduction to the markers in the initial training image data.
  • changing the angle and background from which the markers are observed allows the training data to be amplified.
  • for example, two-dimensional pictures of the markers are taken at 10° intervals horizontally and 10° intervals vertically; the background is removed from each picture and replaced with other backgrounds from the specific application scenario.
  • both the background and the markers can be subjected to data amplification operations such as Gaussian blur (Gaussian kernel size 1 to 5), translation (random shift of 1-30 pixels up, down, left, or right), cropping (random cropping of 1-30 pixels on each side), contrast transformation (random adjustment within ±20%), Gamma transformation (Gamma parameter 0.01-0.2), enlargement (random ratio 1-10%), and reduction (random ratio 1-10%). The model is then trained on the amplified data using a deep learning algorithm; a sketch of these operations follows.
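  • an illustrative sketch of these amplification operations using OpenCV and numpy with the parameter ranges quoted above; this is an assumed implementation, not the application's own code:

```python
import numpy as np
import cv2

rng = np.random.default_rng()

def augment(img: np.ndarray) -> np.ndarray:
    """Apply one random pass of the amplification operations described above."""
    h, w = img.shape[:2]
    # Gaussian blur with kernel size 1-5 (odd sizes, as OpenCV requires).
    k = int(rng.choice([1, 3, 5]))
    img = cv2.GaussianBlur(img, (k, k), 0)
    # Random translation of 1-30 pixels up/down/left/right.
    tx, ty = rng.integers(-30, 31, size=2)
    M = np.float32([[1, 0, tx], [0, 1, ty]])
    img = cv2.warpAffine(img, M, (w, h))
    # Random crop of 1-30 pixels per side, resized back to the original size.
    c = int(rng.integers(1, 31))
    img = cv2.resize(img[c:h - c, c:w - c], (w, h))
    # Contrast adjustment within +/-20%.
    alpha = 1.0 + rng.uniform(-0.2, 0.2)
    img = np.clip(img.astype(np.float32) * alpha, 0, 255).astype(np.uint8)
    # Gamma transformation (Gamma parameter range 0.01-0.2, as quoted above).
    gamma = rng.uniform(0.01, 0.2)
    img = (np.power(img / 255.0, gamma) * 255).astype(np.uint8)
    # Random enlargement or reduction by 1-10%.
    s = 1.0 + rng.choice([-1, 1]) * rng.uniform(0.01, 0.10)
    return cv2.resize(img, (int(w * s), int(h * s)))
```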
  • for example, the global pose of the first marker is combined with the pose of the first marker relative to the vehicle or robot through conventional coordinate conversion, thereby obtaining the global pose of the vehicle or robot.
  • other methods can be used to process and obtain the global pose of the target, and this solution does not impose strict restrictions on this.
  • This embodiment provides a new high-precision positioning method that integrates lidar and visual sensors.
  • a global map of semantic positioning is established offline.
  • when the online positioning mode is started, a robot or autonomous vehicle, for example, first performs coarse positioning based on lidar and then performs fine positioning based on vision cameras, which helps obtain an ultra-high-precision global pose estimate of the target.
  • the separation of offline modules and online modules can greatly reduce the computing power consumption of online modules, greatly reducing the hardware costs of vehicles or robots and greatly improving battery life.
  • the accuracy of lidar positioning is 5-10 cm, which is difficult to meet customers' high-precision positioning requirement of 1-2 cm, while visual fine positioning can achieve a positioning accuracy of about 1-2 cm; the combination of lidar coarse positioning and visual fine positioning can therefore meet customer needs.
  • this solution can be widely used in fields such as driverless vehicles and robot positioning, covering a large number of scenarios such as unmanned taxis, power inspection, park inspection, oil and gas inspection, geological exploration, logistics transportation, home services, and unmanned nucleic acid testing; it can also be applied in other fields or scenarios, which this solution does not limit.
  • the division of multiple units or modules is only a logical division based on functions and does not limit the specific structure of the device.
  • some of the functional modules may be subdivided into more small functional modules, and some of the functional modules may also be combined into one functional module.
  • some devices include a receiving unit and a transmitting unit.
  • the sending unit and the receiving unit can also be integrated into a communication unit, and the communication unit can realize the functions realized by the receiving unit and the sending unit.
  • each unit corresponds to its own program code (or program instruction).
  • when the program codes corresponding to these units run on a processor, the units execute the corresponding processes under the control of the processor to achieve the corresponding functions.
  • Embodiments of the present application also provide a device for implementing any of the above methods.
  • a target positioning device is provided that includes modules (or means) for implementing each step performed by a vehicle in any of the above methods.
  • FIG. 5 is a schematic structural diagram of a target positioning device provided by an embodiment of the present application.
  • the target positioning device is used to implement the aforementioned target positioning method, such as the target positioning method shown in Figures 2 and 3.
  • the device may include a coarse positioning module 501, a first processing module 502, a second processing module 503, a third processing module 504 and a positioning module 505, specifically as follows:
  • the rough positioning module 501 is used to roughly position the target in the preset area and obtain the rough pose of the target;
  • the first processing module 502 is used to obtain the image of the first marker in the preset area and obtain the global pose of the first marker according to the coarse pose of the target and the image of the first marker;
  • the second processing module 503 is used to obtain the pose of the first marker in the camera coordinate system according to the image of the first marker;
  • the third processing module 504 is used to obtain the pose of the target in the camera coordinate system and obtain the relative pose between the first marker and the target according to the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system;
  • the positioning module 505 is configured to obtain the global pose of the target based on the global pose of the first marker and the relative pose between the first marker and the target.
  • the first processing module 502 is used to:
  • a semantic positioning local map is obtained according to the local map of the target's location and the semantic positioning global map, where the semantic positioning global map includes the global poses of M markers, the semantic positioning local map includes the global poses of N markers, the N markers are markers among the M markers, and M and N are both positive integers;
  • the global pose of the first marker is then obtained from the semantic positioning local map according to the image of the first marker.
  • the device further includes a fourth processing module, used for:
  • the textured three-dimensional models of the M landmarks are registered to the global point cloud map to obtain the semantic positioning global map.
  • the second processing module 503 is also used to:
  • the image of the first marker is input into the preset model for processing to obtain the pose of the first marker in the camera coordinate system, where the training data of the preset model is obtained by applying one or more of replacement, Gaussian blur, translation, cropping, contrast transformation, Gamma transformation, enlargement, and reduction to the background in initial training image data, and/or applying one or more of Gaussian blur, translation, cropping, contrast transformation, Gamma transformation, enlargement, and reduction to the markers in the initial training image data.
  • the coarse positioning module 501, the first processing module 502, the second processing module 503, the third processing module 504 and the positioning module 505 can all be implemented by software, or can be implemented by hardware.
  • taking the coarse positioning module 501 as an example, its implementation is introduced next.
  • the implementation of the first processing module 502, the second processing module 503, the third processing module 504 and the positioning module 505 can refer to the implementation of the coarse positioning module 501.
  • the coarse positioning module 501 may include code running on a computing instance.
  • the computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Furthermore, the above computing instance may be one or more.
  • coarse positioning module 501 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code can be distributed in the same region or in different regions, and in the same availability zone (AZ) or in different AZs; each AZ includes one data center or multiple geographically close data centers, and a region usually includes multiple AZs.
  • the multiple hosts/VMs/containers used to run the code can be distributed in the same virtual private cloud (VPC), or across multiple VPCs.
  • communication between two VPCs in the same region, and cross-region communication between VPCs in different regions, requires a communication gateway in each VPC; interconnection between the VPCs is realized through these communication gateways.
  • the coarse positioning module 501 may include at least one computing device, such as a server.
  • the coarse positioning module 501 may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • the above-mentioned PLD can be a complex programmable logical device (CPLD), a field-programmable gate array (field-programmable gate array, FPGA), a general array logic (generic array logic, GAL), or any combination thereof.
  • Multiple computing devices included in the coarse positioning module 501 may be distributed in the same region or in different regions. Multiple computing devices included in the coarse positioning module 501 may be distributed in the same AZ or in different AZs. Similarly, multiple computing devices included in the coarse positioning module 501 may be distributed in the same VPC or in multiple VPCs.
  • the plurality of computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
  • the coarse positioning module 501 can be used to perform any step in the target positioning method, and the first processing module 502, the second processing module 503, the third processing module 504, and the positioning module 505 can likewise each be used to perform any step in the method; the steps that each module is responsible for can be specified as needed, with each module implementing different steps of the target positioning method so as to realize the full functionality of the target positioning device.
  • each module in each device above is only a division of logical functions. In actual implementation, it can be fully or partially integrated into a physical entity, or it can also be physically separated.
  • the modules in the target positioning device can be implemented in the form of a processor calling software; for example, the target positioning device includes a processor connected to a memory in which instructions are stored, and the processor calls the instructions stored in the memory to implement any of the above methods or the functions of each module of the device, where the processor is, for example, a general-purpose processor such as a central processing unit (CPU) or a microprocessor, and the memory is a memory inside the device or a memory outside the device.
  • alternatively, the modules in the device can be implemented in the form of hardware circuits, with some or all module functions realized through the design of the hardware circuits, which can be understood as one or more processors. For example, in one implementation, the hardware circuit is an application-specific integrated circuit (ASIC), and the functions of some or all of the above modules are realized through the design of the logical relationships of the elements in the circuit; in another implementation, the hardware circuit can be realized by a programmable logic device (PLD); taking a field-programmable gate array (FPGA) as an example, it can include a large number of logic gate circuits whose connection relationships are configured through a configuration file, thereby realizing the functions of some or all of the above modules. All modules of the above device may be implemented entirely by the processor calling software, entirely by hardware circuits, or partly by the processor calling software with the remainder implemented by hardware circuits.
  • computing device 600 includes: bus 602, processor 604, memory 606, and communication interface 608.
  • the processor 604, the memory 606 and the communication interface 608 communicate through the bus 602.
  • Computing device 600 may be a server or a terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 600.
  • the bus 602 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one line is used in Figure 6, but this does not mean there is only one bus or only one type of bus.
  • Bus 602 may include a path that carries information between various components of computing device 600 (eg, memory 606, processor 604, communications interface 608).
  • the processor 604 may include a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP) or a digital signal processor (DSP). any one or more of them.
  • Memory 606 may include volatile memory, such as random access memory (RAM).
  • the memory 606 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • the memory 606 stores executable program code, and the processor 604 executes the executable program code to implement the functions of the aforementioned coarse positioning module 501, first processing module 502, second processing module 503, third processing module 504, and positioning module 505, respectively, thereby implementing the target positioning method. That is, the memory 606 stores instructions for executing the target positioning method.
  • the communication interface 608 uses a transceiver module such as, but not limited to, a network interface card or a transceiver to implement communication between the computing device 600 and other devices or communication networks.
  • although the computing device 600 shown in FIG. 6 only shows the bus 602, the processor 604, the memory 606, and the communication interface 608, in a specific implementation, those skilled in the art will understand that the computing device 600 also includes other devices necessary for normal operation. Meanwhile, depending on specific needs, those skilled in the art should understand that the computing device 600 may also include hardware devices that implement other additional functions. In addition, those skilled in the art should understand that the computing device 600 may include only the components necessary to implement the embodiments of the present application, and need not include all the components shown in FIG. 6.
  • An embodiment of the present application also provides a computing device cluster.
  • the computing device cluster includes at least one computing device.
  • the computing device may be a server, such as a central server, an edge server, or a local server in a local data center.
  • the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.
  • the computing device cluster includes at least one computing device 600.
  • the same instructions for performing the target positioning method may be stored in the memory 606 of one or more computing devices 600 in the computing device cluster.
  • the memory 606 of one or more computing devices 600 in the computing device cluster may also store part of the instructions for executing the target positioning method.
  • a combination of one or more computing devices 600 may collectively execute instructions for performing a target location method.
  • the memory 606 in different computing devices 600 in the computing device cluster can store different instructions, respectively used to execute part of the functions of the target positioning device. That is, the instructions stored in the memories 606 of different computing devices 600 may implement the functions of one or more of the coarse positioning module 501, the first processing module 502, the second processing module 503, the third processing module 504, and the positioning module 505.
  • one or more computing devices in a cluster of computing devices may be connected through a network.
  • the network may be a wide area network or a local area network, etc.
  • the connection between two computing devices is via a network.
  • the connection to the network is made through a communication interface in each computing device.
  • instructions for performing the functions of the coarse positioning module 501 are stored in the memory of the first computing device.
  • instructions for executing the functions of the first processing module 502 , the second processing module 503 , the third processing module 504 and the positioning module 505 are stored in the memory of the second computing device.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores instructions which, when run on a computer or processor, cause the computer or processor to execute one or more steps of any of the above methods.
  • An embodiment of the present application also provides a computer program product containing instructions.
  • when the computer program product runs on a computer or processor, the computer or processor is caused to perform one or more steps of any of the above methods.
  • A/B can mean A or B, where A and B can be singular or plural.
  • "a plurality of" means two or more.
  • "at least one of the following" or similar expressions refers to any combination of these items, including any combination of a single item or plural items.
  • at least one of a, b, or c can mean: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can each be single or multiple.
  • words such as "first" and "second" are used to distinguish identical or similar items with basically the same functions and effects. Those skilled in the art can understand that such words do not limit the number or execution order, nor do they imply that the items are necessarily different.
  • words such as "exemplary" or "for example" are used to mean serving as an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application should not be construed as more preferred or advantageous than other embodiments or designs. Rather, the use of such words is intended to present relevant concepts in a concrete manner to facilitate understanding.
  • a unit described as a separate component may or may not be physically separate.
  • a component shown as a unit may or may not be a physical unit, that is, it may be located in one place, or it may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in or transmitted over a computer-readable storage medium.
  • the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media.
  • the available medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic medium such as a floppy disk, hard disk, tape, or magnetic disk, an optical medium such as a digital versatile disc (DVD), or a semiconductor medium such as a solid state drive (SSD), etc.


Abstract

Embodiments of the present application provide a target positioning method, a related system, and a storage medium. The method includes: performing coarse positioning on a target in a preset area to obtain a coarse pose of the target; acquiring an image of a first marker in the preset area, and obtaining a global pose of the first marker according to the coarse pose of the target and the image of the first marker; obtaining a relative pose between the first marker and the target according to the image of the first marker and the pose of the target in the camera coordinate system; and obtaining a global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target. This approach helps to obtain an ultra-high-precision estimate of the target's global pose.

Description

Target positioning method, related system, and storage medium
This application claims priority to Chinese patent application No. 202210980993.8, entitled "Target positioning method, related system, and storage medium", filed with the China National Intellectual Property Administration on August 16, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of positioning technology, and in particular to a target positioning method, a related system, and a storage medium.
Background
With the continuous development of technology, autonomous driving and mobile robots have a large market scale. In mobile robots or autonomous driving, positioning is a very important and fundamental task, but achieving a high-precision, highly robust, and cost-effective positioning method is very challenging. A single expensive sensor, such as a high-grade inertial measurement unit (IMU), can solve the positioning problem well; however, since most scenarios are averse to high cost, solutions that rely solely on expensive high-precision sensors have limited applicability. Low-cost multi-sensor fusion is therefore a more feasible solution that covers more scenarios. Sensors such as lidar, vision cameras, IMUs, wheel odometers, radar, and the Global Positioning System (GPS) each have their own advantages and disadvantages and can be combined in many ways. Common Lidar+IMU fusion algorithms struggle with loop closure detection and relocalization; common Camera+IMU fusion algorithms struggle with accurate depth estimation; common Lidar+Camera+IMU algorithms do not achieve true tight multi-sensor coupling; and adding a radar sensor to a conventional Lidar+Camera+IMU combination to cope with rain, snow, and fog is costly. Specific scenarios may disable certain sensors: for example, strong electromagnetic interference in substations makes the positioning error of GPS or real-time kinematic (RTK) carrier-phase differential positioning extremely large; bumpy roads may cause wheel odometers to slip and wear, so the accumulated error of long-duration robot inspection grows; open scenes return few echoes, so lidar may fail to detect valid feature points; and in rain, fog, or snow, sensors such as lidar and cameras degrade severely, so the robot cannot determine its accurate pose. Therefore, developing a high-precision, highly robust, and cost-effective multi-sensor fusion positioning algorithm is a huge challenge for the entire industry.
Summary
The present application discloses a target positioning method, a related system, and a storage medium, which can achieve high-precision, highly robust, and cost-effective positioning of a target.
In a first aspect, an embodiment of the present application provides a target positioning method, including:
performing coarse positioning on a target in a preset area to obtain a coarse pose of the target;
acquiring an image of a first marker in the preset area, and obtaining a global pose of the first marker according to the coarse pose of the target and the image of the first marker;
obtaining a pose of the first marker in a camera coordinate system according to the image of the first marker;
acquiring a pose of the target in the camera coordinate system, and obtaining a relative pose between the first marker and the target according to the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system;
obtaining a global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target.
In this embodiment of the present application, a vehicle, robot, server, or the like first performs coarse positioning on the target (such as an unmanned vehicle or robot), then obtains the global pose of the first marker based on the image of the first marker and the coarse pose of the target, obtains the relative pose between the first marker and the target according to the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system, and finally obtains the global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target. With this approach, the target is first coarsely positioned and then finely positioned using the image of the first marker together with the coarse pose, which helps to obtain an ultra-high-precision estimate of the target's global pose.
For example, the coarse positioning is performed based on lidar, and the fine positioning is performed based on a vision camera. In this solution, lidar-based coarse positioning is performed first, followed by camera-based fine positioning; lidar positioning has an accuracy of 5-10 cm, while visual fine positioning can achieve a positioning accuracy of about 1-2 cm. The combination of lidar coarse positioning and visual fine positioning can meet customer requirements and achieve high-precision, highly robust, and cost-effective positioning of the target.
In a possible implementation, obtaining the global pose of the first marker according to the coarse pose of the target and the image of the first marker includes:
obtaining a local map of the location of the target according to the coarse pose of the target and a global point cloud map of the preset area;
obtaining a semantic positioning local map according to the local map of the location of the target and a semantic positioning global map, where the semantic positioning global map includes the global poses of M markers, the semantic positioning local map includes the global poses of N markers, the N markers are markers among the M markers, and M and N are both positive integers;
obtaining the global pose of the first marker from the semantic positioning local map according to the image of the first marker.
By first using lidar for coarse positioning, the minimum operating requirement of visual fine positioning — that the marker occupy more than 1/10 of the frame — can be satisfied; if only the visual fine positioning module were used, this minimum start-up requirement could not be met. In this solution, lidar-based coarse positioning is performed first, followed by camera-based fine positioning; lidar positioning has an accuracy of 5-10 cm, which hardly meets customers' 1-2 cm high-precision positioning requirements, while visual fine positioning can achieve a positioning accuracy of about 1-2 cm. The combination of lidar coarse positioning and visual fine positioning can meet customer requirements and achieve high-precision, highly robust, and cost-effective positioning of the target.
On the other hand, the global pose of the marker is looked up in the semantic positioning local map; since there may be many markers, further narrowing the search range makes the estimate of the marker's global pose more accurate and improves the efficiency of target positioning.
In a possible implementation, the method further includes:
performing three-dimensional reconstruction on M markers in the image according to the image of the first marker to obtain textured three-dimensional models of the M markers;
registering the textured three-dimensional models of the M markers into the global point cloud map to obtain the semantic positioning global map.
The semantic positioning global map may be built in an offline mode, and the target is then positioned in an online mode. Separating the offline module from the online module greatly reduces the computing power consumed by the online module, which greatly lowers the hardware cost and greatly extends the battery life of, for example, a vehicle or robot.
In a possible implementation, obtaining the pose of the first marker in the camera coordinate system according to the image of the first marker includes:
inputting the image of the first marker into a preset model for processing to obtain the pose of the first marker in the camera coordinate system, where the training data of the preset model is obtained by applying one or more of background replacement, Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement, and reduction to the background in initial training image data, and/or applying one or more of Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement, and reduction to the markers in the initial training image data.
Training the model with augmented training data makes the obtained relative pose between the marker and the target more accurate and more robust.
In a second aspect, an embodiment of the present application provides a target positioning apparatus, including:
a coarse positioning module, configured to perform coarse positioning on a target in a preset area to obtain a coarse pose of the target;
a first processing module, configured to acquire an image of a first marker in the preset area, and obtain a global pose of the first marker according to the coarse pose of the target and the image of the first marker;
a second processing module, configured to obtain a pose of the first marker in a camera coordinate system according to the image of the first marker;
a third processing module, configured to acquire a pose of the target in the camera coordinate system, and obtain a relative pose between the first marker and the target according to the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system;
a positioning module, configured to obtain a global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target.
In this embodiment of the present application, a vehicle, robot, server, or the like first performs coarse positioning on the target (such as an unmanned vehicle or robot), then obtains the global pose of the first marker based on the image of the first marker and the coarse pose of the target, obtains the relative pose between the first marker and the target according to the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system, and finally obtains the global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target. With this approach, the target is first coarsely positioned and then finely positioned using the image of the first marker together with the coarse pose, which helps to obtain an ultra-high-precision estimate of the target's global pose.
For example, the coarse positioning is performed based on lidar, and the fine positioning is performed based on a vision camera. In this solution, lidar-based coarse positioning is performed first, followed by camera-based fine positioning; lidar positioning has an accuracy of 5-10 cm, while visual fine positioning can achieve a positioning accuracy of about 1-2 cm. The combination of lidar coarse positioning and visual fine positioning can meet customer requirements and achieve high-precision, highly robust, and cost-effective positioning of the target.
In a possible implementation, the first processing module is configured to:
obtain a local map of the location of the target according to the coarse pose of the target and a global point cloud map of the preset area;
obtain a semantic positioning local map according to the local map of the location of the target and a semantic positioning global map, where the semantic positioning global map includes the global poses of M markers, the semantic positioning local map includes the global poses of N markers, the N markers are markers among the M markers, and M and N are both positive integers;
obtain the global pose of the first marker from the semantic positioning local map according to the image of the first marker.
In a possible implementation, the apparatus further includes a fourth processing module, configured to:
perform three-dimensional reconstruction on the M markers in the image according to the image of the first marker to obtain textured three-dimensional models of the M markers;
register the textured three-dimensional models of the M markers into the global point cloud map to obtain the semantic positioning global map.
In a possible implementation, the second processing module is further configured to:
input the image of the first marker into a preset model for processing to obtain the pose of the first marker in the camera coordinate system, where the training data of the preset model is obtained by applying one or more of background replacement, Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement, and reduction to the background in initial training image data, and/or applying one or more of Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement, and reduction to the markers in the initial training image data.
In a third aspect, the present application provides a computing device cluster, including at least one computing device, each computing device including a processor and a memory; the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method provided in any possible implementation of the first aspect.
In a fourth aspect, the present application provides a computer storage medium, including computer instructions which, when run on an electronic device, cause the electronic device to perform the method provided in any possible implementation of the first aspect.
In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a computer, causes the computer to perform the method provided in any possible implementation of the first aspect.
It can be understood that the apparatus of the second aspect, the computing device cluster of the third aspect, the computer storage medium of the fourth aspect, and the computer program product of the fifth aspect provided above are all used to perform any method provided in the first aspect. Therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding method, which are not repeated here.
Brief Description of the Drawings
The drawings used in the embodiments of the present application are introduced below.
FIG. 1a is a schematic architectural diagram of a target positioning system according to an embodiment of the present application;
FIG. 1b is a schematic diagram of a system architecture of a vehicle according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a target positioning method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of another target positioning method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of target positioning according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a target positioning apparatus according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a hardware structure of a computing device according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a hardware structure of a computing device cluster according to an embodiment of the present application.
Detailed Description
The embodiments of the present application are described below with reference to the drawings. The terms used in the implementation part of the embodiments are only used to explain specific embodiments of the present application and are not intended to limit the present application.
For ease of understanding, explanations of some concepts related to the embodiments of the present application are given below by way of example for reference, as follows:
1. Deep learning: a class of machine learning techniques based on deep neural network algorithms, whose main feature is the use of multiple nonlinear transformations to process and analyze data. It is mainly applied to perception, decision-making, and other scenarios in the field of artificial intelligence, such as image and speech recognition, natural language translation, and computer game playing.
2. Lidar-vision fused localization: in mobile robots or autonomous driving, knowing the robot's position is very important; this is localization. However, the accuracy of localization based on a single lidar sensor or a single vision camera is often insufficient. A localization method that fuses lidar and a vision camera (possibly also including a wheel odometer and an inertial measurement unit) is called lidar-vision fused localization.
3. Object pose estimation: estimating the position and orientation of a target object (six degrees of freedom (DoF); a 6DoF pose includes 3D position and 3D spatial orientation). In the general three-dimensional case, translation is described by variables along the X, Y, and Z axes, and orientation is likewise described by rotations about the X, Y, and Z axes.
4. Inertial measurement unit: an inertial measurement unit is a device that measures the three-axis attitude angles (or angular rates) and acceleration of an object. Generally, an IMU contains three single-axis accelerometers and three single-axis gyroscopes. The accelerometers detect acceleration signals of the object along the three independent axes of the body coordinate system, while the gyroscopes detect angular velocity signals of the body relative to the navigation coordinate system. By measuring the angular velocity and acceleration of the object in three-dimensional space, the pose of the object can be computed.
The above exemplary explanations of concepts may be applied in the embodiments below.
Since multi-sensor fusion in the prior art cannot achieve high-precision, highly robust, and cost-effective fused positioning, the present application provides a target positioning method, a related system, and a storage medium that can achieve high-precision, highly robust, and cost-effective positioning of a target.
The system architecture of the embodiments of the present application is described in detail below with reference to the drawings. Referring to FIG. 1a, FIG. 1a is a schematic diagram of a target positioning system to which an embodiment of the present application is applicable. The system includes a vehicle 101 and a server 102.
The vehicle 101 in this embodiment of the present application is a device that moves under power drive. The vehicle 101 is a device with communication and computing capabilities that can provide mobility services for users. The vehicle 101 typically includes multiple subsystems, such as, but not limited to, a traveling system, a sensor system, a control system, one or more peripheral devices, a power supply, or a user interface. Optionally, the vehicle 101 may include more or fewer subsystems, and each subsystem may include multiple elements. In addition, each subsystem and element of the vehicle 101 may be interconnected by wire or wirelessly.
It should be noted that the vehicle 101 in this embodiment may be an automobile or electric vehicle, a rail vehicle, an intelligent vehicle (for example, an unmanned vehicle), an intelligent mobile robot, or the like.
An intelligent vehicle supports sensing the road environment through an on-board sensing system, automatically planning a driving route, and controlling the vehicle to reach a predetermined target position. Intelligent vehicles make intensive use of technologies such as computers, sensing, information fusion, communications, artificial intelligence, and automatic control; they are high-tech complexes integrating environment perception, planning and decision-making, and multi-level assisted driving. For example, an intelligent vehicle may be a car with an assisted driving system or a fully automated driving system, or a wheeled mobile robot.
The server 102 is a device with centralized computing capability. For example, the server 102 may be implemented by a server, a virtual machine, a cloud, a roadside device, a robot, or other devices.
When the server 102 includes a server, the server type includes but is not limited to a general-purpose computer, a dedicated server computer, a blade server, etc. The present application does not strictly limit the number of servers included in the server 102; there may be one or multiple (for example, a server cluster).
A virtual machine is a computing module simulated by software that has the functions of a complete hardware system and runs in a completely isolated environment. Of course, besides virtual machines, the server 102 may also be implemented by other computing instances, such as containers.
A cloud is a software platform using application virtualization technology that allows one or more pieces of software or applications to be developed and run in an independent virtualized environment. Optionally, when the server 102 is implemented by a cloud, the cloud may be deployed on a public cloud, a private cloud, or a hybrid cloud.
A roadside device is a device arranged at the side of a road (or at an intersection, etc.). The road may be an outdoor road (such as a main road, side road, elevated road, or temporary road) or an indoor road (such as a road in an indoor parking lot). Roadside devices can provide services for vehicles. It should be noted that a roadside device may be an independent device or may be integrated in other devices. For example, a roadside device may be integrated in a smart gas station, charging pile, smart traffic light, street lamp, utility pole, traffic sign, or other equipment.
FIG. 1b is a schematic diagram of an exemplary system architecture of the vehicle 101. The vehicle 101 includes multiple vehicle integration units (VIUs), a telematics box (TBOX), a cockpit domain controller (CDC), a mobile data center (MDC), and a vehicle domain controller (VDC).
The vehicle 101 also includes multiple types of sensors arranged on the body, including lidar, millimeter-wave radar, ultrasonic radar, and camera devices. There may be multiple sensors of each type. It should be understood that the number and layout of the sensors in FIG. 1b are only illustrative, and those skilled in the art can reasonably select the type, number, and layout of the sensors as needed. Four VIUs are shown in FIG. 1b. It should be understood that the number and positions of the VIUs in FIG. 1b are only an example; those skilled in the art can select a suitable number and positions of VIUs according to actual requirements.
The vehicle integration unit VIU provides multiple vehicle components with part or all of the data processing or control functions they require. A VIU may have one or more of the following functions.
1. Electronic control functions: the VIU implements part or all of the electronic control functions provided by the electronic control units (ECUs) inside vehicle components, for example, the control functions required by a certain vehicle component, or the data processing functions required by a certain vehicle component.
2. The same functions as a gateway: the VIU may also have part or all of the same functions as a gateway, such as protocol conversion, protocol encapsulation and forwarding, and data format conversion.
3. Data processing functions: processing and computing data obtained from the actuators of multiple vehicle components.
It should be noted that the data involved in the above functions may include operating data of the actuators in vehicle components, such as the motion parameters and working states of the actuators. The data may also be data collected by the data acquisition units (for example, sensing elements) of vehicle components, such as road information of the road on which the vehicle is traveling or weather information collected by the vehicle's sensing elements; this is not specifically limited in the embodiments of the present application.
In this embodiment of the present application, a vehicle, robot, server, or the like first performs coarse positioning on the target (such as an unmanned vehicle or robot), then obtains the global pose of the first marker based on the image of the first marker and the coarse pose of the target, obtains the relative pose between the first marker and the target according to the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system, and finally obtains the global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target. With this approach, the target is first coarsely positioned and then finely positioned using the image of the first marker together with the coarse pose, which helps to obtain an ultra-high-precision estimate of the target's global pose.
For example, the coarse positioning is performed based on lidar, and the fine positioning is performed based on a vision camera. By first performing lidar-based coarse positioning and then camera-based fine positioning, where lidar positioning has an accuracy of 5-10 cm and visual fine positioning can achieve a positioning accuracy of about 1-2 cm, the combination of lidar coarse positioning and visual fine positioning can meet customer requirements and achieve high-precision, highly robust, and cost-effective positioning of the target.
The architecture of the embodiments of the present application has been described above; the method of the embodiments is described in detail below.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a target positioning method according to an embodiment of the present application. Optionally, the method may be applied to the aforementioned target positioning system, such as the target positioning system shown in FIG. 1a. The target positioning method shown in FIG. 2 may include steps 201-205. It should be understood that, for convenience of description, this application describes the steps in the order 201-205, which is not intended to limit execution to this order. The embodiments of the present application do not limit the execution order, execution timing, or number of executions of one or more of the above steps. The execution subject of the embodiments may be a vehicle, for example, executed by an on-board device (such as a head unit), or may be a terminal device such as a mobile phone or computer. It should be noted that the target positioning method provided in this application may be executed locally, or may be executed in the cloud by uploading the image of the target and the image of the marker to the cloud. The cloud may be implemented by a server, which may be a virtual server, a physical server, etc., or another device; this solution does not strictly limit this. The following description takes a vehicle (for example, an unmanned vehicle) as the execution subject of steps 201-205 of the target positioning method; the application is equally applicable to other execution subjects. Steps 201-205 are as follows:
201. Perform coarse positioning on a target in a preset area to obtain a coarse pose of the target.
The preset area may be, for example, the entire area of a substation, or a campus, a forest, a household, a road, etc.; this solution does not strictly limit the area.
The target may be, for example, a vehicle or a robot, or another object that can or cannot move autonomously; this solution does not strictly limit this.
Coarse positioning can be understood as a rough estimate of the pose of the target.
The pose can be understood as position and attitude (position & pose), where position is a three-degree-of-freedom translation and attitude is a three-degree-of-freedom rotation. Pose estimation is the estimation of the 6DoF position and orientation of the target object. In the general three-dimensional case, translation is described by variables along the X, Y, and Z axes, and orientation is likewise described by rotations about the X, Y, and Z axes.
In a possible implementation, sensors such as the lidar and the inertial measurement unit IMU in the vehicle perform coarse positioning of the target based on the HDL-localization algorithm to obtain the coarse pose of the target. Of course, other algorithms such as fast-lio-localization or Monte Carlo localization may also be used; this solution does not strictly limit this.
202. Acquire an image of a first marker in the preset area, and obtain a global pose of the first marker according to the coarse pose of the target and the image of the first marker.
The marker may be any object in the application scenario corresponding to the preset area, including but not limited to an electrical cabinet in a substation, a utility pole in a campus, a fir tree in a forest, a tea table in a home, a roadside device on a road, etc.; this solution does not strictly limit this.
Optionally, the marker may be an asymmetric object with some texture (for example, complex texture). For example, when the target such as a vehicle reaches the coarse positioning range, the marker occupies about 1/10 to 1/2 of the frame.
In a possible implementation, obtaining the global pose of the first marker according to the coarse pose of the target and the image of the first marker includes steps 2021-2023, as follows:
2021. Obtain a local map of the location of the target according to the coarse pose of the target and a global point cloud map of the preset area.
The global point cloud map of the preset area may be built by acquiring point cloud data of the preset area and building the global point cloud map from the point cloud data.
Optionally, the global point cloud map is built using a simultaneous localization and mapping (SLAM) method. For example, by running a SLAM program, the point cloud data obtained by lidar scanning is processed, and the global point cloud map is built based on the LIO-SAM mapping algorithm.
Of course, other mapping algorithms such as LOAM, Lego-LOAM, and FAST-LIO may also be used; this solution does not limit this.
Based on the coarse pose, the approximate position of the vehicle or robot is known; for example, a region of selected radius around the approximate position of the vehicle is overlaid with the global point cloud map to obtain the local map.
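As a minimal sketch of this cropping step (all function names and the radius value here are illustrative assumptions; the global map is assumed to be available as an N×3 NumPy array of points):

    import numpy as np

    def crop_local_map(global_map_xyz: np.ndarray,
                       coarse_position: np.ndarray,
                       radius: float = 10.0) -> np.ndarray:
        """Return the points of the global point cloud map lying within
        `radius` meters of the coarse position (a simple local-map crop)."""
        # Euclidean distance of every map point to the coarse position
        dists = np.linalg.norm(global_map_xyz - coarse_position[None, :], axis=1)
        return global_map_xyz[dists < radius]

    # Usage: local_map = crop_local_map(global_map, np.array([24.0, 6.0, 0.0]))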
2022. Obtain a semantic positioning local map according to the local map of the location of the target and a semantic positioning global map, where the semantic positioning global map includes the global poses of M markers, the semantic positioning local map includes the global poses of N markers, the N markers are markers among the M markers, and M and N are both positive integers.
The semantic positioning global map may be obtained by distinguishing semantic elements corresponding to markers from semantic elements corresponding to non-markers. The global poses of the markers are stored in the semantic positioning global map, which subsequent algorithm modules can query to obtain the global pose of a marker.
It should be noted that the semantic positioning global map is a map of all elements in the preset area that share the same semantics. For example, when the marker is a table, the semantic positioning global map can be understood as the semantic positioning global map of all tables.
Correspondingly, the semantic positioning local map is a partial map of the semantic positioning global map. The N markers in the semantic positioning local map are markers among the M markers in the semantic positioning global map.
For example, the local map and the semantic positioning global map are overlaid to obtain the semantic positioning local map: the map extent of the local map, expanded by ±10 cm, is overlaid with the semantic positioning global map. In this way the markers and the target are placed in one local map, and processing within the local map makes subsequent target positioning more accurate and more efficient.
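A hedged sketch of this overlay, assuming the semantic global map is simply a dictionary from marker identifiers to global poses (the names and the dictionary representation are illustrative, not from the original disclosure):

    import numpy as np

    def markers_in_local_map(marker_poses: dict, local_map_xyz: np.ndarray,
                             margin: float = 0.1) -> dict:
        """Keep only markers whose global position falls inside the local
        map's bounding box expanded by `margin` meters (±10 cm above),
        yielding the semantic positioning local map."""
        lo = local_map_xyz.min(axis=0) - margin
        hi = local_map_xyz.max(axis=0) + margin
        return {name: pose for name, pose in marker_poses.items()
                if np.all(pose[:3] >= lo) and np.all(pose[:3] <= hi)}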
In a possible implementation, the semantic positioning global map can be obtained through the following steps A1-A2:
A1. Perform three-dimensional reconstruction on M markers in the image according to the image of the first marker to obtain textured three-dimensional models of the M markers.
It can be understood that the image of the first marker may include only the first marker, or may include multiple markers with the same semantics as the first marker. That is, M may be 1 or an integer greater than 1.
Based on the image, textured three-dimensional models of the M markers, including the first marker, can be obtained.
Optionally, images of the markers taken from various angles are acquired, for example, with a depth camera, and the BundleFusion algorithm is then used for three-dimensional reconstruction to obtain reconstructed marker models with real texture. Of course, other methods such as Kintinuous and ElasticFusion may also be used for three-dimensional reconstruction; this solution does not strictly limit this.
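BundleFusion itself is a C++ real-time system; as an illustrative stand-in for the depth-fusion part of this step, a TSDF integration sketch with Open3D (assuming posed RGB-D frames and a pinhole intrinsic are given; all parameter values are assumptions) could look like:

    import numpy as np
    import open3d as o3d

    def fuse_marker_model(frames, intrinsic):
        """Fuse posed RGB-D frames of a marker into a textured mesh via TSDF
        integration. `frames` yields (color, depth, T_world_cam), where
        color/depth are open3d.geometry.Image and T_world_cam is a 4x4
        camera-to-world pose (all assumed to be available)."""
        volume = o3d.pipelines.integration.ScalableTSDFVolume(
            voxel_length=0.004, sdf_trunc=0.02,
            color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)
        for color, depth, T_world_cam in frames:
            rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
                color, depth, depth_trunc=3.0, convert_rgb_to_intensity=False)
            # integrate() expects the world-to-camera extrinsic
            volume.integrate(rgbd, intrinsic, np.linalg.inv(T_world_cam))
        return volume.extract_triangle_mesh()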
A2. Register the textured three-dimensional models of the M markers into the global point cloud map to obtain the semantic positioning global map.
Based on the obtained three-dimensional models of the markers and the above global point cloud map, the three-dimensional models of the markers are registered into the global point cloud map by a point cloud registration method. Optionally, the point cloud registration method may be, for example, TEASER++.
Of course, other algorithms such as the iterative closest point (ICP) algorithm may also be used; this solution does not strictly limit this.
Through point cloud registration, the global pose of each marker in the semantic positioning global map is known. Optionally, the global pose of each marker can be obtained by querying the database corresponding to the semantic positioning global map.
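A minimal sketch of this registration step, using the ICP alternative named above via Open3D rather than TEASER++ (function name, distance threshold, and the need for a rough initial guess are assumptions):

    import numpy as np
    import open3d as o3d

    def register_marker(marker_pcd: o3d.geometry.PointCloud,
                        global_map_pcd: o3d.geometry.PointCloud,
                        T_init: np.ndarray = np.eye(4)) -> np.ndarray:
        """Register a reconstructed marker model into the global point cloud
        map and return its global pose as a 4x4 transform."""
        result = o3d.pipelines.registration.registration_icp(
            marker_pcd, global_map_pcd,
            max_correspondence_distance=0.5, init=T_init,
            estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
        return result.transformation  # marker pose in the map (world) frame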
Example: the marker may be an electrical cabinet. For instance, the first pillar-shaped object in the semantic positioning global map corresponds to an electrical cabinet in the actual substation scene, and its global pose includes: 3D position X=24, Y=6, Z=0; 3D spatial orientation pitch=0°, yaw=90°, roll=0°.
It should be noted that the semantic positioning global map of the markers may be obtained in advance and may be reused. The above example is only one way of obtaining it; it may also be obtained in other ways, which this solution does not strictly limit.
2023. Obtain the global pose of the first marker from the semantic positioning local map according to the image of the first marker.
Since there may be multiple markers in the semantic positioning global map, in order to accurately obtain the first marker, the semantic positioning local map is obtained first to filter out some markers; then the first marker is determined from the semantic positioning local map according to the image of the first marker, and the global pose of the first marker is obtained.
For example, if there are 3 markers in the frame captured by the vision camera, the marker with the largest mask area is determined as the first marker, thereby excluding the other two markers with smaller mask areas and leaving a unique marker; the global pose of the first marker is then obtained based on the semantic positioning local map.
Of course, other methods may also be used to determine the unique marker; this solution does not strictly limit this.
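A small sketch of the largest-mask-area rule above, assuming instance segmentation masks are already available from a detector (the function name and mask representation are illustrative):

    import numpy as np

    def pick_largest_mask(masks: list[np.ndarray]) -> int:
        """Given instance masks (H×W boolean arrays), return the index of
        the marker with the largest mask area (pixel count)."""
        areas = [int(mask.sum()) for mask in masks]
        return int(np.argmax(areas))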
203. Obtain the pose of the first marker in the camera coordinate system according to the image of the first marker.
Optionally, a camera on the vehicle or robot captures the image of the first marker. Based on the image, the pose of the marker in the camera coordinate system can be obtained.
In a possible implementation, the pose of the marker in the camera coordinate system is obtained by processing based on a deep learning method. For example, the image of the first marker is input into a preset model for processing to obtain the pose of the first marker in the camera coordinate system, where the training data of the preset model is obtained by applying one or more of background replacement, Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement, and reduction to the background in initial training image data, and/or applying one or more of Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement, and reduction to the markers in the initial training image data.
The preset model may be any image processing model; this solution does not strictly limit this. For example, an initial model is trained on multiple training image data, and the parameters of the initial model are iteratively adjusted based on a preset stopping condition until the condition is reached, yielding the trained preset model. Optionally, the stopping condition may be that the number of training iterations reaches 100, or that the loss value is smaller than a preset threshold, etc.
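A generic sketch of a training loop with the two stopping conditions just mentioned, assuming a PyTorch-style model, optimizer, data loader, and a pose regression loss (all of these are assumptions; the actual preset model is not specified in the original text):

    def train(model, loader, loss_fn, opt, max_epochs=100, loss_threshold=1e-3):
        """Train until the epoch budget is exhausted or the mean loss
        falls below a preset threshold."""
        for epoch in range(max_epochs):
            epoch_loss = 0.0
            for images, poses in loader:
                opt.zero_grad()
                loss = loss_fn(model(images), poses)
                loss.backward()
                opt.step()
                epoch_loss += loss.item()
            if epoch_loss / len(loader) < loss_threshold:
                break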
In another possible implementation, a two-dimensional image of the marker captured by the camera is input into an object localization algorithm for processing to obtain the 6DoF pose of the marker relative to the camera.
Then, based on the calibrated coordinate transformation determined by the structure of the target itself, such as a vehicle or robot, the pose of the marker relative to the camera is converted into the pose of the marker relative to the vehicle or robot.
The object localization algorithm may be, for example, the PVnet algorithm. Of course, other algorithms such as DenseFusion may also be used; this solution does not strictly limit this.
204. Acquire the pose of the target in the camera coordinate system, and obtain the relative pose between the first marker and the target according to the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system.
The pose of the target in the camera coordinate system may be preset. Optionally, since the camera is mounted on the vehicle or robot in advance, the camera has a fixed position relative to the center of the vehicle or robot, which gives the pose of the target in the camera coordinate system.
Based on the pose of the first marker in the camera coordinate system and the pose of the target in the camera coordinate system, the relative pose between the first marker and the target can be obtained through coordinate transformation.
205. Obtain the global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target.
Combining the pose of the marker relative to the vehicle or robot yields the global pose of the vehicle or robot, which is output as the final fine 6DoF pose.
For example, a conventional coordinate transformation propagates the global pose of the marker through the pose of the marker relative to the vehicle or robot, thereby obtaining the pose of the vehicle or robot. Of course, other processing methods may also be used to obtain the global pose of the target; this solution does not strictly limit this.
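As a worked sketch of this conventional coordinate transformation (the numeric values besides the marker's global pose from the earlier example are illustrative placeholders, and the helper name is an assumption): with T_world_marker the marker's global pose, T_cam_marker the marker pose in the camera (e.g., from an object pose estimator such as PVnet), and T_body_cam the calibrated camera extrinsics on the robot, the marker pose relative to the robot is T_body_marker = T_body_cam · T_cam_marker, and the robot's global pose is T_world_body = T_world_marker · T_body_marker⁻¹.

    import numpy as np
    from scipy.spatial.transform import Rotation as R

    def pose_to_T(xyz, rpy_deg):
        """Build a 4x4 homogeneous transform from position and roll/pitch/yaw."""
        T = np.eye(4)
        T[:3, :3] = R.from_euler("xyz", rpy_deg, degrees=True).as_matrix()
        T[:3, 3] = xyz
        return T

    # Assumed inputs (illustrative values):
    T_world_marker = pose_to_T([24, 6, 0], [0, 0, 90])        # marker global pose (from the map)
    T_cam_marker   = pose_to_T([0.5, 0.1, 2.0], [1, -2, 88])  # marker pose in the camera frame
    T_body_cam     = pose_to_T([0.2, 0.0, 1.1], [0, 0, 0])    # calibrated camera extrinsics

    # Marker pose relative to the robot body, then the robot's global pose:
    T_body_marker = T_body_cam @ T_cam_marker
    T_world_body  = T_world_marker @ np.linalg.inv(T_body_marker)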
In this embodiment of the present application, a vehicle, robot, server, or the like first performs coarse positioning on the target (such as an unmanned vehicle or robot), then obtains the global pose of the first marker based on the image of the first marker and the coarse pose of the target, obtains the relative pose between the first marker and the target according to the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system, and finally obtains the global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target. With this approach, the target is first coarsely positioned and then finely positioned using the image of the first marker together with the coarse pose, which helps to obtain an ultra-high-precision estimate of the target's global pose.
For example, the coarse positioning is performed based on lidar, and the fine positioning is performed based on a vision camera. In this solution, lidar-based coarse positioning is performed first, followed by camera-based fine positioning; lidar positioning has an accuracy of 5-10 cm, while visual fine positioning can achieve a positioning accuracy of about 1-2 cm. The combination of lidar coarse positioning and visual fine positioning can meet customer requirements and achieve high-precision, highly robust, and cost-effective positioning of the target.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of another target positioning method according to an embodiment of the present application. Optionally, the method may be applied to the aforementioned target positioning system, such as the target positioning system shown in FIG. 1a. The target positioning method shown in FIG. 3 may include steps 301-308. It should be understood that, for convenience of description, this application describes the steps in the order 301-308, which is not intended to limit execution to this order. The embodiments of the present application do not limit the execution order, execution timing, or number of executions of one or more of the above steps. The following description takes a vehicle as the execution subject of steps 301-308; the application is equally applicable to other execution subjects such as a robot or server. Steps 301-308 are as follows:
301. Acquire an image of a first marker in a preset area, and perform three-dimensional reconstruction on M markers in the image according to the image of the first marker to obtain textured three-dimensional models of the M markers.
Optionally, the image of the marker may be captured by a camera.
For the introduction of step 301, refer to the description in the foregoing embodiments; details are not repeated here.
302. Acquire point cloud data of the preset area, and build a global point cloud map from the point cloud data.
For the introduction of step 302, refer to the description in the foregoing embodiments; details are not repeated here.
303. Register the textured three-dimensional models of the M markers into the global point cloud map to obtain the semantic positioning global map.
For the introduction of step 303, refer to the description in the foregoing embodiments; details are not repeated here.
FIG. 4 is a schematic diagram of target positioning according to an embodiment of the present application. The figure shows that the semantic positioning map of the markers may be built in offline mode; that is, steps 301-303 may be performed in offline mode. Compared with the online mode, the offline mode runs asynchronously, while the online mode runs in real time. The offline mode generally runs once before the online mode runs.
It should be noted that steps 301-303 may be executed once and the result then reused directly.
For example, after the semantic positioning global map is obtained, it can be applied to various other target positioning tasks in the preset area, such as positioning other vehicles, robots, or pedestrians; this solution does not strictly limit this.
This solution selects objects already present in the specific application scenario as positioning markers. Compared with manually deploying QR codes as in the prior art, this approach does not alter the specific application scenario and reduces labor costs, and can be widely applied in the positioning field. By building an offline semantic positioning global map, an ultra-high-precision global pose of the markers can be obtained, which helps improve the precision of the final fine pose of the target.
FIG. 4 shows that steps 304-307 are performed in online mode, as follows:
304. Perform coarse positioning on the target in the preset area to obtain the coarse pose of the target.
In a possible implementation, the target is coarsely positioned by the vehicle's lidar. For the introduction of this step, refer to the foregoing embodiments; details are not repeated here.
FIG. 4 shows a way of coarsely positioning the target based on the lidar and the inertial measurement unit IMU: the data obtained by lidar scanning is downsampled and the point cloud is de-distorted; point cloud registration is then performed using the data acquired by the IMU together with the de-distorted point cloud; finally, global optimization is performed based on the global map to obtain the coarse pose of the target.
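A simplified sketch of the downsampling-plus-registration part of this pipeline with Open3D (HDL-localization itself uses NDT/GICP with a filter; point-to-plane ICP seeded by an IMU-propagated guess is used here only as an illustrative stand-in, and all parameter values are assumptions):

    import numpy as np
    import open3d as o3d

    def coarse_localize(scan_xyz: np.ndarray,
                        local_map: o3d.geometry.PointCloud,
                        T_init: np.ndarray) -> np.ndarray:
        """Voxel-downsample the lidar scan, then register it against the map
        with point-to-plane ICP, seeded by an initial guess (e.g., an
        IMU-propagated pose)."""
        scan = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(scan_xyz))
        scan = scan.voxel_down_sample(voxel_size=0.2)   # downsampling step
        local_map.estimate_normals()                    # needed for point-to-plane ICP
        result = o3d.pipelines.registration.registration_icp(
            scan, local_map, max_correspondence_distance=1.0, init=T_init,
            estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())
        return result.transformation  # 4x4 coarse pose in the map frame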
305. Obtain the global pose of the first marker according to the coarse pose of the target and the image of the first marker.
Optionally, as shown in FIG. 4, when the number of markers in the vision camera (camera) frame is greater than 1, one marker may be determined, and the target is then positioned based on that marker.
For example, if there are 3 markers in the camera frame, the marker with the largest mask area can be selected, excluding the other two markers with smaller mask areas and leaving a unique marker for target positioning.
Specifically, the unique marker is positioned based on the semantic positioning local map to obtain the global pose of the marker. For the introduction of this part, refer to the foregoing embodiments; details are not repeated here.
306. Obtain the pose of the first marker in the camera coordinate system according to the image of the first marker.
In a possible implementation, the image of the first marker is input into a preset model for processing to obtain the pose of the first marker in the camera coordinate system, where the training data of the model is obtained by applying one or more of background replacement, Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement, and reduction to the background in initial training image data, and applying one or more of Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement, and reduction to the markers in the initial training image data.
For example, the viewing angle and background of the marker are varied to augment the training data. Specifically, two-dimensional pictures of the marker taken at 10° intervals horizontally and vertically are background-subtracted, and the background is replaced with other backgrounds from the specific application scenario. Both the background and the marker can undergo data augmentation operations such as Gaussian blur (Gaussian kernel size 1-5), translation (random translation range of 1-30 pixels up/down/left/right), cropping (random cropping range of 1-30 pixels up/down/left/right), contrast transformation (random contrast adjustment range ±20%), gamma transformation (gamma parameter 0.01-0.2), enlargement (random enlargement ratio 1-10%), and reduction (random reduction ratio 1-10%). Training the model with a deep learning algorithm in this way makes the algorithm more stable, more accurate, and more robust.
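A hedged sketch of these augmentations with OpenCV, using the parameter ranges stated above (applying all operations in sequence is a simplification; in practice a random subset would typically be sampled):

    import cv2
    import numpy as np

    def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
        """Apply the augmentations listed above with the stated ranges."""
        # Gaussian blur with an odd kernel size in [1, 5]
        k = int(rng.choice([1, 3, 5]))
        img = cv2.GaussianBlur(img, (k, k), 0)
        # random translation of 1-30 pixels
        dx, dy = rng.integers(-30, 31, size=2)
        M = np.float32([[1, 0, dx], [0, 1, dy]])
        img = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))
        # contrast adjustment within ±20%
        img = cv2.convertScaleAbs(img, alpha=float(rng.uniform(0.8, 1.2)), beta=0)
        # gamma transformation with gamma in [0.01, 0.2]
        gamma = float(rng.uniform(0.01, 0.2))
        table = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
        img = cv2.LUT(img, table)
        # random enlargement/reduction within ±1-10%
        s = float(rng.uniform(0.9, 1.1))
        return cv2.resize(img, None, fx=s, fy=s)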
307. Acquire the pose of the target in the camera coordinate system, and obtain the relative pose between the first marker and the target according to the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system.
For the introduction of this part, refer to the foregoing embodiments; details are not repeated here.
308. Obtain the global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target.
A conventional coordinate transformation propagates the global pose of the first marker through the pose of the first marker relative to the vehicle or robot, thereby obtaining the pose of the vehicle or robot. Of course, other processing methods may also be used to obtain the global pose of the target; this solution does not strictly limit this.
This embodiment provides a brand-new high-precision positioning method fusing lidar and vision sensors. First, the semantic positioning global map is built offline. Then, the online positioning mode is started (for example, the robot or autonomous vehicle first performs lidar-based coarse positioning and then camera-based fine positioning). This helps to obtain an ultra-high-precision estimate of the target's global pose.
Separating the offline module from the online module greatly reduces the computing power consumed by the online module, which greatly lowers the hardware cost and greatly extends the battery life of, for example, a vehicle or robot.
On the other hand, using lidar for coarse positioning satisfies the minimum operating requirement of visual fine positioning that the marker occupy more than 1/10 of the frame; otherwise, if only the visual fine positioning module were used, the minimum start-up requirement of visual fine positioning could not be met.
On one hand, lidar positioning has an accuracy of 5-10 cm, which hardly meets customers' 1-2 cm high-precision positioning requirements, while visual fine positioning can achieve a positioning accuracy of about 1-2 cm. The combination of lidar coarse positioning and visual fine positioning can meet customer requirements.
It should be noted that this solution can be widely applied in fields such as unmanned vehicles and robot positioning, for example, in a large number of scenarios such as power inspection, robotaxis, campus inspection, oil and gas inspection, geological exploration, logistics and transportation, household services, and unattended nucleic acid testing. Of course, it may also be applied in other fields or scenarios; this solution does not limit this.
It should be noted that, in the embodiments of the present application, unless otherwise specified or logically conflicting, the terms and/or descriptions of the embodiments are consistent and may be cited in one another, and technical features in different embodiments may be combined to form new embodiments according to their inherent logical relationships.
The methods of the embodiments of the present application have been described in detail above; the apparatus of the embodiments is provided below. It can be understood that, in each apparatus embodiment of the present application, the division into multiple units or modules is only a logical division according to function and does not limit the specific structure of the apparatus. In a specific implementation, some functional modules may be subdivided into more fine-grained functional modules, and some functional modules may be combined into one functional module; but whether these functional modules are subdivided or combined, the general flow executed by the apparatus is the same. For example, some apparatuses include a receiving unit and a sending unit. In some designs, the sending unit and the receiving unit may also be integrated into a communication unit, which can implement the functions implemented by the receiving unit and the sending unit. Usually, each unit corresponds to its own program code (or program instructions); when the program code corresponding to each unit runs on a processor, the unit is controlled by the processing unit to execute the corresponding flow and thereby implement the corresponding function.
An embodiment of the present application also provides an apparatus for implementing any of the above methods; for example, a target positioning apparatus is provided that includes modules (or means) for implementing the steps performed by the vehicle in any of the above methods.
For example, referring to FIG. 5, FIG. 5 is a schematic structural diagram of a target positioning apparatus according to an embodiment of the present application. The target positioning apparatus is used to implement the aforementioned target positioning methods, such as the target positioning methods shown in FIG. 2 and FIG. 3.
As shown in FIG. 5, the apparatus may include a coarse positioning module 501, a first processing module 502, a second processing module 503, a third processing module 504, and a positioning module 505, as follows:
a coarse positioning module 501, configured to perform coarse positioning on a target in a preset area to obtain a coarse pose of the target;
a first processing module 502, configured to acquire an image of a first marker in the preset area, and obtain a global pose of the first marker according to the coarse pose of the target and the image of the first marker;
a second processing module 503, configured to obtain a pose of the first marker in a camera coordinate system according to the image of the first marker;
a third processing module 504, configured to acquire a pose of the target in the camera coordinate system, and obtain a relative pose between the first marker and the target according to the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system;
a positioning module 505, configured to obtain a global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target.
In a possible implementation, the first processing module 502 is configured to:
obtain a local map of the location of the target according to the coarse pose of the target and a global point cloud map of the preset area;
obtain a semantic positioning local map according to the local map of the location of the target and a semantic positioning global map, where the semantic positioning global map includes the global poses of M markers, the semantic positioning local map includes the global poses of N markers, the N markers are markers among the M markers, and M and N are both positive integers;
obtain the global pose of the first marker from the semantic positioning local map according to the image of the first marker.
In a possible implementation, the apparatus further includes a fourth processing module, configured to:
perform three-dimensional reconstruction on the M markers in the image according to the image of the first marker to obtain textured three-dimensional models of the M markers;
register the textured three-dimensional models of the M markers into the global point cloud map to obtain the semantic positioning global map.
In a possible implementation, the second processing module 503 is further configured to:
input the image of the first marker into a preset model for processing to obtain the pose of the first marker in the camera coordinate system, where the training data of the preset model is obtained by applying one or more of background replacement, Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement, and reduction to the background in initial training image data, and/or applying one or more of Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement, and reduction to the markers in the initial training image data.
The coarse positioning module 501, the first processing module 502, the second processing module 503, the third processing module 504, and the positioning module 505 may all be implemented by software or by hardware. By way of example, the implementation of the coarse positioning module 501 is described next. Similarly, for the implementation of the first processing module 502, the second processing module 503, the third processing module 504, and the positioning module 505, reference may be made to the implementation of the coarse positioning module 501.
As an example of a module as a software functional unit, the coarse positioning module 501 may include code running on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Further, there may be one or more computing instances. For example, the coarse positioning module 501 may include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code may be distributed in the same region or in different regions. Further, the multiple hosts/virtual machines/containers used to run the code may be distributed in the same availability zone (AZ) or in different AZs, each AZ including one data center or multiple geographically close data centers. Usually, one region may include multiple AZs.
Similarly, the multiple hosts/virtual machines/containers used to run the code may be distributed in the same virtual private cloud (VPC) or in multiple VPCs. Usually, one VPC is set within one region. For cross-region communication between two VPCs in the same region or between VPCs in different regions, a communication gateway must be set up in each VPC, and the interconnection between VPCs is achieved through the communication gateways.
As an example of a module as a hardware functional unit, the coarse positioning module 501 may include at least one computing device, such as a server. Alternatively, the coarse positioning module 501 may be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), etc. The PLD may be implemented as a complex programmable logical device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The multiple computing devices included in the coarse positioning module 501 may be distributed in the same region or in different regions, in the same AZ or in different AZs, and likewise in the same VPC or in multiple VPCs. The multiple computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
It should be noted that, in other embodiments, the coarse positioning module 501 may be used to perform any step in the target positioning method, and the first processing module 502, the second processing module 503, the third processing module 504, and the positioning module 505 may each be used to perform any step in the target positioning method. The steps that the coarse positioning module 501, the first processing module 502, the second processing module 503, the third processing module 504, and the positioning module 505 are responsible for implementing may be specified as needed; the coarse positioning module 501, the first processing module 502, the second processing module 503, the third processing module 504, and the positioning module 505 respectively implement different steps of the target positioning method so as to realize the full functionality of the target positioning apparatus.
It should be understood that the division of the modules in the above apparatuses is only a division of logical functions; in actual implementation, they may be fully or partially integrated into one physical entity, or physically separated. In addition, the modules in the target positioning apparatus may be implemented in the form of a processor calling software: for example, the target positioning apparatus includes a processor connected to a memory, instructions are stored in the memory, and the processor calls the instructions stored in the memory to implement any of the above methods or implement the functions of each module of the apparatus, where the processor is, for example, a general-purpose processor such as a central processing unit (CPU) or a microprocessor, and the memory is a memory inside or outside the apparatus. Alternatively, the modules in the apparatus may be implemented in the form of hardware circuits, and some or all of the functions of the units may be implemented through the design of the hardware circuits, which may be understood as one or more processors. For example, in one implementation, the hardware circuit is an application-specific integrated circuit (ASIC), and the functions of some or all of the above units are implemented through the design of the logical relationships of the elements within the circuit. For another example, in another implementation, the hardware circuit may be implemented by a programmable logic device (PLD); taking a field programmable gate array (FPGA) as an example, it may include a large number of logic gate circuits, and the connection relationships between the logic gates are configured through a configuration file, thereby implementing the functions of some or all of the above units. All modules of the above apparatus may be implemented entirely in the form of the processor calling software, entirely in the form of hardware circuits, or partly in the form of the processor calling software with the remainder implemented in the form of hardware circuits.
Referring to FIG. 6, FIG. 6 is a schematic diagram of a hardware structure of a computing device according to an embodiment of the present application. As shown in FIG. 6, the computing device 600 includes a bus 602, a processor 604, a memory 606, and a communication interface 608. The processor 604, the memory 606, and the communication interface 608 communicate through the bus 602. The computing device 600 may be a server or a terminal device. It should be understood that the present application does not limit the number of processors and memories in the computing device 600.
The bus 602 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only one line is used in FIG. 6, but this does not mean that there is only one bus or one type of bus. The bus 602 may include a path that carries information between the components of the computing device 600 (for example, the memory 606, the processor 604, and the communication interface 608).
The processor 604 may include any one or more of processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), or a digital signal processor (DSP).
The memory 606 may include volatile memory, such as random access memory (RAM). The memory 606 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
The memory 606 stores executable program code, and the processor 604 executes the executable program code to implement the functions of the aforementioned coarse positioning module 501, first processing module 502, second processing module 503, third processing module 504, and positioning module 505, respectively, thereby implementing the target positioning method. That is, the memory 606 stores instructions for executing the target positioning method.
The communication interface 608 uses a transceiver module such as, but not limited to, a network interface card or a transceiver to implement communication between the computing device 600 and other devices or communication networks.
It should be noted that although the computing device 600 shown in FIG. 6 only shows the bus 602, the processor 604, the memory 606, and the communication interface 608, in a specific implementation, those skilled in the art should understand that the computing device 600 also includes other devices necessary for normal operation. Meanwhile, depending on specific needs, those skilled in the art should understand that the computing device 600 may also include hardware devices implementing other additional functions. In addition, those skilled in the art should understand that the computing device 600 may include only the devices necessary to implement the embodiments of the present application, and need not include all the devices shown in FIG. 6.
An embodiment of the present application also provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device may be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, laptop computer, or smartphone.
As shown in FIG. 7, the computing device cluster includes at least one computing device 600. The memory 606 of one or more computing devices 600 in the cluster may store the same instructions for executing the target positioning method.
In some possible implementations, the memory 606 of one or more computing devices 600 in the computing device cluster may also each store part of the instructions for executing the target positioning method. In other words, a combination of one or more computing devices 600 may jointly execute the instructions for executing the target positioning method.
It should be noted that the memories 606 in different computing devices 600 in the computing device cluster may store different instructions, respectively used to execute part of the functions of the target positioning apparatus. That is, the instructions stored in the memories 606 of different computing devices 600 may implement the functions of one or more of the coarse positioning module 501, the first processing module 502, the second processing module 503, the third processing module 504, and the positioning module 505.
In some possible implementations, one or more computing devices in the computing device cluster may be connected through a network. The network may be a wide area network or a local area network, etc. Two computing devices are connected through the network; specifically, the connection to the network is made through the communication interface in each computing device. In this class of possible implementations, the memory of a first computing device stores instructions for executing the functions of the coarse positioning module 501, while the memory of a second computing device stores instructions for executing the functions of the first processing module 502, the second processing module 503, the third processing module 504, and the positioning module 505.
An embodiment of the present application also provides a computer-readable storage medium storing instructions which, when run on a computer or processor, cause the computer or processor to execute one or more steps of any of the above methods.
An embodiment of the present application also provides a computer program product containing instructions. When the computer program product runs on a computer or processor, the computer or processor is caused to execute one or more steps of any of the above methods.
It should be understood that, in the description of this application, unless otherwise stated, "/" indicates an "or" relationship between the associated objects; for example, A/B may mean A or B, where A and B may be singular or plural. Furthermore, in the description of this application, unless otherwise stated, "a plurality of" means two or more than two. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single items or plural items. For example, at least one of a, b, or c may mean: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple. In addition, to facilitate a clear description of the technical solutions of the embodiments of this application, words such as "first" and "second" are used in the embodiments to distinguish identical or similar items with basically the same functions and effects. Those skilled in the art can understand that such words do not limit the number or execution order, nor do they imply that the items are necessarily different. Meanwhile, in the embodiments of this application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as more preferred or advantageous than other embodiments or designs. Rather, words such as "exemplary" or "for example" are intended to present relevant concepts in a concrete manner to facilitate understanding.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the division of units is only a logical functional division; in actual implementation, there may be other divisions; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. The mutual coupling, direct coupling, or communication connections shown or discussed may be indirect coupling or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of this embodiment.
In the above embodiments, the implementation may be wholly or partly by software, hardware, firmware, or any combination thereof. When implemented by software, it may be wholly or partly implemented in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of this application are produced wholly or partly. The computer may be a general-purpose computer, a dedicated computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted via the computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic medium such as a floppy disk, hard disk, magnetic tape, or magnetic disk, an optical medium such as a digital versatile disc (DVD), or a semiconductor medium such as a solid state disk (SSD), etc.
The above are only specific implementations of the embodiments of this application, but the protection scope of the embodiments of this application is not limited thereto. Any change or replacement within the technical scope disclosed in the embodiments of this application shall be covered by the protection scope of the embodiments of this application. Therefore, the protection scope of the embodiments of this application shall be subject to the protection scope of the claims.

Claims (11)

  1. A target positioning method, characterized by comprising:
    performing coarse positioning on a target in a preset area to obtain a coarse pose of the target;
    acquiring an image of a first marker in the preset area, and obtaining a global pose of the first marker according to the coarse pose of the target and the image of the first marker;
    obtaining a pose of the first marker in a camera coordinate system according to the image of the first marker;
    acquiring a pose of the target in the camera coordinate system, and obtaining a relative pose between the first marker and the target according to the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system;
    obtaining a global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target.
  2. The method according to claim 1, characterized in that obtaining the global pose of the first marker according to the coarse pose of the target and the image of the first marker comprises:
    obtaining a local map of the location of the target according to the coarse pose of the target and a global point cloud map of the preset area;
    obtaining a semantic positioning local map according to the local map of the location of the target and a semantic positioning global map, wherein the semantic positioning global map comprises global poses of M markers, the semantic positioning local map comprises global poses of N markers, the N markers are markers among the M markers, and M and N are both positive integers;
    obtaining the global pose of the first marker from the semantic positioning local map according to the image of the first marker.
  3. The method according to claim 2, characterized in that the method further comprises:
    performing three-dimensional reconstruction on M markers in the image according to the image of the first marker to obtain textured three-dimensional models of the M markers;
    registering the textured three-dimensional models of the M markers into the global point cloud map to obtain the semantic positioning global map.
  4. The method according to any one of claims 1 to 3, characterized in that obtaining the pose of the first marker in the camera coordinate system according to the image of the first marker comprises:
    inputting the image of the first marker into a preset model for processing to obtain the pose of the first marker in the camera coordinate system, wherein the training data of the preset model is obtained by applying one or more of background replacement, Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement, and reduction to the background in initial training image data, and/or applying one or more of Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement, and reduction to the markers in the initial training image data.
  5. A target positioning apparatus, characterized by comprising:
    a coarse positioning module, configured to perform coarse positioning on a target in a preset area to obtain a coarse pose of the target;
    a first processing module, configured to acquire an image of a first marker in the preset area, and obtain a global pose of the first marker according to the coarse pose of the target and the image of the first marker;
    a second processing module, configured to obtain a pose of the first marker in a camera coordinate system according to the image of the first marker;
    a third processing module, configured to acquire a pose of the target in the camera coordinate system, and obtain a relative pose between the first marker and the target according to the pose of the target in the camera coordinate system and the pose of the first marker in the camera coordinate system;
    a positioning module, configured to obtain a global pose of the target according to the global pose of the first marker and the relative pose between the first marker and the target.
  6. The apparatus according to claim 5, characterized in that the first processing module is configured to:
    obtain a local map of the location of the target according to the coarse pose of the target and a global point cloud map of the preset area;
    obtain a semantic positioning local map according to the local map of the location of the target and a semantic positioning global map, wherein the semantic positioning global map comprises global poses of M markers, the semantic positioning local map comprises global poses of N markers, the N markers are markers among the M markers, and M and N are both positive integers;
    obtain the global pose of the first marker from the semantic positioning local map according to the image of the first marker.
  7. The apparatus according to claim 6, characterized in that the apparatus further comprises a fourth processing module, configured to:
    perform three-dimensional reconstruction on M markers in the image according to the image of the first marker to obtain textured three-dimensional models of the M markers;
    register the textured three-dimensional models of the M markers into the global point cloud map to obtain the semantic positioning global map.
  8. The apparatus according to any one of claims 5 to 7, characterized in that the second processing module is further configured to:
    input the image of the first marker into a preset model for processing to obtain the pose of the first marker in the camera coordinate system, wherein the training data of the preset model is obtained by applying one or more of background replacement, Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement, and reduction to the background in initial training image data, and/or applying one or more of Gaussian blur, translation, cropping, contrast transformation, gamma transformation, enlargement, and reduction to the markers in the initial training image data.
  9. A computing device cluster, characterized by comprising at least one computing device, each computing device comprising a processor and a memory; wherein the processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method according to any one of claims 1 to 4.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the method according to any one of claims 1 to 4.
  11. A computer program product, characterized in that, when the computer program product runs on a computer, the computer is caused to perform the method according to any one of claims 1 to 4.
PCT/CN2023/086234 2022-08-16 2023-04-04 Target positioning method, related system, and storage medium WO2024036984A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210980993.8A CN117635721A (zh) 2022-08-16 2022-08-16 Target positioning method, related system, and storage medium
CN202210980993.8 2022-08-16

Publications (1)

Publication Number Publication Date
WO2024036984A1 true WO2024036984A1 (zh) 2024-02-22

Family

ID=89940526

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/086234 WO2024036984A1 (zh) 2023-04-04 Target positioning method, related system, and storage medium

Country Status (2)

Country Link
CN (1) CN117635721A (zh)
WO (1) WO2024036984A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118038008B (zh) * 2024-04-15 2024-07-12 武汉人云智物科技有限公司 基于ptz多摄像头联动的水电厂人员定位方法及系统


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102967305A (zh) * 2012-10-26 2013-03-13 南京信息工程大学 Pose acquisition method for a multi-rotor unmanned aerial vehicle based on large and small concentric-square markers
CN111369622A (zh) * 2018-12-25 2020-07-03 中国电子科技集团公司第十五研究所 Method, apparatus, and system for acquiring camera world-coordinate position for virtual-real overlay applications
US20210302993A1 (en) * 2020-03-26 2021-09-30 Here Global B.V. Method and apparatus for self localization
CN114581509A (zh) * 2020-12-02 2022-06-03 魔门塔(苏州)科技有限公司 Target positioning method and apparatus
CN112836698A (zh) * 2020-12-31 2021-05-25 北京纵目安驰智能科技有限公司 Positioning method, apparatus, storage medium, and electronic device

Also Published As

Publication number Publication date
CN117635721A (zh) 2024-03-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23853897

Country of ref document: EP

Kind code of ref document: A1