CN114051628A - Method and device for determining target object point cloud set

Method and device for determining target object point cloud set

Info

Publication number
CN114051628A
Authority
CN
China
Prior art keywords
coordinate system
point cloud
target object
sets
detection sensor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202080047348.XA
Other languages
Chinese (zh)
Other versions
CN114051628B (en)
Inventor
高海涛 (Gao Haitao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN114051628A
Application granted granted Critical
Publication of CN114051628B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

A method and a device for determining a point cloud set of a target object are applied to fields such as automatic driving and intelligent driving, and can determine the point cloud set corresponding to a target object. The method comprises the following steps: acquiring image data from a vision sensor and point cloud data from a detection sensor; obtaining at least one three-dimensional (3D) cone space corresponding to a target object in a first coordinate system; obtaining a plurality of candidate point cloud sets according to the at least one 3D cone space; and determining a point cloud set of the target object in the plurality of candidate point cloud sets. The scheme can further be used to improve the capability of automatic driving or an Advanced Driving Assistance System (ADAS), and can be applied to the Internet of Vehicles, for example, vehicle-to-everything (V2X), long term evolution-vehicle (LTE-V) for inter-vehicle communication, vehicle-to-vehicle (V2V), and the like.

Description

Method and device for determining target object point cloud set
Technical Field
The application relates to the field of automatic driving, in particular to a method and a device for determining a target object point cloud set.
Background
With the development of cities, traffic becomes increasingly congested and driving is increasingly tiring, so automatic driving applications have emerged to meet people's travel requirements. The key to automatic driving is the ability to identify the surrounding road environment with high accuracy, so that automatic driving is safe and reliable. An existing automatic driving vehicle is equipped with a laser sensor that acquires surrounding laser data in real time and combines it with a high-precision map so that the vehicle can make correct driving decisions.
In the process of making a high-precision map, some objects in the collected laser point cloud are temporary objects. Because they change over time, they affect positioning precision; such temporary objects are not suitable as part of the map and need to be removed during map making. Temporary objects include moving objects (for example, pedestrians and moving cars) and movable objects (for example, stationary bicycles and cars). How to determine the point cloud set of a target object in a laser point cloud is a technical problem to be solved by the embodiments of the present application.
Disclosure of Invention
The application provides a method and a device for determining a target object point cloud set, so as to determine a point cloud set corresponding to a target object.
In a first aspect, a method for determining a point cloud set of a target object is provided, the method comprising: acquiring image data from a vision sensor and point cloud data from a detection sensor; determining at least one 3D cone space in a first coordinate system corresponding to the detection sensor according to a target object included in the image data; obtaining a plurality of candidate point cloud sets according to the at least one 3D cone space; and determining a point cloud set of the target object in the plurality of candidate point cloud sets. Optionally, the 3D cone space may be a projection of the target object in the first coordinate system, and the 3D cone space may include a plurality of point cloud data in the first coordinate system; the point cloud set of the target object may include a plurality of point cloud data.
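For orientation only, the four steps of the first aspect can be sketched as the following pipeline. The function names, the use of NumPy arrays of shape (N, 3) for point clouds, and the fact that the step implementations are passed in as callables are assumptions of this sketch, not part of the claimed method.

```python
# A minimal pipeline sketch of the four steps of the first aspect.
# Function names and data layouts are illustrative assumptions only.
import numpy as np

def determine_target_point_cloud(image, point_cloud,
                                 identify_cones, cluster_candidates, select_best):
    """image: H x W x 3 array from the vision sensor;
    point_cloud: (N, 3) array in the detection-sensor (first) coordinate system."""
    # Step 1: both inputs are assumed to be already acquired and synchronized.
    # Step 2: obtain one or more 3D cone spaces for the target object.
    cones = identify_cones(image)
    # Step 3: collect candidate point cloud sets inside every cone space.
    candidates = []
    for cone in cones:
        candidates.extend(cluster_candidates(point_cloud, cone))
    # Step 4: pick the candidate set that satisfies the condition of the target object.
    return select_best(candidates)
```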
By implementing the method, the image data from the vision sensor is fused with the point cloud data of the detection sensor to determine the point cloud set of the target object; the depth information of the target object does not need to be acquired, which reduces the requirement on the vision sensor.
Optionally, after the point cloud set of the target object is obtained, it may be used to identify the target object, to make a high-precision map, or to perform positioning, without limitation. Further, if the method is used for making a high-precision map, positioning, or the like, the method may further include: removing the point cloud set of the target object from the point cloud data of all objects around the detection sensor, so as to obtain second point cloud data.
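Where the second point cloud data is obtained by removing the point cloud set of the target object, the operation reduces to a set difference over point indices. The index-based representation below is an assumption made for this sketch.

```python
# Sketch: obtain the second point cloud data by removing the target object's points.
import numpy as np

def remove_target_points(point_cloud, target_indices):
    """point_cloud: (N, 3) array of all points around the detection sensor;
    target_indices: row indices of the target object's point cloud set.
    Returns the second point cloud data with those points removed."""
    mask = np.ones(len(point_cloud), dtype=bool)
    mask[np.asarray(target_indices, dtype=int)] = False
    return point_cloud[mask]
```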
In a possible implementation manner, the determining at least one 3D cone space in the first coordinate system corresponding to the detection sensor according to the target object included in the image data includes: identifying a target object in the image data, for example by using an AI identification algorithm; identifying a contour of the target object; and determining the 3D cone space in the first coordinate system corresponding to the detection sensor according to the contour of the target object.
By the method, the 3D conical space can be projected in the first coordinate system according to the contour of the target object, the point cloud set of the target object is determined in the point cloud data included in the 3D conical space, and the accuracy of identifying the point cloud set of the target object can be improved.
In a possible implementation manner, the process of obtaining the 3D cone space according to the contour of the target object includes: acquiring the pixels included in the contour of the target object and converting the coordinates of the pixels in a pixel coordinate system into coordinates in a second coordinate system (a camera coordinate system) corresponding to the vision sensor, where the coordinates in the second coordinate system corresponding to the contour of the target object may be called a 3D point set; converting the 3D point set from the second coordinate system to the first coordinate system; in the first coordinate system, performing curve fitting on the coordinates corresponding to the contour of the target object to obtain a fitted curve; and projecting rays from the origin of the first coordinate system to the fitted curve to obtain a 3D cone space.
By the method, the transformation from the outline of the target object to the 3D conical space in the first coordinate system can be realized, and the corresponding space range of the target object in the first coordinate system is obtained. The point cloud data included in the spatial range includes a point cloud set corresponding to the target object.
In one possible implementation, the converting the 3D point set from the second coordinate system to the first coordinate system includes: converting the 3D point set from the second coordinate system to the first coordinate system according to a first conversion relation from the second coordinate system to the vehicle body coordinate system and a second conversion relation from the first coordinate system to the vehicle body coordinate system.
Of course, in the embodiment of the present application, a conversion relationship between the second coordinate system and the first coordinate system may also be established directly, which simplifies the conversion between the two coordinate systems because no intermediate conversion through the vehicle body coordinate system is needed.
Optionally, the converting the contour of the target object from coordinates in the pixel coordinate system to coordinates in the second coordinate system includes: converting the coordinates of the contour of the target object in the pixel coordinate system into coordinates in the second coordinate system according to the intrinsic parameters (internal reference) of the vision sensor.
Optionally, the method further includes: from one 3D point set, a plurality of 3D point sets corresponding to a plurality of magnifications are obtained, and each 3D point set may correspond to a different magnification. Each 3D point set corresponds to a 3D cone space.
In a possible implementation manner, the determining a point cloud set of the target object according to the 3D cone space includes: clustering point clouds included in the 3D conical space to obtain a plurality of candidate point cloud sets; and determining the point cloud sets meeting the conditions of the target object in the candidate point cloud sets.
It should be noted that "a point cloud set that satisfies the condition of the target object" can be understood in more than one way. In one understanding, the condition of the target object is that the point cloud set has the highest credibility among the plurality of candidate point cloud sets. For example, the credibility of each of the plurality of candidate point cloud sets obtained by clustering can be calculated, and the point cloud set with the highest credibility is selected from the candidate point cloud sets as the point cloud set of the target object. In another understanding, the condition of the target object is that the credibility of the point cloud set among the plurality of candidate point cloud sets is greater than or equal to a first threshold, where the first threshold may be preconfigured, predefined, or set at the factory, without limitation.
Optionally, the process of clustering the point clouds included in the 3D cone space to obtain a plurality of candidate point cloud sets may include: determining a first distance from the point clouds included in each 3D cone space to the origin of the first coordinate system, which may be the average distance from all the point clouds included in the 3D cone space to the origin of the first coordinate system; determining a coefficient corresponding to the first distance according to a correspondence between distances and coefficients; and clustering the point clouds included in the 3D cone space according to the coefficient corresponding to the first distance to obtain a plurality of candidate point cloud sets, where each candidate point cloud set includes at least one point cloud.
By implementing the method, the point cloud set corresponding to the target object can be obtained without obtaining the depth information of the target object, which reduces the complexity of the scheme. Meanwhile, the target object and its contour are identified by means of an AI algorithm, which can improve the accuracy with which the scheme identifies the point cloud set of the target object.
In a second aspect, an apparatus is provided for implementing the method of the first aspect or any implementation of the first aspect, including corresponding functional modules or units for implementing the steps in the above method. The functions may be implemented by hardware, or by hardware executing corresponding software; the hardware or the software includes one or more modules or units corresponding to the functions.
In a third aspect, an apparatus is provided that includes at least one processor and at least one memory. The at least one memory is configured to store computer programs or instructions, and the at least one processor is coupled with the at least one memory; when the computer programs or instructions are executed by the processor, the apparatus is caused to perform the method of the first aspect or any implementation of the first aspect.
In a fourth aspect, a sensor or fusion device is provided, which may be a detection sensor such as a laser sensor, e.g. a lidar. The sensor or fusion device may comprise a device as described in the second or third aspects above.
In a fifth aspect, a terminal is provided, which may comprise the apparatus of the second or third aspect, or the sensor or the fusion apparatus of the fourth aspect. Optionally, the terminal may be an intelligent transportation device (vehicle or unmanned aerial vehicle), an intelligent home device, an intelligent manufacturing device, or a robot. The intelligent transport device may be, for example, an Automated Guided Vehicle (AGV), or an unmanned transport vehicle.
In a sixth aspect, a system is provided comprising the apparatus of the second or third aspect, a detection sensor, and a vision sensor.
in a seventh aspect, a computer-readable storage medium is provided, in which a computer program or instructions are stored, which, when executed by an apparatus, cause the apparatus to perform the method of the first aspect or any one of the first aspects.
In an eighth aspect, the present application provides a computer program product comprising a computer program or instructions which, when executed by an apparatus, causes the apparatus to perform the method of the first aspect or any one of the first aspects.
Drawings
Fig. 1, fig. 2 and fig. 3 are diagrams of current solutions for removing a target object point cloud set according to an embodiment of the present application;
fig. 4 is a flowchart of a method for determining a target object point cloud set according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a pixel coordinate system to camera coordinate system conversion provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a fitted curve provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a 3D point set and projected 3D cone provided by an embodiment of the present application;
fig. 8 is a schematic diagram of an application scenario provided in an embodiment of the present application;
fig. 9 is a schematic diagram of removing a target object point cloud according to an embodiment of the present disclosure;
fig. 10 and 11 are schematic structural diagrams of an apparatus provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described in detail and in full below with reference to the accompanying drawings in the embodiments of the present application.
The automatic driving of the automobile has high navigation and positioning requirements, and the positioning method of high-precision map and laser radar point cloud matching is an important method for high-precision navigation and positioning. In the process of manufacturing a high-precision map, some objects in the collected laser point cloud are temporary objects, and the objects can change along with time to influence the positioning precision. These temporary objects are not suitable as part of the map and need to be removed during the drawing process. At present, the schemes for removing a target object point cloud set from a laser point cloud mainly include the following:
In the first scheme, as shown in fig. 1, the point cloud set of the target object may be manually cleared using an editing tool. This has problems such as high labor cost and being prone to misoperation. Moreover, in the positioning process, point cloud data needs to be processed in real time and cannot be removed manually, which affects positioning precision.
In the second scheme, as shown in fig. 2, an artificial intelligence (AI) or clustering algorithm may be used to directly identify a target object in the laser point cloud and remove it from the point cloud. Because the points in the laser point cloud are three-dimensional and unstructured, with characteristics such as sparsity, disorder, non-uniform distribution and large variation in quantity, they are difficult to process with deep learning. Meanwhile, the laser point cloud is divergent and the points of a distant object are sparse, so recognition is difficult or even impossible.
In the third scheme, as shown in fig. 3, a laser and vision combined method identifies a target object by using an AI algorithm, identifies depth information of the target object by using a multi-view camera or a depth camera, calculates a three-dimensional (3D) position of an object based on the depth information of the target object, and then finds a point cloud of the 3D position in a laser point cloud and removes the point cloud. The 3D position of the target object is identified from the image by depending on a multi-view camera or a depth camera, so that the cost is high and the technical difficulty is high.
Based on the above, the embodiments of the present application provide a method in which a point cloud set corresponding to a target object can be identified in a laser point cloud by directly fusing image data of the target object with the laser point cloud, without depending on depth information of the target object acquired by a multi-view camera or a depth camera, which reduces the hardware requirement of the whole scheme.
As shown in fig. 4, a method flow for determining a target object point cloud set is provided, which at least includes:
step 401: image data from a vision sensor and point cloud data from a detection sensor are acquired.
The vision sensor may be a monocular camera, a multi-view camera, a depth camera, or the like. Optionally, because a monocular camera is lower in price than a multi-view camera or a depth camera, using a monocular camera in the scheme of the embodiment of the present application may reduce the cost of the whole scheme. The detection sensor may be a laser sensor or the like.
Step 402: obtaining at least one 3D cone space corresponding to a target object in a first coordinate system corresponding to a detection sensor, where the target object is located in an image indicated by the image data, and the first coordinate system is a coordinate system corresponding to the detection sensor. In one scenario, the detection sensor may be mounted on an autonomous vehicle, and the detection sensor may scan surrounding objects and collect point cloud data of the surrounding objects. In one understanding, the point cloud data refers to a set of vectors in a three-dimensional coordinate system. These vectors are typically represented in the form of X, Y, Z three-dimensional coordinates and are generally used primarily to represent the shape of the external surface of an object.
In the embodiment of the application, external image data acquired by the vision sensor is obtained first; a target object and a contour of the target object are identified in the external image data; and then, according to the contour of the target object, a 3D cone space corresponding to the contour of the target object is determined in the first coordinate system corresponding to the detection sensor. In one understanding, the 3D cone space described above may be regarded as a projection of the contour of the target object in the first coordinate system. To some extent, the point cloud set included in the 3D cone space in the first coordinate system is the point cloud set of the target object. In practical applications, however, the scene is complex and various influencing factors exist, so the point cloud set included in the 3D cone space may include point cloud sets of other objects in addition to the point cloud set corresponding to the target object. For example, in one scenario, subject to various conditions, the identification of the target object contour may not be completely accurate; the identified contour may be larger than the actual contour of the target object. In this case, the 3D cone space obtained by projecting this contour that is larger than the actual contour may include, in addition to the point cloud set of the target object, point cloud sets corresponding to other objects. Or, in another scenario, if other objects exist in the vicinity of the target object, the image data acquired by the vision sensor may include those other objects; further, if another object overlaps the target object in space, the 3D cone space corresponding to the contour of the target object may also include the point cloud set of that other object. For example, if the target object is a bicycle and the bicycle overlaps a tree in space, the 3D cone space projected from the outline of the bicycle may include the point cloud set of the tree in addition to the point cloud set of the bicycle. Therefore, the further processing of steps 403 and 404 is subsequently required, in which a plurality of candidate point cloud sets are obtained by clustering the point clouds included in the 3D cone space, and one point cloud set is selected among the plurality of candidate point cloud sets as the point cloud set of the target object. For the specific processing of step 403 and step 404, refer to the following description.
Of course, it should be noted that, in the embodiment of the present application, a scheme of directly using the point cloud included in the 3D cone space as the point cloud set of the target object, that is, a scheme of determining the point cloud set of the target object only by using step 401 and step 402, is also within the scope of the embodiment of the present application.
In one possible implementation, the target object and/or the contour of the target object may be identified in the external image data acquired by the vision sensor using an AI algorithm. For example, a large amount of image data may be obtained in advance and used to train an AI model to obtain a neural network; after training, the neural network can identify the target object and/or the contour of the target object in image data. In the embodiment of the present application, the external image data acquired by the vision sensor may be input into the trained neural network, and the output of the neural network is the data corresponding to the target object. The AI model may be a recurrent neural network (RNN), a convolutional neural network (CNN), or the like. Optionally, the AI model may be continuously updated over time. The manner of identifying the contour of the target object is similar to the manner described above and is not described in detail again. Those skilled in the art will appreciate that this step is not limited to using AI algorithms to identify the target object and its contour; other graphics algorithms in the prior art may also be used.
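As an illustration of this step, the sketch below uses an off-the-shelf instance segmentation network (torchvision's Mask R-CNN) in place of the trained neural network described above; the model choice, score threshold, and contour extraction with OpenCV are assumptions of this sketch rather than requirements of the embodiment.

```python
# Sketch: identify target objects and their contours in an image with a
# pretrained instance segmentation model. Model and threshold are assumptions.
import numpy as np
import torch
import torchvision
import cv2

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

def detect_contours(image_bgr, score_thresh=0.7):
    """Returns a list of (class_label, contour) pairs, where each contour is an
    (M, 2) array of (x, y) pixel coordinates along the object outline."""
    rgb = image_bgr[:, :, ::-1].copy()
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = model([tensor])[0]
    results = []
    for mask, label, score in zip(out["masks"], out["labels"], out["scores"]):
        if float(score) < score_thresh:
            continue
        binary = (mask[0].numpy() > 0.5).astype(np.uint8)
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
        if contours:
            # Keep the largest outline for this detected object.
            largest = max(contours, key=cv2.contourArea)
            results.append((int(label), largest.reshape(-1, 2)))
    return results
```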
In the following embodiments, the process of obtaining at least one 3D cone space based on the target object contour will be described in detail:
1) Obtaining at least one 3D point set in a second coordinate system corresponding to the vision sensor according to the contour of the target object.
First, it should be noted that the embodiments of the present application mainly involve four coordinate systems: a first coordinate system corresponding to the above-mentioned detection sensor, also referred to as the laser coordinate system; a second coordinate system corresponding to the vision sensor, also referred to as the camera coordinate system; a third coordinate system corresponding to image pixels, also referred to as the pixel coordinate system; and a fourth coordinate system corresponding to the vehicle, also referred to as the vehicle body coordinate system.
The first coordinate system adopts a polar coordinate system, a coordinate system in a plane consisting of a pole, a polar axis and a polar radius. A point O on the plane is taken as the pole, and a ray Ox drawn from O is called the polar axis. A unit of length is then chosen, and the angle is usually prescribed to be positive in the counterclockwise direction. Thus, the position of any point P on the plane can be determined by the length ρ of the line segment OP and the angle θ from Ox to OP; the ordered pair (ρ, θ) is called the polar coordinate of the point P and is denoted P(ρ, θ), where ρ is the polar radius of point P and θ is the polar angle of point P. In the second coordinate system, the origin is the optical center of the vision sensor, the X-axis and Y-axis are parallel to the X-axis and Y-axis of the image, and the Z-axis is the optical axis of the vision sensor, perpendicular to the image plane. In the third coordinate system, a coordinate system in units of pixels may be established with the upper left corner of the image as the origin, and the abscissa and ordinate of a pixel are respectively the column number and row number at which the image pixel is located. In the fourth coordinate system, the origin coincides with the center of mass of the vehicle; when the vehicle is stationary on a horizontal road surface, the X-axis points forward from the vehicle parallel to the ground, the Z-axis points upward through the center of mass, and the Y-axis points to the left side of the driver.
It should be noted that the contour of the target object may be composed of a plurality of pixels, each pixel corresponding to a pixel coordinate. The coordinates of a plurality of pixels included in the contour of the target object may be converted into the set of 3D points in the second coordinate system. In one understanding, the 3D point set includes a plurality of coordinates in a second coordinate system, each coordinate corresponding to a pixel coordinate in the pixel coordinate system. For example, in the pixel coordinate system, the contour of the target object includes N pixel coordinates, and the N pixel coordinates may be converted into N coordinates in the second coordinate system, where the N coordinates may be referred to as a 3D point set. In one possible solution, the process of converting a pixel coordinate in the pixel coordinate system into a coordinate in the second coordinate system satisfies the following formula:
s · [x, y, 1]ᵀ = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]] · [X, Y, Z]ᵀ
where (x, y) are the coordinates in the pixel coordinate system, i.e. the third coordinate system; (X, Y, Z) are the coordinates in the second coordinate system corresponding to the vision sensor; f_x, f_y, c_x and c_y are the camera intrinsic parameters, with f_x and f_y the focal lengths of the vision sensor in the X-direction and Y-direction respectively, and c_x and c_y the coordinates of the optical center of the vision sensor; and s is the scale factor.
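A small sketch of this conversion follows. Because a monocular image carries no depth, the depth value used below is an assumed placeholder (Z = 1.0); this is sufficient here because the later steps only need the direction of the ray through each contour pixel.

```python
# Sketch: back-project contour pixels into the camera (second) coordinate system
# using the intrinsic parameters fx, fy, cx, cy obtained from calibration.
import numpy as np

def pixels_to_camera_points(contour_px, fx, fy, cx, cy, depth=1.0):
    """contour_px: (M, 2) array of (x, y) pixel coordinates of the contour.
    Returns an (M, 3) array of 3D points (the 3D point set) in the camera frame."""
    x = contour_px[:, 0].astype(float)
    y = contour_px[:, 1].astype(float)
    X = (x - cx) / fx * depth          # invert x = fx * X / Z + cx
    Y = (y - cy) / fy * depth          # invert y = fy * Y / Z + cy
    Z = np.full_like(X, depth)         # assumed placeholder depth
    return np.stack([X, Y, Z], axis=1)
```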
In an example, as shown in fig. 5, P is a target object, an image of the target object P is acquired by using a vision sensor, and image data including the target object P is obtained, and the target object P and a contour of the target object P can be identified in the image data; and converting pixel coordinates included in the contour of the target object P into coordinates in a second coordinate system corresponding to the vision sensor, wherein the coordinates in the second coordinate system corresponding to the contour of the target object P correspond to a 3D point set.
2) Converting the 3D point set from the second coordinate system to the first coordinate system.
In one example, the external (extrinsic) parameters of the vision sensor may be calibrated to obtain a conversion relation T1 from the second coordinate system to the fourth coordinate system, and the external parameters of the detection sensor may be calibrated to obtain a conversion relation T2 from the first coordinate system to the fourth coordinate system; the conversion from the second coordinate system to the first coordinate system then satisfies:
coordinates in the first coordinate system = T2⁻¹ · T1 · (coordinates in the second coordinate system);
as described above, the 3D point set is composed of the coordinates of the plurality of second coordinate systems. In this embodiment, each coordinate in the 3D point set may be converted into a coordinate in the first coordinate system according to the above formula relationship. Alternatively, in the embodiment of the present application, the conversion relationship between the first coordinate system and the second coordinate system may also be directly established, and the conversion of the fourth coordinate system is not performed, which is not limited.
3) In the first coordinate system, curve fitting is carried out on coordinates corresponding to the 3D point set, rays are projected from an origin of the detection sensor, and a 3D conical space is obtained.
As is apparent from the above description, the 3D point set is composed of a plurality of coordinates in the second coordinate system, and through the conversion in step 2) these coordinates can be converted from the second coordinate system to the first coordinate system. Through this operation, a series of coordinate points is obtained in the first coordinate system, and curve fitting can be performed on these points to obtain a fitted curve. In one example, as shown in fig. 6, the series of coordinate points in the first coordinate system obtained by converting the 3D point set may be represented by "circles", and curve fitting may turn these coordinates into, for example, a sinusoidal curve. As shown in fig. 7, assuming that the fitted curve lies on the cross-section with vertices (a, b, c, d), the 3D cone space shown in fig. 7 can be obtained by projecting rays from the origin F of the first coordinate system towards the fitted curve. Of course, the above description takes as an example the case where the points in the first coordinate system corresponding to the 3D point set lie in the same plane or cross-section; whether they actually lie in the same plane or cross-section is not limited.
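The following sketch realises the cone space geometrically and tests which laser points fall inside it. For simplicity the contour points themselves are used as the cross-section polygon instead of an explicitly fitted curve, and the inside test is a ray/plane intersection followed by a 2D point-in-polygon check; this particular realisation is an assumption of the sketch, not the only possible one.

```python
# Sketch: form the 3D cone space from contour points in the laser frame and
# return a mask of laser points lying inside it.
import numpy as np
from matplotlib.path import Path

def points_in_cone(contour_laser, cloud_laser):
    """contour_laser: (M, 3) contour points in the laser (first) coordinate system;
    cloud_laser: (N, 3) laser points. A point is inside the cone if the ray from
    the laser origin through it crosses the contour cross-section."""
    centroid = contour_laser.mean(axis=0)
    # Plane basis (u, v) and normal from an SVD of the centred contour points.
    _, _, vt = np.linalg.svd(contour_laser - centroid)
    u, v, normal = vt[0], vt[1], vt[2]
    polygon = Path(np.column_stack([(contour_laser - centroid) @ u,
                                    (contour_laser - centroid) @ v]))
    denom = cloud_laser @ normal
    t = np.full(len(cloud_laser), -1.0)
    safe = np.abs(denom) > 1e-9
    t[safe] = (centroid @ normal) / denom[safe]
    hits = cloud_laser * t[:, None]                    # ray/plane intersections
    hits_2d = np.column_stack([(hits - centroid) @ u, (hits - centroid) @ v])
    return polygon.contains_points(hits_2d) & (t > 0)  # keep forward rays only
```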
Step 403: and obtaining a plurality of candidate point cloud sets according to the at least one 3D cone space.
It is understood that in the embodiment of the present application, one 3D cone space may be obtained, or a plurality of 3D cone spaces may be obtained. This is mainly because the number of points the detection sensor collects for the same object differs with its distance from the sensor. For example, the closer the object is to the detection sensor, the denser the scan, i.e. the more points the detection sensor collects; the farther the object is from the detection sensor, the sparser the scan, i.e. the fewer points the detection sensor collects. To some extent, when the target object is at different distances from the detection sensor, the 3D cone space projected into the detection sensor coordinate system may differ.
It should be noted that, as can be seen from the above description, a 3D cone space is obtained by curve fitting and projection of the points in the first coordinate system corresponding to a 3D point set, and one 3D point set corresponds to one 3D cone space. In the embodiment of the present application, if a plurality of 3D cone spaces are needed, a plurality of 3D point sets need to be obtained first. One 3D point set can be obtained in the manner described in step 402. Optionally, the process of obtaining a plurality of 3D point sets from one 3D point set may include: scaling the obtained 3D point set proportionally along the ray direction, so that a plurality of 3D point sets are obtained from different magnification factors, where each of the plurality of 3D point sets corresponds to one of the different magnification factors. As shown in fig. 7, E is the origin of the second coordinate system, and one 3D point set is obtained according to the method described in step 402; for ease of distinction, this 3D point set may be referred to as the first 3D point set. All points in the first 3D point set lie in cross-section 1 (a, b, c, d), and cross-section 1 (a, b, c, d) may be enlarged by a magnification m along the ray direction to obtain cross-section 2 (a', b', c', d'); cross-section 2 may include a second 3D point set, and the coordinates of the second 3D point set and the first 3D point set satisfy the relationship given by the magnification m.
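A sketch of generating the additional 3D point sets follows; the particular magnification factors are illustrative assumptions.

```python
# Sketch: derive several 3D point sets from the first one by scaling each contour
# point along its ray from the camera origin by different magnification factors.
import numpy as np

def scaled_point_sets(first_set_cam, magnifications=(1.0, 2.0, 4.0)):
    """first_set_cam: (M, 3) contour points in the camera coordinate system,
    whose rays start at the camera origin E. Returns one scaled copy per factor."""
    return [m * np.asarray(first_set_cam, dtype=float) for m in magnifications]
```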
In the embodiment of the application, a candidate point cloud set can be obtained according to a plurality of 3D cone spaces. The process of processing a 3D cone space to obtain a candidate point cloud set may include: determining a point cloud included in a 3D conical space in a first coordinate system corresponding to a detection sensor; and clustering the point clouds included in the 3D conical space to obtain a plurality of candidate point cloud sets. In one understanding, clustering may be understood as grouping closely spaced points by a threshold, with each candidate point cloud set possibly being a different class of point cloud sets. For example, 3 types of candidate point cloud sets are obtained through clustering, the first type of candidate point cloud set may be a point cloud set corresponding to a bicycle, the second type of candidate point cloud set is a point cloud set corresponding to a pedestrian, and the third type of candidate point cloud set is a candidate point cloud set of a tree.
In an example of the application, the process in step 403 of clustering the point clouds included in the 3D cone space to obtain a plurality of candidate point cloud sets may include: determining a first distance from the point clouds included in the 3D cone space to the origin of the first coordinate system, which may be the average distance of all point clouds included in the 3D cone space to the origin of the first coordinate system; and clustering the point clouds in the 3D cone space according to the distance coefficient corresponding to the first distance to obtain a plurality of candidate point cloud sets. It should be noted that the distance coefficient corresponding to the first distance may also simply be referred to as the coefficient corresponding to the first distance; for convenience of description, the coefficient corresponding to the first distance is used as an example below. In a possible implementation, after the average distance from the point clouds included in the 3D cone space to the origin of the laser coordinate system, i.e. the first distance, is determined, the coefficient corresponding to the first distance may be determined according to a correspondence between distances and coefficients; all point clouds in the 3D cone space are then clustered according to the coefficient corresponding to the first distance, and each resulting class serves as one candidate point cloud set of the target object. In one understanding, the correspondence between distance and coefficient may be the spacing between laterally adjacent points and between longitudinally adjacent points of the detection sensor at different distances. Accordingly, the coefficient corresponding to the first distance essentially consists of two values: the spacing (also referred to as density) between laterally adjacent points and the spacing (also referred to as density) between longitudinally adjacent points of the detection sensor. In one understanding, the detection sensor periodically transmits laser signals, one frame at a time, to scan the objects around it. For example, each frame of laser signals includes 32 line signals, the 32 line signals are arranged longitudinally, each line signal is arranged transversely, and each line signal is composed of a plurality of points. The spacing between laterally adjacent points refers to the distance between adjacent points within one line signal. The spacing between longitudinally adjacent points refers to the distance between corresponding points of any two adjacent line signals. For example, if the 32 line signals are numbered 0 to 31 and each line signal includes 100 points, a laterally adjacent spacing is the distance between any two adjacent points among the 100 points of one line signal, and a longitudinally adjacent spacing is the distance between corresponding points of two adjacent line signals, such as the distance between the 99th point of line signal 0 and the 99th point of line signal 1.
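The sketch below illustrates such distance-dependent clustering. The range-to-radius lookup table, the use of DBSCAN, and the collapse of the lateral/longitudinal spacing pair into a single clustering radius are assumptions made for this sketch; the embodiment only requires some clustering driven by a coefficient that depends on the first distance.

```python
# Sketch: cluster the points inside one 3D cone space with a radius chosen from
# the average range of the points (the "first distance").
import numpy as np
from sklearn.cluster import DBSCAN

# Assumed correspondence between average range (m) and clustering radius (m).
RANGE_TO_EPS = [(20.0, 0.3), (50.0, 0.6), (100.0, 1.2), (float("inf"), 2.0)]

def candidate_sets(cone_points, min_points=5):
    """cone_points: (N, 3) laser points inside one 3D cone space.
    Returns a list of candidate point cloud sets; very small clusters are dropped."""
    mean_range = np.linalg.norm(cone_points, axis=1).mean()      # first distance
    eps = next(e for r, e in RANGE_TO_EPS if mean_range <= r)    # coefficient lookup
    labels = DBSCAN(eps=eps, min_samples=min_points).fit_predict(cone_points)
    return [cone_points[labels == k] for k in set(labels) if k != -1]
```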
It should be noted that, if the number of candidate point cloud sets generated after clustering is greater than a preset threshold, where the threshold may be predefined, preconfigured, factory set, or the like, without limitation, only a part of the point cloud sets may be selected from the candidate point cloud sets for the credibility calculation in the subsequent step 404 and for the selection of the point cloud set corresponding to the target object. For example, in one implementation, after the plurality of candidate point cloud sets are obtained by clustering, if the number of point clouds included in one or more candidate point cloud sets is small, for example smaller than a second threshold, where the second threshold may likewise be preconfigured, predefined, or set at the factory, those small candidate point cloud sets may be regarded as discrete points, i.e. noise, and removed in advance so that they do not participate further in selecting the point cloud set corresponding to the target object in the subsequent step 404.
Step 404: and determining a point cloud set of the target object in the plurality of candidate point cloud sets.
Optionally, the point cloud set of the target object may be the point cloud set that meets the condition of the target object among the plurality of candidate point cloud sets. The condition of the target object may be that the point cloud set has the highest credibility among the candidate point cloud sets, as described in the first scheme below, or that the credibility of the point cloud set among the candidate point cloud sets is greater than or equal to a first threshold, as described in the second scheme below.
In the first scheme, the credibility of each of the plurality of candidate point cloud sets can be obtained, and the candidate point cloud set with the highest credibility is determined as the point cloud set of the target object.
In one scheme, the credibility of each candidate point cloud set can be obtained as follows: calculating the distance from all points in the candidate point cloud set to the origin of the first coordinate system, which may be the average distance of all the points; and acquiring historical data, where the historical data is the number of point clouds corresponding to the target object at different distances. For example, as shown in Table 1, a large number of statistics may yield the following historical data: the number of point clouds collected by the detection sensor should be X1 when the distance between the target object and the detection sensor is A, X2 when the distance is B, and X3 when the distance is C.
TABLE 1
Distance between target object and detection sensor    Number of point clouds that should be collected
A                                                       X1
B                                                       X2
C                                                       X3
In the embodiment of the application, according to the distance between the point clouds in each candidate point cloud set and the origin, the distance in the historical data that best matches this distance is found. For example, suppose the values of A, B and C are 50 meters, 100 meters and 150 meters, respectively. For a candidate point cloud set whose distance from the origin is 52 meters, the distance A of 50 meters can be regarded as the best match. Then, the credibility of the candidate point cloud set is calculated from the number of point clouds actually included in the candidate point cloud set and the number of point clouds that should be included according to the historical data. For example, continuing the example, the candidate point cloud set at a distance of 52 meters from the origin includes 95 actually scanned point clouds; in the historical data, when the distance between the target object and the detection sensor is 50 meters, 100 point clouds should be collected, so the credibility of the candidate point cloud set may be 1 - (100 - 95)/100 = 95%.
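A sketch of this credibility calculation follows. The concrete expected-count values stand in for X1, X2 and X3 and are assumptions; the absolute-difference form simply keeps the score within [0, 1] when the actual count exceeds the expected one.

```python
# Sketch: credibility of a candidate point cloud set from historical data.
import numpy as np

# Assumed historical data: (distance in metres, expected number of point clouds).
HISTORY = [(50.0, 100), (100.0, 40), (150.0, 15)]   # placeholders for X1, X2, X3

def confidence(candidate_points):
    """candidate_points: (K, 3) array in the laser (first) coordinate system."""
    mean_range = np.linalg.norm(candidate_points, axis=1).mean()
    # Historical entry whose distance best matches the set's average range.
    expected = min(HISTORY, key=lambda de: abs(de[0] - mean_range))[1]
    actual = len(candidate_points)
    return max(0.0, 1.0 - abs(expected - actual) / expected)
```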
Alternatively, in the second scheme, after the credibility of the plurality of candidate point cloud sets is determined, a candidate point cloud set whose credibility is greater than or equal to a first threshold may be selected as the point cloud set of the target object, where the first threshold may be preconfigured, predefined, or factory set, without limitation. In a possible implementation, when multiple candidate point cloud sets have credibility greater than or equal to the first threshold, all of them may be taken as point cloud sets of the target object; alternatively, one or more of these candidate point cloud sets may be selected as the point cloud set of the target object. The rule for selecting one or more candidate point cloud sets from the plurality of candidate point cloud sets is not limited and may be determined according to the specific implementation scenario, for example random selection or a predefined screening rule.
According to the method and the device, the point cloud set corresponding to the target object can be obtained without obtaining the depth information of the target object, and the complexity of the scheme is reduced. Meanwhile, the outlines of the target object and the target object are identified by means of an AI algorithm, so that the accuracy of the scheme for identifying the point cloud set of the target object can be improved.
In the embodiment of the present application, after the point cloud set of the target object is obtained through the above steps 401 to 404, the use of the point cloud set of the target object is not limited. For example, the point cloud set of the target object may be used to identify the target object. Alternatively, the point cloud set of the target object may be removed from the point cloud data acquired by the detection sensor, so as to obtain a more accurate map, positioning information, or the like.
In an example, taking a detection sensor as a laser sensor as an example, as shown in fig. 8, the application of the embodiment of the present application is described in detail:
1) A data acquisition vehicle is provided with a laser sensor, which scans surrounding objects to obtain point cloud sets corresponding to them. Because some objects around the data acquisition vehicle are temporary objects, they are not suitable as part of a map and would affect positioning precision. Temporary objects may include moving objects (e.g., pedestrians, moving cars) and movable objects (e.g., stationary bicycles and cars). Therefore, in the embodiment of the present application, if the vision sensor installed on the data acquisition vehicle finds that a temporary object exists around the vehicle, an image including the temporary object may be collected, and by using the method in the embodiment shown in fig. 4, this image may be used to identify the point cloud set corresponding to the temporary object. The point cloud set of the temporary object is then removed from the point cloud sets of all objects around the laser sensor. Real-world coordinates corresponding to each point cloud set, such as Global Positioning System (GPS) coordinates or BeiDou coordinates, are determined by using a current map-making algorithm, thereby completing the making of the high-precision map.
2) An automatic driving automobile is provided with a laser sensor, which scans surrounding objects to obtain point cloud sets corresponding to them. Then, if a temporary object is found around the vision sensor installed on the automatic driving automobile, an image including the temporary object is acquired; similarly, by using the method in the embodiment shown in fig. 4, the image of the temporary object is used to identify the point cloud set corresponding to the temporary object, and the point cloud set of the temporary object is removed from the point cloud sets of surrounding objects acquired by the laser sensor. Finally, the point cloud set without the temporary object is matched against the obtained high-precision map by using a positioning algorithm to obtain the position information of the automatic driving automobile.
By the method, temporary objects in the laser point cloud can be automatically removed in real time, the map making efficiency and precision are improved when a high-precision map is made, and hardware, labor and time costs are reduced; the matching degree of the point cloud and the map can be improved during real-time positioning, and the positioning precision is improved. In addition, since the temporary object is removed in the process of making the map and positioning, the accuracy of recognizing the position of the target object and the like can be improved.
As shown in fig. 9, taking the detection sensor as a laser sensor and the vision sensor as a monocular camera as an example, a scheme for removing a cloud set of target object points is provided, which at least includes:
1. In the laser coordinate system, calculate the 3D space range where the target object is located
1) Image data acquired by the monocular camera is obtained.
2) The target object is identified by an AI algorithm. Optionally, there may be multiple target objects. For each target object, a corresponding point cloud set is determined; then the point cloud set corresponding to each object is removed from all the point cloud data of surrounding objects acquired by the detection sensor. How to identify the point cloud set corresponding to each target object is described below; the following description takes identifying and removing the point cloud set of a single target object as an example.
3) The contour of the target object is identified by an AI algorithm to obtain the coordinates of the contour in the pixel coordinate system.
4) The contour of the target object on the pixel coordinate system is converted into a set of 3D points on the camera coordinate system.
For example, the contour points (positions) in the pixel coordinate system are converted into a set of 3D points (positions) in the camera coordinate system. New 3D point sets are then generated at positions scaled by equal multiples along the ray directions of the 3D contour point set (with the rays originating at the origin of the camera coordinate system), so that a plurality of corresponding 3D point sets are obtained from different multiples.
5) The 3D point sets are converted from the camera coordinate system to the laser coordinate system.
6) In the laser coordinate system, the 3D space range formed by rays from the origin to the contour points is calculated.
For example, curve fitting is performed on each 3D contour point set to obtain a plurality of 3D contour curves. For each 3D contour curve, rays are cast from the laser origin to the curve to form a cone-shaped 3D space, each cone-shaped 3D space being a 3D space range of the target object.
2. Generating multiple candidate point cloud sets of a target object
1) The point clouds within each 3D space range are obtained according to geometric conversion.
2) The points in each 3D space range are clustered to obtain a plurality of candidate point cloud sets.
3. Selecting the best candidate point cloud set of the target object and removing
1) The credibility of each candidate point cloud set is calculated according to the category of the target object and the number and distance of the point clouds.
2) For each target object, the candidate point cloud sets whose credibility is highest and exceeds a threshold are selected as the point cloud sets of the target object.
3) The point clouds corresponding to the target object are removed from all the point clouds of surrounding objects acquired by the laser sensor, yielding a laser point cloud that does not contain the target object. The laser point cloud not including the target object may be used for positioning, making a high-precision map, identifying the target object, or the like, without limitation.
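Tying the earlier sketches together, the end-to-end removal flow of fig. 9 could look roughly as follows; it assumes the illustrative helper functions sketched above (detect_contours, pixels_to_camera_points, camera_to_laser, points_in_cone, candidate_sets, confidence) and an assumed credibility threshold.

```python
# End-to-end sketch of the fig. 9 flow (all helper functions and the threshold
# are the illustrative assumptions introduced earlier in this description).
import numpy as np

def remove_temporary_objects(image, cloud_laser, intrinsics, T1, T2, conf_thresh=0.6):
    """Returns the laser point cloud with the point cloud sets of all detected
    temporary objects removed, i.e. the cloud used for mapping or positioning."""
    fx, fy, cx, cy = intrinsics
    keep = np.ones(len(cloud_laser), dtype=bool)
    for _label, contour_px in detect_contours(image):
        cam_pts = pixels_to_camera_points(contour_px, fx, fy, cx, cy)
        contour_laser = camera_to_laser(cam_pts, T1, T2)
        in_cone = np.flatnonzero(points_in_cone(contour_laser, cloud_laser))
        if in_cone.size == 0:
            continue
        clusters = candidate_sets(cloud_laser[in_cone])
        if not clusters:
            continue
        best = max(clusters, key=confidence)             # highest credibility
        if confidence(best) < conf_thresh:               # threshold is an assumption
            continue
        # Mark the rows of the selected candidate set for removal (match by value).
        best_rows = {tuple(np.round(p, 6)) for p in best}
        for i in in_cone:
            if tuple(np.round(cloud_laser[i], 6)) in best_rows:
                keep[i] = False
    return cloud_laser[keep]
```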
It should be noted that the embodiment shown in fig. 9 and the embodiment shown in fig. 4 may refer to each other; details not described for the embodiment shown in fig. 9 can be found in the description of the embodiment shown in fig. 4. According to the method, a monocular camera is adopted and only an AI algorithm is used to identify the target object and its contour; the accurate 3D position of the target object does not need to be acquired with the aid of a multi-view camera or a depth camera, and the target object in the laser point cloud can still be identified.
The method of the embodiments of the present application is described in detail above with reference to fig. 1 to 9. The apparatus provided by the embodiments of the present application is described in detail below with reference to fig. 10 and 11. It is to be understood that the description of the apparatus embodiments corresponds to the description of the method embodiments; therefore, for content not described in detail, reference may be made to the above method embodiments.
Fig. 10 is a schematic block diagram of an apparatus 1000 provided in an embodiment of the present application, for implementing the above-described function of determining a target object point cloud set. The apparatus may be, for example, a software module or a chip system. The chip system may be formed of chips, and may also include chips and other discrete devices. The apparatus 1000 includes an obtaining unit 1001 and a processing unit 1002, where the obtaining unit 1001 may communicate with other devices and may also be referred to as a communication interface, a transceiver unit, or an input/output interface. Alternatively, the apparatus 1000 may be a vehicle-mounted terminal, or a chip or a circuit configured in the vehicle-mounted terminal. Alternatively, the apparatus 1000 may be an on-vehicle central processing unit, or a chip or a circuit configured in the on-vehicle central processing unit. Alternatively, the apparatus 1000 may be a cockpit domain controller (CDC), or a chip or circuit configured in the CDC. Alternatively, the apparatus 1000 may be a detection sensor, or a chip or circuit configured in the detection sensor. Optionally, the detection sensor may be a laser sensor or the like.
In a possible implementation scheme, the obtaining unit 1001 is configured to perform the transceiving related operations in the foregoing method embodiments, and the processing unit 1002 is configured to perform the processing related operations in the foregoing method embodiments.
For example, an acquisition unit 1001 for acquiring image data from a vision sensor and point cloud data from a detection sensor; the processing unit 1002 is configured to obtain at least one three-dimensional 3D conical space corresponding to a target object in a first coordinate system, where the target object is located in an image indicated by the image data, and the first coordinate system is a coordinate system corresponding to the detection sensor, obtain a plurality of candidate point cloud sets according to the at least one 3D conical space, and determine a point cloud set of the target object in the plurality of candidate point cloud sets.
In one possible design, the point cloud set of the target object is used to identify the target object.
In another possible design, the processing unit 1002 is further configured to obtain second point cloud data, where the second point cloud data is obtained by removing the point cloud set of the target object from the point cloud data of the detection sensor.
Optionally, the obtaining at least one three-dimensional 3D cone space corresponding to the target object in the first coordinate system includes: identifying the target object and a contour of the target object in the image data; obtaining at least one 3D point set in a second coordinate system according to the contour of the target object, wherein the second coordinate system is a coordinate system corresponding to the vision sensor;
for each set of 3D points in the second coordinate system, performing the following operations: converting the set of 3D points from the second coordinate system to the first coordinate system; in the first coordinate system, curve fitting is performed on each 3D point set, and rays are projected from the origin of the first coordinate system, so that the 3D cone space is obtained.
Illustratively, the obtaining at least one 3D point set in a second coordinate system according to the contour of the target object includes: converting the pixel coordinates of the contour of the target object in the image data into a 3D point set in the second coordinate system according to the intrinsic parameters (internal reference) of the vision sensor corresponding to the second coordinate system.
Optionally, the processing unit 1002 is further configured to obtain a plurality of 3D point sets corresponding to a plurality of magnification factors according to the 3D point set.
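A sketch of how a plurality of 3D point sets for a plurality of magnification factors could be generated from one contour point set, by scaling the contour about its centroid; the factors shown are illustrative values, not values specified by the embodiments.

```python
import numpy as np

def scaled_point_sets(points_cam, factors=(0.9, 1.0, 1.1)):
    """Scale the contour's 3D point set (Mx3, vision-sensor frame) about its centroid,
    yielding one 3D point set, and hence one 3D cone space, per magnification factor."""
    centroid = points_cam.mean(axis=0)
    return [centroid + f * (points_cam - centroid) for f in factors]
```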
For example, the converting the set of 3D points from the second coordinate system to the first coordinate system includes: converting the 3D point set from the second coordinate system to the first coordinate system according to a first conversion relation from the second coordinate system to the vehicle body coordinate system and a second conversion relation from the first coordinate system to the vehicle body coordinate system.
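A sketch of the two-step conversion via the vehicle body coordinate system, assuming the first and second conversion relations are available as 4x4 homogeneous transforms (names illustrative):

```python
import numpy as np

def camera_to_lidar_points(points_cam, T_body_from_cam, T_body_from_lidar):
    """Convert a 3D point set from the second (vision sensor) coordinate system to
    the first (detection sensor) coordinate system through the vehicle body frame.

    T_body_from_cam   : first conversion relation, camera frame -> body frame
    T_body_from_lidar : second conversion relation, detection sensor frame -> body frame
    """
    T_lidar_from_cam = np.linalg.inv(T_body_from_lidar) @ T_body_from_cam
    homo = np.hstack([points_cam, np.ones((len(points_cam), 1))])   # Mx4 homogeneous points
    return (T_lidar_from_cam @ homo.T).T[:, :3]
```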
For example, the obtaining a plurality of candidate point cloud sets according to the at least one 3D cone space includes: for each 3D cone space, performing the following operations: determining a first distance from the point cloud included in the 3D cone space to the origin of the first coordinate system; and clustering the point cloud in the 3D cone space according to a distance coefficient corresponding to the first distance, to obtain the plurality of candidate point cloud sets.
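A sketch of the distance-dependent clustering inside one 3D cone space. The description above associates a distance coefficient with the first distance; the sketch simplifies this to a single clustering radius per cone space derived from the median range, and uses DBSCAN as a stand-in clustering algorithm, since no particular algorithm is specified. The numeric values are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_points_in_frustum(point_cloud, frustum, base_eps=0.4, range_coeff=0.02):
    """Cluster the points inside one 3D cone space into candidate point cloud sets."""
    mask = points_in_frustum(point_cloud, frustum)        # membership test sketched earlier
    pts = point_cloud[mask]
    if len(pts) == 0:
        return []
    first_distance = np.linalg.norm(pts, axis=1)          # range to the first coordinate system origin
    eps = base_eps + range_coeff * float(np.median(first_distance))   # radius grows with range
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(pts)
    return [pts[labels == k] for k in set(labels) if k != -1]         # -1 marks noise points
```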
Optionally, the point cloud set of the target object is a candidate point cloud set, among the plurality of candidate point cloud sets, that satisfies a condition of the target object; and/or the confidence level of the point cloud set of the target object is greater than or equal to a first threshold.
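A sketch of the final selection step, under the assumption that the condition of the target object is closeness of a candidate's bounding-box extent to an expected object size and that the confidence is derived from that closeness; both the condition and the confidence model are assumptions, and min_confidence stands in for the first threshold.

```python
import numpy as np

def select_target_set(candidates, expected_size=(4.5, 1.8, 1.6), min_confidence=0.5):
    """Pick the target object's point cloud set among the candidate point cloud sets.
    The default expected_size is an illustrative car-sized extent in meters."""
    best, best_conf = None, 0.0
    for pts in candidates:
        extent = pts.max(axis=0) - pts.min(axis=0)                 # axis-aligned bounding-box size
        err = np.linalg.norm(extent - np.asarray(expected_size))   # deviation from the expected size
        conf = 1.0 / (1.0 + err)                                   # illustrative confidence model
        if conf >= min_confidence and conf > best_conf:
            best, best_conf = pts, conf
    return best
```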
The division into units in the embodiments of the present application is schematic and is merely a division by logical function; there may be other division manners in actual implementation. In addition, in the embodiments of the present application, the functional units may each be integrated into one processor, may each exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
Referring to Fig. 11, Fig. 11 is a schematic structural diagram of an apparatus 1100 according to an embodiment of the present application. The apparatus 1100 may be a detection sensor or a vehicle, or a component in the detection sensor or the vehicle, such as a chip or an integrated circuit. The apparatus 1100 may include at least one processor 1102 and a communication interface 1104. Further optionally, the apparatus may include at least one memory 1101. Still further optionally, a bus 1103 may be included, where the memory 1101, the processor 1102, and the communication interface 1104 are coupled through the bus 1103.
The memory 1101 is used to provide a storage space, and data such as an operating system and a computer program may be stored in the storage space. The memory 1101 may be one or a combination of a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a compact disc read-only memory (CD-ROM), and the like.
The processor 1102 is a module that performs arithmetic and/or logical operations, and may specifically be one or a combination of processing modules such as a Central Processing Unit (CPU), a graphics processing unit (GPU), a microprocessor unit (MPU), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Complex Programmable Logic Device (CPLD), a coprocessor (which assists the central processing unit in completing corresponding processing and applications), and a Micro Control Unit (MCU).
The communication interface 1104 may be used to provide information input or output for the at least one processor, and/or may be used to receive data from or send data to the outside. It may be a wired link interface including, for example, an Ethernet cable, or a wireless link interface (Wi-Fi, Bluetooth, general wireless transmission, a vehicle-mounted short-range communication technology, or the like). Optionally, the communication interface 1104 may further include a transmitter (for example, a radio frequency transmitter or an antenna) or a receiver coupled to the interface.
The processor 1102 in the apparatus 1100 is configured to read the computer program stored in the memory 1101 and to execute the foregoing method for determining a point cloud set of a target object, for example, the method described in the embodiment shown in Fig. 4.
For example, the processor 1102 in the apparatus 1100 is configured to read the computer program stored in the memory 1101 and to perform the following operations:
acquiring image data from a vision sensor and point cloud data from a detection sensor; obtaining at least one three-dimensional (3D) cone space corresponding to a target object in a first coordinate system, where the target object is located in the image indicated by the image data and the first coordinate system is the coordinate system corresponding to the detection sensor; obtaining a plurality of candidate point cloud sets according to the at least one 3D cone space; and determining the point cloud set of the target object among the plurality of candidate point cloud sets.
In one possible design, the point cloud set of the target object is used to identify the target object.
In another possible design, the processor 1102 is further configured to acquire second point cloud data, where the second point cloud data is obtained by removing the point cloud set of the target object from the point cloud data of the detection sensor.
Optionally, the obtaining at least one three-dimensional 3D cone space corresponding to the target object in the first coordinate system includes: identifying the target object and a contour of the target object in the image data; obtaining at least one 3D point set in a second coordinate system according to the contour of the target object, wherein the second coordinate system is a coordinate system corresponding to the vision sensor;
for each set of 3D points in the second coordinate system, performing the following operations: converting the set of 3D points from the second coordinate system to the first coordinate system; in the first coordinate system, curve fitting is performed on each 3D point set, and rays are projected from the origin of the first coordinate system, so that the 3D cone space is obtained.
Illustratively, the obtaining at least one 3D point set in a second coordinate system according to the contour of the target object includes: converting the pixel coordinates of the contour of the target object in the image data into a 3D point set in the second coordinate system according to the intrinsic parameters (internal reference) of the vision sensor corresponding to the second coordinate system.
Optionally, the processor 1102 is further configured to obtain a plurality of 3D point sets corresponding to a plurality of magnification factors according to the 3D point set.
For example, the converting the set of 3D points from the second coordinate system to the first coordinate system includes: converting the 3D point set from the second coordinate system to the first coordinate system according to a first conversion relation from the second coordinate system to the vehicle body coordinate system and a second conversion relation from the first coordinate system to the vehicle body coordinate system.
For example, the obtaining a plurality of candidate point cloud sets according to the at least one 3D cone space includes: for each 3D cone space, performing the following operations: determining a first distance from the point cloud included in the 3D cone space to the origin of the first coordinate system; and clustering the point cloud in the 3D cone space according to a distance coefficient corresponding to the first distance, to obtain the plurality of candidate point cloud sets.
Optionally, the point cloud set of the target object is a candidate point cloud set, among the plurality of candidate point cloud sets, that satisfies a condition of the target object; and/or the confidence level of the point cloud set of the target object is greater than or equal to a first threshold.
An embodiment of the present application further provides a sensor or a fusion device, where the sensor may be a laser sensor or another sensor, for example, a laser radar. In one design, the sensor or the fusion device includes at least one controller, and the at least one controller may include the apparatus described above in Fig. 10 or Fig. 11. In another design, the sensor or the fusion device includes the apparatus shown in Fig. 10 or Fig. 11, and the apparatus may be provided separately or integrated with the at least one controller included in the sensor or the fusion device.
An embodiment of the present application further provides a terminal, and the terminal may include the apparatus described in Fig. 10 or Fig. 11, or the sensor or fusion device provided in the foregoing embodiment. Optionally, the terminal may be an intelligent transportation device (for example, a vehicle or an unmanned aerial vehicle), a smart home device, an intelligent manufacturing device, or a robot. The intelligent transportation device may be, for example, an Automated Guided Vehicle (AGV) or an unmanned transport vehicle.
Embodiments of the present application also provide a system, which includes the apparatus shown in Fig. 10 or Fig. 11, a detection sensor, and a vision sensor.
Further, this application also provides an apparatus, which includes modules configured to implement the foregoing method embodiments. Alternatively, the apparatus includes a processor and an interface circuit, where the processor is configured to communicate with other devices through the interface circuit and to perform the methods in the foregoing method embodiments. Alternatively, the apparatus includes a processor, configured to invoke a program stored in a memory to perform the method in the foregoing method embodiments.
Embodiments of the present application also provide a readable storage medium, which includes instructions that, when executed on a computer, cause the computer to perform the method in the foregoing method embodiments.
An embodiment of the present application further provides a chip system. The chip system includes a processor, may further include a memory, and is configured to implement the method in the foregoing method embodiments. The chip system may consist of a chip, or may include a chip and other discrete devices.
Embodiments of the present application also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the above method embodiments.
In the embodiments of the present application, the processor may be a general-purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
In the embodiments of the present application, the memory may be a nonvolatile memory, such as a Hard Disk Drive (HDD) or a solid-state drive (SSD), or may be a volatile memory, for example, a random-access memory (RAM). The memory may alternatively be any other medium that can be used to carry or store expected program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory in the embodiments of the present application may further be a circuit or any other apparatus capable of implementing a storage function, and is configured to store program instructions and/or data.
The methods provided in the embodiments of the present application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used for implementation, the methods may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network appliance, a user device, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or a data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a Digital Video Disk (DVD)), a semiconductor medium (for example, an SSD), or the like.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (20)

1. A method for determining a point cloud set of a target object, comprising:
acquiring image data from a vision sensor and point cloud data from a detection sensor;
obtaining at least one three-dimensional 3D cone space corresponding to a target object in a first coordinate system, the target object being located in an image indicated by the image data, the first coordinate system being a coordinate system corresponding to the detection sensor;
obtaining a plurality of candidate point cloud sets according to the at least one 3D cone space;
determining, in the plurality of candidate point cloud sets, a point cloud set of the target object, wherein the point cloud set of the target object is used to identify the target object.
2. A method for determining a point cloud set of a target object, comprising:
acquiring image data from a vision sensor and point cloud data from a detection sensor;
obtaining at least one three-dimensional 3D cone space corresponding to a target object in a first coordinate system, the target object being located in an image indicated by the image data, the first coordinate system being a coordinate system corresponding to the detection sensor;
obtaining a plurality of candidate point cloud sets according to the at least one 3D cone space;
determining a point cloud set of the target object in the plurality of candidate point cloud sets;
and acquiring second point cloud data, wherein the second point cloud data is obtained by removing the point cloud set of the target object from the point cloud data of the detection sensor.
3. The method of claim 1 or 2, wherein obtaining at least one three-dimensional 3D cone space corresponding to the target object in the first coordinate system comprises:
identifying the target object and a contour of the target object in the image data;
obtaining at least one 3D point set in a second coordinate system according to the contour of the target object, wherein the second coordinate system is a coordinate system corresponding to the vision sensor;
for each set of 3D points in the second coordinate system, performing the following operations:
converting the set of 3D points from the second coordinate system to the first coordinate system;
in the first coordinate system, curve fitting is performed on each 3D point set, and rays are projected from the origin of the first coordinate system, so that the 3D cone space is obtained.
4. The method of claim 3, wherein said deriving at least one 3D set of points in a second coordinate system from the contour of the target object comprises:
and converting the pixel coordinates of the contour of the target object in the image data into a 3D point set in the second coordinate system according to the internal reference of the second coordinate system.
5. The method of claim 3 or 4, further comprising:
and obtaining a plurality of 3D point sets corresponding to a plurality of magnification factors according to the 3D point sets.
6. The method of any of claims 3-5, wherein said converting the set of 3D points from the second coordinate system to the first coordinate system comprises:
and converting the 3D point set from the second coordinate system to the first coordinate system according to the first conversion relation from the second coordinate system to the vehicle body coordinate system and the second conversion relation from the first coordinate system to the vehicle body coordinate system.
7. The method of any one of claims 1 to 6, wherein said deriving a plurality of candidate point cloud sets from the at least one 3D cone space comprises:
for each 3D cone space, the following operations are performed:
determining a first distance of a point cloud included in the 3D cone space to a first coordinate system origin;
and clustering the point cloud sets in the 3D conical space according to the distance coefficient corresponding to the first distance to obtain the candidate point cloud sets.
8. The method of claim 7, wherein the point cloud set of the target object is a point cloud set, in the plurality of candidate point cloud sets, that satisfies a condition of the target object.
9. The method of any of claims 1-8, wherein a confidence level of the point cloud set of the target object is greater than or equal to a first threshold.
10. An apparatus, comprising:
an acquisition unit for acquiring image data from a vision sensor and point cloud data from a detection sensor;
the processing unit is configured to obtain at least one three-dimensional 3D conical space corresponding to a target object in a first coordinate system, where the target object is located in an image indicated by the image data, the first coordinate system is a coordinate system corresponding to the detection sensor, obtain a plurality of candidate point cloud sets according to the at least one 3D conical space, and determine a point cloud set of the target object in the candidate point cloud sets, where the point cloud set of the target object is used to identify the target object.
11. An apparatus, comprising:
an acquisition unit for acquiring image data from a vision sensor and point cloud data from a detection sensor;
the processing unit is configured to obtain at least one three-dimensional 3D conical space corresponding to a target object in a first coordinate system, where the target object is located in an image indicated by the image data, the first coordinate system is a coordinate system corresponding to the detection sensor, obtain a plurality of candidate point cloud sets according to the at least one 3D conical space, determine a point cloud set of the target object in the candidate point cloud sets, and obtain second point cloud data, where the second point cloud data is obtained by removing the point cloud set of the target object from the point cloud data of the detection sensor.
12. The apparatus according to claim 10 or 11, wherein obtaining at least one three-dimensional 3D cone space corresponding to the target object in the first coordinate system comprises:
identifying the target object and a contour of the target object in the image data;
obtaining at least one 3D point set in a second coordinate system according to the contour of the target object, wherein the second coordinate system is a coordinate system corresponding to the vision sensor;
for each set of 3D points in the second coordinate system, performing the following operations:
converting the set of 3D points from the second coordinate system to the first coordinate system;
in the first coordinate system, curve fitting is performed on each 3D point set, and rays are projected from the origin of the first coordinate system, so that the 3D cone space is obtained.
13. The apparatus of claim 12, wherein said deriving at least one set of 3D points in a second coordinate system from the contour of the target object comprises:
and converting the pixel coordinates of the contour of the target object in the image data into a 3D point set in the second coordinate system according to the internal reference of the second coordinate system.
14. The apparatus of claim 12 or 13, wherein the processing unit is further configured to:
obtain a plurality of 3D point sets corresponding to a plurality of magnification factors according to the 3D point set.
15. The apparatus of any of claims 12 to 14, wherein said converting the set of 3D points from the second coordinate system to the first coordinate system comprises:
and converting the 3D point set from the second coordinate system to the first coordinate system according to the first conversion relation from the second coordinate system to the vehicle body coordinate system and the second conversion relation from the first coordinate system to the vehicle body coordinate system.
16. The apparatus of any one of claims 10 to 15, wherein said deriving a plurality of candidate point cloud sets from the at least one 3D cone space comprises:
for each 3D cone space, the following operations are performed:
determining a first distance of a point cloud included in the 3D cone space to a first coordinate system origin;
and clustering the point cloud sets in the 3D conical space according to the distance coefficient corresponding to the first distance to obtain the candidate point cloud sets.
17. The apparatus of claim 16, wherein the point cloud set of the target object is a point cloud set, in the plurality of candidate point cloud sets, that satisfies a condition of the target object.
18. The apparatus of any of claims 10 to 17, wherein a confidence level of the point cloud set of the target object is greater than or equal to a first threshold.
19. An apparatus comprising at least one processor and at least one memory, the at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus to perform the method of any one of claims 1 to 9.
20. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 9.
CN202080047348.XA 2020-10-30 2020-10-30 Method and device for determining target object point cloud set Active CN114051628B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/125536 WO2022088104A1 (en) 2020-10-30 2020-10-30 Method and apparatus for determining point cloud set corresponding to target object

Publications (2)

Publication Number Publication Date
CN114051628A true CN114051628A (en) 2022-02-15
CN114051628B CN114051628B (en) 2023-04-04

Family

ID=80205795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080047348.XA Active CN114051628B (en) 2020-10-30 2020-10-30 Method and device for determining target object point cloud set

Country Status (2)

Country Link
CN (1) CN114051628B (en)
WO (1) WO2022088104A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114557640A (en) * 2022-02-21 2022-05-31 广州宝乐软件科技有限公司 Cleaning robot and data processing method and device thereof

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115061106B (en) * 2022-08-16 2022-11-11 中国科学院空天信息创新研究院 Method, device, equipment and medium for determining visibility of spatial target in pyramid airspace
CN115904188A (en) * 2022-11-21 2023-04-04 北京城市网邻信息技术有限公司 Method and device for editing house-type graph, electronic equipment and storage medium
CN116275536B (en) * 2023-03-16 2024-03-12 中船重工安谱(湖北)仪器有限公司 Chip silk screen removing device and method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598743A (en) * 2019-08-12 2019-12-20 北京三快在线科技有限公司 Target object labeling method and device
CN111289998A (en) * 2020-02-05 2020-06-16 北京汽车集团有限公司 Obstacle detection method, obstacle detection device, storage medium, and vehicle
CN111461023A (en) * 2020-04-02 2020-07-28 山东大学 Method for quadruped robot to automatically follow pilot based on three-dimensional laser radar
US20200272816A1 (en) * 2019-02-22 2020-08-27 Here Global B.V. Scalable three dimensional object segmentation
CN111638528A (en) * 2020-05-26 2020-09-08 北京百度网讯科技有限公司 Positioning method, positioning device, electronic equipment and storage medium
CN111652179A (en) * 2020-06-15 2020-09-11 东风汽车股份有限公司 Semantic high-precision map construction and positioning method based on dotted line feature fusion laser
CN111815707A (en) * 2020-07-03 2020-10-23 北京爱笔科技有限公司 Point cloud determining method, point cloud screening device and computer equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10694103B2 (en) * 2018-04-24 2020-06-23 Industrial Technology Research Institute Building system and building method for panorama point cloud
CN109635700B (en) * 2018-12-05 2023-08-08 深圳市易成自动驾驶技术有限公司 Obstacle recognition method, device, system and storage medium
CN110782531A (en) * 2019-09-16 2020-02-11 华为技术有限公司 Method and computing device for processing three-dimensional point cloud data

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200272816A1 (en) * 2019-02-22 2020-08-27 Here Global B.V. Scalable three dimensional object segmentation
CN110598743A (en) * 2019-08-12 2019-12-20 北京三快在线科技有限公司 Target object labeling method and device
CN111289998A (en) * 2020-02-05 2020-06-16 北京汽车集团有限公司 Obstacle detection method, obstacle detection device, storage medium, and vehicle
CN111461023A (en) * 2020-04-02 2020-07-28 山东大学 Method for quadruped robot to automatically follow pilot based on three-dimensional laser radar
CN111638528A (en) * 2020-05-26 2020-09-08 北京百度网讯科技有限公司 Positioning method, positioning device, electronic equipment and storage medium
CN111652179A (en) * 2020-06-15 2020-09-11 东风汽车股份有限公司 Semantic high-precision map construction and positioning method based on dotted line feature fusion laser
CN111815707A (en) * 2020-07-03 2020-10-23 北京爱笔科技有限公司 Point cloud determining method, point cloud screening device and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MA Dehua et al., "Refined detection of vehicle point cloud targets with fully convolutional neural networks," Science of Surveying and Mapping (《测绘科学》) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114557640A (en) * 2022-02-21 2022-05-31 广州宝乐软件科技有限公司 Cleaning robot and data processing method and device thereof
CN114557640B (en) * 2022-02-21 2023-08-01 广州宝乐软件科技有限公司 Data processing method and device of cleaning robot and cleaning robot

Also Published As

Publication number Publication date
CN114051628B (en) 2023-04-04
WO2022088104A1 (en) 2022-05-05

Similar Documents

Publication Publication Date Title
CN114051628B (en) Method and device for determining target object point cloud set
KR102210715B1 (en) Method, apparatus and device for determining lane lines in road
CN108319655B (en) Method and device for generating grid map
US11250288B2 (en) Information processing apparatus and information processing method using correlation between attributes
WO2018068653A1 (en) Point cloud data processing method and apparatus, and storage medium
CN110226186B (en) Method and device for representing map elements and method and device for positioning
CN112270306B (en) Unmanned vehicle track prediction and navigation method based on topological road network
US20210358153A1 (en) Detection methods, detection apparatuses, electronic devices and storage media
WO2022188663A1 (en) Target detection method and apparatus
US11204610B2 (en) Information processing apparatus, vehicle, and information processing method using correlation between attributes
US20220373353A1 (en) Map Updating Method and Apparatus, and Device
US10996337B2 (en) Systems and methods for constructing a high-definition map based on landmarks
CN113743171A (en) Target detection method and device
EP3703008A1 (en) Object detection and 3d box fitting
EP4024974A1 (en) Data processing method and apparatus, chip system, and medium
US20220205804A1 (en) Vehicle localisation
CN111460866B (en) Lane line detection and driving control method and device and electronic equipment
CN113724387A (en) Laser and camera fused map construction method
CN114241448A (en) Method and device for obtaining heading angle of obstacle, electronic equipment and vehicle
CN115147333A (en) Target detection method and device
CN111275715A (en) Point cloud segmentation method and device, electronic equipment, storage medium and robot
CN112823353A (en) Object localization using machine learning
CN112433193B (en) Multi-sensor-based mold position positioning method and system
CN116710809A (en) System and method for monitoring LiDAR sensor health
WO2021004813A1 (en) Method and mobile entity for detecting feature points in an image

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant