WO2023202401A1 - Method and apparatus for detecting target in point cloud data, and computer-readable storage medium - Google Patents


Info

Publication number
WO2023202401A1
Authority
WO
WIPO (PCT)
Prior art keywords
point
relative
key
key point
vector
Application number
PCT/CN2023/087273
Other languages
French (fr)
Chinese (zh)
Inventor
潘滢炜
李栋
邱钊凡
姚霆
梅涛
Original Assignee
京东科技信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 京东科技信息技术有限公司
Publication of WO2023202401A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 — Image analysis
    • G06T7/70 — Determining position or orientation of objects or cameras
    • G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T2207/10 — Image acquisition modality
    • G06T2207/10028 — Range image; Depth image; 3D point clouds

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to a method, device and computer-readable storage medium for detecting targets in point cloud data.
  • 3D (three-dimensional) target detection identifies and locates objects appearing in 3D point clouds, and has been widely used in fields such as autonomous driving and augmented reality.
  • 3D point clouds can provide the geometry of objects and capture the 3D structure of a scene.
  • a method for detecting targets in point cloud data, including: inputting point cloud data into a point cloud feature extraction network to obtain multiple key points in the point cloud data and the feature information of each key point; for each key point, encoding the feature information of the key point according to the correlation between the key point and other key points within the preset range of the key point, to obtain the first feature encoding of the key point; classifying each key point and determining the points classified as target centers as reference center points; for each reference center point, encoding the first feature encoding of the reference center point according to the correlation between the reference center point and other reference center points, to obtain the second feature encoding of the reference center point; and predicting the location and category of each target in the point cloud data according to the second feature encoding of each reference center point.
  • encoding the feature information of the key point according to the correlation between the key point and other key points within the preset range of the key point, to obtain the first feature encoding of the key point, includes: for each key point, determining the first feature encoding of the key point based on the self-attention mechanism, according to the feature information of the key point, the feature information of other key points within the preset range of the key point, and the relative positional relationship between the key point and those other key points.
  • determining the first feature encoding of the key point based on the self-attention mechanism includes: for each key point, inputting the feature information and position information of the key point, and the feature information and position information of other key points within the preset range of the key point, into the first self-attention module of the encoder in the first conversion model; in the first self-attention module, treating the other key points within the preset range of the key point as relative points, and for each relative point, inputting the position information of the key point and the position information of the relative point into the first position coding layer, the second position coding layer and the third position coding layer respectively, to determine the first, second and third relative position codes of the key point and the relative point; determining the key vector and value vector of the relative point according to the products of the feature information of the relative point with the key matrix and the value matrix in the first self-attention module; determining the query vector of the key point according to the product of the feature information of the key point and the query matrix in the first self-attention module; and determining the first feature encoding of the key point according to the first, second and third relative position codes of the key point and each relative point, the key vector and value vector of each relative point, and the query vector of the key point.
  • determining the first feature encoding of the key point includes: for each relative point, taking the sum of the first relative position code of the key point and the relative point and the query vector of the key point as the modified query vector of the key point; taking the sum of the second relative position code of the key point and the relative point and the key vector of the relative point as the modified key vector of the relative point; taking the sum of the third relative position code of the key point and the relative point and the value vector of the relative point as the modified value vector of the relative point; inputting the product of the modified query vector of the key point and the modified key vector of the relative point, together with the dimension of the feature information of the key point, into the first normalization layer to obtain the weight of the relative point; and performing a weighted sum of the modified value vectors of the relative points according to their weights to obtain the first feature encoding of the key point.
  • the first position coding layer, the second position coding layer and the third position coding layer are respectively a first feedforward network, a second feedforward network and a third feedforward network; inputting the position information of the key point and the position information of the relative point into the first, second and third position coding layers respectively includes: inputting the difference between the coordinates of the key point and the coordinates of the relative point into the first feedforward network, the second feedforward network and the third feedforward network respectively.
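The patent gives no code for the position coding layers; the following is a minimal sketch, assuming each layer is a small two-layer feedforward network applied to the coordinate difference between a key point and a relative point. All names (`W1`, `b1`, `W2`, `b2`) are illustrative, not from the patent.

```python
import numpy as np

def relative_position_encoding(coord_q, coord_k, W1, b1, W2, b2):
    """Hypothetical sketch of one position coding layer: a two-layer
    feedforward network over the coordinate difference, producing a
    C-dimensional relative position code."""
    delta = coord_q - coord_k                   # (3,) coordinate difference
    hidden = np.maximum(0.0, W1 @ delta + b1)   # ReLU hidden layer
    return W2 @ hidden + b2                     # relative position code

# Example: map a 3D coordinate difference to a 2-dimensional code.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)
code = relative_position_encoding(np.array([1.0, 2.0, 3.0]),
                                  np.array([0.0, 2.0, 3.0]),
                                  W1, b1, W2, b2)
```

The first, second and third position coding layers would each use their own parameters, matching the claim that the layers are distinct feedforward networks.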
  • encoding the feature information of the key point according to the correlation between the key point and other key points within the preset range of the key point, to obtain the first feature encoding of the key point, includes: for each key point, determining the first feature encoding of the key point based on the self-attention mechanism, according to the feature information of the key point, the feature information of other key points within the preset range of the key point, the relative positional relationship between the key point and those other key points, and the relative geometric structure relationship between the key point and those other key points.
  • determining the first feature encoding of the key point based on the self-attention mechanism, according to the feature information, the relative positional relationship, and the relative geometric structure relationship between the key point and other key points within the preset range of the key point, includes: for each key point, inputting the feature information, position information and geometric structure information of the key point, and the feature information, position information and geometric structure information of other key points within the preset range of the key point, into the first self-attention module of the encoder in the first conversion model; in the first self-attention module, treating the other key points within the preset range of the key point as relative points, and for each relative point, inputting the position information of the key point and the position information of the relative point into the first, second and third position coding layers respectively, to determine the first, second and third relative position codes of the key point and the relative point; inputting the geometric structure information of the key point and the geometric structure information of the relative point into the geometric structure encoding layer to determine the relative geometric structure weight of the key point and the relative point; and determining the key vector and value vector of the relative point according to the products of the feature information of the relative point with the key matrix and the value matrix in the first self-attention module.
  • determining the first feature encoding of the key point includes: for each relative point, taking the sum of the first relative position code of the key point and the relative point and the query vector of the key point as the modified query vector of the key point; taking the sum of the second relative position code of the key point and the relative point and the key vector of the relative point as the modified key vector of the relative point; taking the sum of the third relative position code of the key point and the relative point and the value vector of the relative point as the modified value vector of the relative point; inputting the product of the modified query vector of the key point and the modified key vector of the relative point, the relative geometric structure weight of the key point and the relative point, and the dimension of the feature information of the key point into the first normalization layer to obtain the weight of the relative point; and performing a weighted sum of the modified value vectors of the relative points according to their weights to obtain the first feature encoding of the key point.
  • inputting the product of the modified query vector of the key point and the modified key vector of the relative point, the relative geometric structure weight of the key point and the relative point, and the dimension of the feature information of the key point into the first normalization layer to obtain the weight of the relative point includes: dividing the product of the modified query vector of the key point and the modified key vector of the relative point by the square root of the dimension of the feature information of the key point, adding the relative geometric structure weight of the key point and the relative point, and inputting the result into the first normalization layer to obtain the weight of the relative point.
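The weighting step just described can be sketched numerically, assuming the first normalization layer is a softmax (the document later suggests this): the query-key product is scaled by the square root of the feature dimension C, the relative geometric structure weight is added, and the result is normalized. This is an illustrative sketch, not the patent's implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def attention_weight(mod_q, mod_keys, geo_weights, C):
    """Scaled dot product of the modified query with each modified key,
    plus the relative geometric structure weight, then normalized."""
    logits = np.array([mod_q @ k for k in mod_keys]) / np.sqrt(C)
    return softmax(logits + geo_weights)
```

A relative point with a larger geometric structure weight thus receives a larger share of the attention, all else equal.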
  • the relative geometric structure weight of the key point and the relative point decreases as the distance between the key point and the relative point increases; increases as the dot product of the normal vector of the local plane where the key point is located and the normal vector of the local plane where the relative point is located increases; decreases as the difference between the curvature radius of the local plane where the key point is located and the curvature radius of the local plane where the relative point is located increases; and decreases as the angle between the normal vector of the local plane where the key point is located and the normal vector of the local plane where the relative point is located increases.
  • encoding the first feature encoding of the reference center point according to the correlation between the reference center point and other reference center points, to obtain the second feature encoding of the reference center point, includes: for each reference center point, determining the second feature encoding of the reference center point based on the self-attention mechanism, according to the first feature encoding of the reference center point, the first feature encodings of other reference center points, and the relative positional relationship between the reference center point and other reference center points.
  • determining the second feature encoding of the reference center point based on the self-attention mechanism includes: for each reference center point, inputting the first feature encoding and position information of the reference center point, and the first feature encodings and position information of other reference center points, into the second self-attention module of the encoder in the second conversion model; in the second self-attention module, treating the other reference center points as relative center points, and for each relative center point, inputting the position information of the reference center point and the position information of the relative center point into the fourth, fifth and sixth position coding layers respectively, to determine the fourth, fifth and sixth relative position codes of the reference center point and the relative center point; determining the key vector and value vector of the relative center point according to the products of the feature information of the relative center point with the key matrix and the value matrix in the second self-attention module; determining the query vector of the reference center point according to the product of the feature information of the reference center point and the query matrix in the second self-attention module; and determining the second feature encoding of the reference center point according to the fourth, fifth and sixth relative position codes of the reference center point and each relative center point, the key vector and value vector of each relative center point, and the query vector of the reference center point.
  • determining the second feature encoding of the reference center point according to the key vectors, the value vectors and the query vector includes: for each relative center point, taking the sum of the fourth relative position code of the reference center point and the relative center point and the query vector of the reference center point as the modified query vector of the reference center point; taking the sum of the fifth relative position code of the reference center point and the relative center point and the key vector of the relative center point as the modified key vector of the relative center point; taking the sum of the sixth relative position code of the reference center point and the relative center point and the value vector of the relative center point as the modified value vector of the relative center point; and inputting the product of the modified query vector of the reference center point and the modified key vector of the relative center point, together with the dimension of the first feature encoding of the reference center point, into the second normalization layer.
  • the fourth position coding layer, the fifth position coding layer and the sixth position coding layer are respectively a fourth feedforward network, a fifth feedforward network and a sixth feedforward network; inputting the position information of the reference center point and the position information of the relative center point into the fourth, fifth and sixth position coding layers respectively includes: inputting the difference between the coordinates of the reference center point and the coordinates of the relative center point into the fourth feedforward network, the fifth feedforward network and the sixth feedforward network respectively.
  • classifying each key point and determining the points classified as target centers as reference center points includes: for each key point, inputting the first feature encoding of the key point into the classification network to obtain the classification result of the key point, and determining whether the key point is a point at the target center according to the classification result.
  • the classification network is trained by using the position information of each key point with annotation information as training data, wherein, for each key point, in the case where the key point is located within the bounding box of a target and is the point closest to the target center, the annotation information of the key point marks it as a point at the target center.
  • predicting the location and category of each target in the point cloud data according to the second feature encoding of each reference center point includes: inputting the second feature encoding of each reference center point into the decoder in the second conversion model to obtain the feature vector of each reference center point; and inputting the feature vector of each reference center point into the target detection network to obtain the location and category of each target in the point cloud data.
  • other key points within the preset range of a key point are determined as follows: for each key point, the other key points are sorted in ascending order of distance from the key point, and a preset number of them are selected from front to back as the other key points within the preset range of the key point.
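The neighbor selection just described is a k-nearest-neighbor query; a minimal sketch, assuming Euclidean distance over point coordinates:

```python
import numpy as np

def preset_range_neighbors(points, i, k):
    """Sort the other key points by ascending distance to key point i
    and keep the first k as the preset-range neighbors."""
    d = np.linalg.norm(points - points[i], axis=1)  # distances to point i
    order = np.argsort(d)                            # ascending order
    return [int(j) for j in order if j != i][:k]     # drop i itself, take k

# Example: the 2 nearest neighbors of the first point.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [0.5, 0.0]])
neighbors = preset_range_neighbors(pts, 0, 2)  # indices of 2 nearest points
```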
  • a device for detecting targets in point cloud data, including: a feature extraction module for inputting point cloud data into a point cloud feature extraction network to obtain multiple key points in the point cloud data and the feature information of each key point; a first encoding module for encoding, for each key point, the feature information of the key point according to the correlation between the key point and other key points within the preset range of the key point, to obtain the first feature encoding of the key point; a classification module for classifying each key point and determining the points classified as target centers as reference center points; a second encoding module for encoding, for each reference center point, the first feature encoding of the reference center point according to the correlation between the reference center point and other reference center points, to obtain the second feature encoding of the reference center point; and a target detection module for predicting the location and category of each target in the point cloud data according to the second feature encoding of each reference center point.
  • a device for detecting targets in point cloud data, including: a processor; and a memory coupled to the processor for storing instructions which, when executed by the processor, cause the processor to perform the method for detecting targets in point cloud data of any of the foregoing embodiments.
  • a non-transitory computer-readable storage medium on which a computer program is stored, wherein, when the program is executed by a processor, the steps of the method for detecting targets in point cloud data of any of the foregoing embodiments are implemented.
  • a target sorting device, including: the device for detecting targets in point cloud data of any of the foregoing embodiments, and a sorting component; the sorting component is used to sort the targets according to the location and category of each target in the point cloud data output by the detection device.
  • the device further includes: a point cloud collection component configured to collect point cloud data in a preset area and send the point cloud data to the device for detecting targets in the point cloud data.
  • a computer program including instructions which, when executed by a processor, cause the processor to perform the method for detecting targets in point cloud data of any of the foregoing embodiments.
  • Figure 1 shows a schematic flowchart of a method for detecting targets in point cloud data according to some embodiments of the present disclosure.
  • FIG. 2 shows a schematic diagram of a model for detecting targets in point cloud data according to other embodiments of the present disclosure.
  • Figure 3 shows a schematic structural diagram of a device for detecting targets in point cloud data according to some embodiments of the present disclosure.
  • Figure 4 shows a schematic structural diagram of a device for detecting targets in point cloud data according to other embodiments of the present disclosure.
  • Figure 5 shows a schematic structural diagram of a device for detecting targets in point cloud data according to further embodiments of the present disclosure.
  • Figure 6 shows a schematic structural diagram of an object sorting device according to some embodiments of the present disclosure.
  • a technical problem to be solved by the present disclosure is to propose a method for detecting targets in point cloud data, so as to improve the accuracy of target detection in point cloud data.
  • the present disclosure provides a method for detecting targets in point cloud data, which is described below with reference to Figure 1 .
  • Figure 1 is a flow chart of some embodiments of a method for detecting targets in point cloud data of the present disclosure. As shown in Figure 1, the method of this embodiment includes steps S102 to S110.
  • In step S102, the point cloud data is input into the point cloud feature extraction network to obtain multiple key points in the point cloud data and the feature information of each key point.
  • Given a point cloud of N points with XYZ coordinates as input, the point cloud feature extraction network can downsample the point cloud data and learn deep features of each point, thereby outputting a subset of points, each represented by a C-dimensional feature (C is a positive integer); these points are regarded as key points.
  • Point cloud feature extraction networks such as VoxelNet, PointNet, PointNet++ and 3DSSD can be used to extract the key points and their feature information from the point cloud data; the network is not limited to these examples.
  • the PointNet++ network is used as the point cloud feature extraction network.
  • the input point cloud is first downsampled to 1/8 of the original resolution (i.e., N/8 points) through 4 set abstraction layers, and then upsampled to 1/2 of the original resolution (i.e., N/2 points) through the feature propagation layer; each point is represented by a C-dimensional feature.
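The patent does not prescribe a particular sampling algorithm inside the set abstraction layers; as one illustrative sketch, PointNet++-style set abstraction commonly begins with farthest point sampling to pick the downsampled subset. The function below is an assumption for illustration, not the patent's method.

```python
import numpy as np

def farthest_point_sample(points, m):
    """Hypothetical sketch of farthest point sampling: greedily pick the
    point farthest from all points chosen so far, until m points remain."""
    chosen = [0]                                           # start from point 0
    dist = np.linalg.norm(points - points[0], axis=1)      # distance to chosen set
    while len(chosen) < m:
        nxt = int(np.argmax(dist))                         # farthest remaining point
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen]

# Example: reduce 3 points to the 2 most spread-out ones.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
subset = farthest_point_sample(pts, 2)
```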
  • the set of key points is, for example, expressed as a set in which f i represents the feature information of the i-th key point and is a feature vector.
  • the number and sampling methods of key points are not limited to the above examples and are determined based on the actual application model and test results.
  • In step S104, for each key point, the feature information of the key point is encoded according to the correlation between the key point and other key points within the preset range of the key point, and the first feature encoding of the key point is obtained.
  • For each key point, the other key points are sorted in ascending order of distance from the key point, and a preset number of them are selected from front to back as the other key points within the preset range of the key point. For example, for each key point, the K key points closest to it are used as the other key points within its corresponding preset range (local area).
  • For each key point, the first feature encoding of the key point is determined based on the self-attention mechanism, according to the feature information of the key point, the feature information of other key points within the preset range of the key point, and the relative positional relationship between the key point and those other key points.
  • In this way, the importance, or contribution, of the other key points within the preset range relative to the encoding of the key point can be determined, and the features of those other key points are combined when encoding the key point, which improves the accuracy of the feature expression of the key point. Introducing the relative positional relationship between the key point and the other key points within its preset range into the attention mechanism further improves the accuracy of the feature expression, and thereby the accuracy of target detection.
  • The feature information and position information of the key point, and the feature information and position information of other key points within the preset range of the key point, are input into the first self-attention module of the encoder in the first conversion model. In the first self-attention module, the other key points within the preset range of the key point are used as relative points. For each relative point, the position information of the key point and the position information of the relative point are input into the first, second and third position coding layers respectively, to determine the first, second and third relative position codes of the key point and the relative point. The key vector and value vector of the relative point are determined from the products of the feature information of the relative point with the key matrix and the value matrix in the first self-attention module; the query vector of the key point is determined from the product of the feature information of the key point and the query matrix in the first self-attention module. Then, for each relative point: the sum of the first relative position code of the key point and the relative point and the query vector of the key point is used as the modified query vector of the key point; the sum of the second relative position code of the key point and the relative point and the key vector of the relative point is used as the modified key vector of the relative point; the sum of the third relative position code of the key point and the relative point and the value vector of the relative point is used as the modified value vector of the relative point. The product of the modified query vector of the key point and the modified key vector of the relative point, together with the dimension of the feature information of the key point, is input into the first normalization layer to obtain the weight of the relative point. Finally, a weighted sum of the modified value vectors of the relative points, according to their weights, gives the first feature encoding of the key point.
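The whole first-self-attention step can be sketched end to end. This is a single-head, numpy-only illustration under stated assumptions: the normalization layer is a softmax, `pe1`/`pe2`/`pe3` stand in for the three position coding layers applied to the coordinate difference, and `Wq`/`Wk`/`Wv` are the query, key and value matrices (all names illustrative).

```python
import numpy as np

def first_feature_encoding(f_q, pos_q, feats_k, pos_k, Wq, Wk, Wv, pe1, pe2, pe3):
    """Sketch of the first self-attention step: modified query/key/value
    vectors with relative position codes, softmax weights, weighted sum."""
    C = f_q.shape[0]                       # dimension of the feature information
    q = Wq @ f_q                           # query vector of the key point
    logits, vals = [], []
    for f_k, p_k in zip(feats_k, pos_k):
        delta = pos_q - p_k                # coordinate difference to relative point
        q_mod = q + pe1(delta)             # modified query vector
        k_mod = Wk @ f_k + pe2(delta)      # modified key vector
        v_mod = Wv @ f_k + pe3(delta)      # modified value vector
        logits.append(q_mod @ k_mod / np.sqrt(C))
        vals.append(v_mod)
    w = np.exp(np.array(logits) - max(logits))
    w = w / w.sum()                        # first normalization layer (softmax)
    return sum(wi * vi for wi, vi in zip(w, vals))  # first feature encoding
```

With identity projections and zero position codes this reduces to plain scaled dot-product attention over the relative points.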
  • the first transformation model is a Transformer model. Since the first transformation model is used to determine the correlation between internal points of the target, it can be called a Local Transformer.
  • the first transformation model may include the encoder part in the Transformer.
  • the first position encoding layer, the second position encoding layer and the third position encoding layer are the first feedforward network (FFN), the second feedforward network and the third feedforward network respectively.
  • the first normalization layer is, for example, a softmax layer.
  • the encoder can also include an FFN (feedforward neural network) after the first self-attention module. If no FFN follows the first self-attention module, the output of the module can represent the first feature encoding; otherwise, the output after the FFN represents the first feature encoding. The query matrix, the key matrix and the value matrix belong to the first self-attention module, C is the dimension of the feature information of the key point, and the first, second and third position coding layers each correspond to a respective function.
  • W_PE and b_PE represent the parameters of the FFN (feedforward network); the W_PE and b_PE corresponding to the different position coding layers are different.
  • a multi-head attention mechanism can be applied in the encoder.
  • Each attention head can refer to the above formulas (1) and (2) to determine the encoding of key points.
  • the encodings of the attention heads are concatenated and multiplied by a preset matrix (or the product is further passed through an FFN) to obtain the first feature encoding of the key point.
  • the parameters of the query matrix, the key matrix, the value matrix, and the first, second and third position coding layers in each attention head are different.
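The multi-head combination step above can be sketched as follows, assuming each head has already produced its own encoding; `W_o` is an illustrative name for the preset matrix that mixes the concatenated heads.

```python
import numpy as np

def multi_head_combine(head_outputs, W_o):
    """Concatenate the per-head encodings and multiply by the preset
    matrix to obtain the combined first feature encoding."""
    return W_o @ np.concatenate(head_outputs)

# Example: two heads, each producing a 2-dimensional encoding,
# combined back into a 4-dimensional encoding by an identity mix.
combined = multi_head_combine([np.array([1.0, 2.0]), np.array([3.0, 4.0])],
                              np.eye(4))
```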
  • the first feature encoding output by the Local Transformer contains the contextual information of the local area where the key point is located, that is, the correlation between points inside the target.
  • In step S106, each key point is classified, and the points classified as target centers are determined as reference center points.
  • The first feature encoding of the key point is input into the classification network to obtain the classification result of the key point; whether the key point is a point at the target center is determined according to the classification result.
  • the classification network is trained by using the position information of each key point with annotation information as training data, wherein, for each key point, in the case where the key point is located within the bounding box of a target and is the point closest to the target center, the annotation information of the key point marks it as a point at the target center.
  • the key points output by the first conversion model are dense points. Not every point represents a separate target (object). In order to reduce the redundancy of the final detection result, all key points are filtered and only those located at the center of the target are retained. key points. Therefore, it is necessary to judge whether each key point is the real target center.
  • each keypoint is assigned a label. If a keypoint is located within the bounding box of an object and is the point closest to the center of the object, it is assigned a positive label, otherwise it is assigned a negative label. Train a binary classification network based on the labels of key points. During the testing process, all key points are input into the binary classification network, and only key points with positive classification results are retained as reference center points.
  • step S108 for each reference center point, the first feature code of the reference center point is encoded according to the correlation between the reference center point and other reference center points to obtain the second feature of the reference center point. coding.
  • the He refers to the first feature code of the center point and the relative positional relationship between the reference center point and other reference center points, and determines the second feature code of the reference center point based on the self-attention mechanism.
  • the first feature code and position information of the reference center point, and the first feature codes and position information of other reference center points are input into the encoder in the second conversion model.
  • the second self-attention module in the second self-attention module, other reference center points are used as relative center points, and for each relative center point, the position information of the reference center point and the position information of the relative center point are
  • the fourth position coding layer, the fifth position coding layer and the sixth position coding layer are respectively input to determine the fourth relative position coding, the fifth relative position coding and the sixth relative position coding of the reference center point and the relative center point; according to The key vector and value vector of the relative center point are determined by multiplying the feature information of the relative center point with the key matrix and value matrix in the second self-attention module respectively; according to the feature information of the reference center point and the second self-attention module The product of the query matrix in the module determines the query vector of the reference center point; according to the fourth relative position code, the fifth relative position code, and the sixth relative
  • the sum of the reference center point, the fourth position code of the relative center point, and the query vector of the reference center point is used as the modified query vector of the reference center point.
  • the sum of the fifth position code of the reference center point and the relative center point and the key vector of the relative center point is used as the modified key vector of the relative center point;
  • the sixth position code of the reference center point and the relative center point is The sum of the position code and the value vector of the relative center point is used as the correction value vector of the relative center point; the product of the correction query vector of the reference center point and the correction key vector of the relative center point is multiplied by the third value vector of the reference center point.
  • the dimension of a feature code is input to the second normalization layer to obtain the weight of the relative center point; the correction value vector of each relative center point is weighted and summed according to the weight of each relative center point to obtain the second reference center point.
  • Feature encoding is input to the second normalization layer to obtain the weight of the relative center point; the correction value vector of each relative center point is weighted and summed according to the weight of each relative center point to obtain the second reference center point.
  • the second transformation model is a Transformer model. Since the second transformation model is used to determine the correlation between targets, it can be called a Global Transformer.
  • the second transformation model may include the encoder and decoder parts in the Transformer.
  • the fourth relative position coding, the fifth relative position coding and the sixth relative position coding are respectively the fourth feedforward network, the fifth feedforward network and the sixth feedforward network, for each reference center point and each corresponding Relative to the center point, input the difference between the coordinates of the reference center point and the coordinates of the relative center point into the fourth feedforward network, the fifth feedforward network and the sixth feedforward network respectively to determine the reference center point and the relative center point
  • the fourth relative position code, the fifth relative position code and the sixth relative position code is, for example, a softmax layer.
  • M reference center points are obtained from key points, and Global Transformer aims to learn the correlation between these M different targets.
  • the feature set of M reference center points (for example, M reference center points) is expressed as ) is input into the Global Transformer module to model the correlation between different targets:
  • h i is the output of the second self-attention module of the encoder of Global Transformer.
  • the encoder can also include FFN after the second self-attention module. If FFN is not included after the second self-attention module, h i can represent the second feature encoding. Otherwise, the output of h i after FFN represents the second feature encoding; is the query matrix in the second self-attention module, is the key matrix in the second self-attention module, is the second self-attention module median matrix, C is the dimension of the feature information of the key point, Respectively represent the functions corresponding to the fourth position coding layer, the fourth position coding layer and the fourth position coding layer.
  • the specific form of can refer to formula (2).
  • h i is the high-level feature of the i-th reference center of the output, which includes both the correlation of internal points of the target and the correlation between different targets.
  • step S110 the location and category of each target in the point cloud data are predicted based on the second feature encoding of each reference center point.
  • the second feature encoding of each reference center point is input to the decoder in the second conversion model to obtain the feature vector of each reference center point; the feature vector of each reference center point is input to the target detection network to obtain the point cloud. The location and category of each target in the data.
  • the target detection network is, for example, FFN.
  • the target detection network determines the location and category of each target in the point cloud data based on the feature vector that contains the correlation between internal points of the target and the correlation between different targets.
  • the key points of the point cloud data and the characteristic information of each key point are extracted.
  • the key point is determined based on the correlation between the key point and other key points within the preset range of the key point.
  • the first feature encoding of the point reflects the correlation between points in the local area inside the target.
  • the key points are divided into points at the target center and points at the non-target center.
  • the key points at the target center are point as a reference center point.
  • the second feature code of the reference center point is determined based on the correlation between the reference center point and other reference center points.
  • the local area inside the target is On top of the correlation between points in the method, the correlation between targets is added, and then the location and category of each target in the point cloud data are predicted based on the second feature encoding of each reference center point.
  • the solution of the above embodiment no longer analyzes the distance between all points in the point cloud. correlation modeling, but divides the correlation between points into correlation within the target and correlation between targets, which can capture local and global dependencies in the point cloud at the same time and adapt to the three-dimensional structure of point cloud data.
  • Features and irregularities improve the accuracy of target detection in point cloud data. In addition, it can also improve detection efficiency and save computing costs.
  • the solutions of the above embodiments are also improved.
  • the inventor further mined the three-dimensional features of point cloud data and introduced the geometric structure features between points into the encoding process, making the learning of the features of point cloud data more accurate, thus improving the accuracy of target detection. Specific embodiments are described below.
  • step S104 for each key point, according to the characteristic information of the key point, the characteristic information of other key points within the preset range of the key point, the key point is within the preset range of the key point.
  • the relative positional relationship between other key points, as well as the relative geometric structure relationship between the key point and other key points within the preset range of the key point determine the first feature encoding of the key point based on the self-attention mechanism.
  • the characteristic information, position information and geometric structure information of the key point are combined with the characteristic information, position information and geometric structure information of other key points within the preset range of the key point.
  • the position information of the point and the position information of the relative point are input into the first position coding layer, the second position coding layer and the third position coding layer respectively, and the first relative position coding and the second relative position of the key point and the relative point are determined.
  • Encoding and third relative position encoding input the geometric structure information of the key point and the geometric structure information of the relative point into the geometric structure encoding layer to determine the relative geometric structure weight of the key point and the relative point; according to the characteristics of the relative point
  • the key vector and value vector of the relative point are determined by multiplying the information with the key matrix and value matrix in the first self-attention module respectively; according to the product of the feature information of the key point and the query matrix in the first self-attention module, determine The query vector of the key point; according to the first relative position code, the second relative position code, the third relative position code and the relative geometric structure weight of the key point and each relative point, the key vector of each relative point, each The value vector of the relative point and the query vector of the key point are used to determine the first feature encoding of the key point.
  • the sum of the first relative position code of the key point and the relative point and the query vector of the key point is used as the modified query vector of the key point;
  • the sum of the second relative position code of the point and the relative point and the key vector of the relative point is used as the modified key vector of the relative point;
  • the third relative position code of the key point and the relative point is combined with the value of the relative point.
  • the sum of vectors is used as the correction value vector of the relative point;
  • the weight of the relative point is obtained;
  • the correction value vector of each relative point is weighted and summed according to the weight of each relative point, and the first feature code of the key point is obtained.
  • the product of the modified query vector of the key point and the modified key vector of the relative point is divided by the square root of the dimension of the feature information of the key point, and then divided by the relative value of the key point and the relative point.
  • the geometric structure weights are added, and the result is input into the first normalization layer to obtain the weight of the relative point.
  • the geometric structure information includes: at least one of the normal vector of the local plane and the curvature radius of the local plane.
  • the dot product of the normal vector of the local plane where the key point is located and the normal vector of the local plane where the relative point is located the curvature radius of the local plane where the key point is located.
  • the difference between the radius of curvature of the local plane where the relative point is located, and at least one of the angles between the normal vector of the local plane where the key point is located and the normal vector of the local plane where the relative point is located determines the key point and the relative point. relative geometric structure weight.
  • the relative geometric structure weight of the key point and the relative point decreases as the distance between the key point and the relative point increases; the relative geometric structure weight of the key point and the relative point , increases with the increase of the dot product of the normal vector of the local plane where the key point is located and the normal vector of the local plane where the relative point is located; the relative geometric structure weight of the key point and the relative point increases with the increase of the normal vector of the local plane where the key point is located.
  • the difference between the curvature radius of the local plane and the curvature radius of the local plane where the relative point is located increases; the relative geometric structure weight of the key point and the relative point increases with the normal vector of the local plane where the key point is located and the relative The angle between the normal vectors of the local plane where the point is located increases as the result after the feature propagation layer increases.
  • the first transformation model is the Transformer model, which is called Local Transformer.
  • the first position coding layer, the second position coding layer and the third position coding layer may be a first feedforward network (FFN), a second feedforward network and a third feedforward network respectively, and the first normalization layer may be softmax. layer.
  • FNN first feedforward network
  • second feedforward network a second feedforward network
  • third feedforward network a third feedforward network respectively
  • the first normalization layer may be softmax. layer.
  • G i, j represents the relative geometric structure weight of key points i and j, which can be determined using the following formula:
  • n i and n j respectively represent the normal vectors of the local plane where the key points i and j are located
  • c i and c j respectively represent the curvature radius of the local plane where the key points i and j are located
  • the method representing the local plane where the key point i is located
  • ⁇ 1 , ⁇ 2 and ⁇ 3 are the parameters of the geometric structure encoding layer
  • FFN is the feedforward neural network or feature propagation layer.
  • G i,j is a Gaussian function model.
  • the correlation between two points is calculated through geometric parameters such as local plane normal vector, local curvature radius, and normal vector angle. If the correlation between two points is stronger, Stronger, the corresponding Gaussian weight G i,j will be larger.
  • the N points in the neighborhood that are closest to it find the N points in the neighborhood that are closest to it, and then use the least squares method to find a plane so that the sum of the distances projected by these N points onto this plane is the smallest, so The plane is the local plane.
  • the method of the above embodiment adds relative geometric structure weight to express the geometric structure relationship between points, and integrates object geometric features such as local plane normal vector, local curvature radius, and normal vector angle into the self-attention mechanism. , design an efficient feature extraction model and target detection model specifically for processing point cloud data.
  • the point cloud feature extraction network Point Cloud Backbone, point cloud backbone network
  • the feature information of the key points Point Feature
  • the feature information of the key points into the Local-Global Transformer Model.
  • the feature information and location information of each key point and other key points in the local area of the key point are input into the Local Transformer module, and the geometric structure information of each key point is input into the Local Transformer module to obtain the third A feature encoding.
  • the classification network for example, including the Sampling/Pooling module
  • the solution of the above embodiment proposes an end-to-end 3D point cloud target detection network based on the Transformer model, which can be called 3DTrans. It takes a 3D point cloud as input and outputs a set of labeled 3D bounding boxes to represent the target. The location of the (object).
  • the overall structure of the 3DTrans detection network is shown in Figure 2, which consists of two main components: feature extraction network and Local-Global Transformer. Given a point cloud of N points with XYZ coordinates as input, the feature extraction network downsamples the point cloud and learns the deep features of each point, thereby outputting a subset of points, and each point in the subset is represented by a C-dimensional feature representation, consider these points as key points. Local-Global Transformer takes the features of these key points as input and outputs the final target detection result.
  • the traditional Transformer model has been improved in two aspects, making it more suitable for processing 3D point cloud data.
  • the correlation between points is divided into for the correlation within objects and the correlation between objects.
  • the Local Transformer module is used to learn the correlation between points and points in the local area inside the same object
  • the Global Transformer module is used to learn the correlation between different objects.
  • Local -The Global Transformer model not only reduces the computational cost, but also captures local and global dependencies in the point cloud, thereby improving the model's learning expression ability.
  • object geometric structure information is added to the traditional Transformer model, and object geometric features such as local plane normal vector, local curvature radius, and normal vector angle are integrated into the self-attention mechanism, thereby designing a dedicated An efficient Transformer model for processing point cloud data.
  • the disclosed method does not require a large number of manual design components, does not require a large amount of prior knowledge, and does not require screening out redundant candidate frames for a large number of post-processing operations.
  • the model is simple and can be trained end-to-end, and the calculation cost is low. The processing efficiency is high and the accuracy is high.
  • the disclosed model can be trained end-to-end, labeling point cloud data images and labeling the bounding boxes and categories of each target as training samples.
  • the first feature encoding is input into the classification network to classify each key point and determine the point classified as the target center as the reference center point; input the feature information and position information of each reference center point into the second conversion model, and for each reference center point, according to the correlation between the reference center point and other reference center points, encode the first feature code of the reference center point to obtain the second feature code of the reference center point; convert the second feature code of each reference center point
  • the difference between the position and category of each target in the data and the bounding box and category of each annotated target is used to train the point cloud feature extraction network, the first conversion model, the classification network, the second conversion model, and the target detection network.
  • the point cloud feature extraction network and classification network can be pre-trained. For specific details, reference may be made to the foregoing embodiments and will not be described again here.
  • the present disclosure also proposes a device for detecting targets in point cloud data, which will be described below with reference to Figure 3 .
  • Figure 3 is a structural diagram of some embodiments of a device for detecting objects in point cloud data of the present disclosure.
  • the device 30 of this embodiment includes: a feature extraction module 310 , a first encoding module 320 , a classification module 330 , a second encoding module 340 , and a target detection module 350 .
  • Feature extraction module 310 is used to input point cloud data into the point cloud feature extraction network to obtain the output point cloud number. Multiple key points in the data, as well as the characteristic information of each key point.
  • the first encoding module 320 is configured to encode, for each key point, the characteristic information of the key point according to the correlation between the key point and other key points within the preset range of the key point, to obtain the key point.
  • the first characteristic encoding is configured to encode, for each key point, the characteristic information of the key point according to the correlation between the key point and other key points within the preset range of the key point, to obtain the key point.
  • the classification module 330 is used to classify each key point and determine the point classified as the target center as the reference center point.
  • the second encoding module 340 is configured to encode, for each reference center point, the first feature code of the reference center point according to the correlation between the reference center point and other reference center points, to obtain the first feature code of the reference center point. Second feature encoding.
  • the target detection module 350 is used to predict the location and category of each target in the point cloud data based on the second feature code of each reference center point.
  • the first encoding module 320 is used for each key point, according to the characteristic information of the key point, the characteristic information of other key points within the preset range of the key point, and the relationship between the key point and the key point.
  • the relative positional relationship between other key points within the preset range is determined based on the self-attention mechanism to determine the first feature encoding of the key point.
  • the first encoding module 320 is configured to, for each key point, input the characteristic information and position information of the key point, and the characteristic information and position information of other key points within the preset range of the key point into the first The first self-attention module of the encoder in the conversion model; in the first self-attention module, other key points within the preset range of the key point are used as relative points, and for each relative point, the position of the key point is The information and the position information of the relative point are respectively input into the first position coding layer, the second position coding layer and the third position coding layer to determine the first relative position coding, the second relative position coding and the third relative position coding of the key point and the relative point.
  • Three relative position encodings determine the key vector and value vector of the relative point based on the product of the characteristic information of the relative point and the key matrix and value matrix in the first self-attention module respectively; determine the key vector and value vector of the relative point based on the characteristic information of the key point and the first self-attention module
  • the product of the query matrix in the self-attention module determines the query vector of the key point; according to the first relative position code, the second relative position code, and the third relative position code of the key point and each relative point, each relative point
  • the key vector, the value vector of each relative point and the query vector of the key point determine the first feature encoding of the key point.
  • the first encoding module 320 is configured to, for each relative point, encode the sum of the key point and the first relative position encoding of the relative point and the query vector of the key point as a modified query of the key point Vector; the sum of the second relative position code of the key point and the relative point and the key vector of the relative point is used as the modified key vector of the relative point; the sum of the third relative position code of the key point and the relative point and The sum of the value vectors of this relative point, As the correction value vector of the relative point; input the product of the correction query vector of the key point and the correction key vector of the relative point and the dimension of the feature information of the key point into the first normalization layer to obtain the relative point Weight; perform a weighted summation of the correction value vectors of each relative point based on the weight of each relative point to obtain the first feature code of the key point.
  • the first position encoding layer, the second position encoding layer and the third position encoding layer are respectively a first feedforward network, a second feedforward network and a third feedforward network, and the first encoding module 320 is used to The difference between the coordinates of the key point and the coordinates of the relative point is input into the first feedforward network, the second feedforward network, and the third feedforward network respectively.
  • the first encoding module 320 is configured to, for each key point, according to the characteristic information of the key point, the characteristic information of other key points within the preset range of the key point, the key point is preset with the key point. Assume the relative positional relationship between other key points within the range, and the relative geometric structure relationship between the key point and other key points within the preset range of the key point, and determine the first feature encoding of the key point based on the self-attention mechanism. .
  • the first encoding module 320 is configured to, for each key point, combine the feature information, location information and geometric structure information of the key point, and the feature information, location information of other key points within the preset range of the key point.
  • Information and geometric structure information are input into the first self-attention module of the encoder in the first conversion model; in the first self-attention module, other key points within the preset range of the key point are used as relative points, and for each relative point, input the position information of the key point and the position information of the relative point into the first position encoding layer, the second position encoding layer and the third position encoding layer respectively, and determine the first relative position encoding of the key point and the relative point.
  • the second relative position coding and the third relative position coding input the geometric structure information of the key point and the geometric structure information of the relative point into the geometric structure coding layer to determine the relative geometric structure weight of the key point and the relative point; according to The key vector and value vector of the relative point are determined by multiplying the characteristic information of the relative point with the key matrix and the value matrix in the first self-attention module respectively; according to the characteristic information of the key point and the query in the first self-attention module The product of the matrix determines the query vector of the key point; according to the first relative position code, the second relative position code, the third relative position code and the relative geometric structure weight of the key point and each relative point, the weight of each relative point.
  • the key vector, the value vector of each relative point and the query vector of the key point determine the first feature encoding of the key point.
  • the first encoding module 320 is configured to, for each relative point, encode the sum of the key point and the first relative position encoding of the relative point and the query vector of the key point as a modified query of the key point Vector; the sum of the second relative position code of the key point and the relative point and the key vector of the relative point is used as the modified key vector of the relative point; the sum of the third relative position code of the key point and the relative point and The sum of the value vectors of the relative point is used as the correction value vector of the relative point; the product of the correction query vector of the key point and the correction key vector of the relative point, the relative geometric structure weight of the key point and the relative point, and Dimension input of the feature information of the key point In the first normalization layer, the weight of the relative point is obtained; according to the weight of each relative point, the correction value vector of each relative point is weighted and summed to obtain the first feature code of the key point.
  • the first encoding module 320 is used to divide the product of the modified query vector of the key point and the modified key vector of the relative point by the square root of the dimension of the feature information of the key point, and then divide the product of the modified query vector of the key point and the key point with The relative geometric structure weights of the relative points are added up, and the result is input into the first normalization layer to obtain the weight of the relative point.
  • the geometric structure information includes: at least one of the normal vector of the local plane and the curvature radius of the local plane.
  • the first encoding module 320 is used to determine the key point based on the distance between the key point and the relative point.
  • the dot product of the normal vector of the local plane and the normal vector of the local plane where the relative point is located, the difference between the curvature radius of the local plane where the key point is located and the curvature radius of the local plane where the relative point is located, and the local plane where the key point is located At least one of the angles between the normal vector of the key point and the normal vector of the local plane where the relative point is located determines the relative geometric structure weight of the key point and the relative point.
  • the relative geometric structure weight of the key point and the relative point decreases as the distance between the key point and the relative point increases; increases as the dot product of the normal vector of the local plane where the key point is located and the normal vector of the local plane where the relative point is located increases; increases as the difference between the curvature radius of the local plane where the key point is located and the curvature radius of the local plane where the relative point is located increases; and increases as the angle between the normal vector of the local plane where the key point is located and the normal vector of the local plane where the relative point is located increases.
  • the second encoding module 340 is configured to, for each reference center point, determine the second feature encoding of the reference center point based on the self-attention mechanism, according to the first feature encoding of the reference center point, the first feature encodings of the other reference center points, and the relative positional relationships between the reference center point and the other reference center points.
  • the second encoding module 340 is configured to, for each reference center point, input the first feature encoding and position information of the reference center point and the first feature encodings and position information of the other reference center points into the second self-attention module of the encoder in the second transformation model; in the second self-attention module, the other reference center points are used as relative center points, and for each relative center point, the position information of the reference center point and the position information of the relative center point are input into the fourth position encoding layer, the fifth position encoding layer and the sixth position encoding layer respectively to determine the fourth, fifth and sixth relative position encodings of the reference center point and the relative center point; the key vector and value vector of the relative center point are determined from the products of the feature information of the relative center point with the key matrix and the value matrix in the second self-attention module respectively; the query vector of the reference center point is determined from the product of the feature information of the reference center point and the query matrix in the second self-attention module; and the second feature encoding of the reference center point is determined according to the fourth, fifth and sixth relative position encodings of the reference center point and each relative center point, the key vector of each relative center point, the value vector of each relative center point, and the query vector of the reference center point.
  • the second encoding module 340 is configured to, for each relative center point: take the sum of the fourth position encoding of the reference center point and the relative center point and the query vector of the reference center point as the modified query vector of the reference center point; take the sum of the fifth position encoding of the reference center point and the relative center point and the key vector of the relative center point as the modified key vector of the relative center point; take the sum of the sixth position encoding of the reference center point and the relative center point and the value vector of the relative center point as the modified value vector of the relative center point; input the product of the modified query vector of the reference center point and the modified key vector of the relative center point, together with the dimension of the first feature encoding of the reference center point, into the second normalization layer to obtain the weight of the relative center point; and perform a weighted sum of the modified value vectors of the relative center points according to their weights to obtain the second feature encoding of the reference center point.
  • the fourth position encoding layer, the fifth position encoding layer and the sixth position encoding layer are respectively a fourth feedforward network, a fifth feedforward network and a sixth feedforward network, and the second encoding module 340 is configured to input the difference between the coordinates of the reference center point and the coordinates of the relative center point into the fourth, fifth and sixth feedforward networks respectively.
  • the classification module 330 is configured to, for each key point, input the first feature encoding of the key point into the classification network to obtain the classification result of the key point, and determine whether the key point is the center point of a target according to the classification result.
  • the classification network is trained using the position information of each key point, together with its annotation information, as training data, wherein, for each key point, in the case that the key point is located within the bounding box of a target and is the point closest to the target center, the annotation information of the key point indicates that the point is at the target center.
  • the target detection module 350 is configured to predict the position and category of each target in the point cloud data according to the second feature encoding of each reference center point, including: inputting the second feature encodings of the reference center points into the decoder in the second transformation model to obtain the feature vector of each reference center point; and inputting the feature vectors of the reference center points into the target detection network to obtain the position and category of each target in the point cloud data.
  • the other key points within the preset range of a key point are determined as follows: for each key point, the other key points are sorted in ascending order of their distance from the key point, and a preset number of them are selected in order from front to back as the other key points within the preset range of the key point.
  • the device for detecting objects in point cloud data in embodiments of the present disclosure can be implemented by various computing devices or computer systems, which will be described below with reference to FIG. 4 and FIG. 5 .
  • Figure 4 is a structural diagram of some embodiments of a device for detecting objects in point cloud data of the present disclosure.
  • the device 40 of this embodiment includes: a memory 410 and a processor 420 coupled to the memory 410.
  • the processor 420 is configured to execute the method for detecting targets in point cloud data in any of the embodiments of the present disclosure based on instructions stored in the memory 410.
  • the memory 410 may include, for example, system memory, fixed non-volatile storage media, etc.
  • System memory stores, for example, operating systems, applications, boot loaders, databases, and other programs.
  • FIG. 5 is a structural diagram of another embodiment of a device for detecting objects in point cloud data of the present disclosure.
  • the device 50 of this embodiment includes: a memory 510 and a processor 520, which are similar to the memory 410 and the processor 420 respectively. It may also include an input/output interface 530, a network interface 540, a storage interface 550, etc. These interfaces 530, 540, 550, the memory 510 and the processor 520 may be connected through a bus 560, for example.
  • the input and output interface 530 provides a connection interface for input and output devices such as a monitor, mouse, keyboard, and touch screen.
  • the network interface 540 provides a connection interface for various networked devices, such as a database server or a cloud storage server.
  • the storage interface 550 provides a connection interface for external storage devices such as SD cards and USB disks.
  • the present disclosure also provides an item sorting device, which will be described below in conjunction with FIG. 6.
  • the item sorting device 6 includes: the device 30/40/50 for detecting targets in point cloud data in any of the foregoing embodiments, and a sorting component 62 configured to sort the items corresponding to each target according to the position and category of each target in the point cloud data output by the device 30/40/50.
  • the device 6 further includes: a point cloud collection component 64, configured to collect point cloud data in a preset area and send the point cloud data to the device 30/40/50 for detecting targets in point cloud data.
  • the sorting component is, for example, a robotic arm, and the point cloud collection component is, for example, a three-dimensional camera.
  • the three-dimensional point cloud target detection technology proposed in this disclosure can be applied to products such as vision-based sorting robotic arms in logistics scenarios; that is, the point cloud data collected by a three-dimensional camera mounted on the sorting robotic arm can be used to accurately locate and identify each item, helping the robotic arm sort the items one by one.
  • the present disclosure also provides a computer program, including: instructions, which when executed by the processor, cause the processor to execute the method for detecting objects in point cloud data as in any of the foregoing embodiments.
  • embodiments of the present disclosure may be provided as methods, systems, or computer program products. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk memory, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
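The preset-range neighbor selection described above (sorting the other key points by distance and keeping a preset number of the nearest ones) can be sketched as follows; the array shapes and the neighbor count `k` are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

def select_neighbors(points: np.ndarray, k: int) -> np.ndarray:
    """For each key point, return the indices of its k nearest other key points.

    points: (N, 3) array of key-point coordinates.
    Returns an (N, k) index array (the "preset range" of each key point).
    """
    # Pairwise Euclidean distances between all key points.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Exclude each point itself by pushing its self-distance to infinity.
    np.fill_diagonal(dist, np.inf)
    # Sort in ascending order of distance and keep the first k indices.
    return np.argsort(dist, axis=1)[:, :k]
```

For example, with four points, `select_neighbors(pts, 2)` returns, for each point, the indices of its two closest companions.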


Abstract

The present disclosure relates to the technical field of computers, and provides a method and apparatus for detecting a target in point cloud data, and a computer-readable storage medium. The method comprises: inputting point cloud data into a point cloud feature extraction network, so as to obtain a plurality of key points in the point cloud data and feature information of each key point; for each key point, encoding the feature information of the key point according to the association between the key point and other key points within a preset range of the key point, so as to obtain a first feature code of the key point; classifying each key point, and determining the points classified as target centers as reference center points; for each reference center point, encoding the first feature code of the reference center point according to the association between the reference center point and other reference center points, so as to obtain a second feature code of the reference center point; and predicting the position and category of each target in the point cloud data according to the second feature code of each reference center point.

Description

Method and apparatus for detecting a target in point cloud data, and computer-readable storage medium
Cross-reference to related applications
This application is based on, and claims priority to, the application with CN application number 202210409033.6 filed on April 19, 2022, the disclosure of which is hereby incorporated into this application in its entirety.
Technical field
The present disclosure relates to the field of computer technology, and in particular to a method and apparatus for detecting targets in point cloud data, and a computer-readable storage medium.
Background
The purpose of 3D (three-dimensional) target detection is to identify and locate objects appearing in 3D point clouds; it has been widely applied in fields such as autonomous driving and augmented reality. Compared with 2D images, 3D point clouds can provide the geometric shape of objects and capture the 3D structure of a scene.
Summary
According to some embodiments of the present disclosure, a method for detecting targets in point cloud data is provided, including: inputting point cloud data into a point cloud feature extraction network to obtain a plurality of key points in the point cloud data and feature information of each key point; for each key point, encoding the feature information of the key point according to the correlation between the key point and other key points within a preset range of the key point, to obtain a first feature encoding of the key point; classifying each key point and determining the points classified as target centers as reference center points; for each reference center point, encoding the first feature encoding of the reference center point according to the correlation between the reference center point and the other reference center points, to obtain a second feature encoding of the reference center point; and predicting the position and category of each target in the point cloud data according to the second feature encodings of the reference center points.
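The five steps of the method can be summarized in the following sketch; all network objects (`backbone`, `encoder1`, `classifier`, `encoder2`, `detector`) are hypothetical stand-ins for the networks described in the disclosure, passed in as callables:

```python
def detect_targets(point_cloud, backbone, encoder1, classifier, encoder2, detector):
    # 1. Extract key points and per-point feature information.
    key_points, features = backbone(point_cloud)
    # 2. First feature encoding: attend over neighbors within each key point's preset range.
    first_codes = encoder1(key_points, features)
    # 3. Classify key points; those classified as target centers become reference center points.
    is_center = classifier(first_codes)
    centers = [p for p, c in zip(key_points, is_center) if c]
    center_codes = [f for f, c in zip(first_codes, is_center) if c]
    # 4. Second feature encoding: attend over all reference center points.
    second_codes = encoder2(centers, center_codes)
    # 5. Predict the position and category of each target.
    return detector(second_codes)
```

Each callable can be swapped for the concrete modules of the disclosure (point cloud feature extraction network, first/second transformation model, classification network, target detection network).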
In some embodiments, for each key point, encoding the feature information of the key point according to the correlation between the key point and other key points within the preset range of the key point to obtain the first feature encoding of the key point includes: for each key point, determining the first feature encoding of the key point based on the self-attention mechanism, according to the feature information of the key point, the feature information of the other key points within the preset range of the key point, and the relative positional relationships between the key point and the other key points within the preset range of the key point.
In some embodiments, determining the first feature encoding of each key point based on the self-attention mechanism according to the feature information of the key point, the feature information of the other key points within its preset range, and the relative positional relationships between them includes: for each key point, inputting the feature information and position information of the key point and the feature information and position information of the other key points within the preset range of the key point into the first self-attention module of the encoder in the first transformation model; in the first self-attention module, using the other key points within the preset range of the key point as relative points, and for each relative point, inputting the position information of the key point and the position information of the relative point into the first position encoding layer, the second position encoding layer and the third position encoding layer respectively to determine the first, second and third relative position encodings of the key point and the relative point; determining the key vector and value vector of the relative point from the products of the feature information of the relative point with the key matrix and the value matrix in the first self-attention module respectively; determining the query vector of the key point from the product of the feature information of the key point and the query matrix in the first self-attention module; and determining the first feature encoding of the key point according to the first, second and third relative position encodings of the key point and each relative point, the key vector of each relative point, the value vector of each relative point, and the query vector of the key point.
In some embodiments, determining the first feature encoding of the key point according to the first, second and third relative position encodings of the key point and each relative point, the key vector of each relative point, the value vector of each relative point, and the query vector of the key point includes: for each relative point, taking the sum of the first relative position encoding of the key point and the relative point and the query vector of the key point as the modified query vector of the key point; taking the sum of the second relative position encoding of the key point and the relative point and the key vector of the relative point as the modified key vector of the relative point; taking the sum of the third relative position encoding of the key point and the relative point and the value vector of the relative point as the modified value vector of the relative point; inputting the product of the modified query vector of the key point and the modified key vector of the relative point, together with the dimension of the feature information of the key point, into the first normalization layer to obtain the weight of the relative point; and performing a weighted sum of the modified value vectors of the relative points according to their weights to obtain the first feature encoding of the key point.
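A minimal numeric sketch of this modified attention step for a single key point, assuming the relative position encodings have already been computed; a softmax stands in for the first normalization layer:

```python
import numpy as np

def first_feature_encoding(q, keys, values, pe_q, pe_k, pe_v):
    """q: (d,) query vector of the key point.
    keys, values: (m, d) key/value vectors of the m relative points.
    pe_q, pe_k, pe_v: (m, d) first/second/third relative position encodings.
    """
    d = q.shape[0]
    logits = np.empty(len(keys))
    for i in range(len(keys)):
        q_mod = q + pe_q[i]        # modified query vector of the key point
        k_mod = keys[i] + pe_k[i]  # modified key vector of the relative point
        logits[i] = q_mod @ k_mod / np.sqrt(d)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()       # first normalization layer (softmax)
    v_mod = values + pe_v          # modified value vectors of the relative points
    return weights @ v_mod         # weighted sum -> first feature encoding
```

With zero position encodings and identical keys, the weights are uniform and the output is the mean of the value vectors, which is a quick sanity check on the normalization.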
In some embodiments, the first position encoding layer, the second position encoding layer and the third position encoding layer are respectively a first feedforward network, a second feedforward network and a third feedforward network, and inputting the position information of the key point and the position information of the relative point into the first, second and third position encoding layers respectively includes: inputting the difference between the coordinates of the key point and the coordinates of the relative point into the first, second and third feedforward networks respectively.
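The position encoding layers are feedforward networks applied to the coordinate difference; a toy single-hidden-layer version is sketched below, where the random weights and the output dimension of 8 are placeholder assumptions (in the disclosure these networks are learned):

```python
import numpy as np

rng = np.random.default_rng(0)

class PositionEncodingFFN:
    """Toy feedforward network mapping a 3-D coordinate difference to a d-dim encoding."""

    def __init__(self, d: int):
        # Random placeholder weights; learned in practice.
        self.w1 = rng.standard_normal((3, d))
        self.w2 = rng.standard_normal((d, d))

    def __call__(self, key_xyz: np.ndarray, rel_xyz: np.ndarray) -> np.ndarray:
        delta = key_xyz - rel_xyz                  # difference of coordinates, as described
        hidden = np.maximum(delta @ self.w1, 0.0)  # ReLU hidden layer
        return hidden @ self.w2

# Three independent networks yield the first, second and third relative position encodings.
pe1, pe2, pe3 = (PositionEncodingFFN(8) for _ in range(3))
```

Each of the three networks sees the same coordinate difference but produces a distinct encoding for the query, key and value paths.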
In some embodiments, for each key point, encoding the feature information of the key point according to the correlation between the key point and other key points within the preset range of the key point to obtain the first feature encoding of the key point includes: for each key point, determining the first feature encoding of the key point based on the self-attention mechanism, according to the feature information of the key point, the feature information of the other key points within the preset range of the key point, the relative positional relationships between the key point and those other key points, and the relative geometric structure relationships between the key point and those other key points.
In some embodiments, determining the first feature encoding of each key point based on the self-attention mechanism according to the feature information, the relative positional relationships and the relative geometric structure relationships includes: for each key point, inputting the feature information, position information and geometric structure information of the key point and of the other key points within the preset range of the key point into the first self-attention module of the encoder in the first transformation model; in the first self-attention module, using the other key points within the preset range of the key point as relative points, and for each relative point, inputting the position information of the key point and the position information of the relative point into the first, second and third position encoding layers respectively to determine the first, second and third relative position encodings of the key point and the relative point; inputting the geometric structure information of the key point and the geometric structure information of the relative point into the geometric structure encoding layer to determine the relative geometric structure weight of the key point and the relative point; determining the key vector and value vector of the relative point from the products of the feature information of the relative point with the key matrix and the value matrix in the first self-attention module respectively; determining the query vector of the key point from the product of the feature information of the key point and the query matrix in the first self-attention module; and determining the first feature encoding of the key point according to the first, second and third relative position encodings and the relative geometric structure weight of the key point and each relative point, the key vector of each relative point, the value vector of each relative point, and the query vector of the key point.
In some embodiments, determining the first feature encoding of the key point according to the first, second and third relative position encodings and the relative geometric structure weight of the key point and each relative point, the key vector of each relative point, the value vector of each relative point, and the query vector of the key point includes: for each relative point, taking the sum of the first relative position encoding of the key point and the relative point and the query vector of the key point as the modified query vector of the key point; taking the sum of the second relative position encoding of the key point and the relative point and the key vector of the relative point as the modified key vector of the relative point; taking the sum of the third relative position encoding of the key point and the relative point and the value vector of the relative point as the modified value vector of the relative point; inputting the product of the modified query vector of the key point and the modified key vector of the relative point, the relative geometric structure weight of the key point and the relative point, and the dimension of the feature information of the key point into the first normalization layer to obtain the weight of the relative point; and performing a weighted sum of the modified value vectors of the relative points according to their weights to obtain the first feature encoding of the key point.
In some embodiments, inputting the product of the modified query vector of the key point and the modified key vector of the relative point, the relative geometric structure weight of the key point and the relative point, and the dimension of the feature information of the key point into the first normalization layer to obtain the weight of the relative point includes: dividing the product of the modified query vector of the key point and the modified key vector of the relative point by the square root of the dimension of the feature information of the key point, adding the relative geometric structure weight of the key point and the relative point, and inputting the result into the first normalization layer to obtain the weight of the relative point.
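The computation of this paragraph, dividing the query-key product by the square root of the feature dimension, adding the relative geometric structure weight, and normalizing, can be sketched as follows; a softmax stands in for the first normalization layer and the inputs are illustrative:

```python
import numpy as np

def geometry_aware_weights(q_mod, k_mod, geo, d):
    """q_mod: (m, d) modified query vectors (one per relative point);
    k_mod: (m, d) modified key vectors;
    geo: (m,) relative geometric structure weights;
    d: dimension of the feature information of the key point."""
    # Scaled dot product plus the geometric structure term.
    logits = np.einsum("md,md->m", q_mod, k_mod) / np.sqrt(d) + geo
    w = np.exp(logits - logits.max())
    return w / w.sum()  # first normalization layer (softmax)
```

With equal query-key products, the relative point with the larger geometric structure weight receives the larger attention weight, which is the intended effect of the geometric term.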
In some embodiments, the geometric structure information includes at least one of: the normal vector of the local plane where the point is located, and the curvature radius of that local plane. Determining the relative geometric structure weight of the key point and the relative point includes: determining the relative geometric structure weight according to at least one of: the distance between the key point and the relative point; the dot product of the normal vector of the local plane where the key point is located and the normal vector of the local plane where the relative point is located; the difference between the curvature radius of the local plane where the key point is located and the curvature radius of the local plane where the relative point is located; and the angle between the normal vector of the local plane where the key point is located and the normal vector of the local plane where the relative point is located.
In some embodiments, the relative geometric structure weight of the key point and the relative point decreases as the distance between the key point and the relative point increases; increases as the dot product of the normal vector of the local plane where the key point is located and the normal vector of the local plane where the relative point is located increases; increases as the difference between the curvature radius of the local plane where the key point is located and that of the local plane where the relative point is located increases; and increases as the result of passing the angle between the two normal vectors through the feature propagation layer increases.
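One illustrative form of the relative geometric structure weight consistent with the monotonicity constraints above (decreasing in distance; increasing in the normal-vector dot product, the curvature-radius difference, and the propagated normal angle). The linear combination and unit coefficients are assumptions for the sketch, not the disclosure's exact formula, and the four quantities are treated as precomputed inputs:

```python
def relative_geometry_weight(dist, dot, dr, propagated_angle):
    """dist: distance between key point and relative point (weight decreases in this);
    dot: dot product of the two local-plane normal vectors (weight increases);
    dr: absolute curvature-radius difference of the two local planes (increases);
    propagated_angle: normal-vector angle after the feature propagation layer (increases)."""
    return -dist + dot + dr + propagated_angle
```

The signs alone guarantee the stated monotonic behavior; in the disclosure the combination is produced by the geometric structure encoding layer.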
In some embodiments, for each reference center point, encoding the first feature encoding of the reference center point according to the correlation between the reference center point and the other reference center points to obtain the second feature encoding of the reference center point includes: for each reference center point, determining the second feature encoding of the reference center point based on the self-attention mechanism, according to the first feature encoding of the reference center point, the first feature encodings of the other reference center points, and the relative positional relationships between the reference center point and the other reference center points.
In some embodiments, determining the second feature encoding of each reference center point based on the self-attention mechanism includes: for each reference center point, inputting the first feature encoding and position information of the reference center point and the first feature encodings and position information of the other reference center points into the second self-attention module of the encoder in the second transformation model; in the second self-attention module, using the other reference center points as relative center points, and for each relative center point, inputting the position information of the reference center point and the position information of the relative center point into the fourth, fifth and sixth position encoding layers respectively to determine the fourth, fifth and sixth relative position encodings of the reference center point and the relative center point; determining the key vector and value vector of the relative center point from the products of the feature information of the relative center point with the key matrix and the value matrix in the second self-attention module respectively; determining the query vector of the reference center point from the product of the feature information of the reference center point and the query matrix in the second self-attention module; and determining the second feature encoding of the reference center point according to the fourth, fifth and sixth relative position encodings of the reference center point and each relative center point, the key vector of each relative center point, the value vector of each relative center point, and the query vector of the reference center point.
In some embodiments, determining the second feature code of the reference center point according to the fourth, fifth and sixth relative position codes of the reference center point and each relative center point, the key vector of each relative center point, the value vector of each relative center point, and the query vector of the reference center point includes: for each relative center point, taking the sum of the fourth relative position code of the reference center point and the relative center point and the query vector of the reference center point as a corrected query vector of the reference center point; taking the sum of the fifth relative position code of the reference center point and the relative center point and the key vector of the relative center point as a corrected key vector of the relative center point; taking the sum of the sixth relative position code of the reference center point and the relative center point and the value vector of the relative center point as a corrected value vector of the relative center point; inputting the product of the corrected query vector of the reference center point and the corrected key vector of the relative center point, together with the dimension of the first feature code of the reference center point, into a second normalization layer to obtain a weight of the relative center point; and performing a weighted summation of the corrected value vectors of the relative center points according to the weights of the relative center points to obtain the second feature code of the reference center point.
In some embodiments, the fourth, fifth and sixth position encoding layers are respectively a fourth feedforward network, a fifth feedforward network and a sixth feedforward network, and inputting the position information of the reference center point and the position information of the relative center point into the fourth, fifth and sixth position encoding layers respectively includes: inputting the difference between the coordinates of the reference center point and the coordinates of the relative center point into the fourth feedforward network, the fifth feedforward network and the sixth feedforward network respectively.
In some embodiments, classifying the key points and determining the points classified as target centers as reference center points includes: for each key point, inputting the first feature code of the key point into a classification network to obtain a classification result of the key point, and determining whether the key point is a target-center point according to the classification result.
In some embodiments, the classification network is trained using the position information of the key points together with label information as training data, where, for each key point, if the key point lies within the bounding box of a target and is the point closest to the center of that target, the label information of the key point marks it as a target-center point.
In some embodiments, predicting the positions and categories of the targets in the point cloud data according to the second feature codes of the reference center points includes: inputting the second feature codes of the reference center points into a decoder in the second transformation model to obtain a feature vector of each reference center point; and inputting the feature vectors of the reference center points into a target detection network to obtain the position and category of each target in the point cloud data.
In some embodiments, for each key point, the other key points within the preset range of the key point are determined as follows: for each key point, the other key points are sorted in ascending order of their distance to the key point, and a preset number of other key points are selected from the front of the sorted order as the other key points within the preset range of the key point.
According to other embodiments of the present disclosure, an apparatus for detecting targets in point cloud data is provided, including: a feature extraction module configured to input point cloud data into a point cloud feature extraction network to obtain a plurality of key points in the point cloud data and feature information of each key point; a first encoding module configured to, for each key point, encode the feature information of the key point according to the correlation between the key point and other key points within a preset range of the key point to obtain a first feature code of the key point; a classification module configured to classify the key points and determine the points classified as target centers as reference center points; a second encoding module configured to, for each reference center point, encode the first feature code of the reference center point according to the correlation between the reference center point and the other reference center points to obtain a second feature code of the reference center point; and a target detection module configured to predict the position and category of each target in the point cloud data according to the second feature codes of the reference center points.
According to still other embodiments of the present disclosure, an apparatus for detecting targets in point cloud data is provided, including: a processor; and a memory coupled to the processor and storing instructions that, when executed by the processor, cause the processor to perform the method for detecting targets in point cloud data of any of the foregoing embodiments.
According to further embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, on which a computer program is stored, where the program, when executed by a processor, implements the steps of the method for detecting targets in point cloud data of any of the foregoing embodiments.
According to still further embodiments of the present disclosure, an article sorting apparatus is provided, including: the apparatus for detecting targets in point cloud data of any of the foregoing embodiments, and a sorting component; the sorting component is configured to sort targets according to the positions and categories of the targets in the point cloud data output by the detection apparatus.
In some embodiments, the apparatus further includes: a point cloud collection component configured to collect point cloud data of a preset area and send the point cloud data to the apparatus for detecting targets in point cloud data.
According to yet further embodiments of the present disclosure, a computer program is provided, including instructions that, when executed by a processor, cause the processor to perform the method for detecting targets in point cloud data of any of the foregoing embodiments.
Other features and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
Description of the Drawings
To explain the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present disclosure; for those of ordinary skill in the art, other drawings can be obtained from these drawings without inventive effort.
Figure 1 shows a schematic flowchart of a method for detecting targets in point cloud data according to some embodiments of the present disclosure.
Figure 2 shows a schematic diagram of a model for targets in point cloud data according to other embodiments of the present disclosure.
Figure 3 shows a schematic structural diagram of an apparatus for detecting targets in point cloud data according to some embodiments of the present disclosure.
Figure 4 shows a schematic structural diagram of an apparatus for detecting targets in point cloud data according to other embodiments of the present disclosure.
Figure 5 shows a schematic structural diagram of an apparatus for detecting targets in point cloud data according to further embodiments of the present disclosure.
Figure 6 shows a schematic structural diagram of an article sorting apparatus according to some embodiments of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present disclosure or its application or uses. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without inventive effort fall within the protection scope of the present disclosure.
The inventors found that, owing to the three-dimensional nature and irregularity of point clouds, they cannot be processed directly by powerful deep learning models such as convolutional neural networks; dedicated 3D feature learning techniques are therefore needed to recognize targets in point cloud data.
A technical problem addressed by the present disclosure is to propose a method for detecting targets in point cloud data that improves the accuracy of target detection in point cloud data.
The present disclosure provides a method for detecting targets in point cloud data, described below with reference to Figure 1.
Figure 1 is a flowchart of some embodiments of the method for detecting targets in point cloud data of the present disclosure. As shown in Figure 1, the method of these embodiments includes steps S102 to S110.
In step S102, point cloud data are input into a point cloud feature extraction network to obtain a plurality of key points in the point cloud data and feature information of each key point.
Given a point cloud of N points with XYZ coordinates as input, the point cloud feature extraction network can downsample the point cloud data and learn a deep feature for each point, outputting a subset of the points in which each point is represented by a C-dimensional feature (C being a positive integer); these points are treated as key points.
The point cloud feature extraction network may be, for example, VoxelNet, PointNet, PointNet++ or 3DSSD (not limited to these examples) and is used to extract the key points of the point cloud data and their feature information. For example, a PointNet++ network is used as the point cloud feature extraction network. Taking point cloud data containing N points as input and following an encoder-decoder structure, the input point cloud is first downsampled by a factor of 8 (i.e., to N/8 points) through four set-abstraction layers, and then upsampled to N/2 points through feature propagation layers, with each point represented by a C-dimensional feature. The set of key points may, for example, be expressed as {f_i}, where f_i is the feature information (a feature vector) of the i-th key point. The number of key points and the sampling scheme are not limited to the above example and are determined according to the model actually used and test results.
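As an illustration of this downsampling step, the sketch below selects a well-spread subset of key points with farthest point sampling, the sampling strategy used by PointNet++-style set-abstraction layers. The feature learning itself is replaced by a placeholder, and the sizes N and C are arbitrary example values, not values fixed by the present disclosure.

```python
import numpy as np

def farthest_point_sample(points, m):
    """Greedily pick m well-spread indices from an (n, 3) point array."""
    n = points.shape[0]
    chosen = np.zeros(m, dtype=int)            # start from index 0 (arbitrary)
    min_dist = np.full(n, np.inf)              # distance to nearest chosen point
    for i in range(1, m):
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        min_dist = np.minimum(min_dist, d)
        chosen[i] = int(np.argmax(min_dist))   # point farthest from all chosen so far
    return chosen

N, C = 1024, 64                                # example sizes only
cloud = np.random.rand(N, 3)                   # N raw points with XYZ coordinates
idx = farthest_point_sample(cloud, N // 2)     # keep N/2 key points
key_xyz = cloud[idx]                           # (N/2, 3) key-point coordinates
key_feat = np.random.rand(N // 2, C)           # placeholder for learned C-dim features
```

In a real pipeline the placeholder features would come from the trained feature extraction network; only the sampling logic is shown here.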
In step S104, for each key point, the feature information of the key point is encoded according to the correlation between the key point and the other key points within a preset range of the key point, to obtain a first feature code of the key point.
For example, for each key point, the other key points are sorted in ascending order of their distance to the key point, and a preset number of them are selected from the front of the sorted order as the other key points within the preset range of the key point. For example, for each key point, the K key points closest to it are taken as the other key points within its corresponding preset range (local region).
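The neighbor selection described above can be sketched as a brute-force k-nearest-neighbor search over the key-point coordinates (the function name and array sizes are illustrative, not from the disclosure):

```python
import numpy as np

def knn_neighbors(xyz, k):
    """For each key point, return the indices of its k nearest other key points,
    sorted from closest to farthest."""
    diff = xyz[:, None, :] - xyz[None, :, :]      # (M, M, 3) pairwise differences
    dist = np.linalg.norm(diff, axis=-1)          # (M, M) pairwise distances
    np.fill_diagonal(dist, np.inf)                # a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]        # (M, k) neighbor indices

xyz = np.random.rand(8, 3)                        # 8 key points, XYZ coordinates
neighbors = knn_neighbors(xyz, k=3)               # local region of each key point
```

For large clouds a spatial index (e.g., a KD-tree) would replace the O(M²) distance matrix, but the selection rule is the same.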
In some embodiments, for each key point, the first feature code of the key point is determined based on a self-attention mechanism according to the feature information of the key point, the feature information of the other key points within the preset range of the key point, and the relative positional relationships between the key point and those other key points.
For each key point, the self-attention mechanism can determine how important each of the other key points within the preset range is, or how much it contributes, to the encoding of that key point. Describing a key point with the features of the other key points in its preset range when encoding it improves the accuracy of the key point's feature representation. In addition, introducing the relative positional relationships between the key point and the other key points within its preset range into the attention mechanism further improves the accuracy of the feature representation, and hence the accuracy of target detection.
Further, in some embodiments, for each key point, the feature information and position information of the key point and the feature information and position information of the other key points within the preset range of the key point are input into a first self-attention module of an encoder in a first transformation model. In the first self-attention module, the other key points within the preset range of the key point are taken as relative points. For each relative point, the position information of the key point and the position information of the relative point are input into a first position encoding layer, a second position encoding layer and a third position encoding layer respectively, to determine a first relative position code, a second relative position code and a third relative position code of the key point and the relative point. The key vector and value vector of the relative point are determined from the products of the feature information of the relative point with the key matrix and value matrix in the first self-attention module respectively; the query vector of the key point is determined from the product of the feature information of the key point with the query matrix in the first self-attention module. The first feature code of the key point is then determined from the first, second and third relative position codes of the key point and each relative point, the key vector of each relative point, the value vector of each relative point, and the query vector of the key point.
Further, in some embodiments, for each relative point: the sum of the first relative position code of the key point and the relative point and the query vector of the key point is taken as a corrected query vector of the key point; the sum of the second relative position code of the key point and the relative point and the key vector of the relative point is taken as a corrected key vector of the relative point; the sum of the third relative position code of the key point and the relative point and the value vector of the relative point is taken as a corrected value vector of the relative point; the product of the corrected query vector of the key point and the corrected key vector of the relative point, together with the dimension of the feature information of the key point, is input into a first normalization layer to obtain the weight of the relative point; and the corrected value vectors of the relative points are weighted and summed according to the weights of the relative points to obtain the first feature code of the key point.
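The corrected query/key/value computation described above can be sketched as a minimal single-head NumPy illustration. One linear position-encoding layer per path is assumed (as in formula (2)), weights are randomly initialized rather than trained, and all sizes and names are examples rather than values from the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, C = 16, 4, 8                       # key points, neighbors, feature dim (examples)
feats = rng.standard_normal((M, C))      # key-point feature vectors f_i
xyz = rng.standard_normal((M, 3))        # key-point coordinates x_i
WQ, WK, WV = (rng.standard_normal((C, C)) for _ in range(3))
# one (W, b) position-encoding layer each for the query, key and value paths
pe = {p: (rng.standard_normal((3, C)), rng.standard_normal(C)) for p in 'qkv'}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def local_self_attention(feats, xyz, nbrs):
    """For each key point, attend over its K nearest key points using
    position-corrected query, key and value vectors."""
    out = np.empty_like(feats)
    for i in range(len(feats)):
        j = nbrs[i]                                          # local-region indices
        rel = xyz[i] - xyz[j]                                # (K, 3) coordinate diffs
        q = feats[i] @ WQ + rel @ pe['q'][0] + pe['q'][1]    # corrected queries (K, C)
        k = feats[j] @ WK + rel @ pe['k'][0] + pe['k'][1]    # corrected keys    (K, C)
        v = feats[j] @ WV + rel @ pe['v'][0] + pe['v'][1]    # corrected values  (K, C)
        w = softmax((q * k).sum(axis=1) / np.sqrt(C))        # (K,) attention weights
        out[i] = w @ v                                       # weighted sum of values
    return out

# each point's local region: its K nearest other key points
d = np.linalg.norm(xyz[:, None] - xyz[None, :], axis=-1)
np.fill_diagonal(d, np.inf)
nbrs = np.argsort(d, axis=1)[:, :K]
first_codes = local_self_attention(feats, xyz, nbrs)         # (M, C) first feature codes
```

Note that the corrected query differs per neighbor, because the query-side position code depends on the pair (x_i, x_j).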
For example, the first transformation model is a Transformer model; since it is used to determine the correlations among points inside a target, it may be called the Local Transformer. The first transformation model may include the encoder part of a Transformer. In addition, the first, second and third position encoding layers are, for example, a first feedforward network (FFN), a second feedforward network and a third feedforward network respectively; for each key point and each of its relative points, the difference between the coordinates of the key point and the coordinates of the relative point is input into the first, second and third feedforward networks respectively, to determine the first, second and third relative position codes of the key point and the relative point. The first normalization layer is, for example, a softmax layer.
For example, let {f_i} and {x_i} respectively denote the feature vectors and position coordinates of the N/2 key points input to the encoder of the Local Transformer (each x_i is a coordinate vector). In the Local Transformer, for any key point x_i, the K key points closest to it are selected as its corresponding local region, and these points are input into a Local Transformer module to model all key points belonging to the same local region:

f'_i = Σ_{x_j∈N(x_i)} softmax_j( ((f_i·W_Q + PE_Q(x_i, x_j)) · (f_j·W_K + PE_K(x_i, x_j))) / √C ) (f_j·W_V + PE_V(x_i, x_j))    (1)
In formula (1), f'_i is the output of the first self-attention module of the encoder of the Local Transformer for key point x_i, and N(x_i) denotes its local region; the encoder may further include an FFN (feedforward neural network) after the first self-attention module. If no FFN follows the first self-attention module, f'_i represents the first feature code; otherwise, the output of f'_i after the FFN represents the first feature code. W_Q, W_K and W_V are the query matrix, key matrix and value matrix in the first self-attention module, C is the dimension of the feature information of the key points, and PE_Q, PE_K and PE_V denote the functions corresponding to the first, second and third position encoding layers respectively. PE(x_i, x_j) denotes the relative position encoding of x_i and x_j, which can be expressed by the following formula:

PE(x_i, x_j) = FFN(x_i − x_j) = (x_i − x_j)·W_PE + b_PE    (2)
W_PE and b_PE denote the parameters of the FFN (feedforward network); each of the position encoding functions PE_Q, PE_K and PE_V has its own W_PE and b_PE.
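Formula (2) amounts to a single linear layer applied to the coordinate difference. The sketch below shows one such position-encoding layer; the weights here are random stand-ins, whereas each of the six position encoding layers in the disclosure would carry its own learned W_PE and b_PE:

```python
import numpy as np

rng = np.random.default_rng(1)
C = 8                                    # feature dimension (example value)
W_PE = rng.standard_normal((3, C))       # learnable weight, maps R^3 -> R^C
b_PE = rng.standard_normal(C)            # learnable bias

def relative_position_encoding(xi, xj):
    """PE(x_i, x_j) = FFN(x_i - x_j) = (x_i - x_j) W_PE + b_PE, as in formula (2)."""
    return (xi - xj) @ W_PE + b_PE

code = relative_position_encoding(np.array([1.0, 2.0, 0.5]),
                                  np.array([0.0, 1.0, 0.5]))   # a C-dim encoding
```

Note that coincident points map to b_PE, so the encoding distinguishes directions and distances of displacement rather than absolute positions.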
A multi-head attention mechanism may be applied in the encoder. Each attention head determines an encoding of the key point with reference to formulas (1) and (2) above; the encodings from the attention heads are concatenated and multiplied by a preset matrix (the product optionally passing through a further FFN) to obtain the first feature code of the key point. The query matrix, key matrix, value matrix and the parameters of the first, second and third position encoding layers differ between attention heads.
The first feature code output by the Local Transformer thus contains the context information of the local region in which the key point lies, i.e., the correlations among the points inside a target.
In step S106, the key points are classified, and the points classified as target centers are determined as reference center points.
In some embodiments, for each key point, the first feature code of the key point is input into a classification network to obtain a classification result of the key point, and whether the key point is a target-center point is determined according to the classification result.
In some embodiments, the classification network is trained using the position information of the key points together with label information as training data, where, for each key point, if the key point lies within the bounding box of a target and is the point closest to the center of that target, the label information of the key point marks it as a target-center point.
The key points output by the first transformation model are dense, and not every point represents a separate target (object). To reduce the redundancy of the final detection result, all key points are filtered and only those located at target centers are retained; it is therefore necessary to judge whether each key point is a true target center. During training, each key point is assigned a label: if a key point lies within the bounding box of a target and is the point closest to the target's center, it is assigned a positive label; otherwise it is assigned a negative label. A binary classification network is trained on these labels. During testing, all key points are input into the binary classification network, and only the key points with a positive classification result are retained as reference center points.
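The training-label assignment described above can be sketched as follows. Axis-aligned boxes are assumed here for simplicity (the disclosure does not restrict the box parameterization), and all names are illustrative:

```python
import numpy as np

def label_key_points(key_xyz, boxes):
    """Assign each key point label 1 if it lies inside a target's bounding box and
    is the key point closest to that box's center, else 0.
    boxes: iterable of (min_xyz, max_xyz) pairs for axis-aligned boxes."""
    labels = np.zeros(len(key_xyz), dtype=int)
    for lo, hi in boxes:
        lo, hi = np.asarray(lo), np.asarray(hi)
        center = (lo + hi) / 2
        inside = np.all((key_xyz >= lo) & (key_xyz <= hi), axis=1)
        if not inside.any():
            continue                      # no key point falls inside this target
        dist = np.linalg.norm(key_xyz - center, axis=1)
        dist[~inside] = np.inf            # only in-box points are candidates
        labels[int(np.argmin(dist))] = 1  # closest in-box point gets the positive label
    return labels

key_xyz = np.array([[0.2, 0.2, 0.2], [0.45, 0.5, 0.5], [0.9, 0.9, 0.9]])
boxes = [((0.3, 0.3, 0.3), (0.7, 0.7, 0.7))]      # one target's bounding box
labels = label_key_points(key_xyz, boxes)          # only the in-box point is positive
```

At test time these labels are not needed; the trained binary classifier alone decides which key points become reference center points.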
In step S108, for each reference center point, the first feature code of the reference center point is encoded according to the correlation between the reference center point and the other reference center points, to obtain a second feature code of the reference center point.
In some embodiments, for each reference center point, the second feature code of the reference center point is determined based on a self-attention mechanism according to the first feature code of the reference center point, the first feature codes of the other reference center points, and the relative positional relationships between the reference center point and the other reference center points.
Further, in some embodiments, for each reference center point, the first feature code and position information of the reference center point and the first feature codes and position information of the other reference center points are input into a second self-attention module of an encoder in a second transformation model. In the second self-attention module, the other reference center points are taken as relative center points. For each relative center point, the position information of the reference center point and the position information of the relative center point are input into a fourth position encoding layer, a fifth position encoding layer and a sixth position encoding layer respectively, to determine a fourth, fifth and sixth relative position code of the reference center point and the relative center point. The key vector and value vector of the relative center point are determined from the products of the feature information of the relative center point with the key matrix and value matrix in the second self-attention module respectively; the query vector of the reference center point is determined from the product of the feature information of the reference center point with the query matrix in the second self-attention module. The second feature code of the reference center point is then determined from the fourth, fifth and sixth relative position codes of the reference center point and each relative center point, the key vector of each relative center point, the value vector of each relative center point, and the query vector of the reference center point.
Further, in some embodiments, for each relative center point: the sum of the fourth relative position code of the reference center point and the relative center point and the query vector of the reference center point is taken as a corrected query vector of the reference center point; the sum of the fifth relative position code of the reference center point and the relative center point and the key vector of the relative center point is taken as a corrected key vector of the relative center point; the sum of the sixth relative position code of the reference center point and the relative center point and the value vector of the relative center point is taken as a corrected value vector of the relative center point; the product of the corrected query vector of the reference center point and the corrected key vector of the relative center point, together with the dimension of the first feature code of the reference center point, is input into a second normalization layer to obtain the weight of the relative center point; and the corrected value vectors of the relative center points are weighted and summed according to the weights of the relative center points to obtain the second feature code of the reference center point.
For example, the second conversion model is a Transformer model; since it is used to determine the correlations between targets, it may be referred to as the Global Transformer. The second conversion model may include the encoder and decoder parts of a Transformer. In addition, the fourth position encoding layer, the fifth position encoding layer and the sixth position encoding layer are a fourth feedforward network, a fifth feedforward network and a sixth feedforward network, respectively. For each reference center point and each corresponding relative center point, the difference between the coordinates of the reference center point and the coordinates of the relative center point is input into the fourth, fifth and sixth feedforward networks respectively, to determine the fourth relative position encoding, the fifth relative position encoding and the sixth relative position encoding between the reference center point and the relative center point. The second normalization layer is, for example, a softmax layer.
For example, M reference center points are selected from the key points, and the Global Transformer is intended to learn the correlations between these M different targets. Specifically, the M reference center points (for example, the feature set of the M reference center points) are input into the Global Transformer module to model the correlations between the different targets:
In formula (3), h_i is the output of the second self-attention module of the encoder of the Global Transformer. The encoder may further include an FFN after the second self-attention module; if no FFN follows the second self-attention module, h_i may represent the second feature encoding, otherwise the output of h_i after the FFN represents the second feature encoding. The remaining symbols in formula (3) are the query matrix, the key matrix and the value matrix in the second self-attention module; C is the dimension of the feature information of the key points; and the three position encoding functions respectively correspond to the fourth position encoding layer, the fifth position encoding layer and the sixth position encoding layer. Their specific form may refer to formula (2).
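Formula (3) itself is rendered as an image in the original publication and is not reproduced in this text. Based on the symbol descriptions above and the step-by-step computation in the preceding paragraphs, it plausibly takes the standard relative-position self-attention form (a hedged reconstruction, with $W_q, W_k, W_v$ the query, key and value matrices, $f_i$ the feature of the $i$-th reference center point, and $\delta^{q}_{i,j}, \delta^{k}_{i,j}, \delta^{v}_{i,j}$ the outputs of the fourth, fifth and sixth position encoding layers):

$$h_i=\sum_{j=1}^{M}\operatorname{softmax}_j\!\left(\frac{\left(W_q f_i+\delta^{q}_{i,j}\right)^{\top}\left(W_k f_j+\delta^{k}_{i,j}\right)}{\sqrt{C}}\right)\left(W_v f_j+\delta^{v}_{i,j}\right)$$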
A multi-head attention mechanism may likewise be applied in the encoder of the Global Transformer, and details are not repeated here. h_i is the output high-level feature of the i-th reference center, which contains both the correlations among points inside a target and the correlations between different targets.
In step S110, the position and category of each target in the point cloud data are predicted according to the second feature encoding of each reference center point.
In some embodiments, the second feature encoding of each reference center point is input into the decoder of the second conversion model to obtain a feature vector of each reference center point; the feature vectors of the reference center points are then input into a target detection network to obtain the position and category of each target in the point cloud data.
The target detection network is, for example, an FFN. The target detection network determines the position and category of each target in the point cloud data from the feature vectors, which contain both the correlations among points inside a target and the correlations between different targets.
In the above embodiments, key points of the point cloud data and the feature information of each key point are extracted. For each key point, a first feature encoding is determined according to the correlations between that key point and the other key points within a preset range of that key point; the first feature encoding reflects the point-to-point correlations within a local region inside a target. Further, the key points are divided into points at target centers and points not at target centers, and the points at target centers are taken as reference center points. For each reference center point, a second feature encoding is determined according to the correlations between that reference center point and the other reference center points; the second feature encoding adds the correlations between targets on top of the point-to-point correlations within a local region inside a target. The position and category of each target in the point cloud data are then predicted according to the second feature encodings of the reference center points. The solution of the above embodiments no longer models the correlations among all points in the point cloud; instead, it divides point-to-point correlations into intra-target correlations and inter-target correlations. It can thus capture local and global dependencies in the point cloud simultaneously, accommodates the three-dimensional characteristics and irregularity of point cloud data, and improves the accuracy of target detection in point cloud data, while also improving detection efficiency and saving computation cost.
To further improve the accuracy of target detection in point cloud data, the solutions of the above embodiments are further improved. The inventors further exploit the three-dimensional characteristics of point cloud data and introduce geometric structure features between points into the encoding process, so that the features of the point cloud data are learned more accurately, thereby improving the accuracy of target detection. Specific embodiments are described below.
With respect to step S104, in some embodiments, for each key point, the first feature encoding of that key point is determined based on a self-attention mechanism according to the feature information of that key point, the feature information of the other key points within the preset range of that key point, the relative positional relationships between that key point and the other key points within the preset range, and the relative geometric structure relationships between that key point and the other key points within the preset range.
Further, in some embodiments, for each key point, the feature information, position information and geometric structure information of that key point, and the feature information, position information and geometric structure information of the other key points within the preset range of that key point, are input into a first self-attention module of the encoder in the first conversion model. In the first self-attention module, the other key points within the preset range of that key point are taken as relative points. For each relative point, the position information of the key point and the position information of the relative point are input into a first position encoding layer, a second position encoding layer and a third position encoding layer respectively, to determine a first relative position encoding, a second relative position encoding and a third relative position encoding between the key point and the relative point. The geometric structure information of the key point and the geometric structure information of the relative point are input into a geometric structure encoding layer to determine a relative geometric structure weight between the key point and the relative point. The key vector and the value vector of the relative point are determined from the products of the feature information of the relative point with the key matrix and the value matrix in the first self-attention module, respectively. The query vector of the key point is determined from the product of the feature information of the key point with the query matrix in the first self-attention module. The first feature encoding of the key point is then determined from the first, second and third relative position encodings and the relative geometric structure weight between the key point and each relative point, the key vector of each relative point, the value vector of each relative point, and the query vector of the key point.
Further, in some embodiments, for each relative point: the sum of the first relative position encoding between the key point and the relative point and the query vector of the key point is taken as a modified query vector of the key point; the sum of the second relative position encoding and the key vector of the relative point is taken as a modified key vector of the relative point; and the sum of the third relative position encoding and the value vector of the relative point is taken as a modified value vector of the relative point. The product of the modified query vector of the key point and the modified key vector of the relative point, the relative geometric structure weight between the key point and the relative point, and the dimension of the feature information of the key point are input into a first normalization layer to obtain the weight of the relative point. The modified value vectors of the relative points are then weighted and summed according to their weights to obtain the first feature encoding of the key point.
Further, in some embodiments, the product of the modified query vector of the key point and the modified key vector of the relative point is divided by the square root of the dimension of the feature information of the key point, the relative geometric structure weight between the key point and the relative point is added to the result, and the sum is input into the first normalization layer to obtain the weight of the relative point.
For example, the geometric structure information includes at least one of: the normal vector of the local plane at the point, and the curvature radius of the local plane at the point. In some embodiments, the relative geometric structure weight between the key point and the relative point is determined according to at least one of: the distance between the key point and the relative point; the dot product of the normal vector of the local plane of the key point and the normal vector of the local plane of the relative point; the difference between the curvature radius of the local plane of the key point and the curvature radius of the local plane of the relative point; and the angle between the normal vector of the local plane of the key point and the normal vector of the local plane of the relative point.
Further, in some embodiments, the relative geometric structure weight between the key point and the relative point decreases as the distance between the key point and the relative point increases; increases as the dot product of the normal vector of the local plane of the key point and the normal vector of the local plane of the relative point increases; increases as the difference between the curvature radius of the local plane of the key point and the curvature radius of the local plane of the relative point increases; and increases as the result of passing the angle between the two normal vectors through a feature propagation layer increases.
For example, the first conversion model is a Transformer model, referred to as the Local Transformer. The first position encoding layer, the second position encoding layer and the third position encoding layer may be a first feedforward network (FFN), a second feedforward network and a third feedforward network, respectively, and the first normalization layer may be a softmax layer.
The relative geometric structure relationships between a key point and the other key points within its preset range may be incorporated into the attention mechanism; specifically, formula (1) may be improved as follows:
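Formula (4) is also rendered as an image in the original publication. Based on the description of the modified query, key and value vectors and of the normalization step above, it plausibly adds the geometric structure weight $G_{i,j}$ to the scaled attention score before the softmax (a hedged reconstruction, with $\mathcal{N}(i)$ the relative points within the preset range of key point $i$ and $\delta^{q}_{i,j}, \delta^{k}_{i,j}, \delta^{v}_{i,j}$ the first, second and third relative position encodings):

$$h_i=\sum_{j\in\mathcal{N}(i)}\operatorname{softmax}_j\!\left(\frac{\left(q_i+\delta^{q}_{i,j}\right)^{\top}\left(k_j+\delta^{k}_{i,j}\right)}{\sqrt{C}}+G_{i,j}\right)\left(v_j+\delta^{v}_{i,j}\right)$$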
In formula (4), G_{i,j} denotes the relative geometric structure weight between key points i and j, which may be determined by the following formula:
In formula (5), n_i and n_j denote the normal vectors of the local planes at key points i and j respectively, c_i and c_j denote the curvature radii of the local planes at key points i and j respectively, and the angle term denotes the angle between the normal vector of the local plane at key point i and the normal vector of the local plane at key point j. β1, β2 and β3 are parameters of the geometric structure encoding layer, and FFN is a feedforward neural network or feature propagation layer. G_{i,j} is a Gaussian function model that measures the strength of the correlation between two points from geometric parameters such as the local plane normal vectors, the local curvature radii, and the angle between the normal vectors: the stronger the correlation between two points, the larger the corresponding Gaussian weight G_{i,j}.
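Since formula (5) is an image in the original publication, its exact functional form is not reproduced here. The sketch below assumes one Gaussian-style form that matches only the stated monotonicity (decreasing in distance; increasing in the normal-vector dot product, the curvature-radius difference, and the FFN-transformed angle). The default β values and the identity stand-in for the feature propagation layer are illustrative assumptions, not the patent's parameters.

```python
import numpy as np

def geometry_weight(p_i, p_j, n_i, n_j, c_i, c_j,
                    beta1=1.0, beta2=1.0, beta3=1.0,
                    ffn=lambda theta: theta):
    """One plausible Gaussian-style form of G_{i,j}.

    p_i, p_j: point coordinates; n_i, n_j: unit normals of the local planes;
    c_i, c_j: curvature radii of the local planes. `ffn` stands in for the
    learned feature propagation layer applied to the normal-vector angle.
    """
    dist = np.linalg.norm(p_i - p_j)             # larger distance -> smaller weight
    dot = float(n_i @ n_j)                       # aligned normals -> larger weight
    dcurv = c_i - c_j                            # curvature-radius difference
    theta = np.arccos(np.clip(dot, -1.0, 1.0))   # angle between the normals
    score = -beta1 * dist + beta2 * dot + beta3 * dcurv + ffn(theta)
    return float(np.exp(score))                  # Gaussian-style positive weight
```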
In some embodiments, for a key point, the N points closest to it in its neighborhood are found, and the least squares method is then used to find a plane that minimizes the sum of the distances from these N points when projected onto the plane; this plane is the local plane.
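The least-squares local plane described above has a standard closed-form solution via an eigendecomposition of the neighborhood covariance matrix: the eigenvector of the smallest eigenvalue is the plane normal. The sketch below illustrates that technique; the "surface variation" ratio is one common curvature proxy, not necessarily the curvature-radius definition used in the patent.

```python
import numpy as np

def local_plane(neighbors):
    """Fit the least-squares plane of an (N, 3) neighborhood.

    Returns (normal, surface_variation): the unit normal of the plane
    minimizing the squared point-to-plane distances, and a common
    curvature proxy (smallest eigenvalue over the eigenvalue sum).
    """
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    normal = eigvecs[:, 0]                    # direction of least variance
    variation = eigvals[0] / eigvals.sum()    # ~0 for a perfectly planar patch
    return normal, variation
```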
In the method of the above embodiments, relative geometric structure weights are added to express the geometric structure relationships between points, and object geometric features such as local plane normal vectors, local curvature radii, and angles between normal vectors are integrated into the self-attention mechanism, so as to design an efficient feature extraction model and target detection model dedicated to processing point cloud data.
Some application examples of the present disclosure are described below with reference to Figure 2.
As shown in Figure 2, the point cloud data is input into a point cloud feature extraction network (Point Cloud Backbone) to obtain the feature information of the key points (Point Feature), and the feature information of the key points is then input into the Local-Global Transformer model. In the Local-Global Transformer model, the feature information and position information of each key point and of the other key points in its local region are input into the Local Transformer module, together with the geometric structure information of each key point, to obtain the first feature encodings. The key points are passed through a classification network (for example, including a Sampling/Pooling module) to select the reference center points; the first feature encoding and position information of each reference center point are input into the Global Transformer module to obtain the second feature encodings; and the second feature encodings are input into an FFN to obtain the bounding box and category of each target.
The solution of the above embodiments proposes an end-to-end 3D point cloud target detection network based on the Transformer model, which may be called 3DTrans. It takes a 3D point cloud as input and outputs a set of labeled 3D bounding boxes that represent the locations of the targets (objects). The overall structure of the 3DTrans detection network is shown in Figure 2 and contains two main components: the feature extraction network and the Local-Global Transformer. Given a point cloud of N points with XYZ coordinates as input, the feature extraction network downsamples the point cloud and learns a deep feature for each point, outputting a subset of points in which each point is represented by a C-dimensional feature; these points are treated as the key points. The Local-Global Transformer takes the features of these key points as input and outputs the final target detection result.
The traditional Transformer model is improved in two respects to make it better suited to processing 3D point cloud data. On the one hand, instead of directly modeling the correlations among all key points, point-to-point correlations are divided into intra-object correlations and inter-object correlations. Specifically, the Local Transformer module learns the point-to-point correlations within a local region inside the same object, and the Global Transformer module learns the correlations between different objects. By connecting the two modules in series, the Local-Global Transformer model reduces the computation cost while simultaneously capturing the local and global dependencies in the point cloud, thereby improving the learning and representation capability of the model. On the other hand, object geometric structure information is added on top of the traditional Transformer model: object geometric features such as local plane normal vectors, local curvature radii, and angles between normal vectors are integrated into the self-attention mechanism, so as to design an efficient Transformer model dedicated to processing point cloud data.
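The serial composition of the two modules can be summarized as a skeleton. Every callable below is a stand-in for a learned sub-network (backbone, Local Transformer, center classifier, Global Transformer, detection head); the names, shapes and the neighborhood size `k` are illustrative assumptions, not the patent's interfaces.

```python
import numpy as np

def detect_3dtrans(points, backbone, local_tf, center_classifier,
                   global_tf, head, k=16):
    """Skeleton of the 3DTrans pipeline (illustrative only)."""
    keypoints, feats = backbone(points)              # (K, 3) key points, (K, C) features
    local_feats = local_tf(keypoints, feats, k=k)    # first feature encodings (intra-object)
    is_center = center_classifier(local_feats)       # (K,) mask of target-center points
    centers = keypoints[is_center]
    center_feats = local_feats[is_center]
    global_feats = global_tf(centers, center_feats)  # second feature encodings (inter-object)
    return head(global_feats)                        # boxes and class scores per center
```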
The method of the present disclosure does not require a large number of hand-designed components, a large amount of prior knowledge, or extensive post-processing to filter out redundant candidate boxes. The model is simple and can be trained end-to-end, with low computation cost, high processing efficiency and high accuracy.
The model of the present disclosure can be trained end-to-end. Point cloud data images are annotated with the bounding box and category of each target and used as training samples. The training samples are input into the point cloud feature extraction network to obtain the multiple key points in the output point cloud data and the feature information of each key point. The feature information and position information of each key point are input into the first conversion model, and for each key point, the feature information of that key point is encoded according to the correlations between that key point and the other key points within its preset range to obtain the first feature encoding of that key point. The first feature encodings of the key points are input into the classification network to classify the key points and determine the points classified as target centers, which are taken as the reference center points. The feature information and position information of each reference center point are input into the second conversion model, and for each reference center point, the first feature encoding of that reference center point is encoded according to the correlations between that reference center point and the other reference center points to obtain the second feature encoding of that reference center point. The second feature encodings of the reference center points are input into the decoder of the second conversion model to obtain the feature vector of each reference center point, and the feature vectors are input into the target detection network to obtain the position and category of each target in the point cloud data. According to the differences between the obtained positions and categories of the targets and the annotated bounding boxes and categories, the point cloud feature extraction network, the first conversion model, the classification network, the second conversion model and the target detection network are trained. The point cloud feature extraction network and the classification network may be pre-trained. For specific details, reference may be made to the foregoing embodiments, which are not repeated here.
The present disclosure further proposes an apparatus for detecting targets in point cloud data, which is described below with reference to Figure 3.
Figure 3 is a structural diagram of some embodiments of the apparatus for detecting targets in point cloud data of the present disclosure. As shown in Figure 3, the apparatus 30 of this embodiment includes: a feature extraction module 310, a first encoding module 320, a classification module 330, a second encoding module 340, and a target detection module 350.
The feature extraction module 310 is configured to input point cloud data into a point cloud feature extraction network to obtain multiple key points in the output point cloud data and the feature information of each key point.
The first encoding module 320 is configured to, for each key point, encode the feature information of that key point according to the correlations between that key point and the other key points within the preset range of that key point, to obtain the first feature encoding of that key point.
The classification module 330 is configured to classify the key points and determine the points classified as target centers as the reference center points.
The second encoding module 340 is configured to, for each reference center point, encode the first feature encoding of that reference center point according to the correlations between that reference center point and the other reference center points, to obtain the second feature encoding of that reference center point.
The target detection module 350 is configured to predict the position and category of each target in the point cloud data according to the second feature encodings of the reference center points.
In some embodiments, the first encoding module 320 is configured to, for each key point, determine the first feature encoding of that key point based on a self-attention mechanism according to the feature information of that key point, the feature information of the other key points within the preset range of that key point, and the relative positional relationships between that key point and the other key points within the preset range.
In some embodiments, the first encoding module 320 is configured to, for each key point, input the feature information and position information of that key point and the feature information and position information of the other key points within the preset range of that key point into the first self-attention module of the encoder in the first conversion model; in the first self-attention module, take the other key points within the preset range of that key point as relative points, and for each relative point, input the position information of the key point and the position information of the relative point into the first position encoding layer, the second position encoding layer and the third position encoding layer respectively, to determine the first relative position encoding, the second relative position encoding and the third relative position encoding between the key point and the relative point; determine the key vector and the value vector of the relative point from the products of the feature information of the relative point with the key matrix and the value matrix in the first self-attention module, respectively; determine the query vector of the key point from the product of the feature information of the key point with the query matrix in the first self-attention module; and determine the first feature encoding of the key point from the first, second and third relative position encodings between the key point and each relative point, the key vector of each relative point, the value vector of each relative point, and the query vector of the key point.
In some embodiments, the first encoding module 320 is configured to, for each relative point, take the sum of the first relative position encoding between the key point and the relative point and the query vector of the key point as the modified query vector of the key point; take the sum of the second relative position encoding and the key vector of the relative point as the modified key vector of the relative point; take the sum of the third relative position encoding and the value vector of the relative point as the modified value vector of the relative point; input the product of the modified query vector of the key point and the modified key vector of the relative point, together with the dimension of the feature information of the key point, into the first normalization layer to obtain the weight of the relative point; and perform a weighted summation of the modified value vectors of the relative points according to their weights to obtain the first feature encoding of the key point.
In some embodiments, the first position encoding layer, the second position encoding layer and the third position encoding layer are a first feedforward network, a second feedforward network and a third feedforward network respectively, and the first encoding module 320 is configured to input the difference between the coordinates of the key point and the coordinates of the relative point into the first feedforward network, the second feedforward network and the third feedforward network respectively.
In some embodiments, the first encoding module 320 is configured to, for each key point, determine the first feature encoding of the key point based on a self-attention mechanism, according to the feature information of the key point, the feature information of the other key points within a preset range of the key point, the relative positional relationships between the key point and those other key points, and the relative geometric structure relationships between the key point and those other key points.
In some embodiments, the first encoding module 320 is configured to, for each key point, input the feature information, position information, and geometric structure information of the key point, and the feature information, position information, and geometric structure information of the other key points within the preset range of the key point, into the first self-attention module of the encoder in a first transformer model. In the first self-attention module, the other key points within the preset range of the key point are taken as relative points, and for each relative point: the position information of the key point and the position information of the relative point are input into a first position encoding layer, a second position encoding layer, and a third position encoding layer, respectively, to determine the first, second, and third relative position encodings of the key point and the relative point; the geometric structure information of the key point and the geometric structure information of the relative point are input into a geometric structure encoding layer to determine the relative geometric structure weight of the key point and the relative point; the key vector and value vector of the relative point are determined from the products of the feature information of the relative point with the key matrix and the value matrix in the first self-attention module, respectively; and the query vector of the key point is determined from the product of the feature information of the key point with the query matrix in the first self-attention module. The first feature encoding of the key point is then determined from the first, second, and third relative position encodings and the relative geometric structure weight of the key point and each relative point, the key vector of each relative point, the value vector of each relative point, and the query vector of the key point.
In some embodiments, the first encoding module 320 is configured to, for each relative point: take the sum of the first relative position encoding of the key point and the relative point and the query vector of the key point as a modified query vector of the key point; take the sum of the second relative position encoding of the key point and the relative point and the key vector of the relative point as a modified key vector of the relative point; take the sum of the third relative position encoding of the key point and the relative point and the value vector of the relative point as a modified value vector of the relative point; input the product of the modified query vector of the key point and the modified key vector of the relative point, the relative geometric structure weight of the key point and the relative point, and the dimension of the feature information of the key point into the first normalization layer to obtain a weight of the relative point; and perform a weighted summation of the modified value vectors of the relative points according to their weights to obtain the first feature encoding of the key point.
In some embodiments, the first encoding module 320 is configured to divide the product of the modified query vector of the key point and the modified key vector of the relative point by the square root of the dimension of the feature information of the key point, add the relative geometric structure weight of the key point and the relative point, and input the result into the first normalization layer to obtain the weight of the relative point.
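The geometry-weighted attention score described in this embodiment can be written compactly: the scaled query-key product is offset by the relative geometric structure weight before normalization. A minimal sketch, with all names illustrative:

```python
import numpy as np

def geometry_weighted_attention(q_mod, k_mods, geom_weights, d):
    """Weights of the relative points: softmax over (q·k / sqrt(d) + geometry)."""
    scores = np.array([q_mod @ k for k in k_mods]) / np.sqrt(d) + geom_weights
    e = np.exp(scores - scores.max())        # first normalization layer (softmax)
    return e / e.sum()
```

With all geometry weights equal, this reduces to the plain scaled dot-product weighting of the earlier embodiment; a larger geometry weight biases attention toward geometrically similar relative points.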
In some embodiments, the geometric structure information includes at least one of the normal vector of the local plane in which a point lies and the curvature radius of that local plane, and the first encoding module 320 is configured to determine the relative geometric structure weight of the key point and the relative point according to at least one of: the distance between the key point and the relative point; the dot product of the normal vector of the local plane of the key point and the normal vector of the local plane of the relative point; the difference between the curvature radius of the local plane of the key point and the curvature radius of the local plane of the relative point; and the angle between the normal vector of the local plane of the key point and the normal vector of the local plane of the relative point.
In some embodiments, the relative geometric structure weight of the key point and the relative point: decreases as the distance between the key point and the relative point increases; increases as the dot product of the normal vector of the local plane of the key point and the normal vector of the local plane of the relative point increases; increases as the difference between the curvature radius of the local plane of the key point and the curvature radius of the local plane of the relative point increases; and increases as the angle between the normal vector of the local plane of the key point and the normal vector of the local plane of the relative point increases.
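One plausible scalar form combining the four quantities with the stated monotonicities is sketched below. The additive combination is an assumption for illustration only; the patent specifies the monotonic behavior but not the formula:

```python
import numpy as np

def relative_geometry_weight(p_i, p_j, n_i, n_j, r_i, r_j):
    """Toy relative geometric structure weight between two points.

    p_*: 3-D coordinates; n_*: unit normals of the local planes;
    r_*: curvature radii of the local planes.
    """
    dist = np.linalg.norm(p_i - p_j)           # weight falls as distance grows
    dot = float(n_i @ n_j)                     # weight grows with normal dot product
    dr = abs(r_i - r_j)                        # weight grows with curvature difference
    angle = np.arccos(np.clip(dot, -1.0, 1.0)) # weight grows with normal angle
    return -dist + dot + dr + angle
```

Each term is signed so that the weight moves in the direction the embodiment describes; any learned or hand-tuned coefficients on the terms would preserve the same monotonicities.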
In some embodiments, the second encoding module 340 is configured to, for each reference center point, determine the second feature encoding of the reference center point based on a self-attention mechanism, according to the first feature encoding of the reference center point, the first feature encodings of the other reference center points, and the relative positional relationships between the reference center point and the other reference center points.
In some embodiments, the second encoding module 340 is configured to, for each reference center point, input the first feature encoding and position information of the reference center point, and the first feature encodings and position information of the other reference center points, into the second self-attention module of the encoder in a second transformer model. In the second self-attention module, the other reference center points are taken as relative center points, and for each relative center point: the position information of the reference center point and the position information of the relative center point are input into a fourth position encoding layer, a fifth position encoding layer, and a sixth position encoding layer, respectively, to determine the fourth, fifth, and sixth relative position encodings of the reference center point and the relative center point; the key vector and value vector of the relative center point are determined from the products of the feature information of the relative center point with the key matrix and the value matrix in the second self-attention module, respectively; and the query vector of the reference center point is determined from the product of the feature information of the reference center point with the query matrix in the second self-attention module. The second feature encoding of the reference center point is then determined from the fourth, fifth, and sixth relative position encodings of the reference center point and each relative center point, the key vector of each relative center point, the value vector of each relative center point, and the query vector of the reference center point.
In some embodiments, the second encoding module 340 is configured to, for each relative center point: take the sum of the fourth relative position encoding of the reference center point and the relative center point and the query vector of the reference center point as a modified query vector of the reference center point; take the sum of the fifth relative position encoding of the reference center point and the relative center point and the key vector of the relative center point as a modified key vector of the relative center point; take the sum of the sixth relative position encoding of the reference center point and the relative center point and the value vector of the relative center point as a modified value vector of the relative center point; input the product of the modified query vector of the reference center point and the modified key vector of the relative center point, together with the dimension of the first feature encoding of the reference center point, into a second normalization layer to obtain a weight of the relative center point; and perform a weighted summation of the modified value vectors of the relative center points according to their weights to obtain the second feature encoding of the reference center point.
In some embodiments, the fourth, fifth, and sixth position encoding layers are a fourth feedforward network, a fifth feedforward network, and a sixth feedforward network, respectively, and the second encoding module 340 is configured to input the difference between the coordinates of the reference center point and the coordinates of the relative center point into the fourth, fifth, and sixth feedforward networks, respectively.
In some embodiments, the classification module 330 is configured to, for each key point, input the first feature encoding of the key point into a classification network to obtain a classification result for the key point, and to determine from the classification result whether the key point is a target-center point.
In some embodiments, the classification network is trained using the position information of the key points together with annotation information as training data, where, for each key point, if the key point lies within the bounding box of a target and is the point closest to the center of that target, the annotation information of the key point marks it as a target-center point.
In some embodiments, the target detection module 350 is configured to predict the position and category of each target in the point cloud data according to the second feature encodings of the reference center points by: inputting the second feature encodings of the reference center points into the decoder of the second transformer model to obtain a feature vector for each reference center point; and inputting the feature vectors of the reference center points into a target detection network to obtain the position and category of each target in the point cloud data.
In some embodiments, for each key point, the other key points within the preset range of the key point are determined as follows: the other key points are sorted in ascending order of their distance to the key point, and a preset number of them are selected from the front of the ordering as the other key points within the preset range of the key point.
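The sort-and-take-k selection above is an ordinary k-nearest-neighbor query. A minimal sketch, with illustrative names:

```python
import numpy as np

def k_nearest_keypoints(points, idx, k):
    """Return indices of the k key points nearest to points[idx] (excluding itself).

    points: (N, 3) array of key-point coordinates.
    """
    dists = np.linalg.norm(points - points[idx], axis=1)  # distance to every point
    order = np.argsort(dists)                             # ascending by distance
    return [j for j in order if j != idx][:k]             # drop self, keep first k
```

For large point sets, `np.argpartition` would avoid the full sort, but the behavior is the same.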
The apparatuses for detecting targets in point cloud data in the embodiments of the present disclosure may each be implemented by various computing devices or computer systems, as described below with reference to FIG. 4 and FIG. 5.
FIG. 4 is a structural diagram of some embodiments of the apparatus for detecting targets in point cloud data of the present disclosure. As shown in FIG. 4, the apparatus 40 of this embodiment includes a memory 410 and a processor 420 coupled to the memory 410, the processor 420 being configured to execute, based on instructions stored in the memory 410, the method for detecting targets in point cloud data of any of the embodiments of the present disclosure.
The memory 410 may include, for example, system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
FIG. 5 is a structural diagram of other embodiments of the apparatus for detecting targets in point cloud data of the present disclosure. As shown in FIG. 5, the apparatus 50 of this embodiment includes a memory 510 and a processor 520, which are similar to the memory 410 and the processor 420, respectively. It may further include an input/output interface 530, a network interface 540, a storage interface 550, and the like. These interfaces 530, 540, 550, as well as the memory 510 and the processor 520, may be connected, for example, through a bus 560. The input/output interface 530 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 540 provides a connection interface for various networked devices; for example, it may connect to a database server or a cloud storage server. The storage interface 550 provides a connection interface for external storage devices such as SD cards and USB flash drives.
The present disclosure further provides an article sorting apparatus, described below with reference to FIG. 6.
As shown in FIG. 6, the article sorting apparatus 6 includes the apparatus 30/40/50 for detecting targets in point cloud data of any of the foregoing embodiments, and a sorting component 62 configured to sort the articles corresponding to the targets according to the positions and categories of the targets in the point cloud data output by the detection apparatus 30/40/50.
In some embodiments, the apparatus 6 further includes a point cloud acquisition component 64 configured to acquire point cloud data of a preset area and send the point cloud data to the apparatus 30/40/50 for detecting targets in point cloud data.
The sorting component is, for example, a robotic arm, and the point cloud acquisition component is, for example, a three-dimensional camera.
The three-dimensional point cloud target detection technique proposed in the present disclosure can be applied to products such as vision-based sorting robotic arms in logistics scenarios: the point cloud data acquired by a three-dimensional camera mounted on the sorting robotic arm can be used to accurately locate and identify each article, helping the robotic arm sort the articles one by one.
The present disclosure further provides a computer program including instructions which, when executed by a processor, cause the processor to perform the method for detecting targets in point cloud data of any of the foregoing embodiments.
Those skilled in the art will appreciate that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are merely preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Any modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims (25)

  1. A method for detecting targets in point cloud data, comprising:
    inputting point cloud data into a point cloud feature extraction network to obtain a plurality of key points in the point cloud data and feature information of each key point;
    for each key point, encoding the feature information of the key point according to the correlation between the key point and other key points within a preset range of the key point, to obtain a first feature encoding of the key point;
    classifying the key points and determining the points classified as target centers as reference center points;
    for each reference center point, encoding the first feature encoding of the reference center point according to the correlation between the reference center point and other reference center points, to obtain a second feature encoding of the reference center point; and
    predicting a position and a category of each target in the point cloud data according to the second feature encodings of the reference center points.
  2. The detection method according to claim 1, wherein, for each key point, encoding the feature information of the key point according to the correlation between the key point and the other key points within the preset range of the key point to obtain the first feature encoding of the key point comprises:
    for each key point, determining the first feature encoding of the key point based on a self-attention mechanism, according to the feature information of the key point, the feature information of the other key points within the preset range of the key point, and relative positional relationships between the key point and the other key points within the preset range of the key point.
  3. The detection method according to claim 2, wherein determining the first feature encoding of the key point based on the self-attention mechanism, according to the feature information of the key point, the feature information of the other key points within the preset range of the key point, and the relative positional relationships between the key point and the other key points within the preset range of the key point, comprises:
    for each key point, inputting the feature information and position information of the key point and the feature information and position information of the other key points within the preset range of the key point into a first self-attention module of an encoder in a first transformer model;
    in the first self-attention module, taking the other key points within the preset range of the key point as relative points, and, for each relative point, inputting the position information of the key point and the position information of the relative point into a first position encoding layer, a second position encoding layer, and a third position encoding layer, respectively, to determine a first relative position encoding, a second relative position encoding, and a third relative position encoding of the key point and the relative point;
    determining a key vector and a value vector of the relative point according to products of the feature information of the relative point with a key matrix and a value matrix in the first self-attention module, respectively;
    determining a query vector of the key point according to a product of the feature information of the key point with a query matrix in the first self-attention module; and
    determining the first feature encoding of the key point according to the first, second, and third relative position encodings of the key point and each relative point, the key vector of each relative point, the value vector of each relative point, and the query vector of the key point.
  4. The detection method according to claim 3, wherein determining the first feature encoding of the key point according to the first, second, and third relative position encodings of the key point and each relative point, the key vector of each relative point, the value vector of each relative point, and the query vector of the key point comprises:
    for each relative point, taking a sum of the first relative position encoding of the key point and the relative point and the query vector of the key point as a modified query vector of the key point;
    taking a sum of the second relative position encoding of the key point and the relative point and the key vector of the relative point as a modified key vector of the relative point;
    taking a sum of the third relative position encoding of the key point and the relative point and the value vector of the relative point as a modified value vector of the relative point;
    inputting a product of the modified query vector of the key point and the modified key vector of the relative point, together with a dimension of the feature information of the key point, into a first normalization layer to obtain a weight of the relative point; and
    performing a weighted summation of the modified value vectors of the relative points according to the weights of the relative points to obtain the first feature encoding of the key point.
  5. The detection method according to claim 3, wherein the first position encoding layer, the second position encoding layer, and the third position encoding layer are a first feedforward network, a second feedforward network, and a third feedforward network, respectively, and inputting the position information of the key point and the position information of the relative point into the first, second, and third position encoding layers respectively comprises:
    inputting a difference between coordinates of the key point and coordinates of the relative point into the first feedforward network, the second feedforward network, and the third feedforward network, respectively.
  6. The detection method according to claim 1, wherein, for each key point, encoding the feature information of the key point according to the correlation between the key point and the other key points within the preset range of the key point to obtain the first feature encoding of the key point comprises:
    for each key point, determining the first feature encoding of the key point based on a self-attention mechanism, according to the feature information of the key point, the feature information of the other key points within the preset range of the key point, relative positional relationships between the key point and the other key points within the preset range of the key point, and relative geometric structure relationships between the key point and the other key points within the preset range of the key point.
  7. The detection method according to claim 6, wherein, for each key point, determining the first feature code of the key point based on a self-attention mechanism according to the feature information of the key point, the feature information of the other key points within the preset range of the key point, the relative positional relationships between the key point and the other key points within the preset range, and the relative geometric structure relationships between the key point and the other key points within the preset range comprises:
    for each key point, inputting the feature information, position information and geometric structure information of the key point, together with the feature information, position information and geometric structure information of the other key points within the preset range of the key point, into a first self-attention module of an encoder in a first transformer model;
    in the first self-attention module, taking the other key points within the preset range of the key point as relative points, and, for each relative point, inputting the position information of the key point and the position information of the relative point into a first position encoding layer, a second position encoding layer and a third position encoding layer respectively, to determine a first relative position code, a second relative position code and a third relative position code of the key point and the relative point;
    inputting the geometric structure information of the key point and the geometric structure information of the relative point into a geometric structure encoding layer to determine a relative geometric structure weight of the key point and the relative point;
    determining a key vector and a value vector of the relative point according to the products of the feature information of the relative point with the key matrix and the value matrix in the first self-attention module, respectively;
    determining a query vector of the key point according to the product of the feature information of the key point and the query matrix in the first self-attention module; and
    determining the first feature code of the key point according to the first relative position code, the second relative position code, the third relative position code and the relative geometric structure weight of the key point and each relative point, the key vector of each relative point, the value vector of each relative point, and the query vector of the key point.
  8. The detection method according to claim 7, wherein determining the first feature code of the key point according to the first relative position code, the second relative position code, the third relative position code and the relative geometric structure weight of the key point and each relative point, the key vector of each relative point, the value vector of each relative point, and the query vector of the key point comprises:
    for each relative point, taking the sum of the first relative position code of the key point and the relative point and the query vector of the key point as a modified query vector of the key point;
    taking the sum of the second relative position code of the key point and the relative point and the key vector of the relative point as a modified key vector of the relative point;
    taking the sum of the third relative position code of the key point and the relative point and the value vector of the relative point as a modified value vector of the relative point;
    inputting the product of the modified query vector of the key point and the modified key vector of the relative point, the relative geometric structure weight of the key point and the relative point, and the dimension of the feature information of the key point into a first normalization layer to obtain a weight of the relative point; and
    performing a weighted summation of the modified value vectors of the relative points according to the weights of the relative points to obtain the first feature code of the key point.
  9. The detection method according to claim 7, wherein inputting the product of the modified query vector of the key point and the modified key vector of the relative point, the relative geometric structure weight of the key point and the relative point, and the dimension of the feature information of the key point into the first normalization layer to obtain the weight of the relative point comprises:
    dividing the product of the modified query vector of the key point and the modified key vector of the relative point by the square root of the dimension of the feature information of the key point, adding the result to the relative geometric structure weight of the key point and the relative point, and inputting the sum into the first normalization layer to obtain the weight of the relative point.
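Claims 7-9 together describe one geometry-aware self-attention step. The following is a minimal illustrative sketch with toy dimensions; the field names `pe1`/`pe2`/`pe3`/`geo` are introduced here for clarity and are not terminology from the patent.

```python
import math

def softmax(xs):
    # Numerically stable softmax used as the normalization layer.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def first_feature_code(query, rel_points):
    """query: the key point's query vector.
    rel_points: per relative point, its key vector 'k', value vector 'v',
    the three relative position codes 'pe1'/'pe2'/'pe3', and the relative
    geometric structure weight 'geo'."""
    d = len(query)
    scores, v_mods = [], []
    for p in rel_points:
        q_mod = [a + b for a, b in zip(query, p["pe1"])]    # modified query vector
        k_mod = [a + b for a, b in zip(p["k"], p["pe2"])]   # modified key vector
        v_mod = [a + b for a, b in zip(p["v"], p["pe3"])]   # modified value vector
        dot = sum(a * b for a, b in zip(q_mod, k_mod))
        # scaled dot product plus geometric structure weight, then normalized
        scores.append(dot / math.sqrt(d) + p["geo"])
        v_mods.append(v_mod)
    weights = softmax(scores)
    # weighted sum of the modified value vectors gives the first feature code
    return [sum(w * v[i] for w, v in zip(weights, v_mods)) for i in range(d)]
```

With two identical relative points and zero position codes, each receives weight 0.5 and the output equals the shared value vector, which is a quick sanity check of the weighting.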
  10. The detection method according to claim 7, wherein the geometric structure information comprises at least one of a normal vector of the local plane where a point lies and a radius of curvature of the local plane where a point lies, and determining the relative geometric structure weight of the key point and the relative point comprises:
    determining the relative geometric structure weight of the key point and the relative point according to at least one of: the distance between the key point and the relative point; the dot product of the normal vector of the local plane where the key point lies and the normal vector of the local plane where the relative point lies; the difference between the radius of curvature of the local plane where the key point lies and the radius of curvature of the local plane where the relative point lies; and the angle between the normal vector of the local plane where the key point lies and the normal vector of the local plane where the relative point lies.
  11. The detection method according to claim 10, wherein:
    the relative geometric structure weight of the key point and the relative point decreases as the distance between the key point and the relative point increases;
    the relative geometric structure weight of the key point and the relative point increases as the dot product of the normal vector of the local plane where the key point lies and the normal vector of the local plane where the relative point lies increases;
    the relative geometric structure weight of the key point and the relative point increases as the difference between the radius of curvature of the local plane where the key point lies and the radius of curvature of the local plane where the relative point lies increases; and
    the relative geometric structure weight of the key point and the relative point increases as the result obtained by passing the angle between the normal vector of the local plane where the key point lies and the normal vector of the local plane where the relative point lies through a feature propagation layer increases.
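Claims 10-11 fix only the directions of influence, not a formula. One hypothetical combination obeying all four monotonicity constraints, with the learned feature propagation layer stubbed as an identity function, could look as follows:

```python
import math

def relative_geometry_weight(key_xyz, rel_xyz, n_key, n_rel, r_key, r_rel,
                             alpha=1.0, propagate=lambda a: a):
    """Hypothetical geometric structure weight: decreasing in distance,
    increasing in the normals' dot product, in the curvature-radius
    difference, and in the propagated angle. `propagate` stands in for
    the learned feature propagation layer of claim 11."""
    dist = math.dist(key_xyz, rel_xyz)
    ndot = sum(a * b for a, b in zip(n_key, n_rel))
    dr = abs(r_key - r_rel)
    # angle between the two local-plane normals (assumes unit normals)
    angle = math.acos(max(-1.0, min(1.0, ndot)))
    return -alpha * dist + ndot + dr + propagate(angle)
```

Note the dot-product term and the angle term pull in opposite directions for unit normals; the patent reconciles this by passing the angle through a learned layer, which the identity stub here merely placeholds.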
  12. The detection method according to any one of claims 1-11, wherein, for each reference center point, encoding the first feature code of the reference center point according to the correlation between the reference center point and other reference center points to obtain the second feature code of the reference center point comprises:
    for each reference center point, determining the second feature code of the reference center point based on a self-attention mechanism, according to the first feature code of the reference center point, the first feature codes of the other reference center points, and the relative positional relationships between the reference center point and the other reference center points.
  13. The detection method according to claim 12, wherein, for each reference center point, determining the second feature code of the reference center point based on a self-attention mechanism according to the first feature code of the reference center point, the first feature codes of the other reference center points, and the relative positional relationships between the reference center point and the other reference center points comprises:
    for each reference center point, inputting the first feature code and position information of the reference center point, together with the first feature codes and position information of the other reference center points, into a second self-attention module of an encoder in a second transformer model;
    in the second self-attention module, taking the other reference center points as relative center points, and, for each relative center point, inputting the position information of the reference center point and the position information of the relative center point into a fourth position encoding layer, a fifth position encoding layer and a sixth position encoding layer respectively, to determine a fourth relative position code, a fifth relative position code and a sixth relative position code of the reference center point and the relative center point;
    determining a key vector and a value vector of the relative center point according to the products of the feature information of the relative center point with the key matrix and the value matrix in the second self-attention module, respectively;
    determining a query vector of the reference center point according to the product of the feature information of the reference center point and the query matrix in the second self-attention module; and
    determining the second feature code of the reference center point according to the fourth relative position code, the fifth relative position code and the sixth relative position code of the reference center point and each relative center point, the key vector of each relative center point, the value vector of each relative center point, and the query vector of the reference center point.
  14. The detection method according to claim 13, wherein determining the second feature code of the reference center point according to the fourth relative position code, the fifth relative position code and the sixth relative position code of the reference center point and each relative center point, the key vector of each relative center point, the value vector of each relative center point, and the query vector of the reference center point comprises:
    for each relative center point, taking the sum of the fourth relative position code of the reference center point and the relative center point and the query vector of the reference center point as a modified query vector of the reference center point;
    taking the sum of the fifth relative position code of the reference center point and the relative center point and the key vector of the relative center point as a modified key vector of the relative center point;
    taking the sum of the sixth relative position code of the reference center point and the relative center point and the value vector of the relative center point as a modified value vector of the relative center point;
    inputting the product of the modified query vector of the reference center point and the modified key vector of the relative center point, together with the dimension of the first feature code of the reference center point, into a second normalization layer to obtain a weight of the relative center point; and
    performing a weighted summation of the modified value vectors of the relative center points according to the weights of the relative center points to obtain the second feature code of the reference center point.
  15. The detection method according to claim 13, wherein the fourth relative position code, the fifth relative position code and the sixth relative position code are produced by a fourth feedforward network, a fifth feedforward network and a sixth feedforward network respectively, and inputting the position information of the reference center point and the position information of the relative center point into the fourth position encoding layer, the fifth position encoding layer and the sixth position encoding layer respectively comprises:
    inputting the difference between the coordinates of the reference center point and the coordinates of the relative center point into the fourth feedforward network, the fifth feedforward network and the sixth feedforward network, respectively.
  16. The detection method according to any one of claims 1-15, wherein classifying the key points and determining the points classified as target centers as reference center points comprises:
    for each key point, inputting the first feature code of the key point into a classification network to obtain a classification result for the key point; and
    determining, according to the classification result, whether the key point is a target-center point.
  17. The detection method according to claim 16, wherein the classification network is trained using the position information of key points carrying annotation information as training data, wherein, for each key point, in the case that the key point is located within the bounding box of a target and is the key point closest to the center of the target, the annotation information of the key point is a target-center point.
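The labelling rule of claim 17 can be sketched as follows. Axis-aligned 2D boxes are assumed purely for illustration; the patent operates on 3D bounding boxes, and the box format `(xmin, ymin, xmax, ymax)` is an assumption of this sketch.

```python
import math

def center_point_labels(key_points, boxes):
    """For each target box, label exactly the key point that lies inside the
    box and is closest to the box center as a target-center point (label 1);
    all other key points keep label 0."""
    labels = [0] * len(key_points)
    for (xmin, ymin, xmax, ymax) in boxes:
        center = ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)
        # key points falling inside this target's bounding box
        inside = [i for i, (x, y) in enumerate(key_points)
                  if xmin <= x <= xmax and ymin <= y <= ymax]
        if inside:
            # of those, only the one closest to the target center is positive
            best = min(inside, key=lambda i: math.dist(key_points[i], center))
            labels[best] = 1
    return labels
```

This yields at most one positive key point per target, which matches the claim's "closest to the target center" condition.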
  18. The detection method according to any one of claims 1-17, wherein predicting the positions and categories of the targets in the point cloud data according to the second feature codes of the reference center points comprises:
    inputting the second feature codes of the reference center points into a decoder of the second transformer model to obtain a feature vector of each reference center point; and
    inputting the feature vectors of the reference center points into a target detection network to obtain the positions and categories of the targets in the point cloud data.
  19. The detection method according to any one of claims 1-18, wherein, for each key point, the other key points within the preset range of the key point are determined as follows:
    for each key point, sorting the other key points in ascending order of their distance to the key point, and selecting a preset number of the other key points, in order from first to last, as the other key points within the preset range of the key point.
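The neighbour selection of claim 19 is a k-nearest-neighbour query over the key points; an illustrative sketch (brute-force, for clarity rather than efficiency):

```python
import math

def preset_range_neighbors(points, idx, k):
    """Rank the other key points by their distance to key point `idx`
    (ascending) and take the first k as its preset-range neighbours."""
    target = points[idx]
    others = [i for i in range(len(points)) if i != idx]
    others.sort(key=lambda i: math.dist(points[i], target))
    return others[:k]
```

In practice a spatial index (e.g. a KD-tree) would replace the full sort, but the claim only specifies the distance-ordered selection itself.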
  20. An apparatus for detecting targets in point cloud data, comprising:
    a feature extraction module configured to input point cloud data into a point cloud feature extraction network to obtain a plurality of key points in the point cloud data and feature information of each key point;
    a first encoding module configured to, for each key point, encode the feature information of the key point according to the correlation between the key point and other key points within a preset range of the key point to obtain a first feature code of the key point;
    a classification module configured to classify the key points and determine the points classified as target centers as reference center points;
    a second encoding module configured to, for each reference center point, encode the first feature code of the reference center point according to the correlation between the reference center point and other reference center points to obtain a second feature code of the reference center point; and
    a target detection module configured to predict the positions and categories of targets in the point cloud data according to the second feature codes of the reference center points.
  21. An apparatus for detecting targets in point cloud data, comprising:
    a processor; and
    a memory coupled to the processor and storing instructions which, when executed by the processor, cause the processor to perform the method for detecting targets in point cloud data according to any one of claims 1-19.
  22. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1-19.
  23. An article sorting apparatus, comprising the apparatus for detecting targets in point cloud data according to claim 20 or 21, and a sorting component,
    wherein the sorting component is configured to sort targets according to the positions and categories of the targets in the point cloud data output by the apparatus for detecting targets in point cloud data.
  24. The article sorting apparatus according to claim 23, further comprising:
    a point cloud collection component configured to collect point cloud data of a preset area and send the point cloud data to the apparatus for detecting targets in point cloud data.
  25. A computer program, comprising instructions which, when executed by a processor, cause the processor to perform the method for detecting targets in point cloud data according to any one of claims 1-19.
PCT/CN2023/087273 2022-04-19 2023-04-10 Method and apparatus for detecting target in point cloud data, and computer-readable storage medium WO2023202401A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210409033.6A CN115018910A (en) 2022-04-19 2022-04-19 Method and device for detecting target in point cloud data and computer readable storage medium
CN202210409033.6 2022-04-19

Publications (1)

Publication Number Publication Date
WO2023202401A1 true WO2023202401A1 (en) 2023-10-26

Family

ID=83067520

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/087273 WO2023202401A1 (en) 2022-04-19 2023-04-10 Method and apparatus for detecting target in point cloud data, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN115018910A (en)
WO (1) WO2023202401A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115018910A (en) * 2022-04-19 2022-09-06 京东科技信息技术有限公司 Method and device for detecting target in point cloud data and computer readable storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
WO2021046716A1 (en) * 2019-09-10 2021-03-18 深圳市大疆创新科技有限公司 Method, system and device for detecting target object and storage medium
WO2021164469A1 (en) * 2020-02-21 2021-08-26 北京市商汤科技开发有限公司 Target object detection method and apparatus, device, and storage medium
CN113988164A (en) * 2021-10-21 2022-01-28 电子科技大学 Representative point self-attention mechanism-oriented lightweight point cloud target detection method
CN114120270A (en) * 2021-11-08 2022-03-01 同济大学 Point cloud target detection method based on attention and sampling learning
CN115018910A (en) * 2022-04-19 2022-09-06 京东科技信息技术有限公司 Method and device for detecting target in point cloud data and computer readable storage medium

Also Published As

Publication number Publication date
CN115018910A (en) 2022-09-06
