WO2024102628A1 - Self-supervised point cloud ordering using machine learning models - Google Patents
Self-supervised point cloud ordering using machine learning models
- Publication number
- WO2024102628A1 (PCT/US2023/078768)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- multidimensional
- point cloud
- points
- point
- processing system
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Definitions
- Machine learning models such as artificial neural networks (ANNs), convolutional neural networks (CNNs), or the like, can be used to perform various actions on input data. These actions may include, for example, data compression, pattern matching (e.g., for biometric authentication), object detection (e.g., for surveillance applications, autonomous driving, or the like), natural language processing (e.g., identification of keywords in spoken speech that triggers execution of specified operations within a system), or other inference operations in which models are used to predict something about the state of the environment from which input data is received.
- a source data set may include images, video, or other content captured in a specific environment with specific equipment in a specific state (e.g., an urban or otherwise highly built environment, with imaging devices having specific noise and optical properties, that are relatively clean).
- the input data which a machine learning model uses to generate an inference may include multidimensional data, such as a multidimensional point cloud.
- a point cloud representing a visual scene may include multiple spatial dimensions and may include a large number of discrete points. Because a multidimensional point cloud may include a large number of points, processing a multidimensional point cloud in order to infer meaningful data from the multidimensional point cloud may be a computationally expensive task. Further, many of the points in a point cloud may represent the same or similar data, and thus, processing a multidimensional point cloud may also result in redundant computation for points that have the same, or at least very similar, semantic meanings or similar contributions to the meaning of a multidimensional point cloud.
- Certain aspects provide a processor-implemented method for inferencing against a multidimensional point cloud using a machine learning model.
- An example method generally includes generating a score for each respective point in a multidimensional point cloud. Points in the multidimensional point cloud are ranked based on the generated score for each respective point in the multidimensional point cloud. The top points are selected from the ranked multidimensional point cloud, and one or more actions are taken based on the selected top points.
- Certain aspects provide a processor-implemented method for training a machine learning model to perform inferences from a multidimensional point cloud.
- An example method generally includes training a neural network to map multidimensional point clouds into feature maps. A score is generated for each respective point in a multidimensional point cloud.
- the points in the multidimensional point cloud are ranked based on the generated score for each respective point in the multidimensional point cloud.
- a plurality of top point sets are generated from the ranked points in the multidimensional point cloud.
- the neural network is retrained based on a noise contrastive estimation loss calculated based on the plurality of top point sets.
- FIG.1 illustrates an example pipeline for training and using a self-supervised machine learning model trained to perform inferences on a multidimensional point cloud, according to aspects of the present disclosure.
- FIG.2 illustrates an example of contrastive learning based on an ordered set of points in a multidimensional point cloud, according to aspects of the present disclosure.
- FIG.3 illustrates example operations for self-supervised training of a machine learning model to perform inferences on a multidimensional point cloud, according to aspects of the present disclosure.
- FIG.4 illustrates example operations for processing a multidimensional point cloud using a self-supervised machine learning model, according to aspects of the present disclosure.
- FIG.5 illustrates an example implementation of a processing system on which self-supervised training of a machine learning model to perform inferences on a multidimensional point cloud can be performed, according to aspects of the present disclosure.
- FIG.6 illustrates an example implementation of a processing system on which processing a multidimensional point cloud using a self-supervised machine learning model can be performed, according to aspects of the present disclosure.
- identical reference numerals have been used, where possible, to designate identical elements that are common to the drawings. It is contemplated that elements and features of one aspect may be beneficially incorporated in other aspects without further recitation.
- DETAILED DESCRIPTION
- Aspects of the present disclosure provide techniques and apparatuses for training and using self-supervised machine learning models to efficiently and accurately process multidimensional point clouds.
- Multidimensional data, such as multidimensional point clouds, may provide information about the three-dimensional spatial location (e.g., height relative to an elevation datum point, lateral (side-to-side) distance relative to a defined datum point, and depth relative to a defined datum point) of each object in a scene relative to a defined reference point.
- a multidimensional point cloud may include a large amount of data (e.g., a large number of discrete data points) which may be impractical to process in order to extract meaning or other information from the multidimensional point cloud. Further, the points in a multidimensional point cloud may have different levels of importance and contribute different amounts of meaning to the overall scene in which the point cloud exists.
- two points that are adjacent to each other in a point cloud may convey similar information, as these points may be located on a same surface of an object in a spatial environment; however, two points that are far away from each other in the point cloud may convey very different information (e.g., relate to different objects in a spatial environment or different surfaces of the same object in the spatial environment).
- various techniques can be used to reduce the size of the point cloud from which meaning is to be extracted. For example, random selection or furthest point sampling can be used to reduce the size of a point cloud that is provided as input into a machine learning model for processing.
- points that are proximate to each other may convey minimal additional information, while points that are far away from each other may relate to different portions of the same object (e.g., a point corresponding to the left wingtip of an aeroplane and a point corresponding to the right wingtip of the aeroplane, or a point corresponding to the bow of a ship and a point corresponding to the stern of the ship, pairs which may be many meters apart) or may relate to different objects altogether.
- inference performance using a randomly selected or sampled subset of points from a point cloud may be negatively impacted.
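- As a concrete illustration of the furthest point sampling mentioned above, the following is a minimal NumPy sketch of that baseline subsampling strategy; the helper name, array shapes, and the random seed point are illustrative assumptions rather than details taken from this disclosure.

```python
import numpy as np

def furthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """Greedily select k indices from an (N, 3) point cloud, each time taking
    the point furthest from everything selected so far."""
    n = points.shape[0]
    selected = np.zeros(k, dtype=np.int64)
    min_dist = np.full(n, np.inf)          # distance to nearest selected point
    selected[0] = np.random.randint(n)     # arbitrary starting point
    for i in range(1, k):
        diff = points - points[selected[i - 1]]
        dist = np.einsum("nd,nd->n", diff, diff)   # squared distances
        min_dist = np.minimum(min_dist, dist)
        selected[i] = int(np.argmax(min_dist))     # furthest remaining point
    return selected

# Example: reduce a 10,000-point cloud to 1,024 representative points.
cloud = np.random.rand(10_000, 3)
subset = cloud[furthest_point_sampling(cloud, 1_024)]
```

- Note that, as the passage above observes, such purely geometric subsampling ignores how much semantic information each point actually contributes, which motivates ordering points by learned importance as described below.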
- Other techniques may attempt to order the points in a point cloud. For example, group-wise ordering can be achieved using fully supervised models; however, these techniques may not differentiate between different discrete points in the point cloud and may entail the use of labeled data (which may be unavailable or impractical to generate) for supervised learning.
- Another technique may allow for point-wise projection of a point cloud; however, these techniques may not allow for an ordering to be directly learned from an input point cloud, but rather involve various transformations and projections—and thus additional computational expense—before the points in a point cloud can be ordered.
- aspects of the present disclosure provide techniques and apparatuses for efficiently ordering points in multidimensional point clouds to allow for the identification and use of a representative subset of points to perform an inference on the multidimensional point cloud.
- a scoring neural network can be used to assign a score to each point in a multidimensional point cloud.
- the score assigned to a point may indicate a relative importance of that point to the overall meaning of the multidimensional point cloud.
- the points may be sorted by score, and the top k points can be used to perform inferences on the multidimensional point cloud using a machine learning model and to perform self-supervised training of a machine learning model that maps input multidimensional point clouds to a feature map based on which the scores for each point can be generated.
- FIG. 1 depicts an example pipeline 100 for training and using a self-supervised machine learning model to perform inferences on a multidimensional point cloud, according to aspects of the present disclosure.
- Pipeline 100 includes a point network 110 (labeled “PointNet”), a scoring neural network 120 (labeled “Scorer”), and a top point selection module 130 (labeled “Top-k”).
- Pipeline 100 may be configured to order points in an input multidimensional point cloud, such as multidimensional point cloud P 105, using self-supervised machine learning techniques.
- Multidimensional point cloud P 105 may be represented as P = {p_1, p_2, ..., p_N}, with each point p_i ∈ R^3, where p_i represents the i-th point in the multidimensional point cloud P 105 and N corresponds to the number of points in the multidimensional point cloud P 105.
- each of the N points in the multidimensional point cloud 105 may be associated with a real value in each of a plurality of dimensions (e.g., in this example, three spatial dimensions, such as height, width, and depth).
- Pipeline 100 may attempt to find an ordering of the points P^σ = (p_σ(1), p_σ(2), ..., p_σ(N)) from an unlabeled data set that minimizes, or at least reduces, the value of the downstream objective function L: σ* = argmin_σ L(P^σ_n), where the various subsets P^σ_n of P^σ contain the top n points, where n ≤ N.
- the point network 110 can generate a feature map F 112 from the multidimensional point cloud 105.
- the point network 110 may generate the feature map F 112 with dimensions of N × D, where N represents the number of points in multidimensional point cloud 105 and D represents the number of dimensions in the feature map F 112 into which multidimensional point cloud P 105 is mapped. D may be different from the number of dimensions in which points in the multidimensional point cloud lie.
- point network 110 may be a neural network (a feature extracting neural network) or other machine learning model that takes a set of unordered points in a point cloud as an input and generates the feature map as the output of a plurality of multi-layer perceptrons (MLPs).
- Point network 110 may, in some aspects, exclude transformation layers which may be used to apply various geometric transformations to the multidimensional point cloud 105 to allow for point network 110 to be spatially invariant.
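- For readers who prefer code, the following PyTorch sketch shows a point network of the general kind described above (shared multi-layer perceptrons applied independently to every point, with no transformation layers), producing an N × D feature map; the layer widths, the use of batch normalization, and the class name are illustrative assumptions, not details of the disclosed point network 110.

```python
import torch
import torch.nn as nn

class PointFeatureNet(nn.Module):
    """PointNet-style backbone: a shared MLP applied per point, no T-Net."""
    def __init__(self, in_dims: int = 3, feat_dims: int = 128):
        super().__init__()
        # 1x1 convolutions act as an MLP shared across all N points.
        self.mlp = nn.Sequential(
            nn.Conv1d(in_dims, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, feat_dims, 1),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, N, 3) -> feature map F: (batch, N, D)
        return self.mlp(points.transpose(1, 2)).transpose(1, 2)

# Each of the N input points is mapped to a D-dimensional feature row.
F = PointFeatureNet()(torch.rand(2, 1024, 3))   # shape (2, 1024, 128)
```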
- Scoring neural network 120 may be a neural network configured to generate a score for each point in the point cloud based on the feature map F 112 generated by point network 110.
- scoring neural network 120 may provide a mapping s from a point cloud to a score vector according to the expression s: P ↦ s(P) ∈ R^N. In doing so, given the feature map F ∈ R^(N×D) generated by point network 110, the scoring neural network 120 may generate a score matrix 124 including a score for each point.
- the score matrix 124 may be ordered based on an index associated with each point in the feature map F 112 such that the score matrix 124 is unordered with respect to the scores generated for each point in the feature map F 112.
- a feature for the i-th point in the feature map F may be denoted F_i, a vector in D dimensions, and F_ij, j ∈ {1, 2, ..., D}, represents the j-th element in F_i (equivalently, the ij-th element in F).
- the score generated for the i-th point in the multidimensional point cloud P 105 may be computed to represent the contribution of that point to a global feature g representing multidimensional point cloud P 105.
- the global feature g = (g_1, g_2, ..., g_D) may be computed by an order-invariant max-pooling block 122 represented by the equation g_j = max_{i ∈ {1, 2, ..., N}} F_ij, or, alternatively (and equivalently): g = maxpool(F, dim = 0).
- a point having the maximum value in the j-th dimension may be calculated according to the equation i_j* = argmax_{i ∈ {1, 2, ..., N}} F_ij, for j ∈ {1, 2, ..., D}.
- the contribution of a point i to the global feature g may be quantified by the number of dimensions j for which that point provides the maximum value (i.e., the number of dimensions for which i = i_j*).
- a score s_i for a point i may be calculated according to the equation s_i = (1/D) Σ_{j=1}^{D} δ(i, i_j*),
- where δ represents a Kronecker delta function, where δ(i, i_j*) = 0 if i ≠ i_j* and δ(i, i_j*) = 1 if i = i_j*.
- the score for a point i may be 1.0 if the feature F_i for that point i is descriptive of the global feature g in its entirety and may be 0.0 if the feature F_i for that point i is not descriptive of the global feature g.
- the score may be represented as a differentiable approximation of the importance of features for a point i.
- the differentiable approximation may be represented by the equation s_i = (2/D) Σ_{j=1}^{D} σ_τ(F_ij − g_j), where σ_τ represents a sigmoid operation with temperature τ such that σ_τ(x) = 1 / (1 + exp(−x/τ)). By scaling the sigmoid outputs with 2, the scores may arrive at the interval [0, 1].
- As with the exact score, the score for a point i may be 1.0 if the feature F_i for that point i is descriptive of the global feature g in its entirety and may be 0.0 if the feature F_i for that point i is not descriptive of the global feature g. Further, because s_i ∈ [0, 1], the score vector s for all points may be represented as s = (s_1, s_2, ..., s_N) ∈ [0, 1]^N. While the scoring neural network 120 is discussed above with respect to a sigmoid function, it should be recognized that other non-linear functions can be used to generate a score for each point i in feature map F.
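- The per-point scoring described above can be sketched in a few lines of PyTorch. The formula below is a paraphrase of the description (a sigmoid of each feature's gap to the max-pooled global feature, scaled by 2 and averaged over the D dimensions so scores land in (0, 1]); the exact functional form and the temperature value are assumptions rather than the claimed implementation.

```python
import torch

def point_scores(F: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """F: (N, D) feature map -> (N,) scores in (0, 1]."""
    g = F.max(dim=0).values                      # global feature: one max per dimension
    # A point scores highly when its features sit at (or near) the per-dimension
    # maxima that define the global feature g; 2 * sigmoid(x <= 0) lies in (0, 1].
    return (2.0 * torch.sigmoid((F - g) / temperature)).mean(dim=1)

scores = point_scores(torch.rand(1024, 128))     # one score per point
```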
- Top point selection module 130 is generally configured to sort, in a differentiable manner, the points in multidimensional point cloud ⁇ 105 based on the score matrix 124 including scores generated for points in multidimensional point cloud ⁇ 105 by scoring neural network 120. In doing so, top point selection module 130 may use a top-k operator that ranks the points in the multidimensional point cloud ⁇ 105 by solving a parameterized optimal transport problem, for example.
- The optimal transport problem attempts to find a transport plan from a discrete distribution A = {s_1, s_2, ..., s_N} (the scores) to a discrete distribution B = {0, 1, 2, ..., N − 1} (the target ranks).
- marginals for both A and B may be defined as uniform, μ = ν = (1/N) 1_N (i.e., a weight of 1/N on each element).
- a cost matrix C ∈ R^(N×N) may be defined, with C_ij representing the cost of transporting mass from s_i to b_j (e.g., from the i-th point's score to the j-th element in B).
- the cost may be, for example, defined as the squared Euclidean distance between s_i and b_j, such that C_ij = (s_i − b_j)².
- the optimal transport problem can be represented by the equation Γ* = argmin_Γ ⟨C, Γ⟩ + ε h(Γ), such that Γ 1_N = μ and Γᵀ 1_N = ν, where ⟨·, ·⟩ represents the inner product and h(Γ) = Σ_{ij} Γ_ij log Γ_ij represents an entropy regularizer that can minimize, or at least reduce, discontinuities and generate a smoothed and differentiable approximation for the top-k operation.
- An approximation Γ* of the optimal Γ may thus represent the optimal transport plan that transforms discrete distribution A to discrete distribution B.
- the approximate optimal transport plan Γ* may be scaled by N so that N Γ* b, with b = (0, 1, ..., N − 1)ᵀ, represents the ordering of the points in multidimensional point cloud P 105, represented as sorted point cloud P′ 132, where P′ ∈ R^(N×3).
- the sorted point cloud P′ 132 may be represented by an ordered vector 131.
- the ordered vector 131 may be generated by sorting the score matrix 124 from the highest score to the lowest score, such that the index of a point in the ordered vector 131 is different from the index of that point in the feature map F 112 (or a max-pooled version thereof).
- the point with the highest score may be set to 0, the point with the next highest score may be set to 1, and so on, until the point with the lowest score is set to N-1.
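- The ranking step can be prototyped with a plain Sinkhorn iteration, following the entropy-regularized optimal transport formulation above: scores are transported onto the ordered target distribution, and N·Γ* applied to the rank values yields a smoothed rank for every point. The target values are normalized to [0, 1] here purely for numerical convenience, and the epsilon and iteration count are illustrative assumptions.

```python
import numpy as np

def soft_ranks(scores: np.ndarray, eps: float = 1e-2, iters: int = 300) -> np.ndarray:
    """Smoothed rank per point, with rank ~0 assigned to the highest score."""
    n = scores.shape[0]
    # Flip and rescale scores so the largest score is pulled toward target 0;
    # targets stand in for the ordered distribution B = {0, ..., N-1}.
    z = (scores.max() - scores) / (np.ptp(scores) + 1e-8)
    targets = np.linspace(0.0, 1.0, n)
    cost = (z[:, None] - targets[None, :]) ** 2    # squared Euclidean cost
    K = np.exp(-cost / eps)                        # Gibbs kernel
    mu = np.full(n, 1.0 / n)                       # uniform marginal on scores
    nu = np.full(n, 1.0 / n)                       # uniform marginal on ranks
    u = np.ones(n)
    for _ in range(iters):                         # Sinkhorn updates
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    gamma = u[:, None] * K * v[None, :]            # approximate transport plan
    return n * gamma @ np.arange(n)                # soft rank of each point

ranks = soft_ranks(np.random.rand(256))            # ~0 for the top-scoring point
```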
- top point selection module 130 can generate one or more point sets from the sorted point cloud P′ 132.
- FIG.2 illustrates an example 200 of contrastive learning based on an ordered set of points in a multidimensional point cloud, according to aspects of the present disclosure.
- the point network 110 may be retrained, or refined, using self-supervision techniques.
- the hierarchical scheme (e.g., the order in which points are sorted in the sorted point cloud P′ 132) may be used as a supervision signal for retraining the point network 110.
- a plurality of subsets of points in multidimensional point cloud ⁇ 105 can be generated.
- the subsets of points may be defined with increasing cardinality, represented as S_1 ⊂ S_2 ⊂ ... ⊂ S_m, with |S_k| = r^k,
- where the growth factor term r may control, or at least influence, the growth of the size of each subset S_k.
- the first subset S_1 may include the top r points in the ranked multidimensional point cloud 105,
- the second subset S_2 may include the top r² points,
- the third subset S_3 may include the top r³ points, and so on.
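- A short sketch of this nested subset construction, under the assumption that the k-th subset keeps the top r^k points of the best-first ordering (the growth factor and the number of subsets are illustrative choices):

```python
import numpy as np

def nested_top_subsets(order: np.ndarray, ratio: int = 4, num_sets: int = 4):
    """order: point indices sorted best-first (rank 0 first)."""
    n = len(order)
    return [order[: min(ratio ** k, n)] for k in range(1, num_sets + 1)]

scores = np.random.rand(256)
order = np.argsort(-scores)                 # highest-scoring points first
subsets = nested_top_subsets(order)         # sizes 4, 16, 64, 256 (nested)
```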
- the subsets of points S_k from the sorted point cloud P′ 132 may be treated as positive pairs for use in calculating a noise contrastive estimation (NCE) loss, while negative pairs may be constructed from subsets of points from point clouds different from the multidimensional point cloud 105 (e.g., point clouds representing other objects or other scenes different from the object or scene depicted by the multidimensional point cloud 105, such as the points in the point sets which are projected into regions 220 or 230 of the latent space 205).
- a multiple-instance NCE loss for the i-th subset may be represented by the equation L_i = −log [ Σ_{z ∈ P(i)} exp(⟨z_i, z⟩) / ( Σ_{z ∈ P(i)} exp(⟨z_i, z⟩) + Σ_{z′ ∈ N(i)} exp(⟨z_i, z′⟩) ) ], where P(i) represents the positive set and N(i) represents the negative set for the i-th subset of points from the sorted point cloud P′ 132.
- z_i = φ(MAX(f(S_i))) represents a procedure including the backbone f of scoring neural network 120, a max-pooling operation MAX, and a projection head φ configured to project the pooled features of point subsets into a shared latent space 205.
- the subsets of points S_k may be projected into a latent space representation, with these points being projected into a first region 210 of the latent space 205.
- Each set of points 212, 214, 216 may represent a different subset of points from the multidimensional point cloud P 105, with the first set S_1 212 being the smallest set and being a subset of the second set S_2 214, which in turn may be smaller than and a subset of the m-th set S_m 216 (as well as any intervening sets of points, not illustrated in FIG. 2).
- the other point sets based on which contrastive learning is to be performed on the point network 110 may be projected into other regions in the latent space 205, such as regions 220 and 230 (amongst others).
- the overall loss function used for training (or retraining) the point network 110 using contrastive learning techniques may be represented by the equation L = Σ_{i=1}^{m} L_i, the sum of the multiple-instance NCE losses over the m subsets. Because the subsets of points S_k increase in cardinality, the top points may be used more often in calculating the contrastive loss between different subsets of points, as these top points may be shared across different subsets of points.
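- As a hedged sketch of how such a multiple-instance NCE objective might be computed, the following PyTorch function treats the latent embeddings of one cloud's nested subsets as mutual positives and embeddings of subsets from other clouds as negatives; the cosine normalization and the temperature value are assumptions, not details taken from the disclosure.

```python
import torch
import torch.nn.functional as F

def multi_instance_nce(pos: torch.Tensor, neg: torch.Tensor, temp: float = 0.07) -> torch.Tensor:
    """pos: (M, D) embeddings of subsets from the same cloud (M >= 2).
    neg: (K, D) embeddings of subsets from other clouds."""
    pos, neg = F.normalize(pos, dim=1), F.normalize(neg, dim=1)
    loss = 0.0
    for i in range(pos.shape[0]):
        anchor = pos[i : i + 1]                              # (1, D)
        others = torch.cat([pos[:i], pos[i + 1:]], dim=0)    # remaining positives
        pos_sim = anchor @ others.T / temp                   # (1, M-1)
        neg_sim = anchor @ neg.T / temp                      # (1, K)
        logits = torch.cat([pos_sim, neg_sim], dim=1)
        # log of the probability mass the anchor assigns to its positives
        log_p = torch.logsumexp(pos_sim, dim=1) - torch.logsumexp(logits, dim=1)
        loss = loss - log_p.mean()
    return loss / pos.shape[0]

loss = multi_instance_nce(torch.randn(4, 128), torch.randn(32, 128))
```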
- FIG. 3 illustrates example operations 300 for self-supervised training of a machine learning model to perform inferences on a multidimensional point cloud, according to aspects of the present disclosure.
- Operations 300 can be performed, for example, by a computing system, such as that illustrated in FIG.5, on which training data sets of multidimensional point clouds can be used to train a machine learning model to identify a representative set of points for a multidimensional point cloud and perform inferences based on the representative set of points.
- operations 300 may begin at block 310, in which a neural network is trained to map a multidimensional point cloud into a feature map using a feature generating neural network (e.g., the point network 110 illustrated in FIG.1).
- the multidimensional point cloud may have N points, with each point being located in a multidimensional (e.g., three-dimensional) space.
- Each point in the multidimensional point cloud generally represents spatial data in each dimension of a multidimensional space in which the data from which the multidimensional point cloud was generated lies.
- spatial data may be measured or otherwise represented relative to one or more reference points or planes.
- one or more dimensions in which data is located in the multidimensional point cloud may be non-spatial dimensions, such as frequency dimensions, temporal dimensions, or the like.
- operations 300 proceed with generating a score for each respective point in the multidimensional point cloud using a point scoring neural network (e.g., the scoring neural network 120 illustrated in FIG. 1).
- the score generated for each respective point in the multidimensional point cloud may be a score relative to an overall feature into which the multidimensional point cloud is mapped by the feature generating neural network. Points having higher scores may correspond to points having a higher degree of importance to the overall feature into which the multidimensional point cloud is mapped and may have higher scores than points which have a lesser degree of importance to the overall feature into which the multidimensional point cloud is mapped.
- the score for a respective point in the multidimensional point cloud may be calculated based on the sum of a max-pooled set of features calculated along each feature dimension for that point.
- operations 300 proceed with ranking points in the multidimensional point cloud based on the generated score for each respective point in the multidimensional point cloud.
- an optimal transport problem can be solved in order to map a discrete distribution A of points to a discrete, ordered distribution B.
- the resulting ranked set of points P′ may include the same number of points as the input multidimensional point cloud P, with rank values from 0 through N − 1.
- operations 300 proceed with generating a plurality of top point sets from the ranked points in the multidimensional point cloud.
- the plurality of top point sets may be generated with increasing cardinality based on a base size (e.g., the growth factor term r) associated with the first (smallest) top point set of the plurality of top point sets.
- the top point sets may increase in size exponentially, such that for the k-th point set, the size of (e.g., number of points included in) the k-th point set is represented by r^k.
- operations 300 proceed with retraining the neural network based on a noise contrastive estimation loss (e.g., minimizing such a loss) calculated based on the plurality of top point sets.
- an NCE loss may be calculated between the plurality of top point sets, treated as a positive set, and top point sets from one or more other multidimensional point clouds, treated as a negative set.
- the NCE loss may be calculated based on a projection of features of the point subsets in the positive and negative sets into a shared latent space.
- Because the subsets of points may increase in cardinality (e.g., size), the top points may be used more often in calculating the NCE loss, and the neural network may be trained to generate the highest scores for the points in the multidimensional point cloud that are the most contrastively informative points and to generate lower scores for points in the multidimensional point cloud that are less contrastively informative.
- FIG.4 illustrates example operations 400 for processing a multidimensional point cloud using a self-supervised machine learning model, according to aspects of the present disclosure.
- Operations 400 can be performed, for example, by a computing system, such as a user equipment (UE) or other computing device, such as that illustrated in FIG.6, on which a trained machine learning model can be deployed and used to process an input multidimensional point cloud.
- operations 400 begin at block 410, with generating a score for each respective point in a multidimensional point cloud.
- the operations further include generating the multidimensional point cloud, which is input into a neural network that is trained to generate a feature map from a multidimensional point cloud representing an object or scene to be analyzed.
- the multidimensional point cloud may be generated based on one or more ranging devices associated with the UE or other computing device performing the operations 400.
- these ranging devices may include radar devices, LIDAR sensors, ultrasonic sensors, or other devices that are capable of measuring a distance between the ranging device and another object.
- the multidimensional point cloud may include a set of points having a plurality of spatial dimensions.
- points in the multidimensional point cloud may have values determined in relation to one or more reference points or planes
- the set of points may include data on the height, width, and depth dimensions, with the height data being relative to a defined reference zero-elevation plane, width being relative to a datum point such as the center of an imaging device that captured the image from which the multidimensional point cloud was generated or some other reference point, and depth being relative to a datum point such as the point at which the imaging device is located.
- the multidimensional point cloud may also or alternatively include points having one or more non-spatial dimensions, such as a frequency dimension, a temporal dimension, or the like.
- the multidimensional point cloud may be mapped into a feature map representative of the multidimensional point cloud using a point network.
- the point network may map the multidimensional point cloud into the feature map based on a self-supervised loss function trained to map points in a multidimensional space to features in a multidimensional feature space.
- the point network may generate a two-dimensional matrix with dimensions of N by D, where D represents the number of feature dimensions into which points are mapped. That is, each point i, i ∈ {1, 2, ..., N}, may be associated with D feature values in the feature map.
- the score for each respective point i may be calculated based on the feature map representing the multidimensional point cloud.
- the score generated for each respective point in the multidimensional point cloud may be a score relative to an overall feature into which the multidimensional point cloud is mapped by the neural network. Points having higher scores may correspond to points having a higher degree of importance to the overall feature into which the multidimensional point cloud is mapped and may have higher scores than points which have a lesser degree of importance to the overall feature into which the multidimensional point cloud is mapped.
- In some aspects, the score for each respective point in the multidimensional point cloud may be calculated based on the sum of a max-pooled set of features calculated along each feature dimension for that point.
- operations 400 proceed with ranking points in the multidimensional point cloud based on the generated score for each respective point in the multidimensional point cloud.
- an optimal transport problem can be solved in order to map a discrete distribution A of points to a discrete, ordered distribution B.
- the resulting ranked set of points P′ may include the same number of points as the input multidimensional point cloud P, with rank values from 0 through N − 1.
- operations 400 proceed with selecting top points from the ranked multidimensional point cloud.
- the top points may be the top k points selected based on noise contrastive estimation over a plurality of subsets of multidimensional point clouds.
- operations 400 proceed with taking one or more actions based on the selected top points.
- the one or more actions may include classifying an input represented by the multidimensional point cloud as representative of one of a plurality of types of objects.
- the one or more actions may include semantically segmenting an input image into a plurality of segments. Each segment in the plurality of segments may correspond to a type of object in the input image.
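- Putting the inference-time steps of operations 400 together, the following compact sketch wires hypothetical backbone, scorer, ranking, and classifier components into the score / rank / select / act flow; every callable and the value of k here are stand-ins, not components named by the disclosure.

```python
import torch

@torch.no_grad()
def classify_point_cloud(points, backbone, scorer, ranker, classifier, k=256):
    feats = backbone(points)                 # (N, D) per-point feature map
    scores = scorer(feats)                   # (N,) per-point scores
    order = torch.argsort(ranker(scores))    # ascending: most informative first
    top = points[order[:k]]                  # representative top-k subset
    return classifier(top)                   # e.g., object-class logits

# Example wiring with stand-in callables:
logits = classify_point_cloud(
    torch.rand(1024, 3),
    backbone=lambda p: torch.rand(p.shape[0], 128),   # stand-in feature extractor
    scorer=lambda f: f.mean(dim=1),                   # stand-in per-point scores
    ranker=lambda s: -s,                              # argsort(-s): highest score first
    classifier=lambda pts: torch.zeros(40),           # stand-in class logits
)
```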
- FIG.5 depicts an example processing system 500 for self-supervised training of machine learning models to perform inferences on a multidimensional point cloud, such as described herein for example with respect to FIG.3.
- Processing system 500 includes a central processing unit (CPU) 502, which in some examples may be a multi-core CPU. Instructions executed at the CPU 502 may be loaded, for example, from a program memory associated with the CPU 502 or may be loaded from a memory 524.
- Processing system 500 also includes additional processing components tailored to specific functions, such as a graphics processing unit (GPU) 504, a digital signal processor (DSP) 506, a neural processing unit (NPU) 508, a multimedia processing unit 510, and a wireless connectivity component 512.
- NPU 508 is generally a specialized circuit configured for implementing control and arithmetic logic for executing machine learning algorithms, such as algorithms for processing artificial neural networks (ANNs), deep neural networks (DNNs), random forests (RFs), and the like.
- An NPU may sometimes alternatively be referred to as a neural signal processor (NSP), a tensor processing unit (TPU), a neural network processor (NNP), an intelligence processing unit (IPU), a vision processing unit (VPU), or a graph processing unit.
- NPUs such as NPU 508, are configured to accelerate the performance of common machine learning tasks, such as image classification, machine translation, object detection, and various other predictive models.
- a plurality of NPUs may be instantiated on a single chip, such as a system on a chip (SoC), while in other examples they may be part of a dedicated neural-network accelerator.
- NPUs may be optimized for training or inference, or in some cases configured to balance performance between both.
- NPUs designed to accelerate training are generally configured to accelerate the optimization of new models, which is a highly compute-intensive operation that involves inputting an existing dataset (often labeled or tagged), iterating over the dataset, and then adjusting model parameters, such as weights and biases, in order to improve model performance. Generally, optimizing based on a wrong prediction involves propagating back through the layers of the model and determining gradients to reduce the prediction error.
- NPUs designed to accelerate inference are generally configured to operate on complete models.
- NPU 508 is a part of one or more of CPU 502, GPU 504, and/or DSP 506.
- wireless connectivity component 512 may include subcomponents, for example, for third generation (3G) connectivity, fourth generation (4G) connectivity (e.g., Long-Term Evolution (LTE)), fifth generation (5G) connectivity (e.g., New Radio (NR)), Wi-Fi connectivity, Bluetooth connectivity, and other wireless data transmission standards.
- Wireless connectivity component 512 is further coupled to one or more antennas 514.
- Processing system 500 may also include one or more sensor processing units 516 associated with any manner of sensor, one or more image signal processors (ISPs) 518 associated with any manner of image sensor, and/or a navigation processor 520, which may include satellite-based positioning system components (e.g., GPS or GLONASS) as well as inertial positioning system components.
- Processing system 500 may also include one or more input and/or output devices 522, such as screens, touch-sensitive surfaces (including touch-sensitive displays), physical buttons, speakers, microphones, and the like.
- one or more of the processors of processing system 500 may be based on an ARM or RISC-V instruction set.
- Processing system 500 also includes memory 524, which is representative of one or more static and/or dynamic memories, such as a dynamic random access memory, a flash-based static memory, and the like.
- memory 524 includes computer-executable components, which may be executed by one or more of the aforementioned processors of processing system 500.
- memory 524 includes neural network training component 524A, score generating component 524B, point ranking component 524C, and top point set generating component 524D.
- the depicted components, and others not depicted, may be configured to perform various aspects of the methods described herein.
- processing system 500 and/or components thereof may be configured to perform the methods described herein.
- aspects of processing system 500 may be omitted, such as where processing system 500 is a server computer or the like.
- multimedia processing unit 510, wireless connectivity component 512, sensor processing units 516, ISPs 518, and/or navigation processor 520 may be omitted in other aspects.
- Further, aspects of processing system 500 may be distributed, such as between a system that trains a model and a system that uses the model to generate inferences.
- FIG. 6 depicts an example processing system 600 for processing a multidimensional point cloud using a self-supervised machine learning model, such as described herein for example with respect to FIG.4.
- the processing system 600 includes a central processing unit (CPU) 602, which in some examples may be a multi-core CPU.
- the processing system 600 also includes additional processing components tailored to specific functions, such as a graphics processing unit (GPU) 604, a digital signal processor (DSP) 606, and a neural processing unit (NPU) 608.
- the CPU 602, GPU 604, DSP 606, and NPU 608 may be similar to the CPU 502, GPU 504, DSP 506, and NPU 508 discussed above with respect to FIG.5.
- wireless connectivity component 612 may include subcomponents, for example, for 3G connectivity, 4G connectivity (e.g., LTE), 5G connectivity (e.g., NR), Wi-Fi connectivity, Bluetooth connectivity, and other wireless data transmission standards. Wireless connectivity component 612 is further coupled to one or more antennas 614.
- Processing system 600 may also include one or more sensor processing units 616 associated with any manner of sensor, one or more image signal processors (ISPs) 618 associated with any manner of image sensor, and/or a navigation processor 620, which may include satellite-based positioning system components (e.g., GPS or GLONASS) as well as inertial positioning system components.
- Processing system 600 may also include one or more input and/or output devices 622, such as screens, touch-sensitive surfaces (including touch-sensitive displays), physical buttons, speakers, microphones, and the like.
- In some examples, one or more of the processors of processing system 600 may be based on an ARM or RISC-V instruction set.
- Processing system 600 also includes memory 624, which is representative of one or more static and/or dynamic memories, such as a dynamic random access memory, a flash-based static memory, and the like. In this example, memory 624 includes computer-executable components, which may be executed by one or more of the aforementioned processors of processing system 600.
- memory 624 includes score generating component 624A, point ranking component 624B, top point selecting component 624C, and action taking component 624D.
- the depicted components, and others not depicted, may be configured to perform various aspects of the methods described herein.
- processing system 600 and/or components thereof may be configured to perform the methods described herein.
- aspects of processing system 600 may be omitted, such as where processing system 600 is a server computer or the like.
- multimedia processing unit 610, wireless connectivity component 612, sensor processing units 616, ISPs 618, and/or navigation processor 620 may be omitted in other aspects.
- Clause 1 A processor-implemented method, comprising: generating a score for each respective point in a multidimensional point cloud using a scoring neural network; ranking points in the multidimensional point cloud based on the generated score for each respective point in the multidimensional point cloud; selecting top points from the ranked multidimensional point cloud; and taking one or more actions based on the selected top points.
- Clause 2 The method of clause 1, wherein generating the score for each point in the multidimensional point cloud comprises: mapping the multidimensional point cloud into a feature map representing the multidimensional point cloud using a feature extracting neural network; and generating the score for each respective point in the multidimensional point cloud based on the feature map representing the multidimensional point cloud.
- Clause 3 The method of clause 2, wherein the feature extracting neural network is configured to map the multidimensional point cloud into the feature map based on a self-supervised loss function trained to map points in a multidimensional space to points in a multidimensional feature space.
- Clause 4 The method of clause 2 or 3, wherein the feature map comprises a map with dimensions of a number of points in the multidimensional point cloud by a number of feature dimensions into which the multidimensional point cloud is mapped.
- Clause 5 The method of any of clauses 2 through 4, wherein the score for each respective point in the multidimensional point cloud is generated based on a global feature representing the multidimensional point cloud and a sum of scores for the respective point in each feature dimension in the feature map.
- Clause 6 The method of any of clauses 1 through 5, wherein ranking the points in the multidimensional point cloud comprises ranking the points in the multidimensional point cloud based on an optimal transport problem from an unordered ranking of points in the multidimensional point cloud to an ordered ranking of points in the multidimensional point cloud.
- Clause 7 The method of any of clauses 1 through 6, wherein selecting the top points from the ranked multidimensional point cloud comprises selecting the top k points based on noise contrastive estimation over a plurality of subsets of multidimensional point clouds.
- Clause 8 The method of any of clauses 1 through 7, wherein the one or more actions comprises classifying an input represented by the multidimensional point cloud as representative of one of a plurality of types of objects.
- Clause 9 The method of any of clauses 1 through 8, wherein the one or more actions comprise semantically segmenting an input image into a plurality of segments, each segment of the plurality of segments corresponding to a type of object in the input image.
- Clause 10 The method of any of clauses 1 through 9, wherein the multidimensional point cloud comprises a set of points having a plurality of spatial dimensions.
- Clause 11 A processor-implemented method, comprising: training a neural network to map multidimensional point clouds into feature maps; generating a score for each respective point in a multidimensional point cloud; ranking points in the multidimensional point cloud based on the generated score for each respective point in the multidimensional point cloud; generating a plurality of top point sets from the ranked points in the multidimensional point cloud; and retraining the neural network based on a noise contrastive estimation loss calculated based on the plurality of top point sets.
- Clause 12 The method of clause 11, wherein generating the plurality of top point sets from the ranked points in the multidimensional point cloud comprises generating a plurality of top point sets with increasing cardinality based on a base size of a first top point set of the plurality of top point sets.
- Clause 13 The method of clause 12, wherein the increasing cardinality is based on exponential growth of the base size.
- Clause 14 The method of clause 12 or 13, wherein a k-th point set from the plurality of top point sets comprises a subset of a (k + 1)-th point set from the plurality of top point sets.
- Clause 15 The method of any of clauses 11 through 14, wherein retraining the neural network comprises calculating a noise contrastive estimation loss between the plurality of top point sets and a plurality of point sets from one or more other multidimensional point clouds.
- Clause 16 A processing system comprising: a memory comprising computer-executable instructions; and one or more processors configured to execute the computer-executable instructions and cause the processing system to perform a method in accordance with any of clauses 1-15.
- Clause 17 A processing system, comprising means for performing a method in accordance with any of clauses 1-15.
- Clause 18 A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by one or more processors of a processing system, cause the processing system to perform a method in accordance with any of clauses 1-15.
- Clause 19 A computer program product embodied on a computer-readable storage medium comprising code for performing a method in accordance with any of clauses 1-15.
Additional Considerations
- The preceding description is provided to enable any person skilled in the art to practice the various aspects described herein. The examples discussed herein are not limiting of the scope, applicability, or aspects set forth in the claims.
- “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiples of the same element (e.g., a-a, a-a-a, a-a-b, a-a-c, a-b-b, a-c-c, b-b, b-b-b, b-b-c, c-c, and c-c-c or any other ordering of a, b, and c).
- the term “determining” encompasses a wide variety of actions.
- determining may include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database, or another data structure), ascertaining, and the like. Also, "determining" may include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory), and the like. Also, "determining" may include resolving, selecting, choosing, establishing, and the like.
- the methods disclosed herein comprise one or more steps or actions for achieving the methods. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is specified, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims.
Abstract
Certain aspects of the present disclosure provide techniques and apparatuses for inferencing against a multidimensional point cloud using a machine learning model. An example method generally includes generating a score for each respective point in a multidimensional point cloud using a scoring neural network. Points in the multidimensional point cloud are ranked based on the generated score for each respective point in the multidimensional point cloud. The top points are selected from the ranked multidimensional point cloud, and one or more actions are taken based on the selected top k points.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263383381P | 2022-11-11 | 2022-11-11 | |
US63/383,381 | 2022-11-11 | ||
US18/501,167 | 2023-11-03 | ||
US18/501,167 US20240161460A1 (en) | 2022-11-11 | 2023-11-03 | Self-supervised point cloud ordering using machine learning models |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024102628A1 true WO2024102628A1 (fr) | 2024-05-16 |
Family
ID=89164408
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2023/078768 WO2024102628A1 (fr) | 2022-11-11 | 2023-11-06 | Ordonnancement de nuage de points auto-supervisé à l'aide de modèles d'apprentissage automatique |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024102628A1 (fr) |
Non-Patent Citations (4)
Title |
---|
- LU, YUHENG, et al.: "Directed Mix Contrast for Lidar Point Cloud Segmentation", 2022 IEEE International Conference on Multimedia and Expo (ICME), 18 July 2022, pages 1-6, XP034175663, DOI: 10.1109/ICME52920.2022.9859891 *
- METZER, GAL, et al.: "Self-Sampling for Neural Point Cloud Consolidation", ACM Transactions on Graphics, vol. 40, no. 5, 24 September 2021, pages 1-14, XP058683720, ISSN: 0730-0301, DOI: 10.1145/3470645 *
- ENGEL, NICO, et al.: "Point Transformer", arXiv.org, Cornell University Library, Ithaca, NY, 14 October 2021, XP091064647, DOI: 10.1109/ACCESS.2021.3116304 *
- TAO, AN, et al.: "SegGroup: Seg-Level Supervision for 3D Instance and Semantic Segmentation", IEEE Transactions on Image Processing, vol. 31, 18 July 2022, pages 4952-4965, XP011915016, ISSN: 1057-7149, DOI: 10.1109/TIP.2022.3190709 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11610115B2 (en) | Learning to generate synthetic datasets for training neural networks | |
US11375176B2 (en) | Few-shot viewpoint estimation | |
US20210383225A1 (en) | Self-supervised representation learning using bootstrapped latent representations | |
US11544498B2 (en) | Training neural networks using consistency measures | |
US20220188636A1 (en) | Meta pseudo-labels | |
CN113762327B (zh) | 机器学习方法、机器学习系统以及非暂态电脑可读取媒体 | |
CN111223128A (zh) | 目标跟踪方法、装置、设备及存储介质 | |
US20220237890A1 (en) | Method and apparatus with neural network training | |
US20230154005A1 (en) | Panoptic segmentation with panoptic, instance, and semantic relations | |
CN116997939A (zh) | 使用专家混合来处理图像 | |
US20240161460A1 (en) | Self-supervised point cloud ordering using machine learning models | |
US11961249B2 (en) | Generating stereo-based dense depth images | |
WO2024102628A1 (fr) | Ordonnancement de nuage de points auto-supervisé à l'aide de modèles d'apprentissage automatique | |
US11669745B2 (en) | Proposal learning for semi-supervised object detection | |
Kalirajan et al. | Deep learning for moving object detection and tracking | |
US20230004812A1 (en) | Hierarchical supervised training for neural networks | |
US20240221166A1 (en) | Point-level supervision for video instance segmentation | |
US20240249538A1 (en) | Long-range 3d object detection using 2d bounding boxes | |
US20240127075A1 (en) | Synthetic dataset generator | |
US20240311622A1 (en) | Selectable data-aware activation functions in neural networks | |
Saastad | Towards creating a map layer of road intersections by information extraction from Mapillary images | |
US20210248426A1 (en) | Learning device, learning method, and computer program product | |
do Nascimento et al. | Development of a Convolutional Neural Network for Classification of Type of Vessels | |
WO2023169696A1 (fr) | Entraînement de réseaux neuronaux de découverte d'objets et de réseaux neuronaux de représentation de caractéristiques en utilisant un apprentissage auto-supervisé | |
EP4434002A1 (fr) | Segmentation panoptique avec relations panoptiques, d'instances et sémantiques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23821428; Country of ref document: EP; Kind code of ref document: A1 |