CN114972758A - Instance segmentation method based on point cloud weak supervision - Google Patents

Instance segmentation method based on point cloud weak supervision

Info

Publication number
CN114972758A
Authority
CN
China
Prior art keywords
point cloud
point
points
image
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210629786.8A
Other languages
Chinese (zh)
Inventor
李怡康
石博天
李想
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai AI Innovation Center
Original Assignee
Shanghai AI Innovation Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai AI Innovation Center filed Critical Shanghai AI Innovation Center
Priority to CN202210629786.8A priority Critical patent/CN114972758A/en
Publication of CN114972758A publication Critical patent/CN114972758A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/761 - Proximity, similarity or dissimilarity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/7753 - Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations

Abstract

The invention relates to an instance segmentation method based on point cloud weak supervision, which comprises the following steps: projecting a lidar point cloud onto an image plane to form a projected point cloud; refining the projected point cloud by removing overlapped points caused by the parallax between the lidar and the camera, to obtain a refined point cloud; assigning foreground or background labels to the points in the refined point cloud; and training a segmenter using the refined point cloud with the foreground/background labels as a supervision signal, and performing instance segmentation and mask prediction with the trained segmenter.

Description

Instance segmentation method based on point cloud weak supervision
Technical Field
The invention relates to the field of artificial intelligence, in particular to a point cloud weak supervision-based instance segmentation method.
Background
In recent years, autonomous driving systems have attracted increasing attention in both academia and industry. Existing image instance segmentation techniques generally train a deep learning model with supervised learning, so that an instance mask representing the instance segmentation result of an image can be generated at inference time. High-quality instance segmentation can provide significant assistance to an autonomous driving system; for example, some algorithms use instance segmentation results to fuse lidar and image data, thereby improving the performance of cross-modal three-dimensional object detection.
However, to implement supervised learning, the cost of instance segmentation annotation on the training data set is extremely high; in an autonomous driving scene in particular, an image usually contains a large number of instances of people, vehicles, non-motorized vehicles and other obstacles. Producing high-quality annotations for millions of training samples therefore requires a great deal of manpower and material resources. Existing supervised instance segmentation methods, such as Mask R-CNN and CondInst, rely heavily on the quality and quantity of manual annotation, which makes it difficult for these methods to exploit larger-scale data.
To reduce this cost, weakly supervised instance segmentation methods such as BoxInst and PointSup have been developed. These methods use only partial manual annotation; although the cost is lower, additional manpower and time are still required for manual labeling, and their performance is limited. There is therefore a need to investigate instance segmentation techniques that are both low-cost and effective.
Disclosure of Invention
The object of the invention is to provide an instance segmentation method based on point cloud weak supervision, which can directly use the point cloud acquired by a lidar to guide the weakly supervised training of an instance segmentation model, without full mask annotation of the image, thereby improving the performance of the weakly supervised instance segmentation model without introducing additional manual annotation cost.
In a first aspect, to solve the problems existing in the prior art, the invention provides an instance segmentation method based on point cloud weak supervision, comprising:
projecting a lidar point cloud onto an image plane to form a projected point cloud;
refining the projected point cloud by removing overlapped points caused by the parallax between the lidar and the camera, to obtain a refined point cloud;
assigning foreground or background labels to the points in the refined point cloud; and
training a segmenter using the refined point cloud with the foreground/background labels as a supervision signal, and performing instance segmentation and mask prediction with the segmenter.
In one embodiment of the invention, before the step of projecting the lidar point cloud onto the image plane to form a projected point cloud, the method further comprises: inputting an image and extracting image features with an image feature extractor, wherein the image features serve as input features for training the segmenter; and
labeling a three-dimensional bounding box of each object in the image.
In one embodiment of the invention, a point error loss function and a graph consistency loss function are used to constrain the output of the segmenter when training the segmenter.
In one embodiment of the invention, projecting the lidar point cloud onto an image plane to form a projected point cloud comprises:
expressing the lidar point cloud in homogeneous coordinates as P_3d ∈ R^(4×N);
projecting the lidar point cloud from the lidar coordinate system to the camera coordinate system with a transformation matrix T ∈ R^(4×4), and then projecting it further onto the image plane with a camera matrix K ∈ R^(3×4), to form the projected point cloud:
P_2d = K · T · P_3d,
where P_2d is the set of points, expressed in homogeneous coordinates, obtained by projecting the lidar point cloud onto the image plane.
In one embodiment of the invention, refining the projected point cloud by removing overlapped points caused by the parallax between the lidar and the camera to obtain a refined point cloud comprises:
forming a sparse depth map D from each pixel P_2d obtained by projection onto the image plane and the ground-truth depth of the corresponding lidar point; and
traversing the entire sparse depth map with a two-dimensional sliding window w, wherein within each window the projected point cloud is divided according to relative depth into near points P_near and far points P_far, a point whose relative depth exceeds a depth threshold being a far point in P_far and a point whose relative depth does not exceed the depth threshold being a near point in P_near:
p(x, y) ∈ P_near  if  (d(x, y) − d_min) / (d_max − d_min) ≤ τ_depth,  and  p(x, y) ∈ P_far  otherwise,
where p(x, y) denotes the point in the lidar point cloud corresponding to the pixel with coordinates (x, y) in the two-dimensional sliding window w, τ_depth denotes a depth threshold with which relatively distant points can be filtered out, d(x, y) denotes the depth value of the pixel with coordinates (x, y), and d_min and d_max denote the minimum and maximum depth values within the two-dimensional sliding window w, respectively;
computing the minimum envelope of the near points P_near and removing, as overlapped points, the far points that fall within this envelope, to obtain the refined point cloud P_refine, wherein the overlapped points P_overlap are:
P_overlap = { p(x, y) ∈ P_far | x_min ≤ x ≤ x_max, y_min ≤ y ≤ y_max },
where x_min and x_max are the minimum and maximum values of the near points P_near on the x-axis, and y_min and y_max are the minimum and maximum values of the near points P_near on the y-axis.
In one embodiment of the invention, assigning foreground/background labels to the points in the refined point cloud comprises:
dividing the refined point cloud P_refine, according to its positional relationship with the three-dimensional bounding box, into points P_in inside the three-dimensional bounding box and points P_out outside the three-dimensional bounding box;
taking the points P_in inside the three-dimensional bounding box as positive samples and assigning them foreground labels, and taking a subset of the points of P_out around the three-dimensional bounding box as negative samples and assigning them background labels, the total number of positive and negative samples being s; and
propagating the pseudo labels of the positive and negative samples to the surrounding 8 pixels according to image feature similarity.
In one embodiment of the invention, taking a subset of the points of P_out around the three-dimensional bounding box as negative samples and assigning them background labels comprises:
first projecting the 8 vertices of the three-dimensional bounding box onto the image plane and then computing their minimum enclosing rectangle b; and
sampling, from the points of P_out whose projections fall within the enclosing rectangle b, the negative samples P_neg.
In one embodiment of the invention, propagating the pseudo labels of the positive and negative samples to the surrounding 8 pixels according to image feature similarity comprises:
when the image feature similarity exceeds a similarity threshold, propagating the label of a candidate point p_c selected from the positive and negative samples to the 8 pixels surrounding it on the image, so that these 8 pixels carry the same class label as the candidate point p_c, wherein the label propagation criterion is:
l(q) = l(p_c)  if  f(q)^T f(p_c) > τ_dense,  for q ∈ N_8(p_c),
where l(p) is the pseudo label assigned to point p, N_8(p_c) is the set of 8 pixels surrounding the candidate point p_c on the image, f(p) is the image feature of point p extracted by the pre-trained image feature extractor, and τ_dense is the similarity threshold.
In one embodiment of the invention, when training the segmenter, constraining the output of the segmenter with the point error loss function and the graph consistency loss function comprises:
constructing the point error loss function by bilinear interpolation, wherein the loss between the predicted mask and the pseudo labels is measured by the point error loss function:
L_point = −(1 / (K·|S|)) Σ_k Σ_s [ l_ks · log m̂(p_ks) + (1 − l_ks) · log(1 − m̂(p_ks)) ],
where K is the total number of instances in the image, S is the set of all points carrying pseudo labels, p_ks is the s-th point of the k-th instance, l_ks is the pseudo label of point p_ks, m̂(p_ks) is the predicted mask value at p_ks obtained by bilinear interpolation, and L_point denotes the point error loss.
In one embodiment of the invention, when training the segmenter, constraining the output of the segmenter with the point error loss function and the graph consistency loss function comprises:
constructing an undirected graph G = <V, E> from the refined point cloud P_refine, wherein the points of the refined point cloud P_refine serve as the nodes forming V and E is the set of edges, and whether an edge is formed between two nodes, and hence whether the two nodes carry the same pseudo label, is determined by the image feature similarity and the three-dimensional geometric feature similarity between the two nodes, the weighted sum of which is:
W_ij = w_1 · S_image(i, j) + w_2 · S_geometry(i, j),
where w_1 and w_2 are balance weights for the image feature similarity and the three-dimensional geometric feature similarity, S_image(i, j) and S_geometry(i, j) denote, respectively, the image feature similarity and the three-dimensional geometric feature similarity between nodes p_i and p_j, and W_ij denotes the overall graph similarity; when the overall graph similarity W_ij is greater than a similarity threshold τ, an edge is formed between the two nodes and they carry the same pseudo label, otherwise there is no connecting edge between the two nodes and their pseudo labels differ; and
constraining the output of the segmenter with the graph consistency loss function, based on the requirement that the predicted masks of the segmenter be close whenever a connecting edge exists between two nodes:
L_consistency = −(1 / N²) Σ_i Σ_j e_ij [ m̂_j · log m̂_i + (1 − m̂_j) · log(1 − m̂_i) ],
where N = |V| is the number of nodes in the undirected graph, e_ij indicates whether nodes p_i and p_j are connected by an edge, m̂_i and m̂_j are the predicted mask values of nodes p_i and p_j, respectively, and L_consistency denotes the graph consistency loss.
The invention has at least the following beneficial effects: the point cloud acquired by the lidar can directly guide the weakly supervised training of the instance segmenter without full mask annotation of the image, the performance of the weakly supervised instance segmentation model is further improved without introducing additional manual annotation cost, and the method has the potential to train the instance segmenter with massive unlabeled lidar data.
Drawings
To further clarify the above and other advantages and features of embodiments of the present invention, a more particular description of embodiments of the invention will be rendered by reference to the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope.
FIG. 1 shows a schematic diagram of the instance segmentation process of the prior-art BoxInst method;
FIG. 2 shows a schematic diagram of point sampling during instance segmentation according to the prior-art PointSup method;
FIG. 3 illustrates the flow of an instance segmentation method based on point cloud weak supervision according to one embodiment of the invention; and
FIG. 4 shows a schematic diagram of a point tag assignment module, according to one embodiment of the invention.
Detailed Description
It should be noted that the components in the figures may be exaggerated and not necessarily to scale for illustrative purposes.
In the present invention, the embodiments are only intended to illustrate the aspects of the present invention, and should not be construed as limiting.
In the present invention, the terms "a" and "an" do not exclude the presence of a plurality of elements, unless otherwise specified.
It is further noted herein that in embodiments of the present invention, only a portion of the components or assemblies may be shown for clarity and simplicity, but those of ordinary skill in the art will appreciate that, given the teachings of the present invention, required components or assemblies may be added as needed in a particular scenario.
It is also noted herein that, within the scope of the present invention, the terms "same", "equal", and the like do not mean that the two values are absolutely equal, but allow some reasonable error, that is, the terms also encompass "substantially the same", "substantially equal".
It should also be noted herein that in the description of the present invention, the terms "central", "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc., indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In addition, the embodiments of the present invention describe the process steps in a specific order, however, this is only for convenience of distinguishing the steps, and does not limit the order of the steps.
Existing image instance segmentation techniques generally train a deep learning model with supervised learning, so that an instance mask (Instance Mask) representing the instance segmentation result of an image can be generated at inference time. In an autonomous driving scene, an image usually contains a large number of instances of people, vehicles, non-motorized vehicles and other obstacles, and the cost of instance segmentation annotation of a training data set for supervised learning is extremely high.
Because supervised learning depends heavily on training-data annotation, the prior art turns to weakly supervised learning, applying only low-cost weak annotations to the raw data and completing the weakly supervised instance segmentation task with weakly supervised learning techniques built on these weak annotations. Such techniques mainly include the BoxInst method and the PointSup method.
Fig. 1 shows a schematic diagram of the instance segmentation process of the prior-art BoxInst method.
As shown in fig. 1, the BoxInst method implements weakly supervised instance segmentation using only the bounding box of each instance as the supervision signal, and it achieves about 90% of the performance of supervised instance segmentation on public data sets. Although the performance of BoxInst cannot match supervised instance segmentation, its extremely low annotation cost gives it the ability to exploit massive data.
Specifically, the core of BoxInst is built on one assumption: when the bounding box tightly encloses the object, at least one pixel on the bounding box overlaps the object (regarded as a positive sample), while the region outside the detection box must be unrelated to the object (regarded as a negative sample). The method then aligns the predicted result with the bounding box by introducing a reprojection error.
The purpose of introducing the reprojection error is to penalize the difference between the projections of the predicted mask and of the ground-truth bounding box mask along the x-axis and y-axis, respectively, so that after projection the predicted mask is as close as possible to the ground-truth bounding box mask. The loss function of the reprojection error can be expressed as:
L_proj = L(proj_x(m̂), proj_x(b)) + L(proj_y(m̂), proj_y(b)),
where m̂ denotes the predicted mask, b denotes the ground-truth bounding box mask, proj_x(m̂) and proj_y(m̂) denote the projections of the predicted mask onto the x-axis and y-axis, respectively, and proj_x(b) and proj_y(b) denote the projections of the ground-truth bounding box mask onto the x-axis and y-axis, respectively. L(X, Y) denotes the Dice loss (Dice Loss) between the two terms (predicted mask projection and ground-truth bounding box mask projection):
L(X, Y) = 1 − 2·Σ_i x_i·y_i / (Σ_i x_i² + Σ_i y_i²),
where x_i and y_i are the elements of X and Y.
in addition, to constrain that the prediction mask does not completely become a three-dimensional bounding box during training, the BoxInst method also utilizes pairwise penalties to constrain the prediction mask. The pairwise loss function is based on the following assumptions: if two pixels are similar in color, their class labels are likely to be the same. The pair-wise loss can then be expressed as:
Figure BDA0003679272530000072
Figure BDA0003679272530000073
wherein
Figure BDA0003679272530000074
The prediction class, P (y), representing a point at coordinates x, y e 1) represents whether the point at the coordinate x, y position and k, l two points belong to the same category (either foreground or background).
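As background, the following PyTorch sketch illustrates the two BoxInst-style losses described above: the projection (Dice) term and the color-similarity pairwise term. The tensor shapes, the max-based axis projection, the right-neighbor-only pairwise edges, and the threshold value are simplifications assumed here for illustration; they are not taken from this patent.

```python
import torch

def dice_loss(x, y, eps=1e-6):
    # Dice loss between two 1-D projections (or flattened masks).
    inter = (x * y).sum()
    return 1.0 - 2.0 * inter / (x.pow(2).sum() + y.pow(2).sum() + eps)

def box_projection_loss(pred_mask, box_mask):
    # pred_mask, box_mask: (H, W) tensors in [0, 1].
    # Compare the x- and y-axis projections (max over the other axis).
    loss_x = dice_loss(pred_mask.max(dim=0).values, box_mask.max(dim=0).values)
    loss_y = dice_loss(pred_mask.max(dim=1).values, box_mask.max(dim=1).values)
    return loss_x + loss_y

def pairwise_loss(pred_mask, color_sim, tau=0.3):
    # Encourage neighboring pixels with similar color to share a label.
    # color_sim: (H, W-1) similarity of each pixel with its right neighbor
    # (a single neighbor direction is used here to keep the sketch short).
    p = pred_mask[:, :-1]
    q = pred_mask[:, 1:]
    same_label_prob = p * q + (1 - p) * (1 - q)        # P(y_e = 1)
    log_prob = torch.log(same_label_prob.clamp(min=1e-6))
    edges = (color_sim >= tau).float()                 # only confident color edges
    return -(edges * log_prob).sum() / edges.sum().clamp(min=1.0)

# Toy usage
H, W = 8, 8
pred = torch.rand(H, W)
box = torch.zeros(H, W); box[2:6, 2:6] = 1.0
sim = torch.rand(H, W - 1)
total = box_projection_loss(pred, box) + pairwise_loss(pred, sim)
```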
Fig. 2 shows a schematic diagram of point sampling during instance segmentation according to the prior-art PointSup method.
Using only bounding boxes as the supervision signal achieves only about 85% of the performance of supervised learning methods on some public data sets. As shown in fig. 2, the PointSup method builds on box supervision (the BoxInst method) by adding several points that are randomly sampled within the bounding box and manually labeled with a class (foreground/background), and uses these points as an additional weak supervision signal to train the weakly supervised instance segmentation model.
Through this low-cost additional annotation, the PointSup method greatly improves model performance; experiments show that it not only far exceeds the performance of the BoxInst method, but also reaches about 97% of the performance of a supervised learning method.
An autonomous driving system typically carries a lidar (LiDAR) to acquire point cloud data, which, as data reflecting ground-truth depth with high accuracy, can provide strong supervision for instance segmentation. In particular, the point cloud captures the contour of an object of interest, so when the lidar point cloud is projected onto a two-dimensional image it naturally provides a point-level supervision signal. Furthermore, the three-dimensional geometric features can provide additional information for instance segmentation.
The invention provides a method for automatically labeling the points of a lidar point cloud and projecting them into two-dimensional space as sample points that provide supervision signals, thereby realizing low-cost weakly supervised instance segmentation in an autonomous driving scene.
The instance segmentation based on point cloud weak supervision mainly comprises a point label assignment (Point Label Assignment) module and a graph-based consistency regularization (Graph-based Consistency Regularization) module. The point label assignment module assigns foreground/background labels to the lidar point cloud through a series of rules. The graph consistency regularization module further constrains the segmenter's predictions by jointly encoding geometric consistency and image-feature consistency, so as to generate high-quality masks.
FIG. 3 shows the flow of an instance segmentation method based on point cloud weak supervision according to an embodiment of the invention.
As shown in fig. 3, the instance segmentation method based on point cloud weak supervision comprises an image instance segmentation branch (upper part) and a point cloud processing branch (lower part). The image instance segmentation branch contains an existing image-based weakly supervised instance segmentation model, whose flow is: an image is input, image features are extracted with an image feature extractor, a segmenter is trained with the image features as input features, and finally the segmenter predicts masks for the image. In existing image-based weakly supervised instance segmentation models, the bounding box (Bounding Box) of each object in the image must be manually annotated, and several points with manually labeled classes inside the bounding box are used as supervision signals during training. The point cloud processing branch aims to replace the manually labeled points required by the existing image-based weakly supervised instance segmentation model, using the point cloud acquired by the lidar as the supervision signal. To this end, a point label assignment module and a graph consistency regularization module are designed. The lidar provides additional weak supervision signals through the point label assignment module and the graph consistency regularization module, and these signals finally complete the point cloud supervised training of the existing weakly supervised instance segmentation model in the form of a point error loss function and a graph consistency loss function. Throughout the process, no additional manual annotation of the point cloud is required, so no additional labor cost is introduced.
The point label assignment module takes the lidar point cloud and the three-dimensional bounding boxes as input and outputs pseudo labels for the points of the point cloud. Specifically, the lidar point cloud is first projected onto the image plane, and noise points (overlapped points) caused by the parallax between the lidar and the camera are filtered out (because the lidar is mounted higher than the camera, part of the lidar point cloud has no corresponding pixels in the image). A set of rules is then used to assign a binary foreground/background label to each point of the lidar point cloud. Finally, the point labels are propagated to neighboring pixels weighted by feature similarity.
FIG. 4 shows a schematic diagram of a point tag assignment module, according to one embodiment of the invention.
To exploit the supervision information provided by the lidar, the inventors designed a point label assignment module that assigns a binary label to the three-dimensional points of the lidar point cloud so that they can serve as labels for training the segmenter. As shown in fig. 4, the original point cloud (lidar point cloud) is projected onto the image plane to form a projected point cloud, the projected point cloud is refined by deleting some overlapped points to obtain a refined point cloud, and finally the points of the refined point cloud are assigned foreground/background labels according to rules.
Point cloud projection
The process of projecting the original point cloud onto the image plane to form a projected point cloud is called point cloud projection. A point cloud containing N points in three-dimensional space can be represented in a homogeneous coordinate system (homogeneous coordinate system) as P_3d ∈ R^(4×N). A transformation matrix T ∈ R^(4×4) projects the lidar point cloud from the lidar coordinate system to the camera coordinate system, and a camera matrix K ∈ R^(3×4) then projects it further onto the image plane. The two-dimensional point set (projected point cloud) obtained by projecting the original point cloud onto the image plane can thus be represented as:
P_2d = K · T · P_3d,
where P_2d is the set of points, expressed in homogeneous coordinates, obtained after the original point cloud is projected onto the image plane.
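The projection step above amounts to two matrix multiplications followed by a perspective division. A minimal NumPy sketch is given below; the matrix names T and K follow the description, while the concrete toy values, the positive-depth filter, and the return format are assumptions made for illustration.

```python
import numpy as np

def project_lidar_to_image(points_xyz, T, K):
    """Project lidar points onto the image plane.

    points_xyz: (N, 3) lidar points in the lidar frame.
    T:          (4, 4) lidar-to-camera transformation matrix.
    K:          (3, 4) camera projection matrix.
    Returns (N_kept, 2) pixel coordinates and the indices of the kept points.
    """
    N = points_xyz.shape[0]
    # Homogeneous coordinates P_3d in R^{4 x N}.
    p3d_h = np.hstack([points_xyz, np.ones((N, 1))]).T
    # Lidar frame -> camera frame -> image plane: P_2d = K * T * P_3d.
    img_h = K @ (T @ p3d_h)                      # (3, N), homogeneous pixels
    # Keep only points in front of the camera (positive depth).
    keep = img_h[2] > 1e-6
    uv = (img_h[:2, keep] / img_h[2, keep]).T    # (N_kept, 2) pixel coordinates
    return uv, np.flatnonzero(keep)

# Toy usage with an identity extrinsic and a simple pinhole intrinsic.
T = np.eye(4)
K = np.array([[700.0, 0.0, 320.0, 0.0],
              [0.0, 700.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
pts = np.random.rand(100, 3) * [10, 10, 30] + [0, 0, 1]
uv, idx = project_lidar_to_image(pts, T, K)
```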
Depth-guided point refinement
The process of refining the projected point cloud by deleting some overlapped points is called depth-guided point refinement. In many autonomous driving systems the lidar is mounted on the roof of the vehicle while the camera is mounted at the front of the vehicle or behind the windshield, which produces a parallax between the two sensors. Because of this parallax, some pixels that appear as foreground after projection onto the image plane do not necessarily correspond to foreground points in the three-dimensional point cloud space. These overlapped points are eliminated with a depth-guided point refinement method, based on the assumption that the depth of surface points of the same object should not change abruptly.
Specifically, each pixel P_2d obtained by projection onto the image plane and the ground-truth depth (z-axis) of the corresponding three-dimensional point (lidar point) first form a sparse depth map D. The map is a sparse image: if a coordinate position has a corresponding three-dimensional point (lidar point), the value at that position is the depth of the pixel P_2d projected onto the image plane; if there is no corresponding three-dimensional point, the value at that position is 0. A two-dimensional sliding window w is then used to traverse the entire sparse depth map. Within each window, the projected point cloud is divided according to relative depth into two sets, near points P_near and far points P_far:
p(x, y) ∈ P_near  if  (d(x, y) − d_min) / (d_max − d_min) ≤ τ_depth  and  d(x, y) ≠ 0,
where p(x, y) denotes the point in the point cloud corresponding to the pixel with coordinates (x, y) in the two-dimensional sliding window w; τ_depth denotes a depth threshold with which all relatively distant points can be filtered out; d(x, y) denotes the depth value of the pixel with coordinates (x, y), and when d(x, y) ≠ 0, p(x, y) denotes the three-dimensional point corresponding to that pixel position; and d_min and d_max denote the minimum and maximum depth values within the two-dimensional sliding window w, respectively.
Correspondingly, all points whose relative depth exceeds the depth threshold are far points P_far. However, not all far points are overlapped points, so the minimum envelope of the near points P_near is computed and the far points that fall within this envelope are filtered out as overlapped points:
P_overlap = { p(x, y) ∈ P_far | x_min ≤ x ≤ x_max, y_min ≤ y ≤ y_max },
where x_min, x_max, y_min, y_max are the minimum and maximum values of all near points P_near on the x-axis and y-axis. The rationale is that within a small two-dimensional sliding window the depth of foreground points should not change drastically; when a point with a larger depth value is surrounded by points with smaller depth values, that point is very likely an overlapped point. The refined point cloud obtained after removing the overlapped points can be represented as:
P_refine = P_near ∪ (P_far \ P_overlap),
where P_overlap denotes the overlapped points, P_near the near points, P_far the far points, and P_refine the refined point cloud.
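The refinement procedure can be sketched as follows in NumPy. The window size, the stride, and the exact normalized form of the relative-depth test are assumptions made for this illustration; only the near/far split and the minimum-envelope filtering of far points follow the description above.

```python
import numpy as np

def purify_points(uv, depth, img_h, img_w, win=16, tau_depth=0.4):
    """Remove overlapped (occluded) projected points via depth guidance.

    uv:    (N, 2) integer pixel coordinates of projected lidar points.
    depth: (N,)   depth of each projected point.
    Returns a boolean mask over the N points: True = kept.
    """
    # Sparse depth map D: 0 where no lidar point projects.
    D = np.zeros((img_h, img_w), dtype=np.float32)
    idx_map = -np.ones((img_h, img_w), dtype=np.int64)
    D[uv[:, 1], uv[:, 0]] = depth
    idx_map[uv[:, 1], uv[:, 0]] = np.arange(len(depth))

    keep = np.ones(len(depth), dtype=bool)
    for y0 in range(0, img_h, win):
        for x0 in range(0, img_w, win):
            patch = D[y0:y0 + win, x0:x0 + win]
            ids = idx_map[y0:y0 + win, x0:x0 + win]
            valid = patch > 0
            if valid.sum() < 2:
                continue
            d = patch[valid]
            d_min, d_max = d.min(), d.max()
            if d_max - d_min < 1e-6:
                continue
            rel = (d - d_min) / (d_max - d_min)      # relative depth in the window
            ys, xs = np.nonzero(valid)
            near = rel <= tau_depth
            far = ~near
            if near.sum() == 0 or far.sum() == 0:
                continue
            # Minimum envelope of the near points inside this window.
            x_lo, x_hi = xs[near].min(), xs[near].max()
            y_lo, y_hi = ys[near].min(), ys[near].max()
            # Far points falling inside the envelope are treated as overlapped.
            overlap = far & (xs >= x_lo) & (xs <= x_hi) & (ys >= y_lo) & (ys <= y_hi)
            keep[ids[ys[overlap], xs[overlap]]] = False
    return keep

# Toy usage
uv = np.random.randint(0, 64, size=(200, 2))
depth = np.random.rand(200) * 50 + 1
kept = purify_points(uv, depth, img_h=64, img_w=64)
```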
Label assignment
After the projected point cloud has been refined, the points of the remaining refined point cloud are assigned as positive and negative samples (foreground/background); this is label assignment.
First, according to the positional relationship between the refined point cloud and the three-dimensional bounding boxes of all instances, the refined point cloud P_refine is divided into two subsets: P_in denotes the points inside the three-dimensional bounding boxes, and P_out denotes the points outside the three-dimensional bounding boxes. All points in P_in, i.e. inside a three-dimensional bounding box, are taken as positive samples and assigned foreground labels. In general only a small fraction of points belong to P_in and can serve as positive samples, while most points belong to P_out, i.e. lie outside the three-dimensional bounding boxes. To reduce the amount of computation, usually only a subset of the points of P_out around the three-dimensional bounding boxes participates in training as negative samples. The sampling proceeds as follows: the 8 fixed points of the three-dimensional bounding box (its 8 vertices) are first projected onto the image plane, and their minimum enclosing rectangle b is then computed; the enclosing rectangle acts as a relaxed two-dimensional bounding box (because the projection of a three-dimensional bounding box onto the two-dimensional plane generally does not overlap the instance in the image exactly and is slightly larger). Finally, points of P_out whose projections fall within the enclosing rectangle b are sampled as the negative samples, denoted P_neg.
Specifically, the label assignment strategy for each candidate point p of P_refine is:
l(p) = 1 if p ∈ P_in;  l(p) = 0 if p ∈ P_neg;  l(p) = −1 otherwise,
where 1 indicates that a point is assigned as a positive sample, 0 indicates that it is assigned as a negative sample, and −1 indicates that the point is ignored. To allow parallel acceleration during training, the total number of positive and negative samples is fixed to s, and s points are jointly sampled from P_in and P_neg at given positive and negative sampling rates. If P_in and P_neg together contain fewer than s points, the shortfall is supplemented by Gaussian-distributed random sampling from the refined point cloud; otherwise s points are sampled from P_in and P_neg. This finally yields s points together with their pseudo labels.
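A simplified NumPy sketch of this label assignment rule follows. The axis-aligned box test (real three-dimensional bounding boxes are oriented), the fixed sampling ratio, and the parameter names are assumptions introduced for illustration; only the positive/negative/ignore rule itself follows the description.

```python
import numpy as np

def assign_point_labels(points_xyz, box_min, box_max, uv, rect, s=64, pos_ratio=0.5):
    """Assign foreground(1)/background(0)/ignore(-1) pseudo labels.

    points_xyz: (N, 3) refined lidar points (lidar frame).
    box_min, box_max: (3,) corners of an axis-aligned 3D box (simplification).
    uv:   (N, 2) projected pixel coordinates of the same points.
    rect: (x_min, y_min, x_max, y_max) minimum enclosing rectangle b of the
          projected box vertices (the relaxed 2D box).
    """
    labels = -np.ones(len(points_xyz), dtype=np.int64)
    inside_3d = np.all((points_xyz >= box_min) & (points_xyz <= box_max), axis=1)
    x0, y0, x1, y1 = rect
    inside_2d = (uv[:, 0] >= x0) & (uv[:, 0] <= x1) & (uv[:, 1] >= y0) & (uv[:, 1] <= y1)

    pos_idx = np.flatnonzero(inside_3d)                  # foreground candidates (P_in)
    neg_idx = np.flatnonzero(~inside_3d & inside_2d)     # background candidates near the box

    n_pos = min(len(pos_idx), int(s * pos_ratio))
    n_neg = min(len(neg_idx), s - n_pos)
    labels[np.random.choice(pos_idx, n_pos, replace=False)] = 1
    labels[np.random.choice(neg_idx, n_neg, replace=False)] = 0
    return labels

# Toy usage
pts = np.random.rand(500, 3) * 20
uv = np.random.randint(0, 640, size=(500, 2))
labels = assign_point_labels(pts, np.array([5, 5, 5]), np.array([10, 10, 10]),
                             uv, rect=(100, 100, 300, 300))
```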
Label propagation
The points retained by label assignment are very sparse after projection onto the image, so the pseudo labels of the s points are further propagated to surrounding pixels according to image feature similarity (label propagation) in order to provide dense supervision signals. A candidate point p_c is selected from the s positive and negative samples, and the image feature similarity is used to decide whether the pseudo label of the candidate point p_c is propagated to the 8 pixels surrounding it on the image; this is repeated over all s positive and negative samples. The label propagation criterion is:
l(q) = l(p_c)  if  f(q)^T f(p_c) > τ_dense,  for q ∈ N_8(p_c),
where l(p) is the pseudo label assigned to point p, N_8(p_c) is the set of 8 pixels surrounding the candidate point p_c on the image, f(p) is the image feature of point p extracted by the pre-trained image feature extractor, and τ_dense is the similarity threshold. When the image feature similarity exceeds the similarity threshold, the label of the candidate point p_c is propagated to the 8 surrounding pixels on the image, so that these 8 pixels carry the same class label as the candidate point p_c. Otherwise the label is not propagated, because when the image feature similarity of two points is low it cannot be judged whether they belong to the same class. Label propagation finally yields a dense set of points with pseudo labels.
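The propagation rule can be sketched as below in NumPy. The dot-product similarity and the threshold value are assumptions consistent with the description above rather than exact values from the patent, as are the feature resolution and the per-pixel normalization.

```python
import numpy as np

def propagate_labels(seed_uv, seed_labels, feat_map, tau_dense=0.9):
    """Spread each seed label to its 8 neighboring pixels if features agree.

    seed_uv:     (S, 2) pixel coordinates of the labeled seed points.
    seed_labels: (S,)   pseudo labels (1 foreground / 0 background).
    feat_map:    (C, H, W) L2-normalized image features from the extractor.
    Returns a dict {(x, y): label} of dense pseudo labels.
    """
    C, H, W = feat_map.shape
    dense = {}
    offsets = [(-1, -1), (0, -1), (1, -1), (-1, 0), (1, 0), (-1, 1), (0, 1), (1, 1)]
    for (x, y), lab in zip(seed_uv, seed_labels):
        dense[(x, y)] = lab
        f_c = feat_map[:, y, x]                        # feature of the candidate point
        for dx, dy in offsets:
            nx, ny = x + dx, y + dy
            if not (0 <= nx < W and 0 <= ny < H):
                continue
            sim = float(f_c @ feat_map[:, ny, nx])     # dot-product similarity
            if sim > tau_dense:                        # propagate only when similar enough
                dense[(nx, ny)] = lab
    return dense

# Toy usage
feat = np.random.rand(16, 32, 32).astype(np.float32)
feat /= np.linalg.norm(feat, axis=0, keepdims=True)    # normalize per pixel
seeds = np.array([[5, 5], [20, 17]])
labels = np.array([1, 0])
dense = propagate_labels(seeds, labels, feat)
```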
Lidar point loss
For instance segmentation methods that use masks, the mask output by the segmenter can be expressed as m̂ ∈ [0, 1]^(h×w), where h and w are the resolution of the mask output by the segmenter. The prediction m̂(p) at the position of a point p is obtained by bilinear interpolation, and a point-based binary cross entropy loss function (called the point error loss function) is constructed:
L_point = −(1 / (K·|S|)) Σ_k Σ_s [ l_ks · log m̂(p_ks) + (1 − l_ks) · log(1 − m̂(p_ks)) ],
where K is the total number of instances in the image, S is the set of all points carrying pseudo labels, p_ks is the s-th point of the k-th instance, l_ks is the pseudo label of point p_ks, and L_point denotes the point error loss. Because the predicted mask value m̂(p) is obtained by interpolating the pixels immediately surrounding the point p, the loss function not only optimizes the prediction at the current point but also propagates the error back to the pixels adjacent to that point. The point-based binary cross entropy loss function thus yields instance segmentation masks with sharper edges.
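A PyTorch sketch of this point error loss is given below, using grid_sample for the bilinear interpolation. The tensor layout and the normalization over points are assumptions of this illustration.

```python
import torch
import torch.nn.functional as F

def point_error_loss(pred_masks, points, labels):
    """Binary cross entropy at sparse pseudo-labeled points.

    pred_masks: (K, 1, h, w) predicted mask probabilities, one per instance.
    points:     (K, S, 2) point coordinates normalized to [-1, 1].
    labels:     (K, S)    pseudo labels in {0, 1}.
    """
    # Bilinear interpolation of the mask at each labeled point.
    grid = points.unsqueeze(2)                         # (K, S, 1, 2)
    sampled = F.grid_sample(pred_masks, grid, mode='bilinear',
                            align_corners=False)       # (K, 1, S, 1)
    probs = sampled.squeeze(1).squeeze(-1)             # (K, S)
    return F.binary_cross_entropy(probs.clamp(1e-6, 1 - 1e-6), labels.float())

# Toy usage
K, S, h, w = 2, 16, 56, 56
masks = torch.rand(K, 1, h, w)
pts = torch.rand(K, S, 2) * 2 - 1                      # normalized coordinates
lbls = torch.randint(0, 2, (K, S))
loss = point_error_loss(masks, pts, lbls)
```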
Although the point label assignment module described above can provide refined pseudo labels, inaccurate labels may still exist, for two reasons: (1) systematic errors caused by calibration inaccuracies, for example at the edges of some objects, where points acquired by the lidar may be projected onto the background in the two-dimensional image plane; and (2) the low reflectivity and high transmissivity of materials such as vehicle windshields, which may let the lidar beam pass through the glass and detect the background. Both can cause the point label assignment module to assign wrong labels to these erroneous points. To address this problem, a graph consistency regularization module is designed. The graph consistency regularization module constrains the segmenter to generate reasonable predicted masks by exploring similarity relationships between spatially neighboring points. The module first builds an undirected graph with each point of the point cloud as a node and the weighted sum of the three-dimensional geometric feature similarity and the image feature similarity as the edges. This graph-based similarity regularizes the predicted masks of the instance segmentation. The graph consistency regularization module supervises the training of the segmenter through a graph consistency loss function, thereby improving the performance of the segmenter. The graph consistency regularization module comprises two parts: similarity-based graph construction and consistency regularization.
Similarity-based graph construction
Given the point set P_refine obtained from the point label assignment module, an undirected graph G = <V, E> is constructed, where V consists of the points of P_refine and the edges E are measured by the image feature similarity and the three-dimensional geometric feature similarity between two nodes:
W_ij = w_1 · S_image(i, j) + w_2 · S_geometry(i, j),
where w_1 and w_2 are balance weights for the image feature similarity and the three-dimensional geometric feature similarity, S_image(i, j) and S_geometry(i, j) denote, respectively, the image feature similarity and the three-dimensional geometric feature similarity between nodes p_i and p_j, and W_ij denotes the overall graph similarity.
To construct the image feature similarity, the feature map F of the image extracted by a convolutional neural network model (the image feature extractor) is used, and the point image feature f(p) is obtained by bilinear interpolation. The image feature similarity between two nodes p_i and p_j can then be expressed as:
S_image(i, j) = f(p_i)^T f(p_j).
For the three-dimensional geometric feature similarity, the three-dimensional point P_3d in the original lidar coordinate system corresponding to each pixel of the point set on the image is considered, and S_geometry(i, j) is computed from the 2-norm ||·||_2 of the difference between the corresponding three-dimensional points P_3d^i and P_3d^j, normalized by a constant m, where P_3d^i is the i-th point and P_3d^j the j-th point of P_3d. S_image(i, j) and S_geometry(i, j) are then combined by the weights to form the weight of the connecting edge between the two points.
Consistency regularization
In weakly supervised learning, the consistency prior assumes that points within the same structure (usually the same cluster or manifold) are more likely to have similar labels. The larger the overall graph similarity W_ij, the more similar the two points are, and the more they should share the same label. A similarity threshold τ is defined to decide whether an edge is formed between two points:
e_ij = 1 if W_ij > τ, and e_ij = 0 otherwise,
where e_ij ∈ E. When the overall graph similarity W_ij between two nodes p_i and p_j is greater than the similarity threshold τ, a connecting edge exists between the two points, i.e. the pseudo labels of nodes p_i and p_j are the same; otherwise there is no connecting edge between nodes p_i and p_j and their pseudo labels differ. Similarly to the point label assignment module, the undirected graph G is used here to constrain the segmenter to predict consistent labels for similar points.
The consistency rule is expressed as follows: when the edge e_ij = 1, the masks m̂_i and m̂_j predicted by the segmenter should be as close as possible. This can be defined through a binary cross entropy loss function (the graph consistency loss):
L_consistency = −(1 / N²) Σ_i Σ_j e_ij [ m̂_j · log m̂_i + (1 − m̂_j) · log(1 − m̂_i) ],
where N = |V| is the number of nodes in the undirected graph, m̂_i and m̂_j are the predicted mask values of nodes p_i and p_j, respectively, and L_consistency denotes the graph consistency loss. The formula expresses that when two nodes p_i and p_j have no connecting edge (i.e. e_ij = 0) they impose no constraint, and when a connecting edge exists between two points their predictions should be as similar as possible.
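A PyTorch sketch of the consistency term follows. The pairwise binary-cross-entropy form matches the description; normalizing by the number of connected pairs is a practical assumption of this sketch, since the original normalization is not recoverable from the text.

```python
import torch

def graph_consistency_loss(pred, edges, eps=1e-6):
    """Pairwise BCE between mask predictions of connected graph nodes.

    pred:  (N,)   predicted mask probability at each graph node.
    edges: (N, N) binary adjacency e_ij (1 if W_ij > tau, else 0).
    """
    p_i = pred.unsqueeze(1)                   # (N, 1)
    p_j = pred.unsqueeze(0)                   # (1, N)
    # Treat p_j as a soft target for p_i; only connected pairs contribute.
    bce = -(p_j * torch.log(p_i.clamp(min=eps))
            + (1 - p_j) * torch.log((1 - p_i).clamp(min=eps)))
    n_edges = edges.sum().clamp(min=1.0)
    return (edges * bce).sum() / n_edges

# Toy usage
pred = torch.rand(10)
W = torch.rand(10, 10)
edges = (W > 0.8).float()
loss = graph_consistency_loss(pred, edges)
```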
Finally, the two loss functions are merged together as a total loss function to supervise the training of the segmenter:
L = L_point + L_consistency.
With this approach, the lidar point cloud is used as a supervision signal to constrain the segmenter's predictions without any additional annotation of the point cloud, which improves the prediction performance of the segmenter and completes the weakly supervised instance segmentation task.
The instance segmentation method based on point cloud weak supervision was added to the PointSup method and the BoxInst method for experimental verification, and the experimental results are shown in Table 1. The instance segmentation results were evaluated with the standard instance segmentation metrics, including average precision (AP), average precision at an intersection-over-union threshold of 50% (AP_50), average precision at an intersection-over-union threshold of 75% (AP_75), and the average precision for small (AP_s), medium (AP_m) and large (AP_l) objects. The larger the values of AP, AP_50, AP_75, AP_s, AP_m and AP_l, the better the performance of the segmenter (model). Performance was verified on a proprietary annotated data set. The lidar point cloud was added as a supervision signal to the training of the segmenter of the existing weakly supervised method, and the trained segmenter was used to predict instance segmentation masks.
Table 1 compares the experimental results of the instance segmentation method based on point cloud weak supervision with the instance segmentation results of existing weakly supervised methods and of supervised methods.
(Table 1 is provided as an image in the original publication.)
Compared with existing weakly supervised methods, the instance segmentation method based on point cloud weak supervision introduces no additional manual annotation cost at all and can serve as a complement to other weakly supervised methods. Superimposed on existing methods, it further improves the overall performance of the weakly supervised instance segmenter, approaches the performance of supervised learning methods at extremely low cost, and has the potential to train the segmenter at scale with massive data.
Although some embodiments of the present invention have been described herein, those skilled in the art will appreciate that they have been presented by way of example only. Numerous variations, substitutions and modifications will occur to those skilled in the art in light of the teachings of the present invention without departing from the scope thereof. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims (10)

1. An instance segmentation method based on point cloud weak supervision, comprising:
projecting a lidar point cloud onto an image plane to form a projected point cloud;
refining the projected point cloud by removing overlapped points caused by the parallax between the lidar and the camera, to obtain a refined point cloud;
assigning foreground or background labels to the points in the refined point cloud; and
training a segmenter using the refined point cloud with the foreground/background labels as a supervision signal, and performing instance segmentation and mask prediction with the segmenter.
2. The instance segmentation method based on point cloud weak supervision of claim 1, further comprising, before the step of projecting the lidar point cloud onto the image plane to form a projected point cloud: inputting an image and extracting image features with an image feature extractor, wherein the image features serve as input features for training the segmenter; and
labeling a three-dimensional bounding box of each object in the image.
3. The instance segmentation method based on point cloud weak supervision of claim 2, wherein a point error loss function and a graph consistency loss function are used to constrain the output of the segmenter when training the segmenter.
4. The instance segmentation method based on point cloud weak supervision of claim 3, wherein projecting the lidar point cloud onto an image plane to form a projected point cloud comprises:
expressing the lidar point cloud in homogeneous coordinates as P_3d ∈ R^(4×N);
projecting the lidar point cloud from the lidar coordinate system to the camera coordinate system with a transformation matrix T ∈ R^(4×4), and then projecting it further onto the image plane with a camera matrix K ∈ R^(3×4), to form the projected point cloud:
P_2d = K · T · P_3d,
wherein P_2d is the set of points, expressed in homogeneous coordinates, obtained by projecting the lidar point cloud onto the image plane.
5. The instance segmentation method based on point cloud weak supervision of claim 3, wherein refining the projected point cloud by removing overlapped points caused by the parallax between the lidar and the camera to obtain a refined point cloud comprises:
forming a sparse depth map D from each pixel P_2d obtained by projection onto the image plane and the ground-truth depth of the corresponding lidar point; and
traversing the entire sparse depth map with a two-dimensional sliding window w, wherein within each window the projected point cloud is divided according to relative depth into near points P_near and far points P_far, a point whose relative depth exceeds a depth threshold being a far point in P_far and a point whose relative depth does not exceed the depth threshold being a near point in P_near:
p(x, y) ∈ P_near  if  (d(x, y) − d_min) / (d_max − d_min) ≤ τ_depth,  and  p(x, y) ∈ P_far  otherwise,
wherein p(x, y) denotes the point in the lidar point cloud corresponding to the pixel with coordinates (x, y) in the two-dimensional sliding window w, τ_depth denotes a depth threshold with which relatively distant points can be filtered out, d(x, y) denotes the depth value of the pixel with coordinates (x, y), and d_min and d_max denote the minimum and maximum depth values within the two-dimensional sliding window w, respectively; and
computing the minimum envelope of the near points P_near and removing, as overlapped points, the far points that fall within this envelope, to obtain the refined point cloud P_refine, wherein the overlapped points P_overlap are:
P_overlap = { p(x, y) ∈ P_far | x_min ≤ x ≤ x_max, y_min ≤ y ≤ y_max },
wherein x_min and x_max are the minimum and maximum values of the near points P_near on the x-axis, and y_min and y_max are the minimum and maximum values of the near points P_near on the y-axis.
6. The instance segmentation method based on point cloud weak supervision of claim 5, wherein assigning foreground/background labels to the points in the refined point cloud comprises:
dividing the refined point cloud P_refine, according to its positional relationship with the three-dimensional bounding box, into points P_in inside the three-dimensional bounding box and points P_out outside the three-dimensional bounding box;
taking the points P_in inside the three-dimensional bounding box as positive samples and assigning them foreground labels, and taking a subset of the points of P_out around the three-dimensional bounding box as negative samples and assigning them background labels, the total number of positive and negative samples being s; and
propagating the pseudo labels of the positive and negative samples to the surrounding 8 pixels according to image feature similarity.
7. The instance segmentation method based on point cloud weak supervision of claim 6, wherein taking a subset of the points of P_out around the three-dimensional bounding box as negative samples and assigning them background labels comprises:
first projecting the 8 vertices of the three-dimensional bounding box onto the image plane and then computing their minimum enclosing rectangle b; and
sampling, from the points of P_out whose projections fall within the enclosing rectangle b, the negative samples P_neg.
8. The instance segmentation method based on point cloud weak supervision of claim 6, wherein propagating the pseudo labels of the positive and negative samples to the surrounding 8 pixels according to image feature similarity comprises:
when the image feature similarity exceeds a similarity threshold, propagating the label of a candidate point p_c selected from the positive and negative samples to the 8 pixels surrounding it on the image, so that these 8 pixels carry the same class label as the candidate point p_c, wherein the label propagation criterion is:
l(q) = l(p_c)  if  f(q)^T f(p_c) > τ_dense,  for q ∈ N_8(p_c),
wherein l(p) is the pseudo label assigned to point p, N_8(p_c) is the set of 8 pixels surrounding the candidate point p_c on the image, f(p) is the image feature of point p extracted by the pre-trained image feature extractor, and τ_dense is the similarity threshold.
9. The instance segmentation method based on point cloud weak supervision of claim 6, wherein, when training the segmenter, constraining the output of the segmenter with the point error loss function and the graph consistency loss function comprises:
constructing the point error loss function by bilinear interpolation, wherein the loss between the predicted mask and the pseudo labels is measured by the point error loss function:
L_point = −(1 / (K·|S|)) Σ_k Σ_s [ l_ks · log m̂(p_ks) + (1 − l_ks) · log(1 − m̂(p_ks)) ],
wherein K is the total number of instances in the image, S is the set of all points carrying pseudo labels, p_ks is the s-th point of the k-th instance, l_ks is the pseudo label of point p_ks, m̂(p_ks) is the predicted mask value at p_ks obtained by bilinear interpolation, and L_point denotes the point error loss.
10. The instance segmentation method based on point cloud weak supervision of claim 9, wherein, when training the segmenter, constraining the output of the segmenter with the point error loss function and the graph consistency loss function comprises:
constructing an undirected graph G = <V, E> from the refined point cloud P_refine, wherein the points of the refined point cloud P_refine serve as the nodes forming V and E is the set of edges, and whether an edge is formed between two nodes, and hence whether the two nodes carry the same pseudo label, is determined by the image feature similarity and the three-dimensional geometric feature similarity between the two nodes, the weighted sum of which is:
W_ij = w_1 · S_image(i, j) + w_2 · S_geometry(i, j),
wherein w_1 and w_2 are balance weights for the image feature similarity and the three-dimensional geometric feature similarity, S_image(i, j) and S_geometry(i, j) denote, respectively, the image feature similarity and the three-dimensional geometric feature similarity between nodes p_i and p_j, and W_ij denotes the overall graph similarity; when the overall graph similarity W_ij is greater than a similarity threshold τ, an edge is formed between the two nodes and they carry the same pseudo label, otherwise there is no connecting edge between the two nodes and their pseudo labels differ; and
constraining the output of the segmenter with the graph consistency loss function, based on the requirement that the predicted masks of the segmenter be close whenever a connecting edge exists between two nodes:
L_consistency = −(1 / N²) Σ_i Σ_j e_ij [ m̂_j · log m̂_i + (1 − m̂_j) · log(1 − m̂_i) ],
wherein N = |V| is the number of nodes in the undirected graph, e_ij indicates whether nodes p_i and p_j are connected by an edge, m̂_i and m̂_j are the predicted mask values of nodes p_i and p_j, respectively, and L_consistency denotes the graph consistency loss.
CN202210629786.8A 2022-06-06 2022-06-06 Instance segmentation method based on point cloud weak supervision Pending CN114972758A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210629786.8A CN114972758A (en) 2022-06-06 2022-06-06 Instance segmentation method based on point cloud weak supervision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210629786.8A CN114972758A (en) 2022-06-06 2022-06-06 Instance segmentation method based on point cloud weak supervision

Publications (1)

Publication Number Publication Date
CN114972758A true CN114972758A (en) 2022-08-30

Family

ID=82960452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210629786.8A Pending CN114972758A (en) 2022-06-06 2022-06-06 Instance segmentation method based on point cloud weak supervision

Country Status (1)

Country Link
CN (1) CN114972758A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116703952A (en) * 2023-08-09 2023-09-05 深圳魔视智能科技有限公司 Method and device for filtering occlusion point cloud, computer equipment and storage medium
CN116703952B (en) * 2023-08-09 2023-12-08 深圳魔视智能科技有限公司 Method and device for filtering occlusion point cloud, computer equipment and storage medium
CN117058384A (en) * 2023-08-22 2023-11-14 山东大学 Method and system for semantic segmentation of three-dimensional point cloud
CN117058384B (en) * 2023-08-22 2024-02-09 山东大学 Method and system for semantic segmentation of three-dimensional point cloud

Similar Documents

Publication Publication Date Title
CN111553859B (en) Laser radar point cloud reflection intensity completion method and system
CN111462275B (en) Map production method and device based on laser point cloud
US10867190B1 (en) Method and system for lane detection
CN110148196B (en) Image processing method and device and related equipment
CN111461245B (en) Wheeled robot semantic mapping method and system fusing point cloud and image
CN108509820B (en) Obstacle segmentation method and device, computer equipment and readable medium
CN112581612B (en) Vehicle-mounted grid map generation method and system based on fusion of laser radar and all-round-looking camera
CN108470174B (en) Obstacle segmentation method and device, computer equipment and readable medium
CN114972758A (en) Instance segmentation method based on point cloud weak supervision
CN112967283B (en) Target identification method, system, equipment and storage medium based on binocular camera
CN111753698A (en) Multi-mode three-dimensional point cloud segmentation system and method
CN113706480B (en) Point cloud 3D target detection method based on key point multi-scale feature fusion
CN112258519B (en) Automatic extraction method and device for way-giving line of road in high-precision map making
CN115049700A (en) Target detection method and device
CN113516664A (en) Visual SLAM method based on semantic segmentation dynamic points
CN110619299A (en) Object recognition SLAM method and device based on grid
CN113255444A (en) Training method of image recognition model, image recognition method and device
CN112257668A (en) Main and auxiliary road judging method and device, electronic equipment and storage medium
CN117037103A (en) Road detection method and device
CN116071729A (en) Method and device for detecting drivable area and road edge and related equipment
CN115100741A (en) Point cloud pedestrian distance risk detection method, system, equipment and medium
CN115147798A (en) Method, model and device for predicting travelable area and vehicle
CN114550116A (en) Object identification method and device
CN112507891B (en) Method and device for automatically identifying high-speed intersection and constructing intersection vector
CN116597122A (en) Data labeling method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination