CN117649515A

CN117649515A - Digital twinning-based semi-supervised 3D target detection method, system and equipment

Info

Publication number: CN117649515A
Application number: CN202311546436.6A
Authority: CN
Inventors: 张天柱; 杨文飞; 潘晓扬; 张哲�; 王诗良; 吴枫
Original assignee: Deep Space Exploration Laboratory Tiandu Laboratory
Current assignee: Deep Space Exploration Laboratory Tiandu Laboratory
Priority date: 2023-11-20
Filing date: 2023-11-20
Publication date: 2024-03-05

Abstract

The invention discloses a semi-supervised 3D target detection method, system and equipment based on digital twinning, belonging to the field of computer vision. The method comprises: receiving a point cloud and inputting it into a preset teacher model and a preset student model for preprocessing, to obtain an uncertainty prediction result of the teacher model and an uncertainty prediction result of the student model respectively; performing pseudo-label screening and weight assignment on the uncertainty prediction result of the teacher model obtained by preprocessing; supervising the unlabeled data of the student model with the pseudo-labels and weight scores, and supervising the labeled data of the student model with the ground truth; and applying IoU-guided NMS (non-maximum suppression) to the uncertainty prediction result of the teacher model to obtain the final detection result. By designing a face-aware uncertainty estimation method and a pseudo-label screening strategy, the invention helps the model locate object instances and identify object categories accurately and efficiently.

Description

Digital twinning-based semi-supervised 3D target detection method, system and equipment

Technical Field

The invention relates to the field of computer vision, in particular to a semi-supervised 3D target detection method, system and equipment based on digital twinning.

Background

3D target detection is a basic task of 3D scene understanding. It aims to predict the semantic label and the spatial bounding box of each object in a point cloud scene. With the popularity of AR/VR, 3D indoor scanning and autonomous driving, 3D target detection has become a key technology for scene understanding. Over the past decades, many fully supervised 3D target detection methods have been proposed, and the field has made significant progress. However, these methods rely on large amounts of carefully annotated 3D scene data, which is expensive and time-consuming to collect. To reduce the high annotation cost of fully supervised 3D target detection, semi-supervised methods that combine a small amount of labeled data with a large amount of unlabeled data for model training have gained increasing attention.

Currently, semi-supervised 3D target detection methods fall broadly into two categories: consistency-constraint-based methods and pseudo-label-based methods. The core idea of consistency-constraint-based methods is to encourage consistent predictions for the same data under different data augmentations. Specifically, differently augmented versions of the data are input into the teacher model and the student model, each model produces its own prediction, and a consistency loss constrains the two predictions to agree. Pseudo-label-based methods aim to select high-quality pseudo-labels from the model's predictions on unlabeled data, which are then combined with the labeled data for model training. These methods use global metric scores (IoU, classification confidence, voting scores, etc.) to select pseudo-labels. However, a pseudo-label with a high global metric score may not cover every face of an object well, while a pseudo-label with a low global metric score may still provide correct predictions for certain faces. A semi-supervised 3D target detection method based on digital twinning is therefore proposed.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a semi-supervised 3D target detection method based on digital twinning, which optimizes three processes of pseudo tag prediction, quality evaluation and screening and realizes a high-precision 3D target detection result.

The aim of the invention can be achieved by the following technical scheme:

in a first aspect, the present application proposes a digital twinning-based semi-supervised 3D target detection method, including:

receiving a point cloud and inputting it into a preset teacher model and a preset student model for preprocessing, to obtain an uncertainty prediction result of the teacher model and an uncertainty prediction result of the student model respectively, wherein the preprocessing comprises the following steps:

inputting the point cloud and obtaining the spatial positions and features of candidate points $P_{prop}$ through the point cloud feature extraction network PointNet, a translation operation and a voting mechanism;

based on the features and spatial positions of the candidate points $P_{prop}$, using a multi-layer perceptron to predict the object category and the spatial distribution of each face of the bounding box;

obtaining the geometric features of each face by a face-based pooling operation combined with the spatial position of the predicted bounding box;

concatenating the geometric features and the discrete probability distribution features of the face, inputting them into a multi-layer perceptron, and outputting an uncertainty prediction result;

performing pseudo-label screening and weight assignment on the uncertainty prediction result of the teacher model obtained by preprocessing;

supervising the unlabeled data of the student model with the pseudo-labels and weight scores, and supervising the labeled data of the student model with the ground truth;

applying IoU-guided NMS to the uncertainty prediction result of the teacher model to obtain the final detection result.

In some embodiments, obtaining the spatial positions and features of the candidate points from the input point cloud through the point cloud feature extraction network PointNet, the translation operation and the voting mechanism comprises the following steps:

the input point cloud is subsampled by farthest point sampling to obtain seed points $P_{seed}$; for each seed point, the features of the surrounding points are aggregated using k nearest neighbors and a multi-layer perceptron; the result serves as the input point cloud of the next stage, and this process is repeated twice;

the features of the seed points $P_{seed}$ output by the last stage are used to predict the probability that each seed point is a foreground object point and its offset from the object center, and the spatial position of the seed point is translated to the object center. The translated seed points $P_{seed}$ are called candidate points $P_{prop}$.

In some embodiments, using a multi-layer perceptron to predict the object category and the spatial distribution of each face of the bounding box comprises the following steps:

we devise a side-aware parameterization to represent bounding boxes. Specifically, given the position and features of a candidate point $P_{prop}$, instead of predicting the offset to the center point and the size of the object, we predict the distance from the candidate point $P_{prop}$ to each face. Based on the observation that a predicted probability distribution can measure uncertainty, we change the bounding box parameterization from deterministic to probabilistic. We divide the distance between each face and the candidate point $P_{prop}$ into discrete bins in space, predict the probability that the current face falls in each bin, and obtain the spatial position of the face as the expectation of the distribution:

$$\hat{s} = \sum_i P(s = s_i)\, s_i$$

where $s_i$ is the distance value represented by each bin, $P$ is the predicted probability, and $\hat{s}$ is the spatial position of the predicted face;

In some embodiments, obtaining the geometric features of a face by the face-based pooling operation comprises the following steps:

the face-based pooling operation requires selecting face points $P_{side}$, which are virtual grid points covering a particular face of the object. More specifically, for the front face of the object, we divide the width and the height of the object into D segments each. We then generate 2×D×D grid points in front of and behind the front face of the object. For each face point we find its k nearest seed points $P_{seed}$ and propagate their features to the face point $P_{side}$ by distance-weighted interpolation. We then input all face points $P_{side}$ into the point cloud feature extraction network PointNet to obtain the geometric feature $F_{geo}$.
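The generation of the 2×D×D virtual grid points can be illustrated with a minimal sketch. It assumes an axis-aligned box whose front face is the plane at the maximum y coordinate, places grid points at the cell centers of a D×D partition, and uses a hypothetical `offset` for the distance of the two layers from the face; none of these choices are specified in the text, and the function name `face_grid_points` is illustrative only.

```python
def face_grid_points(center, size, d=3, offset=0.1):
    """Return 2*d*d virtual grid points around the front face of a box.

    Assumptions (not from the source): the box is axis-aligned,
    size = (width, length, height), and the front face is the plane
    y = cy + length / 2. `offset` is a hypothetical layer distance.
    """
    cx, cy, cz = center
    w, l, h = size
    y_face = cy + l / 2.0
    points = []
    for dy in (-offset, offset):      # one layer behind, one in front of the face
        for i in range(d):            # d segments along the width
            for j in range(d):        # d segments along the height
                x = cx - w / 2.0 + (i + 0.5) * w / d
                z = cz - h / 2.0 + (j + 0.5) * h / d
                points.append((x, y_face + dy, z))
    return points


if __name__ == "__main__":
    pts = face_grid_points((0.0, 0.0, 0.0), (2.0, 4.0, 2.0), d=3)
    print(len(pts), pts[0])
```

For an oriented box, the same grid would be generated in the local frame of the box and then rotated by the predicted orientation angle.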

In some embodiments, the geometric features and the discrete probability distribution features of the surface are spliced together, input into a multi-layer perceptron, and output uncertainty prediction results, comprising the steps of:

the discrete probability distribution feature is a set of statistics of the discrete distribution of the face: we take the mean of the k largest probability values, the variance of the distribution, and all probability values of the discrete distribution as the distribution feature $F_{dist}$ of the face;

the distribution feature $F_{dist}$ of the face and the geometric feature $F_{geo}$ of the face are concatenated;

the fused feature is then input into a multi-layer perceptron followed by a Sigmoid activation layer, which outputs the uncertainty measure $u_s$ of the face:

$$u_s = \mathrm{Sigmoid}(\mathrm{MLP}(\mathrm{Cat}(F_{geo}, F_{dist})))$$

the above computation yields the uncertainty measure of each face. To guide the training of the uncertainty estimation module, we use the absolute error between the predicted face and the real face as the uncertainty label for supervision; the label is calculated as follows:

where $y_s$ is the spatial position of the real face, $\hat{s}$ is the spatial position of the predicted face, MIN takes the minimum value, and $\alpha$ is a scaling coefficient, set to 4 in the invention; furthermore, the invention uses the mean square error between the absolute-error label of the predicted and real faces and the predicted uncertainty $u_s$ as the uncertainty regression loss $L_{uncer}$:

where $U$ is the set of uncertainty estimates for the faces of the object, $U = \{u_s \mid s \in B\}$.

In some embodiments, the pseudo tag screening and weight allocation includes the steps of:

the pseudo-label screening and weight assignment consists of three parts: a class-specific filter, an IoU-guided low-half NMS strategy, and face-aware weight assignment. The class-specific filter uses the class confidence, foreground confidence and IoU prediction to filter out low-quality pseudo-labels. The IoU-guided low-half NMS strategy discards only the half of the overlapping predictions with the lower predicted IoU. The face-aware weight assignment uses the uncertainty of each face to assign a weight to each face of the pseudo-label; the weight is computed as follows:

where $q_s$ is the quality score (weight) of the face and $\alpha_2$ is a scale factor. Finally, we weight the loss using the quality score:

where $q_B$ is the mean of $q_s$ and reflects the global localization quality of the bounding box. In this way we reduce the interference of poorly localized faces in model training.

In some embodiments, supervising the unlabeled data of the student model with the pseudo-labels and weight scores and supervising the labeled data of the student model with the ground truth comprises the following steps:

since we decouple the bounding box into a separate regression problem for each face, general bounding box regression losses are not well suited to the localization problem posed by the invention. We therefore use a rotated IoU loss and a face-aware smooth L1 loss in the face-aware network:

the face-aware smooth L1 loss $L_{reg}(s)$ focuses on the local localization of each face, which facilitates the subsequent uncertainty prediction. The rotated IoU loss $L_{IoU}(B)$ focuses on the global localization of the bounding box and is robust to changes in shape and scale.

For unlabeled data, we use the pseudo-labels as supervisory signals, weighted by the quality scores:

In some embodiments, applying IoU-guided NMS to the output of the teacher model in the inference stage comprises the following steps:

the detection model outputs the bounding box of each object, its category, the uncertainty of each of its faces, and its predicted IoU; the invention uses the predicted IoU as an additional reference value for the non-maximum suppression algorithm: the predicted IoU of the object is multiplied by its class confidence to give the overall confidence of the object, which is input to the non-maximum suppression algorithm, so that lower-quality duplicate predictions are filtered out and a high-precision detection result is obtained.
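A minimal sketch of this scoring rule follows. The product of class confidence and predicted IoU is taken from the text; the greedy suppression loop, the axis-aligned IoU helper `iou_3d` and the overlap threshold of 0.25 are simplifying assumptions for illustration (the boxes in the invention are oriented, which would require a rotated IoU).

```python
def iou_3d(a, b):
    """IoU of two axis-aligned boxes (xmin, ymin, zmin, xmax, ymax, zmax).

    Axis alignment is a simplification made for this sketch.
    """
    inter = 1.0
    for k in range(3):
        lo = max(a[k], b[k])
        hi = min(a[k + 3], b[k + 3])
        if hi <= lo:
            return 0.0
        inter *= hi - lo
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)


def iou_guided_nms(boxes, cls_conf, iou_pred, thresh=0.25):
    """Greedy NMS ranked by class confidence times predicted IoU."""
    scores = [c * p for c, p in zip(cls_conf, iou_pred)]
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep a box only if it does not overlap a higher-ranked kept box
        if all(iou_3d(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 0, 1, 1, 1), (0.1, 0, 0, 1.1, 1, 1), (5, 5, 5, 6, 6, 6)]
    print(iou_guided_nms(boxes, [0.9, 0.8, 0.7], [0.5, 0.9, 0.8]))
```

In the usage example, the second box survives over the first despite its lower class confidence, because its predicted IoU raises its overall confidence.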

In a second aspect, the present application also proposes a semi-supervised 3D target detection system based on face uncertainty estimation, comprising:

the preprocessing module, used for receiving the point cloud and inputting it into a preset teacher model and a preset student model for preprocessing, to obtain an uncertainty prediction result of the teacher model and an uncertainty prediction result of the student model respectively; the preprocessing module comprises a candidate point feature extraction module, a face spatial distribution prediction module, a face geometric feature acquisition module and a face discrete distribution prediction module;

the candidate point feature extraction module, used for inputting the point cloud and obtaining the spatial positions and features of the candidate points $P_{prop}$ through the point cloud feature extraction network PointNet, the translation operation and the voting mechanism;

the face spatial distribution prediction module, used for predicting, with a multi-layer perceptron and based on the features and spatial positions of the candidate points $P_{prop}$, the object category and the spatial distribution of each face of the bounding box;

the face geometric feature acquisition module, used for obtaining the geometric features of each face by a face-based pooling operation combined with the spatial position of the predicted bounding box;

the face discrete distribution prediction module, used for concatenating the geometric features and the discrete probability distribution features of the face, inputting them into a multi-layer perceptron, and outputting the uncertainty prediction result;

the uncertainty prediction module, used for performing pseudo-label screening and weight assignment on the uncertainty prediction result of the teacher model obtained by preprocessing;

and the output module, used for supervising the unlabeled data of the student model with the pseudo-labels and weight scores, and supervising the labeled data of the student model with the ground truth.

In a third aspect, the present application further proposes a terminal device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the memory stores the computer program capable of running on the processor, and when the processor loads and executes the computer program, a digital twinning-based semi-supervised 3D object detection method is adopted as described above.

In a fourth aspect, the present application further proposes a computer readable storage medium, in which a computer program is stored, which when loaded and executed by a processor, employs a digital twinning-based semi-supervised 3D object detection method as described above.

The invention has the beneficial effects that:

The patent provides a semi-supervised 3D target detection method, system and equipment based on digital twinning. To address the poor quality of the pseudo-labels generated by the teacher model, which harms model training, a face discrete distribution prediction module and an uncertainty prediction module are designed, so that the localization of an object bounding box is decoupled into the localization of its individual faces and a reliability estimate is given for the prediction of each face. In addition, a pseudo-label screening and weight assignment module is designed to suppress the interference of wrongly localized faces in the pseudo-labels with the model training process, yielding better target detection performance. This design largely resolves the problems of unreasonable pseudo-label selection and insufficient use of the information in the dataset in semi-supervised 3D target detection, and ultimately helps the model detect object instances and identify object categories accurately and efficiently.

Drawings

The invention is further described below with reference to the accompanying drawings.

FIG. 1 is a block diagram of a digital twinning-based semi-supervised 3D object detection system of the present application;

FIG. 2 is a flow chart of face aware pooling operations and uncertainty prediction in an embodiment of the present application;

fig. 3 is a flow chart of a semi-supervised 3D object detection method based on digital twinning.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Referring to fig. 2-3, the present application proposes a digital twinning-based semi-supervised 3D target detection method, which includes:

we devise a side-aware parameterization to represent bounding boxes. Specifically, given the position and features of a candidate point $P_{prop}$, instead of predicting the offset to the center point and the size of the object, we predict the distance from the candidate point $P_{prop}$ to each face. Based on the observation that a predicted probability distribution can measure uncertainty, we change the bounding box parameterization from deterministic to probabilistic. We divide the distance between each face and the candidate point $P_{prop}$ into discrete bins in space, predict the probability that the current face falls in each bin, and obtain the spatial position of the face as the expectation of the distribution:

$$\hat{s} = \sum_i P(s = s_i)\, s_i$$

where $s_i$ is the distance value represented by each bin, $P$ is the predicted probability, and $\hat{s}$ is the spatial position of the predicted face;

the discrete probability distribution feature is a set of statistics of the discrete distribution of the face: we take the mean of the k largest probability values, the variance of the distribution, and all probability values of the discrete distribution as the distribution feature $F_{dist}$ of the face;

$$u_s = \mathrm{Sigmoid}(\mathrm{MLP}(\mathrm{Cat}(F_{geo}, F_{dist})))$$

where $y_s$ is the spatial position of the real face, $\hat{s}$ is the spatial position of the predicted face, MIN takes the minimum value, and $\alpha$ is a scaling coefficient, set to 4 in the invention; furthermore, the invention uses the mean square error between the absolute-error label of the predicted and real faces and the predicted uncertainty $u_s$ as the uncertainty regression loss $L_{uncer}$:

where $q_s$ is the quality score (weight) of the face and $\alpha_2$ is a scale factor; finally, we weight the loss using the quality score:

The embodiment of the application discloses a semi-supervised 3D target detection system based on face uncertainty estimation, wherein the 3D targets are the semantic label and bounding box of each object in the point cloud scene input to the system. The system comprises a candidate point feature extraction module, a face discrete distribution prediction module, an uncertainty prediction module, and a pseudo-label screening and weight assignment module.

Yet another embodiment of the present invention provides a semi-supervised 3D target detection system based on face uncertainty estimation, described as follows.

The overall execution flow of the system is shown in FIG. 1. Assume the input point cloud has N points, each containing position (x, y, z) information. First, a candidate point feature extraction module based on the point cloud feature extraction network PointNet encodes the position information to obtain the candidate point features $F_{prop}$ and positions $P_{prop}$. Next, the candidate points $P_{prop}$ obtained by feature aggregation are input to the face discrete distribution prediction module, which yields, for each candidate point, the predicted probability distributions of the 6 faces of the object, the orientation angle of the object, and the object class. Then, the distributions of all faces and the geometric features obtained by the face-based pooling operation are input into the uncertainty prediction module to obtain the uncertainty measure of each face. Further, the pseudo-label screening and weight assignment module screens the output of the teacher model, so that high-quality pseudo-label information is used for training the student model. The final supervision of the student model consists of the ground-truth (GT) constraint on labeled data and the teacher model's pseudo-label constraint on unlabeled data; the parameters of the teacher model are dynamically updated using the parameters of the student model. In the inference stage, the teacher model outputs the final prediction after duplicate predicted boxes are removed by IoU-guided NMS. The modules are detailed as follows:
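The text states only that the teacher parameters are dynamically updated from the student parameters and does not give the update rule. One common choice in teacher-student frameworks is an exponential moving average (EMA); the sketch below assumes that rule purely for illustration, with a hypothetical `momentum` value and parameters stored as plain lists keyed by name.

```python
def ema_update(teacher, student, momentum=0.999):
    """Update teacher parameters in place as an EMA of the student parameters.

    The EMA rule and the momentum value are assumptions of this sketch;
    the source does not specify how the teacher is updated.
    """
    for name, s_vals in student.items():
        teacher[name] = [
            momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher[name], s_vals)
        ]
    return teacher


if __name__ == "__main__":
    teacher = {"w": [1.0, 0.0]}
    student = {"w": [0.0, 1.0]}
    print(ema_update(teacher, student, momentum=0.9))
```

Under such a rule the teacher changes slowly, which tends to stabilize the pseudo-labels it produces for the student.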

Candidate point feature extraction module. This module mainly comprises three parts: the point feature extraction network module PointNet, the translation operation and the voting operation.

The point feature extraction network module PointNet: this module aims to aggregate the features of the seed points $P_{seed}$, mainly using a multi-layer perceptron and farthest point sampling to realize multi-scale feature extraction;

where max is max pooling, MLP is the multi-layer perceptron, FPS is farthest point sampling, and K denotes the K nearest neighbors of the seed point.
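The building blocks named above (FPS, K nearest neighbors, max pooling) can be sketched as follows. This is an illustrative simplification rather than the network of the invention: the MLP applied before pooling is omitted, and the function names `farthest_point_sampling` and `knn_max_pool` are hypothetical.

```python
import math


def farthest_point_sampling(points, m, start=0):
    """Pick m point indices, each farthest from all previously chosen points."""
    chosen = [start]
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def knn_max_pool(seed, points, feats, k):
    """Max-pool the features of the k points nearest to a seed point.

    In the described network an MLP would transform the features first;
    that step is left out of this sketch.
    """
    nearest = sorted(range(len(points)),
                     key=lambda i: math.dist(seed, points[i]))[:k]
    return [max(feats[i][c] for i in nearest) for c in range(len(feats[0]))]


if __name__ == "__main__":
    pts = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (0.5, 0, 0)]
    feats = [[1, 0], [0, 2], [9, 9], [3, 1]]
    seeds = farthest_point_sampling(pts, 2)
    print(seeds, knn_max_pool(pts[seeds[0]], pts, feats, k=3))
```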

The translation operation inputs the seed points $P_{seed}$ into a feedforward network that outputs the offset from the current position to the object center, giving the positions $P_{prop}$ of the candidate points:

$$P_{prop} = P_{seed} + F_{seed} W_1 + B_1$$

The voting operation inputs the candidate points into a feedforward network that outputs the probability $P_{in}$ that a candidate point lies inside an object:

$$P_{in} = F_{seed} W_2 + B_2$$

where $W_1$, $W_2$ are the weights of the linear layers and $B_1$, $B_2$ are the biases of the linear layers.
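The two linear layers can be written out directly from the formulas. The sketch below treats the seed feature as a row vector and uses small hand-picked weights; the helper names `linear` and `translate_and_vote` and all numeric values are illustrative assumptions.

```python
def linear(feature, weight, bias):
    """Row vector times matrix plus bias: feature (C,), weight (C x O), bias (O,)."""
    return [
        sum(f * weight[c][o] for c, f in enumerate(feature)) + bias[o]
        for o in range(len(bias))
    ]


def translate_and_vote(p_seed, f_seed, w1, b1, w2, b2):
    """Apply P_prop = P_seed + F_seed W1 + B1 and P_in = F_seed W2 + B2."""
    offset = linear(f_seed, w1, b1)                 # predicted offset to the object center
    p_prop = [p + o for p, o in zip(p_seed, offset)]
    p_in = linear(f_seed, w2, b2)[0]                # raw score, as written in the formula
    return p_prop, p_in


if __name__ == "__main__":
    w1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    b1 = [0.0, 0.0, 0.5]
    w2 = [[0.5], [0.25]]
    b2 = [0.0]
    print(translate_and_vote((1.0, 2.0, 3.0), [1.0, 2.0], w1, b1, w2, b2))
```

The formula for $P_{in}$ contains no activation; if a probability in [0, 1] is required, a Sigmoid would presumably follow, but the text does not state this.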

Face discrete distribution prediction module. This module mainly comprises two parts: a position prediction module and a category prediction module for the object.

Position prediction module of the object: given the position and features of a candidate point $P_{prop}$, instead of predicting the offset to the center point and the size of the object, we predict the distance from the candidate point $P_{prop}$ to each face. Based on the observation that a predicted probability distribution can measure uncertainty, we change the bounding box parameterization from deterministic to probabilistic. We divide the distance between each face and the candidate point $P_{prop}$ into discrete bins in space and predict the probability that the current face falls in each bin:

$$P(s = s_i) = \mathrm{MLP}(F_{prop})$$

We then calculate the expected position of the face from the spatial probability distribution:

$$\hat{s} = \sum_i P(s = s_i)\, s_i$$
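As a small worked sketch, the expected face distance is the probability-weighted sum of the bin values. The softmax normalization of the MLP outputs is an assumption made here so that the probabilities sum to one; the text does not name the normalization.

```python
import math


def softmax(logits):
    """Normalize raw scores into a probability distribution (assumed step)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_face_distance(probs, bin_values):
    """Expectation of the discrete distribution: sum_i P(s = s_i) * s_i."""
    return sum(p * s for p, s in zip(probs, bin_values))


if __name__ == "__main__":
    bins = [0.0, 1.0, 2.0]                 # distance value represented by each bin
    probs = softmax([0.0, 0.5, 2.0])       # hypothetical MLP outputs
    print(expected_face_distance(probs, bins))
```

A sharply peaked distribution gives an expectation close to one bin value, while a flat distribution spreads the mass and signals a less certain face position.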

Category prediction module of the object: given a candidate point $P_{prop}$, we predict the class of the object as follows:

$$P_{cls} = \mathrm{MLP}(F_{prop})$$

With the above modules we obtain the class and the spatial probability distribution for each candidate point.

Uncertainty prediction module. This module mainly comprises two parts: a face geometric feature extraction module and an uncertainty prediction network module.

Face geometric feature extraction module: this module requires selecting face points $P_{side}$, which are virtual grid points covering a particular face of the object. More specifically, for the front face of the object, we divide the width and the height of the object into D segments each. We then generate 2×D×D grid points in front of and behind the front face of the object. For each face point we find its k nearest seed points $P_{seed}$ and propagate their features to the face point by distance-weighted interpolation, giving the face point feature $F_{side}$:

$$F_{side} = \sum_i F^i_{seed} W_i$$

where $F^i_{seed}$ is the feature of the i-th nearest neighbor and $W_i$ is its distance-weighted interpolation weight. After the spatial positions and features of the face points are obtained, they are input to the point feature extraction network PointNet to aggregate the geometric feature $F_{geo}$ of the face of the predicted box.
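The distance-weighted interpolation can be sketched as below. The weights are taken to be normalized inverse distances, a common convention for feature propagation in point networks; the text does not define $W_i$ beyond "distance-weighted", so this choice and the `eps` guard are assumptions.

```python
import math


def interpolate_feature(query, seed_points, seed_feats, k=3, eps=1e-8):
    """Interpolate a feature at `query` from its k nearest seed points.

    Weights W_i are normalized inverse distances (assumed convention).
    """
    nearest = sorted(range(len(seed_points)),
                     key=lambda i: math.dist(query, seed_points[i]))[:k]
    inv = [1.0 / (math.dist(query, seed_points[i]) + eps) for i in nearest]
    total = sum(inv)
    weights = [w / total for w in inv]
    dim = len(seed_feats[0])
    return [
        sum(w * seed_feats[i][c] for w, i in zip(weights, nearest))
        for c in range(dim)
    ]


if __name__ == "__main__":
    seeds = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
    feats = [[2.0], [4.0], [1000.0]]
    print(interpolate_feature((0.0, 0.0, 0.0), seeds, feats, k=2))
```

With two equidistant neighbors the result is their plain average, and the distant third seed is ignored because it is not among the k nearest.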

Uncertainty prediction network module: the discrete probability distribution feature is a set of statistics of the discrete distribution of the face: we take the mean of the k largest probability values, the variance of the distribution, and all probability values of the discrete distribution as the distribution feature $F_{dist}$ of the face;

The distribution feature $F_{dist}$ of the face and the geometric feature $F_{geo}$ of the face are concatenated; the fused feature is then input into a multi-layer perceptron followed by a Sigmoid activation layer, which outputs the uncertainty measure $u_s$ of the face:

$$u_s = \mathrm{Sigmoid}(\mathrm{MLP}(\mathrm{Cat}(F_{geo}, F_{dist})))$$
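A minimal sketch of the distribution feature and the uncertainty head follows. Two points are assumptions: the variance is computed over the bin distance values under the predicted distribution, and a single linear layer stands in for the MLP. The weights shown are placeholders, not trained values.

```python
import math


def distribution_features(probs, bin_values, k=2):
    """Build F_dist: top-k mean, variance, and all probability values."""
    topk_mean = sum(sorted(probs, reverse=True)[:k]) / k
    mean = sum(p * s for p, s in zip(probs, bin_values))
    # variance of the distance under the discrete distribution (assumed reading)
    variance = sum(p * (s - mean) ** 2 for p, s in zip(probs, bin_values))
    return [topk_mean, variance] + list(probs)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def face_uncertainty(f_geo, f_dist, weights, bias):
    """u_s = Sigmoid(MLP(Cat(F_geo, F_dist))), with one linear layer as the MLP."""
    fused = list(f_geo) + list(f_dist)      # Cat(F_geo, F_dist)
    return sigmoid(sum(w * x for w, x in zip(weights, fused)) + bias)


if __name__ == "__main__":
    f_dist = distribution_features([0.1, 0.2, 0.7], [0.0, 1.0, 2.0], k=2)
    f_geo = [1.0, 2.0]                       # hypothetical geometric feature
    print(f_dist, face_uncertainty(f_geo, f_dist, [0.0] * 7, 0.0))
```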

Pseudo-label screening and weight assignment module. This module mainly comprises three parts: a class-specific filter, an IoU-guided low-half NMS module and a face-aware weight assignment module.

Class-specific filter: this filter uses the class confidence, foreground confidence and IoU prediction to filter out low-quality pseudo-labels. The thresholds of the classification score, the foreground score and the IoU score are denoted $\tau_{cls}$, $\tau_{obj}$ and $\tau_{IoU}$; a selected pseudo-label must satisfy:

$$P_{cls} > \tau_{cls},\quad P_{obj} > \tau_{obj},\quad P_{IoU} > \tau_{IoU}$$
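The three threshold conditions translate into a simple filter. In the sketch below the thresholds are looked up per class, which is one reading of "class-specific"; the dictionary keys (`label`, `p_cls`, `p_obj`, `p_iou`) and the threshold values are hypothetical.

```python
def class_specific_filter(preds, tau_cls, tau_obj, tau_iou):
    """Keep predictions whose three scores all exceed the thresholds of their class.

    Per-class threshold tables are an assumption of this sketch.
    """
    kept = []
    for p in preds:
        c = p["label"]
        if (p["p_cls"] > tau_cls[c]
                and p["p_obj"] > tau_obj[c]
                and p["p_iou"] > tau_iou[c]):
            kept.append(p)
    return kept


if __name__ == "__main__":
    preds = [
        {"label": "chair", "p_cls": 0.95, "p_obj": 0.9, "p_iou": 0.6},
        {"label": "chair", "p_cls": 0.95, "p_obj": 0.9, "p_iou": 0.2},
        {"label": "table", "p_cls": 0.50, "p_obj": 0.9, "p_iou": 0.8},
    ]
    tau_cls = {"chair": 0.9, "table": 0.9}
    tau_obj = {"chair": 0.8, "table": 0.8}
    tau_iou = {"chair": 0.25, "table": 0.25}
    print(len(class_specific_filter(preds, tau_cls, tau_obj, tau_iou)))
```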

IoU-guided low-half NMS strategy: noise in the pseudo-labels caused by duplicate bounding box predictions is suppressed by an IoU-guided low-half non-maximum suppression strategy. For a cluster of highly overlapping pseudo-labels, we discard only the half of the proposals with the lower predicted IoU;
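One possible reading of the low-half strategy is sketched below: boxes are ranked by predicted IoU, each cluster is formed around the highest-ranked remaining box, and only the upper half of the cluster (rounded up) is kept. The clustering rule, the overlap threshold of 0.5 and the axis-aligned `iou_3d` helper are assumptions of this sketch and are not specified in the text.

```python
def iou_3d(a, b):
    """IoU of two axis-aligned boxes (xmin, ymin, zmin, xmax, ymax, zmax)."""
    inter = 1.0
    for k in range(3):
        lo = max(a[k], b[k])
        hi = min(a[k + 3], b[k + 3])
        if hi <= lo:
            return 0.0
        inter *= hi - lo
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)


def low_half_nms(boxes, iou_pred, overlap_thresh=0.5):
    """Within each cluster of overlapping boxes, drop the half with lower predicted IoU."""
    remaining = sorted(range(len(boxes)), key=lambda i: iou_pred[i], reverse=True)
    keep = []
    while remaining:
        lead = remaining[0]
        # cluster: the lead box plus every remaining box that overlaps it strongly
        cluster = [lead] + [
            i for i in remaining[1:]
            if iou_3d(boxes[lead], boxes[i]) > overlap_thresh
        ]
        n_keep = (len(cluster) + 1) // 2    # keep the upper half, rounded up
        keep.extend(cluster[:n_keep])
        remaining = [i for i in remaining if i not in cluster]
    return keep


if __name__ == "__main__":
    boxes = [
        (0, 0, 0, 1, 1, 1), (0.05, 0, 0, 1.05, 1, 1),
        (0, 0.05, 0, 1, 1.05, 1), (0, 0, 0.05, 1, 1, 1.05),
        (5, 5, 5, 6, 6, 6),
    ]
    print(low_half_nms(boxes, [0.9, 0.6, 0.8, 0.5, 0.7]))
```

Unlike standard NMS, which keeps a single box per cluster, this keeps several overlapping pseudo-labels, so faces that are well localized in a lower-ranked box can still contribute to training.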

Face-aware weight assignment module: the uncertainty of each face is used to assign a weight to each face of the pseudo-label; the weight is computed as follows:

The method can be widely applied to systems in the fields of automatic driving, mechanical arm grabbing, augmented reality and the like, and can accurately position and identify the object in the point cloud scene. In practice, the method can be installed on front-end equipment, robots and automatic driving automobiles in a software mode to provide real-time object target detection; the method can also be installed in a background server to provide object positioning and recognition results of a large number of 3D point cloud scenes.

Table 1: Comparison of experimental results on the ScanNet V2 dataset

As shown in Table 1, we compare our method with the current best-performing methods on the ScanNet V2 dataset. The results show that our method achieves the best performance on both AP@50 and AP@25 under different amounts of labeled data, which verifies its effectiveness.

The embodiment of the application also discloses a terminal device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein when the processor executes the computer program, any one of the semi-supervised 3D target detection methods based on digital twinning are adopted.

The terminal device may be a computer device such as a desktop computer, a notebook computer, or a cloud server, and the terminal device includes, but is not limited to, a processor and a memory, for example, the terminal device may further include an input/output device, a network access device, a bus, and the like.

The processor may be a Central Processing Unit (CPU) or, depending on actual use, another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor, which is not limited in this application.

The memory may be an internal storage unit of the terminal device, for example, a hard disk or a memory of the terminal device, or may be an external storage device of the terminal device, for example, a plug-in hard disk, a Smart Memory Card (SMC), a secure digital card (SD), or a flash memory card (FC) equipped on the terminal device, or the like, and may be a combination of the internal storage unit of the terminal device and the external storage device, where the memory is used to store a computer program and other programs and data required by the terminal device, and the memory may be used to temporarily store data that has been output or is to be output, which is not limited in this application.

Any of the semi-supervised 3D object detection methods based on digital twinning in the embodiment is stored in a memory of the terminal device through the terminal device, and is loaded and executed on a processor of the terminal device, so that the method is convenient to use.

The embodiment of the application also discloses a computer readable storage medium, and the computer readable storage medium stores a computer program, wherein when the computer program is executed by a processor, any of the digital twinning-based semi-supervised 3D object detection methods in the embodiment are adopted.

The computer program may be stored in a computer-readable medium. The computer program includes computer program code, which may be in source code form, object code form, executable file form, or some intermediate form. The computer-readable medium includes any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, or a software distribution medium; the computer-readable medium includes, but is not limited to, the above components.

With this computer-readable storage medium, the digital twinning-based semi-supervised 3D object detection method of any of the embodiments is stored in the medium and is loaded and executed on a processor, which facilitates storage and application of the method.

The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims.

Claims

1. The semi-supervised 3D target detection method based on digital twinning is characterized by comprising the following steps of:

2. The digital twinning-based semi-supervised 3D object detection method according to claim 1, wherein extracting the spatial positions and features of candidate points from the input point cloud through the point cloud feature extraction network PointNet, a translation operation, and a voting mechanism comprises the following steps:

obtaining seed points from the input point cloud by farthest point sampling, each seed point aggregating the features of its surrounding points using k nearest neighbors and a multi-layer perceptron; using the seed points with aggregated features as the input point cloud of the next stage, and repeating these operations twice to obtain the final output seed points;

using the features of the final output seed points to predict the probability that each final output seed point is a foreground object point and its distance to the object center, and translating the spatial position of each final output seed point to the object center; the translated seed points are called candidate points P_prop.
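The farthest point sampling step named in this claim can be illustrated with a minimal sketch. It is not the patented implementation: the function name `farthest_point_sampling`, the `start_index` parameter, and the use of plain Python tuples are assumptions made for illustration, and the k-nearest-neighbor aggregation and the multi-layer perceptron are omitted.

```python
import math


def farthest_point_sampling(points, num_samples, start_index=0):
    """Return indices of `num_samples` points chosen greedily so that each
    new point is the one farthest from all points selected so far.

    `points` is a sequence of equal-length coordinate tuples.
    `start_index` (a hypothetical parameter) fixes the first selected point.
    """
    if not 0 < num_samples <= len(points):
        raise ValueError("num_samples must be in 1..len(points)")

    selected = [start_index]
    # min_dist[i] holds the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start_index]) for p in points]

    while len(selected) < num_samples:
        # Pick the point that is farthest from the current selection.
        next_index = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(next_index)
        # Update nearest-selected distances with the newly added point.
        for i, p in enumerate(points):
            d = math.dist(p, points[next_index])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
seed_indices = farthest_point_sampling(cloud, 3)
```

The greedy update keeps only one distance per point, so each added seed costs one pass over the cloud.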

3. The digital twinning-based semi-supervised 3D object detection method according to claim 1, wherein using the multi-layer perceptron to predict the spatial distribution of each face of the object's bounding box and the object category comprises the following:

given a candidate point P_prop, the distance from the candidate point P_prop to each face is predicted; based on the observation that a predicted probability distribution can measure uncertainty, the parameterization of the bounding box is changed from a deterministic one to a probabilistic one; the distance of each face relative to the position of the candidate point P_prop is divided into discrete bins in space, the probability that the current face falls in each bin is predicted, and the spatial position of the face is obtained as the expectation of the distribution, by the following formula:

$$\hat{y}_s = \sum_{i} P(s_i)\, s_i$$

wherein s_i is the distance value represented by each bin, P is the predicted probability, and ŷ_s is the spatial position of the predicted face.
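As an illustration of reading a face position off a discrete distribution, the sketch below computes the expectation over bins. It rests on assumptions not stated in the claim: the bin probabilities are taken to come from a softmax over raw scores, and the names `softmax` and `expected_face_distance` are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (assumed normalization)."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_face_distance(bin_values, probs):
    """Expectation of the discrete distribution: sum of s_i * P(s_i)."""
    if len(bin_values) != len(probs):
        raise ValueError("bin_values and probs must have the same length")
    return sum(s * p for s, p in zip(bin_values, probs))


# Three bins at 0.0, 0.5 and 1.0 with most mass on the middle bin.
distance = expected_face_distance([0.0, 0.5, 1.0], [0.25, 0.5, 0.25])
```

A sharply peaked distribution gives an expectation close to a single bin value, while a flat one signals that the face position is ambiguous, which is what makes the distribution usable as an uncertainty cue.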

4. The digital twinning-based semi-supervised 3D target detection method of claim 1, wherein obtaining the geometric features of a face through the face-based pooling operation comprises the following steps:

the face-based pooling operation requires selecting face points P_side, where the face points P_side are virtual grid points that enclose a particular face of the object; specifically, for the front face of the object, the width and the height of the object are each divided into D sections, and 2×D×D grid points are generated in front of and behind the front face of the object; for each face point P_side, the corresponding seed points P_seed are found, and their features are propagated to the face point P_side by distance-weighted interpolation; all the face points P_side are input into the point cloud feature extraction network PointNet to obtain the geometric feature F_geo.
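The grid construction for the front face can be sketched as follows. This is only one possible reading: the claim does not say how far in front of and behind the face the points lie or where they sit within each section, so the `offset` parameter, the cell-center placement, and the choice of x as the face normal are all assumptions, as is the function name.

```python
def front_face_grid_points(face_center, width, height, d, offset):
    """Generate 2 * d * d virtual grid points around a front face.

    Assumptions for illustration: the face normal is the x axis, width runs
    along y, height runs along z, grid points sit at the centers of the
    d x d sections, and the two layers lie `offset` behind and in front of
    the face.
    """
    fx, fy, fz = face_center
    points = []
    for x in (fx - offset, fx + offset):  # one layer behind, one in front
        for i in range(d):
            y = fy - width / 2 + (i + 0.5) * width / d
            for j in range(d):
                z = fz - height / 2 + (j + 0.5) * height / d
                points.append((x, y, z))
    return points


grid = front_face_grid_points((1.0, 0.0, 0.0), width=2.0, height=2.0, d=2, offset=0.5)
```

Each generated point would then receive features interpolated from seed points before being passed to the feature extraction network.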

5. The digital twinning-based semi-supervised 3D target detection method of claim 1, wherein splicing the geometric features and the discrete probability distribution features of the face together, inputting them into a multi-layer perceptron, and outputting the uncertainty prediction result comprises the following steps:

the discrete probability distribution feature consists of statistics of the discrete distribution of the face: the mean of the k largest probability values, the variance of the distribution, and all probability values of the discrete distribution are selected as the distribution feature F_dist of the face;

inputting the fused features into a multi-layer perceptron, and outputting the uncertainty measure u_s of the face through a Sigmoid activation layer, by the following formula:

$$u_s = \mathrm{Sigmoid}\big(\mathrm{MLP}(\mathrm{Cat}(F_{geo}, F_{dist}))\big)$$

after the uncertainty measure of a face is obtained by prediction, the absolute error between the predicted face and the real face is used as the uncertainty label to constrain it, guiding the training of the uncertainty estimation module; the label is calculated by the following formula:

wherein y_s is the spatial position of the real face, ŷ_s is the spatial position of the predicted face, MIN takes the minimum value, and α is a scaling coefficient; the mean square error between the uncertainty label (the absolute error between the predicted face and the real face) and the predicted uncertainty u_s is used as the uncertainty regression loss L_uncer:
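The quantities in this claim can be sketched in a few lines. Several details are assumptions rather than statements of the patent: the variance is taken over bin distances under the predicted distribution, the label is the scaled absolute error clipped by MIN at a hypothetical `upper` bound (chosen as 1.0 because a Sigmoid output lies between 0 and 1), and all function names are illustrative.

```python
def distribution_features(bin_values, probs, k=2):
    """Build F_dist: mean of the k largest probabilities, variance of the
    distribution (assumed to be over bin distances), and all probabilities."""
    top_k_mean = sum(sorted(probs, reverse=True)[:k]) / k
    mean = sum(s * p for s, p in zip(bin_values, probs))
    variance = sum(p * (s - mean) ** 2 for s, p in zip(bin_values, probs))
    return [top_k_mean, variance, *probs]


def uncertainty_label(y_true, y_pred, alpha=1.0, upper=1.0):
    """Scaled absolute face error, clipped with MIN.

    `upper` is a hypothetical bound; the claim only names MIN and alpha.
    """
    return min(alpha * abs(y_true - y_pred), upper)


def uncertainty_regression_loss(labels, predictions):
    """Mean square error between uncertainty labels and predicted u_s."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    return sum((t - u) ** 2 for t, u in zip(labels, predictions)) / len(labels)


features = distribution_features([0.0, 1.0], [0.5, 0.5], k=1)
label = uncertainty_label(y_true=1.0, y_pred=0.5, alpha=4.0)
loss = uncertainty_regression_loss([1.0, 0.0], [0.5, 0.5])
```

In a full model the feature list would be concatenated with F_geo and passed through the multi-layer perceptron and Sigmoid layer; that network is left out here.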

6. The digital twinning-based semi-supervised 3D target detection method of claim 1, wherein the pseudo-label screening and weight distribution is implemented by a pseudo-label screening and weight distribution module, which comprises a class-specific filter, an IoU-guided lower-half NMS strategy, and face-aware weight assignment; the class-specific filter uses class confidence, foreground confidence, and predicted IoU to filter out low-quality pseudo-labels; the IoU-guided lower-half NMS strategy discards the half of the prediction results with the lower predicted IoU; the face-aware weight assignment uses the uncertainty of each face to assign a weight to each face of the pseudo-label, by the following weight assignment formula:

wherein q_s is the quality score of the face and α₂ is a scale factor; the loss is weighted using the quality score:

q_B is the mean of q_s and reflects the global localization quality of the bounding box.
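Two pieces of this claim lend themselves to a short sketch: discarding the half of the predictions with the lower predicted IoU, and averaging face quality scores into a box score q_B. The sketch is a simplification under stated assumptions: it treats its input as one group of competing predictions and does not compute box overlap, it keeps the extra prediction when the count is odd, and the function names are hypothetical. The mapping from face uncertainty to q_s is not reproduced.

```python
def lower_half_suppression(predicted_ious):
    """Return the indices kept after discarding the half of the predictions
    with the lower predicted IoU (simplified: no overlap grouping)."""
    order = sorted(
        range(len(predicted_ious)),
        key=lambda i: predicted_ious[i],
        reverse=True,
    )
    keep_count = (len(order) + 1) // 2  # assumption: round up for odd counts
    return sorted(order[:keep_count])


def box_quality(face_qualities):
    """q_B as the mean of the per-face quality scores q_s."""
    if not face_qualities:
        raise ValueError("face_qualities must not be empty")
    return sum(face_qualities) / len(face_qualities)


kept = lower_half_suppression([0.9, 0.2, 0.6, 0.4])
q_b = box_quality([1.0, 0.5, 0.5, 1.0, 0.75, 0.75])
```

A complete pipeline would first apply the class-specific filter and group overlapping boxes before this step.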

7. The digital twinning-based semi-supervised 3D object detection method according to claim 1, wherein using pseudo-labels and weight scores to supervise the unlabeled data of the student model and using Ground-Truth to supervise the labeled data of the student model comprises the following steps:

using a rotated IoU loss and a face-aware smooth L1 loss in the face-aware network:

using pseudo-labels as supervisory signals for the unlabeled data, weighted by the quality scores:

8. A semi-supervised 3D object detection system based on face uncertainty estimation, comprising:

9. A terminal device comprising a memory, a processor and a computer program stored in the memory and capable of running on the processor, characterized in that the memory stores the computer program capable of running on the processor, and that the processor, when loading and executing the computer program, employs a digital twinning-based semi-supervised 3D object detection method as defined in any of claims 1 to 7.

10. A computer readable storage medium having a computer program stored therein, wherein the computer program, when loaded and executed by a processor, employs a digital twinning-based semi-supervised 3D object detection method as claimed in any one of claims 1 to 7.