CN115965833A - Point cloud sequence recognition model training and recognition method, device, equipment and medium - Google Patents

Point cloud sequence recognition model training and recognition method, device, equipment and medium

Publication number: CN115965833A
Application number: CN202211674070.6A
Authority: CN (China)
Legal status: Pending
Prior art keywords: point cloud, sequence, model, target, inputting
Other languages: Chinese (zh)
Inventors: 郭裕兰, 危义民
Current Assignee: National University of Defense Technology; Sun Yat-sen University; Sun Yat-sen University Shenzhen Campus
Original Assignee: Sun Yat-sen University; Sun Yat-sen University Shenzhen Campus
Application filed by Sun Yat-sen University and Sun Yat-sen University Shenzhen Campus
Priority to CN202211674070.6A
Classification: Image Analysis (AREA)
Abstract

The invention discloses a method, apparatus, device, and medium for training and recognition with a point cloud sequence recognition model. The method comprises the following steps: acquiring a point cloud sequence, and selecting any two non-adjacent frames of point cloud data from the point cloud sequence to obtain a source point cloud and a target point cloud; inputting the source point cloud and the target point cloud into an auxiliary task model for point cloud reconstruction to obtain a reconstructed point cloud, and pre-training an encoder of the auxiliary task model according to the feature similarity error between the reconstructed point cloud and the target point cloud to obtain a trained encoder; acquiring a recognition task decoder, and combining the recognition task decoder with the trained encoder to obtain an initialized point cloud sequence recognition model; and inputting the point cloud sequence into the initialized point cloud sequence recognition model for training to obtain a trained point cloud sequence recognition model. Embodiments of the invention reduce the dependence of the point cloud sequence recognition model on manually labeled data, are applicable to a variety of point cloud recognition tasks, and can be widely applied in the technical field of artificial intelligence.

Description

Point cloud sequence recognition model training and recognition method, device, equipment and medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a method, apparatus, device, and medium for training and recognition with a point cloud sequence recognition model.
Background
With the advent of dynamic point cloud acquisition technologies and equipment, large numbers of raw point cloud sequences can now be obtained easily, and spatio-temporal information can be captured from them through supervised learning. Supervised learning, however, requires a large investment in accurately and manually annotated labels. Because point cloud sequences have more complex temporal and spatial structure than single point clouds, images, and similar data, the dependence of point cloud sequence recognition systems on manually labeled data needs to be reduced. At present, the supervision information provided to point cloud sequence recognition models in the related art is limited, and such models cannot be applied to a variety of point cloud recognition tasks. In view of the above, there is a need to solve these technical problems in the related art.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, apparatus, device, and medium for training and recognition with a point cloud sequence recognition model, so as to reduce the model's dependence on labeled data and improve its applicability across recognition tasks.
In one aspect, the present invention provides a method for training a point cloud sequence recognition model, including:
acquiring a point cloud sequence, and selecting any two non-adjacent frames of point cloud data from the point cloud sequence to obtain a source point cloud and a target point cloud;
inputting the source point cloud and the target point cloud into an auxiliary task model for point cloud reconstruction to obtain a reconstructed point cloud, and pre-training an encoder of the auxiliary task model according to the feature similarity error between the reconstructed point cloud and the target point cloud to obtain a trained encoder;
acquiring a recognition task decoder, and combining the recognition task decoder with the trained encoder to obtain an initialized point cloud sequence recognition model;
and inputting the point cloud sequence into the initialized point cloud sequence recognition model for training to obtain a trained point cloud sequence recognition model.
Optionally, the inputting of the source point cloud and the target point cloud into an auxiliary task model for point cloud reconstruction to obtain a reconstructed point cloud includes:
wherein the auxiliary task model comprises an encoder, a feature transformer, and an auxiliary task decoder;
inputting the source point cloud and the target point cloud into the encoder for high-dimensional feature representation mapping to obtain a source point cloud feature and a target point cloud feature;
inputting the source point cloud feature and the target point cloud feature into the feature transformer for feature transformation to obtain a target point cloud predicted feature;
and inputting the target point cloud predicted feature into the auxiliary task decoder for feature compression to obtain the reconstructed point cloud.
Optionally, the source point cloud and the target point cloud are input into the encoder for high-dimensional feature representation mapping to obtain the source point cloud feature and the target point cloud feature, where the encoder comprises four processing layers, and the processing steps of each processing layer include:
selecting from the input data according to a farthest point sampling method to obtain neighborhood center points;
constructing neighborhoods of a target radius around the neighborhood center points to obtain spatio-temporal neighborhoods;
performing local feature extraction on the data in the spatio-temporal neighborhoods, and splicing the extracted local features to obtain the output data;
wherein the target radius of each processing layer is different.
Optionally, the inputting of the source point cloud feature and the target point cloud feature into the feature transformer for feature transformation to obtain the target point cloud predicted feature includes:
performing a mean-variance transformation on the source point cloud feature to obtain a transformed feature;
adding a feature offset to the transformed feature to obtain a summed feature;
and performing a weighted summation of the summed feature and the target point cloud feature to obtain the target point cloud predicted feature.
Optionally, the inputting of the target point cloud predicted feature into the auxiliary task decoder for feature compression to obtain the reconstructed point cloud includes:
wherein the auxiliary task decoder comprises four identical feature transfer layers and a fully connected layer;
reverse-mapping the target point cloud predicted feature through the feature transfer layers to obtain a low-dimensional feature;
and performing coordinate mapping on the low-dimensional feature through the fully connected layer to obtain the reconstructed point cloud.
Optionally, the inputting of the point cloud sequence into the initialized point cloud sequence recognition model for training to obtain the trained point cloud sequence recognition model specifically includes:
inputting the point cloud sequence into the initialized point cloud sequence recognition model to obtain a point cloud sequence recognition result;
determining a training loss value according to the point cloud sequence recognition result and the point cloud sequence label;
and updating the parameters of the recognition task decoder according to the loss value to obtain the trained point cloud sequence recognition model.
In another aspect, an embodiment of the present invention further provides a point cloud sequence recognition method, including:
acquiring a point cloud sequence to be recognized;
and inputting the point cloud sequence to be recognized into the point cloud sequence recognition model obtained by the training method of a point cloud sequence recognition model according to any one of claims 1 to 6, to obtain a point cloud sequence recognition result.
In another aspect, an embodiment of the present invention further provides a point cloud sequence recognition apparatus, including:
a first module, configured to acquire a point cloud sequence, and select any two non-adjacent frames of point cloud data from the point cloud sequence to obtain a source point cloud and a target point cloud;
a second module, configured to input the source point cloud and the target point cloud into an auxiliary task model for point cloud reconstruction to obtain a reconstructed point cloud, and pre-train an encoder of the auxiliary task model according to the feature similarity error between the reconstructed point cloud and the target point cloud to obtain a trained encoder;
a third module, configured to acquire a recognition task decoder, and combine the recognition task decoder with the trained encoder to obtain an initialized point cloud sequence recognition model;
and a fourth module, configured to input the point cloud sequence into the initialized point cloud sequence recognition model for training to obtain a trained point cloud sequence recognition model.
On the other hand, the embodiment of the invention also discloses an electronic device, which comprises a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
On the other hand, the embodiment of the invention also discloses a computer readable storage medium, wherein the storage medium stores a program, and the program is executed by a processor to realize the method.
In another aspect, an embodiment of the present invention further discloses a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device may read the computer instructions from the computer-readable storage medium and execute them, causing the computer device to perform the foregoing method.
Advantages and benefits of the present invention will be set forth in part in the description which follows and, in part, will be obvious from the description or may be learned by practice of the invention:
According to the training method of the point cloud sequence recognition model, the source point cloud and the target point cloud are input into an auxiliary task model for point cloud reconstruction to obtain a reconstructed point cloud, and the encoder of the auxiliary task model is pre-trained according to the feature similarity error between the reconstructed point cloud and the target point cloud to obtain a trained encoder. Because the encoder can be trained by a self-supervised method, the dependence of the point cloud sequence recognition model on manually labeled data is reduced. The embodiment of the invention also acquires a recognition task decoder and combines it with the trained encoder to obtain an initialized point cloud sequence recognition model. Because the trained encoder can be combined with decoders for multiple point cloud sequence recognition tasks, the model can serve multiple recognition tasks, improving the flexibility and generalization capability of the point cloud sequence recognition model.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a training method of a point cloud sequence recognition model according to an embodiment of the present disclosure;
fig. 2 is a system structure diagram of a point cloud sequence identification model according to an embodiment of the present disclosure;
fig. 3 is a network structure diagram of a feature transformer according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are merely illustrative of the present application and are not intended to limit it.
First, several terms used in the present application are explained:
Point cloud sequence: a sequence composed of multiple frames of point clouds. A point cloud sequence can be viewed as a collection of points containing the positions of a target object in a scene at multiple points in time.
Self-supervised learning: for data without manual labels, an auxiliary task is designed to mine representative characteristics of the data itself as supervision information, so as to improve the feature extraction capability of the model. The supervision information here refers not to the labels of the original task but to the labels constructed for the auxiliary task. The goal of a self-supervised recognition system is to use self-supervised learning theory to design an intelligent system that automatically processes point cloud sequences, for tasks including human action recognition, semantic segmentation, and scene flow estimation.
Feature: a common term in the field of computer vision, referring to a vector or matrix that characterizes the data.
In the related art, point cloud sequence recognition methods generally either apply a rotation transformation to the point cloud or compute geometric features of the point cloud. In the rotation-transformation scheme, the set of rotation angles is limited, the point position changes induced by different angles may differ only slightly, and the correspondence is not obvious; the supervision information such a scheme provides is therefore limited, which restricts the task types to which the point cloud recognition model can be applied. In the geometric-feature scheme, the geometric features used are not stable enough; a normal vector, for example, is unstable geometric information that is easily affected by the offset of a few points. A point cloud recognition model trained in this way is sensitive to changes in point positions and can be applied to only some point cloud recognition tasks.
In view of this, referring to fig. 1, an embodiment of the present invention provides a method for training a point cloud sequence recognition model, including:
s101, acquiring a point cloud sequence, and selecting any two frames of point cloud data which are not adjacent to each other from the point cloud sequence to obtain a source point cloud and a target point cloud;
s102, inputting the source point cloud and the target point cloud into an auxiliary task model to carry out point cloud reconstruction to obtain a reconstructed point cloud, and carrying out pre-training processing on an encoder of the auxiliary task model according to the characteristic similarity error of the reconstructed point cloud and the target point cloud to obtain a trained encoder;
s103, acquiring an identification task decoder, and combining the identification task decoder and the trained encoder to obtain an initialized point cloud sequence identification model;
and S104, inputting the point cloud sequence into an initialized point cloud sequence recognition model for training to obtain a trained point cloud sequence recognition model.
In the embodiment of the invention, the self-supervised recognition system based on point cloud sequences introduces a new auxiliary task model that mines hidden label information from the point cloud sequence data itself to supervise the training of the point cloud recognition model, greatly reducing the model's dependence on manually labeled data.
The training method of the point cloud recognition model is divided into a pre-training stage and a formal training stage. In the pre-training stage, point cloud reconstruction from a current frame to a future frame is introduced as the auxiliary task. The current-frame-to-future-frame reconstruction task takes any two spaced frames of point cloud data from a point cloud sequence to obtain a source point cloud (the current frame) and a target point cloud (the future frame), and reconstructs the later frame using the features of the earlier frame. The auxiliary task model in the pre-training stage comprises an encoder, a feature transformer, and an auxiliary task decoder. In the formal training stage, the trained encoder from the auxiliary task model is reused, and a recognition task decoder for a specific recognition task is introduced to form the point cloud sequence recognition model, which is then trained with a small amount of manually labeled data. Because the encoder is already trained, the point cloud sequence recognition model needs only a little labeled data, and different recognition task decoders can be swapped in for different point cloud sequence recognition tasks. The two stages can be summarized as in the sketch below.
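The following is a minimal PyTorch-style sketch of the two-stage flow. The module names (encoder, transformer, aux_decoder, recognition_decoder) are hypothetical, chamfer_loss is sketched further below, and the code illustrates only the data flow, not the exact implementation of the embodiment:

```python
import random

import torch


def sample_frame_pair(sequence):
    # Pick any two non-adjacent frames: the earlier one is the source
    # (current frame), the later one the target (future frame).
    i = random.randint(0, len(sequence) - 3)
    j = random.randint(i + 2, len(sequence) - 1)
    return sequence[i], sequence[j]


def pretrain_step(encoder, transformer, aux_decoder, sequence, optimizer):
    source, target = sample_frame_pair(sequence)
    s_feat = encoder(source)                    # high-dimensional source feature
    t_feat = encoder(target)                    # high-dimensional target feature
    p_feat = transformer(s_feat, t_feat)        # predicted target feature
    reconstructed = aux_decoder(p_feat)         # point-wise coordinates
    loss = chamfer_loss(reconstructed, target)  # self-supervised signal
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def build_recognition_model(trained_encoder, recognition_decoder):
    # Formal training stage: reuse the pre-trained encoder and attach a
    # task-specific head (e.g. a scene flow or action recognition decoder).
    return torch.nn.Sequential(trained_encoder, recognition_decoder)
```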
As a further preferred embodiment, in step S102, inputting the source point cloud and the target point cloud into an auxiliary task model for point cloud reconstruction to obtain a reconstructed point cloud includes:
wherein the auxiliary task model comprises an encoder, a feature transformer, and an auxiliary task decoder;
inputting the source point cloud and the target point cloud into the encoder for high-dimensional feature representation mapping to obtain a source point cloud feature and a target point cloud feature;
inputting the source point cloud feature and the target point cloud feature into the feature transformer for feature transformation to obtain a target point cloud predicted feature;
and inputting the target point cloud predicted feature into the auxiliary task decoder for feature compression to obtain the reconstructed point cloud.
In the embodiment of the invention, to ensure the generalization capability of the algorithm, the self-supervised point cloud sequence recognition problem is modeled as a representation learning problem: the training of the auxiliary task model is supervised by minimizing the feature similarity error between the reconstructed point cloud and the target point cloud. The core of the auxiliary task model is a feature transformer introduced for the current-frame-to-future-frame reconstruction task on point cloud sequences. The feature transformer effectively aggregates the source point cloud feature and the target point cloud feature according to feature similarity, so that the point-wise coordinates of the reconstructed point cloud can be predicted from the features. Because the encoding and decoding feature maps place different requirements on the algorithm, the embodiment of the invention uses heterogeneous encoding and decoding feature extraction networks to process the features separately, referred to as the encoder and the auxiliary task decoder respectively. Referring to fig. 2, in the pre-training stage, the source point cloud and the target point cloud are input into the auxiliary task model, which comprises the encoder, the feature transformer, and the decoder, to perform point cloud reconstruction and obtain the reconstructed point cloud. The chamfer loss between the target point cloud and the reconstructed point cloud is used as the loss function to update the parameters of the auxiliary task model, yielding the trained encoder.
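The chamfer loss between the target point cloud and the reconstructed point cloud can be sketched as follows. The bidirectional nearest-neighbor form shown here is the standard formulation; the exact variant used by the embodiment (e.g. squared versus unsquared distances) is an assumption:

```python
import torch


def chamfer_loss(reconstructed, target):
    """Bidirectional chamfer distance between two point clouds.

    reconstructed: (N, 3) tensor of predicted point coordinates.
    target:        (M, 3) tensor of target point coordinates.
    """
    dists = torch.cdist(reconstructed, target) ** 2  # pairwise squared distances, (N, M)
    loss_fwd = dists.min(dim=1).values.mean()        # each predicted point -> nearest target
    loss_bwd = dists.min(dim=0).values.mean()        # each target point -> nearest prediction
    return loss_fwd + loss_bwd
```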
Further, as a preferred embodiment, the source point cloud and the target point cloud are input into the encoder for high-dimensional feature representation mapping to obtain the source point cloud feature and the target point cloud feature, wherein the encoder comprises four processing layers, and the processing steps of each processing layer include:
selecting from the input data according to a farthest point sampling method to obtain neighborhood center points;
constructing neighborhoods of a target radius around the neighborhood center points to obtain spatio-temporal neighborhoods;
performing local feature extraction on the data in the spatio-temporal neighborhoods, and splicing the extracted local features to obtain the output data;
wherein the target radius of each processing layer is different.
In the embodiment of the invention, the purpose of the encoder is to map an input point cloud sequence of any number of frames into a high-dimensional feature representation (feature vector); the invention uses a point cloud convolutional neural network with a multi-layer structure as the base network. The encoder in the embodiment of the invention analyzes point cloud sequence information on the basis of the MeteorNet network, following its first four layers. The encoder has a simple structure: the first layer places the multi-frame point clouds in the same space and selects neighborhood center points by farthest point sampling. A spatio-temporal neighborhood is constructed by taking a certain radius outward from each neighborhood center point; the points in the spatio-temporal neighborhood are combined, local features are extracted with a convolutional neural network, and finally the local features are spliced into a global feature. The second layer repeats the feature extraction steps of the first layer but uses a larger radius to combine the spatio-temporal neighborhood, and the third and fourth layers follow in turn. The four layers stacked together form the encoder. The per-layer sampling and grouping step is sketched below.
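The following simplified, single-frame sketch illustrates farthest point sampling and a fixed-radius neighborhood query. The embodiment actually groups points across frames into spatio-temporal neighborhoods; the function names and the max_points cap are illustrative assumptions:

```python
import torch


def farthest_point_sampling(points, num_centers):
    """points: (N, 3) tensor; returns indices of num_centers well-spread points."""
    n = points.shape[0]
    centers = torch.zeros(num_centers, dtype=torch.long)
    min_dist = torch.full((n,), float("inf"))
    farthest = torch.randint(0, n, (1,)).item()   # arbitrary starting point
    for i in range(num_centers):
        centers[i] = farthest
        d = ((points - points[farthest]) ** 2).sum(dim=1)
        min_dist = torch.minimum(min_dist, d)     # distance to nearest chosen center
        farthest = torch.argmax(min_dist).item()  # next center: farthest from all chosen
    return centers


def radius_neighborhood(points, center, radius, max_points=32):
    """Indices of up to max_points points lying within `radius` of `center`."""
    d = ((points - center) ** 2).sum(dim=1).sqrt()
    idx = torch.nonzero(d < radius).squeeze(1)
    return idx[:max_points]
```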
Further, as a preferred embodiment, the inputting of the source point cloud feature and the target point cloud feature into the feature transformer for feature transformation to obtain the target point cloud predicted feature includes:
performing a mean-variance transformation on the source point cloud feature to obtain a transformed feature;
adding a feature offset to the transformed feature to obtain a summed feature;
and performing a weighted summation of the summed feature and the target point cloud feature to obtain the target point cloud predicted feature.
In the embodiment of the invention, the feature transformer infers the target point cloud predicted feature from the source point cloud feature and the target point cloud feature, for use in the subsequent reconstruction of the target point cloud. Referring to fig. 3, the feature transformer of the embodiment of the present invention takes as input the source point cloud feature S and the target point cloud feature T, and outputs the predicted feature P of the target point cloud. Its processing consists of a coarse transformation stage and a fine transformation stage. In the coarse transformation stage, a mean-variance transformation is applied to the source point cloud feature S, unifying the mean and variance of the source point cloud feature with the mean and variance of the target point cloud feature T, to obtain the transformed feature C. The mean-variance transformation equation is:

C = n2 * (S - m1) / n1 + m2

where m1 is the mean of the source point cloud feature, n1 is the variance of the source point cloud feature, m2 is the mean of the target point cloud feature, and n2 is the variance of the target point cloud feature.

In the fine transformation stage, a feature offset W is added to the transformed feature C to obtain the summed feature F, which is input into an attention mechanism; that is, a weighted summation of the summed feature F and the target point cloud feature T yields the target point cloud predicted feature P. The feature offset W has the same dimensionality as the summed feature F and is learned by the neural network; its value is continuously updated during training toward the optimal offset. The weighted summation equation is:

P = α*F + β*T, [α, β] ∈ [0, 1];

where α and β are learned by the neural network model, and their values are continuously adjusted according to the model loss.
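A minimal sketch of the coarse and fine transformation stages follows. The learnable offset W and the weights α and β match the description above, while the tensor shapes, the epsilon for numerical stability, and the initialization are assumptions; the [0, 1] constraint on α and β (e.g. via a sigmoid) is omitted for brevity:

```python
import torch
import torch.nn as nn


class FeatureTransformer(nn.Module):
    """Predicts the target point cloud feature P from S and T (cf. fig. 3)."""

    def __init__(self, num_points, feat_dim):
        super().__init__()
        # Learnable feature offset W, same dimensionality as the summed feature F.
        self.W = nn.Parameter(torch.zeros(num_points, feat_dim))
        # Learnable mixing weights for the weighted sum P = alpha*F + beta*T.
        self.alpha = nn.Parameter(torch.tensor(0.5))
        self.beta = nn.Parameter(torch.tensor(0.5))

    def forward(self, S, T):
        # Coarse stage: unify the mean/variance statistics of S with those of T.
        m1, n1 = S.mean(dim=0), S.var(dim=0) + 1e-6
        m2, n2 = T.mean(dim=0), T.var(dim=0) + 1e-6
        C = n2 * (S - m1) / n1 + m2
        # Fine stage: add the learned offset, then mix with the target feature.
        F = C + self.W
        return self.alpha * F + self.beta * T
```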
As a further preferred embodiment, the inputting of the target point cloud predicted feature into the auxiliary task decoder for feature compression to obtain the reconstructed point cloud includes:
wherein the auxiliary task decoder comprises four identical feature transfer layers and a fully connected layer;
reverse-mapping the target point cloud predicted feature through the feature transfer layers to obtain a low-dimensional feature;
and performing coordinate mapping on the low-dimensional feature through the fully connected layer to obtain the reconstructed point cloud.
In an embodiment of the invention, the purpose of the decoder is to compress the spatial dimensions of the feature map to obtain feature vectors. The embodiment of the invention can adopt the feature transfer (feature propagation) layer of the PointNet++ network; the decoder consists of four feature transfer layers and one fully connected layer (from a convolutional neural network). Each feature transfer layer reverse-maps the input high-dimensional point cloud features to lower-dimensional point cloud features, and finally the fully connected layer maps the low-dimensional point cloud vectors output by the fourth feature transfer layer to point-wise coordinates. One feature transfer step is sketched below.
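One feature transfer (reverse mapping) step and the final coordinate-mapping layer can be sketched as follows, using inverse-distance interpolation in the style of the feature propagation layer. The choice of k = 3 neighbors and all names are illustrative assumptions:

```python
import torch
import torch.nn as nn


def propagate_features(dense_xyz, sparse_xyz, sparse_feat, k=3):
    """Inverse-distance interpolation of features from a sparse point set back
    to a denser one (the reverse mapping of one feature transfer layer)."""
    d = torch.cdist(dense_xyz, sparse_xyz)                   # (N_dense, N_sparse)
    dist, idx = d.topk(k, dim=1, largest=False)              # k nearest sparse points
    w = 1.0 / (dist + 1e-8)
    w = w / w.sum(dim=1, keepdim=True)                       # normalized interpolation weights
    return (sparse_feat[idx] * w.unsqueeze(-1)).sum(dim=1)   # (N_dense, C)


class CoordinateHead(nn.Module):
    """Final fully connected layer mapping low-dimensional features to xyz."""

    def __init__(self, feat_dim):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 3)

    def forward(self, feat):
        return self.fc(feat)
```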
Further, as a preferred embodiment, the inputting of the point cloud sequence into the initialized point cloud sequence recognition model for training to obtain the trained point cloud sequence recognition model specifically includes:
inputting the point cloud sequence into the initialized point cloud sequence recognition model to obtain a point cloud sequence recognition result;
determining a training loss value according to the point cloud sequence recognition result and the point cloud sequence label;
and updating the parameters of the recognition task decoder according to the loss value to obtain the trained point cloud sequence recognition model.
In the embodiment of the invention, the encoder of the auxiliary task model is pre-trained by the self-supervised learning method to obtain the trained encoder. The trained encoder is then recombined with a recognition task decoder to obtain the point cloud sequence recognition model, where the recognition task decoder depends on the point cloud sequence recognition task to be handled; in the embodiment of the invention, this can include a three-dimensional scene flow estimation task and a human action recognition task. Referring to fig. 2, in the embodiment of the present invention, a point cloud sequence with a small amount of labeled data is input into the point cloud sequence recognition model, which comprises the encoder (reusing the encoder trained in the auxiliary task model) and a new decoder, to obtain a predicted scene flow. The training loss value, the minimum mean square error, is computed from the target scene flow in the point cloud sequence label and the predicted scene flow. Based on this loss value, the parameters of the model are updated by back-propagation, and after several rounds of iteration the trained point cloud sequence recognition model is obtained. A single fine-tuning step is sketched below.
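A single formal-training update for the scene flow variant can be sketched as follows. The names recognition_model and frame_pair, and the input format, are hypothetical; the loss matches the minimum mean square error described above:

```python
import torch
import torch.nn.functional as F


def finetune_step(recognition_model, frame_pair, target_flow, optimizer):
    """One supervised update on a labeled sample.

    frame_pair:  a two-frame point cloud input for scene flow estimation.
    target_flow: (N, 3) ground-truth per-point motion vectors (the label).
    """
    predicted_flow = recognition_model(frame_pair)
    loss = F.mse_loss(predicted_flow, target_flow)  # minimum mean square error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```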
It can be understood that the input of the three-dimensional scene flow estimation task in the embodiment of the present invention is a point cloud sequence containing two frames of point clouds, and the output is the three-dimensional motion vectors of corresponding points between the two frames. In this task, the recognition task decoder can adopt the last four layers of the FlowNet3D model, comprising three up-convolution layers (introduced by the FlowNet3D network) and one fully connected layer. The up-convolution layers transfer features from the point set of one layer to the point set of the next layer; the fully connected layer maps the features computed by the encoder in feature space back to the sample label space, thereby recognizing the class of each sample. The embodiment of the invention can also be applied to a human action recognition task, whose input is a point cloud sequence of multiple frames of point clouds and whose output is a label for the whole sequence representing the type of human action it contains. For this task, the embodiment of the invention can adopt a recognition task decoder comprising two fully connected layers (from a convolutional neural network), which map the features computed by the encoder in feature space back to the sample label space, thereby recognizing the class of the sample.
In another aspect, an embodiment of the present application further provides a point cloud sequence recognition method, including:
acquiring a point cloud sequence to be recognized;
and inputting the point cloud sequence to be recognized into the point cloud sequence recognition model obtained by the above training method to obtain a point cloud sequence recognition result.
In the embodiment of the application, after the model is trained, the point cloud sequence to be recognized is input into the point cloud sequence recognition model obtained by the above training method to obtain a point cloud sequence recognition result. The embodiment of the invention can train the point cloud sequence recognition model with only a small amount of manually labeled data: because the encoder is already trained, the model needs only a little labeled data and can be used for different point cloud sequence recognition tasks. Extensive experiments verify that, using only 25% of the original labeled data, the model can achieve the performance of a point cloud sequence recognition model trained on all of the original data.
It can be understood that the contents of the above embodiment of the training method for a point cloud sequence recognition model are all applicable to this embodiment of the point cloud sequence recognition method; the functions specifically implemented by this embodiment are the same as those of the above embodiment, and the beneficial effects achieved are also the same.
In another aspect, an embodiment of the present invention further provides a point cloud sequence recognition apparatus, including:
a first module, configured to acquire a point cloud sequence, and select any two non-adjacent frames of point cloud data from the point cloud sequence to obtain a source point cloud and a target point cloud;
a second module, configured to input the source point cloud and the target point cloud into an auxiliary task model for point cloud reconstruction to obtain a reconstructed point cloud, and pre-train an encoder of the auxiliary task model according to the feature similarity error between the reconstructed point cloud and the target point cloud to obtain a trained encoder;
a third module, configured to acquire a recognition task decoder, and combine the recognition task decoder with the trained encoder to obtain an initialized point cloud sequence recognition model;
and a fourth module, configured to input the point cloud sequence into the initialized point cloud sequence recognition model for training to obtain a trained point cloud sequence recognition model.
It can be understood that the contents of the above embodiment of the training method for a point cloud sequence recognition model are all applicable to this embodiment of the point cloud sequence recognition apparatus; the functions specifically implemented by this embodiment are the same as those of the above embodiment, and the beneficial effects achieved are also the same.
Corresponding to the method of fig. 1, an embodiment of the present invention further provides an electronic device, including a processor and a memory; the memory is used for storing programs; the processor executes the program to implement the method as described above.
Corresponding to the method of fig. 1, the embodiment of the present invention also provides a computer-readable storage medium, which stores a program, and the program is executed by a processor to implement the method as described above.
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.
In summary, the embodiments of the present invention have the following advantages: the model of the embodiment of the invention can be combined with new decoders for multiple point cloud sequence recognition tasks, and can therefore serve multiple recognition tasks, improving the flexibility and generalization capability of the point cloud sequence recognition model. The embodiment of the invention can also achieve the performance of a fully supervised model trained on all of the annotated data while using only 25% of the annotated data of the original point cloud sequence, greatly reducing the model's dependence on manually labeled data.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer given the nature, function, and interrelationships of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is to be determined from the appended claims along with their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for training a point cloud sequence recognition model, characterized by comprising the following steps:
acquiring a point cloud sequence, and selecting any two non-adjacent frames of point cloud data from the point cloud sequence to obtain a source point cloud and a target point cloud;
inputting the source point cloud and the target point cloud into an auxiliary task model for point cloud reconstruction to obtain a reconstructed point cloud, and pre-training an encoder of the auxiliary task model according to the feature similarity error between the reconstructed point cloud and the target point cloud to obtain a trained encoder;
acquiring a recognition task decoder, and combining the recognition task decoder with the trained encoder to obtain an initialized point cloud sequence recognition model;
and inputting the point cloud sequence into the initialized point cloud sequence recognition model for training to obtain a trained point cloud sequence recognition model.
2. The method of claim 1, wherein the inputting of the source point cloud and the target point cloud into an auxiliary task model for point cloud reconstruction to obtain a reconstructed point cloud comprises:
wherein the auxiliary task model comprises an encoder, a feature transformer, and an auxiliary task decoder;
inputting the source point cloud and the target point cloud into the encoder for high-dimensional feature representation mapping to obtain a source point cloud feature and a target point cloud feature;
inputting the source point cloud feature and the target point cloud feature into the feature transformer for feature transformation to obtain a target point cloud predicted feature;
and inputting the target point cloud predicted feature into the auxiliary task decoder for feature compression to obtain the reconstructed point cloud.
3. The method of claim 2, wherein the source point cloud and the target point cloud are input into the encoder for high-dimensional feature representation mapping to obtain the source point cloud feature and the target point cloud feature, wherein the encoder comprises four processing layers, and the processing steps of each processing layer comprise:
selecting from the input data according to a farthest point sampling method to obtain neighborhood center points;
constructing neighborhoods of a target radius around the neighborhood center points to obtain spatio-temporal neighborhoods;
performing local feature extraction on the data in the spatio-temporal neighborhoods, and splicing the extracted local features to obtain the output data;
wherein the target radius of each processing layer is different.
4. The method of claim 2, wherein the inputting of the source point cloud feature and the target point cloud feature into the feature transformer for feature transformation to obtain the target point cloud predicted feature comprises:
performing a mean-variance transformation on the source point cloud feature to obtain a transformed feature;
adding a feature offset to the transformed feature to obtain a summed feature;
and performing a weighted summation of the summed feature and the target point cloud feature to obtain the target point cloud predicted feature.
5. The method of claim 2, wherein the inputting of the target point cloud predicted feature into the auxiliary task decoder for feature compression to obtain the reconstructed point cloud comprises:
wherein the auxiliary task decoder comprises four identical feature transfer layers and a fully connected layer;
reverse-mapping the target point cloud predicted feature through the feature transfer layers to obtain a low-dimensional feature;
and performing coordinate mapping on the low-dimensional feature through the fully connected layer to obtain the reconstructed point cloud.
6. The method according to claim 1, wherein the inputting of the point cloud sequence into the initialized point cloud sequence recognition model for training to obtain the trained point cloud sequence recognition model specifically comprises:
inputting the point cloud sequence into the initialized point cloud sequence recognition model to obtain a point cloud sequence recognition result;
determining a training loss value according to the point cloud sequence recognition result and the point cloud sequence label;
and updating the parameters of the recognition task decoder according to the loss value to obtain the trained point cloud sequence recognition model.
7. A method for recognizing a point cloud sequence, the method comprising:
acquiring a point cloud sequence to be recognized;
and inputting the point cloud sequence to be recognized into the point cloud sequence recognition model obtained by the training method of a point cloud sequence recognition model according to any one of claims 1 to 6, to obtain a point cloud sequence recognition result.
8. An apparatus for recognizing a point cloud sequence, the apparatus comprising:
a first module, configured to acquire a point cloud sequence, and select any two non-adjacent frames of point cloud data from the point cloud sequence to obtain a source point cloud and a target point cloud;
a second module, configured to input the source point cloud and the target point cloud into an auxiliary task model for point cloud reconstruction to obtain a reconstructed point cloud, and pre-train an encoder of the auxiliary task model according to the feature similarity error between the reconstructed point cloud and the target point cloud to obtain a trained encoder;
a third module, configured to acquire a recognition task decoder, and combine the recognition task decoder with the trained encoder to obtain an initialized point cloud sequence recognition model;
and a fourth module, configured to input the point cloud sequence into the initialized point cloud sequence recognition model for training to obtain a trained point cloud sequence recognition model.
9. An electronic device, comprising a memory and a processor;
the memory being configured to store a program;
the processor executing the program to implement the method of any one of claims 1 to 7.
10. A computer-readable storage medium in which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.
CN202211674070.6A — Priority date: 2022-12-26 — Filing date: 2022-12-26 — Point cloud sequence recognition model training and recognition method, device, equipment and medium — Pending — CN115965833A (en)

Priority Applications (1)

Application Number: CN202211674070.6A — Priority Date: 2022-12-26 — Filing Date: 2022-12-26 — Title: Point cloud sequence recognition model training and recognition method, device, equipment and medium

Publications (1)

Publication Number: CN115965833A — Publication Date: 2023-04-14

Family ID: 87358201


Cited By (2)

* Cited by examiner, † Cited by third party

CN117725966A * — Priority date: 2024-02-18 — Publication date: 2024-03-19 — Assignee: 粤港澳大湾区数字经济研究院(福田) — Training method of sketch sequence reconstruction model, geometric model reconstruction method and equipment
CN117725966B * — Priority date: 2024-02-18 — Publication date: 2024-06-11 — Assignee: 粤港澳大湾区数字经济研究院(福田) — Training method of sketch sequence reconstruction model, geometric model reconstruction method and equipment


Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination
TA01 — Transfer of patent application right
    Effective date of registration: 2024-01-26
    Address after: 518107 Room 501, building 3, Herun Jiayuan, Huaxia Road, Guangming Street, Guangming New District, Shenzhen City, Guangdong Province
    Applicants after: Sun Yat-sen University Shenzhen Campus; Sun Yat-sen University; National University of Defense Technology
    Country or region after: China
    Address before: 518107 Room 501, building 3, Herun Jiayuan, Huaxia Road, Guangming Street, Guangming New District, Shenzhen City, Guangdong Province
    Applicants before: Sun Yat-sen University Shenzhen Campus; Sun Yat-sen University
    Country or region before: China