CN111784699B - Method and device for carrying out target segmentation on three-dimensional point cloud data and terminal equipment

Method and device for carrying out target segmentation on three-dimensional point cloud data and terminal equipment

Info

Publication number
CN111784699B
CN111784699B (application CN201910264860.9A)
Authority
CN
China
Prior art keywords
point cloud
dimensional point
layer
decoding
network
Prior art date
Legal status
Active
Application number
CN201910264860.9A
Other languages
Chinese (zh)
Other versions
CN111784699A (en)
Inventor
黄晓航
Current Assignee
TCL Technology Group Co Ltd
Original Assignee
TCL Technology Group Co Ltd
Priority date
Filing date
Publication date
Application filed by TCL Technology Group Co Ltd
Priority to CN201910264860.9A
Publication of CN111784699A
Application granted
Publication of CN111784699B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/11: Region-based segmentation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10028: Range image; Depth image; 3D point clouds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning


Abstract

The invention is applicable to the technical field of image processing, and provides a method, a device and a terminal device for performing target segmentation on three-dimensional point cloud data.

Description

Method and device for carrying out target segmentation on three-dimensional point cloud data and terminal equipment
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a method, a device and terminal equipment for performing target segmentation on three-dimensional point cloud data.
Background
Point cloud segmentation is the process of dividing a point cloud into a number of similar regions, where points in the same region share the same attribute, that is, the points in a region correspond to a specific target. At present, deep learning algorithms represented by convolutional neural networks offer a new approach to point cloud segmentation and achieve very good results when the target object is single and shows little pose variation. However, for three-dimensional point cloud data, these methods do not consider the smoothness and consistency among data points. Especially when target objects appear at multiple scales and in multiple poses, the contours of the segmented target objects are inaccurate and rough, the resulting point cloud image has low resolution, and the accuracy requirements for target segmentation of three-dimensional point cloud data cannot be met.
Disclosure of Invention
In view of the above, embodiments of the invention provide a method, a device and a terminal device for performing target segmentation on three-dimensional point cloud data, so as to solve the problem of low accuracy in existing target segmentation of three-dimensional point cloud data.
A first aspect of an embodiment of the present invention provides a method for performing object segmentation on three-dimensional point cloud data, including:
acquiring original three-dimensional point cloud data;
Inputting the original three-dimensional point cloud data into a coding network to obtain a coded first three-dimensional point cloud feature map, wherein the coding network comprises M coding layers, and M is a positive integer greater than 1;
inputting the first three-dimensional point cloud feature map to a decoding network, wherein the decoding network comprises M decoding layers, and simultaneously inputting the first three-dimensional point cloud feature map output by each coding layer in the coding network to the decoding layer with the same data size as the coding layer;
And inputting the second three-dimensional point cloud characteristic diagram output by the decoding network into a full convolution neural network and a full connection conditional random field layer to perform segmentation processing, so as to obtain classification information of the original three-dimensional point cloud data.
A second aspect of an embodiment of the present invention provides a device for performing object segmentation on three-dimensional point cloud data, including:
The data acquisition unit is used for acquiring original three-dimensional point cloud data;
The coding unit is used for inputting the original three-dimensional point cloud data into a coding network to obtain a coded first three-dimensional point cloud characteristic diagram, wherein the coding network comprises M coding layers, and M is a positive integer greater than 1;
the decoding unit is used for inputting the first three-dimensional point cloud feature map into a decoding network, the decoding network comprises M decoding layers, and meanwhile, the first three-dimensional point cloud feature map output by each encoding layer in the encoding network is input into the decoding layer with the same data size as the encoding layer;
And the target segmentation unit is used for inputting the second three-dimensional point cloud characteristic image output by the decoding network into a full convolution neural network and a full connection conditional random field layer to carry out segmentation processing, so as to obtain the classification information of the original three-dimensional point cloud data.
A third aspect of an embodiment of the present invention provides a terminal device, including:
A memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the above method for performing target segmentation on three-dimensional point cloud data.
Wherein the computer program comprises:
The data acquisition unit is used for acquiring original three-dimensional point cloud data;
The coding unit is used for inputting the original three-dimensional point cloud data into a coding network to obtain a coded first three-dimensional point cloud characteristic diagram, wherein the coding network comprises M coding layers, and M is a positive integer greater than 1;
the decoding unit is used for inputting the first three-dimensional point cloud feature map into a decoding network, the decoding network comprises M decoding layers, and meanwhile, the first three-dimensional point cloud feature map output by each encoding layer in the encoding network is input into the decoding layer with the same data size as the encoding layer;
And the target segmentation unit is used for inputting the second three-dimensional point cloud characteristic image output by the decoding network into a full convolution neural network and a full connection conditional random field layer to carry out segmentation processing, so as to obtain the classification information of the original three-dimensional point cloud data.
A fourth aspect of the embodiments of the present invention provides a computer readable storage medium storing a computer program, where the computer program when executed by a processor implements the steps of the method for object segmentation of three-dimensional point cloud data provided in the first aspect of the embodiments of the present invention.
Wherein the computer program comprises:
The data acquisition unit is used for acquiring original three-dimensional point cloud data;
The coding unit is used for inputting the original three-dimensional point cloud data into a coding network to obtain a coded first three-dimensional point cloud characteristic diagram, wherein the coding network comprises M coding layers, and M is a positive integer greater than 1;
the decoding unit is used for inputting the first three-dimensional point cloud feature map into a decoding network, the decoding network comprises M decoding layers, and meanwhile, the first three-dimensional point cloud feature map output by each encoding layer in the encoding network is input into the decoding layer with the same data size as the encoding layer;
And the target segmentation unit is used for inputting the second three-dimensional point cloud characteristic image output by the decoding network into a full convolution neural network and a full connection conditional random field layer to carry out segmentation processing, so as to obtain the classification information of the original three-dimensional point cloud data.
Compared with the prior art, the embodiment of the invention has the following beneficial effects: the acquired original three-dimensional point cloud data is input to a coding network to obtain a coded first three-dimensional point cloud feature map; the first three-dimensional point cloud feature map is input to a decoding network, with the first three-dimensional point cloud feature map output by each coding layer in the coding network input to the decoding layer having the same data size as that coding layer; finally, the second three-dimensional point cloud feature map output by the decoding network is input to a full convolutional neural network and a fully connected conditional random field layer for segmentation processing to obtain the classification information of the original three-dimensional point cloud data. As a result, the finally obtained classification information of the three-dimensional point cloud data not only retains the resolution of the original three-dimensional point cloud data, but also improves the accuracy of target segmentation and achieves a comparatively ideal segmentation effect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of an implementation of a method for performing object segmentation on three-dimensional point cloud data according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a neural network according to an embodiment of the present invention;
FIG. 3 is a flowchart of a specific implementation of a method for correcting a prediction classification score according to an embodiment of the present invention;
FIG. 4 is a flowchart of a specific implementation of a method for optimizing a fully connected conditional random field layer according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a device for performing object segmentation on three-dimensional point cloud data according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples. Referring to fig. 1, fig. 1 shows a flow chart of an implementation of a method for performing object segmentation on three-dimensional point cloud data according to an embodiment of the present invention, and details are described in connection with an architecture schematic diagram of a neural network shown in fig. 2 according to an embodiment of the present invention:
In step S101, original three-dimensional point cloud data is acquired.
In the embodiment of the invention, the number of data points in the acquired original three-dimensional point cloud data is N, and the number of feature dimensions, namely the number of channels C, is 6, the corresponding values being x, y, z, r, g and b, where the x, y and z values correspond to the three-dimensional space coordinates of the original three-dimensional point cloud data and the r, g and b values correspond to the color channel values.
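For illustration only, the input layout described above can be sketched as follows (a minimal example; the number of points and the array layout are assumptions, not code from the patent):

```python
import numpy as np

# A point cloud with N data points and C = 6 channels per point:
# columns 0-2 hold the spatial coordinates (x, y, z),
# columns 3-5 hold the color channel values (r, g, b).
N = 4096  # hypothetical number of points
point_cloud = np.random.rand(N, 6).astype(np.float32)

xyz = point_cloud[:, 0:3]  # three-dimensional space coordinate values
rgb = point_cloud[:, 3:6]  # color channel values
```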
In step S102, the original three-dimensional point cloud data is input to a coding network, so as to obtain a coded first three-dimensional point cloud feature map, where the coding network includes M coding layers, and M is a positive integer greater than 1.
In the embodiment of the invention, the coding network is mainly used for extracting features from the original three-dimensional point cloud data and downsampling the data, and comprises M coding layers, where M is preferably a positive integer greater than or equal to 4. It will be appreciated that the greater the number of coding layers, the more accurate the extracted features but the lower the efficiency, so accuracy and efficiency need to be balanced.
Here, the word "first" in "first three-dimensional point cloud feature map" is merely used to distinguish it from the second and third three-dimensional point cloud feature maps below, which denote different three-dimensional point cloud feature maps. Since the encoding network as a whole and each encoding layer within it both perform feature extraction on the original three-dimensional point cloud data, the encoding network finally outputs a first three-dimensional point cloud feature map; for convenience of description, the three-dimensional point cloud feature map output by each encoding layer is also called a first three-dimensional point cloud feature map. It should be understood, however, that the first three-dimensional point cloud feature maps output by different encoding layers are not necessarily the same; that is, the first three-dimensional point cloud feature map does not stay unchanged, and the maps output by different encoding layers have different characteristics, such as different data sizes. The same holds for the second and third three-dimensional point cloud feature maps: those obtained under different conditions are not identical and have different characteristics.
In step S103, the first three-dimensional point cloud feature map is input to a decoding network, where the decoding network includes M decoding layers, and the first three-dimensional point cloud feature map output by each encoding layer in the encoding network is input to a decoding layer having the same data size as the encoding layer.
In the embodiment of the invention, the decoding network is mainly used for carrying out feature propagation and up-sampling of data on the first three-dimensional point cloud feature map output by the encoding network so as to recover the original size, namely the resolution of the first three-dimensional point cloud feature map.
As can be seen from fig. 2, each coding layer and each decoding layer is composed of a point scale invariant feature transform (PointSIFT) model and an X convolution layer, the difference being that the configuration parameters of each coding layer and each decoding layer differ; the X convolution layer referred to herein is the X-Conv layer.
Wherein, the configuration parameters include:
The numbers of output channels C1 and C2, the number N of output data points, the feature dimension C of the output data points, the number K of neighboring points of each local root node, and the dilation rate D of the dilated (atrous) convolution.
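As a reading aid, the per-layer configuration could be collected in a structure such as the following sketch; the field names and all numeric values are hypothetical, since the patent only names the parameters:

```python
from dataclasses import dataclass

@dataclass
class LayerConfig:
    c1: int         # number of output channels C1 (PointSIFT stage)
    c2: int         # number of output channels C2 (X-Conv stage)
    n_points: int   # number N of output data points (down-/up-sampling target)
    feat_dim: int   # feature dimension C of the output data points
    k: int          # number K of neighboring points of each local root node
    dilation: int   # dilation rate D of the dilated convolution

# Hypothetical four-layer encoder: N shrinks layer by layer while C grows.
encoder_cfg = [
    LayerConfig(c1=64,  c2=128,  n_points=2048, feat_dim=128,  k=8,  dilation=1),
    LayerConfig(c1=128, c2=256,  n_points=768,  feat_dim=256,  k=12, dilation=2),
    LayerConfig(c1=256, c2=512,  n_points=384,  feat_dim=512,  k=16, dilation=2),
    LayerConfig(c1=512, c2=1024, n_points=128,  feat_dim=1024, k=16, dilation=4),
]
```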
In the coding network, the PointSIFT model is mainly used for performing feature description on the original three-dimensional point cloud data or on the first three-dimensional point cloud feature map output by the previous coding layer; the X-Conv layer is mainly used for extracting features from the first three-dimensional point cloud feature map output by the PointSIFT model in the same coding layer and then inputting them to the next coding layer, or inputting them to the decoding network, namely the decoding layer with the same data size as the coding layer to which the X-Conv layer belongs.
In the decoding network, the PointSIFT model is mainly used for performing feature description on the first three-dimensional point cloud feature map output by the encoding network, or on the third three-dimensional point cloud feature map obtained by splicing the second three-dimensional point cloud feature map output by the previous decoding layer with the first three-dimensional point cloud feature map output by the encoding layer having the same data size as the decoding layer; the X-Conv layer is mainly used for extracting features from the second or third three-dimensional point cloud feature map output by the PointSIFT model of the same decoding layer and inputting the extracted features to the next decoding layer or PointSIFT model.
Here, each PointSIFT model includes two PointSIFT layers, and the numbers of output channels of the two PointSIFT layers may be the same or different.
The PointSIFT model encodes or decodes, in different directions, the information of each data point in the original three-dimensional point cloud data and in the first, second or third three-dimensional point cloud feature maps, so that the data points have scale invariance and rotation invariance. This guarantees that the features extracted by the encoding network or propagated by the decoding network are scale- and rotation-invariant, improves the accuracy of target segmentation of the three-dimensional point cloud data, and meets the accuracy requirements for target segmentation of three-dimensional point cloud data. A simplified sketch of this orientation-encoding idea is given below.
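The sketch assumes one nearest neighbor has already been gathered from each of the 8 spatial octants around every point, and convolves the stacked 2x2x2 neighborhood along the x, y and z axes in turn. This module is an illustrative assumption, not the patent's (or the original PointSIFT paper's) exact implementation:

```python
import torch
import torch.nn as nn

class OrientationEncodingUnit(nn.Module):
    """Simplified PointSIFT-style orientation encoding."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # Three successive convolutions collapse the 2x2x2 octant cube
        # one axis at a time, encoding information from each direction.
        self.conv_x = nn.Conv3d(in_channels, out_channels, kernel_size=(2, 1, 1))
        self.conv_y = nn.Conv3d(out_channels, out_channels, kernel_size=(1, 2, 1))
        self.conv_z = nn.Conv3d(out_channels, out_channels, kernel_size=(1, 1, 2))
        self.act = nn.ReLU()

    def forward(self, octant_feats: torch.Tensor) -> torch.Tensor:
        # octant_feats: (B * N, in_channels, 2, 2, 2), the features of the
        # nearest neighbor found in each octant of every point.
        h = self.act(self.conv_x(octant_feats))  # -> (B*N, C_out, 1, 2, 2)
        h = self.act(self.conv_y(h))             # -> (B*N, C_out, 1, 1, 2)
        h = self.act(self.conv_z(h))             # -> (B*N, C_out, 1, 1, 1)
        return h.flatten(1)                      # -> (B*N, C_out)
```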
In the process of decoding, through the decoding network, the first three-dimensional point cloud feature map finally output by the coding network, the output of each coding layer (that is, its first three-dimensional point cloud feature map) is simultaneously spliced with the second three-dimensional point cloud feature map output by the decoding layer in the decoding network that has the same data size as that coding layer; this forms a third three-dimensional point cloud feature map, which is then input to the next decoding layer. Here, a decoding layer "with the same data size" as a coding layer specifically means that the coding layer and the decoding layer have the same feature dimension C, i.e., the values of the feature dimension C of the two layers are equal.
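The skip wiring between the two networks might be sketched as follows; the `enc_layers`/`dec_layers` modules stand in for the PointSIFT-plus-X-Conv layers and are assumptions, only the splice wiring is taken from the description above:

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, enc_layers: nn.ModuleList, dec_layers: nn.ModuleList):
        super().__init__()
        assert len(enc_layers) == len(dec_layers)  # M encoding, M decoding layers
        self.enc_layers = enc_layers
        self.dec_layers = dec_layers

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # Encoding: keep each layer's output for the skip connections.
        skips = []
        feat = points
        for enc in self.enc_layers:
            feat = enc(feat)  # a first 3D point cloud feature map
            skips.append(feat)

        # Decoding: the first decoding layer works on the encoder output
        # directly; each following layer consumes the previous decoder output
        # spliced (concatenated) with the encoder feature map of the same
        # data size, i.e. a third 3D point cloud feature map.
        feat = self.dec_layers[0](feat)
        for dec, skip in zip(self.dec_layers[1:], reversed(skips[:-1])):
            feat = dec(torch.cat([feat, skip], dim=-1))
        return feat  # the second 3D point cloud feature map
```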
In step S104, the second three-dimensional point cloud feature map output by the decoding network is input to a full convolutional neural network and a full connection conditional random field layer to perform segmentation processing, so as to obtain classification information of the original three-dimensional point cloud data.
In the embodiment of the invention, the second three-dimensional point cloud feature map finally output by the decoding network is input to a full convolutional neural network for preliminary segmentation processing, which yields a group of prediction classification scores for each data point in the second three-dimensional point cloud feature map. These scores are used as the initial prediction classification scores, and a corresponding prediction classification distribution map is obtained from them.
The obtained prediction classification distribution map is then spliced with the second three-dimensional point cloud feature map finally output by the decoding network and input to a fully connected conditional random field layer for optimization, so as to correct the prediction classification score of each data point; the classification information of the original three-dimensional point cloud data is finally obtained.
The classification information comprises a prediction classification score of each data point and a prediction classification distribution map obtained according to the prediction classification score, wherein the prediction classification distribution map is a distribution map formed by three-dimensional point cloud data after target segmentation is completed.
Specifically, referring to fig. 3, fig. 3 shows a specific implementation flow of a method for correcting a prediction classification score according to an embodiment of the present invention, which is described in detail below:
in step S301, after the second three-dimensional point cloud feature map output through the decoding network is input to the weight discarding dropout layer, a prediction classification score of each data point in the second three-dimensional point cloud feature map is calculated through a full convolutional neural network.
In the embodiment of the invention, the purpose of inputting the second three-dimensional point cloud feature map to the dropout layer is to prevent errors caused by the over-fitting of network training, so as to improve the accuracy of target segmentation.
The step of inputting the second three-dimensional point cloud feature map output by the decoding network to the weight discarding dropout layer in step S301 specifically includes:
and inputting the second three-dimensional point cloud characteristic diagram and the original three-dimensional point cloud data which are output through the decoding network into a PointSIFT model for characteristic description, and then inputting the second three-dimensional point cloud characteristic diagram and the original three-dimensional point cloud data into the dropout layer.
In step S302, according to the prediction classification score, performing object segmentation on the second three-dimensional point cloud feature map to obtain an initial prediction classification distribution map.
In step S303, the initial prediction classification distribution map is spliced with the second three-dimensional point cloud feature map output by the decoding network, and then the spliced initial prediction classification distribution map is input to a fully-connected conditional random field layer to correct the prediction classification score, so as to obtain a corrected prediction classification distribution map.
In the embodiment of the invention, the second three-dimensional point cloud feature map output by the decoding network is input to the dropout layer for random weight updating and then to the full convolutional neural network, which calculates the prediction classification score of each data point in the second three-dimensional point cloud feature map. Target segmentation is then performed on the second three-dimensional point cloud feature map according to these prediction classification scores. After the corresponding initial prediction classification distribution map is obtained, it is input to the fully connected conditional random field layer for optimization so as to correct the prediction classification score of each data point, thereby obtaining the corrected prediction classification distribution map.
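A sketch of this scoring head follows, assuming point features arranged as (batch, channels, points); the dropout rate, the hidden width and the use of kernel-size-1 convolutions as the per-point full convolutional classifier are assumptions:

```python
import torch
import torch.nn as nn

class SegmentationHead(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int, p_drop: float = 0.5):
        super().__init__()
        self.dropout = nn.Dropout(p_drop)  # guards against over-fitting
        # Point-wise (kernel size 1) convolutions act as the full
        # convolutional classifier over every data point.
        self.fcn = nn.Sequential(
            nn.Conv1d(feat_dim, feat_dim // 2, kernel_size=1),
            nn.ReLU(),
            nn.Conv1d(feat_dim // 2, num_classes, kernel_size=1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, feat_dim, N), the second 3D point cloud feature map.
        return self.fcn(self.dropout(feat))  # (B, num_classes, N) scores
```

Per-point arg-max over these scores would then yield the initial prediction classification distribution map that is spliced with the feature map and handed to the conditional random field layer.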
Specifically, to further improve the accuracy of target segmentation of the three-dimensional point cloud data, after the corrected prediction classification distribution map is obtained, the method further includes:
And carrying out normalization processing and loss processing on the corrected prediction classification score through a normalization exponential function and a loss function to obtain a final prediction classification distribution map.
In the embodiment of the invention, the normalized exponential function is specifically the softmax function, and the loss function is defined as the KL divergence between the class distribution of the prediction classification and the true class distribution:

$$\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N} D_{\mathrm{KL}}\left(P_i \parallel Q_i\right)$$

where $Q_i$ is the class distribution of the prediction classification for point $i$, $P_i$ is the true class distribution of point $i$, $D_{\mathrm{KL}}(P_i \parallel Q_i)$ is the KL divergence between distribution $P_i$ and distribution $Q_i$, and $N$ is the number of points of the three-dimensional point cloud data.
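Assuming raw per-point scores and per-point ground-truth class distributions, the normalization and loss could be computed as in this sketch:

```python
import torch
import torch.nn.functional as F

def segmentation_loss(scores: torch.Tensor, target_dist: torch.Tensor) -> torch.Tensor:
    """KL-divergence loss averaged over the N points.

    scores:      (N, num_classes) raw per-point classification scores.
    target_dist: (N, num_classes) true class distribution P_i of each point.
    """
    log_q = F.log_softmax(scores, dim=-1)  # log of predicted distribution Q_i
    # F.kl_div expects log-probabilities as input and probabilities as target;
    # reduction='batchmean' divides the summed divergence by N.
    return F.kl_div(log_q, target_dist, reduction="batchmean")
```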
Specifically, in step S303, further including a step of optimizing the fully connected conditional random field layer, please refer to fig. 4, fig. 4 shows a specific implementation flow of a method for optimizing the fully connected conditional random field layer provided in the embodiment of the present invention, which is described in detail below:
In step S401, the energy function of the fully connected conditional random field layer is configured.
In the embodiment of the present invention, the configured energy function $E$ is:

$$E(\mathbf{x}) = \sum_{i} \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)$$

where $x_i$ is a random variable defined at point $i$, $\psi_u$ is the unary potential function, and $\psi_p$ is the binary potential function. $\psi_u$ and $\psi_p$ are defined as follows:

$$\psi_u(x_i) = -\log P(x_i)$$

where $P(x_i)$ is the probability that $x_i$ takes on the value of its category, and

$$\psi_p(x_i, x_j) = \mu(x_i, x_j)\left[\omega_s \exp\left(-\frac{\lVert P_i - P_j\rVert^2}{2\theta_\gamma^2}\right) + \omega_b \exp\left(-\frac{\lVert P_i - P_j\rVert^2}{2\theta_\alpha^2} - \frac{\lVert f_i - f_j\rVert^2}{2\theta_\beta^2}\right)\right]$$

where $\mu(x_i, x_j)$ is a class compatibility function; $\omega_s$ is the spatial filter coefficient, $\omega_b$ is the bilateral filter coefficient, and $\theta_\alpha$, $\theta_\beta$, $\theta_\gamma$ are the bandwidth coefficients of the filter kernels; $P_i$, $P_j$ are the three-dimensional coordinate vectors of points $i$ and $j$, and $f_i$, $f_j$ are the feature vectors of points $i$ and $j$.
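For small point clouds the two Gaussian kernels of the binary potential can be evaluated densely, as in the following O(N^2) sketch (all coefficients are free parameters; the class compatibility mu is applied separately during inference):

```python
import numpy as np

def pairwise_kernels(P, f, theta_alpha, theta_beta, theta_gamma, w_s, w_b):
    """Dense evaluation of the spatial and bilateral kernels.

    P: (N, 3) three-dimensional coordinate vectors P_i.
    f: (N, C) feature vectors f_i.
    Returns an (N, N) kernel matrix k(i, j).
    """
    d_pos = np.sum((P[:, None, :] - P[None, :, :]) ** 2, axis=-1)   # |P_i - P_j|^2
    d_feat = np.sum((f[:, None, :] - f[None, :, :]) ** 2, axis=-1)  # |f_i - f_j|^2

    k_spatial = w_s * np.exp(-d_pos / (2.0 * theta_gamma ** 2))
    k_bilateral = w_b * np.exp(-d_pos / (2.0 * theta_alpha ** 2)
                               - d_feat / (2.0 * theta_beta ** 2))
    return k_spatial + k_bilateral
```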
In step S402, the fully connected conditional random field layer with the configured energy function is simplified and approximated using mean-field variational inference.
In step S403, the simplified and approximated fully connected conditional random field layer is computed efficiently using the permutohedral lattice algorithm.
In step S404, the computed fully connected conditional random field layer is jointly trained and optimized based on the CRF-as-RNN structure.
In the embodiment of the invention, using the optimized fully connected conditional random field layer allows the layer to make better use of the smoothness and consistency information among data points, so that a more accurate segmentation result can be achieved; that is, the accuracy of target segmentation of the three-dimensional point cloud data is improved. A naive sketch of the underlying mean-field updates is given below.
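This sketch omits the permutohedral-lattice acceleration and the learned CRF-as-RNN parameters; the initialization and iteration count are assumptions:

```python
import numpy as np

def mean_field_inference(unary, kernel, mu, num_iters=5):
    """Naive mean-field updates for the fully connected CRF.

    unary:  (N, L) unary potentials psi_u, e.g. -log of the FCN probabilities.
    kernel: (N, N) pairwise kernel k(i, j), e.g. from pairwise_kernels().
    mu:     (L, L) class compatibility matrix.
    Returns Q: (N, L) marginal class distribution of each point.
    """
    # Initialize Q with the softmax of the negative unary potentials.
    q = np.exp(-unary)
    q /= q.sum(axis=1, keepdims=True)

    for _ in range(num_iters):
        msg = kernel @ q               # message passing (self-messages not excluded here)
        pairwise = msg @ mu            # compatibility transform
        q = np.exp(-unary - pairwise)  # local update
        q /= q.sum(axis=1, keepdims=True)  # renormalize per point
    return q
```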
In the embodiment of the invention, the acquired original three-dimensional point cloud data is input to the coding network to obtain the coded first three-dimensional point cloud feature map; the first three-dimensional point cloud feature map is input to the decoding network, with the first three-dimensional point cloud feature map output by each coding layer in the coding network input to the decoding layer having the same data size as that coding layer; finally, the second three-dimensional point cloud feature map output by the decoding network is input to the full convolutional neural network and the fully connected conditional random field layer for segmentation processing to obtain the classification information of the original three-dimensional point cloud data. The finally obtained classification information of the three-dimensional point cloud data therefore not only retains the resolution of the original three-dimensional point cloud data, but also improves the accuracy of target segmentation and achieves a comparatively ideal segmentation effect.
It should be understood that the sequence number of each step in the above embodiment does not mean the execution sequence, and the execution sequence of each process should be controlled by its function and internal logic, and should not limit the implementation process of the embodiment of the present invention in any way.
Corresponding to the method for performing object segmentation on three-dimensional point cloud data in the foregoing embodiments, fig. 5 shows a schematic diagram of an apparatus for performing object segmentation on three-dimensional point cloud data according to an embodiment of the present invention, and for convenience of explanation, only the portion relevant to the embodiment of the present invention is shown.
Referring to fig. 5, the apparatus includes:
A data acquisition unit 51 for acquiring original three-dimensional point cloud data;
The encoding unit 52 is configured to input the original three-dimensional point cloud data to an encoding network, so as to obtain an encoded first three-dimensional point cloud feature map, where the encoding network includes M encoding layers, and M is a positive integer greater than 1;
A decoding unit 53, configured to input the first three-dimensional point cloud feature map to a decoding network, where the decoding network includes M decoding layers, and simultaneously input the first three-dimensional point cloud feature map output by each encoding layer in the encoding network to a decoding layer having the same data size as the encoding layer;
and the target segmentation unit 54 is configured to input the second three-dimensional point cloud feature map output through the decoding network to a full convolutional neural network and a full connection conditional random field layer for segmentation processing, so as to obtain classification information of the original three-dimensional point cloud data.
Specifically, each coding layer and each decoding layer comprises a point scale invariant feature transform (PointSIFT) model and an X convolution layer, the difference being that the configuration parameters of each coding layer and each decoding layer differ;
wherein the configuration parameters include:
The number of output channels, the number N of output data points, the feature dimension C of the output data points, the number K of neighboring points of each local root node, and the dilation rate D of the dilated (atrous) convolution.
Specifically, in the coding network, the PointSIFT model is used for performing feature description on the original three-dimensional point cloud data or on the first three-dimensional point cloud feature map output by the previous coding layer; the X convolution layer is used for extracting features from the first three-dimensional point cloud feature map output by the PointSIFT model of the same coding layer and then inputting them to the next coding layer, or inputting them to the decoding network, namely the decoding layer with the same data size as the coding layer to which the X convolution layer belongs;
In the decoding network, the PointSIFT model is used for carrying out feature description on a first three-dimensional point cloud feature map output by the encoding network, or is used for carrying out feature description on a second three-dimensional point cloud feature map output by a last decoding layer and a third three-dimensional point cloud feature map obtained after the first three-dimensional point cloud feature map output by the encoding layer with the same data size as the decoding layer is spliced; the X convolution layer is configured to perform feature extraction on a second three-dimensional point cloud feature map output by the PointSIFT model of the same decoding layer, or the third three-dimensional point cloud feature map, and then input the feature extraction result to a next decoding layer or other PointSIFT models.
Specifically, the target segmentation unit 54 includes:
the prediction classification score calculation subunit is used for calculating the prediction classification score of each data point in the second three-dimensional point cloud feature map through the full convolution neural network after the second three-dimensional point cloud feature map output through the decoding network is input to the weight discarding dropout layer;
The target segmentation subunit is used for carrying out target segmentation on the second three-dimensional point cloud characteristic map according to the prediction classification score to obtain an initial prediction classification distribution map;
And the prediction classification score correction subunit is used for splicing the initial prediction classification distribution map with the second three-dimensional point cloud feature map output by the decoding network, and inputting the spliced initial prediction classification distribution map to the fully-connected conditional random field layer to correct the prediction classification score so as to obtain a corrected prediction classification distribution map.
Specifically, the prediction classification score calculating subunit is specifically configured to:
and inputting the second three-dimensional point cloud characteristic diagram and the original three-dimensional point cloud data which are output through the decoding network into a PointSIFT model for characteristic description, and then inputting the second three-dimensional point cloud characteristic diagram and the original three-dimensional point cloud data into the dropout layer.
Specifically, the target segmentation unit 54 further includes:
and the normalization processing subunit is used for carrying out normalization processing and loss processing on the corrected prediction classification score through a normalization exponential function and a loss function to obtain a final prediction classification distribution diagram.
In the embodiment of the invention, the acquired original three-dimensional point cloud data is input to the coding network to obtain the coded first three-dimensional point cloud feature map; the first three-dimensional point cloud feature map is input to the decoding network, with the first three-dimensional point cloud feature map output by each coding layer in the coding network input to the decoding layer having the same data size as that coding layer; finally, the second three-dimensional point cloud feature map output by the decoding network is input to the full convolutional neural network and the fully connected conditional random field layer for segmentation processing to obtain the classification information of the original three-dimensional point cloud data. The finally obtained classification information of the three-dimensional point cloud data therefore not only retains the resolution of the original three-dimensional point cloud data, but also improves the accuracy of target segmentation and achieves a comparatively ideal segmentation effect.
Fig. 6 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 6, the terminal device 6 of this embodiment includes: a processor 60, a memory 61 and a computer program 62 stored in the memory 61 and executable on the processor 60. The processor 60, when executing the computer program 62, implements the steps in the above embodiments of the method for target segmentation of three-dimensional point cloud data, such as steps S101 to S104 shown in fig. 1. Alternatively, the processor 60, when executing the computer program 62, performs the functions of the units in the system embodiments described above, such as the functions of the modules 51 to 54 shown in fig. 5.
Illustratively, the computer program 62 may be partitioned into one or more units that are stored in the memory 61 and executed by the processor 60 to complete the present invention. The one or more units may be a series of computer program instruction segments capable of performing a specific function for describing the execution of the computer program 62 in the terminal device 6. For example, the computer program 62 may be divided into a data acquisition unit 51, an encoding unit 52, a decoding unit 53, a target division unit 54, each unit functioning specifically as follows:
A data acquisition unit 51 for acquiring original three-dimensional point cloud data;
The encoding unit 52 is configured to input the original three-dimensional point cloud data to an encoding network, so as to obtain an encoded first three-dimensional point cloud feature map, where the encoding network includes M encoding layers, and M is a positive integer greater than 1;
A decoding unit 53, configured to input the first three-dimensional point cloud feature map to a decoding network, where the decoding network includes M decoding layers, and simultaneously input the first three-dimensional point cloud feature map output by each encoding layer in the encoding network to a decoding layer having the same data size as the encoding layer;
and the target segmentation unit 54 is configured to input the second three-dimensional point cloud feature map output through the decoding network to a full convolutional neural network and a full connection conditional random field layer for segmentation processing, so as to obtain classification information of the original three-dimensional point cloud data.
Specifically, each coding layer and each decoding layer comprises a point scale invariant feature transform (PointSIFT) model and an X convolution layer, the difference being that the configuration parameters of each coding layer and each decoding layer differ;
wherein the configuration parameters include:
The number of output channels, the number N of output data points, the feature dimension C of the output data points, the number K of neighboring points of each local root node, and the dilation rate D of the dilated (atrous) convolution.
Specifically, in the coding network, the PointSIFT model is used for performing feature description on the original three-dimensional point cloud data or the first three-dimensional point cloud feature map output by the last coding layer; the X convolution layer is used for extracting the characteristics of a first three-dimensional point cloud characteristic image output by a PointSIFT model of the same coding layer and then inputting the characteristic image to the next coding layer, or inputting the characteristic image to the decoding network and a decoding layer with the same data size as the coding layer to which the X convolution layer belongs;
In the decoding network, the PointSIFT model is used for carrying out feature description on a first three-dimensional point cloud feature map output by the encoding network, or is used for carrying out feature description on a second three-dimensional point cloud feature map output by a last decoding layer and a third three-dimensional point cloud feature map obtained after the first three-dimensional point cloud feature map output by the encoding layer with the same data size as the decoding layer is spliced; the X convolution layer is configured to perform feature extraction on a second three-dimensional point cloud feature map output by the PointSIFT model of the same decoding layer, or the third three-dimensional point cloud feature map, and then input the feature extraction result to a next decoding layer or other PointSIFT models.
Specifically, the target segmentation unit 54 includes:
the prediction classification score calculation subunit is used for calculating the prediction classification score of each data point in the second three-dimensional point cloud feature map through the full convolution neural network after the second three-dimensional point cloud feature map output through the decoding network is input to the weight discarding dropout layer;
The target segmentation subunit is used for carrying out target segmentation on the second three-dimensional point cloud characteristic map according to the prediction classification score to obtain an initial prediction classification distribution map;
And the prediction classification score correction subunit is used for splicing the initial prediction classification distribution map with the second three-dimensional point cloud feature map output by the decoding network, and inputting the spliced initial prediction classification distribution map to the fully-connected conditional random field layer to correct the prediction classification score so as to obtain a corrected prediction classification distribution map.
Specifically, the prediction classification score calculating subunit is specifically configured to:
and inputting the second three-dimensional point cloud characteristic diagram and the original three-dimensional point cloud data which are output through the decoding network into a PointSIFT model for characteristic description, and then inputting the second three-dimensional point cloud characteristic diagram and the original three-dimensional point cloud data into the dropout layer.
Specifically, the target segmentation unit 54 further includes:
and the normalization processing subunit is used for carrying out normalization processing and loss processing on the corrected prediction classification score through a normalization exponential function and a loss function to obtain a final prediction classification distribution diagram.
The terminal device 6 may include, but is not limited to, a processor 60, a memory 61. It will be appreciated by those skilled in the art that fig. 6 is merely an example of the terminal device 6 and does not constitute a limitation of the terminal device 6, and may include more or less components than illustrated, or may combine certain components, or different components, e.g., the terminal may further include an input-output device, a network access device, a bus, etc.
The Processor 60 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 61 may be an internal storage unit of the terminal device 6, such as a hard disk or a memory of the terminal device 6. The memory 61 may also be an external storage device of the terminal device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the terminal device 6. Further, the memory 61 may also include both an internal storage unit and an external storage device of the terminal device 6. The memory 61 is used for storing the computer program and other programs and data required by the terminal device. The memory 61 may also be used for temporarily storing data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the system is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed system/terminal device and method may be implemented in other manners. For example, the system/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, systems or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, which may be stored in a computer readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on. The computer readable medium may include: any entity or system capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium can be appropriately adjusted according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (10)

1. A method for object segmentation of three-dimensional point cloud data, the method comprising:
acquiring original three-dimensional point cloud data;
Inputting the original three-dimensional point cloud data into a coding network to obtain a coded first three-dimensional point cloud feature map, wherein the coding network comprises M coding layers, and M is a positive integer greater than 1;
inputting the first three-dimensional point cloud feature map to a decoding network, wherein the decoding network comprises M decoding layers, and simultaneously inputting the first three-dimensional point cloud feature map output by each coding layer in the coding network to the decoding layer with the same data size as the coding layer;
inputting a second three-dimensional point cloud characteristic diagram output by the decoding network into a full convolution neural network and a full connection conditional random field layer to perform segmentation processing, so as to obtain classification information of the original three-dimensional point cloud data; the first three-dimensional point cloud feature maps output by different encoding layers have different characteristics, and the second three-dimensional point cloud feature maps output by different decoding layers have different characteristics;
The coding layer and the decoding layer both comprise a point scale invariant feature transform PointSIFT model and an X convolution layer, except that the configuration parameters of each coding layer and each decoding layer are different; the PointSIFT model includes two PointSIFT layers;
in the coding network, the X convolution layer is used for extracting features of a first three-dimensional point cloud feature map output by a PointSIFT model of the same coding layer;
In the decoding network, the PointSIFT model with the largest number of output channels in a decoding layer is used for performing feature description on the first three-dimensional point cloud feature map output by an X convolution layer in the encoding network, and the remaining PointSIFT models are used for performing feature description on a third three-dimensional point cloud feature map, the third three-dimensional point cloud feature map being obtained by splicing only the second three-dimensional point cloud feature map output by the previous decoding layer with the first three-dimensional point cloud feature map output by the encoding layer having the same data size as the decoding layer.
2. The method of claim 1, wherein the configuration parameters comprise:
The number of output channels, the number N of output data points, the feature dimension C of the output data points, the number K of neighboring points of each local root node and the dilation rate D of the dilated convolution.
3. The method of claim 2, wherein in the encoding network, the PointSIFT model is configured to characterize the original three-dimensional point cloud data or a first three-dimensional point cloud feature map output by a previous encoding layer; the X convolution layer is used for extracting the characteristics of a first three-dimensional point cloud characteristic image output by a PointSIFT model of the same coding layer and then inputting the characteristic image to the next coding layer, or inputting the characteristic image to the decoding network and a decoding layer with the same data size as the coding layer to which the X convolution layer belongs.
4. The method of any one of claims 1 to 3, wherein the step of inputting the second three-dimensional point cloud feature map output by the decoding network to a full convolutional neural network and a full connection conditional random field layer to perform segmentation processing, to obtain classification information of the original three-dimensional point cloud data includes:
After a second three-dimensional point cloud characteristic diagram output by the decoding network is input to a weight discarding dropout layer, calculating the prediction classification score of each data point in the second three-dimensional point cloud characteristic diagram through a full convolution neural network;
performing target segmentation on the second three-dimensional point cloud feature map according to the prediction classification score to obtain an initial prediction classification distribution map;
and splicing the initial prediction classification distribution map with a second three-dimensional point cloud characteristic map output by the decoding network, and inputting the spliced initial prediction classification distribution map and the second three-dimensional point cloud characteristic map into a fully-connected conditional random field layer to correct the prediction classification score so as to obtain a corrected prediction classification distribution map.
5. The method of claim 4, wherein the step of inputting the second three-dimensional point cloud feature map output via the decoding network to a weight discard dropout layer comprises:
and inputting the second three-dimensional point cloud characteristic diagram and the original three-dimensional point cloud data which are output through the decoding network into a PointSIFT model for characteristic description, and then inputting the second three-dimensional point cloud characteristic diagram and the original three-dimensional point cloud data into the dropout layer.
6. The method of claim 4, wherein, after the step of splicing the initial prediction classification distribution map with the second three-dimensional point cloud feature map output by the decoding network and inputting the spliced result to the fully connected conditional random field layer to correct the prediction classification scores and obtain the corrected prediction classification distribution map, the method further comprises:
performing normalization processing and loss processing on the corrected prediction classification scores through a softmax (normalized exponential) function and a loss function, to obtain a final prediction classification distribution map.
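The softmax-plus-loss post-processing of claim 6 might look like the following sketch; the claim does not name a specific loss function, so per-point cross-entropy is assumed here.

```python
# Sketch of claim 6's post-processing: a softmax (normalized exponential
# function) over the corrected scores plus a training loss. Cross-entropy
# is an assumption; the patent does not specify the loss.
import torch.nn.functional as F

def finalize(corrected_scores, labels=None):
    # Final prediction classification distribution map, (B, N, num_classes).
    probs = F.softmax(corrected_scores, dim=-1)
    loss = None
    if labels is not None:
        # cross_entropy expects class-first logits: (B, num_classes, N).
        loss = F.cross_entropy(corrected_scores.transpose(1, 2), labels)
    return probs, loss
```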
7. An apparatus for performing target segmentation on three-dimensional point cloud data, the apparatus comprising:
a data acquisition unit, configured to acquire original three-dimensional point cloud data;
an encoding unit, configured to input the original three-dimensional point cloud data into an encoding network to obtain an encoded first three-dimensional point cloud feature map, wherein the encoding network comprises M encoding layers, M being a positive integer greater than 1;
a decoding unit, configured to input the first three-dimensional point cloud feature map into a decoding network comprising M decoding layers, wherein the first three-dimensional point cloud feature map output by each encoding layer in the encoding network is also input to the decoding layer with the same data size as that encoding layer; and
a target segmentation unit, configured to input the second three-dimensional point cloud feature map output by the decoding network to a fully convolutional neural network and a fully connected conditional random field layer for segmentation processing, to obtain classification information of the original three-dimensional point cloud data, wherein the first three-dimensional point cloud feature maps output by different encoding layers have different characteristics, and the second three-dimensional point cloud feature maps output by different decoding layers have different characteristics;
wherein each encoding layer and each decoding layer comprises a point scale-invariant feature transform (PointSIFT) model and an X convolution layer, although the configuration parameters of the encoding layers and decoding layers differ from one another; and the PointSIFT model comprises two PointSIFT layers;
in the encoding network, the X convolution layer is used to extract features from the first three-dimensional point cloud feature map output by the PointSIFT model of the same encoding layer; and
in the decoding network, the PointSIFT model with the largest number of output channels in a decoding layer is used to perform feature description on the first three-dimensional point cloud feature map output by the X convolution layer in the encoding network, while the remaining PointSIFT models perform feature description on a third three-dimensional point cloud feature map, the third three-dimensional point cloud feature map being obtained by splicing the second three-dimensional point cloud feature map output by the previous decoding layer with the first three-dimensional point cloud feature map output by the encoding layer whose data size matches that decoding layer.
8. The apparatus of claim 7, wherein the configuration parameters comprise:
the number of output channels, the number N of output data points, the feature dimension C of the output data points, the number K of neighboring points of each local root node, and the dilation rate D of the dilated convolution.
9. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method for performing target segmentation on three-dimensional point cloud data according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method for performing target segmentation on three-dimensional point cloud data according to any one of claims 1 to 6.
CN201910264860.9A 2019-04-03 2019-04-03 Method and device for carrying out target segmentation on three-dimensional point cloud data and terminal equipment Active CN111784699B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910264860.9A CN111784699B (en) 2019-04-03 2019-04-03 Method and device for carrying out target segmentation on three-dimensional point cloud data and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910264860.9A CN111784699B (en) 2019-04-03 2019-04-03 Method and device for carrying out target segmentation on three-dimensional point cloud data and terminal equipment

Publications (2)

Publication Number Publication Date
CN111784699A CN111784699A (en) 2020-10-16
CN111784699B (en) 2024-06-18

Family

ID=72754719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910264860.9A Active CN111784699B (en) 2019-04-03 2019-04-03 Method and device for carrying out target segmentation on three-dimensional point cloud data and terminal equipment

Country Status (1)

Country Link
CN (1) CN111784699B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287939B (en) * 2020-10-29 2024-05-31 平安科技(深圳)有限公司 Three-dimensional point cloud semantic segmentation method, device, equipment and medium
CN112613378B (en) * 2020-12-17 2023-03-28 上海交通大学 3D target detection method, system, medium and terminal
CN114926636A (en) * 2022-05-12 2022-08-19 合众新能源汽车有限公司 Point cloud semantic segmentation method, device, equipment and storage medium
CN114821074B (en) * 2022-07-01 2022-10-25 湖南盛鼎科技发展有限责任公司 Airborne LiDAR point cloud semantic segmentation method, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298971B (en) * 2014-09-28 2017-09-19 北京理工大学 A kind of target identification method in 3D point cloud data
CN109410238B (en) * 2018-09-20 2021-10-26 中国科学院合肥物质科学研究院 Wolfberry identification and counting method based on PointNet + + network
CN109410307B (en) * 2018-10-16 2022-09-20 大连理工大学 Scene point cloud semantic segmentation method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Conditional Random Fields as Recurrent Neural Networks; Shuai Zheng et al.; arXiv:1502.03240v3; pp. 1-17 *
PointCNN: Convolution On X-Transformed Points; Yangyan Li et al.; arXiv:1801.07791v5; pp. 1-14 *
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space; Charles R. Qi et al.; arXiv:1706.02413v1; pp. 1-14 *
PointSIFT: A SIFT-like Network Module for 3D Point Cloud Semantic Segmentation; Mingyang Jiang et al.; arXiv:1807.00652v2; pp. 1-10 *

Also Published As

Publication number Publication date
CN111784699A (en) 2020-10-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 516006 TCL science and technology building, No. 17, Huifeng Third Road, Zhongkai high tech Zone, Huizhou City, Guangdong Province
Applicant after: TCL Technology Group Co.,Ltd.
Country or region after: China
Address before: 516006 Guangdong province Huizhou Zhongkai hi tech Development Zone No. nineteen District
Applicant before: TCL Corp.
Country or region before: China
GR01 Patent grant