WO2024040954A1 - Point cloud semantic segmentation network training method, and point cloud semantic segmentation method and apparatus - Google Patents


Info

Publication number
WO2024040954A1
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
sub
projection image
surround
segmentation result
Prior art date
Application number
PCT/CN2023/082749
Other languages
French (fr)
Chinese (zh)
Inventor
温欣
Original Assignee
北京京东乾石科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东乾石科技有限公司
Publication of WO2024040954A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/06Topological mapping of higher dimensional structures onto lower dimensional surfaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details

Definitions

  • the present disclosure relates to the field of artificial intelligence technology, and more specifically, to a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, a device, an electronic device and a storage medium.
  • point cloud data is widely used in many fields such as autonomous driving and robot grasping.
  • deep learning technology has shown good performance in point cloud data processing. Since point cloud data collected through various sensors are usually unlabeled data, and the cost of manually labeling data is high, semi-supervised training is usually used to build deep neural networks in related technologies.
  • the present disclosure provides a point cloud semantic segmentation network training method, point cloud semantic segmentation method, device, electronic device, readable storage medium and computer program product.
  • One aspect of the present disclosure provides a point cloud semantic segmentation network training method, including: mapping multiple sets of point cloud data to initial views respectively to obtain multiple surround projection images; partitioning, based on a preset size, a first surround projection image and a second surround projection image to obtain multiple first partition maps and multiple second partition maps, where the first surround projection image and the second surround projection image belong to the multiple surround projection images; determining multiple first target partition maps from the multiple first partition maps; replacing, with each of the first target partition maps, the second target partition map at the same position in the second surround projection image to obtain a hybrid projection image, where each second target partition map belongs to the multiple second partition maps and has the same position as the corresponding first target partition map; and training an initial network using the first surround projection image and the hybrid projection image as training samples to obtain a point cloud semantic segmentation network.
  • training the initial network using the first surround projection image and the hybrid projection image as training samples to obtain the point cloud semantic segmentation network includes: inputting the first surround projection image and the hybrid projection image into the initial network separately to obtain a first feature map and a first segmentation result corresponding to the first surround projection image, and a second feature map and a second segmentation result corresponding to the hybrid projection image; calculating an information entropy loss between the first feature map and the second feature map to obtain a first loss value; calculating a cross-entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value; and adjusting the model parameters of the initial network using the first loss value and the second loss value to finally obtain the point cloud semantic segmentation network.
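The patent only states that both loss values are used to adjust the model parameters; it does not give a combination rule. A minimal sketch, assuming a simple weighted sum (the weights `w1` and `w2` are illustrative, not from the source):

```python
def total_loss(first_loss, second_loss, w1=1.0, w2=1.0):
    """Combine the information-entropy loss and the cross-entropy loss into one
    scalar used for the parameter update. The weighting is an assumption."""
    return w1 * first_loss + w2 * second_loss

combined = total_loss(0.5, 0.25)
```

In practice the combined scalar would be backpropagated through the encoder and decoder of the initial network by whatever optimizer is chosen (the text mentions gradient descent as one option).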
  • calculating the information entropy loss between the first feature map and the second feature map to obtain the first loss value includes: determining, from the first feature map, a first sub-feature map related to the multiple first target partition maps; splitting the second feature map into a second sub-feature map related to the multiple first target partition maps and a third sub-feature map unrelated to them; and, when the confidence probability of the first sub-feature map is greater than a preset threshold, taking the first sub-feature map and the second sub-feature map as a positive sample pair and the first sub-feature map and the third sub-feature map as a negative sample pair, and calculating the information entropy loss between the positive sample pair and the negative sample pair to obtain the first loss value.
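The patent does not give an explicit formula for the "information entropy loss" over positive and negative sample pairs. A sketch of one common contrastive formulation (InfoNCE-style) that fits this positive/negative pairing, with the temperature value and all names as assumptions:

```python
import numpy as np

def pair_loss(anchor, positive, negative, temperature=0.1):
    """Contrastive loss over one positive pair (anchor, positive) and one
    negative pair (anchor, negative) of flattened sub-feature maps.
    Illustrative only; the patent's exact loss is unspecified."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    pos = np.exp(cos(anchor, positive) / temperature)
    neg = np.exp(cos(anchor, negative) / temperature)
    # Loss is small when the anchor is close to the positive and far
    # from the negative.
    return float(-np.log(pos / (pos + neg)))

# Hypothetical flattened sub-feature maps
rng = np.random.default_rng(0)
f1 = rng.normal(size=64)               # first sub-feature map (anchor)
f2 = f1 + 0.01 * rng.normal(size=64)   # second sub-feature map (positive)
f3 = -f1                               # third sub-feature map (negative)
loss = pair_loss(f1, f2, f3)
```

Because the anchor is nearly identical to the positive and anti-correlated with the negative, the loss is close to zero here.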
  • calculating the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value includes: determining, from the first segmentation result, a first sub-segmentation result related to the multiple first target partition maps; determining, from the second segmentation result, a second sub-segmentation result related to the multiple first target partition maps; determining a predicted value and a label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result; and calculating the cross-entropy loss between the predicted value and the label value to obtain the second loss value.
  • determining the predicted value and the label value based on the confidence probabilities includes: when the confidence probability of the first sub-segmentation result is greater than that of the second sub-segmentation result, taking the first sub-segmentation result as the label value and the second sub-segmentation result as the predicted value; and when the confidence probability of the first sub-segmentation result is less than that of the second sub-segmentation result, taking the first sub-segmentation result as the predicted value and the second sub-segmentation result as the label value.
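The confidence rule above can be sketched as a small helper that decides which branch provides the pseudo label (the higher-confidence result) and which is treated as the prediction. Function and variable names are illustrative:

```python
def assign_roles(first_result, second_result, first_conf, second_conf):
    """The higher-confidence sub-segmentation result acts as the label value,
    the other as the predicted value (sketch of the rule described above)."""
    if first_conf > second_conf:
        label, pred = first_result, second_result
    else:
        label, pred = second_result, first_result
    return pred, label

# The first branch is more confident, so it supplies the pseudo label.
pred, label = assign_roles("seg_A", "seg_B", 0.9, 0.6)
```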
  • the multiple first target partition maps may include a third target partition map that has a real label. In this case, calculating the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value includes: determining, from the first segmentation result, a third sub-segmentation result related to the third target partition map and a fourth sub-segmentation result unrelated to the third target partition map but related to the multiple first target partition maps; determining, from the second segmentation result, a fifth sub-segmentation result related to the third target partition map and a sixth sub-segmentation result unrelated to the third target partition map but related to the multiple first target partition maps; and calculating the cross-entropy loss between the corresponding sub-segmentation results to obtain the second loss value.
  • the initial network includes an encoder and a decoder. Inputting the first surround projection image and the hybrid projection image into the initial network to obtain the first feature map and the first segmentation result corresponding to the first surround projection image, and the second feature map and the second segmentation result corresponding to the hybrid projection image, includes: inputting the first surround projection image and the hybrid projection image into the encoder separately to obtain a first image feature corresponding to the first surround projection image and a second image feature corresponding to the hybrid projection image; and inputting the first image feature and the second image feature into the decoder separately to obtain the corresponding feature maps and segmentation results.
  • mapping the multiple sets of point cloud data to initial views to obtain the multiple surround projection images includes: for each set of point cloud data, performing polar coordinate conversion on the three-dimensional coordinate data of each point to obtain the polar coordinate data of each point; mapping the points of the point cloud data to the grids of the initial view based on their polar coordinate data; for each grid of the initial view, determining the feature data of the grid based on the three-dimensional coordinate data and polar coordinate data of the points falling in the grid; and constructing the surround projection image based on the feature data of the grids.
  • Another aspect of the present disclosure provides a point cloud semantic segmentation method, including: mapping target point cloud data to an initial view to obtain a surround projection image; and inputting the surround projection image into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data, where the point cloud semantic segmentation network is trained using the point cloud semantic segmentation network training method described above.
  • Another aspect of the present disclosure provides a point cloud semantic segmentation network training device, including: a first mapping module for mapping multiple sets of point cloud data to initial views respectively to obtain multiple surround projection images; a first processing module for partitioning, based on a preset size, a first surround projection image and a second surround projection image to obtain multiple first partition maps and multiple second partition maps, where the first surround projection image and the second surround projection image belong to the multiple surround projection images; a determination module for determining multiple first target partition maps from the multiple first partition maps; a second processing module for replacing, with each first target partition map, the second target partition map at the same position in the second surround projection image to obtain a hybrid projection image, where each second target partition map belongs to the multiple second partition maps; and a training module for training an initial network using the first surround projection image and the hybrid projection image as training samples to obtain a point cloud semantic segmentation network.
  • Another aspect of the present disclosure provides a point cloud semantic segmentation device, including: a second mapping module for mapping target point cloud data to an initial view to obtain a surround projection image; and a third processing module for inputting the surround projection image into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data, where the point cloud semantic segmentation network is trained using the point cloud semantic segmentation network training method described above.
  • Another aspect of the present disclosure provides an electronic device, including one or more processors and a memory storing one or more instructions which, when executed by the one or more processors, cause the one or more processors to implement the above method.
  • Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions, which when executed are used to implement the method as described above.
  • Another aspect of the present disclosure provides a computer program product, which includes computer-executable instructions that, when executed, are used to implement the method as described above.
  • when training a point cloud semantic segmentation network, the point cloud data can be mapped into surround projection images, and the first surround projection image and the second surround projection image can be partitioned and mixed; that is, some partitions of the first surround projection image replace the corresponding partitions of the second surround projection image to obtain a hybrid projection image. The hybrid projection image and the first surround projection image can then be used to train the initial network to finally obtain a point cloud semantic segmentation network.
  • through partition mixing, the mixed-in partitions are forcibly decoupled from their original background, which effectively improves the richness of the data, reduces the network's dependence on background and global information when predicting local areas, and improves the network's recognition ability.
  • the three-dimensional shape of the original point cloud projected onto the surround projection image is effectively preserved, which at least partially overcomes the problem of three-dimensional deformation and shape information loss caused by data enhancement, and improves the robustness of the network.
  • the utilization efficiency of hardware resources during network training can be effectively improved.
  • FIG. 1 schematically illustrates an exemplary system architecture in which a point cloud semantic segmentation network training method, a point cloud semantic segmentation method and a device can be applied according to an embodiment of the present disclosure.
  • Figure 2 schematically shows a flow chart of a point cloud semantic segmentation network training method according to an embodiment of the present disclosure.
  • Figure 3 schematically shows the training process of a point cloud semantic segmentation network according to an embodiment of the present disclosure.
  • Figure 4 schematically shows a flow chart of a point cloud semantic segmentation method according to an embodiment of the present disclosure.
  • Figure 5 schematically shows a block diagram of a point cloud semantic segmentation network training device according to an embodiment of the present disclosure.
  • Figure 6 schematically shows a block diagram of a point cloud semantic segmentation device according to an embodiment of the present disclosure.
  • FIG. 7 schematically shows a block diagram of an electronic device suitable for implementing a point cloud semantic segmentation network training method or a point cloud semantic segmentation method according to an embodiment of the present disclosure.
  • embodiments of the present disclosure provide a method that can effectively utilize a large amount of lidar raw point cloud data, supplemented by a small amount of labeled data, to conduct semi-supervised training of a point cloud semantic segmentation network.
  • specifically, this method proposes a partition-mixing data enhancement strategy that increases the recognition difficulty for the model by mixing two different surround projection images while reducing the loss of shape information of the three-dimensional point cloud during data enhancement, thereby improving the training effect, robustness and reliability of the model.
  • embodiments of the present disclosure provide a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, a device, an electronic device, and a storage medium.
  • the point cloud semantic segmentation network training method includes: mapping multiple sets of point cloud data to initial views respectively to obtain multiple surround projection images; partitioning, based on a preset size, a first surround projection image and a second surround projection image to obtain multiple first partition maps and multiple second partition maps, where the first surround projection image and the second surround projection image belong to the multiple surround projection images; determining multiple first target partition maps from the multiple first partition maps; replacing, with each first target partition map, the second target partition map at the same position in the second surround projection image to obtain a hybrid projection image, where each second target partition map belongs to the multiple second partition maps; and training an initial network using the first surround projection image and the hybrid projection image as training samples to obtain the point cloud semantic segmentation network.
  • FIG. 1 schematically illustrates an exemplary system architecture in which a point cloud semantic segmentation network training method, a point cloud semantic segmentation method and a device can be applied according to an embodiment of the present disclosure.
  • Figure 1 is only an example of a system architecture to which embodiments of the present disclosure can be applied, intended to help those skilled in the art understand the technical content of the present disclosure; it does not mean that embodiments of the present disclosure cannot be used in other devices, systems, environments or scenarios.
  • the system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105.
  • the terminal devices 101, 102, and 103 may be various types of equipment equipped with lidar, or may be various types of electronic equipment capable of controlling lidar, or may be various types of electronic equipment capable of storing point cloud data.
  • the network 104 is a medium used to provide communication links between the terminal devices 101, 102, 103 and the server 105.
  • Network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
  • Server 105 may be a server that provides various services; for example, it may provide computing-resource and storage-resource support for the training of the point cloud semantic segmentation network.
  • the point cloud semantic segmentation network training method or the point cloud semantic segmentation method provided by the embodiments of the present disclosure can generally be executed by the server 105 .
  • the point cloud semantic segmentation network training device or the point cloud semantic segmentation device provided by the embodiments of the present disclosure can generally be installed in the server 105 .
  • the terminal devices 101, 102, and 103 can collect point cloud data, or the terminal devices 101, 102, and 103 can obtain point cloud data collected by other terminal devices through the Internet, and the point cloud data can be sent to the server 105 through the network.
  • the server 105 executes the method provided by the embodiment of the present disclosure to implement the training of the point cloud semantic segmentation network or perform point cloud semantic segmentation on the point cloud data.
  • the point cloud semantic segmentation network training method or point cloud semantic segmentation method provided by the embodiments of the present disclosure can also be executed by a server or server cluster that is different from the server 105 and can communicate with the terminal devices 101, 102, 103 and/or the server 105.
  • the point cloud semantic segmentation network training device or the point cloud semantic segmentation device provided by the embodiments of the present disclosure may also be provided in a server or server cluster that is different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
  • the point cloud semantic segmentation network training method or point cloud semantic segmentation method provided by the embodiments of the present disclosure can also be executed by terminal device 101, 102, or 103, or by other terminal devices different from terminal devices 101, 102, and 103.
  • the point cloud semantic segmentation network training device or the point cloud semantic segmentation device provided by the embodiments of the present disclosure can also be provided in terminal device 101, 102, or 103, or in other terminal devices different from terminal devices 101, 102, and 103.
  • Figure 2 schematically shows a flow chart of a point cloud semantic segmentation network training method according to an embodiment of the present disclosure.
  • the method includes operations S201 to S205.
  • in operation S201, multiple sets of point cloud data are respectively mapped to initial views to obtain multiple surround projection images.
  • in operation S202, partition processing is performed on the first surround projection image and the second surround projection image respectively based on a preset size to obtain multiple first partition maps and multiple second partition maps, where the first surround projection image and the second surround projection image belong to the multiple surround projection images.
  • in operation S203, multiple first target partition maps are determined from the multiple first partition maps.
  • in operation S204, each first target partition map among the multiple first target partition maps is used to replace the second target partition map at the same position in the second surround projection image to obtain a hybrid projection image, where each second target partition map belongs to the multiple second partition maps.
  • in operation S205, the first surround projection image and the hybrid projection image are used as training samples to train the initial network to obtain a point cloud semantic segmentation network.
  • point cloud data can be collected using sensing equipment such as rotating scanning lidar.
  • Each set of point cloud data can be configured with a preset rectangular coordinate system.
  • Each point in the point cloud data can be represented as a three-dimensional coordinate in the Cartesian coordinate system, and the origin of the Cartesian coordinate system can represent the position of the sensing device when the point cloud data was collected.
  • point cloud data collected using rotating scanning lidar can be distributed in a sphere, and the initial view can be obtained by unfolding the annular surface of the sphere near a horizontal plane.
  • the direction vector when mapping the point can be determined based on the coordinates of the point, and then the direction vector can be used to project the point onto the initial view.
  • partitioning the surround projection image based on a preset size may mean equally dividing the surround projection image into multiple rectangular areas.
  • the preset size can be determined according to the size of the surround projection image in a specific application scenario, and is not limited here.
  • for example, the resolution of the surround projection image can be 24×480. When partitioning the surround projection image, it can be equally divided into 16 parts along the length and 6 parts along the width, dividing the surround projection image into a total of 96 partition maps each with a resolution of 4×30.
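The 24×480 → 96 × (4×30) partitioning above can be sketched with a NumPy reshape; the tile ordering (row-major over 6×16 tile positions) is an assumption for illustration:

```python
import numpy as np

# Partition a 24x480 surround projection image into 96 tiles of 4x30:
# 6 equal parts along the height and 16 along the width, as in the text.
img = np.arange(24 * 480).reshape(24, 480)
tiles = img.reshape(6, 4, 16, 30).swapaxes(1, 2).reshape(96, 4, 30)
```

Tile `k` then corresponds to rows `4*(k // 16) : 4*(k // 16) + 4` and columns `30*(k % 16) : 30*(k % 16) + 30` of the original image.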
  • the first surround projection image and the second surround projection image may be randomly selected from a plurality of surround projection images.
  • the first surround projection image and the second surround projection image may have completely different characteristics; that is, the point cloud data corresponding to the first surround projection image and the point cloud data corresponding to the second surround projection image may be collected from different objects in different scenes.
  • the first target partition maps can be randomly selected from the multiple first partition maps and can occupy a certain proportion of the first partition maps; the proportion can be, for example, 25% or 30%, and is not limited here.
  • the operation of replacing the second target partition map in the second surround projection image with each first target partition map may include: determining the second target partition map from the second surround projection image according to the position information of the first target partition map, deleting it, and filling the first target partition map into the corresponding position.
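The select-and-replace step can be sketched as follows, using the 24×480 image and 4×30 tiles from the example above and the 25% selection ratio mentioned earlier. Function and parameter names are illustrative, not from the patent:

```python
import numpy as np

def mix_projections(first_img, second_img, tile_h=4, tile_w=30,
                    ratio=0.25, seed=0):
    """Build a hybrid projection image by copying a random subset of tiles
    from the first surround projection image into the same positions of the
    second. Sketch only; the patent does not fix these parameters."""
    h, w = first_img.shape
    rows, cols = h // tile_h, w // tile_w
    rng = np.random.default_rng(seed)
    n_pick = int(rows * cols * ratio)
    picks = rng.choice(rows * cols, size=n_pick, replace=False)
    hybrid = second_img.copy()
    for idx in picks:
        r, c = divmod(int(idx), cols)
        hybrid[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w] = \
            first_img[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
    return hybrid, picks

a = np.zeros((24, 480))   # stands in for the first surround projection image
b = np.ones((24, 480))    # stands in for the second surround projection image
hybrid, picks = mix_projections(a, b)
```

Since the replacement happens at identical tile positions, the hybrid image keeps the projected 3D shapes of both sources intact inside each tile.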
  • the method used in training the initial network is not limited here.
  • it may be the gradient descent method, the least squares method, etc.
  • the training parameters set when training the initial network such as the number of training times, batch capacity, learning rate, etc., can be set according to specific application scenarios and are not limited here.
  • when training a point cloud semantic segmentation network, the point cloud data can be mapped into surround projection images, and the first surround projection image and the second surround projection image can be partitioned and mixed; that is, some partitions of the first surround projection image replace the corresponding partitions of the second surround projection image to obtain a hybrid projection image. The hybrid projection image and the first surround projection image can then be used to train the initial network to finally obtain a point cloud semantic segmentation network.
  • through partition mixing, the mixed-in partitions are forcibly decoupled from their original background, which effectively improves the richness of the data, reduces the network's dependence on background and global information when predicting local areas, and improves the network's recognition ability.
  • the three-dimensional shape of the original point cloud projected on the surround projection map can be effectively preserved, which can at least partially overcome the problem of three-dimensional deformation and shape information loss caused by data enhancement, and can improve the robustness of the network.
  • the utilization efficiency of hardware resources during network training can be effectively improved.
  • the surround projection image can be obtained by using the method of operation S201.
  • operation S201 can include the following operations:
  • For each set of point cloud data: perform polar coordinate transformation on the three-dimensional coordinate data of each point in the point cloud data to obtain the polar coordinate data of each point; based on the polar coordinate data of each point, map multiple points in the point cloud data to multiple grids in the initial view; for each grid in the initial view, determine the characteristic data of the grid based on the three-dimensional coordinate data and polar coordinate data of the points in the grid; and construct a surround projection map based on the characteristic data of the multiple grids.
  • each point in the point cloud data may have three-dimensional coordinate data, that is, x, y, and z.
  • based on the three-dimensional coordinate data, the transformed coordinates yaw and pitch in the rotating coordinate system, that is, the polar coordinate data, can be obtained.
  • the grid of the initial view may refer to a pixel color block corresponding to a single pixel point in the initial view.
  • the resolution of the initial view may be 20×480; the initial view may then have 9600 pixel color patches and, correspondingly, 9600 grids.
  • the feature data of the point closest to the origin among the multiple points can be taken as the feature data of the grid.
  • the characteristic data of the point may include three-dimensional coordinate data, polar coordinate data, and data processed based on the three-dimensional coordinate data and polar coordinate data, such as reflectivity data, depth data, etc.
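  • The mapping described in the operations above (polar coordinate conversion, grid assignment, and keeping the point closest to the origin per grid) could be sketched as follows; the vertical field-of-view bounds and function name are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

def project_to_view(points, height=20, width=480):
    """Map an (N, 3) point cloud to a surround view grid by yaw/pitch."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                           # horizontal angle
    pitch = np.arcsin(z / np.maximum(depth, 1e-8))   # vertical angle
    fov_up, fov_down = np.radians(3.0), np.radians(-25.0)  # assumed FOV
    u = (0.5 * (yaw / np.pi + 1.0)) * width                    # column
    v = (1.0 - (pitch - fov_down) / (fov_up - fov_down)) * height  # row
    u = np.clip(np.floor(u), 0, width - 1).astype(int)
    v = np.clip(np.floor(v), 0, height - 1).astype(int)
    # Per grid cell, keep the point closest to the origin: sort by
    # descending depth so the nearest point is written last.
    view = np.full((height, width), np.nan)
    order = np.argsort(-depth)
    view[v[order], u[order]] = depth[order]
    return view
```

  • In practice each grid would store the full feature vector (x, y, z, yaw, pitch, reflectivity, depth) rather than depth alone; depth is used here only to keep the sketch short.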
  • Figure 3 schematically shows a schematic diagram of the training process of a point cloud semantic segmentation network according to an embodiment of the present disclosure.
  • the training process of the point cloud semantic segmentation network can include a sample preprocessing process and a network iterative training process.
  • a part of the partitions in the first surround projection image may be replaced into the second surround projection image to obtain a hybrid projection image.
  • the network iterative training process may be to input the first surround projection image and the mixed projection image as a sample pair into the initial network, and to adjust the model parameters of the initial network based on the set loss function and a model iteration method such as gradient descent or least squares, so as to achieve training of the initial network.
  • the initial network may include an encoder and a decoder.
  • inputting the first surround projection image and the hybrid projection image respectively into the initial network to obtain the first feature map and the first segmentation result corresponding to the first surround projection image, as well as the second feature map and the second segmentation result corresponding to the hybrid projection image, may include the following operations: inputting the first surround projection image and the hybrid projection image respectively into the encoder to obtain image features; and inputting the image features into the decoder to obtain the first feature map and the first segmentation result corresponding to the first surround projection image, and the second feature map and the second segmentation result corresponding to the hybrid projection image.
  • the encoder can be any feature extraction network, such as ResNet18, etc.
  • the decoder can be any feature upsampling network, such as UPerNet, etc.
  • the network iterative training process may specifically include the following operations:
  • the first segmentation result may represent a semantic feature segmentation result of each area in the first surround projection image.
  • the first feature map may have the same size as the first surround projection image, and areas with different semantic features on the first feature map may have different color features.
  • areas with different semantic features on the first feature map may respectively refer to areas where people, cars, and obstacles are located, and the three areas may be represented by red, blue, and green respectively.
  • calculating the information entropy loss between the first feature map and the second feature map, and obtaining the first loss value may include the following operations:
  • when the confidence probability of the first sub-feature map is greater than a preset threshold, the first sub-feature map and the second sub-feature map are used as a positive sample pair, the first sub-feature map and the third sub-feature map are used as a negative sample pair, and the information entropy loss between the positive sample pair and the negative sample pair is calculated to obtain the first loss value.
  • the first feature map may have the same size as the first surround projection image, and thus the first sub-feature map may be determined from the first feature map based on the position information of the first target partition maps in the first surround projection image.
  • the method of calculating the confidence probability of the first sub-feature map is not limited here.
  • the Gaussian formula can be used to determine the confidence probability.
  • the preset threshold can be determined according to specific application scenarios, for example, it can be set to 90%, 95%, etc., which is not limited here.
  • the calculation method of information entropy loss can be as shown in formula (1):
  • where L1 represents the information entropy loss, fp represents the first sub-feature map, fx represents the second sub-feature map, and fy represents the third sub-feature map.
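  • Formula (1) itself is not reproduced in this text. As an illustrative stand-in only (not the patented formula), an InfoNCE-style contrastive loss over the positive pair (fp, fx) and the negative pair (fp, fy) could be sketched as:

```python
import numpy as np

def info_entropy_loss(fp, fx, fy, temperature=0.1):
    """Contrastive loss with (fp, fx) as the positive pair and (fp, fy)
    as the negative pair; the temperature value is an assumption."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    pos = np.exp(cos(fp.ravel(), fx.ravel()) / temperature)
    neg = np.exp(cos(fp.ravel(), fy.ravel()) / temperature)
    # Loss is small when fp agrees with fx and disagrees with fy.
    return -np.log(pos / (pos + neg))
```

  • The loss pulls the replaced-partition features of the two branches together while pushing them away from the unrelated background features.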
  • calculating the cross-entropy loss between the first segmentation result and the second segmentation result, and obtaining the second loss value may include the following operations:
  • the first segmentation result may have the same size as the first surround projection image, and thus the first sub-segmentation result may be determined from the first segmentation result based on the position information of the first target partition maps in the first surround projection image.
  • the calculation method of the confidence probability of the first sub-segmentation result and the second sub-segmentation result is not limited here.
  • the Gaussian formula can be used to determine the confidence probability.
  • the predicted value and the label value can be determined by comparing the confidence probability of the first sub-segmentation result with that of the second sub-segmentation result. Specifically, when the confidence probability of the first sub-segmentation result is greater than that of the second sub-segmentation result, the first sub-segmentation result is determined as the label value and the second sub-segmentation result as the predicted value; when the confidence probability of the first sub-segmentation result is less than that of the second sub-segmentation result, the first sub-segmentation result is determined as the predicted value and the second sub-segmentation result as the label value.
  • where L2 represents the cross-entropy loss, y represents the label value, and yp represents the predicted value.
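  • The confidence comparison and cross-entropy computation described above could be sketched as follows; per-region class-probability vectors are assumed, and the Gaussian-formula confidence is simplified to a max-probability confidence for illustration:

```python
import numpy as np

def pseudo_label_ce(p1, p2):
    """Cross-entropy between sub-segmentation results of the two branches.

    p1, p2: per-class probability vectors for the same region. The more
    confident branch supplies the (pseudo) label, the other the prediction.
    """
    conf1, conf2 = p1.max(), p2.max()
    label, pred = (p1, p2) if conf1 > conf2 else (p2, p1)
    hard = np.argmax(label)            # pseudo-label class index
    return -np.log(pred[hard] + 1e-8)  # L2 = -sum y * log(yp)
```

  • Letting the more confident branch act as the teacher keeps the self-training signal stable even though neither branch has a real label for the region.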
  • the total loss used to adjust the model parameters of the initial network may be a weighted sum of the information entropy loss and the cross-entropy loss, and the weights may be hyperparameters that users can set arbitrarily when tuning the model.
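  • The weighted-sum total loss described above amounts to the following one-liner; the weight names alpha and beta are assumed for illustration:

```python
def total_loss(l1, l2, alpha=1.0, beta=1.0):
    """Weighted sum of the information entropy loss l1 and the
    cross-entropy loss l2; alpha and beta are user-set hyperparameters."""
    return alpha * l1 + beta * l2
```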
  • the plurality of first target partition maps may include a third target partition map, and the third target partition map has a real label.
  • calculating the cross-entropy loss between the first segmentation result and the second segmentation result, and obtaining the second loss value may include the following operations:
  • the network can be trained using the original unlabeled data together with a small amount of labeled data, realizing semi-supervised training of the point cloud semantic segmentation network and thereby reducing the cost of data annotation while ensuring the semantic segmentation effect of the network.
  • Figure 4 schematically shows a flow chart of a point cloud semantic segmentation method according to an embodiment of the present disclosure.
  • the method includes operations S401 to S402.
  • the target point cloud data is mapped to the initial view to obtain a surround projection image.
  • the surround projection image is input into the point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data.
  • the point cloud semantic segmentation network can be trained using the method in the point cloud semantic segmentation network training method section, which will not be described again here.
  • Figure 5 schematically shows a block diagram of a point cloud semantic segmentation network training device according to an embodiment of the present disclosure.
  • the point cloud semantic segmentation network training device 500 includes a first mapping module 510 , a first processing module 520 , a determination module 530 , a second processing module 540 and a training module 550 .
  • the first mapping module 510 is used to map multiple sets of point cloud data to the initial view respectively to obtain multiple surround projection images.
  • the first processing module 520 is configured to perform partition processing on the first surround projection image and the second surround projection image based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, wherein the first surround projection image and the second surround projection image belong to the multiple surround projection images.
  • the determining module 530 is configured to determine a plurality of first target partition maps from a plurality of first partition maps.
  • the second processing module 540 is configured to use each first target partition map in the plurality of first target partition maps to respectively replace the second target partition map in the second surround projection image, to obtain a mixed projection map, wherein the second target partition map belongs to the multiple second partition maps, and the first target partition map and the second target partition map are at the same position.
  • the training module 550 is used to train the initial network using the first surround projection image and the mixed projection image as training samples to obtain a point cloud semantic segmentation network.
  • when training a point cloud semantic segmentation network, the point cloud data can be mapped into surround projection images, and the first surround projection image and the second surround projection image can be partitioned and mixed, that is, some partitions in the first surround projection image replace the corresponding partitions in the second surround projection image to obtain a hybrid projection image. Afterwards, the hybrid projection image and the first surround projection image can be used to train the initial network to finally obtain the point cloud semantic segmentation network.
  • through partition mixing, forced decoupling of the mixed partitions from the background can be achieved, which can effectively improve the richness of the data, reduce the network's dependence on background and global information when predicting local areas, and improve the network's recognition ability.
  • the three-dimensional shape of the original point cloud projected on the surround projection map can be effectively preserved, which can at least partially overcome the problem of three-dimensional deformation and shape information loss caused by data enhancement, and can improve the robustness of the network.
  • the utilization efficiency of hardware resources during network training can be effectively improved.
  • the training module 550 includes a first training sub-module, a second training sub-module, a third training sub-module and a fourth training sub-module.
  • the first training submodule is used to input the first surround projection image and the mixed projection image into the initial network respectively, to obtain the first feature map and the first segmentation result corresponding to the first surround projection image, and the second feature map and the second segmentation result corresponding to the mixed projection image.
  • the second training submodule is used to calculate the information entropy loss between the first feature map and the second feature map to obtain the first loss value.
  • the third training submodule is used to calculate the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value.
  • the fourth training submodule is used to use the first loss value and the second loss value to adjust the model parameters of the initial network to finally obtain the point cloud semantic segmentation network.
  • the second training sub-module includes a first training unit, a second training unit and a third training unit.
  • the first training unit is configured to determine the first sub-feature map related to the plurality of first target partition maps from the first feature map.
  • the second training unit is used to split the second feature map into a second sub-feature map related to the plurality of first target partition maps and a third sub-feature map unrelated to the plurality of first target partition maps.
  • the third training unit is used to, when the confidence probability of the first sub-feature map is greater than the preset threshold, use the first sub-feature map and the second sub-feature map as a positive sample pair, use the first sub-feature map and the third sub-feature map as a negative sample pair, and calculate the information entropy loss between the positive sample pair and the negative sample pair to obtain the first loss value.
  • the third training sub-module includes a fourth training unit, a fifth training unit, a sixth training unit and a seventh training unit.
  • the fourth training unit is configured to determine the first sub-segmentation results related to the plurality of first target partition maps from the first segmentation results.
  • a fifth training unit is configured to determine second sub-segmentation results related to the plurality of first target partition maps from the second segmentation results.
  • the sixth training unit is used to determine the prediction value and the label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result.
  • the seventh training unit is used to calculate the cross-entropy loss between the predicted value and the label value to obtain the second loss value.
  • the sixth training unit includes a first training sub-unit and a second training sub-unit.
  • the first training subunit is used to determine that the first sub-segmentation result is the label value and the second sub-segmentation result is the predicted value when the confidence probability of the first sub-segmentation result is greater than the confidence probability of the second sub-segmentation result.
  • the second training subunit is used to determine that the first sub-segmentation result is the predicted value and the second sub-segmentation result is the label value when the confidence probability of the first sub-segmentation result is less than the confidence probability of the second sub-segmentation result.
  • the plurality of first target partition maps include a third target partition map, and the third target partition map has a real label.
  • the third training sub-module includes an eighth training unit, a ninth training unit, a tenth training unit, an eleventh training unit and a twelfth training unit.
  • the eighth training unit is used to determine, from the first segmentation result, a third sub-segmentation result related to the third target partition map, and a fourth sub-segmentation result that is unrelated to the third target partition map and related to the plurality of first target partition maps.
  • a ninth training unit configured to determine, from the second segmentation result, a fifth sub-segmentation result related to the third target partition map, and a sixth sub-segmentation result that is unrelated to the third target partition map and related to the plurality of first target partition maps.
  • the tenth training unit is used to calculate the cross-entropy loss between the third sub-segmentation result and the real label to obtain the third loss value.
  • the eleventh training unit is used to calculate the cross-entropy loss between the fourth sub-segmentation result and the sixth sub-segmentation result to obtain the fourth loss value.
  • the twelfth training unit is used to determine the second loss value based on the third loss value and the fourth loss value.
  • the initial network includes an encoder and a decoder.
  • the first training sub-module includes a thirteenth training unit and a fourteenth training unit.
  • the thirteenth training unit is used to input the first surround projection image and the hybrid projection image into the encoder respectively, and obtain the first image feature corresponding to the first surround projection image and the second image feature corresponding to the hybrid projection image.
  • the fourteenth training unit is used to input the first image feature and the second image feature into the decoder respectively, to obtain the first feature map and the first segmentation result corresponding to the first surround projection image, and the second feature map and the second segmentation result corresponding to the mixed projection image.
  • the first mapping module 510 includes a first mapping unit, a second mapping unit, a third mapping unit and a fourth mapping unit.
  • the first mapping unit is used to perform polar coordinate conversion on the three-dimensional coordinate data of each point in the point cloud data for each set of point cloud data, so as to obtain the polar coordinate data of each point in the point cloud data.
  • the second mapping unit is used to map multiple points in the point cloud data to multiple grids in the initial view based on the polar coordinate data of each point in the point cloud data.
  • the third mapping unit is used for determining, for each grid of the initial view, the characteristic data of the grid based on the three-dimensional coordinate data and polar coordinate data of the points in the grid.
  • the fourth mapping unit is used to construct a surround projection map based on the feature data of multiple grids.
  • the point cloud semantic segmentation network training device part in the embodiments of the present disclosure corresponds to the point cloud semantic segmentation network training method part; for a specific description of the device part, refer to the method part, which will not be repeated here.
  • Figure 6 schematically shows a block diagram of a point cloud semantic segmentation device according to an embodiment of the present disclosure.
  • the point cloud semantic segmentation device 600 includes a second mapping module 610 and a third processing module 620 .
  • the second mapping module 610 is used to map the target point cloud data to the initial view to obtain a surround projection image.
  • the third processing module 620 is used to input the surround projection image into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data.
  • the point cloud semantic segmentation network can be trained using the method in the point cloud semantic segmentation network training method section, which will not be described again here.
  • any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be combined and implemented in one module, or any one of them may be split into multiple modules for implementation.
  • any one or more of the modules, sub-modules, units, and sub-units according to embodiments of the present disclosure may be at least partially implemented as hardware circuits, such as field programmable gate arrays (FPGAs), programmable logic arrays (PLAs), systems on chip, systems on substrate, systems on package, application-specific integrated circuits (ASICs), or any other reasonable means of integrating or packaging circuits in hardware or firmware, or may be implemented in any one of the three implementation methods of software, hardware, and firmware, or in an appropriate combination of any of them.
  • one or more of the modules, sub-modules, units, and sub-units according to embodiments of the present disclosure may be at least partially implemented as a computer program module, and when the computer program module is executed, it can perform corresponding functions.
  • any number of the first mapping module 510, the first processing module 520, the determining module 530, the second processing module 540 and the training module 550, or of the second mapping module 610 and the third processing module 620, can be combined and implemented in one module/unit/sub-unit, or any one of them can be split into multiple modules/units/sub-units. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit.
  • At least one of the first mapping module 510, the first processing module 520, the determining module 530, the second processing module 540 and the training module 550, or of the second mapping module 610 and the third processing module 620, may be implemented at least in part as hardware circuitry, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, an application-specific integrated circuit (ASIC), or any other reasonable way of integrating or packaging circuits in hardware or firmware, or may be implemented in any one of the three implementation methods of software, hardware and firmware, or in an appropriate combination of any of them.
  • at least one of the first mapping module 510, the first processing module 520, the determining module 530, the second processing module 540 and the training module 550, or of the second mapping module 610 and the third processing module 620, can be at least partially implemented as computer program modules, and when the computer program modules are executed, corresponding functions can be performed.
  • FIG. 7 schematically shows a block diagram of an electronic device suitable for implementing a point cloud semantic segmentation network training method or a point cloud semantic segmentation method according to an embodiment of the present disclosure.
  • the electronic device shown in FIG. 7 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
  • a computer electronic device 700 includes a processor 701, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage portion 708 into a random access memory (RAM) 703.
  • processor 701 may include, for example, a general purpose microprocessor (eg, a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (eg, an application specific integrated circuit (ASIC)), among others.
  • Processor 701 may also include onboard memory for caching purposes.
  • the processor 701 may include a single processing unit or multiple processing units for performing different actions of the method flow according to embodiments of the present disclosure.
  • the processor 701, ROM 702 and RAM 703 are connected to each other through a bus 704.
  • the processor 701 performs various operations according to the method flow of the embodiment of the present disclosure by executing programs in the ROM 702 and/or RAM 703. It should be noted that the program may also be stored in one or more memories other than ROM 702 and RAM 703.
  • the processor 701 may also perform various operations according to the method flow of embodiments of the present disclosure by executing programs stored in the one or more memories.
  • the electronic device 700 may further include an input/output (I/O) interface 705 that is also connected to the bus 704 .
  • Electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, etc.; an output section 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), speakers, and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, and the like.
  • the communication section 709 performs communication processing via a network such as the Internet.
  • A drive 710 is also connected to the I/O interface 705 as needed.
  • Removable media 711 such as magnetic disks, optical disks, magneto-optical disks, semiconductor memories, etc., are installed on the drive 710 as needed, so that a computer program read therefrom is installed into the storage portion 708 as needed.
  • the method flow according to the embodiments of the present disclosure may be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable storage medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via communication portion 709 and/or installed from removable media 711 .
  • the computer program is executed by the processor 701
  • the above-described functions defined in the system of the embodiment of the present disclosure are performed.
  • the systems, devices, devices, modules, units, etc. described above may be implemented by computer program modules.
  • the present disclosure also provides a computer-readable storage medium.
  • the computer-readable storage medium may be included in the device/system described in the above embodiments, or it may exist independently without being assembled into the device/system.
  • the above computer-readable storage medium carries one or more programs, and when the one or more programs are executed, the method according to the embodiments of the present disclosure is implemented.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the computer-readable storage medium may include one or more memories other than ROM 702 and/or RAM 703 and/or ROM 702 and RAM 703 described above.
  • Embodiments of the present disclosure also include a computer program product, which includes a computer program.
  • the computer program includes program code for executing the method provided by the embodiment of the present disclosure.
  • when the program product is run on an electronic device, the program code is used to enable the electronic device to implement the point cloud semantic segmentation network training method or the point cloud semantic segmentation method provided by the embodiments of the present disclosure.
  • the computer program may rely on tangible storage media such as optical storage devices and magnetic storage devices.
  • the computer program can also be transmitted and distributed in the form of a signal on a network medium, and downloaded and installed through the communication part 709, and/or installed from the removable medium 711.
  • the program code contained in the computer program can be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the above.
  • the program code for executing the computer program provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages. Specifically, a high-level procedural and/or object-oriented programming language, and/or assembly/machine language, may be utilized to implement these computing procedures. Programming languages include, but are not limited to, Java, C++, Python, the "C" language, or similar programming languages.
  • the program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server.
  • the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, connected via the Internet using an Internet service provider).
  • each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown one after another may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • each block in the block diagrams or flowchart illustrations, and combinations of blocks in the block diagrams or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
  • Those skilled in the art will understand that features recited in the various embodiments and/or claims of the present disclosure may be combined and/or incorporated in various ways, even if such combinations or incorporations are not explicitly recited in the present disclosure. In particular, various combinations and/or incorporations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit and teachings of the present disclosure. All such combinations and/or incorporations fall within the scope of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

Provided in the present disclosure are a point cloud semantic segmentation network training method, a point cloud semantic segmentation method and apparatus, an electronic device, and a storage medium, applicable to the technical field of artificial intelligence. The method comprises: respectively mapping a plurality of groups of point cloud data into an initial view to obtain a plurality of surround-view projection images; partitioning, on the basis of a preset size, a first surround-view projection image and a second surround-view projection image respectively to obtain a plurality of first partition images and a plurality of second partition images; determining a plurality of first target partition images from the plurality of first partition images; replacing, with each of the plurality of first target partition images, the corresponding second target partition image in the second surround-view projection image to obtain a mixed projection image; and training an initial network by using the first surround-view projection image and the mixed projection image as training samples to obtain a point cloud semantic segmentation network.

Description

Point Cloud Semantic Segmentation Network Training Method, Point Cloud Semantic Segmentation Method and Apparatus

This application claims priority to Chinese Patent Application No. 202211022552.3, filed on August 24, 2022, the content of which is incorporated herein by reference.

Technical Field

The present disclosure relates to the field of artificial intelligence technology and, more specifically, to a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, an apparatus, an electronic device, and a storage medium.
Background

With the development of three-dimensional sensing technology, point cloud data is widely used in many fields such as autonomous driving and robotic grasping. As the mainstream solution for point cloud data analysis, deep learning has shown good performance in point cloud data processing. Since point cloud data collected by various sensors is usually unlabeled and manual annotation is costly, semi-supervised training is commonly used in the related art to build deep neural networks.

In the related art, research on improving semantic segmentation tasks with semi-supervised training algorithms has mainly focused on two-dimensional images. Directly applying such methods to the segmentation of three-dimensional point clouds causes three-dimensional shape distortion, which in turn degrades the semantic segmentation of the point cloud data.
Summary

In view of this, the present disclosure provides a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, an apparatus, an electronic device, a readable storage medium, and a computer program product.
One aspect of the present disclosure provides a point cloud semantic segmentation network training method, including: mapping a plurality of groups of point cloud data respectively into an initial view to obtain a plurality of surround-view projection images; partitioning, based on a preset size, a first surround-view projection image and a second surround-view projection image respectively to obtain a plurality of first partition images and a plurality of second partition images, where the first surround-view projection image and the second surround-view projection image belong to the plurality of surround-view projection images; determining a plurality of first target partition images from the plurality of first partition images; replacing, with each of the plurality of first target partition images, a corresponding second target partition image in the second surround-view projection image to obtain a mixed projection image, where the second target partition image belongs to the plurality of second partition images and the first target partition image is located at the same position as the second target partition image; and training an initial network using the first surround-view projection image and the mixed projection image as training samples to obtain a point cloud semantic segmentation network.
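The partition-and-replace step of this aspect can be sketched in a few lines. The helper below is illustrative only: the function name `mix_projections`, the regular patch grid, and the random choice of which partitions become the first target partition images are assumptions, not part of the disclosure.

```python
import numpy as np

def mix_projections(proj_a, proj_b, patch_h, patch_w, num_swapped, rng=None):
    """Replace `num_swapped` randomly chosen patches of `proj_b` with the
    patches of `proj_a` at the same positions ("first target partition
    images" replacing "second target partition images").  Returns the
    mixed projection and a boolean mask of the swapped region."""
    rng = np.random.default_rng(rng)
    h, w = proj_a.shape[:2]
    assert proj_a.shape == proj_b.shape and h % patch_h == 0 and w % patch_w == 0
    rows, cols = h // patch_h, w // patch_w
    chosen = rng.choice(rows * cols, size=num_swapped, replace=False)
    mixed = proj_b.copy()
    mask = np.zeros((h, w), dtype=bool)
    for idx in chosen:
        r, c = divmod(int(idx), cols)
        rs = slice(r * patch_h, (r + 1) * patch_h)
        cs = slice(c * patch_w, (c + 1) * patch_w)
        mixed[rs, cs] = proj_a[rs, cs]  # same-position replacement
        mask[rs, cs] = True
    return mixed, mask
```

Because patches are copied between the same positions of the two projections, the pixels inside each swapped patch keep the three-dimensional shape they had in the source projection.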
According to an embodiment of the present disclosure, training the initial network using the first surround-view projection image and the mixed projection image as training samples to obtain the point cloud semantic segmentation network includes: inputting the first surround-view projection image and the mixed projection image respectively into the initial network to obtain a first feature map and a first segmentation result corresponding to the first surround-view projection image, and a second feature map and a second segmentation result corresponding to the mixed projection image; computing an information entropy loss between the first feature map and the second feature map to obtain a first loss value; computing a cross-entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value; and adjusting model parameters of the initial network using the first loss value and the second loss value to finally obtain the point cloud semantic segmentation network.

According to an embodiment of the present disclosure, computing the information entropy loss between the first feature map and the second feature map to obtain the first loss value includes: determining, from the first feature map, a first sub-feature map related to the plurality of first target partition images; splitting the second feature map into a second sub-feature map related to the plurality of first target partition images and a third sub-feature map unrelated to the plurality of first target partition images; and, in a case where the confidence probability of the first sub-feature map is greater than a preset threshold, taking the first sub-feature map and the second sub-feature map as a positive sample pair and the first sub-feature map and the third sub-feature map as a negative sample pair, and computing the information entropy loss between the positive sample pair and the negative sample pair to obtain the first loss value.
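The positive-pair/negative-pair construction above resembles a contrastive objective. Below is a minimal InfoNCE-style sketch of such an information entropy loss over single feature vectors; the cosine similarity, the temperature `tau`, and the function name are assumptions rather than the patent's exact formulation.

```python
import numpy as np

def partition_contrastive_loss(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style loss: pull the positive pair together and push the
    negative pairs apart.  `anchor` plays the role of the first
    sub-feature map, `positive` the second, `negatives` the third."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # small when the positive pair dominates
```

The loss shrinks as the anchor aligns with its positive and grows as it aligns with a negative, which is the behavior the first loss value is meant to enforce.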
According to an embodiment of the present disclosure, computing the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value includes: determining, from the first segmentation result, a first sub-segmentation result related to the plurality of first target partition images; determining, from the second segmentation result, a second sub-segmentation result related to the plurality of first target partition images; determining a predicted value and a label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result; and computing the cross-entropy loss between the predicted value and the label value to obtain the second loss value.

According to an embodiment of the present disclosure, determining the predicted value and the label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result includes: in a case where the confidence probability of the first sub-segmentation result is greater than that of the second sub-segmentation result, determining the first sub-segmentation result as the label value and the second sub-segmentation result as the predicted value; and, in a case where the confidence probability of the first sub-segmentation result is less than that of the second sub-segmentation result, determining the first sub-segmentation result as the predicted value and the second sub-segmentation result as the label value.
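A hedged sketch of this confidence-based role assignment: per pixel, the branch with the higher maximum softmax probability supplies the (hard) label value and the other branch supplies the predicted distribution for a cross-entropy term. The helper name and the use of argmax confidence are assumptions.

```python
import numpy as np

def confidence_cross_entropy(probs_a, probs_b):
    """probs_a / probs_b: (..., C) softmax outputs of the two branches.
    Where branch A is more confident, its argmax becomes the label value
    and branch B's distribution is the predicted value (and vice versa)."""
    conf_a = probs_a.max(axis=-1)
    conf_b = probs_b.max(axis=-1)
    a_is_label = conf_a >= conf_b
    labels = np.where(a_is_label, probs_a.argmax(-1), probs_b.argmax(-1))
    preds = np.where(a_is_label[..., None], probs_b, probs_a)
    # per-pixel cross entropy between predicted value and label value
    ce = -np.log(np.take_along_axis(preds, labels[..., None], axis=-1)[..., 0])
    return ce.mean()
```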
According to an embodiment of the present disclosure, the plurality of first target partition images include a third target partition image, and the third target partition image has a real label; computing the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value includes: determining, from the first segmentation result, a third sub-segmentation result related to the third target partition image and a fourth sub-segmentation result unrelated to the third target partition image but related to the plurality of first target partition images; determining, from the second segmentation result, a fifth sub-segmentation result related to the third target partition image and a sixth sub-segmentation result unrelated to the third target partition image but related to the plurality of first target partition images; computing the cross-entropy loss between the third sub-segmentation result and the real label to obtain a third loss value; computing the cross-entropy loss between the fourth sub-segmentation result and the sixth sub-segmentation result to obtain a fourth loss value; and determining the second loss value based on the third loss value and the fourth loss value.
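The composition of the second loss value from the third and fourth loss values might look like the sketch below, where a supervised cross-entropy term uses the real labels and a consistency term treats one branch's argmax as a pseudo label; the equal-weight sum and all names are assumptions, not the patent's exact formulation.

```python
import numpy as np

def second_loss(probs_labeled, true_labels, probs_pred, probs_target, eps=1e-12):
    """probs_labeled: (N, C) predictions on partitions with real labels;
    probs_pred / probs_target: (M, C) predictions of the two branches on
    the remaining mixed partitions."""
    # third loss value: cross entropy against the real labels
    l3 = -np.log(np.take_along_axis(probs_labeled, true_labels[:, None], 1)[:, 0] + eps).mean()
    # fourth loss value: cross entropy between one branch's prediction
    # and the other branch's argmax (pseudo label)
    pseudo = probs_target.argmax(1)
    l4 = -np.log(np.take_along_axis(probs_pred, pseudo[:, None], 1)[:, 0] + eps).mean()
    return l3 + l4  # equal-weight combination (assumption)
```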
According to an embodiment of the present disclosure, the initial network includes an encoder and a decoder; inputting the first surround-view projection image and the mixed projection image respectively into the initial network to obtain the first feature map and the first segmentation result corresponding to the first surround-view projection image, and the second feature map and the second segmentation result corresponding to the mixed projection image, includes: inputting the first surround-view projection image and the mixed projection image respectively into the encoder to obtain a first image feature corresponding to the first surround-view projection image and a second image feature corresponding to the mixed projection image; and inputting the first image feature and the second image feature respectively into the decoder to obtain the first feature map and the first segmentation result corresponding to the first surround-view projection image, and the second feature map and the second segmentation result corresponding to the mixed projection image.
According to an embodiment of the present disclosure, mapping the plurality of groups of point cloud data respectively into the initial view to obtain the plurality of surround-view projection images includes: for each group of point cloud data, performing polar coordinate conversion on the three-dimensional coordinate data of each point in the point cloud data to obtain polar coordinate data of each point in the point cloud data; mapping, based on the polar coordinate data of each point in the point cloud data, the plurality of points in the point cloud data respectively into a plurality of grid cells of the initial view; for each grid cell of the initial view, determining feature data of the grid cell based on the three-dimensional coordinate data and the polar coordinate data of the points in the grid cell; and constructing the surround-view projection image based on the feature data of the plurality of grid cells.
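One common realization of this mapping is a spherical (range-view) projection, in which the azimuth of each point selects a column and the elevation selects a row of the initial view. The sketch below keeps only the range of the nearest point per grid cell as the cell's feature; the field of view, resolution, and per-cell feature choice are assumptions for illustration.

```python
import numpy as np

def project_point_cloud(points, h=4, w=8,
                        fov_up=np.deg2rad(15.0), fov_down=np.deg2rad(-15.0)):
    """Map an (N, 3) array of x/y/z points onto an h x w surround view.
    Azimuth -> column index, elevation -> row index; each grid cell
    stores the range of its nearest point as a stand-in feature."""
    x, y, z = points.T
    r = np.linalg.norm(points, axis=1)          # range (polar radius)
    az = np.arctan2(y, x)                       # azimuth in [-pi, pi)
    el = np.arcsin(z / r)                       # elevation angle
    u = ((az + np.pi) / (2 * np.pi) * w).astype(int) % w
    v = ((fov_up - el) / (fov_up - fov_down) * h).clip(0, h - 1).astype(int)
    img = np.zeros((h, w))
    order = np.argsort(-r)                      # far points first, near points overwrite
    img[v[order], u[order]] = r[order]
    return img
```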
Another aspect of the present disclosure provides a point cloud semantic segmentation method, including: mapping target point cloud data into an initial view to obtain a surround-view projection image; and inputting the surround-view projection image into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data, where the point cloud semantic segmentation network is trained using the point cloud semantic segmentation network training method described above.

Another aspect of the present disclosure provides a point cloud semantic segmentation network training apparatus, including: a first mapping module configured to map a plurality of groups of point cloud data respectively into an initial view to obtain a plurality of surround-view projection images; a first processing module configured to partition, based on a preset size, a first surround-view projection image and a second surround-view projection image respectively to obtain a plurality of first partition images and a plurality of second partition images, where the first surround-view projection image and the second surround-view projection image belong to the plurality of surround-view projection images; a determining module configured to determine a plurality of first target partition images from the plurality of first partition images; a second processing module configured to replace, with each of the plurality of first target partition images, a corresponding second target partition image in the second surround-view projection image to obtain a mixed projection image, where the second target partition image belongs to the plurality of second partition images and the first target partition image is located at the same position as the second target partition image; and a training module configured to train an initial network using the first surround-view projection image and the mixed projection image as training samples to obtain a point cloud semantic segmentation network.

Another aspect of the present disclosure provides a point cloud semantic segmentation apparatus, including: a second mapping module configured to map target point cloud data into an initial view to obtain a surround-view projection image; and a third processing module configured to input the surround-view projection image into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data, where the point cloud semantic segmentation network is trained using the point cloud semantic segmentation network training method described above.

Another aspect of the present disclosure provides an electronic device, including: one or more processors; and a memory configured to store one or more instructions that, when executed by the one or more processors, cause the one or more processors to implement the method described above.

Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions that, when executed, are used to implement the method described above.

Another aspect of the present disclosure provides a computer program product including computer-executable instructions that, when executed, are used to implement the method described above.
According to embodiments of the present disclosure, when training the point cloud semantic segmentation network, the point cloud data can be mapped into surround-view projection images, and the first and second surround-view projection images can be partitioned and mixed, that is, some partitions of the first surround-view projection image replace the corresponding partitions of the second surround-view projection image to obtain a mixed projection image; the mixed projection image and the first surround-view projection image are then used to train the initial network to finally obtain the point cloud semantic segmentation network. Partition mixing forcibly decouples the swapped partitions from their background, which effectively enriches the data, reduces the network's reliance on background and global information when predicting local regions, and improves the network's recognition ability. At the same time, partition mixing effectively preserves the three-dimensional shape of the original point cloud as projected onto the surround-view projection image, which can at least partially overcome the three-dimensional deformation and loss of shape information caused by data augmentation and improve the robustness of the network. Through the above technical means, the utilization efficiency of hardware resources during network training can be effectively improved.
Brief Description of the Drawings

The above and other objects, features, and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an exemplary system architecture to which the point cloud semantic segmentation network training method, the point cloud semantic segmentation method, and the apparatuses may be applied according to an embodiment of the present disclosure.

FIG. 2 schematically illustrates a flowchart of a point cloud semantic segmentation network training method according to an embodiment of the present disclosure.

FIG. 3 schematically illustrates the training process of a point cloud semantic segmentation network according to an embodiment of the present disclosure.

FIG. 4 schematically illustrates a flowchart of a point cloud semantic segmentation method according to an embodiment of the present disclosure.

FIG. 5 schematically illustrates a block diagram of a point cloud semantic segmentation network training apparatus according to an embodiment of the present disclosure.

FIG. 6 schematically illustrates a block diagram of a point cloud semantic segmentation apparatus according to an embodiment of the present disclosure.

FIG. 7 schematically illustrates a block diagram of an electronic device suitable for implementing the point cloud semantic segmentation network training method or the point cloud semantic segmentation method according to an embodiment of the present disclosure.
Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood, however, that these descriptions are merely exemplary and are not intended to limit the scope of the present disclosure. In the following detailed description, numerous specific details are set forth for ease of explanation to provide a comprehensive understanding of the embodiments of the present disclosure. It will be apparent, however, that one or more embodiments may be practiced without these specific details. Furthermore, in the following description, descriptions of well-known structures and techniques are omitted to avoid unnecessarily obscuring the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. The terms "comprising", "including", and the like used herein indicate the presence of the stated features, steps, operations, and/or components, but do not exclude the presence or addition of one or more other features, steps, operations, or components.

All terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein should be interpreted as having meanings consistent with the context of this specification and should not be interpreted in an idealized or overly rigid manner.

Where an expression like "at least one of A, B, and C" is used, it should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (for example, "a system having at least one of A, B, and C" shall include, but is not limited to, systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B, and C, etc.). Where an expression like "at least one of A, B, or C" is used, it should likewise be interpreted in accordance with the meaning commonly understood by those skilled in the art (for example, "a system having at least one of A, B, or C" shall include, but is not limited to, systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B, and C, etc.).
在自动驾驶技术领域中,利用深度学习技术对周围环境进行感知、识别是一项极为重要的基础研究。然而,深度学习技术所构建的深度神经网络往往需要大量的人工标注数据进行训练,这部分人工标注数据的成本和时耗往往是阻碍深度神经网络模型性能提升的壁垒。另一方面,无人驾驶车辆在行驶过程中,通过各式传感器能够收集到大量额原始无标注数据。因此,如何利用好这些原始无标注数据,加之少量有标注数据的辅助,对神经网络进行训练,即采用半监督训练的方式提升神经网络的识别、分类性能,是研发自动驾驶系统的过程中,能够起到增效、降本作用的一个重要研究任务。In the field of autonomous driving technology, using deep learning technology to perceive and identify the surrounding environment is an extremely important basic research. However, deep neural networks built by deep learning technology often require a large amount of manually labeled data for training. The cost and time consumption of this manually labeled data are often barriers to improving the performance of deep neural network models. On the other hand, unmanned vehicles can collect a large amount of original unlabeled data through various sensors while driving. Therefore, how to make good use of these original unlabeled data, coupled with the assistance of a small amount of labeled data, to train the neural network, that is, using semi-supervised training to improve the recognition and classification performance of the neural network, is the process of developing an autonomous driving system. It is an important research task that can increase efficiency and reduce costs.
在相关技术中,利用半监督训练算法提升语义分割任务的研究主要集中在二维图像领域。针对三维点云场景,尤其是基于激光雷达扫描结果的三维点云语义分割模型的半监督训练算法研究仍然处于一个空白阶段。由于二维图像和三维点云之间存在模态差异,二维图像上的点云语义分割半监督训练算法无法直接、有效地移植到三维点云语义分割任务当中。例如,在通过环视投影图对三维点云进行语义分割时,利用常规的二维图像数据增强方法,如增加噪声、旋转、缩放等会导致三维点云的三维形状失真,进而影响模型的训练效果。Among related technologies, research on using semi-supervised training algorithms to improve semantic segmentation tasks mainly focuses on the field of two-dimensional images. Research on semi-supervised training algorithms for 3D point cloud scenes, especially 3D point cloud semantic segmentation models based on LiDAR scanning results, is still at a blank stage. Due to the modal differences between 2D images and 3D point clouds, the semi-supervised training algorithm for point cloud semantic segmentation on 2D images cannot be directly and effectively transplanted to the 3D point cloud semantic segmentation task. For example, when performing semantic segmentation on a 3D point cloud through a surround projection image, using conventional 2D image data enhancement methods, such as adding noise, rotation, scaling, etc., will cause the 3D shape of the 3D point cloud to be distorted, thereby affecting the training effect of the model. .
有鉴于此,本公开的实施例提供了一种能够有效利用大量激光雷达原始点云数据,同时辅以少量有标注数据,能够对点云语义分割网络进行半监督训练的方法,在该方法中,提出了一种分区混合的数据 增强策略,通过混合两个不同的环视投影图来提升模型的识别难度,减少了数据增强过程对三维点云的形状信息的丢失,进而提高模型的训练效果,及模型的鲁棒性和可靠性。In view of this, embodiments of the present disclosure provide a method that can effectively utilize a large amount of lidar raw point cloud data, supplemented by a small amount of labeled data, to conduct semi-supervised training of a point cloud semantic segmentation network. In this method , proposed a partitioned mixed data The enhancement strategy improves the recognition difficulty of the model by mixing two different surround projection images, reducing the loss of shape information of the three-dimensional point cloud during the data enhancement process, thereby improving the training effect of the model, and the robustness and reliability of the model. .
Specifically, embodiments of the present disclosure provide a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, an apparatus, an electronic device, and a storage medium. The training method includes: mapping multiple sets of point cloud data into an initial view to obtain multiple surround-view projection images; partitioning a first surround-view projection image and a second surround-view projection image, respectively, based on a preset size to obtain multiple first partition maps and multiple second partition maps, where the first and second surround-view projection images belong to the multiple surround-view projection images; determining multiple first target partition maps from the multiple first partition maps; replacing, with each of the multiple first target partition maps, a second target partition map in the second surround-view projection image to obtain a mixed projection image, where the second target partition maps belong to the multiple second partition maps and each first target partition map occupies the same position as the second target partition map it replaces; and training an initial network with the first surround-view projection image and the mixed projection image as training samples to obtain the point cloud semantic segmentation network.
FIG. 1 schematically illustrates an exemplary system architecture to which the point cloud semantic segmentation network training method, the point cloud semantic segmentation method, and the corresponding apparatus according to embodiments of the present disclosure can be applied. It should be noted that FIG. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied, intended to help those skilled in the art understand the technical content of the present disclosure; it does not mean that the embodiments cannot be used with other devices, systems, environments, or scenarios.
As shown in FIG. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, and 103, a network 104, and a server 105.
The terminal devices 101, 102, and 103 may be various devices equipped with a lidar, various electronic devices capable of controlling a lidar, or various electronic devices capable of storing point cloud data.
The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links.
The server 105 may be a server that provides various services; for example, it may supply the computing and storage resources that support the training of the point cloud semantic segmentation network.
It should be noted that the point cloud semantic segmentation network training method or the point cloud semantic segmentation method provided by embodiments of the present disclosure can generally be executed by the server 105. Accordingly, the corresponding training apparatus or segmentation apparatus can generally be deployed in the server 105. The terminal devices 101, 102, and 103 may collect point cloud data themselves, or may obtain point cloud data collected by other terminal devices, for example over the Internet; the point cloud data can then be sent to the server 105 over the network so that the server 105 executes the methods provided by the embodiments of the present disclosure, either to train the point cloud semantic segmentation network or to perform semantic segmentation on the point cloud data. The methods may also be executed by a server or server cluster that is different from the server 105 and can communicate with the terminal devices 101, 102, 103 and/or the server 105; accordingly, the training apparatus or segmentation apparatus may likewise be deployed in such a server or server cluster. Alternatively, the methods may be executed by the terminal device 101, 102, or 103, or by another terminal device different from them; accordingly, the apparatus may also be deployed in the terminal device 101, 102, or 103, or in another terminal device.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided according to implementation needs.
FIG. 2 schematically shows a flowchart of a point cloud semantic segmentation network training method according to an embodiment of the present disclosure.
As shown in FIG. 2, the method includes operations S201 to S205.
In operation S201, multiple sets of point cloud data are respectively mapped into an initial view to obtain multiple surround-view projection images.
In operation S202, a first surround-view projection image and a second surround-view projection image are each partitioned based on a preset size to obtain multiple first partition maps and multiple second partition maps, where the first and second surround-view projection images belong to the multiple surround-view projection images.
In operation S203, multiple first target partition maps are determined from the multiple first partition maps.
In operation S204, each of the multiple first target partition maps replaces a second target partition map in the second surround-view projection image to obtain a mixed projection image, where the second target partition maps belong to the multiple second partition maps and each first target partition map occupies the same position as the second target partition map it replaces.
In operation S205, the first surround-view projection image and the mixed projection image are used as training samples to train an initial network, yielding the point cloud semantic segmentation network.
According to embodiments of the present disclosure, the point cloud data may be collected by a sensing device such as a rotating-scan lidar. Each set of point cloud data may be associated with a preset Cartesian coordinate system: each point in the set is represented as a three-dimensional coordinate in that system, and the origin of the system represents the position of the sensing device when the point cloud data was collected.
According to embodiments of the present disclosure, the point cloud data collected by a rotating-scan lidar is distributed within a sphere, and the initial view can be obtained by unrolling the annular band of that sphere near the horizontal plane. For each point in the point cloud, a direction vector for the mapping is determined from the point's coordinates, and the point is then projected onto the initial view along that vector.
According to embodiments of the present disclosure, partitioning a surround-view projection image based on the preset size divides it into multiple equal rectangular regions. The preset size can be chosen according to the dimensions of the projection image in the specific application scenario and is not limited here. For example, for a projection image with a resolution of 24×480, the image can be divided into 16 equal parts along its width and 6 equal parts along its height, yielding a total of 96 partition maps with a resolution of 4×30 each.
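As a concrete illustration of this partitioning scheme, the sketch below splits a 24×480 range image into the 96 tiles of 4×30 described above. The function name and the NumPy-based implementation are illustrative assumptions, not part of the disclosure.

```python
import numpy as np

def partition(image, block_h, block_w):
    """Split an H x W range image into equal (block_h x block_w) tiles.

    Returns an array of shape (H/block_h, W/block_w, block_h, block_w),
    indexed first by tile row and tile column.
    """
    H, W = image.shape
    assert H % block_h == 0 and W % block_w == 0
    tiles = image.reshape(H // block_h, block_h, W // block_w, block_w)
    return tiles.transpose(0, 2, 1, 3)

# The 24x480 projection from the text, split 6 ways along the height and
# 16 ways along the width, yields 96 tiles of 4x30 each.
img = np.arange(24 * 480, dtype=np.float32).reshape(24, 480)
tiles = partition(img, 4, 30)
```

The inverse operation (reassembling the tiles) is the same reshape/transpose sequence applied in reverse, which is what makes tile-level replacement cheap.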
According to embodiments of the present disclosure, the first and second surround-view projection images may be selected at random from the multiple surround-view projection images. The two images may have entirely different characteristics; that is, the point cloud data corresponding to the first projection image and the point cloud data corresponding to the second may have been collected from different objects in different scenes.
According to embodiments of the present disclosure, the first target partition maps may be sampled at random from the multiple first partition maps and may make up a certain proportion of them, for example 25% or 30%; the proportion is not limited here.
According to embodiments of the present disclosure, replacing the second target partition maps with the first target partition maps may include: determining each second target partition map in the second surround-view projection image from the position information of the corresponding first target partition map, deleting the second target partition map, and filling the first target partition map into the vacated position.
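The replacement step can be sketched as follows: a random subset of tile positions is chosen, and the tiles of the second image at those positions are overwritten by the co-located tiles of the first. The 25% ratio, the fixed random seed, and the NumPy implementation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_projections(img_a, img_b, block_h, block_w, ratio=0.25):
    """Replace a random subset of img_b's tiles with the co-located
    tiles of img_a, as in operation S204. `ratio` is the fraction of
    tiles taken from img_a (an example value, not fixed by the text)."""
    H, W = img_a.shape
    mixed = img_b.copy()
    rows, cols = H // block_h, W // block_w
    n_pick = int(rows * cols * ratio)
    picked = rng.choice(rows * cols, size=n_pick, replace=False)
    mask = np.zeros((rows, cols), dtype=bool)
    mask.flat[picked] = True
    for i in range(rows):
        for j in range(cols):
            if mask[i, j]:  # same position in both images
                mixed[i*block_h:(i+1)*block_h, j*block_w:(j+1)*block_w] = \
                    img_a[i*block_h:(i+1)*block_h, j*block_w:(j+1)*block_w]
    return mixed, mask

a = np.ones((24, 480), dtype=np.float32)
b = np.zeros((24, 480), dtype=np.float32)
mixed, mask = mix_projections(a, b, 4, 30, ratio=0.25)
```

Keeping the mask is useful later: the loss terms described below distinguish the replaced regions from the untouched background.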
According to embodiments of the present disclosure, the method used to train the initial network is not limited here; for example, gradient descent or least squares may be used. Training parameters such as the number of epochs, batch size, and learning rate can be set according to the specific application scenario and are likewise not limited here.
According to embodiments of the present disclosure, when training the point cloud semantic segmentation network, the point cloud data is mapped into surround-view projection images, and the first and second projection images are partition-mixed: selected partitions of the first projection image replace the corresponding partitions of the second, producing a mixed projection image. The mixed projection image and the first projection image are then used to train the initial network until the point cloud semantic segmentation network is obtained. Partition mixing forcibly decouples the replaced partitions from their background, which enriches the data, reduces the network's reliance on background and global information when predicting local regions, and improves its recognition ability. At the same time, partition mixing effectively preserves the 3D shape of the original point cloud as projected onto the surround-view image, at least partially overcoming the 3D deformation and shape-information loss caused by conventional data augmentation, and thus improves the robustness of the network. These technical means also improve the utilization of hardware resources during network training.
The method shown in FIG. 2 is further described below with reference to FIG. 3 and specific embodiments.
According to embodiments of the present disclosure, the surround-view projection images can be obtained by the method of operation S201. Specifically, operation S201 may include the following operations:
For each set of point cloud data: converting the three-dimensional coordinates of each point into polar coordinates; mapping the multiple points of the point cloud into the multiple grid cells of the initial view based on their polar coordinates; for each grid cell of the initial view, determining the cell's feature data from the three-dimensional coordinates and polar coordinates of the points it contains; and constructing the surround-view projection image from the feature data of the multiple grid cells.
According to embodiments of the present disclosure, each point in the point cloud has three-dimensional coordinates x, y, and z. Polar conversion of the point yields its coordinates in a rotating coordinate system, namely the yaw and pitch angles, which constitute the polar coordinate data.
According to embodiments of the present disclosure, a grid cell of the initial view corresponds to a single pixel of that view. For example, an initial view with a resolution of 20×480 has 9600 pixels and, correspondingly, 9600 grid cells.
According to embodiments of the present disclosure, when multiple points map into the same grid cell, the feature data of the point closest to the origin may be taken as the cell's feature data. A point's feature data may include its three-dimensional coordinates, its polar coordinates, and data derived from them, such as reflectivity and depth.
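The projection described in the preceding paragraphs — polar conversion to yaw/pitch, mapping into grid cells, and keeping the nearest point per cell — can be sketched as below. The vertical field-of-view bounds and the depth-only cell feature are illustrative assumptions; a real implementation would store the full feature vector (coordinates, reflectivity, depth) per cell.

```python
import numpy as np

def project_to_range_image(points, H=24, W=480,
                           fov_up=np.deg2rad(15.0),
                           fov_down=np.deg2rad(-25.0)):
    """Project an N x 3 lidar point cloud onto an H x W surround view.

    yaw (azimuth) indexes the columns and pitch (elevation) the rows;
    when several points fall into one cell, the point closest to the
    sensor wins, as described in the text.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                       # azimuth in [-pi, pi]
    pitch = np.arcsin(z / depth)                 # elevation angle
    u = ((yaw + np.pi) / (2 * np.pi) * W).astype(int).clip(0, W - 1)
    v = ((fov_up - pitch) / (fov_up - fov_down) * H).astype(int).clip(0, H - 1)
    image = np.zeros((H, W), dtype=np.float32)
    best = np.full((H, W), np.inf)
    for ui, vi, di in zip(u, v, depth):
        if di < best[vi, ui]:                    # keep the nearest point
            best[vi, ui] = di
            image[vi, ui] = di
    return image

pts = np.array([[5.0, 0.0, 0.0],    # straight ahead, 5 m
                [10.0, 0.0, 0.0],   # same direction, farther away
                [0.0, 5.0, 0.0]])   # 90 degrees to the left, 5 m
img = project_to_range_image(pts)
```

Only two cells are filled here: the two co-directional points compete for one cell and the 5 m point wins, illustrating the nearest-point rule.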
FIG. 3 schematically shows the training pipeline of the point cloud semantic segmentation network according to an embodiment of the present disclosure.
As shown in FIG. 3, the training pipeline of the point cloud semantic segmentation network may include a sample preprocessing stage and an iterative network training stage.
According to embodiments of the present disclosure, during sample preprocessing, some partitions of the first surround-view projection image are substituted into the second surround-view projection image to obtain the mixed projection image. For the specific procedure, refer to operations S202 to S204, which are not repeated here.
According to embodiments of the present disclosure, the iterative network training stage may input the first surround-view projection image and the mixed projection image into the initial network as a sample pair and adjust the model parameters of the initial network based on the configured loss function and an iterative optimization method such as gradient descent or least squares, thereby training the initial network.
According to embodiments of the present disclosure, the initial network may include an encoder and a decoder.
According to embodiments of the present disclosure, inputting the first surround-view projection image and the mixed projection image into the initial network to obtain a first feature map and a first segmentation result corresponding to the first projection image, and a second feature map and a second segmentation result corresponding to the mixed projection image, may include the following operations: inputting the first projection image and the mixed projection image into the encoder to obtain first image features corresponding to the first projection image and second image features corresponding to the mixed projection image; and inputting the first and second image features into the decoder to obtain the first feature map and the first segmentation result, and the second feature map and the second segmentation result, respectively.
According to embodiments of the present disclosure, the encoder may be any feature extraction network, such as ResNet18.
According to embodiments of the present disclosure, the decoder may be any feature upsampling network, such as UPerNet.
According to embodiments of the present disclosure, the iterative network training stage may specifically include the following operations:
inputting the first surround-view projection image and the mixed projection image into the initial network to obtain a first feature map and a first segmentation result corresponding to the first projection image, and a second feature map and a second segmentation result corresponding to the mixed projection image; computing the information entropy loss between the first feature map and the second feature map to obtain a first loss value; computing the cross-entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value; and adjusting the model parameters of the initial network with the first and second loss values until the point cloud semantic segmentation network is obtained.
According to embodiments of the present disclosure, the first segmentation result may represent the semantic segmentation of each region of the first surround-view projection image.
According to embodiments of the present disclosure, the first feature map may have the same size as the first surround-view projection image, with regions of different semantic classes rendered in different colors. For example, the regions with different semantic features may correspond to people, vehicles, and obstacles, shown in red, blue, and green, respectively.
According to embodiments of the present disclosure, computing the information entropy loss between the first feature map and the second feature map to obtain the first loss value may include the following operations: determining, from the first feature map, a first sub-feature map related to the multiple first target partition maps; splitting the second feature map into a second sub-feature map related to the multiple first target partition maps and a third sub-feature map unrelated to them; and, when the confidence probability of the first sub-feature map exceeds a preset threshold, taking the first and second sub-feature maps as a positive sample pair and the first and third sub-feature maps as a negative sample pair, and computing the information entropy loss between the positive and negative pairs to obtain the first loss value.
According to embodiments of the present disclosure, since the first feature map may have the same size as the first surround-view projection image, the first sub-feature map can be located within the first feature map using the position information of the first target partition maps in the first projection image.
According to embodiments of the present disclosure, the method for computing the confidence probability of the first sub-feature map is not limited here; for example, it may be determined using a Gaussian formula.
According to embodiments of the present disclosure, the preset threshold can be chosen according to the specific application scenario, for example 90% or 95%, and is not limited here.
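As a minimal sketch of this confidence gate, the snippet below uses the maximum softmax probability as the confidence measure. The disclosure leaves the measure open (mentioning, e.g., a Gaussian formula), so this particular choice and the function names are assumptions for illustration.

```python
import numpy as np

def softmax(logits, axis=-1):
    # numerically stable softmax over per-class logits
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def confident_enough(logits, threshold=0.9):
    """Gate a sub-feature map by prediction confidence: the pair only
    contributes to the first loss value when confidence > threshold."""
    probs = softmax(np.asarray(logits, dtype=float))
    return float(probs.max()) > threshold

sharp = np.array([8.0, 0.0, 0.0])  # peaked distribution -> confident
flat = np.array([1.0, 1.0, 1.0])   # uniform distribution -> not confident
```

With the 90% threshold from the text, the peaked distribution passes the gate and the uniform one does not.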
According to embodiments of the present disclosure, the information entropy loss may be computed as shown in formula (1), where L1 denotes the information entropy loss, fp the first sub-feature map, fx the second sub-feature map, and fy the third sub-feature map.
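The exact expression of formula (1) is not reproduced in the text above. One plausible reading, consistent with the positive/negative-pair construction described earlier, is a cosine-similarity contrastive term that pulls the positive pair (fp, fx) together and pushes the negative pair (fp, fy) apart; the InfoNCE-style form and the temperature value below are assumptions for illustration only, not the disclosure's formula.

```python
import numpy as np

def contrastive_loss(fp, fx, fy, tau=0.1):
    """Contrastive sketch of formula (1): (fp, fx) is the positive pair,
    (fp, fy) the negative pair. tau is an assumed temperature."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    pos = np.exp(cos(fp, fx) / tau)
    neg = np.exp(cos(fp, fy) / tau)
    return float(-np.log(pos / (pos + neg)))

fp = np.array([1.0, 0.0])
fx = np.array([0.9, 0.1])    # similar to fp -> small loss
fy = np.array([-1.0, 0.0])   # dissimilar to fp -> pushed away
loss_good = contrastive_loss(fp, fx, fy)
loss_bad = contrastive_loss(fp, fy, fx)  # swapped pairs -> larger loss
```

The loss is small when the positive pair is aligned and the negative pair is not, and grows when the roles are swapped.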
According to embodiments of the present disclosure, computing the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value may include the following operations: determining, from the first segmentation result, a first sub-segmentation result related to the multiple first target partition maps; determining, from the second segmentation result, a second sub-segmentation result related to the multiple first target partition maps; determining a predicted value and a label value based on the confidence probabilities of the first and second sub-segmentation results; and computing the cross-entropy loss between the predicted value and the label value to obtain the second loss value.
According to embodiments of the present disclosure, the first segmentation result may have the same size as the first surround-view projection image, so the first sub-segmentation result can be located within the first segmentation result using the position information of the first target partition maps in the first projection image.
According to embodiments of the present disclosure, the method for computing the confidence probabilities of the first and second sub-segmentation results is not limited here; for example, they may be determined using a Gaussian formula.
According to embodiments of the present disclosure, the predicted value and the label value can be determined by comparing the confidence probabilities of the two sub-segmentation results. Specifically, when the confidence probability of the first sub-segmentation result is greater than that of the second, the first sub-segmentation result serves as the label value and the second as the predicted value; when the confidence probability of the first sub-segmentation result is less than that of the second, the first sub-segmentation result serves as the predicted value and the second as the label value.
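The comparison above reduces to a simple selection rule, sketched here with toy values; the function name and the string-valued results are illustrative assumptions.

```python
def choose_label_and_prediction(result_a, conf_a, result_b, conf_b):
    """Per the text: the more confident of the two sub-segmentation
    results acts as the (pseudo-)label, the other as the prediction.
    Returns the pair (label, prediction)."""
    if conf_a > conf_b:
        return result_a, result_b
    return result_b, result_a

# First result more confident -> it becomes the label.
label, pred = choose_label_and_prediction("road", 0.95, "car", 0.60)
# First result less confident -> roles are swapped.
label2, pred2 = choose_label_and_prediction("road", 0.50, "car", 0.90)
```

In practice the same rule would be applied per pixel on class-probability maps rather than on scalar toy values.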
According to embodiments of the present disclosure, the cross-entropy loss may be computed as shown in formula (2):

L2 = -∑(y·log(yp) + (1-y)·log(1-yp))   (2)

where L2 denotes the cross-entropy loss, y the label value, and yp the predicted value.
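Formula (2) can be sketched numerically as follows, written with the conventional leading minus sign so that the loss is non-negative; the clipping constant `eps` is an implementation detail added here to avoid log(0), not part of the disclosure.

```python
import numpy as np

def cross_entropy(y, y_pred, eps=1e-12):
    """Binary cross-entropy per formula (2):
    L2 = -sum(y*log(yp) + (1-y)*log(1-yp))."""
    y = np.asarray(y, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return float(-np.sum(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred)))

# Predictions that agree with the labels give a small loss;
# confidently wrong predictions give a large one.
loss_match = cross_entropy([1, 0, 1], [0.99, 0.01, 0.99])
loss_mismatch = cross_entropy([1, 0, 1], [0.01, 0.99, 0.01])
```

In the semi-supervised scheme above, `y` is the more confident sub-segmentation result (the pseudo-label) and `y_pred` the less confident one.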
According to embodiments of the present disclosure, the total loss used to adjust the model parameters of the initial network may be a weighted sum of the information entropy loss and the cross-entropy loss; the weight is a hyperparameter that the user may set freely during model tuning.
According to embodiments of the present disclosure, the multiple first target partition maps may include a third target partition map that carries ground-truth labels.
According to embodiments of the present disclosure, when the third target partition map is determined to exist, computing the cross-entropy loss between the first and second segmentation results to obtain the second loss value may include the following operations: determining, from the first segmentation result, a third sub-segmentation result related to the third target partition map, and a fourth sub-segmentation result unrelated to the third target partition map but related to the multiple first target partition maps; determining, from the second segmentation result, a fifth sub-segmentation result related to the third target partition map, and a sixth sub-segmentation result unrelated to the third target partition map but related to the multiple first target partition maps; computing the cross-entropy loss between the third sub-segmentation result and the ground-truth labels to obtain a third loss value; computing the cross-entropy loss between the fourth and sixth sub-segmentation results to obtain a fourth loss value; and determining the second loss value based on the third and fourth loss values.
According to embodiments of the present disclosure, this loss design allows the network to be trained on raw unlabeled data supplemented by a small amount of labeled data, realizing semi-supervised training of the point cloud semantic segmentation network and thereby reducing the cost of data annotation while preserving the network's semantic segmentation quality.
FIG. 4 schematically shows a flowchart of a point cloud semantic segmentation method according to an embodiment of the present disclosure.
As shown in FIG. 4, the method includes operations S401 and S402.
In operation S401, target point cloud data is mapped into the initial view to obtain a surround-view projection image.
In operation S402, the surround-view projection image is input into the point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data.
根据本公开的实施例,该点云语义分割网络可以利用上述点云语义分割网络训练方法部分的方法训练得到,在此不再赘述。According to embodiments of the present disclosure, the point cloud semantic segmentation network can be trained using the method in the point cloud semantic segmentation network training method section, which will not be described again here.
图5示意性示出了根据本公开实施例的点云语义分割网络训练装置的框图。Figure 5 schematically shows a block diagram of a point cloud semantic segmentation network training device according to an embodiment of the present disclosure.
如图5所示,点云语义分割网络训练装置500包括第一映射模块510、第一处理模块520、确定模块530、第二处理模块540和训练模块550。As shown in FIG. 5 , the point cloud semantic segmentation network training device 500 includes a first mapping module 510 , a first processing module 520 , a determination module 530 , a second processing module 540 and a training module 550 .
第一映射模块510,用于将多组点云数据分别映射到初始视图中,得到多个环视投影图。The first mapping module 510 is used to map multiple sets of point cloud data to the initial view respectively to obtain multiple surround projections.
第一处理模块520,用于基于预设尺寸,分别对第一环视投影图和第二环视投影图进行分区处理,得到多个第一分区图和多个第二分区图,其中,第一环视投影图和第二环视投影图属于多个环视投影图。The first processing module 520 is configured to partition the first surround projection image and the second surround projection image respectively based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, where the first surround projection image and the second surround projection image belong to the plurality of surround projection images.
确定模块530,用于从多个第一分区图中确定多个第一目标分区图。The determining module 530 is configured to determine a plurality of first target partition maps from a plurality of first partition maps.
第二处理模块540,用于利用多个第一目标分区图中的每个第一目标分区图分别对第二环视投影图中的第二目标分区图进行替换,得到混合投影图,其中,第二目标分区图属于多个第二分区图,第一目标分区图与第二目标分区图的位置相同。The second processing module 540 is configured to replace the second target partition map in the second surround projection image with each first target partition map of the plurality of first target partition maps to obtain a mixed projection image, where the second target partition map belongs to the plurality of second partition maps and the first target partition map has the same position as the second target partition map.
训练模块550,用于将第一环视投影图和混合投影图作为训练样本来对初始网络进行训练,得到点云语义分割网络。The training module 550 is used to train the initial network using the first surround projection image and the mixed projection image as training samples to obtain a point cloud semantic segmentation network.
根据本公开的实施例,在训练点云语义分割网络时,可以将点云数据映射为环视投影图,并对第一环视投影图和第二环视投影图进行分区混合,即使用第一环视投影图中的部分分区对第二环视投影图中对应的分区进行替换,得到混合投影图,之后,可以利用混合投影图和第一环视投影图来训练初始网络,以最终得到点云语义分割网络。通过分区混合的方式,可以实现该部分分区与背景的强制解耦,能够有效提升数据的丰富度,降低网络在预测局部区域时对背景、全局信息的依赖,提高网络的识别能力。同时,通过分区混合的方式,还可以有效地保留原始点云投影在环视投影图上的三维形状,可以至少部分地克服数据增强导致的三维形变和形状信息丢失的问题,可以提升网络的鲁棒性。通过上述技术手段,可以有效提升网络训练过程中对硬件资源的利用效率。According to the embodiments of the present disclosure, when training the point cloud semantic segmentation network, the point cloud data can be mapped into surround projection images, and the first surround projection image and the second surround projection image can be partitioned and mixed, i.e., some partitions of the first surround projection image replace the corresponding partitions of the second surround projection image to obtain a mixed projection image. The mixed projection image and the first surround projection image can then be used to train the initial network to finally obtain the point cloud semantic segmentation network. Partition mixing forcibly decouples these partitions from their background, which effectively enriches the data, reduces the network's dependence on background and global information when predicting local regions, and improves the recognition ability of the network. At the same time, partition mixing effectively preserves the three-dimensional shape of the original point cloud as projected onto the surround projection image, which can at least partially overcome the three-dimensional deformation and loss of shape information caused by data augmentation, improving the robustness of the network. Through the above technical means, the utilization efficiency of hardware resources during network training can be effectively improved.
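The partition-mixing operation described above can be sketched as follows. The tile size, the number of transplanted tiles, and the random tile selection are illustrative assumptions; the patent only requires that same-position partitions of the first surround projection image replace those of the second.

```python
import numpy as np

def mix_partitions(proj_a, proj_b, tile_h, tile_w, num_tiles, rng=None):
    """Replace randomly chosen same-position tiles of proj_b with tiles of
    proj_a, returning the mixed projection and a mask of replaced pixels."""
    rng = np.random.default_rng() if rng is None else rng
    c, h, w = proj_a.shape
    mixed = proj_b.copy()
    mask = np.zeros((h, w), dtype=bool)
    rows, cols = h // tile_h, w // tile_w
    # Pick distinct tile indices to transplant from proj_a into proj_b.
    chosen = rng.choice(rows * cols, size=num_tiles, replace=False)
    for idx in chosen:
        r, cidx = divmod(int(idx), cols)
        ys, xs = r * tile_h, cidx * tile_w
        mixed[:, ys:ys + tile_h, xs:xs + tile_w] = proj_a[:, ys:ys + tile_h, xs:xs + tile_w]
        mask[ys:ys + tile_h, xs:xs + tile_w] = True
    return mixed, mask
```

The returned mask identifies which pixels of the mixed projection image came from the first surround projection image, which is exactly the bookkeeping the partition-aware losses above rely on.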
根据本公开的实施例,训练模块550包括第一训练子模块、第二训练子模块、第三训练子模块和第四训练子模块。According to an embodiment of the present disclosure, the training module 550 includes a first training sub-module, a second training sub-module, a third training sub-module and a fourth training sub-module.
第一训练子模块,用于分别将第一环视投影图和混合投影图输入初始网络中,得到与第一环视投影图对应的第一特征图谱和第一分割结果,以及与混合投影图对应的第二特征图谱和第二分割结果。The first training submodule is configured to input the first surround projection image and the mixed projection image into the initial network respectively, to obtain a first feature map and a first segmentation result corresponding to the first surround projection image, and a second feature map and a second segmentation result corresponding to the mixed projection image.
第二训练子模块,用于计算第一特征图谱和第二特征图谱之间的信息熵损失,得到第一损失值。The second training submodule is used to calculate the information entropy loss between the first feature map and the second feature map to obtain the first loss value.
第三训练子模块,用于计算第一分割结果和第二分割结果之间的交叉熵损失,得到第二损失值。The third training submodule is used to calculate the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value.
第四训练子模块,用于利用第一损失值和第二损失值来调整初始网络的模型参数,以最终得到点云数据语义分割网络。The fourth training submodule is used to use the first loss value and the second loss value to adjust the model parameters of the initial network to finally obtain the point cloud data semantic segmentation network.
根据本公开的实施例,第二训练子模块包括第一训练单元、第二训练单元和第三训练单元。 According to an embodiment of the present disclosure, the second training sub-module includes a first training unit, a second training unit and a third training unit.
第一训练单元,用于从第一特征图谱中确定与多个第一目标分区图相关的第一子特征图谱。The first training unit is configured to determine the first sub-feature map related to the plurality of first target partition maps from the first feature map.
第二训练单元,用于将第二特征图谱拆分为与多个第一目标分区图相关的第二子特征图谱和与多个第一目标分区图无关的第三子特征图谱。The second training unit is used to split the second feature map into a second sub-feature map related to the plurality of first target partition maps and a third sub-feature map unrelated to the plurality of first target partition maps.
第三训练单元,用于在第一子特征图谱的置信概率大于预设阈值的情况下,以第一子特征图谱和第二子特征图谱作为正样本对,以第一子特征图谱和第三子特征图谱作为负样本对,计算正样本对和负样本对之间的信息熵损失,得到第一损失值。The third training unit is configured to, when the confidence probability of the first sub-feature map is greater than a preset threshold, take the first sub-feature map and the second sub-feature map as a positive sample pair and the first sub-feature map and the third sub-feature map as a negative sample pair, and calculate the information entropy loss between the positive sample pair and the negative sample pair to obtain the first loss value.
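One plausible realization of this information entropy loss over a positive and a negative sample pair is an InfoNCE-style contrastive term on pooled sub-feature vectors. The cosine-similarity formulation, the temperature `tau`, and the threshold gating below are assumptions for illustration; the patent does not give the exact formula.

```python
import numpy as np

def info_entropy_loss(feat1, feat2, feat3, conf, threshold=0.9, tau=0.1):
    # Apply the loss only when the anchor's confidence exceeds the preset threshold.
    if conf <= threshold:
        return 0.0
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    pos = np.exp(cos(feat1, feat2) / tau)   # positive pair: (feat1, feat2)
    neg = np.exp(cos(feat1, feat3) / tau)   # negative pair: (feat1, feat3)
    # InfoNCE-style term: pull the positive pair together, push the negative apart.
    return float(-np.log(pos / (pos + neg)))
```

With identical positive features and an opposite-direction negative feature, the loss is close to zero; below the confidence threshold the pair contributes nothing.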
根据本公开的实施例,第三训练子模块包括第四训练单元、第五训练单元、第六训练单元和第七训练单元。According to an embodiment of the present disclosure, the third training sub-module includes a fourth training unit, a fifth training unit, a sixth training unit and a seventh training unit.
第四训练单元,用于从第一分割结果中确定与多个第一目标分区图相关的第一子分割结果。The fourth training unit is configured to determine the first sub-segmentation results related to the plurality of first target partition maps from the first segmentation results.
第五训练单元,用于从第二分割结果中确定与多个第一目标分区图相关的第二子分割结果。A fifth training unit is configured to determine second sub-segmentation results related to the plurality of first target partition maps from the second segmentation results.
第六训练单元,用于基于第一子分割结果的置信概率和第二子分割结果的置信概率,确定预测值和标签值。The sixth training unit is used to determine the prediction value and the label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result.
第七训练单元,用于计算预测值和标签值之间的交叉熵损失,得到第二损失值。The seventh training unit is used to calculate the cross-entropy loss between the predicted value and the label value to obtain the second loss value.
根据本公开的实施例,第六训练单元包括第一训练子单元和第二训练子单元。According to an embodiment of the present disclosure, the sixth training unit includes a first training sub-unit and a second training sub-unit.
第一训练子单元,用于在第一子分割结果的置信概率大于第二子分割结果的置信概率的情况下,确定第一子分割结果为标签值,第二子分割结果为预测值。The first training subunit is used to determine that the first sub-segmentation result is the label value and the second sub-segmentation result is the predicted value when the confidence probability of the first sub-segmentation result is greater than the confidence probability of the second sub-segmentation result.
第二训练子单元,用于在第一子分割结果的置信概率小于第二子分割结果的置信概率的情况下,确定第一子分割结果为预测值,第二子分割结果为标签值。The second training subunit is used to determine that the first sub-segmentation result is the predicted value and the second sub-segmentation result is the label value when the confidence probability of the first sub-segmentation result is less than the confidence probability of the second sub-segmentation result.
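The role assignment performed by these two training sub-units maps directly to a small helper: whichever sub-segmentation result has the higher confidence probability acts as the label value, and the other as the predicted value. (The tie-breaking choice when the confidences are equal is an assumption.)

```python
def choose_pseudo_label(sub1, conf1, sub2, conf2):
    """Return (predicted_value, label_value): the higher-confidence
    sub-segmentation result serves as the label, the other as the prediction."""
    if conf1 > conf2:
        return sub2, sub1
    return sub1, sub2
```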
根据本公开的实施例,多个第一目标分区图中包括第三目标分区图,第三目标分区图具有真实标签。 According to an embodiment of the present disclosure, the plurality of first target partition maps include a third target partition map, and the third target partition map has a real label.
根据本公开的实施例,第三训练子模块包括第八训练单元、第九训练单元、第十训练单元、第十一训练单元和第十二训练单元。According to an embodiment of the present disclosure, the third training sub-module includes an eighth training unit, a ninth training unit, a tenth training unit, an eleventh training unit and a twelfth training unit.
第八训练单元,用于从第一分割结果中确定与第三目标分区图相关的第三子分割结果,和与第三目标分区图无关且与多个第一目标分区图相关的第四子分割结果。The eighth training unit is configured to determine, from the first segmentation result, a third sub-segmentation result related to the third target partition map and a fourth sub-segmentation result unrelated to the third target partition map but related to the plurality of first target partition maps.
第九训练单元,用于从第二分割结果中确定与第三目标分区图相关的第五子分割结果,和与第三目标分区图无关且与多个第一目标分区图相关的第六子分割结果。The ninth training unit is configured to determine, from the second segmentation result, a fifth sub-segmentation result related to the third target partition map and a sixth sub-segmentation result unrelated to the third target partition map but related to the plurality of first target partition maps.
第十训练单元,用于计算第三子分割结果和真实标签之间的交叉熵损失,得到第三损失值。The tenth training unit is used to calculate the cross-entropy loss between the third sub-segmentation result and the real label to obtain the third loss value.
第十一训练单元,用于计算第四子分割结果和第六子分割结果之间的交叉熵损失,得到第四损失值。The eleventh training unit is used to calculate the cross-entropy loss between the fourth sub-segmentation result and the sixth sub-segmentation result to obtain the fourth loss value.
第十二训练单元,用于基于第三损失值和第四损失值,确定第二损失值。The twelfth training unit is used to determine the second loss value based on the third loss value and the fourth loss value.
根据本公开的实施例,初始网络包括编码器和解码器。According to an embodiment of the present disclosure, the initial network includes an encoder and a decoder.
根据本公开的实施例,第一训练子模块包括第十三训练单元和第十四训练单元。According to an embodiment of the present disclosure, the first training sub-module includes a thirteenth training unit and a fourteenth training unit.
第十三训练单元,用于分别将第一环视投影图和混合投影图输入编码器,得到与第一环视投影图对应的第一图像特征和与混合投影图对应的第二图像特征。The thirteenth training unit is used to input the first surround projection image and the hybrid projection image into the encoder respectively, and obtain the first image feature corresponding to the first surround projection image and the second image feature corresponding to the hybrid projection image.
第十四训练单元,用于分别将第一图像特征和第二图像特征输入解码器,得到与第一环视投影图对应的第一特征图谱和第一分割结果,以及与混合投影图对应的第二特征图谱和第二分割结果。The fourteenth training unit is configured to input the first image feature and the second image feature into the decoder respectively, to obtain the first feature map and the first segmentation result corresponding to the first surround projection image, and the second feature map and the second segmentation result corresponding to the mixed projection image.
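The encoder/decoder data flow of the thirteenth and fourteenth training units can be illustrated with a toy model. The actual network architecture is not specified in this excerpt; the two linear layers, the ReLU, and the per-pixel softmax head below are placeholders showing only the shapes involved (projection → image features → feature map + segmentation result).

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyEncoderDecoder:
    """Toy stand-in: the encoder maps a surround projection to image features,
    the decoder maps those features to a per-pixel feature map and a
    segmentation result (class probabilities)."""
    def __init__(self, in_ch, feat_ch, num_classes):
        self.w_enc = rng.standard_normal((feat_ch, in_ch)) * 0.1
        self.w_dec = rng.standard_normal((num_classes, feat_ch)) * 0.1

    def forward(self, proj):                                  # proj: (in_ch, H, W)
        feats = np.einsum('fc,chw->fhw', self.w_enc, proj)    # "encoder"
        feats = np.maximum(feats, 0.0)                        # ReLU
        logits = np.einsum('kf,fhw->khw', self.w_dec, feats)  # "decoder"
        e = np.exp(logits - logits.max(axis=0, keepdims=True))
        seg = e / e.sum(axis=0, keepdims=True)                # per-pixel softmax
        return feats, seg
```

The first surround projection image and the mixed projection image would each be passed through `forward` to obtain their respective feature maps and segmentation results.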
根据本公开的实施例,第一映射模块510包括第一映射单元、第二映射单元、第三映射单元和第四映射单元。According to an embodiment of the present disclosure, the first mapping module 510 includes a first mapping unit, a second mapping unit, a third mapping unit and a fourth mapping unit.
第一映射单元,用于对于每组点云数据,分别对点云数据中每个点的三维坐标数据进行极坐标转换,以得到点云数据中每个点的极坐标数据。 The first mapping unit is used to perform polar coordinate conversion on the three-dimensional coordinate data of each point in the point cloud data for each set of point cloud data, so as to obtain the polar coordinate data of each point in the point cloud data.
第二映射单元,用于基于点云数据中每个点的极坐标数据,将点云数据中的多个点分别映射到初始视图的多个栅格中。The second mapping unit is used to map multiple points in the point cloud data to multiple grids in the initial view based on the polar coordinate data of each point in the point cloud data.
第三映射单元,用于对于初始视图的每个栅格,基于栅格中的点的三维坐标数据和极坐标数据,确定栅格的特征数据。The third mapping unit is used for determining, for each grid of the initial view, the characteristic data of the grid based on the three-dimensional coordinate data and polar coordinate data of the points in the grid.
第四映射单元,用于基于多个栅格的特征数据,构建得到环视投影图。The fourth mapping unit is used to construct a surround projection map based on the feature data of multiple grids.
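The four mapping units above can be illustrated with a range-view (spherical) projection, one common way to realize the polar-coordinate mapping of LiDAR points onto a surround grid. The field-of-view bounds, grid resolution, and per-cell features (x, y, z, depth) are assumptions for illustration, not the patent's required choices.

```python
import numpy as np

def surround_projection(points, height=64, width=1024, fov_up=3.0, fov_down=-25.0):
    """Map each 3D point to a (row, col) grid cell of the surround view via
    its polar coordinates; each occupied cell stores (x, y, z, depth)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)                          # azimuth angle
    pitch = np.arcsin(z / np.maximum(depth, 1e-8))  # elevation angle
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    col = ((0.5 * (1.0 - yaw / np.pi)) * width).astype(int) % width
    row = ((fu - pitch) / (fu - fd) * height).clip(0, height - 1).astype(int)
    proj = np.zeros((4, height, width), dtype=np.float32)
    # Grid feature data: 3D coordinates plus range (depth) per cell.
    proj[0, row, col] = x
    proj[1, row, col] = y
    proj[2, row, col] = z
    proj[3, row, col] = depth
    return proj
```

A point on the positive x-axis lands in the middle column of the top rows of the grid, with its range stored in the depth channel.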
需要说明的是,本公开的实施例中点云语义分割网络训练装置部分与本公开的实施例中点云语义分割网络训练方法部分是相对应的,点云语义分割网络训练装置部分的描述具体参考点云语义分割网络训练方法部分,在此不再赘述。It should be noted that the point cloud semantic segmentation network training device part in the embodiments of the present disclosure corresponds to the point cloud semantic segmentation network training method part. For details of the device part, reference may be made to the description of the method part, which will not be repeated here.
图6示意性示出了根据本公开实施例的点云语义分割装置的框图。Figure 6 schematically shows a block diagram of a point cloud semantic segmentation device according to an embodiment of the present disclosure.
如图6所示,点云语义分割装置600包括第二映射模块610和第三处理模块620。As shown in FIG. 6 , the point cloud semantic segmentation device 600 includes a second mapping module 610 and a third processing module 620 .
第二映射模块610,用于将目标点云数据映射到初始视图中,得到环视投影图。The second mapping module 610 is used to map the target point cloud data to the initial view to obtain a surround projection image.
第三处理模块620,用于将所述环视投影图输入点云语义分割网络中,得到所述目标点云数据的语义分割特征图谱。The third processing module 620 is used to input the surround projection image into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data.
根据本公开的实施例,该点云语义分割网络可以利用上述点云语义分割网络训练方法部分的方法训练得到,在此不再赘述。According to embodiments of the present disclosure, the point cloud semantic segmentation network can be trained using the method in the point cloud semantic segmentation network training method section, which will not be described again here.
根据本公开的实施例的模块、子模块、单元、子单元中的任意多个、或其中任意多个的至少部分功能可以在一个模块中实现。根据本公开实施例的模块、子模块、单元、子单元中的任意一个或多个可以被拆分成多个模块来实现。根据本公开实施例的模块、子模块、单元、子单元中的任意一个或多个可以至少被部分地实现为硬件电路,例如现场可编程门阵列(FPGA)、可编程逻辑阵列(PLA)、片上系统、基板上的系统、封装上的系统、专用集成电路(ASIC),或可以通过对电路进行集成或封装的任何其他的合理方式的硬件或固件来实现,或以软件、硬件以及固件三种实现方式中任意一种或以其中任意几种的适当组合来实现。或者,根据本公开实施例的模块、子模块、单元、子单元中的一个或多个可以至少被部分地实现为计算机程序模块,当该计算机程序模块被运行时,可以执行相应的功能。Any number of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure, or at least part of the functions of any number of them, may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be split into multiple modules for implementation. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, an application-specific integrated circuit (ASIC), or by hardware or firmware in any other reasonable manner of integrating or packaging circuits, or by any one of, or an appropriate combination of, the three implementation manners of software, hardware, and firmware. Alternatively, one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented at least partially as a computer program module that, when executed, performs the corresponding functions.
例如,第一映射模块510、第一处理模块520、确定模块530、第二处理模块540和训练模块550,或者,第二映射模块610和第三处理模块620中的任意多个可以合并在一个模块/单元/子单元中实现,或者其中的任意一个模块/单元/子单元可以被拆分成多个模块/单元/子单元。或者,这些模块/单元/子单元中的一个或多个模块/单元/子单元的至少部分功能可以与其他模块/单元/子单元的至少部分功能相结合,并在一个模块/单元/子单元中实现。根据本公开的实施例,第一映射模块510、第一处理模块520、确定模块530、第二处理模块540和训练模块550,或者,第二映射模块610和第三处理模块620中的至少一个可以至少被部分地实现为硬件电路,例如现场可编程门阵列(FPGA)、可编程逻辑阵列(PLA)、片上系统、基板上的系统、封装上的系统、专用集成电路(ASIC),或可以通过对电路进行集成或封装的任何其他的合理方式等硬件或固件来实现,或以软件、硬件以及固件三种实现方式中任意一种或以其中任意几种的适当组合来实现。或者,第一映射模块510、第一处理模块520、确定模块530、第二处理模块540和训练模块550,或者,第二映射模块610和第三处理模块620中的至少一个可以至少被部分地实现为计算机程序模块,当该计算机程序模块被运行时,可以执行相应的功能。For example, any number of the first mapping module 510, the first processing module 520, the determination module 530, the second processing module 540 and the training module 550, or the second mapping module 610 and the third processing module 620, may be combined and implemented in one module/unit/sub-unit, or any one of these modules/units/sub-units may be split into multiple modules/units/sub-units. Alternatively, at least part of the functions of one or more of these modules/units/sub-units may be combined with at least part of the functions of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to the embodiments of the present disclosure, at least one of the first mapping module 510, the first processing module 520, the determination module 530, the second processing module 540 and the training module 550, or the second mapping module 610 and the third processing module 620, may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, an application-specific integrated circuit (ASIC), or by hardware or firmware in any other reasonable manner of integrating or packaging circuits, or by any one of, or an appropriate combination of, the three implementation manners of software, hardware, and firmware. Alternatively, at least one of the above modules may be implemented at least partially as a computer program module that, when executed, performs the corresponding functions.
图7示意性示出了根据本公开实施例的适于实现点云语义分割网络训练方法或点云语义分割方法的电子设备的框图。图7示出的电子设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。FIG. 7 schematically shows a block diagram of an electronic device suitable for implementing a point cloud semantic segmentation network training method or a point cloud semantic segmentation method according to an embodiment of the present disclosure. The electronic device shown in FIG. 7 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
如图7所示,根据本公开实施例的计算机电子设备700包括处理器701,其可以根据存储在只读存储器(ROM)702中的程序或者从存储部分708加载到随机访问存储器(RAM)703中的程序而执行各种适当的动作和处理。处理器701例如可以包括通用微处理器(例如CPU)、指令集处理器和/或相关芯片组和/或专用微处理器(例如,专用集成电路(ASIC)),等等。处理器701还可以包括用于缓存用途的板载存储器。处理器701可以包括用于执行根据本公开实施例的方法流程的不同动作的单一处理单元或者是多个处理单元。As shown in FIG. 7, the computer electronic device 700 according to an embodiment of the present disclosure includes a processor 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage portion 708 into a random access memory (RAM) 703. The processor 701 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and/or a related chipset, and/or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), and so on. The processor 701 may also include onboard memory for caching purposes. The processor 701 may include a single processing unit or multiple processing units for performing the different actions of the method flow according to the embodiments of the present disclosure.
在RAM 703中,存储有电子设备700操作所需的各种程序和数据。处理器701、ROM 702以及RAM 703通过总线704彼此相连。处理器701通过执行ROM 702和/或RAM 703中的程序来执行根据本公开实施例的方法流程的各种操作。需要注意,所述程序也可以存储在除ROM 702和RAM 703以外的一个或多个存储器中。处理器701也可以通过执行存储在所述一个或多个存储器中的程序来执行根据本公开实施例的方法流程的各种操作。In the RAM 703, various programs and data required for the operation of the electronic device 700 are stored. The processor 701, ROM 702 and RAM 703 are connected to each other through a bus 704. The processor 701 performs various operations according to the method flow of the embodiment of the present disclosure by executing programs in the ROM 702 and/or RAM 703. It should be noted that the program may also be stored in one or more memories other than ROM 702 and RAM 703. The processor 701 may also perform various operations according to the method flow of embodiments of the present disclosure by executing programs stored in the one or more memories.
根据本公开的实施例,电子设备700还可以包括输入/输出(I/O)接口705,输入/输出(I/O)接口705也连接至总线704。电子设备700还可以包括连接至I/O接口705的以下部件中的一项或多项:包括键盘、鼠标等的输入部分706;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分707;包括硬盘等的存储部分708;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分709。通信部分709经由诸如因特网的网络执行通信处理。驱动器710也根据需要连接至I/O接口705。可拆卸介质711,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器710上,以便于从其上读出的计算机程序根据需要被安装入存储部分708。According to embodiments of the present disclosure, the electronic device 700 may further include an input/output (I/O) interface 705, which is also connected to the bus 704. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, etc.; an output portion 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., as well as a speaker, etc.; a storage portion 708 including a hard disk, etc.; and a communication portion 709 including a network interface card such as a LAN card or a modem. The communication portion 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is installed on the drive 710 as needed, so that a computer program read therefrom is installed into the storage portion 708 as needed.
根据本公开的实施例,根据本公开实施例的方法流程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在计算机可读存储介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信部分709从网络上被下载和安装,和/或从可拆卸介质711被安装。在该计算机程序被处理器701执行时,执行本公开实施例的系统中限定的上述功能。根据本公开的实施例,上文描述的系统、设备、装置、模块、单元等可以通过计算机程序模块来实现。According to embodiments of the present disclosure, the method flow according to the embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product including a computer program carried on a computer-readable storage medium, the computer program containing program code for performing the method illustrated in the flowchart. In such embodiments, the computer program may be downloaded and installed from the network via communication portion 709 and/or installed from removable media 711 . When the computer program is executed by the processor 701, the above-described functions defined in the system of the embodiment of the present disclosure are performed. According to embodiments of the present disclosure, the systems, devices, devices, modules, units, etc. described above may be implemented by computer program modules.
本公开还提供了一种计算机可读存储介质,该计算机可读存储介质可以是上述实施例中描述的设备/装置/系统中所包含的;也可以是单独存在,而未装配入该设备/装置/系统中。上述计算机可读存储介质承载有一个或者多个程序,当上述一个或者多个程序被执行时,实现根据本公开实施例的方法。The present disclosure also provides a computer-readable storage medium. The computer-readable storage medium may be included in the device/apparatus/system described in the above embodiments, or it may exist alone without being assembled into the device/apparatus/system. The above computer-readable storage medium carries one or more programs, and when the one or more programs are executed, the method according to the embodiments of the present disclosure is implemented.
根据本公开的实施例,计算机可读存储介质可以是非易失性的计算机可读存储介质。例如可以包括但不限于:便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
例如,根据本公开的实施例,计算机可读存储介质可以包括上文描述的ROM 702和/或RAM 703和/或ROM 702和RAM 703以外的一个或多个存储器。For example, according to embodiments of the present disclosure, the computer-readable storage medium may include one or more memories other than ROM 702 and/or RAM 703 and/or ROM 702 and RAM 703 described above.
本公开的实施例还包括一种计算机程序产品,其包括计算机程序,该计算机程序包含用于执行本公开实施例所提供的方法的程序代码,当计算机程序产品在电子设备上运行时,该程序代码用于使电子设备实现本公开实施例所提供的点云语义分割网络训练方法或点云语义分割方法。Embodiments of the present disclosure also include a computer program product, which includes a computer program containing program code for executing the method provided by the embodiments of the present disclosure. When the computer program product is run on an electronic device, the program code is used to enable the electronic device to implement the point cloud semantic segmentation network training method or the point cloud semantic segmentation method provided by the embodiments of the present disclosure.
在该计算机程序被处理器701执行时,执行本公开实施例的系统/装置中限定的上述功能。根据本公开的实施例,上文描述的系统、装置、模块、单元等可以通过计算机程序模块来实现。When the computer program is executed by the processor 701, the above-mentioned functions defined in the system/device of the embodiment of the present disclosure are performed. According to embodiments of the present disclosure, the systems, devices, modules, units, etc. described above may be implemented by computer program modules.
在一种实施例中,该计算机程序可以依托于光存储器件、磁存储器件等有形存储介质。在另一种实施例中,该计算机程序也可以在网络介质上以信号的形式进行传输、分发,并通过通信部分709被下载和安装,和/或从可拆卸介质711被安装。该计算机程序包含的程序代码可以用任何适当的网络介质传输,包括但不限于:无线、有线等等,或者上述的任意合适的组合。In one embodiment, the computer program may rely on tangible storage media such as optical storage devices and magnetic storage devices. In another embodiment, the computer program can also be transmitted and distributed in the form of a signal on a network medium, and downloaded and installed through the communication part 709, and/or installed from the removable medium 711. The program code contained in the computer program can be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the above.
根据本公开的实施例,可以以一种或多种程序设计语言的任意组合来编写用于执行本公开实施例提供的计算机程序的程序代码,具体地,可以利用高级过程和/或面向对象的编程语言、和/或汇编/机器语言来实施这些计算程序。程序设计语言包括但不限于诸如Java,C++,python,“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算设备上执行、部分地在用户设备上执行、部分在远程计算设备上执行、或者完全在远程计算设备或服务器上执行。在涉及远程计算设备的情形中,远程计算设备可以通过任意种类的网络,包括局域网(LAN)或广域网(WAN),连接到用户计算设备,或者,可以连接到外部计算设备(例如利用因特网服务提供商来通过因特网连接)。According to the embodiments of the present disclosure, the program code for executing the computer program provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; specifically, these computing programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, python, the "C" language, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on a remote computing device or server. In the case involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,上述模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图或流程图中的每个方框、以及框图或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。本领域技术人员可以理解,本公开的各个实施例和/或权利要求中记载的特征可以进行多种组合和/或结合,即使这样的组合或结合没有明确记载于本公开中。特别地,在不脱离本公开精神和教导的情况下,本公开的各个实施例和/或权利要求中记载的特征可以进行多种组合和/或结合。所有这些组合和/或结合均落入本公开的范围。The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a special-purpose hardware-based system that performs the specified functions or operations, or by a combination of special-purpose hardware and computer instructions. Those skilled in the art will understand that the features recited in the various embodiments and/or claims of the present disclosure may be combined in various ways, even if such combinations are not explicitly recited in the present disclosure. In particular, without departing from the spirit and teachings of the present disclosure, the features recited in the various embodiments and/or claims of the present disclosure may be combined in various ways. All such combinations fall within the scope of the present disclosure.
以上对本公开的实施例进行了描述。但是,这些实施例仅仅是为了说明的目的,而并非为了限制本公开的范围。尽管在以上分别描述了各实施例,但是这并不意味着各个实施例中的措施不能有利地结合使用。本公开的范围由所附权利要求及其等同物限定。不脱离本公开的范围,本领域技术人员可以做出多种替代和修改,这些替代和修改都应落在本公开的范围之内。The embodiments of the present disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the various embodiments cannot be advantageously used in combination. The scope of the present disclosure is defined by the appended claims and their equivalents. Without departing from the scope of the present disclosure, those skilled in the art may make various substitutions and modifications, and all such substitutions and modifications should fall within the scope of the present disclosure.

Claims (14)

  1. A point cloud semantic segmentation network training method, comprising:
    mapping multiple sets of point cloud data into an initial view, respectively, to obtain multiple surround projection images;
    partitioning a first surround projection image and a second surround projection image, respectively, based on a preset size, to obtain multiple first partition images and multiple second partition images, wherein the first surround projection image and the second surround projection image belong to the multiple surround projection images;
    determining multiple first target partition images from the multiple first partition images;
    replacing a second target partition image in the second surround projection image with each of the multiple first target partition images, respectively, to obtain a mixed projection image, wherein the second target partition image belongs to the multiple second partition images, and the first target partition image and the second target partition image are at the same position; and
    training an initial network using the first surround projection image and the mixed projection image as training samples to obtain a point cloud semantic segmentation network.
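The partition replacement in claim 1 amounts to a CutMix-style augmentation on surround projection images: patches from one projection are copied into the same positions of another. A minimal Python sketch, where the vertical-strip layout, sizes, and all names are illustrative assumptions rather than the patent's exact partitioning:

```python
import random

def mix_projections(proj_a, proj_b, patch_w, num_patches, rng=None):
    """Split two equal-size surround projections into vertical strips of
    width patch_w, choose num_patches strip indices, and copy those strips
    from proj_a into a copy of proj_b at the same positions (same-position
    replacement, as in claim 1). Projections are 2D lists of cell features."""
    rng = rng or random.Random(0)
    h, w = len(proj_a), len(proj_a[0])
    mixed = [row[:] for row in proj_b]          # start from projection B
    n_strips = w // patch_w
    chosen = rng.sample(range(n_strips), num_patches)
    for s in chosen:
        for r in range(h):
            for c in range(s * patch_w, (s + 1) * patch_w):
                mixed[r][c] = proj_a[r][c]      # strip from projection A
    return mixed, chosen
```

The first (clean) projection and the mixed projection together would then form one training sample pair for the initial network.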
  2. The method according to claim 1, wherein training the initial network using the first surround projection image and the mixed projection image as training samples to obtain the point cloud semantic segmentation network comprises:
    inputting the first surround projection image and the mixed projection image into the initial network, respectively, to obtain a first feature map and a first segmentation result corresponding to the first surround projection image, and a second feature map and a second segmentation result corresponding to the mixed projection image;
    calculating an information entropy loss between the first feature map and the second feature map to obtain a first loss value;
    calculating a cross-entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value; and
    adjusting model parameters of the initial network using the first loss value and the second loss value to finally obtain the point cloud semantic segmentation network.
  3. The method according to claim 2, wherein calculating the information entropy loss between the first feature map and the second feature map to obtain the first loss value comprises:
    determining, from the first feature map, a first sub-feature map related to the multiple first target partition images;
    splitting the second feature map into a second sub-feature map related to the multiple first target partition images and a third sub-feature map unrelated to the multiple first target partition images; and
    in a case where a confidence probability of the first sub-feature map is greater than a preset threshold, taking the first sub-feature map and the second sub-feature map as a positive sample pair and the first sub-feature map and the third sub-feature map as a negative sample pair, and calculating an information entropy loss between the positive sample pair and the negative sample pair to obtain the first loss value.
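One common way to realize a loss over one positive and one negative feature pair is an InfoNCE-style contrast; the sketch below is an interpretation of claim 3's "information entropy loss" under that assumption (the exact loss form and the vector inputs are not specified by the claim):

```python
import math

def info_pair_loss(anchor, positive, negative):
    """InfoNCE-style contrast over one positive and one negative pair:
    pull the anchor feature toward the positive feature and push it away
    from the negative one. Features are plain Python vectors here."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    s_pos = math.exp(dot(anchor, positive))   # similarity to positive pair
    s_neg = math.exp(dot(anchor, negative))   # similarity to negative pair
    return -math.log(s_pos / (s_pos + s_neg))
```

In the claim's terms, the anchor would be the first sub-feature map, the positive the second sub-feature map, and the negative the third, applied only when the anchor's confidence exceeds the preset threshold.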
  4. The method according to claim 2, wherein calculating the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value comprises:
    determining, from the first segmentation result, a first sub-segmentation result related to the multiple first target partition images;
    determining, from the second segmentation result, a second sub-segmentation result related to the multiple first target partition images;
    determining a predicted value and a label value based on a confidence probability of the first sub-segmentation result and a confidence probability of the second sub-segmentation result; and
    calculating a cross-entropy loss between the predicted value and the label value to obtain the second loss value.
  5. The method according to claim 4, wherein determining the predicted value and the label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result comprises:
    in a case where the confidence probability of the first sub-segmentation result is greater than the confidence probability of the second sub-segmentation result, determining the first sub-segmentation result as the label value and the second sub-segmentation result as the predicted value; and
    in a case where the confidence probability of the first sub-segmentation result is less than the confidence probability of the second sub-segmentation result, determining the first sub-segmentation result as the predicted value and the second sub-segmentation result as the label value.
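The confidence-based role assignment of claims 4 and 5 can be sketched as mutual pseudo-labeling: per position, the more confident branch supplies a hard label and the less confident branch is penalized against it. The per-position probability-list layout below is an illustrative assumption:

```python
import math

def pick_label_and_prediction(p_clean, p_mixed):
    """Claim 5's rule: at each position, the more confident branch supplies
    the label (its argmax class) and the other branch's distribution is
    treated as the prediction. Inputs are per-position class-probability
    lists from the clean and mixed projection branches."""
    labels, preds = [], []
    for pc, pm in zip(p_clean, p_mixed):
        if max(pc) >= max(pm):                 # clean branch more confident
            labels.append(pc.index(max(pc)))
            preds.append(pm)
        else:                                  # mixed branch more confident
            labels.append(pm.index(max(pm)))
            preds.append(pc)
    return labels, preds

def cross_entropy(pred_probs, label_idx):
    """Cross-entropy of one predicted distribution against a hard label."""
    return -math.log(max(pred_probs[label_idx], 1e-12))
```

Averaging `cross_entropy` over all positions would give the second loss value of claim 4.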
  6. The method according to claim 2, wherein the multiple first target partition images comprise a third target partition image, and the third target partition image has a ground-truth label;
    wherein calculating the cross-entropy loss between the first segmentation result and the second segmentation result to obtain the second loss value comprises:
    determining, from the first segmentation result, a third sub-segmentation result related to the third target partition image, and a fourth sub-segmentation result unrelated to the third target partition image and related to the multiple first target partition images;
    determining, from the second segmentation result, a fifth sub-segmentation result related to the third target partition image, and a sixth sub-segmentation result unrelated to the third target partition image and related to the multiple first target partition images;
    calculating a cross-entropy loss between the third sub-segmentation result and the ground-truth label to obtain a third loss value;
    calculating a cross-entropy loss between the fourth sub-segmentation result and the sixth sub-segmentation result to obtain a fourth loss value; and
    determining the second loss value based on the third loss value and the fourth loss value.
  7. The method according to claim 2, wherein the initial network comprises an encoder and a decoder;
    wherein inputting the first surround projection image and the mixed projection image into the initial network, respectively, to obtain the first feature map and the first segmentation result corresponding to the first surround projection image, and the second feature map and the second segmentation result corresponding to the mixed projection image comprises:
    inputting the first surround projection image and the mixed projection image into the encoder, respectively, to obtain a first image feature corresponding to the first surround projection image and a second image feature corresponding to the mixed projection image; and
    inputting the first image feature and the second image feature into the decoder, respectively, to obtain the first feature map and the first segmentation result corresponding to the first surround projection image, and the second feature map and the second segmentation result corresponding to the mixed projection image.
  8. The method according to claim 1, wherein mapping the multiple sets of point cloud data into the initial view, respectively, to obtain the multiple surround projection images comprises:
    for each set of point cloud data, performing polar coordinate conversion on three-dimensional coordinate data of each point in the point cloud data to obtain polar coordinate data of each point in the point cloud data;
    mapping multiple points in the point cloud data to multiple grid cells of the initial view, respectively, based on the polar coordinate data of each point in the point cloud data;
    for each grid cell of the initial view, determining feature data of the grid cell based on the three-dimensional coordinate data and the polar coordinate data of the points in the grid cell; and
    constructing the surround projection image based on the feature data of the multiple grid cells.
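A common realization of this point-to-grid mapping is a spherical (range-image) projection: azimuth selects the column, elevation selects the row. The sketch below follows that convention; the grid size, field-of-view bounds, and per-cell feature tuple are illustrative assumptions, not values taken from the patent:

```python
import math

def project_to_grid(points, h=32, w=512, fov_up=15.0, fov_down=-25.0):
    """Map (x, y, z) points to (row, col) cells of an h x w surround view.
    Each cell collects the 3D coordinates plus the polar quantities
    (range, azimuth, elevation) of its points, from which per-cell
    feature data could then be aggregated."""
    cells = {}
    fu, fd = math.radians(fov_up), math.radians(fov_down)
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        az = math.atan2(y, x)                       # azimuth in [-pi, pi]
        el = math.asin(z / r)                       # elevation angle
        col = min(w - 1, int((az / math.pi + 1.0) * 0.5 * w))
        row = max(0, min(h - 1, int((fu - el) / (fu - fd) * h)))
        cells.setdefault((row, col), []).append((x, y, z, r, az, el))
    return cells
```

Stacking the per-cell features over the full h x w grid would yield one surround projection image for the set of point cloud data.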
  9. A point cloud semantic segmentation method, comprising:
    mapping target point cloud data into an initial view to obtain a surround projection image; and
    inputting the surround projection image into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data;
    wherein the point cloud semantic segmentation network is trained using the point cloud semantic segmentation network training method according to any one of claims 1 to 8.
  10. A point cloud semantic segmentation network training apparatus, comprising:
    a first mapping module, configured to map multiple sets of point cloud data into an initial view, respectively, to obtain multiple surround projection images;
    a first processing module, configured to partition a first surround projection image and a second surround projection image, respectively, based on a preset size, to obtain multiple first partition images and multiple second partition images, wherein the first surround projection image and the second surround projection image belong to the multiple surround projection images;
    a determining module, configured to determine multiple first target partition images from the multiple first partition images;
    a second processing module, configured to replace a second target partition image in the second surround projection image with each of the multiple first target partition images, respectively, to obtain a mixed projection image, wherein the second target partition image belongs to the multiple second partition images, and the first target partition image and the second target partition image are at the same position; and
    a training module, configured to train an initial network using the first surround projection image and the mixed projection image as training samples to obtain a point cloud semantic segmentation network.
  11. A point cloud semantic segmentation apparatus, comprising:
    a second mapping module, configured to map target point cloud data into an initial view to obtain a surround projection image; and
    a third processing module, configured to input the surround projection image into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data;
    wherein the point cloud semantic segmentation network is trained using the point cloud semantic segmentation network training method according to any one of claims 1 to 8.
  12. An electronic device, comprising:
    one or more processors; and
    a memory, configured to store one or more instructions,
    wherein, when the one or more instructions are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1 to 9.
  13. A computer-readable storage medium having executable instructions stored thereon, wherein the executable instructions, when executed by a processor, cause the processor to implement the method according to any one of claims 1 to 9.
  14. A computer program product, comprising computer-executable instructions which, when executed, implement the method according to any one of claims 1 to 9.
PCT/CN2023/082749 2022-08-24 2023-03-21 Point cloud semantic segmentation network training method, and point cloud semantic segmentation method and apparatus WO2024040954A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211022552.3 2022-08-24
CN202211022552.3A CN115375899A (en) 2022-08-24 2022-08-24 Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device

Publications (1)

Publication Number Publication Date
WO2024040954A1 true WO2024040954A1 (en) 2024-02-29

Family

ID=84068279

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/082749 WO2024040954A1 (en) 2022-08-24 2023-03-21 Point cloud semantic segmentation network training method, and point cloud semantic segmentation method and apparatus

Country Status (2)

Country Link
CN (1) CN115375899A (en)
WO (1) WO2024040954A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115375899A (en) * 2022-08-24 2022-11-22 北京京东乾石科技有限公司 Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device
CN116721399B (en) * 2023-07-26 2023-11-14 之江实验室 Point cloud target detection method and device for quantitative perception training

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008941A (en) * 2019-06-05 2019-07-12 长沙智能驾驶研究院有限公司 Drivable region detection method, device, computer equipment and storage medium
CN111354478A (en) * 2018-12-24 2020-06-30 黄庆武整形医生集团(深圳)有限公司 Shaping simulation information processing method, shaping simulation terminal and shaping service terminal
CN113421217A (en) * 2020-03-02 2021-09-21 北京京东乾石科技有限公司 Method and device for detecting travelable area
CN113496491A (en) * 2020-03-19 2021-10-12 广州汽车集团股份有限公司 Road surface segmentation method and device based on multi-line laser radar
KR102334177B1 (en) * 2020-07-21 2021-12-03 대한민국 Method and system for establishing 3-dimensional indoor information for indoor evacuation
CN115375899A (en) * 2022-08-24 2022-11-22 北京京东乾石科技有限公司 Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device


Also Published As

Publication number Publication date
CN115375899A (en) 2022-11-22

Similar Documents

Publication Publication Date Title
WO2024040954A1 (en) Point cloud semantic segmentation network training method, and point cloud semantic segmentation method and apparatus
US10733441B2 (en) Three dimensional bounding box estimation from two dimensional images
US11610115B2 (en) Learning to generate synthetic datasets for training neural networks
CN108304775B (en) Remote sensing image recognition method and device, storage medium and electronic equipment
CN108229479B (en) Training method and device of semantic segmentation model, electronic equipment and storage medium
EP4033453A1 (en) Training method and apparatus for target detection model, device and storage medium
WO2020108311A1 (en) 3d detection method and apparatus for target object, and medium and device
US10817714B2 (en) Method and apparatus for predicting walking behaviors, data processing apparatus, and electronic device
WO2020020146A1 (en) Method and apparatus for processing laser radar sparse depth map, device, and medium
US10210418B2 (en) Object detection system and object detection method
CN110622177B (en) Instance partitioning
WO2020253121A1 (en) Target detection method and apparatus, intelligent driving method and device, and storage medium
US11151447B1 (en) Network training process for hardware definition
US20190080455A1 (en) Method and device for three-dimensional feature-embedded image object component-level semantic segmentation
US10346996B2 (en) Image depth inference from semantic labels
Raghavan et al. Optimized building extraction from high-resolution satellite imagery using deep learning
EP3859560A2 (en) Method and apparatus for visual question answering, computer device and medium
US11967132B2 (en) Lane marking detecting method, apparatus, electronic device, storage medium, and vehicle
CN112927234A (en) Point cloud semantic segmentation method and device, electronic equipment and readable storage medium
US20220222824A1 (en) Fully automated multimodal system architecture for semantic segmentation of large-scale 3d outdoor point cloud data
EP3665614A1 (en) Extraction of spatial-temporal features from a video
WO2022143366A1 (en) Image processing method and apparatus, electronic device, medium, and computer program product
EP4307219A1 (en) Three-dimensional target detection method and apparatus
WO2023083030A1 (en) Posture recognition method and related device
WO2023082588A1 (en) Semantic annotation method and apparatus, electronic device, storage medium, and computer program product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23856050

Country of ref document: EP

Kind code of ref document: A1