CN115375899A - Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device


Info

Publication number: CN115375899A
Application number: CN202211022552.3A
Authority: CN (China)
Legal status: Pending
Prior art keywords: point cloud, sub, map, projection, partition
Other languages: Chinese (zh)
Inventor: 温欣
Current Assignee: Beijing Jingdong Qianshi Technology Co Ltd
Original Assignee: Beijing Jingdong Qianshi Technology Co Ltd
Application filed by Beijing Jingdong Qianshi Technology Co Ltd
Priority to CN202211022552.3A
Publication of CN115375899A
Priority to PCT/CN2023/082749 (WO2024040954A1)

Classifications

    • G06V10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06N3/04: Neural networks; architecture, e.g. interconnection topology
    • G06N3/08: Neural networks; learning methods
    • G06T3/06: Topological mapping of higher dimensional structures onto lower dimensional surfaces
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06T2207/10028: Range image; depth image; 3D point clouds
    • G06T2207/20081: Training; learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/20112: Image segmentation details


Abstract

The present disclosure provides a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, a point cloud semantic segmentation apparatus, an electronic device, and a storage medium, which can be applied in the field of artificial intelligence. The training method comprises the following steps: mapping a plurality of groups of point cloud data into an initial view respectively to obtain a plurality of surround-view projection maps; partitioning a first surround-view projection map and a second surround-view projection map respectively based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps; determining a plurality of first target partition maps from the plurality of first partition maps; replacing the second target partition map at the corresponding position in the second surround-view projection map with each of the plurality of first target partition maps to obtain a mixed projection map; and training an initial network by taking the first surround-view projection map and the mixed projection map as training samples to obtain the point cloud semantic segmentation network.

Description

Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular to a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, a point cloud semantic segmentation apparatus, an electronic device, and a storage medium.
Background
With the development of three-dimensional sensing technology, point cloud data is widely used in many fields such as autonomous driving and robotic grasping. Deep learning, the mainstream approach to point cloud data analysis, performs well in point cloud data processing. Because the point cloud data collected by various sensors is usually unlabeled and manual labeling is costly, semi-supervised training methods are commonly used in the related art to build deep neural networks.
In the related art, research on semi-supervised training algorithms for semantic segmentation tasks has mainly focused on the field of two-dimensional images. Applying these methods directly to the segmentation of three-dimensional point clouds distorts three-dimensional shapes, which in turn degrades the semantic segmentation of point cloud data.
Disclosure of Invention
In view of the above, the present disclosure provides a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, an apparatus, an electronic device, a readable storage medium, and a computer program product.
One aspect of the present disclosure provides a point cloud semantic segmentation network training method, including: mapping a plurality of groups of point cloud data into an initial view respectively to obtain a plurality of surround-view projection maps; partitioning a first surround-view projection map and a second surround-view projection map respectively based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, wherein the first surround-view projection map and the second surround-view projection map belong to the plurality of surround-view projection maps; determining a plurality of first target partition maps from the plurality of first partition maps; replacing a second target partition map in the second surround-view projection map with each of the plurality of first target partition maps to obtain a mixed projection map, wherein the second target partition map belongs to the plurality of second partition maps and is located at the same position as the corresponding first target partition map; and training an initial network by taking the first surround-view projection map and the mixed projection map as training samples to obtain the point cloud semantic segmentation network.
According to an embodiment of the present disclosure, training the initial network by taking the first surround-view projection map and the mixed projection map as training samples to obtain the point cloud semantic segmentation network includes: inputting the first surround-view projection map and the mixed projection map into the initial network respectively to obtain a first feature map and a first segmentation result corresponding to the first surround-view projection map, and a second feature map and a second segmentation result corresponding to the mixed projection map; calculating an information entropy loss between the first feature map and the second feature map to obtain a first loss value; calculating a cross entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value; and adjusting model parameters of the initial network by using the first loss value and the second loss value to obtain the point cloud semantic segmentation network.
According to an embodiment of the present disclosure, the calculating an information entropy loss between the first feature map and the second feature map to obtain a first loss value includes: determining, from the first feature map, a first sub-feature map associated with the plurality of first target partition maps; splitting the second feature map into a second sub-feature map associated with the plurality of first target partition maps and a third sub-feature map not associated with the plurality of first target partition maps; and in a case that the confidence probability of the first sub-feature map is greater than a preset threshold, taking the first sub-feature map and the second sub-feature map as a positive sample pair, taking the first sub-feature map and the third sub-feature map as a negative sample pair, and calculating the information entropy loss between the positive sample pair and the negative sample pair to obtain the first loss value.
According to an embodiment of the present disclosure, the calculating a cross entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value includes: determining, from the first segmentation result, a first sub-segmentation result related to the plurality of first target partition maps; determining, from the second segmentation result, a second sub-segmentation result related to the plurality of first target partition maps; determining a predicted value and a label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result; and calculating the cross entropy loss between the predicted value and the label value to obtain the second loss value.
According to an embodiment of the present disclosure, the determining a predicted value and a label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result includes: determining the first sub-segmentation result as the label value and the second sub-segmentation result as the predicted value in a case that the confidence probability of the first sub-segmentation result is greater than the confidence probability of the second sub-segmentation result; and determining the first sub-segmentation result as the predicted value and the second sub-segmentation result as the label value in a case that the confidence probability of the first sub-segmentation result is less than the confidence probability of the second sub-segmentation result.
According to an embodiment of the present disclosure, the plurality of first target partition maps include a third target partition map, and the third target partition map has a real label; the calculating the cross entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value comprises: determining, from the first segmentation result, a third sub-segmentation result related to the third target partition map and a fourth sub-segmentation result unrelated to the third target partition map but related to the plurality of first target partition maps; determining, from the second segmentation result, a fifth sub-segmentation result related to the third target partition map and a sixth sub-segmentation result unrelated to the third target partition map but related to the plurality of first target partition maps; calculating the cross entropy loss between the third sub-segmentation result and the real label to obtain a third loss value; calculating the cross entropy loss between the fourth sub-segmentation result and the sixth sub-segmentation result to obtain a fourth loss value; and determining the second loss value based on the third loss value and the fourth loss value.
According to an embodiment of the present disclosure, the initial network includes an encoder and a decoder; the inputting the first surround-view projection map and the mixed projection map into the initial network to obtain a first feature map and a first segmentation result corresponding to the first surround-view projection map, and a second feature map and a second segmentation result corresponding to the mixed projection map comprises: inputting the first surround-view projection map and the mixed projection map into the encoder respectively to obtain a first image feature corresponding to the first surround-view projection map and a second image feature corresponding to the mixed projection map; and inputting the first image feature and the second image feature into the decoder respectively to obtain the first feature map and the first segmentation result corresponding to the first surround-view projection map, and the second feature map and the second segmentation result corresponding to the mixed projection map.
According to an embodiment of the present disclosure, the mapping the plurality of groups of point cloud data into the initial view respectively to obtain a plurality of surround-view projection maps includes: for each group of point cloud data, performing polar coordinate conversion on the three-dimensional coordinate data of each point in the point cloud data to obtain polar coordinate data of each point; mapping a plurality of points in the point cloud data into a plurality of grids of the initial view based on the polar coordinate data of each point; for each grid of the initial view, determining feature data of the grid based on the three-dimensional coordinate data and polar coordinate data of the points in the grid; and constructing the surround-view projection map based on the feature data of the plurality of grids.
Another aspect of the present disclosure provides a point cloud semantic segmentation method, including: mapping target point cloud data into an initial view to obtain a surround-view projection map; and inputting the surround-view projection map into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data, wherein the point cloud semantic segmentation network is trained by using the point cloud semantic segmentation network training method described above.
Another aspect of the present disclosure provides a point cloud semantic segmentation network training apparatus, including: a first mapping module, configured to map a plurality of groups of point cloud data into an initial view respectively to obtain a plurality of surround-view projection maps; a first processing module, configured to partition a first surround-view projection map and a second surround-view projection map respectively based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, wherein the first surround-view projection map and the second surround-view projection map belong to the plurality of surround-view projection maps; a determining module, configured to determine a plurality of first target partition maps from the plurality of first partition maps; a second processing module, configured to replace a second target partition map in the second surround-view projection map with each of the plurality of first target partition maps to obtain a mixed projection map, wherein the second target partition map belongs to the plurality of second partition maps and is located at the same position as the corresponding first target partition map; and a training module, configured to train an initial network by taking the first surround-view projection map and the mixed projection map as training samples to obtain the point cloud semantic segmentation network.
Another aspect of the present disclosure provides a point cloud semantic segmentation apparatus, including: a second mapping module, configured to map target point cloud data into an initial view to obtain a surround-view projection map; and a third processing module, configured to input the surround-view projection map into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data, wherein the point cloud semantic segmentation network is trained by using the point cloud semantic segmentation network training method described above.
Another aspect of the present disclosure provides an electronic device including: one or more processors; a memory for storing one or more instructions, wherein the one or more instructions, when executed by the one or more processors, cause the one or more processors to implement the method as described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the disclosure provides a computer program product comprising computer executable instructions for implementing the method as described above when executed.
According to the embodiments of the present disclosure, when training the point cloud semantic segmentation network, point cloud data can be mapped into surround-view projection maps, and a first surround-view projection map and a second surround-view projection map can be mixed by partition, that is, part of the partitions in the first surround-view projection map replace the corresponding partitions in the second surround-view projection map to obtain a mixed projection map; the initial network can then be trained with the mixed projection map and the first surround-view projection map to obtain the point cloud semantic segmentation network. Mixing by partition forcibly decouples the replaced partitions from their background, which effectively enriches the data, reduces the network's dependence on background and global information when predicting local areas, and improves the recognition capability of the network. Meanwhile, mixing by partition effectively preserves the three-dimensional shape of the original point cloud as projected onto the surround-view projection map, which can at least partially solve the problems of three-dimensional deformation and loss of shape information caused by data enhancement, and can improve the robustness of the network. These technical means can effectively improve the utilization efficiency of hardware resources during network training.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
fig. 1 schematically illustrates an exemplary system architecture to which a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, and an apparatus may be applied according to an embodiment of the present disclosure.
Fig. 2 schematically illustrates a flow chart of a point cloud semantic segmentation network training method according to an embodiment of the present disclosure.
Fig. 3 schematically illustrates a schematic diagram of a training flow of a point cloud semantic segmentation network according to an embodiment of the present disclosure.
Fig. 4 schematically shows a flow chart of a point cloud semantic segmentation method according to an embodiment of the present disclosure.
Fig. 5 schematically illustrates a block diagram of a point cloud semantic segmentation network training apparatus according to an embodiment of the present disclosure.
Fig. 6 schematically illustrates a block diagram of a point cloud semantic segmentation apparatus according to an embodiment of the present disclosure.
Fig. 7 schematically illustrates a block diagram of an electronic device adapted to implement a point cloud semantic segmentation network training method or a point cloud semantic segmentation method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
In those instances where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). Where a convention analogous to "at least one of A, B, or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
In the field of autonomous driving, using deep learning to sense and recognize the surrounding environment is an extremely important fundamental research topic. However, the deep neural networks built with deep learning techniques often require a large amount of manually labeled data for training, and the cost and time consumed by this manual labeling often become a barrier to improving the performance of deep neural network models. On the other hand, during the driving of an unmanned vehicle, a large amount of raw unlabeled data can be collected by various sensors. Therefore, how to train a neural network using raw unlabeled data assisted by a small amount of labeled data, that is, improving the recognition and classification performance of the neural network through semi-supervised training, is an important research task that can improve efficiency and reduce costs in the development of autonomous driving systems.
In the related art, research on using semi-supervised training algorithms to improve semantic segmentation tasks has mainly focused on the field of two-dimensional images. Research on semi-supervised training algorithms for three-dimensional point cloud scenes, in particular for three-dimensional point cloud semantic segmentation models based on lidar scans, is still at a blank stage. Due to the modal difference between two-dimensional images and three-dimensional point clouds, semi-supervised training algorithms for semantic segmentation of two-dimensional images cannot be directly and effectively transplanted to three-dimensional point cloud semantic segmentation tasks. For example, when performing semantic segmentation on a three-dimensional point cloud through a surround-view projection map, conventional two-dimensional image data enhancement methods, such as noise injection, rotation, and scaling, distort the three-dimensional shape of the point cloud and thus harm the training of the model.
In view of this, the embodiments of the present disclosure provide a method that can perform semi-supervised training of a point cloud semantic segmentation network by effectively using a large amount of raw lidar point cloud data assisted by a small amount of labeled data.
Specifically, embodiments of the present disclosure provide a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, an apparatus, an electronic device, and a storage medium. The point cloud semantic segmentation network training method includes: mapping a plurality of groups of point cloud data into an initial view respectively to obtain a plurality of surround-view projection maps; partitioning a first surround-view projection map and a second surround-view projection map respectively based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, wherein the first surround-view projection map and the second surround-view projection map belong to the plurality of surround-view projection maps; determining a plurality of first target partition maps from the plurality of first partition maps; replacing a second target partition map in the second surround-view projection map with each of the plurality of first target partition maps to obtain a mixed projection map, wherein the second target partition map belongs to the plurality of second partition maps and is located at the same position as the corresponding first target partition map; and training an initial network by taking the first surround-view projection map and the mixed projection map as training samples to obtain the point cloud semantic segmentation network.
Fig. 1 schematically illustrates an exemplary system architecture to which a point cloud semantic segmentation network training method, a point cloud semantic segmentation method, and an apparatus may be applied according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105.
The terminal devices 101, 102, and 103 may be various devices configured with a laser radar, or may be various electronic devices capable of controlling a laser radar, or may be various electronic devices capable of storing point cloud data.
Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The server 105 may be a server that provides various services, for example, the server may provide support for computing resources and storage resources for training processes of point cloud semantic segmentation networks.
It should be noted that the point cloud semantic segmentation network training method or the point cloud semantic segmentation method provided by the embodiments of the present disclosure may be generally executed by the server 105. Accordingly, the point cloud semantic segmentation network training device or the point cloud semantic segmentation device provided by the embodiments of the present disclosure may be generally disposed in the server 105. The terminal devices 101, 102, and 103 may acquire point cloud data, or the terminal devices 101, 102, and 103 may acquire point cloud data acquired by other terminal devices through the internet or the like, and the point cloud data may be sent to the server 105 through the network, so that the server 105 executes the method provided in the embodiment of the present disclosure, to implement training of the point cloud semantic segmentation network or perform point cloud semantic segmentation on the point cloud data. The point cloud semantic segmentation network training method or the point cloud semantic segmentation method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 105 and can communicate with the terminal devices 101, 102, 103 and/or the server 105. Correspondingly, the point cloud semantic segmentation network training device or the point cloud semantic segmentation device provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Alternatively, the point cloud semantic segmentation network training method or the point cloud semantic segmentation method provided by the embodiment of the present disclosure may also be executed by the terminal device 101, 102, or 103, or may also be executed by another terminal device different from the terminal device 101, 102, or 103. Accordingly, the point cloud semantic segmentation network training device or the point cloud semantic segmentation device provided by the embodiment of the present disclosure may also be disposed in the terminal device 101, 102, or 103, or in another terminal device different from the terminal device 101, 102, or 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically illustrates a flow chart of a point cloud semantic segmentation network training method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S201 to S205.
In operation S201, a plurality of groups of point cloud data are respectively mapped into an initial view to obtain a plurality of surround-view projection maps.
In operation S202, a first surround-view projection map and a second surround-view projection map are respectively partitioned based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, wherein the first surround-view projection map and the second surround-view projection map belong to the plurality of surround-view projection maps.
In operation S203, a plurality of first target partition maps are determined from among the plurality of first partition maps.
In operation S204, a second target partition map in the second surround-view projection map is replaced with each of the plurality of first target partition maps to obtain a mixed projection map, wherein the second target partition map belongs to the plurality of second partition maps and is located at the same position as the corresponding first target partition map.
In operation S205, an initial network is trained by taking the first surround-view projection map and the mixed projection map as training samples to obtain the point cloud semantic segmentation network.
According to the embodiment of the disclosure, the point cloud data can be acquired by using sensing equipment such as a rotary scanning laser radar, each group of point cloud data can be configured with a preset rectangular coordinate system, each point in the point cloud data can be represented as a three-dimensional coordinate under the rectangular coordinate system, and the center of the rectangular coordinate system can represent the position of the sensing equipment when the point cloud data is acquired.
According to embodiments of the present disclosure, point cloud data acquired using a rotary scanning lidar may be distributed within a sphere, and the initial view may be developed from an annular surface of the sphere near a horizontal plane. For each point in the point cloud data, a direction vector of the point in mapping can be determined based on the coordinates of the point, and the point can be projected onto the initial view by using the direction vector.
According to an embodiment of the present disclosure, partitioning a surround-view projection map based on the preset size may divide the map equally into a plurality of rectangular regions. The preset size may be determined according to the size of the surround-view projection map in a specific application scenario, and is not limited herein. For example, for a surround-view projection map with a resolution of 24 × 480, the map may be divided into 6 parts along its height and 16 parts along its width, yielding 96 partition maps each with a resolution of 4 × 30.
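To make the partitioning step concrete, the following sketch splits a surround-view projection map into equal tiles. It assumes the 24 × 480 resolution and 4 × 30 tile size of the example above; the function and variable names are illustrative, not taken from the patent.

```python
import torch

def partition_projection(proj: torch.Tensor, tile_h: int = 4, tile_w: int = 30) -> torch.Tensor:
    """Split a (C, H, W) surround-view projection map into equal tiles.

    With H = 24, W = 480 and 4 x 30 tiles this yields 6 * 16 = 96 partition
    maps, matching the example in the text.
    """
    c, h, w = proj.shape
    assert h % tile_h == 0 and w % tile_w == 0, "preset size must divide the map evenly"
    tiles = proj.unfold(1, tile_h, tile_h).unfold(2, tile_w, tile_w)
    # tiles: (C, H // tile_h, W // tile_w, tile_h, tile_w)
    return tiles.permute(1, 2, 0, 3, 4).reshape(-1, c, tile_h, tile_w)
```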
According to an embodiment of the present disclosure, the first and second surround-view projection maps may be randomly selected from the plurality of surround-view projection maps. The two maps may have completely different features; that is, their corresponding point cloud data may be acquired from different objects in different scenes.
According to an embodiment of the present disclosure, the first target partition maps may be randomly selected from the plurality of first partition maps and may account for a certain proportion of them, for example 25% or 30%, which is not limited herein.
According to an embodiment of the present disclosure, replacing the second target partition map in the second surround-view projection map with each of the plurality of first target partition maps may include: determining the second target partition map from the second surround-view projection map according to the position information of the first target partition map, deleting the second target partition map, and filling the first target partition map into the corresponding position.
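A sketch of this partition-mixing step follows: randomly chosen tiles of the first map overwrite the co-located tiles of the second. The 25% default ratio follows the example above; the function and its defaults are our assumptions, not the patent's code.

```python
import torch

def mix_projections(proj_a: torch.Tensor, proj_b: torch.Tensor,
                    tile_h: int = 4, tile_w: int = 30, ratio: float = 0.25):
    """Replace randomly selected tiles of proj_b (both (C, H, W)) with the
    co-located tiles of proj_a; return the mixed map and a tile mask."""
    _, h, w = proj_a.shape
    n_h, n_w = h // tile_h, w // tile_w
    n_pick = max(1, int(n_h * n_w * ratio))
    picked = torch.randperm(n_h * n_w)[:n_pick]

    mixed = proj_b.clone()
    mask = torch.zeros(n_h, n_w, dtype=torch.bool)  # True where a tile came from proj_a
    for idx in picked.tolist():
        i, j = divmod(idx, n_w)
        mask[i, j] = True
        ys, xs = i * tile_h, j * tile_w
        mixed[:, ys:ys + tile_h, xs:xs + tile_w] = proj_a[:, ys:ys + tile_h, xs:xs + tile_w]
    return mixed, mask
```

The returned mask records which positions hold first-map tiles, which the loss computations described later need in order to split feature maps and segmentation results into related and unrelated regions.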
According to the embodiments of the present disclosure, the method used in training the initial network is not limited herein, and may be, for example, a gradient descent method, a least squares method, or the like. The training parameters, such as training times, batch capacity, learning rate, and the like, set when the initial network is trained may be set according to a specific application scenario, which is not limited herein.
According to the embodiments of the present disclosure, when training the point cloud semantic segmentation network, point cloud data can be mapped into surround-view projection maps, and a first surround-view projection map and a second surround-view projection map can be mixed by partition, that is, part of the partitions in the first surround-view projection map replace the corresponding partitions in the second surround-view projection map to obtain a mixed projection map; the initial network can then be trained with the mixed projection map and the first surround-view projection map to obtain the point cloud semantic segmentation network. Mixing by partition forcibly decouples the replaced partitions from their background, which effectively enriches the data, reduces the network's dependence on background and global information when predicting local areas, and improves the recognition capability of the network. Meanwhile, mixing by partition effectively preserves the three-dimensional shape of the original point cloud as projected onto the surround-view projection map, which can at least partially solve the problems of three-dimensional deformation and loss of shape information caused by data enhancement, and can improve the robustness of the network. These technical means can effectively improve the utilization efficiency of hardware resources during network training.
The method of fig. 2 is further described with reference to fig. 3 in conjunction with specific embodiments.
According to an embodiment of the present disclosure, the surround-view projection map may be obtained by the method of operation S201; specifically, operation S201 may include the following operations:
for each group of point cloud data, performing polar coordinate conversion on the three-dimensional coordinate data of each point in the point cloud data to obtain the polar coordinate data of each point; mapping a plurality of points in the point cloud data into a plurality of grids of the initial view based on the polar coordinate data of each point; for each grid of the initial view, determining feature data of the grid based on the three-dimensional coordinate data and polar coordinate data of the points in the grid; and constructing the surround-view projection map based on the feature data of the plurality of grids.
According to an embodiment of the present disclosure, each point in the point cloud data may have three-dimensional coordinate data, i.e., x, y, and z; performing polar coordinate conversion on the point yields the converted coordinates yaw and pitch in a rotating coordinate system, i.e., the polar coordinate data.
According to embodiments of the present disclosure, a grid of the initial view may refer to the pixel patch corresponding to a single pixel in the initial view. For example, an initial view with a resolution of 20 × 480 has 9600 pixel patches and, accordingly, 9600 grids.
According to an embodiment of the present disclosure, in a case where a plurality of points are mapped in a grid, feature data of a point closest to an origin among the plurality of points may be taken as feature data of the grid. The feature data of the point may include three-dimensional coordinate data, polar coordinate data, and data processed based on the three-dimensional coordinate data and the polar coordinate data, such as reflectivity data, depth data, and the like.
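The projection itself can be sketched as follows. The vertical field of view, the six-channel feature layout, and the names are assumptions; what the sketch keeps from the text is the yaw/pitch conversion, the grid mapping, and the rule that the point closest to the origin supplies a grid cell's features.

```python
import numpy as np

def to_surround_view(points: np.ndarray, h: int = 24, w: int = 480,
                     fov_up_deg: float = 15.0, fov_down_deg: float = -25.0) -> np.ndarray:
    """Project an (N, 3) point cloud onto a (6, H, W) surround-view map.

    Channels per grid cell: x, y, z, yaw, pitch, depth. When several points
    fall into the same cell, the point closest to the origin wins.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                          # in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(depth, 1e-8))  # elevation angle

    u = ((yaw + np.pi) / (2 * np.pi) * w).astype(int).clip(0, w - 1)
    fov = np.radians(fov_up_deg - fov_down_deg)     # assumed sensor parameter
    v = ((np.radians(fov_up_deg) - pitch) / fov * h).astype(int).clip(0, h - 1)

    proj = np.zeros((6, h, w), dtype=np.float32)
    best = np.full((h, w), np.inf)
    feats = np.stack([x, y, z, yaw, pitch, depth], axis=1)
    for i in range(len(points)):
        if depth[i] < best[v[i], u[i]]:             # nearest point wins
            best[v[i], u[i]] = depth[i]
            proj[:, v[i], u[i]] = feats[i]
    return proj
```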
Fig. 3 schematically illustrates a schematic diagram of a training flow of a point cloud semantic segmentation network according to an embodiment of the present disclosure.
As shown in fig. 3, the training process of the point cloud semantic segmentation network may include a sample preprocessing process and a network iteration training process.
According to an embodiment of the present disclosure, during sample preprocessing, a portion of the partitions in the first surround-view projection map may replace the corresponding partitions in the second surround-view projection map to obtain a mixed projection map. For the specific method, refer to operations S202 to S204, which are not repeated here.
According to the embodiments of the present disclosure, in the iterative network training process, the first surround-view projection map and the mixed projection map are input into the initial network as a sample pair, and the model parameters of the initial network are adjusted based on a set loss function and model iteration methods such as gradient descent or least squares, so as to train the initial network.
According to embodiments of the present disclosure, the initial network may include an encoder and a decoder.
According to an embodiment of the present disclosure, inputting the first surround-view projection map and the mixed projection map into the initial network respectively to obtain a first feature map and a first segmentation result corresponding to the first surround-view projection map, and a second feature map and a second segmentation result corresponding to the mixed projection map, may include the following operations:
inputting the first surround-view projection map and the mixed projection map into the encoder respectively to obtain a first image feature corresponding to the first surround-view projection map and a second image feature corresponding to the mixed projection map; and inputting the first image feature and the second image feature into the decoder respectively to obtain the first feature map and the first segmentation result corresponding to the first surround-view projection map, and the second feature map and the second segmentation result corresponding to the mixed projection map.
According to embodiments of the present disclosure, the encoder may be any feature extraction network, such as ResNet18.
According to embodiments of the present disclosure, the decoder may be any feature upsampling network, such as UPerNet.
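As a sketch of the wiring, the wrapper below assumes the decoder exposes both the upsampled feature map and the segmentation logits, the pairing used throughout this description; the class and its interface are illustrative.

```python
import torch.nn as nn

class SegNet(nn.Module):
    """Minimal encoder-decoder wrapper; the concrete encoder (e.g. a
    ResNet18-style trunk) and decoder (e.g. a UPerNet-style head) are
    injected. Assumes the decoder returns (feature map, logits)."""

    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        image_features = self.encoder(x)                 # first/second image feature
        feature_map, logits = self.decoder(image_features)
        return feature_map, logits                       # feature map + segmentation result
```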
According to an embodiment of the present disclosure, the network iterative training process may specifically include the following operations:
inputting the first surround-view projection map and the mixed projection map into the initial network respectively to obtain a first feature map and a first segmentation result corresponding to the first surround-view projection map, and a second feature map and a second segmentation result corresponding to the mixed projection map; calculating the information entropy loss between the first feature map and the second feature map to obtain a first loss value; calculating the cross entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value; and adjusting the model parameters of the initial network by using the first loss value and the second loss value to obtain the point cloud semantic segmentation network.
According to an embodiment of the present disclosure, the first segmentation result may represent the semantic feature segmentation result of each region in the first surround-view projection map.
According to an embodiment of the present disclosure, the first feature map may have the same size as the first surround-view projection map, and regions with different semantic features on the first feature map may have different color features. For example, regions with different semantic features on the first feature map may correspond to the regions occupied by people, cars, and obstacles, which may be represented in red, blue, and green, respectively.
According to an embodiment of the present disclosure, calculating an information entropy loss between the first feature map and the second feature map, and obtaining the first loss value may include the following operations:
determining a first sub-feature map associated with a plurality of first target partition maps from the first feature map; splitting the second feature map into a second sub-feature map associated with the plurality of first target partition maps and a third sub-feature map not associated with the plurality of first target partition maps; and under the condition that the confidence probability of the first sub-feature map is larger than a preset threshold value, taking the first sub-feature map and the second sub-feature map as a positive sample pair, taking the first sub-feature map and the third sub-feature map as a negative sample pair, and calculating the information entropy loss between the positive sample pair and the negative sample pair to obtain a first loss value.
According to an embodiment of the present disclosure, since the first feature map may have the same size as the first surround-view projection map, the first sub-feature map may be determined from the first feature map based on the position information of the first target partition maps in the first surround-view projection map.
According to an embodiment of the present disclosure, the method for calculating the confidence probability of the first sub-feature map is not limited herein; for example, the confidence probability may be determined by using a Gaussian formula.
According to an embodiment of the present disclosure, the preset threshold may be determined according to a specific application scenario, and may be set to 90%, 95%, or the like, for example, and is not limited herein.
According to an embodiment of the present disclosure, the information entropy loss may be calculated as shown in formula (1):

[Formula (1) is provided only as an image in the original publication.]

In formula (1), L_1 represents the information entropy loss; f_p denotes the first sub-feature map; f_x denotes the second sub-feature map; f_y denotes the third sub-feature map.
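Since formula (1) survives only as an image, the sketch below substitutes a common InfoNCE-style contrastive loss consistent with the described pairing: (f_p, f_x) as the positive pair and (f_p, f_y) as negative pairs. The cosine similarity, the pooling of sub-feature maps into vectors, and the temperature tau are all assumptions, not the patent's formula.

```python
import torch
import torch.nn.functional as F

def info_entropy_loss(f_p: torch.Tensor, f_x: torch.Tensor,
                      f_y: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """f_p, f_x: (N, D) pooled sub-feature vectors forming positive pairs;
    f_y: (M, D) vectors forming negative pairs with every f_p anchor."""
    pos = F.cosine_similarity(f_p, f_x, dim=-1) / tau                            # (N,)
    neg = F.cosine_similarity(f_p.unsqueeze(1), f_y.unsqueeze(0), dim=-1) / tau  # (N, M)
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1)                           # (N, 1 + M)
    targets = torch.zeros(f_p.size(0), dtype=torch.long, device=f_p.device)
    return F.cross_entropy(logits, targets)  # the positive pair is class 0
```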
According to an embodiment of the present disclosure, calculating a cross entropy loss between the first segmentation result and the second segmentation result, and obtaining the second loss value may include the following operations:
determining a first sub-segmentation result related to the plurality of first target partition maps from the first segmentation result; determining a second sub-segmentation result related to the plurality of first target partition maps from the second segmentation result; determining a predicted value and a label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result; and calculating the cross entropy loss between the predicted value and the label value to obtain a second loss value.
According to an embodiment of the present disclosure, the first segmentation result may have the same size as the first surround-view projection map, and thus the first sub-segmentation result may be determined from the first segmentation result based on the position information of the first target partition maps in the first surround-view projection map.
According to an embodiment of the present disclosure, the method for calculating the confidence probabilities of the first sub-segmentation result and the second sub-segmentation result is not limited herein; for example, the confidence probabilities may be determined by using a Gaussian formula.
According to an embodiment of the present disclosure, the predicted value and the label value may be determined by comparing the confidence probability of the first sub-segmentation result with that of the second sub-segmentation result. Specifically, in a case that the confidence probability of the first sub-segmentation result is greater than that of the second sub-segmentation result, the first sub-segmentation result is determined as the label value and the second sub-segmentation result as the predicted value; in a case that the confidence probability of the first sub-segmentation result is less than that of the second sub-segmentation result, the first sub-segmentation result is determined as the predicted value and the second sub-segmentation result as the label value.
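A per-cell sketch of this selection follows, assuming, as one option, that the maximum class probability serves as the confidence probability; the function and shape conventions are ours.

```python
import torch

def pick_prediction_and_label(seg_a: torch.Tensor, seg_b: torch.Tensor):
    """seg_a, seg_b: (C, H, W) class-probability maps for the co-located
    first and second sub-segmentation results. Per cell, the more confident
    result supplies the label value, the other the predicted value."""
    conf_a = seg_a.max(dim=0).values                 # (H, W) assumed confidence
    conf_b = seg_b.max(dim=0).values
    a_is_label = conf_a > conf_b                     # (H, W) boolean
    label = torch.where(a_is_label, seg_a.argmax(0), seg_b.argmax(0))
    pred = torch.where(a_is_label.unsqueeze(0), seg_b, seg_a)  # (C, H, W) probabilities
    return pred, label
```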
According to an embodiment of the present disclosure, the cross entropy loss may be calculated as shown in formula (2):

L_2 = -∑( y·log(y_p) + (1 - y)·log(1 - y_p) )    (2)

In formula (2), L_2 represents the cross entropy loss; y represents the label value; y_p represents the predicted value.
According to the embodiments of the present disclosure, the total loss used to adjust the model parameters of the initial network may be a weighted sum of the information entropy loss and the cross entropy loss, and the weights may be hyperparameters that can be set freely by the user during model tuning.
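One training iteration could then look like the sketch below, where `info_loss_fn` and `ce_loss_fn` stand for the first- and second-loss computations above and `alpha`, `beta` are the hyperparameter weights; all names are illustrative.

```python
def train_step(model, optimizer, proj_first, proj_mixed,
               info_loss_fn, ce_loss_fn, alpha: float = 1.0, beta: float = 1.0):
    """model(x) is assumed to return (feature map, segmentation result)."""
    feat_1, seg_1 = model(proj_first)
    feat_2, seg_2 = model(proj_mixed)
    # weighted sum of the first loss (information entropy) and second loss (cross entropy)
    loss = alpha * info_loss_fn(feat_1, feat_2) + beta * ce_loss_fn(seg_1, seg_2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```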
According to an embodiment of the present disclosure, the plurality of first target partition maps may include a third target partition map having a real label.
According to an embodiment of the present disclosure, when it is determined that the third target partition map exists, calculating a cross-entropy loss between the first segmentation result and the second segmentation result, and obtaining the second loss value may include:
determining, from the first segmentation result, a third sub-segmentation result related to the third target partition map and a fourth sub-segmentation result unrelated to the third target partition map but related to the plurality of first target partition maps; determining, from the second segmentation result, a fifth sub-segmentation result related to the third target partition map and a sixth sub-segmentation result unrelated to the third target partition map but related to the plurality of first target partition maps; calculating the cross entropy loss between the third sub-segmentation result and the real label to obtain a third loss value; calculating the cross entropy loss between the fourth sub-segmentation result and the sixth sub-segmentation result to obtain a fourth loss value; and determining the second loss value based on the third loss value and the fourth loss value.
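Under assumed mask and shape conventions, this branch of the second loss can be sketched as follows. Using the argmax of the sixth sub-segmentation result as the target for the fourth loss is our assumption, since the text leaves the direction of that cross entropy open; summing the third and fourth loss values is likewise one simple way to combine them.

```python
import torch.nn.functional as F

def second_loss_with_real_labels(seg_1, seg_2, labeled_mask, other_mask, labels):
    """seg_1, seg_2: (B, C, H, W) logits. labeled_mask selects cells of the
    third target partition map, other_mask the remaining first target
    partition maps; labels: (B, H, W) real labels on the labeled cells."""
    # third loss: first segmentation result vs. the real labels
    l3 = F.cross_entropy(seg_1.permute(0, 2, 3, 1)[labeled_mask], labels[labeled_mask])
    # fourth loss: fourth vs. sixth sub-segmentation result (assumed direction)
    target = seg_2.argmax(dim=1)[other_mask].detach()
    l4 = F.cross_entropy(seg_1.permute(0, 2, 3, 1)[other_mask], target)
    return l3 + l4  # second loss value determined from the third and fourth
```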
According to the embodiments of the present disclosure, through this design of the loss function, the network can be trained on raw unlabeled data with only a small amount of labeled data added, realizing semi-supervised training of the point cloud semantic segmentation network, so that the cost of data labeling can be reduced while the semantic segmentation performance of the network is preserved.
Fig. 4 schematically shows a flow chart of a point cloud semantic segmentation method according to an embodiment of the present disclosure.
As shown in fig. 4, the method includes operations S401 to S402.
In operation S401, target point cloud data is mapped into an initial view to obtain a surround-view projection map.
In operation S402, the surround-view projection map is input into the point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data.
According to an embodiment of the present disclosure, the point cloud semantic segmentation network can be trained by the point cloud semantic segmentation network training method described above, which is not repeated herein.
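A hypothetical end-to-end usage sketch follows; `load_lidar_scan` and `segmentation_net` are placeholder names for a scan loader and the trained network, and `to_surround_view` refers to the projection sketch above.

```python
import torch

points = load_lidar_scan("scan.bin")               # hypothetical (N, 3) loader
proj = torch.from_numpy(to_surround_view(points))  # projection sketch from earlier
with torch.no_grad():
    _, seg_logits = segmentation_net(proj.unsqueeze(0))  # trained network (assumed)
labels = seg_logits.argmax(dim=1)                  # per-cell semantic class
```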
Fig. 5 schematically illustrates a block diagram of a point cloud semantic segmentation network training apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, the point cloud semantic segmentation network training apparatus 500 includes a first mapping module 510, a first processing module 520, a determining module 530, a second processing module 540, and a training module 550.
The first mapping module 510 is configured to map a plurality of groups of point cloud data into an initial view respectively to obtain a plurality of surround-view projection maps.
The first processing module 520 is configured to partition a first surround-view projection map and a second surround-view projection map respectively based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, wherein the first surround-view projection map and the second surround-view projection map belong to the plurality of surround-view projection maps.
A determining module 530, configured to determine a plurality of first target partition maps from the plurality of first partition maps.
The second processing module 540 is configured to replace a second target partition map in the second surround-view projection map with each of the plurality of first target partition maps to obtain a mixed projection map, wherein the second target partition map belongs to the plurality of second partition maps and is located at the same position as the corresponding first target partition map.
The training module 550 is configured to train an initial network by taking the first surround-view projection map and the mixed projection map as training samples to obtain the point cloud semantic segmentation network.
According to the embodiments of the present disclosure, when training the point cloud semantic segmentation network, point cloud data can be mapped into surround-view projection maps, and a first surround-view projection map and a second surround-view projection map can be mixed by partition, that is, part of the partitions in the first surround-view projection map replace the corresponding partitions in the second surround-view projection map to obtain a mixed projection map; the initial network can then be trained with the mixed projection map and the first surround-view projection map to obtain the point cloud semantic segmentation network. Mixing by partition forcibly decouples the replaced partitions from their background, which effectively enriches the data, reduces the network's dependence on background and global information when predicting local areas, and improves the recognition capability of the network. Meanwhile, mixing by partition effectively preserves the three-dimensional shape of the original point cloud as projected onto the surround-view projection map, which can at least partially solve the problems of three-dimensional deformation and loss of shape information caused by data enhancement, and can improve the robustness of the network. These technical means can effectively improve the utilization efficiency of hardware resources during network training.
According to an embodiment of the present disclosure, the training module 550 includes a first training submodule, a second training submodule, a third training submodule, and a fourth training submodule.
The first training submodule is configured to respectively input the first surround-view projection map and the mixed projection map into the initial network to obtain a first feature map and a first segmentation result corresponding to the first surround-view projection map, and a second feature map and a second segmentation result corresponding to the mixed projection map.
The second training submodule is configured to calculate an information entropy loss between the first feature map and the second feature map to obtain a first loss value.
The third training submodule is configured to calculate a cross entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value.
The fourth training submodule is configured to adjust the model parameters of the initial network by using the first loss value and the second loss value, so as to finally obtain the point cloud semantic segmentation network.
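As a hedged sketch of how these four submodules might cooperate within one optimization step, assuming unbatched (C, H, W) inputs and the hypothetical helpers info_entropy_loss and mutual_pseudo_ce elaborated below:

```python
import torch

def train_step(net, optimizer, proj_first, proj_mixed, mixed_mask):
    # Forward both projection maps through the same initial network.
    feat_first, seg_first = net(proj_first)   # first feature map / segmentation result
    feat_mixed, seg_mixed = net(proj_mixed)   # second feature map / segmentation result

    # Flatten (C, H, W) outputs into per-grid-cell rows.
    fa = feat_first.flatten(1).t()            # (H*W, C)
    fb = feat_mixed.flatten(1).t()
    conf_first = seg_first.softmax(0).amax(0).flatten()  # confidence probability

    loss1 = info_entropy_loss(fa, fb, mixed_mask, conf_first)       # first loss value
    loss2 = mutual_pseudo_ce(seg_first.flatten(1).t()[mixed_mask],
                             seg_mixed.flatten(1).t()[mixed_mask])  # second loss value

    loss = loss1 + loss2   # equal weighting of the two loss values is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```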
According to an embodiment of the present disclosure, the second training submodule includes a first training unit, a second training unit, and a third training unit.
The first training unit is configured to determine, from the first feature map, a first sub-feature map related to the plurality of first target partition maps.
The second training unit is configured to split the second feature map into a second sub-feature map related to the plurality of first target partition maps and a third sub-feature map unrelated to the plurality of first target partition maps.
The third training unit is configured to, in a case that the confidence probability of the first sub-feature map is greater than a preset threshold, take the first sub-feature map and the second sub-feature map as a positive sample pair and the first sub-feature map and the third sub-feature map as a negative sample pair, and calculate the information entropy loss between the positive sample pair and the negative sample pair to obtain the first loss value.
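One common reading of such a confidence-gated positive/negative pairing is an InfoNCE-style contrastive term; the sketch below makes that assumption explicit, with the threshold thresh and temperature tau as illustrative parameters.

```python
import torch
import torch.nn.functional as F

def info_entropy_loss(feat_first, feat_mixed, mixed_mask, conf_first,
                      thresh=0.9, tau=0.1):
    # feat_first / feat_mixed: (N, D) per-position features of the first
    # and mixed projection maps; mixed_mask marks positions covered by the
    # pasted first target partition maps; conf_first is the confidence
    # probability of the first sub-feature map.
    keep = mixed_mask & (conf_first > thresh)          # confidence gate
    anchor = F.normalize(feat_first[keep], dim=1)      # first sub-feature map
    pos = F.normalize(feat_mixed[keep], dim=1)         # second sub-feature map
    neg = F.normalize(feat_mixed[~mixed_mask], dim=1)  # third sub-feature map

    pos_logit = (anchor * pos).sum(dim=1, keepdim=True) / tau  # positive pair
    neg_logit = anchor @ neg.t() / tau                         # negative pairs
    logits = torch.cat([pos_logit, neg_logit], dim=1)
    target = torch.zeros(len(anchor), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, target)  # contrastive cross entropy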
According to an embodiment of the present disclosure, the third training submodule includes a fourth training unit, a fifth training unit, a sixth training unit, and a seventh training unit.
The fourth training unit is configured to determine, from the first segmentation result, a first sub-segmentation result related to the plurality of first target partition maps.
The fifth training unit is configured to determine, from the second segmentation result, a second sub-segmentation result related to the plurality of first target partition maps.
The sixth training unit is configured to determine a predicted value and a label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result.
The seventh training unit is configured to calculate a cross entropy loss between the predicted value and the label value to obtain the second loss value.
According to an embodiment of the disclosure, the sixth training unit comprises a first training subunit and a second training subunit.
The first training subunit is configured to, in a case that the confidence probability of the first sub-segmentation result is greater than that of the second sub-segmentation result, determine the first sub-segmentation result as the label value and the second sub-segmentation result as the predicted value.
The second training subunit is configured to, in a case that the confidence probability of the first sub-segmentation result is less than that of the second sub-segmentation result, determine the first sub-segmentation result as the predicted value and the second sub-segmentation result as the label value.
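This confidence rule can be sketched as a mutual pseudo-labeling cross entropy, under the assumption that the label value is the argmax class of the more confident result at each position:

```python
import torch
import torch.nn.functional as F

def mutual_pseudo_ce(logits_first, logits_mixed):
    # logits_first / logits_mixed: (N, K) class logits of the first and
    # second sub-segmentation results at the same positions.
    conf_first, cls_first = logits_first.softmax(dim=1).max(dim=1)
    conf_mixed, cls_mixed = logits_mixed.softmax(dim=1).max(dim=1)

    first_is_label = conf_first > conf_mixed
    # Where the first result is more confident, it supplies the label value
    # and the second result is the predicted value, and vice versa.
    loss_fm = F.cross_entropy(logits_mixed, cls_first, reduction='none')
    loss_mf = F.cross_entropy(logits_first, cls_mixed, reduction='none')
    return torch.where(first_is_label, loss_fm, loss_mf).mean()
```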
According to the embodiment of the disclosure, the plurality of first target partition maps include a third target partition map, and the third target partition map has a real label.
According to an embodiment of the present disclosure, the third training submodule includes an eighth training unit, a ninth training unit, a tenth training unit, an eleventh training unit, and a twelfth training unit.
The eighth training unit is configured to determine, from the first segmentation result, a third sub-segmentation result related to the third target partition map and a fourth sub-segmentation result unrelated to the third target partition map and related to the plurality of first target partition maps.
The ninth training unit is configured to determine, from the second segmentation result, a fifth sub-segmentation result related to the third target partition map and a sixth sub-segmentation result unrelated to the third target partition map and related to the plurality of first target partition maps.
The tenth training unit is configured to calculate a cross entropy loss between the third sub-segmentation result and the real label to obtain a third loss value.
The eleventh training unit is configured to calculate a cross entropy loss between the fourth sub-segmentation result and the sixth sub-segmentation result to obtain a fourth loss value.
The twelfth training unit is configured to determine the second loss value based on the third loss value and the fourth loss value.
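Under the same per-position conventions, this partly labeled variant of the second loss value might be sketched as follows; the boolean masks, the reuse of mutual_pseudo_ce for the fourth loss value, and the additive combination are all assumptions of the sketch.

```python
import torch.nn.functional as F

def second_loss_with_real_labels(seg_first, seg_mixed, real_labels,
                                 labeled_mask, target_mask):
    # labeled_mask marks positions covered by the third target partition map
    # (which carries real labels); target_mask marks all positions covered
    # by the first target partition maps.
    unlabeled = target_mask & ~labeled_mask

    # Third loss value: cross entropy between the third sub-segmentation
    # result and the real label.
    loss3 = F.cross_entropy(seg_first[labeled_mask], real_labels[labeled_mask])
    # Fourth loss value: cross entropy between the fourth and sixth
    # sub-segmentation results; reusing the confidence rule above here
    # is an assumption.
    loss4 = mutual_pseudo_ce(seg_first[unlabeled], seg_mixed[unlabeled])
    return loss3 + loss4   # additive combination into the second loss is assumed
```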
According to an embodiment of the present disclosure, an initial network includes an encoder and a decoder.
According to an embodiment of the disclosure, the first training submodule includes a thirteenth training unit and a fourteenth training unit.
The thirteenth training unit is configured to respectively input the first surround-view projection map and the mixed projection map into the encoder to obtain a first image feature corresponding to the first surround-view projection map and a second image feature corresponding to the mixed projection map.
The fourteenth training unit is configured to respectively input the first image feature and the second image feature into the decoder to obtain the first feature map and the first segmentation result corresponding to the first surround-view projection map, and the second feature map and the second segmentation result corresponding to the mixed projection map.
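A minimal encoder-decoder sketch consistent with this description is shown below; the channel counts, depths, and the five input channels (e.g. x, y, z, range, intensity) are illustrative assumptions, not the claimed architecture.

```python
import torch.nn as nn

class InitialNet(nn.Module):
    # Minimal encoder-decoder sketch: the decoder yields a per-position
    # feature map, and a 1x1 head turns it into the segmentation result.
    def __init__(self, in_ch=5, feat_ch=64, n_classes=20):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_ch, feat_ch, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.seg_head = nn.Conv2d(feat_ch, n_classes, 1)

    def forward(self, x):
        feat = self.decoder(self.encoder(x))   # feature map
        return feat, self.seg_head(feat)       # feature map, segmentation result
```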
According to an embodiment of the present disclosure, the first mapping module 510 includes a first mapping unit, a second mapping unit, a third mapping unit, and a fourth mapping unit.
The first mapping unit is configured to, for each set of point cloud data, respectively perform polar coordinate conversion on the three-dimensional coordinate data of each point in the point cloud data to obtain the polar coordinate data of each point in the point cloud data.
The second mapping unit is configured to map the points in the point cloud data into a plurality of grids of the initial view respectively based on the polar coordinate data of each point in the point cloud data.
The third mapping unit is configured to determine, for each grid of the initial view, the feature data of the grid based on the three-dimensional coordinate data and the polar coordinate data of the points in the grid.
The fourth mapping unit is configured to construct the surround-view projection map based on the feature data of the plurality of grids.
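These four mapping units correspond to a classic range-image projection; the sketch below makes the polar conversion and grid filling concrete, with the grid size, vertical field of view, and choice of per-grid feature channels as assumptions typical of a 64-beam lidar rather than values from the disclosure.

```python
import numpy as np

def to_surround_view(points, h=64, w=2048, fov_up=3.0, fov_down=-25.0):
    # points: (N, 3+) array of 3D coordinates.
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)             # range
    yaw = np.arctan2(y, x)                                # azimuth angle
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    # Map polar coordinates to grid rows/columns of the initial view.
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = ((1.0 - (pitch - fov_down_r) / (fov_up_r - fov_down_r)) * (h - 1)).astype(int)
    v = ((0.5 * (1.0 - yaw / np.pi)) * (w - 1)).astype(int)
    u, v = np.clip(u, 0, h - 1), np.clip(v, 0, w - 1)

    # Feature data per grid: 3D coordinates, range, and azimuth here;
    # the exact channels are an assumption of this sketch.
    proj = np.zeros((5, h, w), dtype=np.float32)
    order = np.argsort(r)[::-1]          # nearer points overwrite farther ones
    proj[:, u[order], v[order]] = np.stack(
        [x[order], y[order], z[order], r[order], yaw[order]])
    return proj
```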
It should be noted that the point cloud semantic segmentation network training apparatus in the embodiments of the present disclosure corresponds to the point cloud semantic segmentation network training method in the embodiments of the present disclosure; for details of the apparatus, reference may be made to the description of the method, which is not repeated here.
Fig. 6 schematically illustrates a block diagram of a point cloud semantic segmentation apparatus according to an embodiment of the present disclosure.
As shown in fig. 6, the point cloud semantic segmentation apparatus 600 includes a second mapping module 610 and a third processing module 620.
The second mapping module 610 is configured to map target point cloud data into an initial view to obtain a surround-view projection map.
The third processing module 620 is configured to input the surround-view projection map into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data.
According to an embodiment of the present disclosure, the point cloud semantic segmentation network may be obtained by training with the method described in the point cloud semantic segmentation network training method section, which is not repeated here.
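End to end, inference with the hypothetical sketches above might read as follows; net and target_points are assumed names, and the unbatched tensor shapes follow the earlier sketches.

```python
import torch

# Hypothetical usage: net is an InitialNet instance, target_points an
# (N, 3+) float32 array of the target point cloud.
proj = to_surround_view(target_points)              # surround-view projection map
feat_map, seg_logits = net(torch.from_numpy(proj))  # feature map + segmentation result
pred = seg_logits.argmax(dim=0)                     # per-grid semantic labels
```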
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any number of the first mapping module 510, the first processing module 520, the determining module 530, the second processing module 540 and the training module 550, or the second mapping module 610 and the third processing module 620 may be combined and implemented in one module/unit/sub-unit, or any one of the modules/units/sub-units may be split into a plurality of modules/units/sub-units. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to an embodiment of the present disclosure, at least one of the first mapping module 510, the first processing module 520, the determining module 530, the second processing module 540, and the training module 550, or the second mapping module 610 and the third processing module 620 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or by a suitable combination of any of them. Alternatively, at least one of the first mapping module 510, the first processing module 520, the determining module 530, the second processing module 540 and the training module 550, or the second mapping module 610 and the third processing module 620 may be at least partially implemented as a computer program module which, when executed, may perform a corresponding function.
Fig. 7 schematically illustrates a block diagram of an electronic device adapted to implement a point cloud semantic segmentation network training method or a point cloud semantic segmentation method according to an embodiment of the present disclosure. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, an electronic device 700 according to an embodiment of the present disclosure includes a processor 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The processor 701 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an application specific integrated circuit (ASIC)), among others. The processor 701 may also include on-board memory for caching purposes. The processor 701 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 703, various programs and data necessary for the operation of the electronic device 700 are stored. The processor 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. The processor 701 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 702 and/or the RAM 703. Note that the programs may also be stored in one or more memories other than the ROM 702 and the RAM 703. The processor 701 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
Electronic device 700 may also include input/output (I/O) interface 705, which input/output (I/O) interface 705 is also connected to bus 704, according to an embodiment of the present disclosure. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.
According to an embodiment of the present disclosure, the method flows described above may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by the processor 701, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement a method according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 702 and/or the RAM 703 and/or one or more memories other than the ROM 702 and the RAM 703 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method provided by the embodiments of the present disclosure. When the computer program product is run on an electronic device, the program code is configured to cause the electronic device to implement the point cloud semantic segmentation network training method or the point cloud semantic segmentation method provided by the embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted in the form of a signal over a network medium, distributed, and downloaded and installed via the communication section 709, and/or installed from the removable medium 711. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, program code for the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high level procedural and/or object oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined in various ways, even if such combinations are not expressly recited in the present disclosure. In particular, various combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit and teachings of the present disclosure. All such combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the disclosure, and these alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (14)

1. A point cloud semantic segmentation network training method, comprising:
respectively mapping multiple sets of point cloud data into an initial view to obtain multiple surround-view projection maps;
respectively partitioning a first surround-view projection map and a second surround-view projection map based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, wherein the first surround-view projection map and the second surround-view projection map belong to the multiple surround-view projection maps;
determining a plurality of first target partition maps from the plurality of first partition maps;
replacing a second target partition map in the second surround-view projection map with each of the first target partition maps to obtain a mixed projection map, wherein the second target partition map belongs to the plurality of second partition maps, and the first target partition map and the second target partition map are located at the same position; and
training an initial network by using the first surround-view projection map and the mixed projection map as training samples to obtain a point cloud semantic segmentation network.
2. The method of claim 1, wherein the training an initial network by using the first surround-view projection map and the mixed projection map as training samples to obtain a point cloud semantic segmentation network comprises:
respectively inputting the first surround-view projection map and the mixed projection map into the initial network to obtain a first feature map and a first segmentation result corresponding to the first surround-view projection map, and a second feature map and a second segmentation result corresponding to the mixed projection map;
calculating an information entropy loss between the first feature map and the second feature map to obtain a first loss value;
calculating a cross entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value; and
adjusting model parameters of the initial network by using the first loss value and the second loss value, so as to finally obtain the point cloud semantic segmentation network.
3. The method of claim 2, wherein the calculating an information entropy loss between the first feature map and the second feature map to obtain a first loss value comprises:
determining, from the first feature map, a first sub-feature map related to the plurality of first target partition maps;
splitting the second feature map into a second sub-feature map related to the plurality of first target partition maps and a third sub-feature map unrelated to the plurality of first target partition maps; and
in a case that the confidence probability of the first sub-feature map is greater than a preset threshold, taking the first sub-feature map and the second sub-feature map as a positive sample pair and the first sub-feature map and the third sub-feature map as a negative sample pair, and calculating the information entropy loss between the positive sample pair and the negative sample pair to obtain the first loss value.
4. The method of claim 2, wherein the calculating a cross entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value comprises:
determining, from the first segmentation result, a first sub-segmentation result related to the plurality of first target partition maps;
determining, from the second segmentation result, a second sub-segmentation result related to the plurality of first target partition maps;
determining a predicted value and a label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result; and
calculating a cross entropy loss between the predicted value and the label value to obtain the second loss value.
5. The method of claim 4, wherein the determining a predicted value and a label value based on the confidence probability of the first sub-segmentation result and the confidence probability of the second sub-segmentation result comprises:
in a case that the confidence probability of the first sub-segmentation result is greater than the confidence probability of the second sub-segmentation result, determining the first sub-segmentation result as the label value and the second sub-segmentation result as the predicted value; and
in a case that the confidence probability of the first sub-segmentation result is less than the confidence probability of the second sub-segmentation result, determining the first sub-segmentation result as the predicted value and the second sub-segmentation result as the label value.
6. The method of claim 2, wherein the plurality of first target partition maps comprise a third target partition map, and the third target partition map has a real label;
wherein the calculating a cross entropy loss between the first segmentation result and the second segmentation result to obtain a second loss value comprises:
determining, from the first segmentation result, a third sub-segmentation result related to the third target partition map and a fourth sub-segmentation result unrelated to the third target partition map and related to the plurality of first target partition maps;
determining, from the second segmentation result, a fifth sub-segmentation result related to the third target partition map and a sixth sub-segmentation result unrelated to the third target partition map and related to the plurality of first target partition maps;
calculating a cross entropy loss between the third sub-segmentation result and the real label to obtain a third loss value;
calculating a cross entropy loss between the fourth sub-segmentation result and the sixth sub-segmentation result to obtain a fourth loss value; and
determining the second loss value based on the third loss value and the fourth loss value.
7. The method of claim 2, wherein the initial network comprises an encoder and a decoder;
wherein the respectively inputting the first surround-view projection map and the mixed projection map into the initial network to obtain a first feature map and a first segmentation result corresponding to the first surround-view projection map, and a second feature map and a second segmentation result corresponding to the mixed projection map comprises:
respectively inputting the first surround-view projection map and the mixed projection map into the encoder to obtain a first image feature corresponding to the first surround-view projection map and a second image feature corresponding to the mixed projection map; and
respectively inputting the first image feature and the second image feature into the decoder to obtain the first feature map and the first segmentation result corresponding to the first surround-view projection map, and the second feature map and the second segmentation result corresponding to the mixed projection map.
8. The method of claim 1, wherein the respectively mapping multiple sets of point cloud data into an initial view to obtain multiple surround-view projection maps comprises:
for each set of point cloud data, respectively performing polar coordinate conversion on the three-dimensional coordinate data of each point in the point cloud data to obtain the polar coordinate data of each point in the point cloud data;
mapping the points in the point cloud data into a plurality of grids of the initial view respectively based on the polar coordinate data of each point in the point cloud data;
for each grid of the initial view, determining feature data of the grid based on the three-dimensional coordinate data and the polar coordinate data of the points in the grid; and
constructing the surround-view projection map based on the feature data of the plurality of grids.
9. A point cloud semantic segmentation method, comprising:
mapping target point cloud data into an initial view to obtain a surround-view projection map; and
inputting the surround-view projection map into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data;
wherein the point cloud semantic segmentation network is trained by using the point cloud semantic segmentation network training method according to any one of claims 1 to 8.
10. A point cloud semantic segmentation network training apparatus, comprising:
a first mapping module configured to respectively map multiple sets of point cloud data into an initial view to obtain multiple surround-view projection maps;
a first processing module configured to respectively partition a first surround-view projection map and a second surround-view projection map based on a preset size to obtain a plurality of first partition maps and a plurality of second partition maps, wherein the first surround-view projection map and the second surround-view projection map belong to the multiple surround-view projection maps;
a determining module configured to determine a plurality of first target partition maps from the plurality of first partition maps;
a second processing module configured to replace a second target partition map in the second surround-view projection map with each of the first target partition maps to obtain a mixed projection map, wherein the second target partition map belongs to the plurality of second partition maps, and the first target partition map and the second target partition map are located at the same position; and
a training module configured to train an initial network by using the first surround-view projection map and the mixed projection map as training samples to obtain a point cloud semantic segmentation network.
11. A point cloud semantic segmentation apparatus, comprising:
a second mapping module configured to map target point cloud data into an initial view to obtain a surround-view projection map; and
a third processing module configured to input the surround-view projection map into a point cloud semantic segmentation network to obtain a semantic segmentation feature map of the target point cloud data;
wherein the point cloud semantic segmentation network is trained by using the point cloud semantic segmentation network training method according to any one of claims 1 to 8.
12. An electronic device, comprising:
one or more processors;
a memory to store one or more instructions that,
wherein the one or more instructions, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.
13. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 9.
14. A computer program product comprising computer executable instructions for implementing the method of any one of claims 1 to 9 when executed.
CN202211022552.3A 2022-08-24 2022-08-24 Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device Pending CN115375899A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211022552.3A CN115375899A (en) 2022-08-24 2022-08-24 Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device
PCT/CN2023/082749 WO2024040954A1 (en) 2022-08-24 2023-03-21 Point cloud semantic segmentation network training method, and point cloud semantic segmentation method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211022552.3A CN115375899A (en) 2022-08-24 2022-08-24 Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device

Publications (1)

Publication Number Publication Date
CN115375899A true CN115375899A (en) 2022-11-22

Family

ID=84068279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211022552.3A Pending CN115375899A (en) 2022-08-24 2022-08-24 Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device

Country Status (2)

Country Link
CN (1) CN115375899A (en)
WO (1) WO2024040954A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118334352B (en) * 2024-06-13 2024-08-13 宁波大学 Training method, system, medium and equipment for point cloud semantic segmentation model

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111354478B (en) * 2018-12-24 2024-07-12 黄庆武整形医生集团(深圳)有限公司 Shaping simulation information processing method, shaping simulation terminal and shaping service terminal
CN110008941B (en) * 2019-06-05 2020-01-17 长沙智能驾驶研究院有限公司 Method and device for detecting travelable area, computer equipment and storage medium
CN113421217A (en) * 2020-03-02 2021-09-21 北京京东乾石科技有限公司 Method and device for detecting travelable area
CN113496491B (en) * 2020-03-19 2023-12-15 广州汽车集团股份有限公司 Road surface segmentation method and device based on multi-line laser radar
KR102334177B1 (en) * 2020-07-21 2021-12-03 대한민국 Method and system for establishing 3-dimensional indoor information for indoor evacuation
CN115375899A (en) * 2022-08-24 2022-11-22 北京京东乾石科技有限公司 Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024040954A1 (en) * 2022-08-24 2024-02-29 北京京东乾石科技有限公司 Point cloud semantic segmentation network training method, and point cloud semantic segmentation method and apparatus
CN116721399A (en) * 2023-07-26 2023-09-08 之江实验室 Point cloud target detection method and device for quantitative perception training
CN116721399B (en) * 2023-07-26 2023-11-14 之江实验室 Point cloud target detection method and device for quantitative perception training

Also Published As

Publication number Publication date
WO2024040954A1 (en) 2024-02-29

Similar Documents

Publication Publication Date Title
CN115375899A (en) Point cloud semantic segmentation network training method, point cloud semantic segmentation method and point cloud semantic segmentation device
EP3321842B1 (en) Lane line recognition modeling method, apparatus, storage medium, and device, recognition method and apparatus, storage medium, and device
CN108304775B (en) Remote sensing image recognition method and device, storage medium and electronic equipment
EP3859560A2 (en) Method and apparatus for visual question answering, computer device and medium
US11151447B1 (en) Network training process for hardware definition
KR102539942B1 (en) Method and apparatus for training trajectory planning model, electronic device, storage medium and program
US20220222824A1 (en) Fully automated multimodal system architecture for semantic segmentation of large-scale 3d outdoor point cloud data
Raghavan et al. Optimized building extraction from high-resolution satellite imagery using deep learning
CN110276345B (en) Convolutional neural network model training method and device and computer readable storage medium
CN112927234A (en) Point cloud semantic segmentation method and device, electronic equipment and readable storage medium
US20210049372A1 (en) Method and system for generating depth information of street view image using 2d map
CN112016569B (en) Attention mechanism-based object detection method, network, device and storage medium
CN115546630A (en) Construction site extraction method and system based on remote sensing image characteristic target detection
US12118807B2 (en) Apparatus and method for three-dimensional object recognition
EP4307219A1 (en) Three-dimensional target detection method and apparatus
CN112362059A (en) Method, apparatus, computer device and medium for positioning mobile carrier
Osuna-Coutiño et al. Structure extraction in urbanized aerial images from a single view using a CNN-based approach
CN116128048A (en) Optical remote sensing image cloud detection model training method, detection method and device
CN114419338B (en) Image processing method, image processing device, computer equipment and storage medium
CN115511870A (en) Object detection method and device, electronic equipment and storage medium
CN112434591A (en) Lane line determination method and device
Manka Developing an efficient real-time terrestrial infrastructure inspection system using autonomous drones and deep learning
CN111414925B (en) Image processing method, apparatus, computing device and medium
CN113553939A (en) Point cloud classification model training method and device, electronic equipment and storage medium
CN114387492B (en) Deep learning-based near-shore water surface area ship detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination