CN114723949A - Three-dimensional scene segmentation method and method for training segmentation model

Info

Publication number
CN114723949A
Authority
CN
China
Prior art keywords: point cloud, cloud data, data, subset, label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210403610.0A
Other languages
Chinese (zh)
Inventor
叶晓青
储瑞航
孙昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210403610.0A
Publication of CN114723949A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting, characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G06F18/23 Clustering techniques
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroids

Abstract

The disclosure provides a three-dimensional scene segmentation method and a method for training a segmentation model, relates to the technical field of artificial intelligence, in particular to the fields of deep learning and computer vision, and can be applied to scenarios such as image processing, 3D vision, and augmented reality. The implementation scheme is as follows: acquiring a point cloud data set of a target three-dimensional scene, wherein the point cloud data set comprises point cloud data of each point in a target point set corresponding to the target three-dimensional scene, and the point cloud data indicates the position of the point in the target three-dimensional scene; obtaining a target classification label and a target instance label of each point cloud data in the point cloud data set; and for a plurality of first point cloud data in the point cloud data set having the same target classification label, in response to determining that the target instance label of any one of the plurality of first point cloud data corresponds to the target instance label of another first point cloud data, determining that the plurality of first point cloud data correspond to a first instance in the target three-dimensional scene.

Description

Three-dimensional scene segmentation method and method for training segmentation model
Technical Field
The present disclosure relates to the technical field of artificial intelligence, in particular to the fields of deep learning and computer vision, which can be applied to scenarios such as image processing, 3D vision, and augmented reality, and in particular to a three-dimensional scene segmentation method, a method for training a segmentation model, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), covering both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like. Artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology, and the like.
Artificial-intelligence-based three-dimensional vision techniques have penetrated various fields. For example, by segmenting instances in the three-dimensional scene of a road, objects such as pedestrians and automobiles can be identified, enabling a vehicle to understand the road environment.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered as having been acknowledged in any prior art.
Disclosure of Invention
The present disclosure provides a three-dimensional scene segmentation method, as well as methods, apparatuses, electronic devices, computer-readable storage media, and computer program products for training a segmentation model.
According to an aspect of the present disclosure, there is provided a three-dimensional scene segmentation method, including: acquiring a point cloud data set of a target three-dimensional scene, wherein the point cloud data set comprises point cloud data of each point in a target point set corresponding to the target three-dimensional scene, and the point cloud data indicates the position of the point in the target three-dimensional scene; obtaining a target classification label and a target instance label for each point cloud data in the point cloud dataset, the target classification label indicating a respective classification of the point cloud data in a plurality of classifications, the target instance label indicating a direction and distance that the point cloud data is offset relative to a clustering center of a respective first subset in the point cloud dataset, the respective first subset capable of being clustered into an instance; and for a plurality of first point cloud data in the point cloud dataset having a same target classification label, in response to determining that a target instance label of any first point cloud data in the plurality of first point cloud data corresponds to a target instance label of another first point cloud data, determining that the plurality of first point cloud data corresponds to a first instance in the target three-dimensional scene.
According to another aspect of the present disclosure, there is provided a method for training a segmentation model, comprising: obtaining a point cloud data set of a training three-dimensional scene, the point cloud data set comprising point cloud data for each point in a sample point set located on an instance sample in the three-dimensional scene, the point cloud data indicating the location of the point in the training three-dimensional scene, the instance sample corresponding to a first classification of a plurality of classifications; labeling each point cloud data in at least a first data subset of the point cloud data set to obtain a labeled classification label and a labeled instance label for each point cloud data in the first data subset, the first data subset including the point cloud data of each point in at least a first subset of the sample point set, the labeled classification label indicating the first classification, and the labeled instance label indicating the distance and direction by which the point corresponding to the point cloud data is offset relative to the center of the instance sample; and performing supervised training on the segmentation model based on each point cloud data in the first data subset and the labeled classification label and labeled instance label of that point cloud data, to obtain a supervised-trained segmentation model.
According to another aspect of the present disclosure, there is provided a three-dimensional scene segmentation apparatus including: a point cloud data acquisition unit configured to acquire a point cloud data set of a target three-dimensional scene, the point cloud data set including point cloud data of each point in a target point set corresponding to the target three-dimensional scene, the point cloud data indicating the position of the point in the target three-dimensional scene; a label obtaining unit configured to obtain a target classification label and a target instance label of each point cloud data in the point cloud data set, the target classification label indicating the corresponding classification of the point cloud data among a plurality of classifications, the target instance label indicating the direction and distance by which the point cloud data is offset relative to the clustering center of a corresponding first subset in the point cloud data set, the corresponding first subset being capable of being clustered into an instance; and a first determination unit configured to determine, for a plurality of first point cloud data having the same target classification label in the point cloud data set, that the plurality of first point cloud data correspond to a first instance in the target three-dimensional scene in response to determining that the target instance label of any one of the plurality of first point cloud data corresponds to the target instance label of another first point cloud data.
According to another aspect of the present disclosure, there is provided an apparatus for training a segmentation model, comprising: a data acquisition unit configured to obtain a point cloud data set of a training three-dimensional scene, the point cloud data set comprising point cloud data for each point in a sample point set located on an instance sample in the three-dimensional scene, the point cloud data indicating the location of the point in the training three-dimensional scene, the instance sample corresponding to a first classification of a plurality of classifications; a labeling unit configured to label each point cloud data in at least a first data subset of the point cloud data set to obtain a labeled classification label and a labeled instance label for each point cloud data in the first data subset, the first data subset including the point cloud data of each point in at least a first subset of the sample point set, the labeled classification label indicating the first classification, and the labeled instance label indicating the distance and direction by which the point corresponding to the point cloud data is offset relative to the center of the instance sample; and a supervised training unit configured to perform supervised training on the segmentation model based on each point cloud data in the first data subset and the labeled classification label and labeled instance label of that point cloud data, to obtain a supervised-trained segmentation model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to implement a method according to the above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to implement the method described above.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method described above.
According to one or more embodiments of the present disclosure, instances located in a three-dimensional scene can be segmented, and the segmentation precision for the three-dimensional scene is improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the embodiments and, together with the description, serve to explain the exemplary implementations of the embodiments. The illustrated embodiments are for purposes of example only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a three-dimensional scene segmentation method according to an embodiment of the present disclosure;
FIG. 3 shows a flow diagram of a three-dimensional scene segmentation method according to an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a segmentation model in a three-dimensional scene segmentation method according to an embodiment of the present disclosure;
FIG. 5 shows a flow diagram of a three-dimensional scene segmentation method according to an embodiment of the present disclosure;
FIG. 6 shows a flow diagram of a method for training a segmentation model in accordance with an embodiment of the present disclosure;
FIG. 7 shows a flowchart of a process of labeling each point cloud data of at least a first data subset of a point cloud data set in a method for training a segmentation model according to an embodiment of the present disclosure;
FIG. 8 illustrates a flow chart of a process of supervised training of a segmentation model based on each point cloud data in a first subset of data and its label classification tags and label instance tags in a method for training a segmentation model according to an embodiment of the present disclosure;
FIG. 9 shows a flow diagram of a method for training a segmentation model in accordance with an embodiment of the present disclosure;
FIG. 10 shows a flow diagram of a process of obtaining a plurality of third data subsets from the second data subset in a method for training a segmentation model according to an embodiment of the present disclosure;
FIG. 11 shows a flowchart of a process of obtaining a fourth data subset of the third data subset and a cluster center of the fourth data subset in a method for training a segmentation model according to an embodiment of the present disclosure;
FIG. 12 shows a block diagram of a three-dimensional scene segmentation apparatus according to an embodiment of the present disclosure;
FIG. 13 shows a block diagram of an apparatus for training a segmentation model, according to an embodiment of the present disclosure; and
FIG. 14 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, the timing relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.
The terminology used in the description of the various described examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.
In embodiments of the present disclosure, the server 120 may run one or more services or software applications that enable the three-dimensional scene segmentation method to be performed.
In some embodiments, the server 120 may also provide other services or software applications that may include non-virtual environments and virtual environments. In certain embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.
In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user operating client devices 101, 102, 103, 104, 105, and/or 106 may, in turn, utilize one or more client applications to interact with server 120 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 100. Accordingly, fig. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The user may receive the segmentation result using client devices 101, 102, 103, 104, 105, and/or 106. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that any number of client devices may be supported by the present disclosure.
Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptops), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and so forth. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., GOOGLE Chrome OS); or include various Mobile operating systems, such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, Android. Portable handheld devices may include cellular telephones, smart phones, tablet computers, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head-mounted displays (such as smart glasses) and other devices. The gaming system may include a variety of handheld gaming devices, internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), Short Message Service (SMS) applications, and may use a variety of communication protocols.
Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, one or more networks 110 may be a Local Area Network (LAN), an ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., bluetooth, WIFI), and/or any combination of these and/or other networks.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, midrange servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. The server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.
In some implementations, the server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.
In some embodiments, the server 120 may be a server of a distributed system, or a server incorporating a blockchain. The server 120 may also be a cloud server, or a smart cloud computing server or smart cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system that addresses the drawbacks of high management difficulty and weak service scalability in traditional physical host and Virtual Private Server (VPS) services.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and object files. The data store 130 may reside in various locations. For example, the data store used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. The data store 130 may be of different types. In certain embodiments, the data store used by the server 120 may be a database, such as a relational database. One or more of these databases may store, update, and retrieve data in response to commands.
In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key-value stores, object stores, or regular stores supported by a file system.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
Referring to fig. 2, a three-dimensional scene segmentation method 200 according to some embodiments of the present disclosure includes:
step S210: acquiring a point cloud data set of a target three-dimensional scene, wherein the point cloud data set comprises point cloud data of each point in a target point set corresponding to the target three-dimensional scene, and the point cloud data indicates the position of the point in the target three-dimensional scene;
step S220: obtaining a target classification label and a target instance label for each point cloud data in the point cloud dataset, the target classification label indicating a respective classification of the point cloud data in a plurality of classifications, the target instance label indicating a direction and distance that the point cloud data is offset relative to a clustering center of a respective first subset in the point cloud dataset, the respective first subset capable of being clustered into an instance;
step S230: for a plurality of first point cloud data in the point cloud data set having the same target classification label, in response to determining that the target instance label of any first point cloud data in the plurality of first point cloud data corresponds to the target instance label of another first point cloud data, determining that the plurality of first point cloud data correspond to a first instance in the target three-dimensional scene.
The classification label and the instance label of the point cloud data are obtained based on the point cloud data of each point in a target point set in a target three-dimensional scene. The classification label indicates the corresponding classification of the point cloud data among a plurality of classifications, and the instance label indicates the direction and distance by which the point cloud data is offset relative to the clustering center of the corresponding first subset, where the corresponding first subset can be clustered into a single instance. That is, the classification corresponding to a point can be obtained through its classification label, and the subset to which the point cloud data belongs can be determined through its instance label. Based on the classification labels and instance labels of the point cloud data of each point in the target point set, a plurality of points having the same classification label and corresponding instance labels can be determined to constitute one instance in the three-dimensional scene. Even when a plurality of instances exist in the three-dimensional scene and correspond to the same classification among the plurality of classifications, the three-dimensional scene can still be segmented instance by instance, improving the segmentation precision.
In the related art, a segmentation model is trained by labeling, within a point cloud data set of a first three-dimensional scene, the point cloud data corresponding to points on instances in that scene, and a second three-dimensional scene to be segmented is then segmented based on the trained segmentation model. In the process of labeling the point cloud data corresponding to points on the instances, the points are labeled only according to the corresponding classification, among a plurality of classifications, of the instance in the first three-dimensional scene. The spatial and structural information contained in the point cloud data is not mined during labeling, so the trained segmentation model segments the second three-dimensional scene based only on the semantic information in the point cloud data, and the corresponding spatial and structural information in the point cloud data is not fully exploited.
According to embodiments of the present disclosure, the target classification label and the target instance label of the point cloud data are obtained based on the point cloud data of each point in the target three-dimensional scene. The target classification label indicates the corresponding classification of the point cloud data among a plurality of classifications, that is, it represents the semantic information in the point cloud data. At the same time, the target instance label indicates the direction and distance by which the point cloud data is offset relative to the clustering center of the corresponding first subset, where the corresponding first subset can be clustered into one instance; that is, the target instance label contains the spatial and structural information of the corresponding point in three-dimensional space. Based on the target classification label and the target instance label, the plurality of first point cloud data determined to correspond to the first instance in the target three-dimensional scene are determined not only from the semantic information contained in the point cloud data of each point in the target point set, but also from the spatial and structural information of each point in three-dimensional space. As a result, the target three-dimensional scene is segmented at the instance level (not merely the classification level), and the obtained first instance is accurate.
In the technical solution of this disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good morals.
The target three-dimensional scene is the three-dimensional scene determined to require segmentation. In some embodiments, the three-dimensional scene may be any indoor or outdoor scene, for example, the three-dimensional space of a single classroom or of a football stadium.
The point cloud data set of the target three-dimensional scene may be a data set acquired by scanning the target three-dimensional scene with a three-dimensional scanning device. In some embodiments, the target three-dimensional scene includes a plurality of instances, and each data item in the point cloud data set corresponds to a point on a respective instance of the plurality of instances scanned by the three-dimensional scanning device. An instance in the target three-dimensional scene, i.e. an object located in the three-dimensional scene that can be scanned to obtain corresponding point cloud data, may be, for example, a table, a chair, a car, or a person, but is not limited thereto.
In some embodiments, the three-dimensional scanning device includes a laser radar (2D/3D), a stereo camera (stereo camera), a time-of-flight camera (time-of-flight camera), and the like.
In some embodiments, each point cloud data in the point cloud data set indicates position information, color information, gray value information, etc. of a respective point of the point cloud data.
In one example, the point cloud dataset is the set

$$P = \{(p_i, c_i)\}_{i=1}^{N}$$

where $p_i = (x_i, y_i, z_i)$ is the coordinate of the $i$-th point in three-dimensional space, $c_i$ is the RGB color information corresponding to the $i$-th point, $(p_i, c_i)$ is the point cloud data of the $i$-th point, $N$ indicates that the set comprises point cloud data corresponding to $N$ points, and $i$ and $N$ are positive integers.
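To make the data layout concrete, the following is a minimal sketch assuming a NumPy representation of the set $\{(p_i, c_i)\}$; the array names, point count, and random values are illustrative and not taken from the patent.

```python
import numpy as np

# Assumed illustrative representation of the point cloud dataset:
# one row per point, pairing position p_i = (x_i, y_i, z_i) with color c_i.
N = 100_000                                  # number of scanned points (arbitrary)
coords = np.random.rand(N, 3)                # p_i positions in 3D space
colors = np.random.randint(0, 256, (N, 3))   # c_i RGB color per point

point_cloud_dataset = np.hstack([coords, colors.astype(np.float64)])
print(point_cloud_dataset.shape)             # (100000, 6)
```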
In some embodiments, obtaining, for each point in the target point set, the target classification label and target instance label of the point based on the point cloud data set comprises: inputting the point cloud data of each point in the target point set into a segmentation model to obtain the target classification label and target instance label of the point, wherein the segmentation model is obtained by labeling each point in the sample point set on an instance sample in a training three-dimensional scene to obtain a corresponding labeled classification label and labeled instance label, and by training on the point cloud data of each point in the sample point set and the corresponding labeled classification label and labeled instance label.
In some embodiments, the obtaining, based on the point cloud data set, a target classification tag and a target instance tag for each point cloud data in the point cloud data set comprises:
inputting the point cloud dataset to a segmentation model to obtain a target classification label and a target instance label for each point cloud data in the point cloud dataset, wherein,
the segmentation model is obtained by semi-supervised training using a training dataset and a labeling dataset corresponding to a first dataset of the training datasets, wherein,
the training data set includes point cloud data from each point in a sample point set on an example sample of three-dimensional space,
the first data set comprises point cloud data from each point in a first subset of samples in the set of sample points,
the annotation data set includes an annotation classification label indicating a respective classification of the instance sample among the plurality of classifications and an annotation instance label indicating a direction and distance of a point corresponding to the respective point cloud data offset relative to a center of the instance sample corresponding to each point in the first subset of samples.
By labeling the point cloud data of each point in the first sample subset of the sample point set on the instance sample in three-dimensional space, only part of the data in the training data set is labeled. Thus, during training of the segmentation model with the training data set, supervision is applied through the labeled portion of the data, realizing semi-supervised training of the segmentation model and reducing the labeling cost of the data.
In some embodiments, each point cloud data in the training data set comprises the position coordinates of the point to which the point cloud data corresponds, and the annotation instance label of the point cloud data of each point in the first sample subset comprises the coordinate offset between the position coordinates of the point and the position coordinates of the center of the instance sample, wherein
the position coordinates of the center of the instance sample are obtained based at least on the position coordinates of each point in the first sample subset.
By labeling the point cloud data with the coordinate deviation between the position coordinates in the point cloud data and the position coordinates of the clustering center of the first sample data subset, the spatial and structural information within the instance represented by the labeled instance label is intuitive and accurate. Thus, when the trained segmentation model produces the target classification label and target instance label of each point cloud data in the point cloud data set of the target three-dimensional scene, the obtained labels are accurate.
In some embodiments, the point cloud data of each point in the target point set includes the position coordinates of the point, and the target instance label of each point in the target point set includes the deviation coordinates corresponding to the point cloud data. As shown in FIG. 3, the method further comprises:
step S310: for each first point cloud data in the plurality of first point cloud data, obtaining a corresponding updated coordinate based on the deviation coordinate corresponding to the first point cloud data and the position coordinate in the first point cloud data; and
step S320: in response to determining that the distance between the updated coordinates of any two of the plurality of first point cloud data is less than a distance threshold, determining that the target instance label of any one of the plurality of first point cloud data corresponds to the target instance label of another first point cloud data.
Since the target instance label of each point cloud data in the point cloud data set is a deviation coordinate, it indicates the deviation between the position coordinate of the corresponding point and the position coordinate of the clustering center of the corresponding first subset; thus, the updated coordinate obtained from the position coordinate and the deviation coordinate in the point cloud data indicates the position coordinate of the clustering center of the corresponding first subset. For a plurality of first points in the target point set having the same target classification label, when it is determined that the distance between the updated coordinates corresponding to the point cloud data of any two of these first points is smaller than the distance threshold, the distance between the clustering centers of the corresponding first subsets of those two points is smaller than the distance threshold, and it can further be determined that the plurality of first points can be clustered into the same instance.
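The following is a hedged sketch of steps S310 and S320, assuming the position coordinates and deviation coordinates of the first point cloud data are held in NumPy arrays; the function name, array shapes, and threshold value are assumptions for illustration only.

```python
import numpy as np

def instance_labels_correspond(coords: np.ndarray, offsets: np.ndarray,
                               dist_threshold: float = 0.1) -> np.ndarray:
    """coords, offsets: (M, 3) positions and deviation coordinates of
    first point cloud data sharing one target classification label."""
    # Step S310: updated coordinates estimate each point's cluster center.
    updated = coords + offsets
    # Pairwise distances between the estimated cluster centers.
    d = np.linalg.norm(updated[:, None, :] - updated[None, :, :], axis=-1)
    # Step S320: two instance labels correspond when the estimated centers
    # are closer than the distance threshold.
    return d < dist_threshold
```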
In some embodiments, the plurality of first point cloud data are determined to be clusterable into the same instance in response to determining that each of a preset number of the first point cloud data has a counterpart first point cloud data such that the sum of its corresponding coordinate deviation and the corresponding coordinate deviation of the counterpart falls within a preset range around the coordinate origin of the three-dimensional space, i.e., the two offset vectors approximately cancel, indicating that the two points lie on opposite sides of the same cluster center.
In some embodiments, the segmentation model comprises a semantic network and a clustering network, and the inputting of each point cloud data in the point cloud data set to the segmentation model to obtain a target classification label and a target instance label for each point cloud data in the point cloud data set comprises:
inputting the point cloud data set to the semantic network to obtain a target classification label for each point cloud data in the point cloud data set; and
inputting the point cloud data set to the clustering network to obtain a target instance label for each point cloud data in the point cloud data set.
And extracting semantic information of the point cloud data through a semantic network, and extracting spatial structure information of corresponding points in the point cloud data in a three-dimensional space through a clustering network, so as to realize accurate segmentation of the target three-dimensional scene.
In some embodiments, the semantic network and the clustering network may be implemented by sparse-convolution-based U-shaped network structures.
Referring to FIG. 4, a schematic diagram of a segmentation model in a three-dimensional scene segmentation method according to some embodiments of the present disclosure is shown. As shown in FIG. 4, the segmentation model 400 includes a semantic network 410 and a clustering network 420. During segmentation of the three-dimensional scene, the point cloud data set A of the three-dimensional scene is input to the semantic network 410 and the clustering network 420 respectively, yielding a target classification label set B1 and a target instance label set B2 for the point cloud data set A. The target classification label set B1 includes the target classification label of each point cloud data in the point cloud data set A, and the target instance label set B2 includes the target instance label of each point cloud data in the point cloud data set A. Finally, based on the target classification label set B1 and the target instance label set B2, a segmentation instance C of the three-dimensional scene is obtained.
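A schematic PyTorch sketch of the two-branch model of FIG. 4 follows; the linear backbones stand in for the sparse-convolution U-shaped networks mentioned above, and all layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class SegmentationModel(nn.Module):
    """Two-branch segmentation model: semantic network 410 + clustering network 420."""
    def __init__(self, in_dim: int = 6, num_classes: int = 20):
        super().__init__()
        self.semantic_net = nn.Sequential(        # predicts classification labels
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, num_classes))
        self.clustering_net = nn.Sequential(      # predicts per-point offsets
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, points: torch.Tensor):
        class_logits = self.semantic_net(points)  # -> target classification labels (B1)
        offsets = self.clustering_net(points)     # -> target instance labels (B2)
        return class_logits, offsets
```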
In some embodiments, as shown in FIG. 5, the three-dimensional scene segmentation method according to the present disclosure further comprises:
step S510: in response to determining that a plurality of second point cloud data in the point cloud data set correspond to a second instance in the target three-dimensional scene and that the shared target classification label of the plurality of second point cloud data is the same as that of the plurality of first point cloud data, obtaining a first marker and a second marker, the first marker being different from the second marker;
step S520: displaying a first set of points of the target point set based on the first marker, and displaying a second set of points of the target point set based on the second marker, the points of the first set corresponding to the plurality of first point cloud data and the points of the second set corresponding to the plurality of second point cloud data.
When it is determined that a plurality of second points in the target point set correspond to a second instance in the three-dimensional scene and that their shared target classification label is the same as that of the plurality of first points, the first instance and the second instance correspond to the same classification among the plurality of classifications. By obtaining a first marker and a second marker different from the first marker, displaying the plurality of first points based on the first marker and the plurality of second points based on the second marker, the two point sets can be displayed distinguishably, thereby realizing a distinguishable display of the target three-dimensional scene.
In some embodiments, the first marker and the second marker are marker points of different colors.
In some embodiments, the first set of points and the second set of points are displayed by obtaining a planar image corresponding to the target three-dimensional scene and by displaying points in the planar image corresponding to each point in the first set of points and the second set of points.
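A hypothetical visualization of step S520 is sketched below; projecting onto the (x, y) plane as the planar image and the specific marker colors are illustrative choices, not requirements of the method.

```python
import numpy as np
import matplotlib.pyplot as plt

def show_instances(first_points: np.ndarray, second_points: np.ndarray):
    """first_points, second_points: (K, 3) positions of the two point sets."""
    plt.scatter(first_points[:, 0], first_points[:, 1], c="red", s=2,
                label="first instance")   # first marker
    plt.scatter(second_points[:, 0], second_points[:, 1], c="blue", s=2,
                label="second instance")  # second marker, different color
    plt.legend()
    plt.show()
```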
According to another aspect of the present disclosure, there is also disclosed a method for training a segmentation model, as shown in fig. 6, the method 600 includes:
step S610: obtaining a point cloud data set of a training three-dimensional scene, the point cloud data set comprising the point cloud data of each point in a sample point set located on an instance sample in the three-dimensional scene, the point cloud data indicating the location of the point in the training three-dimensional scene, the instance sample corresponding to a first classification of a plurality of classifications;
step S620: labeling each point cloud data in at least a first data subset of the point cloud data set to obtain a labeled classification label and a labeled instance label for each point cloud data in the first data subset, the first data subset including the point cloud data of each point in at least a first subset of the sample point set, the labeled classification label indicating the first classification, and the labeled instance label indicating the distance and direction by which the point corresponding to the point cloud data is offset relative to the center of the instance sample; and
step S630: performing supervised training on the segmentation model based on each point cloud data in the first data subset and the labeled classification label and labeled instance label of that point cloud data, to obtain a supervised-trained segmentation model.
For the point cloud data set used by the segmentation model, in the labeling process each point cloud data to be labeled receives a corresponding labeled classification label and labeled instance label, where the labeled classification label indicates the corresponding classification, among a plurality of classifications, of the instance sample to which the point cloud data belongs, and the labeled instance label indicates the direction and distance by which the point corresponding to the point cloud data is offset relative to the center of that instance sample. The labeled classification label thus contains the semantic information in the point cloud data, and the labeled instance label contains the spatial and structural information of the corresponding point within the instance sample. Based on the training data and the labeled classification labels and labeled instance labels of the point cloud data labeled therein, the supervised-trained segmentation model can extract both the semantic information in local point cloud data and the spatial and structural information of the corresponding points within the instance samples. Therefore, the segmentation result obtained by segmenting a three-dimensional scene with the trained segmentation model is accurate.
In some embodiments, each point cloud data in the point cloud dataset of the training scene is labeled to enable fully supervised training of the segmentation model.
In some embodiments, the point cloud data for each point in the sample point set includes the position coordinates of the point, and, as shown in FIG. 7, the labeling of each point cloud data in at least the first data subset of the point cloud data set includes:
step S710: obtaining the position coordinates of the center of the instance sample based on the position coordinates in the point cloud data of each point in the first subset; and
step S720: obtaining a labeled instance label for the point cloud data of each point in the first subset, wherein the labeled instance label is the coordinate deviation between the position coordinates in the point cloud data and the position coordinates of the center of the instance sample.
The position coordinates of the center of the instance sample are obtained based on the position coordinates in the point cloud data of each point in the first subset of the sample point set, and the coordinate deviation between the position coordinates of the center of the instance sample and the position coordinates in the point cloud data of each point is used as the labeled instance label to label that point cloud data.
In some embodiments, the points in the first subset may be points evenly distributed over the instance sample. For example, one point is acquired per preset area range on the instance sample.
In some embodiments, the center of the instance sample may be obtained by a clustering algorithm, e.g., by Euclidean clustering.
In some embodiments, the position coordinates of the center of the instance sample may be the average of the position coordinates of the individual points in the first subset.
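A minimal sketch of steps S710 and S720 under these embodiments follows, assuming the mean of the first subset's coordinates is used as the center; the function and array names are hypothetical.

```python
import numpy as np

def label_instance_offsets(subset_coords: np.ndarray) -> np.ndarray:
    """subset_coords: (K, 3) positions of the first subset's points on one
    instance sample; returns the labeled instance label of each point."""
    center = subset_coords.mean(axis=0)   # position coordinates of the center
    # Each label is the coordinate deviation from the point to the center,
    # so that position + label recovers the center (cf. step S310).
    return center - subset_coords
```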
In some embodiments, the segmentation model includes a semantic network and a clustering network, and, as shown in FIG. 8, the supervised training of the segmentation model based on each point cloud data in the first data subset and the labeled classification label and labeled instance label of the point cloud data includes:
step S810: inputting the first data subset to the semantic network to obtain a predicted classification label for each point cloud data in the first data subset, and inputting the first data subset to the clustering network to obtain a predicted instance label for each point cloud data in the first data subset;
step S820: calculating a first loss for each point cloud data in the first data subset based on the labeled classification label and the predicted classification label of that point cloud data, and calculating a second loss for each point cloud data in the first data subset based on the labeled instance label and the predicted instance label of that point cloud data; and
step S830: adjusting the parameters of the semantic network and the clustering network based on the sum of the first loss and the second loss of each point cloud data in the first data subset.
By providing a semantic network and a clustering network that respectively predict the predicted classification label and the predicted instance label of the point cloud data, computing for each point cloud data in the first data subset a first loss between its labeled classification label and predicted classification label and a second loss between its labeled instance label and predicted instance label, and adjusting the parameters of the semantic network and the clustering network simultaneously based on the sum of the first loss and the second loss of each point cloud data, the semantic network and the clustering network are trained in a coupled manner, so that the trained segmentation model can predict accurate predicted instance labels and predicted classification labels at the same time.
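An illustrative joint training step in the spirit of steps S810 through S830 is sketched below, assuming the two-branch SegmentationModel sketched earlier; the use of cross-entropy for the first loss and an L1 loss for the second loss is an assumption, as the text does not fix the loss functions.

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, points, class_labels, offset_labels):
    """points: (M, 6) point cloud data; class_labels: (M,) labeled
    classification labels; offset_labels: (M, 3) labeled instance labels."""
    class_logits, pred_offsets = model(points)
    first_loss = F.cross_entropy(class_logits, class_labels)  # classification
    second_loss = F.l1_loss(pred_offsets, offset_labels)      # instance offsets
    loss = first_loss + second_loss       # the sum couples both networks
    optimizer.zero_grad()
    loss.backward()                       # gradients flow into both branches
    optimizer.step()
    return loss.item()
```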
In some embodiments, the semantic network and the clustering network may be implemented by sparse-convolution-based U-shaped network structures.
In some embodiments, adjusting the parameters of the semantic network and the clustering network based on the sum of the first loss and the second loss of each point cloud data in the first data subset comprises:
obtaining the sum of the first losses of all point cloud data in the first data subset and the sum of the second losses of all point cloud data in the first data subset;
obtaining a loss corresponding to the first data subset, which is the sum of the plurality of first losses plus the sum of the plurality of second losses; and
adjusting the parameters of the semantic network and the clustering network based on the loss corresponding to the first data subset.
In some embodiments, each of a preset number of point cloud data (i.e., the first data subset) in the point cloud dataset of the training scene is labeled to enable semi-supervised training of the segmentation model.
In some embodiments, the number of points in the first subset is not greater than a preset number threshold. As shown in FIG. 9, the method further comprises:
Step S910: inputting each point cloud data in a second data subset of the set of point cloud data to the supervised trained segmentation model to obtain a predicted classification label and a predicted instance label for that point cloud data, the second data subset comprising point cloud data for each point in a second subset of the sample set of points other than the first subset,
step S920: obtaining a plurality of third data subsets from the second data subsets, wherein the predicted classification labels of any two point cloud data in each third data subset of the plurality of third data subsets are the same;
step S930: for each third data subset in the plurality of third data subsets, obtaining a fourth data subset in the third data subset and a cluster center of the fourth data subset, wherein the fourth data subset can be clustered into an example, based on the predicted example label of each point cloud data in the third data subset;
step S940: for each fourth data subset in a plurality of fourth data subsets corresponding to the third data subsets, obtaining a classification pseudo label and an example pseudo label of each point cloud data in the fourth data subset, where the classification pseudo label indicates a corresponding classification of the point cloud data in the plurality of classifications, and the example pseudo label is a distance and a direction of a shift of a point corresponding to the point cloud data with respect to a cluster center of the fourth data subset;
step S950: performing supervised training on the supervised-trained segmentation model based on each point cloud data in each of the plurality of fourth data subsets and the classification pseudo label and the instance pseudo label of the point cloud data, and based on each point cloud data in the first data subset and the labeling classification label and the labeling instance label of the point cloud data, to obtain a pseudo-supervised-trained segmentation model; and
step S960: determining the pseudo-supervised trained segmentation model as the supervised trained segmentation model to perform the inputting of each point cloud data in the second data subset of the point cloud data set to the supervised trained segmentation model.
In embodiments according to the present disclosure, the segmentation model is supervised-trained based on each point cloud data in the first data subset and its corresponding labeled classification label and labeled instance label. Each point cloud data in the second data subset is then predicted by the supervised-trained model, and classification pseudo labels and instance pseudo labels are obtained for the point cloud data in the fourth data subsets based on the corresponding predicted classification labels and predicted instance labels. Finally, supervised training is performed based on each point cloud data in the fourth data subsets and its corresponding classification pseudo label and instance pseudo label, realizing semi-supervised training of the segmentation model. Throughout this process, only the point cloud data of the points in the first subset of the sample point set are labeled, that is, the point cloud data set of the training three-dimensional scene is only partially labeled, which reduces the labeling cost in model training.
In some embodiments, the number of points in the first subset is much smaller than the number of points in a second subset of the set of sample points other than the first subset.
In some embodiments, a plurality of point cloud data having the same predicted classification label in the second data subset is taken as a third data subset.
In some embodiments, as shown in FIG. 10, the obtaining of a plurality of third data subsets from the second data subset comprises:
step S1010: obtaining a confidence corresponding to the predicted classification label of each point cloud data in the second data subset; and
step S1020: in response to determining that the confidence corresponding to the predicted classification label of the first point cloud data in the second data subset is not less than the confidence threshold, adding the first point cloud data to a third data subset of the plurality of third data subsets that corresponds to the predicted classification label of the first point cloud data.
By obtaining the confidence corresponding to the predicted classification label of each point cloud data in the second data subset, and adding to the same third data subset those point cloud data whose predicted classification labels are the same and whose corresponding confidence is not less than the confidence threshold, the point cloud data in the resulting third data subset are those predicted, with high confidence, to belong to the same classification; that is, the point cloud data in a third data subset are more likely to be clusterable into the same instance. The classification pseudo labels and instance pseudo labels obtained based on the third data subset are therefore more accurate, and the segmentation model trained on each point cloud data in the third data subset and its corresponding classification pseudo label and instance pseudo label is more accurate.
In some embodiments, the confidence threshold is 0.90.
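As a concrete sketch of steps S1010-S1020, the grouping can be written in Python as below; the array names (`pred_labels`, `pred_conf`), the index-based representation of subsets, and the use of NumPy are illustrative assumptions rather than part of the disclosure.

```python
import numpy as np

def build_third_subsets(pred_labels, pred_conf, conf_threshold=0.90):
    """Group the points of the second data subset into third data subsets:
    one subset per predicted classification label, keeping only points whose
    confidence is not less than the confidence threshold (step S1020)."""
    third_subsets = {}
    keep = pred_conf >= conf_threshold            # confidence check of step S1020
    for idx in np.flatnonzero(keep):
        label = int(pred_labels[idx])
        third_subsets.setdefault(label, []).append(idx)
    return {lab: np.asarray(ix) for lab, ix in third_subsets.items()}
```

Each returned entry maps a predicted classification label to the indices of the point cloud data forming the corresponding third data subset.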
In some embodiments, the predicted instance label of each point cloud data in the second data subset is a predicted coordinate deviation corresponding to the point cloud data, and as shown in fig. 11, the obtaining a fourth data subset in the third data subset and a cluster center of the fourth data subset includes:
step S1110: for each point cloud data in the third data subset, obtaining corresponding updated coordinates based on the position coordinates in the point cloud data and the predicted coordinate deviation of the point cloud data; and
step S1120: and obtaining a fourth data subset in the third data subset, wherein the distance between the updated coordinates corresponding to any two point cloud data in the fourth data subset is not greater than a distance threshold.
Because the predicted instance label of each point cloud data in the point cloud data set is the coordinate deviation corresponding to that point cloud data, it indicates the deviation between the position coordinates of the corresponding point and the position coordinates of the cluster center of the subset to which the point cloud data belongs and which can be clustered into an instance. Thus, the updated coordinates obtained based on the position coordinates and the coordinate deviation in the point cloud data indicate the position coordinates of the cluster center of the respective subset. For point cloud data with the same predicted classification label, when it is determined that the distance between the updated coordinates corresponding to any two point cloud data in a fourth data subset is not greater than the distance threshold, i.e., the cluster centers indicated by any two point cloud data in the fourth data subset are close to each other, it may be determined that the fourth data subset can be clustered into the same instance.
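A minimal Python sketch of steps S1110-S1120 follows; the greedy pairwise grouping is an illustrative stand-in for the clustering procedure (the disclosure leaves the exact algorithm open), and the array names are assumptions.

```python
import numpy as np

def build_fourth_subset(coords, pred_offsets, dist_threshold):
    """For one third data subset: shift each point by its predicted
    coordinate deviation (step S1110), then greedily collect the largest
    group whose updated coordinates lie pairwise within the distance
    threshold (step S1120)."""
    updated = coords + pred_offsets               # updated coordinates
    diffs = updated[:, None, :] - updated[None, :, :]
    pair_ok = np.linalg.norm(diffs, axis=-1) <= dist_threshold
    best = []
    for seed in range(len(updated)):              # naive O(n^3) sweep
        group = [seed]
        for j in range(len(updated)):
            if j != seed and all(pair_ok[j, k] for k in group):
                group.append(j)
        if len(group) > len(best):
            best = group
    return np.asarray(sorted(best)), updated
```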
In some embodiments, the position coordinate of the cluster center of each of the plurality of fourth data subsets is a coordinate mean of a plurality of update coordinates corresponding to the fourth data subset.
For example, the position coordinates of the cluster center of the j-th fourth data subset are calculated by the following formula (1):

$$c_j = \frac{1}{N_j}\sum_{i=1}^{N_j}\left(p_i^j + \Delta p_i^j\right) \tag{1}$$

where $c_j$ denotes the cluster center of the j-th fourth data subset, $p_i^j$ represents the position coordinates in the i-th point cloud data belonging to the j-th fourth data subset, $\Delta p_i^j$ is the predicted coordinate deviation in the i-th point cloud data in the j-th fourth data subset, $N_j$ represents the number of point cloud data in the j-th fourth data subset, and i and j are positive integers.
Using the coordinate mean of the plurality of updated coordinates corresponding to the fourth data subset as the position coordinates of its cluster center reduces the error caused by abnormal points and thus improves the accuracy of the subsequently obtained instance pseudo labels.
In some embodiments, for each of a plurality of fourth data subsets, merging the fourth data subset with a fifth data subset of the first data subset into a sixth data subset, and calculating a cluster center for the fourth data subset based on the location coordinates in each point cloud data in the sixth data subset, wherein the labeling classification label for each point cloud data in the fifth data subset is the same as the predictive classification label for the fourth data subset.
In some embodiments, a clustering algorithm is used to obtain a clustering center based on the position coordinates in each point cloud data in the fourth subset of data.
For each fourth data subset of the plurality of fourth data subsets, after obtaining the position coordinates of the clustering center of the fourth data subset, obtaining an instance pseudo label of the point cloud data for each point cloud data of the fourth data subset, wherein the instance pseudo label of the point cloud data is a coordinate deviation between the position coordinates of the point cloud data and the position coordinates of the clustering center of the fourth data subset.
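The cluster-center and pseudo-label computation can be sketched as follows; the sign convention (position coordinates plus pseudo label equals cluster center, mirroring the predicted coordinate deviation of step S1110) is an assumption for illustration.

```python
import numpy as np

def pseudo_instance_labels(coords, pred_offsets, subset_idx):
    """For one fourth data subset: the cluster center is the coordinate
    mean of the updated coordinates (formula (1)); the instance pseudo
    label of each point is the coordinate deviation from the point's
    position coordinates to that center."""
    updated = coords[subset_idx] + pred_offsets[subset_idx]
    center = updated.mean(axis=0)                 # formula (1)
    return center - coords[subset_idx], center
```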
In some embodiments, the classification pseudo-tag is a predicted classification tag of the point cloud data.
In some embodiments, after the classification pseudo label and the instance pseudo label of each point cloud data in each of the plurality of fourth data subsets are obtained, the supervised-trained segmentation model, i.e., the model obtained by supervised training using each point cloud data in the first data subset and its corresponding labeling classification label and labeling instance label, is further supervised-trained based on the plurality of fourth data subsets and the first data subset.
Wherein the supervised training of the supervised-trained segmentation model based on the plurality of fourth data subsets and the first data subset comprises:
inputting the plurality of fourth data subsets and the first data subset into the supervised trained segmentation model to obtain a predicted classification label and a predicted instance label for each point cloud data of the plurality of fourth data subsets and the first data subset;
calculating a classification loss of each point cloud data in the plurality of fourth data subsets based on the classification pseudo label and the predicted classification label of the point cloud data, and an instance loss of the point cloud data based on its instance pseudo label and predicted instance label; and likewise calculating a classification loss of each point cloud data in the first data subset based on the labeling classification label and the predicted classification label of the point cloud data, and an instance loss of the point cloud data based on its labeling instance label and predicted instance label;
obtaining a first loss corresponding to the plurality of fourth data subsets based on the classification loss and the instance loss of each point cloud data in the plurality of fourth data subsets, and obtaining a second loss corresponding to the first data subset based on the classification loss and the instance loss of each point cloud data in the first data subset; and
adjusting the supervised-trained segmentation model based on the first loss and the second loss.
In one example, the classification loss of each point cloud data in the plurality of fourth data subsets and the first data subset is a cross-entropy loss, and the instance loss of each point cloud data in the plurality of fourth data subsets and the first data subset is an L1 loss, i.e., a mean absolute error.
In one example, the supervised-trained segmentation model is adjusted based on a summation of the first loss and the second loss.
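A hedged PyTorch sketch of this loss computation is given below; the function signature and the equal weighting of the two terms are assumptions (the disclosure states only that cross-entropy and L1 losses are used and that the model is adjusted based on a summation).

```python
import torch
import torch.nn.functional as F

def subset_loss(class_logits, pred_offsets, target_labels, target_offsets):
    """Loss for one subset: cross-entropy on the (pseudo or labeling)
    classification labels plus L1 loss, i.e., mean absolute error, on the
    (pseudo or labeling) instance offsets."""
    cls_loss = F.cross_entropy(class_logits, target_labels)
    inst_loss = F.l1_loss(pred_offsets, target_offsets)
    return cls_loss + inst_loss

# First loss over the fourth data subsets (pseudo labels) plus second
# loss over the first data subset (annotations), summed for the update:
# total = subset_loss(logits_4th, off_4th, pseudo_cls, pseudo_off) \
#       + subset_loss(logits_1st, off_1st, gt_cls, gt_off)
```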
In the above process, one supervised training pass of the supervised-trained segmentation model based on the plurality of fourth data subsets and the first data subset completes one round of parameter adjustment; the model after this parameter adjustment is the pseudo-supervised-trained segmentation model. The pseudo-supervised-trained segmentation model is then re-determined as the supervised-trained segmentation model, so that the above steps S910-S960 are performed iteratively until the model converges.
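The overall iteration can be summarized by the following skeleton; `train_supervised`, `generate_pseudo_labels`, and `has_converged` are hypothetical callables standing in for the procedures described above, passed in so the sketch stays self-contained.

```python
def semi_supervised_training(model, labeled_data, unlabeled_data,
                             train_supervised, generate_pseudo_labels,
                             has_converged, max_rounds=10):
    """Skeleton of the iteration over steps S910-S960: initial supervised
    training on the labeled first data subset, pseudo-label generation on
    the unlabeled second data subset, then pseudo-supervised retraining,
    repeated until the model converges."""
    train_supervised(model, labeled_data, None)             # initial supervised model
    for _ in range(max_rounds):
        pseudo_data = generate_pseudo_labels(model, unlabeled_data)
        train_supervised(model, labeled_data, pseudo_data)  # pseudo-supervised round
        if has_converged(model):                            # else re-determine and iterate
            break
    return model
```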
According to another aspect of the present disclosure, there is also provided a three-dimensional scene segmentation apparatus, as shown in fig. 12, the apparatus 1200 includes: a point cloud data acquisition unit 1210 configured to acquire a point cloud data set of a target three-dimensional scene, the point cloud data set including point cloud data of each point in a target point set corresponding to the target three-dimensional scene, the point cloud data indicating a position of the point in the target three-dimensional scene; a tag obtaining unit 1220 configured to obtain a target classification tag and a target instance tag of each point cloud data in the point cloud data set, the target classification tag indicating a corresponding classification of the point cloud data in a plurality of classifications, the target instance tag indicating a direction and a distance in which the point cloud data is offset with respect to a clustering center of a corresponding first subset in the point cloud data set, the corresponding first subset being capable of being clustered into an instance; and a first determining unit 1230 configured to determine, for a plurality of first point cloud data in the point cloud data set having the same target classification label, that the plurality of first point cloud data corresponds to a first instance in the target three-dimensional scene in response to determining that a target instance label of any one of the plurality of first point cloud data corresponds to a target instance label of another first point cloud data.
In some embodiments, the tag obtaining unit 1220 includes: a model calculation unit configured to input the point cloud data set to a segmentation model to obtain a target classification label and a target instance label for each point cloud data in the point cloud data set, wherein the segmentation model is obtained by semi-supervised training using a training data set and a labeling data set corresponding to a first data set of the training data set, wherein the training data set includes the point cloud data of each point in a sample point set located on an instance sample in a three-dimensional space, the first data set includes the point cloud data of each point in a first subset of the sample point set, and the labeling data set includes a labeling classification label and a labeling instance label corresponding to the point cloud data of each point in the first subset, the labeling classification label indicating a corresponding classification of the instance sample in the plurality of classifications, and the labeling instance label indicating a direction and distance that a point corresponding to the respective point cloud data is offset from a center of the instance sample.
In some embodiments, each point cloud data in the training data set includes a position coordinate of a point to which the point cloud data corresponds, and the labeled instance label of the point cloud data of each point in the first sample subset includes a coordinate offset between the position coordinate of the point and a position coordinate of a center of the instance sample, wherein the position coordinate of the center of the instance sample is obtained based at least on the position coordinate of each point in the first sample subset.
In some embodiments, the point cloud data for each point in the set of target points includes location coordinates for the point, and the target instance tag for each point cloud data includes deviation coordinates corresponding to the point cloud data, the apparatus further comprising: an updating unit configured to obtain, for each of the plurality of first point cloud data, corresponding updated coordinates based on the deviation coordinates corresponding to the first point cloud data and the position coordinates in the first point cloud data; and a second determination unit configured to, in response to determining that a distance between the updated coordinates of any two of the plurality of first point cloud data is less than a distance threshold, determine that a target instance tag of any one of the plurality of first point cloud data corresponds to a target instance tag of another one of the plurality of first point cloud data.
In some embodiments, the segmentation model includes a semantic network and a clustering network, and the model calculation unit is configured to: input the point cloud data set to the semantic network to obtain a target classification label for each point cloud data in the point cloud data set; and input the point cloud data set to the clustering network to obtain a target instance label for each point cloud data in the point cloud data set.
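For illustration, a minimal two-branch model matching this description might look as follows; the MLP backbones, feature dimensions, and class count are placeholder assumptions, since the disclosure does not fix a network architecture.

```python
import torch.nn as nn

class TwoBranchSegmentationModel(nn.Module):
    """Semantic network predicting per-point classification logits and
    clustering network predicting a per-point 3-D coordinate deviation."""
    def __init__(self, in_dim=3, hidden=64, num_classes=20):
        super().__init__()
        self.semantic_net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes))   # target classification logits
        self.clustering_net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))             # target instance label (offset)

    def forward(self, points):                # points: (num_points, in_dim)
        return self.semantic_net(points), self.clustering_net(points)
```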
In some embodiments, the apparatus 1200 further comprises: a mark obtaining unit configured to obtain a first mark and a second mark in response to determining that a plurality of second point cloud data in the point cloud data set correspond to a second instance in the target three-dimensional scene and that the target classification tag shared by the plurality of second point cloud data is the same as the target classification tag shared by the plurality of first point cloud data, the first mark being different from the second mark; and a display unit configured to display a first point set of the target point set based on the first mark, and display a second point set of the target point set based on the second mark, a plurality of points in the first point set corresponding to the plurality of first point cloud data, and a plurality of points in the second point set corresponding to the plurality of second point cloud data.
According to another aspect of the present disclosure, there is also provided an apparatus for training a segmentation model, as shown in fig. 13, the apparatus 1300 includes: a data acquisition unit 1310 configured to obtain a point cloud dataset of a training three-dimensional scene, the point cloud dataset comprising point cloud data for each point in a sample point set located on an example sample in the three-dimensional scene, the point cloud data indicating a location of the point in the training three-dimensional scene, the example sample corresponding to a first classification of a plurality of classifications; a labeling unit 1320 configured to label each point cloud data in at least a first data subset of the point cloud data set to obtain a labeling classification label and a labeling instance label of each point cloud data in the first data subset, the first data subset including point cloud data of each point in at least a first subset of the sample point set, the labeling classification label indicating the first classification, and the labeling instance label indicating a distance and a direction of a shift of a point corresponding to the point cloud data with respect to a center of the instance sample; and a supervised training unit 1330 configured to supervise train the segmentation model based on each point cloud data in the first data subset and the labeling classification label and labeling instance label of the point cloud data to obtain a supervised-trained segmentation model.
In some embodiments, the number of points in the first subset is not greater than a preset number threshold, the apparatus further comprising: a data input unit configured to input a second data subset of the set of point cloud data to the supervised trained segmentation model to obtain a predicted classification label and a predicted instance label for each point cloud data in the second data subset, the second data subset comprising point cloud data for each point in a second subset of the set of sample points other than the first subset; a first data obtaining subunit, configured to obtain a plurality of third data subsets from the second data subset, where predicted classification labels corresponding to a plurality of point cloud data in each of the plurality of third data subsets are the same; a second data obtaining subunit, configured to obtain, for each of the plurality of third data subsets, a fourth data subset capable of being clustered into an instance in the third data subset and a clustering center of the fourth data subset based on the predicted instance label of each point cloud data in the third data subset; a pseudo label obtaining unit, configured to obtain, for each of a plurality of fourth data subsets corresponding to the plurality of third data subsets, a classification pseudo label and an instance pseudo label of each point cloud data in the fourth data subset, where the classification pseudo label indicates a corresponding classification of the point cloud data in the plurality of classifications, and the instance pseudo label is a distance and a direction in which a point corresponding to the point cloud data is offset from a cluster center of the fourth data subset; a pseudo supervised training unit configured to supervise and train the supervised and trained segmentation model based on each point cloud data in each of the plurality of fourth data subsets and a classification pseudo label and an instance pseudo label of the point cloud data, and based on each point cloud data in the first data subset and a labeling classification label and a labeling instance label of the point cloud data, to obtain a pseudo supervised and trained segmentation model; and a determining unit configured for determining the pseudo-supervised trained segmentation model as the supervised trained segmentation model to perform an input of a second data subset of the point cloud dataset to the supervised trained segmentation model.
In some embodiments, the point cloud data for each point in the sample point set comprises location coordinates for that point, the labeling unit comprises: a first clustering unit configured to obtain a position coordinate of a center of the sample instance based on a position coordinate in the point cloud data of each point in the first subset; and a marked instance label obtaining unit configured to obtain a marked instance label of the point cloud data of each point in the first subset, wherein the marked instance label is a coordinate deviation between a position coordinate in the point cloud data and a position coordinate of the center of the sample instance.
In some embodiments, the predicted instance label of each point cloud data in the second data subset is a predicted coordinate deviation corresponding to the point cloud data, and the second data obtaining subunit includes: a coordinate updating unit configured to obtain, for each point cloud data in the third subset of data, an updated coordinate corresponding to the point cloud data based on the position coordinate in the point cloud data and the predicted coordinate deviation of the point cloud data; and a second clustering unit configured to obtain a fourth data subset in the third data subset, wherein the distance between the updated coordinates corresponding to any two point cloud data in the fourth data subset is not greater than a distance threshold.
In some embodiments, the position coordinate of the cluster center of each of the plurality of fourth data subsets is a coordinate mean of a plurality of updated coordinates corresponding to the fourth data subset.
In some embodiments, the first data acquisition subunit comprises: a confidence coefficient obtaining unit configured to obtain a confidence coefficient corresponding to the predicted classification label of each point cloud data in the second data subset; and a determining unit configured to add the first point cloud data to a third data subset corresponding to the predicted classification tag of the first point cloud data among the plurality of third data subsets in response to determining that the confidence degree corresponding to the predicted classification tag of the first point cloud data in the second data subset is not less than a confidence degree threshold.
In some embodiments, the segmentation model comprises a semantic network and a clustering network, the supervised training unit comprises: a data input subunit configured to input the first data subset to the semantic network to obtain a predicted classification label for each point cloud data in the first data subset, and to input the first data subset to the clustering network to obtain a predicted instance label for each point cloud data in the first data subset; a loss calculation unit configured to calculate a first loss of each point cloud data in the first data subset based on the annotation classification label and the prediction classification label of the point cloud data, and calculate a second loss of each point cloud data in the first data subset based on the annotation instance label and the prediction instance label of the point cloud data; and a parameter adjusting unit configured to adjust parameters of the semantic network and the clustering network based on a sum of the first loss and the second loss of each point cloud data in the first data subset.
According to another aspect of the present disclosure, there is also provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to the present disclosure.
According to another aspect of the present disclosure, there is also provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to the present disclosure.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program realizes the method according to the present disclosure when executed by a processor.
Referring to fig. 14, a block diagram of an electronic device 1400, which may be a server or a client of the present disclosure and is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 14, the electronic device 1400 includes a computing unit 1401 that can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 1402 or a computer program loaded from a storage unit 1408 into a Random Access Memory (RAM) 1403. In the RAM 1403, various programs and data necessary for the operation of the electronic device 1400 can also be stored. The computing unit 1401, the ROM 1402, and the RAM 1403 are connected to each other via a bus 1404. An input/output (I/O) interface 1405 is also connected to the bus 1404.
A number of components in the electronic device 1400 are connected to the I/O interface 1405, including: an input unit 1406, an output unit 1407, a storage unit 1408, and a communication unit 1409. The input unit 1406 may be any type of device capable of inputting information to the electronic device 1400; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a track pad, a track ball, a joystick, a microphone, and/or a remote control. The output unit 1407 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 1408 may include, but is not limited to, a magnetic disk and an optical disk. The communication unit 1409 allows the electronic device 1400 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as a Bluetooth™ device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
The computing unit 1401 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 1401 performs the various methods and processes described above, such as the method 200. For example, in some embodiments, the method 200 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1408. In some embodiments, part or all of a computer program may be loaded and/or installed onto the electronic device 1400 via the ROM 1402 and/or the communication unit 1409. When the computer program is loaded into the RAM 1403 and executed by the computing unit 1401, one or more steps of the method 200 described above may be performed. Alternatively, in other embodiments, the computing unit 1401 may be configured to perform the method 200 by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be performed in parallel, sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but only by the claims as issued and their equivalents. Various elements in the embodiments or examples may be omitted or may be replaced with equivalents thereof. Further, the steps may be performed in an order different from that described in the present disclosure. Further, various elements in the embodiments or examples may be combined in various ways. It should be noted that, as technology evolves, many of the elements described herein may be replaced with equivalent elements that appear after the present disclosure.

Claims (29)

1. A three-dimensional scene segmentation method, comprising:
acquiring a point cloud data set of a target three-dimensional scene, wherein the point cloud data set comprises point cloud data of each point in a target point set corresponding to the target three-dimensional scene, and the point cloud data indicates the position of the point in the target three-dimensional scene;
obtaining a target classification label and a target instance label for each point cloud data in the point cloud dataset, the target classification label indicating a respective classification of the point cloud data in a plurality of classifications, the target instance label indicating a direction and distance that the point cloud data is offset relative to a clustering center of a respective first subset in the point cloud dataset, the respective first subset capable of being clustered into an instance; and
for a plurality of first point cloud data in the point cloud dataset having a same target classification tag, in response to determining that a target instance tag of any first point cloud data in the plurality of first point cloud data corresponds to a target instance tag of another first point cloud data, determining that the plurality of first point cloud data corresponds to a first instance in the target three-dimensional scene.
2. The method of claim 1, wherein the obtaining a target classification label and a target instance label for each point cloud data in the point cloud data set comprises:
inputting the point cloud dataset to a segmentation model to obtain a target classification label and a target instance label for each point cloud data in the point cloud dataset, wherein,
the segmentation model is obtained by semi-supervised training using a training dataset and a labeling dataset corresponding to a first dataset of the training datasets, wherein,
the training data set includes point cloud data from each point in a sample point set on an example sample of three-dimensional space,
the first data set comprises point cloud data from each point in a first subset of samples in the set of sample points,
the annotation dataset includes an annotation classification label and an annotation instance label corresponding to the point cloud data of each point in the first subset of samples, the annotation classification label indicating a respective classification of the instance sample among the plurality of classifications, and the annotation instance label indicating a direction and distance that a point corresponding to the respective point cloud data is offset relative to a center of the instance sample.
3. The method of claim 2, wherein each point cloud data in the training data set includes location coordinates of a point to which the point cloud data corresponds, and a labeled instance label of the point cloud data for each point in the first subset of samples includes a coordinate deviation between the location coordinates of the point and the location coordinates of the center of the instance sample, wherein,
the location coordinates of the center of the example sample are obtained based at least on the location coordinates of each point in the first subset of samples.
4. The method of claim 3, wherein the point cloud data for each point in the set of target points includes location coordinates for the point, the target instance label for each point cloud data in the set of point cloud data includes deviation coordinates corresponding to the point cloud data, the method further comprising:
for each first point cloud data in the plurality of first point cloud data, obtaining a corresponding updated coordinate based on the deviation coordinate corresponding to the first point cloud data and the position coordinate in the first point cloud data; and
in response to determining that a distance between updated coordinates of any two of the plurality of first point cloud data is less than a distance threshold, determining that a target instance tag of any one of the plurality of first point cloud data corresponds to a target instance tag of another one of the first point cloud data.
5. The method of any of claims 2-4, wherein the segmentation model includes a semantic network and a clustering network, and the inputting the point cloud data set into the segmentation model to obtain a target classification label and a target instance label for each point cloud data in the point cloud data set includes:
inputting the point cloud data set to the semantic network to obtain a target classification label for each point cloud data in the point cloud data set; and
inputting the point cloud data set to the clustering network to obtain a target instance label for each point cloud data in the point cloud data set.
6. The method of any of claims 1-5, further comprising:
in response to determining that a plurality of second point cloud data in the point cloud data set corresponds to a second instance in the target three-dimensional scene and that the target classification label shared by the plurality of second point cloud data is the same as the target classification label shared by the plurality of first point cloud data, obtaining a first marker and a second marker, the first marker being different from the second marker; and
displaying a first set of points of the set of target points based on the first marker, and displaying a second set of points of the set of target points based on the second marker, a plurality of points of the first set of points corresponding to the plurality of first point cloud data, and a plurality of points of the second set of points corresponding to the plurality of second point cloud data.
7. A method for training a segmentation model, comprising:
obtaining a point cloud dataset of a training three-dimensional scene, the point cloud dataset comprising point cloud data for each point in a sample point set located on an example sample in the three-dimensional scene, the point cloud data indicating a location of the point in the training three-dimensional scene, the example sample corresponding to a first classification of a plurality of classifications;
labeling each point cloud data in at least a first data subset of the point cloud data set to obtain a label classification label and a label instance label of each point cloud data in the first data subset, the first data subset including the point cloud data of each point in at least the first subset of the sample point set, the label classification label indicating the first classification, and the label instance label indicating a distance and a direction of a point corresponding to the point cloud data offset with respect to a center of the instance sample; and
and performing supervised training on the segmentation model based on each point cloud data in the first data subset and the labeling classification label and the labeling instance label of the point cloud data to obtain a supervised-trained segmentation model.
8. The method of claim 7, wherein the number of points in the first subset is not greater than a preset number threshold, the method further comprising:
inputting a second data subset of the set of point cloud data to the supervised trained segmentation model to obtain a predicted classification label and a predicted instance label for each point cloud data in the second data subset, the second data subset comprising point cloud data for each point in a second subset of the sample point set other than the first subset;
obtaining a plurality of third data subsets from the second data subsets, wherein the predicted classification labels of any two point cloud data in each third data subset of the plurality of third data subsets are the same;
for each third data subset in the plurality of third data subsets, obtaining a fourth data subset in the third data subset and a cluster center of the fourth data subset, wherein the fourth data subset can be clustered into an example, based on the predicted example label of each point cloud data in the third data subset;
for each fourth data subset in a plurality of fourth data subsets corresponding to the third data subsets, obtaining a classification pseudo label and an example pseudo label of each point cloud data in the fourth data subset, where the classification pseudo label indicates a corresponding classification of the point cloud data in the plurality of classifications, and the example pseudo label is a distance and a direction of a shift of a point corresponding to the point cloud data with respect to a cluster center of the fourth data subset;
performing supervised training on the supervised-trained segmentation model based on each point cloud data in each of the plurality of fourth data subsets and the classification pseudo label and the instance pseudo label of the point cloud data, and based on each point cloud data in the first data subset and the labeling classification label and the labeling instance label of the point cloud data, to obtain a pseudo-supervised-trained segmentation model; and
determining the pseudo-supervised trained segmentation model as the supervised trained segmentation model to perform inputting a second data subset of the point cloud dataset to the supervised trained segmentation model.
9. The method of claim 7, wherein the point cloud data for each point in the sample point set includes location coordinates for the point, and wherein labeling each point cloud data in at least a first data subset of the point cloud data set includes:
obtaining location coordinates of a center of the sample instance based on location coordinates in the point cloud data for each point in the first subset; and
and obtaining a labeling instance label of the point cloud data of each point in the first subset, wherein the labeling instance label is a coordinate deviation between a position coordinate in the point cloud data and a position coordinate of the center of the sample instance.
10. The method of claim 9, wherein the predicted instance label of each point cloud data in the second data subset is a predicted coordinate deviation corresponding to the point cloud data, and the obtaining a fourth data subset in the third data subset and a cluster center of the fourth data subset comprises:
for each point cloud data in the third data subset, obtaining corresponding updated coordinates based on the position coordinates in the point cloud data and the predicted coordinate deviation of the point cloud data; and
and obtaining a fourth data subset in the third data subset, wherein the distance between the updated coordinates corresponding to any two point cloud data in the fourth data subset is not greater than a distance threshold.
11. The method of claim 10, wherein the position coordinate of the cluster center of each of the plurality of fourth data subsets is a coordinate mean of a plurality of updated coordinates corresponding to the fourth data subset.
12. The method of any of claims 8-11, wherein the obtaining a plurality of third data subsets from the second data subset comprises:
obtaining a confidence corresponding to the predicted classification label of each point cloud data in the second data subset; and
in response to determining that the confidence corresponding to the predicted classification label of the first point cloud data in the second data subset is not less than the confidence threshold, adding the first point cloud data to a third data subset of the plurality of third data subsets that corresponds to the predicted classification label of the first point cloud data.
13. The method of any of claims 7-12, wherein the segmentation model comprises a semantic network and a clustering network, and wherein supervised training the segmentation model based on each point cloud data in the first subset of data and its label classification label and label instance label comprises:
inputting the first subset of data to the semantic network to obtain a predicted classification label for each point cloud data in the first subset of data, and inputting the first subset of data to the clustering network to obtain a predicted instance label for each point cloud data in the first subset of data;
calculating a first loss of each point cloud data in the first data subset based on the annotation classification label and the prediction classification label of the point cloud data, and calculating a second loss of each point cloud data in the first data subset based on the annotation instance label and the prediction instance label of the point cloud data; and
adjusting parameters of the semantic network and the clustering network based on a sum of the first loss and the second loss of each point cloud data in the first data subset.
14. A three-dimensional scene segmentation apparatus comprising:
a point cloud data acquisition unit configured to acquire a point cloud data set of a target three-dimensional scene, the point cloud data set including point cloud data of each point in a target point set corresponding to the target three-dimensional scene, the point cloud data indicating a position of the point in the target three-dimensional scene;
a tag obtaining unit configured to obtain a target classification tag and a target instance tag of each point cloud data in the point cloud data set, the target classification tag indicating a corresponding classification of the point cloud data in a plurality of classifications, the target instance tag indicating a direction and a distance in which the point cloud data is offset with respect to a clustering center of a corresponding first subset in the point cloud data set, the corresponding first subset capable of being clustered into an instance; and
a first determination unit configured to determine, for a plurality of first point cloud data in the point cloud data set having a same target classification label, that the plurality of first point cloud data corresponds to a first instance in the target three-dimensional scene in response to determining that a target instance label of any one of the plurality of first point cloud data corresponds to a target instance label of another first point cloud data.
15. The apparatus of claim 14, wherein the tag obtaining unit comprises:
a model calculation unit configured for inputting the point cloud data set to a segmentation model to obtain a target classification tag and a target instance tag for each point cloud data in the point cloud data set, wherein,
the segmentation model is obtained by semi-supervised training using a training dataset and an annotation dataset corresponding to a first dataset of the training datasets, wherein,
the training data set includes point cloud data from each point in a sample point set on an example sample of three-dimensional space,
the first data set comprises point cloud data from each point in a first subset of samples in the set of sample points,
the annotation data set includes an annotation classification label and an annotation instance label corresponding to the point cloud data of each point in the first subset of samples, the annotation classification label indicating a respective classification of the instance sample among the plurality of classifications, and the annotation instance label indicating a direction and distance of a point corresponding to the respective point cloud data offset relative to a center of the instance sample.
16. The apparatus of claim 15, wherein each point cloud data in the training data set comprises a location coordinate of a point to which the point cloud data corresponds, and wherein a labeled instance label of the point cloud data for each point in the first subset of samples comprises a coordinate offset between the location coordinate of the point and a location coordinate of a center of the instance sample,
the location coordinates of the center of the example sample are obtained based at least on the location coordinates of each point in the first subset of samples.
17. The apparatus of claim 16, wherein the point cloud data for each point in the set of target points includes location coordinates for the point, and the target instance tag for each point cloud data includes deviation coordinates corresponding to the point cloud data, the apparatus further comprising:
an updating unit configured to obtain, for each of the plurality of first point cloud data, corresponding updated coordinates based on the deviation coordinates corresponding to the first point cloud data and the position coordinates in the first point cloud data; and
a second determination unit configured to determine that a target instance tag of any one of the plurality of first point cloud data corresponds to a target instance tag of another one of the plurality of first point cloud data in response to determining that a distance between updated coordinates of any two of the plurality of first point cloud data is less than a distance threshold.
18. The apparatus according to any one of claims 15-17, wherein the segmentation model comprises a semantic network and a clustering network, and the model calculation unit is configured to:
input the point cloud data set to the semantic network to obtain a target classification label for each point cloud data in the point cloud data set; and
input the point cloud data set to the clustering network to obtain a target instance label for each point cloud data in the point cloud data set.
19. The apparatus of any of claims 14-18, further comprising:
a mark obtaining unit configured to obtain a first mark and a second mark in response to determining that a plurality of second point cloud data in the point cloud data set correspond to a second instance in the target three-dimensional scene and that the target classification tag shared by the plurality of second point cloud data is the same as the target classification tag shared by the plurality of first point cloud data, the first mark being different from the second mark; and
a display unit configured to display a first point set of the target point set based on the first mark and a second point set of the target point set based on the second mark, a plurality of points of the first point set corresponding to the plurality of first point cloud data, and a plurality of points of the second point set corresponding to the plurality of second point cloud data.
20. An apparatus for training a segmentation model, comprising:
a data acquisition unit configured to obtain a point cloud dataset of a training three-dimensional scene, the point cloud dataset comprising point cloud data for each point in a sample point set located on an example sample in the three-dimensional scene, the point cloud data indicating a location of the point in the training three-dimensional scene, the example sample corresponding to a first classification of a plurality of classifications;
a labeling unit configured to label each point cloud data in at least a first data subset of the point cloud data set to obtain a label classification label and a label instance label of each point cloud data in the first data subset, the first data subset including point cloud data of each point in at least a first subset of the sample point set, the label classification label indicating the first classification, and the label instance label indicating a distance and a direction of a shift of a point corresponding to the point cloud data with respect to a center of the instance sample; and
a supervised training unit configured to supervise and train the segmentation model based on each point cloud data in the first data subset and the labeled classification label and labeled instance label of the point cloud data to obtain a supervised and trained segmentation model.
21. The apparatus of claim 20, wherein the number of points in the first subset is not greater than a preset number threshold, the apparatus further comprising:
a data input unit configured to input a second data subset of the set of point cloud data to the supervised trained segmentation model to obtain a predicted classification label and a predicted instance label for each point cloud data in the second data subset, the second data subset comprising point cloud data for each point in a second subset of the set of sample points other than the first subset;
a first data obtaining subunit, configured to obtain a plurality of third data subsets from the second data subset, where predicted classification labels corresponding to a plurality of point cloud data in each of the plurality of third data subsets are the same;
a second data obtaining subunit, configured to obtain, for each of the plurality of third data subsets, a fourth data subset capable of being clustered into an instance in the third data subset and a clustering center of the fourth data subset based on the predicted instance label of each point cloud data in the third data subset;
a pseudo label obtaining unit, configured to obtain, for each of a plurality of fourth data subsets corresponding to the plurality of third data subsets, a classification pseudo label and an example pseudo label of each point cloud data in the fourth data subset, where the classification pseudo label indicates a corresponding classification of the point cloud data in the plurality of classifications, and the example pseudo label is a distance and a direction in which a point corresponding to the point cloud data is offset from a cluster center of the fourth data subset;
a pseudo supervised training unit configured to supervise and train the supervised and trained segmentation model based on each point cloud data in each of the plurality of fourth data subsets and a classification pseudo label and an instance pseudo label of the point cloud data, and based on each point cloud data in the first data subset and a labeling classification label and a labeling instance label of the point cloud data, to obtain a pseudo supervised and trained segmentation model; and
a determination unit configured for determining the pseudo-supervised trained segmentation model as the supervised trained segmentation model to perform an input of a second data subset of the point cloud dataset to the supervised trained segmentation model.
22. The apparatus of claim 21, wherein the point cloud data for each point in the sample point set includes location coordinates for the point, the labeling unit comprising:
a first clustering unit configured to obtain a position coordinate of a center of the sample instance based on a position coordinate in the point cloud data of each point in the first subset; and
a marked instance tag obtaining unit configured to obtain a marked instance tag of the point cloud data of each point in the first subset, wherein the marked instance tag is a coordinate deviation between a position coordinate in the point cloud data and a position coordinate of the center of the sample instance.
23. The apparatus of claim 21, wherein the predicted instance label of each point cloud data in the second subset of data is a predicted coordinate deviation corresponding to the point cloud data, and the second data obtaining subunit comprises:
a coordinate updating unit configured to obtain, for each point cloud data in the third subset of data, an updated coordinate corresponding to the point cloud data based on the position coordinate in the point cloud data and the predicted coordinate deviation of the point cloud data; and
and the second clustering unit is configured to obtain a fourth data subset in the third data subset, wherein the distance between the updated coordinates corresponding to any two point cloud data in the fourth data subset is not greater than a distance threshold.
24. The apparatus of claim 23, wherein the position coordinate of the cluster center of each of the plurality of fourth data subsets is a coordinate mean of a plurality of updated coordinates corresponding to the fourth data subset.
25. The apparatus of any one of claims 20-24, wherein the first data acquisition subunit comprises:
a confidence coefficient obtaining unit configured to obtain a confidence coefficient corresponding to the predicted classification label of each point cloud data in the second data subset; and
a determining unit configured to add the first point cloud data to a third data subset of the plurality of third data subsets corresponding to the predicted classification tag of the first point cloud data in response to determining that the confidence corresponding to the predicted classification tag of the first point cloud data in the second data subset is not less than a confidence threshold.
26. The apparatus according to any one of claims 20-25, wherein the segmentation model includes a semantic network and a clustering network, the supervised training unit including:
a data input subunit configured to input the first data subset to the semantic network to obtain a predicted classification label for each point cloud data in the first data subset, and to input the first data subset to the clustering network to obtain a predicted instance label for each point cloud data in the first data subset;
a loss calculation unit configured to calculate a first loss of each point cloud data in the first data subset based on the annotation classification label and the prediction classification label of the point cloud data, and calculate a second loss of each point cloud data in the first data subset based on the annotation instance label and the prediction instance label of the point cloud data; and
a parameter adjusting unit configured to adjust parameters of the semantic network and the clustering network based on a sum of the first loss and the second loss of each point cloud data in the first data subset.
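A minimal PyTorch sketch of one supervised update per claim 26. `semantic_net` and `clustering_net` are assumed callables mapping (N, 3) points to (N, C) class scores and (N, 3) deviations respectively; cross-entropy and L1 are assumed loss choices, since the claims only require a first and a second per-point loss whose sum drives the joint parameter update of both networks.

```python
import torch
import torch.nn.functional as F

def supervised_step(semantic_net, clustering_net, optimizer,
                    points, class_labels, instance_labels):
    """points: (N, 3); class_labels: (N,) long; instance_labels: (N, 3) deviations."""
    pred_class = semantic_net(points)        # (N, C) predicted classification labels
    pred_deviation = clustering_net(points)  # (N, 3) predicted instance labels

    first_loss = F.cross_entropy(pred_class, class_labels)     # assumed loss choice
    second_loss = F.l1_loss(pred_deviation, instance_labels)   # assumed loss choice

    # adjust parameters of both networks based on the sum of the two losses
    loss = first_loss + second_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```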
27. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.
28. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-13.
29. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-13.
CN202210403610.0A 2022-04-18 2022-04-18 Three-dimensional scene segmentation method and method for training segmentation model Pending CN114723949A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210403610.0A CN114723949A (en) 2022-04-18 2022-04-18 Three-dimensional scene segmentation method and method for training segmentation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210403610.0A CN114723949A (en) 2022-04-18 2022-04-18 Three-dimensional scene segmentation method and method for training segmentation model

Publications (1)

Publication Number Publication Date
CN114723949A (en) 2022-07-08

Family

ID=82243960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210403610.0A Pending CN114723949A (en) 2022-04-18 2022-04-18 Three-dimensional scene segmentation method and method for training segmentation model

Country Status (1)

Country Link
CN (1) CN114723949A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115923A (en) * 2022-07-18 2022-09-27 北京有竹居网络技术有限公司 Model training method, instance segmentation method, device, equipment and medium
CN115115923B (en) * 2022-07-18 2024-04-09 北京有竹居网络技术有限公司 Model training method, instance segmentation method, device, equipment and medium
CN114937265A (en) * 2022-07-25 2022-08-23 深圳市商汤科技有限公司 Point cloud detection method, model training method, device, equipment and storage medium
CN114937265B (en) * 2022-07-25 2022-10-28 深圳市商汤科技有限公司 Point cloud detection method, model training method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
EP3961579A2 (en) Target detection method, apparatus, medium and computer program product
CN114648638A (en) Training method of semantic segmentation model, semantic segmentation method and device
CN112857268B (en) Object area measuring method, device, electronic equipment and storage medium
CN115438214B (en) Method and device for processing text image and training method of neural network
CN114723949A (en) Three-dimensional scene segmentation method and method for training segmentation model
CN115082740A (en) Target detection model training method, target detection method, device and electronic equipment
CN114821581A (en) Image recognition method and method for training image recognition model
CN114550313A (en) Image processing method, neural network, and training method, device, and medium thereof
CN115797660A (en) Image detection method, image detection device, electronic equipment and storage medium
CN115578501A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114429678A (en) Model training method and device, electronic device and medium
CN114998963A (en) Image detection method and method for training image detection model
CN115269989A (en) Object recommendation method and device, electronic equipment and storage medium
CN114741623A (en) Interest point state determination method, model training method and device
CN115359309A (en) Training method, device, equipment and medium of target detection model
CN113868453A (en) Object recommendation method and device
CN115019048B (en) Three-dimensional scene segmentation method, model training method and device and electronic equipment
CN115131562B (en) Three-dimensional scene segmentation method, model training method, device and electronic equipment
CN115511779B (en) Image detection method, device, electronic equipment and storage medium
CN115512131B (en) Image detection method and training method of image detection model
CN115170536B (en) Image detection method, training method and device of model
CN112765975B (en) Word segmentation disambiguation processing method, device, equipment and medium
CN114898387A (en) Table image processing method and device
CN115601561A (en) High-precision map target detection method, device, equipment and medium
CN115578451A (en) Image processing method, and training method and device of image processing model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination