CN112085840A - Semantic segmentation method, device, equipment and computer readable storage medium


Info

Publication number: CN112085840A (application CN202010981890.4A; granted publication CN112085840B)
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 者雪飞, 暴林超, 林鸿鑫
Assignee (original and current): Tencent Technology Shenzhen Co Ltd
Priority: CN202010981890.4A
Legal status: Granted; Active

Classifications

    • G06T 17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06F 3/04845: Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range, for image manipulation, e.g. dragging, rotation, expansion or change of colour
    • G06T 19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T 7/10: Segmentation; Edge detection
    • G06T 2207/10004: Still image; Photographic image
    • G06T 2207/10012: Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Architecture (AREA)
  • Human Computer Interaction (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application provides a semantic segmentation method, apparatus, device and computer-readable storage medium. The method comprises: obtaining a display operation directed at a three-dimensional model; in response to the display operation, displaying the three-dimensional model on a human-computer interaction interface, the human-computer interaction interface comprising a semantic segmentation option; obtaining a selection operation directed at the semantic segmentation option; and, in response to the selection operation, displaying a semantic segmentation result of the three-dimensional model on the human-computer interaction interface after a two-dimensional segmentation result of a two-dimensional image is obtained. The semantic segmentation result is determined according to the two-dimensional segmentation result of the two-dimensional image and the model attributes of the three-dimensional model, and the two-dimensional image and the three-dimensional model belong to the same scene. This artificial-intelligence-based semantic segmentation method improves both the segmentation efficiency of the three-dimensional model and the user experience.

Description

Semantic segmentation method, device, equipment and computer readable storage medium
Technical Field
The present application relates to data processing technologies, and in particular, to a semantic segmentation method, apparatus, device, and computer readable storage medium.
Background
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. Three-dimensional object reconstruction (3D object reconstruction) and target segmentation are important branches of artificial intelligence. In practical engineering applications, semantic segmentation needs to be performed on a given three-dimensional model to obtain the partial three-dimensional model of a target object for use in subsequent engineering applications.
In the traditional three-dimensional model segmentation process, semantic segmentation is usually performed on the point cloud data of the three-dimensional model by a three-dimensional segmentation network. Because the volume of point cloud data is huge, the actual segmentation process is not only computationally expensive and time-consuming, but also has low segmentation accuracy and cannot produce a realistic semantic segmentation result.
Disclosure of Invention
The embodiment of the application provides a semantic segmentation method, a semantic segmentation device, semantic segmentation equipment and a computer-readable storage medium, and can improve the segmentation efficiency and the segmentation accuracy of a three-dimensional model.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a semantic segmentation method, which comprises the following steps: obtaining a display operation aiming at the three-dimensional model; responding to the display operation, and displaying the three-dimensional model on a human-computer interaction interface; the man-machine interaction interface comprises a semantic segmentation option; acquiring selection operation aiming at semantic segmentation options; responding to the selection operation, and displaying a semantic segmentation result of the three-dimensional model on a human-computer interaction interface after a two-dimensional segmentation result of the two-dimensional image is obtained; the semantic segmentation result is determined according to the two-dimensional segmentation result of the two-dimensional image and the model attribute of the three-dimensional model; the two-dimensional image and the three-dimensional model belong to the same scene.
In some embodiments, the method further comprises mapping the two-dimensional segmentation result to a three-dimensional model to obtain a semantic segmentation result of the three-dimensional model; the mapping the two-dimensional segmentation result to the three-dimensional model to obtain the semantic segmentation result of the three-dimensional model comprises the following steps: acquiring a two-dimensional segmentation result of the two-dimensional image; the two-dimensional segmentation result at least comprises labels corresponding to a plurality of pixel points in the two-dimensional image; mapping a plurality of pixel points of the two-dimensional image with a plurality of vertexes of the three-dimensional model; determining an initial label of each vertex according to labels corresponding to the multiple pixel points; establishing an energy function according to the initial label of each vertex and the attribute information of each sub-surface in the plurality of sub-surfaces of the three-dimensional model; and optimizing the initial segmentation result based on the energy function to obtain a semantic segmentation result.
In some embodiments, the energy function comprises a first loss term, a second loss term, and a weight; the three-dimensional model further comprises a plurality of intersecting edges; the establishing an energy function according to the initial label of each vertex and the attribute information of each sub-surface in the plurality of sub-surfaces of the three-dimensional model comprises the following steps: determining a first loss term according to each vertex and a plurality of first sub-surfaces corresponding to each vertex; the first loss item is related to the initial label of each vertex and the attribute information of each first sub-surface in the plurality of first sub-surfaces; establishing a second loss term according to two adjacent vertexes corresponding to each intersecting edge in the plurality of intersecting edges and two adjacent second sub-surfaces; the second loss term is related to the length of the intersecting edge, the included angle between the two second sub-surfaces and the initial label of each of the two adjacent vertexes; an energy function is established based on the first loss term, the second loss term, and the weight.
In some embodiments, the attribute information includes at least one of: height information, plane information, vertical information, and area information, the method further comprising: attribute information for each subsurface is determined.
In some embodiments, where the attribute information includes height information, said determining attribute information for each subsurface comprises: determining a centroid location for each subsurface; determining a neighborhood centroid set corresponding to each sub-surface according to the centroid position of each sub-surface and a preset local neighborhood range; the neighborhood centroid set comprises a plurality of neighborhood centroids; and determining the height information corresponding to each sub-surface according to the centroid positions of the plurality of neighborhood centroids corresponding to each sub-surface.
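As an illustration of the height computation just described, the sketch below derives a per-sub-surface height value from neighborhood centroids. It is a minimal sketch under assumptions the text does not fix: the neighborhood is taken as all centroids within an xy-radius, and the height is min-max normalized within that neighborhood so that the value falls in [0, 1].
```python
import numpy as np

def height_info(centroids: np.ndarray, radius: float) -> np.ndarray:
    """Height information a_e for each sub-surface from the centroid
    positions of its neighborhood centroid set.

    centroids: (F, 3) centroid positions of the F sub-surfaces (z is up).
    radius:    the preset local neighborhood range.
    """
    heights = np.empty(len(centroids))
    for i, c in enumerate(centroids):
        # neighborhood centroid set: centroids within the preset range (xy plane)
        in_range = np.linalg.norm(centroids[:, :2] - c[:2], axis=1) <= radius
        neigh_z = centroids[in_range, 2]
        z_min, z_max = neigh_z.min(), neigh_z.max()
        # relative height of this centroid within its local neighborhood
        heights[i] = 0.0 if z_max == z_min else (c[2] - z_min) / (z_max - z_min)
    return heights
```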
In some embodiments, where the attribute information includes planar information, the determining attribute information for each subsurface includes: determining a set of adjacent sub-surfaces corresponding to each sub-surface; the adjacent sub-surface comprises a plurality of adjacent sub-surfaces adjacent to the sub-surface; acquiring an adjacent surface vertex set corresponding to each sub-surface according to the adjacent sub-surface set corresponding to each sub-surface; establishing a covariance matrix according to the vertex set of the adjacent surface corresponding to each sub-surface; the covariance matrix corresponds to a plurality of eigenvalues; and determining the plane information corresponding to each sub-surface according to the plurality of characteristic values corresponding to each sub-surface.
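A sketch of the plane-information step follows. The covariance matrix is built from the adjacent-surface vertex set as described; how the three eigenvalues are combined into a single planarity value is not specified above, so the (λ2 − λ3)/λ1 combination used here is an assumption (a common flatness measure, near 1 for a locally planar neighborhood).
```python
import numpy as np

def plane_info(adjacent_vertices: np.ndarray) -> float:
    """Plane information a_p of one sub-surface, computed from the
    adjacent-surface vertex set (an (N, 3) array of vertex positions)."""
    centered = adjacent_vertices - adjacent_vertices.mean(axis=0)
    cov = centered.T @ centered / len(adjacent_vertices)  # 3x3 covariance matrix
    evals = np.sort(np.linalg.eigvalsh(cov))[::-1]        # eigenvalues λ1 >= λ2 >= λ3
    # a flat neighborhood has λ3 much smaller than λ1 and λ2
    return float((evals[1] - evals[2]) / max(evals[0], 1e-12))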
In some embodiments, where the attribute information comprises vertical information, said determining attribute information for each subsurface comprises: determining a unit normal vector of each sub-surface; and determining the vertical information corresponding to each sub-surface according to the unit normal vector of each sub-surface and a preset standard unit normal vector.
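The corresponding sketch for vertical information is below; the preset standard unit normal is assumed to be the world up axis, and the comparison is taken as the absolute cosine between the two normals. The text does not fix the exact formula or sign convention, so treat both as assumptions.
```python
import numpy as np

def vertical_info(face_normal: np.ndarray,
                  standard_normal: np.ndarray = np.array([0.0, 0.0, 1.0])) -> float:
    """Vertical information a_h of a sub-surface from its unit normal
    and a preset standard unit normal (assumed here to be the up axis)."""
    n = face_normal / np.linalg.norm(face_normal)
    # 1.0 when the face normal is aligned with the standard normal
    # (e.g. horizontal ground), 0.0 for a vertical wall
    return abs(float(n @ standard_normal))
```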
In some embodiments, the optimizing the initial segmentation result based on the energy function to obtain a semantic segmentation result includes: optimizing the initial segmentation result by performing minimization iterations on the energy function to obtain the semantic segmentation result.
In some embodiments, the obtaining the two-dimensional segmentation result of the two-dimensional image includes: segmenting the two-dimensional image with a semantic segmentation model to obtain the segmentation result of the two-dimensional image.
In some embodiments, in the case that the two-dimensional image is an aerial image, the obtaining process of the semantic segmentation model includes: acquiring a street view sample data set and an aerial photography sample data set; the street view sample data set comprises a plurality of street view sample pictures carrying segmentation labels; the aerial photography sample data set comprises a plurality of aerial photography sample pictures carrying segmentation labels; training a preset initial segmentation model according to the street view sample data set to obtain a trained street view segmentation model; and training the street view segmentation model according to the aerial photography sample data set to obtain a semantic segmentation model.
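The two-stage training described above (pre-train on street-view samples, then fine-tune on aerial samples) can be sketched as follows. The model, data loaders, and hyperparameters are placeholders rather than values from the patent; any off-the-shelf segmentation network producing per-pixel class logits would fit.
```python
import torch

def train_stage(model, loader, epochs: int, lr: float):
    """One training stage: standard per-pixel cross-entropy segmentation
    training on a dataset of (image, label-map) pairs."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:                # labels: (N, H, W) class indices
            optimizer.zero_grad()
            loss = criterion(model(images), labels)  # logits: (N, C, H, W)
            loss.backward()
            optimizer.step()
    return model

# stage 1: train the initial segmentation model on the street view sample set
# model = train_stage(model, street_view_loader, epochs=50, lr=1e-3)
# stage 2: fine-tune the street view segmentation model on the aerial sample set
# model = train_stage(model, aerial_loader, epochs=20, lr=1e-4)
```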
In some embodiments, the mapping a plurality of pixel points of the two-dimensional image with a plurality of vertices of the three-dimensional model includes: determining a conversion relation between a three-dimensional coordinate system corresponding to the three-dimensional model and a two-dimensional coordinate system corresponding to the two-dimensional image according to the shooting parameters of the two-dimensional image; and mapping the plurality of pixel points and the plurality of vertexes according to the two-dimensional coordinates of each pixel point in the plurality of pixel points, the three-dimensional coordinates of each vertex in the plurality of vertexes and the conversion relation.
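A sketch of this mapping is below, assuming the shooting parameters take the form of a pinhole camera: an intrinsic matrix K and a pose (R, t) relating the three-dimensional coordinate system of the model to the camera. The patent does not state the camera model, so the pinhole form is an assumption.
```python
import numpy as np

def map_vertices_to_pixels(vertices: np.ndarray, K: np.ndarray,
                           R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Project (V, 3) model vertices into the two-dimensional image,
    returning (V, 2) integer pixel coordinates (u, v)."""
    cam = vertices @ R.T + t          # world coordinates -> camera coordinates
    uv = cam @ K.T                    # camera coordinates -> homogeneous pixels
    uv = uv[:, :2] / uv[:, 2:3]       # perspective divide
    return np.round(uv).astype(int)

# each vertex then takes the label of the pixel it maps to, e.g.:
# pixels = map_vertices_to_pixels(vertices, K, R, t)
# initial_labels = seg_2d[pixels[:, 1], pixels[:, 0]]   # seg_2d: (H, W) label map
```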
In some embodiments, the determining an initial label for each vertex from labels corresponding to a plurality of pixel points includes: and determining the initial label of each vertex according to the label corresponding to the pixel point mapped by each vertex.
The embodiment of the application provides a semantic segmentation device, the device includes:
the first obtaining module is used for obtaining the display operation aiming at the three-dimensional model.
The first display module is used for responding to display operation and displaying the three-dimensional model on a human-computer interaction interface; the human-computer interaction interface includes a semantic segmentation option.
And the second acquisition module is used for acquiring the selection operation aiming at the semantic segmentation option.
The second display module is used for responding to the selection operation, and displaying the semantic segmentation result of the three-dimensional model on the human-computer interaction interface after the two-dimensional segmentation result of the two-dimensional image is obtained; the semantic segmentation result is determined according to the two-dimensional segmentation result of the two-dimensional image and the model attribute of the three-dimensional model; the two-dimensional image and the three-dimensional model belong to the same scene.
An embodiment of the present application provides a semantic segmentation apparatus, including:
a memory for storing executable instructions;
and the processor is used for realizing the semantic segmentation method provided by the embodiment of the application when the executable instructions stored in the memory are executed.
The embodiment of the present application provides a computer-readable storage medium, which stores executable instructions for causing a processor to execute the method for semantic segmentation provided by the embodiment of the present application.
The embodiment of the application has the following beneficial effects:
the embodiment of the application acquires display operation aiming at the three-dimensional model; responding to the display operation, and displaying the three-dimensional model on a human-computer interaction interface; the man-machine interaction interface comprises a semantic segmentation option; acquiring selection operation aiming at semantic segmentation options; responding to the selection operation, and displaying a semantic segmentation result of the three-dimensional model on a human-computer interaction interface after a two-dimensional segmentation result of the two-dimensional image is obtained; the semantic segmentation result is determined according to the two-dimensional segmentation result of the two-dimensional image and the model attribute of the three-dimensional model; the two-dimensional image and the three-dimensional model belong to the same scene. Therefore, in the process of displaying the three-dimensional model, the three-dimensional model can be subjected to semantic segmentation according to the two-dimensional segmentation result of the two-dimensional image and the model attribute of the three-dimensional model, and compared with the scheme of directly segmenting point cloud data corresponding to the three-dimensional model in the traditional technology, the scheme reduces the data calculation amount when the three-dimensional model is subjected to semantic segmentation while ensuring the segmentation accuracy, improves the segmentation efficiency, further improves the response efficiency of user operation, and improves the user experience.
Drawings
FIG. 1 is an alternative architecture diagram of a semantic segmentation system provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a semantic segmentation apparatus provided in an embodiment of the present application;
FIG. 3 is a schematic flow chart of an alternative semantic segmentation method provided by an embodiment of the present application;
FIG. 4 is a schematic flow chart of an alternative semantic segmentation method provided by the embodiment of the present application;
FIG. 5A is a schematic flow chart of an alternative semantic segmentation method provided by the embodiment of the present application;
FIG. 5B is a schematic diagram of an alternative three-dimensional model provided by embodiments of the present application;
FIG. 6 is a schematic flow chart of an alternative semantic segmentation method provided by the embodiment of the present application;
FIG. 7 is a schematic flow chart of an alternative semantic segmentation method provided by an embodiment of the present application;
FIG. 8 is a schematic flow chart of an alternative semantic segmentation method provided by an embodiment of the present application;
FIG. 9A is an alternative flow chart of a semantic segmentation method for a reconstruction model for urban aerial photography according to an embodiment of the present disclosure;
FIG. 9B is an alternative flow diagram of a mapping process provided by embodiments of the present application;
FIG. 9C is a schematic illustration of an alternative aerial image provided by an embodiment of the present application;
FIG. 9D is a schematic diagram of an alternative two-dimensional segmentation effect provided by an embodiment of the present application;
FIG. 9E is a schematic diagram of an alternative three-dimensional model provided by embodiments of the present application;
fig. 9F is a schematic diagram of an alternative three-dimensional segmentation effect provided in the embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first", "second" and "third" are used merely to distinguish similar objects and do not imply a specific ordering. It is understood that "first", "second" and "third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the present application described herein can be implemented in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
The scheme provided by the embodiment of the application relates to the technologies such as the computer vision technology of artificial intelligence and the like, and is specifically explained by the following embodiment:
artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science, attempting to understand the essence of intelligence and producing a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning. The embodiments of the present application relate to computer vision technology and machine learning technology.
Computer Vision (CV) technology is a science that studies how to make machines "see"; more specifically, it uses cameras and computers, instead of human eyes, to identify, track and measure targets and to perform further graphic processing, so that the processed image is more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, Optical Character Recognition (OCR), and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition. The embodiments of the present application mainly relate to image semantic understanding in computer vision: image segmentation is performed based on image semantic understanding.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
(1) 3D semantic segmentation: a technique that divides a point cloud into semantically meaningful parts and then labels each part as one of a set of predefined classes.
(2) Markov Random Field (MRF): a probability distribution model that can be represented by an undirected graph. Each node in the graph represents one variable or a group of variables, and the edges between nodes represent dependencies between variables.
The embodiments of the present application provide a semantic segmentation method, apparatus, device and computer-readable storage medium. In response to a display operation for a three-dimensional model, the three-dimensional model and a semantic segmentation option are displayed on a human-computer interaction interface; in response to a selection operation for the semantic segmentation option, the semantic segmentation result of the three-dimensional model is displayed on the human-computer interaction interface. The semantic segmentation result is determined according to the two-dimensional segmentation result of a two-dimensional image and the model attributes of the three-dimensional model, where the two-dimensional image and the three-dimensional model belong to the same scene. In this way, the segmentation efficiency of the three-dimensional model and the user experience can both be improved. An exemplary application of the electronic device provided by the embodiments of the present application is described below.
Referring to fig. 1, fig. 1 is an alternative architecture diagram of the semantic segmentation system 100 provided in the embodiments of the present application. To support a semantic segmentation application, a terminal 400 (terminals 400-1 and 400-2 are shown as examples) is connected to a server 200 through a network 300, where the network 300 may be a wide area network, a local area network, or a combination of the two. Fig. 1 further shows that the server 200 may be a server cluster comprising servers 200-1 to 200-3; the servers 200-1 to 200-3 may be physical machines or virtual machines constructed using virtualization technologies (such as container technology and virtual machine technology), which is not limited in this embodiment. Of course, a single server may also be used to provide services.
In some embodiments, the terminal 400 may obtain the data file of the three-dimensional model after receiving a three-dimensional model display operation. The data file of the three-dimensional model may be stored in the terminal 400 in advance, in which case the terminal 400 can obtain it directly after receiving the user's display operation for the three-dimensional model; the data file may also be stored in the server 200 connected to the terminal 400, in which case the terminal 400, after receiving the display operation, sends a file request to the server 200 and receives the data file returned by the server 200. After parsing the file, the terminal 400 may display the three-dimensional model and the semantic segmentation option in a graphical interface 410 (graphical interfaces 410-1 and 410-2 are shown as examples). Then, after the terminal 400 receives the user's selection operation for the semantic segmentation option, the terminal 400 may determine the semantic segmentation result of the three-dimensional model according to the two-dimensional segmentation result of the two-dimensional image and the model attributes of the three-dimensional model; alternatively, a semantic segmentation request may be sent to the server 200, and the server 200 determines the semantic segmentation result according to the two-dimensional segmentation result of the two-dimensional image and the model attributes of the three-dimensional model and sends it to the terminal 400. The terminal 400 may then display the semantic segmentation result of the three-dimensional model in the graphical interface 410.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a semantic segmentation apparatus 500 provided in an embodiment of the present application, where the semantic segmentation apparatus 500 shown in fig. 2 includes: at least one processor 510, memory 550, at least one network interface 520, and a user interface 530. The various components in semantic segmentation apparatus 500 are coupled together by a bus system 540. It is understood that the bus system 540 is used to enable communications among the components. The bus system 540 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 540 in fig. 2.
The Processor 510 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 530 includes one or more output devices 531 enabling presentation of media content, including one or more speakers and/or one or more visual display screens. The user interface 530 also includes one or more input devices 532, including user interface components to facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 550 may comprise volatile memory or nonvolatile memory, and may also comprise both volatile and nonvolatile memory. The non-volatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 550 described in embodiments herein is intended to comprise any suitable type of memory. Memory 550 optionally includes one or more storage devices physically located remote from processor 510.
In some embodiments, memory 550 can store data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 551 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 552 for communicating with other computing devices via one or more (wired or wireless) network interfaces 520; exemplary network interfaces 520 include: Bluetooth, Wireless Fidelity (Wi-Fi), Universal Serial Bus (USB), etc.;
a display module 553 for enabling presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 531 (e.g., a display screen, speakers, etc.) associated with the user interface 530;
an input processing module 554 to detect one or more user inputs or interactions from one of the one or more input devices 532 and to translate the detected inputs or interactions.
In some embodiments, the semantic segmentation apparatus provided in the embodiments of the present application may be implemented by a combination of hardware and software, and as an example, the semantic segmentation apparatus provided in the embodiments of the present application may be a processor in the form of a hardware decoding processor, which is programmed to execute the semantic segmentation method provided in the embodiments of the present application.
In some embodiments, the semantic segmentation apparatus provided in the embodiments of the present application may be implemented in software, and fig. 2 shows a semantic segmentation apparatus 555 stored in a memory 550, which may be software in the form of programs and plug-ins, and includes the following software modules: the first obtaining module 5551, the first presenting module 5552, the second obtaining module 5553 and the second presenting module 5554 are logical modules, and thus may be arbitrarily combined or further split according to the implemented functions.
The functions of the respective modules will be explained below.
In other embodiments, the apparatus provided in the embodiments of the present Application may be implemented in hardware, and for example, the apparatus provided in the embodiments of the present Application may be a processor in the form of a hardware decoding processor, which is programmed to execute the semantic segmentation method provided in the embodiments of the present Application, for example, the processor in the form of the hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
In this embodiment of the present application, a semantic segmentation method provided in this embodiment of the present application will be described with a terminal as an execution subject.
Referring to fig. 3, fig. 3 is an alternative flow chart diagram of a semantic segmentation method provided in the embodiment of the present application, and will be described with reference to the steps shown in fig. 3.
In step 301, a display operation for a three-dimensional model is obtained.
In step 302, in response to the display operation, the three-dimensional model is displayed on a human-computer interaction interface; the human-computer interaction interface includes a semantic segmentation option.
In some embodiments, the data file of the three-dimensional model is stored in the terminal in advance and can be obtained after the terminal receives the user's display operation for the three-dimensional model; after parsing, the three-dimensional model can be displayed on the current human-computer interaction interface of the terminal. The data file of the three-dimensional model may also be stored in a server connected to the terminal; after the terminal receives the user's display operation for the three-dimensional model, it sends a file request to the server and receives the data file of the three-dimensional model returned by the server. After parsing, the three-dimensional model can be displayed on the current human-computer interaction interface of the terminal.
In some embodiments, during the process of presenting the three-dimensional model, a real-time operation of the three-dimensional model by a user may also be received through the human-computer interaction interface, and the real-time operation may include, but is not limited to, various dragging operations, zooming operations, rotating operations, and the like, and a display state of the three-dimensional model is adjusted in real time according to the real-time operation, and the display state includes, but is not limited to, a display position, a size, a display angle, and the like.
In this embodiment, while the human-computer interaction interface displays the three-dimensional model, the human-computer interaction interface also displays semantic segmentation options corresponding to the three-dimensional model. The option may be directly displayed or displayed in a floating manner on the human-computer interaction interface, or may be displayed on the human-computer interaction interface in a manner of a two-level or higher-level menu option, which is not limited in this application.
In step 303, a selection operation for a semantic segmentation option is obtained.
In step 304, in response to the selection operation, and after a two-dimensional segmentation result of the two-dimensional image is obtained, a semantic segmentation result of the three-dimensional model is displayed on the human-computer interaction interface; the semantic segmentation result is determined according to the two-dimensional segmentation result of the two-dimensional image and the model attributes of the three-dimensional model; the two-dimensional image and the three-dimensional model belong to the same scene.
In some embodiments, after the terminal receives the user's selection operation for the semantic segmentation option, the semantic segmentation processing of the three-dimensional model may be started. The selection operation can be a direct selection operation, such as clicking or long-pressing the semantic segmentation option; the semantic segmentation option can also be triggered after an instruction is converted by the terminal.
In some embodiments, the semantic segmentation result is determined based on a two-dimensional segmentation result of the two-dimensional image and a model property of the three-dimensional model. The two-dimensional image may be an image in which a corresponding relationship with the three-dimensional model is established in advance, and after receiving a selection operation of the user on the semantic segmentation option, the two-dimensional image corresponding to the three-dimensional model may be directly acquired according to the corresponding relationship. The two-dimensional image may be stored in the terminal in advance, may be stored in the server in advance, or may be obtained by on-line search.
In some embodiments, the two-dimensional image may also be selected by the user in real-time. After the terminal receives the selection operation of the user on the semantic segmentation option, an image acquisition window is displayed in the human-computer interaction interface, and the image selected by the user in real time is acquired through the image acquisition window and serves as the two-dimensional image. The manner of acquiring the image includes, but is not limited to: displaying a plurality of images to be selected in the image acquisition window, and determining a part of the images to be selected as two-dimensional images selected in real time according to the selection operation of a user; receiving a two-dimensional image dragged by a user from other windows in the manual interaction interface through the image acquisition window; and receiving a picture address input by a user through the image acquisition window, and acquiring a two-dimensional image selected by the user in real time according to the picture address.
In some embodiments, the model attributes of the three-dimensional model may include at least one of: the size of the three-dimensional model, the type of the three-dimensional model, and the geometric attributes of the three-dimensional model, where the geometric attributes of the three-dimensional model include the geometric attributes of the vertices, sub-surfaces, and intersecting edges in the three-dimensional model.
In some embodiments, in the process of displaying the semantic segmentation result of the three-dimensional model on the human-computer interaction interface, the original three-dimensional model can be colored according to the semantic segmentation result. The terminal stores colors corresponding to all labels in the semantic segmentation result in advance, and in the process of displaying the semantic segmentation result, the original three-dimensional model is colored according to the color corresponding to each label so as to reflect the difference between different types of areas in the three-dimensional model.
As can be seen from the foregoing exemplary implementation of fig. 3, in the embodiment of the present application, a display operation for a three-dimensional model is obtained; responding to the display operation, and displaying the three-dimensional model on a human-computer interaction interface; the man-machine interaction interface comprises a semantic segmentation option; acquiring selection operation aiming at semantic segmentation options; responding to the selection operation, and displaying a semantic segmentation result of the three-dimensional model on a human-computer interaction interface after a two-dimensional segmentation result of the two-dimensional image is obtained; the semantic segmentation result is determined according to the two-dimensional segmentation result of the two-dimensional image and the model attribute of the three-dimensional model; the two-dimensional image and the three-dimensional model belong to the same scene. Therefore, in the process of displaying the three-dimensional model, the three-dimensional model can be subjected to semantic segmentation according to the two-dimensional segmentation result of the two-dimensional image and the model attribute of the three-dimensional model, and compared with the scheme of directly segmenting point cloud data corresponding to the three-dimensional model in the traditional technology, the scheme reduces the data calculation amount when the three-dimensional model is subjected to semantic segmentation while ensuring the segmentation accuracy, improves the segmentation efficiency, further improves the response efficiency of user operation, and improves the user experience.
In some embodiments, based on fig. 3, before the human-computer interaction interface presents the semantic segmentation result of the three-dimensional model, the method further includes: and mapping the two-dimensional segmentation result to the three-dimensional model to obtain a semantic segmentation result of the three-dimensional model. The step of mapping the two-dimensional segmentation result to the three-dimensional model to obtain the semantic segmentation result of the three-dimensional model can be realized in the following manner:
referring to fig. 4, fig. 4 is an alternative flow chart diagram of a semantic segmentation method provided in the embodiment of the present application, and will be described with reference to the steps shown in fig. 4.
In step 401, a two-dimensional segmentation result of a two-dimensional image is obtained; the two-dimensional segmentation result at least comprises labels corresponding to a plurality of pixel points in the two-dimensional image.
In some embodiments, before the human-computer interaction interface displays the semantic segmentation result of the three-dimensional model, the two-dimensional segmentation result of the two-dimensional image can be obtained by the following method: (1) converting the two-dimensional image into a gray image, acquiring the gray value of each pixel in the gray image, dividing a plurality of pixels of which the gray values are in the same threshold interval into the same category by setting at least one gray threshold, and adding corresponding labels to the plurality of pixels of each category to obtain a two-dimensional segmentation result; (2) detecting edges of different areas in the two-dimensional image, dividing a plurality of pixel points in the two-dimensional image through the detected edges and adding corresponding labels to obtain a two-dimensional division result; (3) the initial segmentation model can be trained through a large number of labeled sample pictures, a corresponding segmentation model is obtained after the training is completed, and the two-dimensional image is subjected to two-dimensional semantic segmentation according to the segmentation model to obtain a two-dimensional segmentation result.
The two-dimensional image comprises a plurality of pixel points, and the two-dimensional segmentation result at least comprises labels corresponding to the pixel points in the two-dimensional image. In one embodiment, the two-dimensional segmentation result comprises a label corresponding to each pixel point in the two-dimensional image; in one embodiment, the two-dimensional segmentation result includes labels corresponding to some pixel points in the two-dimensional image.
For example, taking the two-dimensional image including H (height) × W (width) pixels as an example, the two-dimensional segmentation result may be a label corresponding to each pixel point, that is, the obtained two-dimensional segmentation result may be represented as H (height) × W (width) × K, where K represents a label corresponding to each pixel point; the two-dimensional segmentation result can also be a label corresponding to a part of pixel points in the two-dimensional image, namely the obtained two-dimensional segmentation result can be represented as N × K, wherein N represents the number of the pixel points of the part of pixel points, and K represents the label corresponding to each pixel point.
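As a concrete instance of method (1) above, a threshold-based sketch is shown below; the grayscale conversion and threshold values are illustrative choices, and the returned (H, W) array is the per-pixel label map K described above.
```python
import numpy as np

def threshold_segmentation(image: np.ndarray, thresholds) -> np.ndarray:
    """Divide pixels into categories by gray-value interval and return an
    (H, W) array holding the label K of each pixel."""
    gray = image.mean(axis=2) if image.ndim == 3 else image.astype(float)
    labels = np.zeros(gray.shape, dtype=np.int32)
    for k, thr in enumerate(sorted(thresholds), start=1):
        labels[gray >= thr] = k       # pixels in the same interval share a label
    return labels
```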
In step 402, a plurality of pixel points of the two-dimensional image are mapped with a plurality of vertices of the three-dimensional model.
In some embodiments, the three-dimensional model and the two-dimensional image belong to the same scene, that is, in the case that the three-dimensional model is a virtual three-dimensional model obtained by modeling a target object in a real scene, the two-dimensional image is a shot of the target object.
For example, in a road scene, if the three-dimensional model is a virtual three-dimensional model obtained by modeling a certain area (including roads, vegetation, buildings, and the like) in a real road scene, the two-dimensional image is a shot picture of the area, and the shot picture at least includes an image of a real object in the area.
In some embodiments, a conversion relationship between a two-dimensional coordinate system in which each pixel point in the two-dimensional image is located and a three-dimensional coordinate system in which each vertex in the three-dimensional model is located may be obtained, and the pixel points in the two-dimensional image and the vertices in the three-dimensional model may be mapped according to the conversion relationship.
In step 403, an initial label of each vertex is determined according to the labels corresponding to the plurality of pixel points.
In some embodiments, after determining the pixel point corresponding to each vertex, the label of the corresponding pixel point is used as the initial label corresponding to the vertex.
In step 404, an energy function is created based on the initial label of each vertex and the attribute information for each subsurface of the plurality of sub-surfaces of the three-dimensional model.
In some embodiments, the attribute information for each subsurface may be a geometric attribute of that subsurface. For example, the area property, height property, direction property, flatness property, and the like of the subsurface may be included. In step 404, a Markov Random Field (MRF) corresponding to the initial label is constructed based on the initial label of each vertex, and an energy function of the markov random field is defined according to the attribute information of each of the plurality of sub-surfaces in the three-dimensional model. The label of each vertex at which the energy function takes a minimum is the desired segmentation result.
In step 405, the initial segmentation result is optimized based on the energy function to obtain a semantic segmentation result.
In some embodiments, step 405 may include optimizing the initial segmentation result by performing minimization iterations on the energy function to obtain the semantic segmentation result. In each minimization iteration, an attempt is made to reduce the value of the energy function by randomly changing the labels of one or more vertices. If the value of the energy function decreases, the changed labels are kept; if the value of the energy function increases, the labels are restored to their values before the change. Minimization continues until the energy function reaches its minimum value, and the label then corresponding to each vertex is taken as the final semantic segmentation result.
In some embodiments, the minimization iterations on the energy function may be performed by any one of the following methods: the α-expansion method, the αβ-swap method, and the FastPD method. In the αβ-swap method, each swap adjusts two labels so that the energy function decreases, and all label pairs are traversed until no single swap can decrease the energy function any further, at which point the minimization iteration is complete and the optimal effect is reached. After the minimization iterations, the energy function reaches its minimum value, and the labels of the vertices corresponding to this minimum can be taken as the final semantic segmentation result.
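The accept-if-lower iteration described in step 405 can be sketched as follows. This greedy single-vertex variant is an illustration only; the α-expansion, αβ-swap, and FastPD methods named above use graph cuts to change many labels per move and converge far faster.
```python
import random

def minimize_energy(labels: list, energy, num_labels: int, iters: int = 100000):
    """Minimization iteration: randomly change one vertex label, keep the
    change if the energy function decreases, otherwise restore the label."""
    best = energy(labels)
    for _ in range(iters):
        i = random.randrange(len(labels))
        old = labels[i]
        labels[i] = random.randrange(num_labels)
        new = energy(labels)
        if new < best:
            best = new             # keep the changed label
        else:
            labels[i] = old        # restore the label before the change
    return labels
```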
As can be seen from the above exemplary implementation of fig. 4, in the embodiments of the present application the initial label of each vertex in the three-dimensional model can be determined quickly by transferring the two-dimensional segmentation result of a two-dimensional image onto the three-dimensional model of the same scene. Compared with the traditional scheme of directly segmenting the point cloud data corresponding to the three-dimensional model, this reduces the amount of computation required for semantic segmentation of the three-dimensional model while maintaining segmentation accuracy, and improves segmentation efficiency. In addition, the labels of the vertices of the three-dimensional model are combined with the geometric attributes of the sub-surfaces in the three-dimensional model as the geometric constraints of a Markov Random Field (MRF) to establish an energy function; because the geometric attributes of the three-dimensional model are taken into account when the initial segmentation result is optimized through the energy function, the final semantic segmentation result is more realistic and the segmentation accuracy is higher.
In some embodiments, referring to fig. 5A, fig. 5A is an optional flowchart of the semantic segmentation method provided in the embodiments of the present application, and based on fig. 4, step 404 shown in fig. 4 may be updated to step 501 to step 503.
In step 501, determining a first loss term according to each vertex and a plurality of first sub-surfaces corresponding to each vertex; the first cost term is associated with the initial label of each vertex and attribute information for each of the plurality of first sub-surfaces.
In some embodiments, the three-dimensional model may be composed of a plurality of sub-surfaces, wherein two sub-surfaces may define an intersecting edge therebetween and three sub-surfaces may define a vertex. The surface may be triangular, quadrilateral, etc.
Referring to fig. 5B, fig. 5B is an alternative three-dimensional model schematic diagram provided in the embodiment of the present application, which shows a partial three-dimensional model of a complete three-dimensional model. It can be seen that the partial three-dimensional model is composed of three sub-surfaces A1, A2 and A3, where A1 and A2 determine the intersecting edge O1O4, A1 and A3 determine the intersecting edge O2O4, and A2 and A3 determine the intersecting edge O3O4; A1, A2 and A3 together determine the vertex O4.
In one embodiment, for a plurality of vertices included in the three-dimensional model, energy data corresponding to each vertex is determined, and the sum of the energy data of each vertex is used as the first loss term. For example, the first loss term E1 can be expressed as the following equation (1):
$$E_1 = \sum_{i \in S} D_i(l_i) \quad (1)$$
where $i$ denotes a vertex in the three-dimensional model, $S$ denotes the set of vertices of the three-dimensional model, $l_i$ denotes the initial label corresponding to vertex $i$, and $D_i$ denotes the energy data corresponding to vertex $i$.
In one embodiment, the energy data corresponding to a vertex is associated with the initial label corresponding to the vertex and the attribute information of the plurality of first sub-surfaces corresponding to the vertex. The first sub-surfaces corresponding to the vertex are sub-surfaces including the vertex. For example, referring to fig. 5B, for the vertex O4, the corresponding sub-surfaces include a1, a2 and A3, and thus, the energy data corresponding to the vertex O4 is related to the attribute information of a1, the attribute information of a2 and the attribute information of A3.
In one embodiment, the attribute information may include at least one of: height information, plane information, vertical information, and area information. The height information is used for representing the relative height attribute of the sub-surface in the three-dimensional model, the plane information is used for representing the flatness degree of the surface formed by the sub-surface and the adjacent sub-surface, the vertical information is used for representing the difference degree between the sub-surface and the preset vertical direction, and the area information is used for representing the size of the sub-surface.
For a first sub-surface corresponding to vertex $i$, the energy data of that first sub-surface can be expressed as the following formula (2):
$$D_i(l_i) = A_i \times (1 - B(l_i)) \quad (2)$$
where $D_i(l_i)$ denotes the energy data of a first sub-surface that includes vertex $i$, $A_i$ denotes the area information of the first sub-surface, and $B(l_i)$ is related to at least one of the height information, plane information and vertical information of the first sub-surface. For different $l_i$, $B(l_i)$ may take different forms.
For example, taking the case where the initial labels of the vertices include a "ground" label and a "vegetation" label: when the initial label of vertex $i$ is "ground", $B(l_i)$ can be expressed as (vertical information × plane information × (1 − height information)); when the initial label of vertex $i$ is "vegetation", $B(l_i)$ can be expressed as (vertical information × (1 − plane information)). Accordingly, the energy data corresponding to the first sub-surface can be expressed as the following formula (3):
$$D_i(l_i) = \begin{cases} A_i \times \left(1 - a_h \, a_p \, (1 - a_e)\right), & l_i = \text{ground} \\ A_i \times \left(1 - a_h \, (1 - a_p)\right), & l_i = \text{vegetation} \end{cases} \quad (3)$$
where $a_p$ denotes the plane information of the first sub-surface, $a_h$ denotes the vertical information of the first sub-surface, and $a_e$ denotes the height information of the first sub-surface.
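Formula (3), as reconstructed above from the surrounding definitions, translates directly into code; a sketch follows, with "ground" and "vegetation" as the two labels used in the example.
```python
def unary_term(area: float, a_h: float, a_p: float, a_e: float, label: str) -> float:
    """Energy data D_i(l_i) of one first sub-surface per formula (3):
    D = A x (1 - B(l)), with B depending on the label."""
    if label == "ground":
        b = a_h * a_p * (1.0 - a_e)
    elif label == "vegetation":
        b = a_h * (1.0 - a_p)
    else:
        raise ValueError(f"no B(l) form given above for label {label!r}")
    return area * (1.0 - b)
```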
in step 502, establishing a second loss term according to two adjacent vertexes and two adjacent second sub-surfaces corresponding to each intersecting edge in the plurality of intersecting edges; the second loss term is associated with a distance of a length of the intersecting edge, an included angle between the two second sub-surfaces, and an initial label of each of the two adjacent vertices.
In some embodiments, for a plurality of intersecting edges included in the three-dimensional model, loss data corresponding to each intersecting edge is determined, and the sum of the loss data of each intersecting edge is used as the second loss term. For example, the second loss term E2 can be expressed as the following equation (4):
E2 = Σ_{{i,j}∈E} V_ij(l_i, l_j)    (4);
where E represents the set of intersecting edges in the three-dimensional model, i and j represent the two vertices of each intersecting edge, l_i denotes the initial label corresponding to vertex i, l_j denotes the initial label corresponding to vertex j, and V_ij represents the loss data corresponding to the intersecting edge determined by vertex i and vertex j.
In some embodiments, each intersecting edge may determine a set of adjacent vertices, which includes two adjacent vertices, and a set of adjacent sub-surfaces, which includes two second sub-surfaces. V_ij is associated with the length of the intersecting edge, the included angle between the two second sub-surfaces, and the initial label of each of the two adjacent vertices. V_ij can be expressed as the following formula (5):
V_ij(l_i, l_j) = C_ij · w_ij · 1(l_i ≠ l_j)    (5);
where C_ij represents the length of the intersecting edge determined by vertex i and vertex j; w_ij may represent the included angle between the two second sub-surfaces, the included angle between the normal vectors corresponding to the two second sub-surfaces, the cosine of the included angle between the two second sub-surfaces, or the cosine of the included angle between the normal vectors corresponding to the two second sub-surfaces; and 1(·) represents an indicative function, which is 1 when the initial labels corresponding to vertex i and vertex j are different, and 0 when the initial labels corresponding to vertex i and vertex j are the same.
For example, referring to fig. 5B, for the intersecting edge O1O4, the corresponding loss data V_{O1,O4} can be expressed as formula (6):
V_{O1,O4}(l_{O1}, l_{O4}) = C_{O1O4} · w_{O1O4} · 1(l_{O1} ≠ l_{O4})    (6);
where C_{O1O4} is the length of the intersecting edge O1O4 and w_{O1O4} is determined by the included angle between the two adjacent second sub-surfaces A1 and A2.
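Similarly, a small sketch of the pairwise term is shown below; taking w_ij as the cosine of the normal-vector angle is one of the options named above, and the edge data layout is an assumption made for the example.

```python
import numpy as np

def pairwise_term(label_i, label_j, edge_len, normal_a, normal_b):
    """Loss data V_ij for one intersecting edge, per formula (5)."""
    if label_i == label_j:
        return 0.0  # indicative function 1(l_i != l_j) is 0
    w = abs(float(np.dot(normal_a, normal_b)))  # cosine of the normal angle
    return edge_len * w                         # C_ij * w_ij

def second_loss_term(edges, labels):
    """Second loss term E2: sum of V_ij over all intersecting edges, formula (4).

    edges: iterable of (i, j, edge_len, normal_a, normal_b) tuples;
    labels: mapping from vertex index to its initial label.
    """
    return sum(pairwise_term(labels[i], labels[j], c, na, nb)
               for i, j, c, na, nb in edges)
```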
in step 503, an energy function is established based on the first loss term, the second loss term, and the weight.
In some embodiments, the weight is used to balance the first loss term and the second loss term, and may be preset according to a specific segmentation scenario. The energy function can be expressed as formula (7):
U(l) = E1 + γ·E2 = Σ_{i∈S} D_i(l_i) + γ·Σ_{{i,j}∈E} V_ij(l_i, l_j)    (7);
where γ represents the weight.
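Combining the two sketches above, the total energy of formula (7) reduces to a few lines (first_loss_term and second_loss_term are the helper functions sketched earlier; the attribute argument layout is an assumption):

```python
def energy(labels, areas, a_p, a_h, a_e, edges, gamma):
    """Total energy U(l) = E1 + gamma * E2, per formula (7)."""
    e1 = first_loss_term(labels, areas, a_p, a_h, a_e)
    e2 = second_loss_term(edges, labels)
    return e1 + gamma * e2
```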
As can be seen from the foregoing exemplary implementation of fig. 5A, by determining the height information, plane information, vertical information and area information of each sub-surface in the three-dimensional model, the embodiment of the present application constrains the segmentation result of each vertex along each geometric dimension when defining the first loss term; when defining the second loss term, the length of the intersecting edge, the intersecting angle and the labels of the adjacent vertices are taken into account. These constraints improve the authenticity of the segmentation result and thus the accuracy of the semantic segmentation.
In some embodiments, referring to fig. 6, fig. 6 is an optional flowchart of the semantic segmentation method provided in the embodiments of the present application, and based on fig. 5A, the method may further include step 601.
In step 601, attribute information for each subsurface is determined.
In some embodiments, to make the creation of the energy function faster and more efficient, the attribute information of each sub-surface in the three-dimensional model may be determined before step 501. Before determining the attribute information of each sub-surface, a user's selection instruction for a plurality of sub-surfaces in the three-dimensional model may also be received through the terminal, so as to select some of the sub-surfaces as target sub-surfaces; in step 601, only the target sub-surfaces are then processed to determine the attribute information of each target sub-surface.
In some embodiments, where the attribute information includes height information, determining the attribute information for each subsurface includes:
determining a centroid location for each subsurface; determining a neighborhood centroid set corresponding to each sub-surface according to the centroid position of each sub-surface and a preset local neighborhood range; the neighborhood centroid set comprises a plurality of neighborhood centroids; and determining the height information corresponding to each sub-surface according to the centroid positions of the plurality of neighborhood centroids corresponding to each sub-surface.
In this embodiment, if the three-dimensional model includes N sub-surfaces, the N centroids and the centroid position corresponding to each centroid can be obtained by calculating the centroid position of each sub-surface. For any sub-surface, the centroid position of the sub-surface can be used as the center, the neighborhood centroid in the preset local neighborhood range is found, each neighborhood centroid can correspond to one neighborhood sub-surface, the neighborhood centroid set corresponding to each sub-surface is established, and the height information of each sub-surface can be determined according to the centroid positions of a plurality of neighborhood centroids in the neighborhood centroid set corresponding to each sub-surface.
The neighborhood centroid set corresponding to each sub-surface can be determined from the centroid position of each sub-surface and a preset local neighborhood range in the following ways: (1) the local neighborhood range can be a distance threshold, and for the centroid position of a sub-surface, the other centroids whose distance to that centroid position is smaller than the distance threshold are taken as neighborhood centroids, yielding the neighborhood centroid set corresponding to the sub-surface; (2) the local neighborhood range can be a quantity threshold, and for the centroid position of a sub-surface, the other centroids closest to that centroid position are acquired in turn as neighborhood centroids until the threshold number of neighborhood centroids is reached, yielding the neighborhood centroid set corresponding to the sub-surface. The local neighborhood range can be set according to the size of the three-dimensional model: the larger the three-dimensional model, the larger the local neighborhood range.
After obtaining the neighborhood centroid set corresponding to each sub-surface, the height information corresponding to each sub-surface can be determined by the following formula (8):
a_e(f_i) = √((z_i − z_min) / (z_max − z_min))    (8);
where f_i is a sub-surface, a_e is the height information, z_i is the centroid height of the sub-surface f_i, z_min is the minimum centroid height in the neighborhood centroid set corresponding to f_i, and z_max is the maximum centroid height in that set. The purpose of using the square root is to ensure that even small relative height values yield appreciably large height information.
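The sketch below illustrates this computation with a k-nearest-neighbour local neighborhood (the quantity-threshold variant described above); the use of SciPy's KD-tree and the default k are assumptions made for the example.

```python
import numpy as np
from scipy.spatial import cKDTree

def height_information(centroids, k=32):
    """Height information a_e per sub-surface, per formula (8).

    centroids: (N, 3) array of sub-surface centroid positions;
    k: local neighborhood size (the quantity threshold).
    """
    k = min(k, len(centroids))
    tree = cKDTree(centroids)
    _, idx = tree.query(centroids, k=k)   # k nearest centroids per sub-surface
    z = centroids[:, 2]
    z_nbr = z[idx]                        # (N, k) neighborhood centroid heights
    z_min, z_max = z_nbr.min(axis=1), z_nbr.max(axis=1)
    rel = (z - z_min) / np.maximum(z_max - z_min, 1e-9)
    return np.sqrt(rel)                   # square root amplifies small values
```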
In some embodiments, where the attribute information includes planar information, determining the attribute information for each subsurface includes:
determining a set of adjacent sub-surfaces corresponding to each sub-surface; the set of adjacent sub-surfaces comprises a plurality of adjacent sub-surfaces adjacent to the sub-surface; acquiring an adjacent-surface vertex set corresponding to each sub-surface according to the adjacent sub-surface set corresponding to each sub-surface; establishing a covariance matrix according to the adjacent-surface vertex set corresponding to each sub-surface, the covariance matrix corresponding to a plurality of eigenvalues; and determining the plane information corresponding to each sub-surface according to the plurality of eigenvalues corresponding to each sub-surface.
In this embodiment, the plurality of adjacent sub-surfaces included in the adjacent sub-surface set corresponding to a sub-surface may be sub-surfaces directly adjacent to that sub-surface, that is, the adjacent sub-surfaces may intersect with the sub-surface. In the adjacent sub-surface set, each adjacent sub-surface includes at least three vertices; the vertices of all adjacent sub-surfaces corresponding to the sub-surface are collected and repeated vertices are deleted, yielding the adjacent-surface vertex set. Taking triangular sub-surfaces as an example, if a sub-surface A5 corresponds to an adjacent sub-surface set including adjacent sub-surfaces A51, A52 and A53, where A5 includes vertices O51, O52 and O53, A51 includes vertices O51, O52 and O54, A52 includes vertices O52, O53 and O55, and A53 includes vertices O53, O51 and O56, then collecting the vertices of all adjacent sub-surfaces of A5 and deleting repeated vertices yields the adjacent-surface vertex set (O51, O52, O53, O54, O55, O56).
In this embodiment, a covariance matrix may be established according to the vertices in the adjacent-surface vertex set, and the covariance matrix may be solved to obtain its three eigenvalues λ0, λ1 and λ2, where λ0 ≤ λ1 ≤ λ2. The plane information corresponding to the sub-surface is determined by formula (9):
a_p(f_i) = 1 − 3λ0 / (λ0 + λ1 + λ2)    (9);
where f_i is a sub-surface and a_p is the plane information. The plane information characterizes how planar the neighborhood of the sub-surface f_i is: it is 1 if the neighborhood is a perfect plane, and 0 in the fully isotropic case, i.e. when the three eigenvalues are all equal.
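A sketch of the eigenvalue computation follows; note that the closed form used for formula (9) is reconstructed from the stated boundary conditions (1 for a perfect plane, 0 when the three eigenvalues are equal) and is therefore an assumption.

```python
import numpy as np

def plane_information(adjacent_vertices):
    """Plane information a_p for one sub-surface, per formula (9).

    adjacent_vertices: (M, 3) array holding the adjacent-surface vertex set
    (vertices of all adjacent sub-surfaces, duplicates removed).
    """
    centered = adjacent_vertices - adjacent_vertices.mean(axis=0)
    cov = centered.T @ centered / len(adjacent_vertices)  # 3x3 covariance
    l0, l1, l2 = np.linalg.eigvalsh(cov)                  # ascending eigenvalues
    s = l0 + l1 + l2
    if s <= 0.0:
        return 1.0  # degenerate neighborhood; treat as planar
    return 1.0 - 3.0 * l0 / s  # 1 for a perfect plane, 0 when l0 == l1 == l2
```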
In some embodiments, where the attribute information includes vertical information, determining the attribute information for each subsurface includes:
determining a unit normal vector of each sub-surface; and determining the vertical information corresponding to each sub-surface according to the unit normal vector of each sub-surface and a preset standard unit normal vector.
In the present embodiment, the vertical information corresponding to each sub-surface is determined by formula (10):
a_h(f_i) = |n_i · n_z|    (10);
where f_i is a sub-surface, a_h is the vertical information, n_i is the unit normal vector of the sub-surface f_i, and n_z is a preset standard unit normal vector. The standard unit normal vector may point in the same direction as the z-axis of the coordinate system in which the three-dimensional model is located.
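Formula (10) is direct to compute; a one-line sketch, with the z-axis standard normal chosen as described above:

```python
import numpy as np

def vertical_information(normal, n_z=np.array([0.0, 0.0, 1.0])):
    """Vertical information a_h = |n_i . n_z|, per formula (10)."""
    return abs(float(np.dot(normal, n_z)))
```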
As can be seen from the above exemplary implementation of fig. 6, in the process of establishing the energy function from each sub-surface, determining the height information, plane information, vertical information and area information of each sub-surface in the three-dimensional model allows the segmentation result of each vertex to be constrained along each geometric dimension, which improves the authenticity of the segmentation result and the accuracy of the semantic segmentation.
In some embodiments, based on fig. 4, acquiring the two-dimensional segmentation result of the two-dimensional image in step 401 may include: and segmenting the two-dimensional image by adopting a semantic segmentation model to obtain a segmentation result of the two-dimensional image. The embodiment of the application also provides an acquisition method of the semantic segmentation model, and the acquisition method of the semantic segmentation model provided by the embodiment of the application can be used in various fields such as agriculture, industry, medical health and the like. For convenience of understanding, the following description will be made of an acquisition process of the semantic segmentation model by taking a road scene as an example.
In some embodiments, an aerial photography sample data set may be obtained; the aerial photography sample data set comprises a plurality of aerial photography sample pictures carrying segmentation labels; and training a preset initial segmentation model according to the aerial photography sample data set to obtain the semantic segmentation model.
Each aerial sample picture carries a segmentation label. Through the segmentation labels, the aerial sample picture can be divided into different regions, each region covering target objects of the same type. For example, the aerial image can be divided into region 1, region 2 and other regions by the segmentation labels: road labels are added to the pixels in region 1, vegetation labels are added to the pixels in region 2, and no label (or a null label) is added to the other regions.
In the process of training a preset initial segmentation model according to an aerial photography sample data set to obtain the semantic segmentation model, the aerial photography sample data set can be firstly divided into a training set and a testing set, the training set is used for training the preset initial segmentation model, then the testing set is used for verifying the trained initial segmentation model, and when the segmentation accuracy reaches a preset standard, the trained initial segmentation model is output as the semantic segmentation model.
In the process of obtaining the three-dimensional model, modeling data of vegetation, roads and buildings in a current road scene are often obtained by methods such as unmanned aerial vehicle aerial photography, and the modeling data may include aerial images, point cloud data and the like. In the segmentation process of the three-dimensional model, in order to achieve a better segmentation effect, the two-dimensional image is often acquired at the same angle, that is, in order to achieve the better segmentation effect, the two-dimensional image in the embodiment of the application is an aerial image acquired by methods such as aerial photography by an unmanned aerial vehicle.
For the problem of image segmentation under a road scene, a traditional semantic segmentation model usually obtains a corresponding street view segmentation model through a large number of street view images, and when an aerial image is processed through the street view segmentation model, the problem of segmenting a building into road categories often occurs due to different viewing angles. Therefore, please refer to fig. 7, fig. 7 is a flowchart illustrating an alternative semantic segmentation model obtaining process according to an embodiment of the present application, which will be described with reference to the steps shown in fig. 7.
In step 701, a street view sample data set and an aerial photography sample data set are obtained; the street view sample data set comprises a plurality of street view sample pictures carrying segmentation labels; the aerial photography sample data set comprises a plurality of aerial photography sample pictures carrying segmentation labels.
In step 702, a preset initial segmentation model is trained according to the street view sample data set to obtain a trained street view segmentation model.
In some embodiments, in the process of training a preset initial segmentation model according to a street view sample data set to obtain the street view segmentation model, the street view sample data set may be first divided into a training set and a test set, the training set is used to train the preset initial segmentation model, the test set is used to verify the trained initial segmentation model, and when the segmentation accuracy reaches a preset standard, the trained initial segmentation model is output as the street view segmentation model.
In step 703, the street view segmentation model is trained according to the aerial photography sample data set to obtain a semantic segmentation model.
In some embodiments, after obtaining the street view segmentation model, the street view segmentation model may continue to be trained with the aerial photography sample data set. The aerial photography sample data set may be divided into a training set and a test set, the training set is used to train the street view segmentation model, the test set is used to verify the trained street view segmentation model, and when the segmentation accuracy reaches a preset standard, the trained street view segmentation model is output as the semantic segmentation model. The number of aerial sample pictures in the aerial photography sample data set can be far smaller than the number of street view sample pictures in the street view sample data set.
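A minimal PyTorch-style sketch of this two-stage schedule is given below; the model, data sets, batch size, loss and learning rates are illustrative assumptions, not the training setup of the embodiment.

```python
import torch
from torch.utils.data import DataLoader

def train_stage(model, dataset, epochs, lr, device="cuda"):
    """One training stage: pixel-wise cross-entropy on a labeled data set."""
    model.to(device).train()
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    ce = torch.nn.CrossEntropyLoss(ignore_index=255)  # 255 = unlabeled (assumed)
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = ce(model(images.to(device)), labels.to(device))
            loss.backward()
            opt.step()
    return model

# Stage 1: pre-train on the large street view sample data set.
# model = train_stage(model, street_view_dataset, epochs=50, lr=1e-4)
# Stage 2: fine-tune on the much smaller aerial sample data set,
#          typically with a lower learning rate.
# model = train_stage(model, aerial_dataset, epochs=20, lr=1e-5)
```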
As can be seen from the above exemplary implementation of fig. 7, in the embodiment of the present application, the pre-training is performed on the preset initial segmentation model through the street view sample data set to obtain a street view segmentation model with a certain learning capability, and then the pre-trained street view segmentation model is learned again through a small number of aerial photo sample pictures, so that the obtained semantic segmentation model can perform accurate segmentation processing on street view pictures and can also perform accurate segmentation processing on aerial photo pictures. Because a large amount of existing street view sample data sets are adopted to pre-train the initial segmentation model, only a small batch of aerial photo sample pictures are needed in the process of obtaining the final semantic segmentation model, and the labor cost for marking the aerial photo sample pictures is reduced; in addition, the model acquisition efficiency can be improved through the training sequence of the streetscape class pictures and the aerial photo class pictures in the road scene.
In some embodiments, referring to fig. 8, fig. 8 is an optional flowchart of the semantic segmentation method provided in the embodiments of the present application, and based on fig. 4, step 402 may be updated to step 801, and step 403 may be updated to step 802.
In step 801, determining a conversion relation between a three-dimensional coordinate system corresponding to the three-dimensional model and a two-dimensional coordinate system corresponding to the two-dimensional image according to the shooting parameters of the two-dimensional image; and mapping the plurality of pixel points and the plurality of vertexes according to the two-dimensional coordinates of each pixel point in the plurality of pixel points, the three-dimensional coordinates of each vertex in the plurality of vertexes and the conversion relation.
In some embodiments, the shooting parameters may be the shooting parameters of the camera that captured the two-dimensional image. The shooting parameters include camera intrinsic parameters and camera extrinsic parameters: the camera intrinsic parameters are parameters related to the camera's own characteristics, such as the focal length and pixel size of the camera; the camera extrinsic parameters are parameters of the camera in the world coordinate system, such as the position and rotation direction of the camera.
The conversion relationship between the three-dimensional coordinate system corresponding to the three-dimensional model and the two-dimensional coordinate system corresponding to the two-dimensional image can be obtained by the following method: acquiring a first conversion relation between a three-dimensional coordinate system corresponding to the three-dimensional model and a world coordinate system; acquiring a second conversion relation between a two-dimensional coordinate system corresponding to the two-dimensional image and a world coordinate system through the shooting parameters; and determining the conversion relation between the three-dimensional coordinate system corresponding to the three-dimensional model and the two-dimensional coordinate system corresponding to the two-dimensional image according to the first conversion relation and the second conversion relation.
In some embodiments, for each vertex in the three-dimensional model, a corresponding pixel point of each vertex in the two-dimensional image may be obtained through the transformation relationship. Wherein, one vertex in the three-dimensional model only corresponds to one pixel point in the two-dimensional image; multiple vertices in the three-dimensional model may simultaneously correspond to a pixel point in the two-dimensional image.
In step 802, an initial label for each vertex is determined based on the label corresponding to the pixel point to which each vertex is mapped.
In some embodiments, after determining the pixel points mapped by each vertex, the initial label of each vertex may be determined according to the label corresponding to the pixel point.
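A compact sketch combining steps 801 and 802 is given below; the pinhole intrinsic matrix K and world-to-camera extrinsics (R, t) stand in for the conversion relation, and the rounding-based pixel lookup is an assumption made for the example.

```python
import numpy as np

def map_vertices_to_labels(vertices, K, R, t, label_image):
    """Project 3D vertices into the 2D image and read off initial labels.

    vertices: (N, 3) vertex coordinates in the model/world frame;
    K: (3, 3) camera intrinsic matrix; R: (3, 3), t: (3,) extrinsics;
    label_image: (H, W) per-pixel labels from the 2D segmentation result.
    """
    cam = vertices @ R.T + t                  # world -> camera coordinates
    uvw = cam @ K.T                           # camera -> homogeneous pixels
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    h, w = label_image.shape
    labels = np.full(len(vertices), -1)       # -1: vertex not visible here
    ok = (uvw[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    labels[ok] = label_image[v[ok], u[ok]]
    return labels                             # initial label per vertex
```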
As can be seen from the above exemplary implementation of fig. 8, in the embodiment of the present application, by establishing a transformation relationship between a two-dimensional image and a three-dimensional model and according to a segmentation result in the two-dimensional image, a segmentation result of each vertex in the three-dimensional model can be quickly determined.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
The embodiment of the present application can solve the semantic segmentation problem for various types of three-dimensional models. For ease of understanding, the semantic segmentation process of an urban aerial photography reconstruction model is taken as an example below, giving a semantic segmentation method for urban aerial photography reconstruction models. This method is used to extract engine-usable resources from the reconstructed 3D model: for the reconstructed 3D model, semantic segmentation is required in order to extract models useful for post-processing, so that the trees, buildings, roads and the like of the 3D model can be distinguished.
3D semantic segmentation: the 3D semantic segmentation technique is to divide the point cloud into semantically meaningful parts and then semantically label each part as one of the predefined classes.
Markov Random Field (MRF): a markov random field is a probability distribution model that can be represented by an undirected graph. Each node in the graph represents one variable or a group of variables, and the edges between the nodes represent the dependency between two variables.
In order to solve the semantic segmentation problem of the three-dimensional model, the following solutions exist in the related art:
(1) 3D point cloud semantic segmentation networks.
The 3D point cloud semantic segmentation networks may include: A. The input point cloud is segmented with a PointNet++ network; PointNet++ introduces a hierarchical neural network for extracting neighborhood features and provides an adaptive feature learning layer for learning features at different neighborhood sampling scales. B. In the PointNet++ method, due to the limitations of GPU hardware, a large scene point cloud can only be processed in blocks; this splits the relevant connections between scene point clouds and is relatively cumbersome. RandLA-Net therefore proposes a semantic segmentation method for large-scene 3D point clouds, whose specific strategy is: the input point cloud is first down-sampled by random point sampling, and a feature fusion module is then adopted to increase the receptive field of each 3D point to compensate for the loss of key features caused by random sampling.
(2) Image semantic segmentation networks.
The image semantic segmentation task initially relied on image-block classification, i.e. each pixel is classified independently using the image block around it. Image-block classification was used mainly because classification networks typically end in fully connected layers and require fixed-size inputs. The fully convolutional network (FCN) enables a convolutional neural network to perform dense pixel prediction without fully connected layers and popularized fully convolutional architectures; with this method, segmentation maps can be generated for images of arbitrary size, and it is much faster than image-block classification. Later, almost all advanced methods in the semantic segmentation field adopted the fully convolutional model. Because the FCN does not fully consider the relations between pixels when performing image semantic segmentation, it lacks spatial consistency, is insensitive to image details, and its segmentation results are not fine enough. DeepLab adds a fully connected conditional random field (CRF) at the end of the FCN to refine the boundaries of the coarse segmentation map and uses atrous (dilated) convolution to enlarge the receptive field of the feature map. DeepLab v2, DeepLab v3, DeepLab v3+ and the like appeared in succession thereafter.
The inventor finds that the image semantic segmentation task is more mature and more practical compared with a 3D semantic segmentation technology. Therefore, the inventor considers that in engineering application, the effect of firstly obtaining a segmentation result by adopting a semantic segmentation network on a 2D image and then mapping the segmentation result on a 3D model is much better than that of direct 3D point cloud semantic segmentation.
Accordingly, there are problems in the related art as follows:
(1) 3D point cloud segmentation network
The biggest disadvantage of 3D point cloud segmentation networks is that their training data are ideal CAD models or high-precision laser-scanned models, while the precision of a scene model produced by the reconstruction front end is limited and differs greatly from the training data, so models trained with networks such as PointNet++ or RandLA-Net test poorly. In addition, because labeling 3D point cloud semantic segmentation data is difficult and costly, and different types of reconstructed buildings must be labeled anew, training a point cloud network with manually labeled 3D semantic segmentation data is also expensive and impractical.
(2) Image segmentation network
The biggest problem of 3D segmentation based on a 2D image segmentation network is data: most mainstream 2D semantic segmentation training data are street view data or natural pictures, and segmentation data for aerial images are lacking. A street view segmentation data set is therefore used as a substitute when training the segmentation network; however, because aerial pictures differ from street view pictures, many misjudgments may occur in the 2D semantic segmentation result. For example, because aerial data are viewed from above, parts of buildings in aerial images are wrongly classified as roads by a model pre-trained on street view images.
The urban aerial photography reconstruction model semantic segmentation method can solve the problem of misjudgment of a 2D semantic segmentation network, and particularly aims at the situation that buildings in aerial photography data are prone to being mistakenly divided into roads.
The semantic segmentation method for the urban aerial photography reconstruction model comprises the following steps: input the aerial image into a segmentation network to obtain a segmentation result, map the image segmentation result onto the 3D model, and further refine the segmentation result using geometric constraints on the 3D model. After the 2D image segmentation result is obtained, it is mapped onto the 3D model, and the segmentation result is further refined using the geometric constraints of a Markov Random Field (MRF) energy function, which improves the segmentation accuracy.
Referring to fig. 9A, fig. 9A is an optional flowchart of a semantic segmentation method for a city aerial photography reconstruction model according to an embodiment of the present application.
In step 901, a segmentation network is trained. The segmentation network is trained with a street view semantic segmentation data set, the network trained in the previous step is then fine-tuned with a small amount of labeled aerial photography data, and after training is finished the aerial images obtained by the system's unmanned aerial vehicle are fed into the segmentation network to obtain the image segmentation results. Large aerial photography data sets are currently lacking, so the network is first pre-trained with a street view data set to give it a certain learning capability, and the pre-trained network is then trained again on a small self-labeled aerial photography data set, which achieves the best effect.
In step 902, the segmentation results are mapped to 3D. Using the camera intrinsic and extrinsic parameters, the 2D segmentation result of each aerial image is made to correspond to its coordinates in the original 3D model. As for the coloring strategy, when the colors of several image pixel points correspond to one vertex, the final vertex color value is obtained by voting rather than direct weighted averaging, which further improves the accuracy of the mapped 3D model semantic segmentation result. The camera intrinsic and extrinsic parameters include: camera intrinsic parameters (parameters related to the camera's own characteristics, such as the focal length and pixel size of the camera) and camera extrinsic parameters (parameters in the world coordinate system, such as the position and rotation direction of the camera).
The process of mapping the segmentation result to the 3D model includes: a. acquiring the two-dimensional segmentation result of the aerial image, which includes the segmentation label corresponding to each pixel point; b. mapping the vertices of the three-dimensional model (composed of a plurality of vertices and a plurality of triangular faces) onto the two-dimensional segmentation image according to the camera intrinsic and extrinsic parameters, to obtain the segmentation label corresponding to each vertex in the three-dimensional model.
Referring to fig. 9B, fig. 9B is an alternative flowchart of the mapping process according to the embodiment of the present application. In fig. 9B, 921 is the plane where the aerial image is located, which corresponds to a two-dimensional coordinate system X'-Y', and 922 is the three-dimensional coordinate system X-Y-Z of the camera in three-dimensional space. Each space point P (vertex) in the three-dimensional coordinate system has a corresponding pixel position P' on the aerial image, so each vertex of the 3D model can find its corresponding position in the aerial image, and the segmentation result of the vertex can then be obtained from the segmentation result of the image. For example, suppose the division label of vertex P9 is 3 (1 represents a road, 2 represents a tree, and 3 represents a building), and the system presets the color corresponding to each division label: red represents 1, blue represents 2, and yellow represents 3. Since the label of vertex P9 is 3, vertex P9 is set to yellow, completing the coloring process.
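When pixels from several images map to one vertex, the voting strategy of step 902 can be sketched in a few lines (the list-of-labels input is an assumed layout):

```python
from collections import Counter

def vote_vertex_label(pixel_labels):
    """Final vertex label by majority vote over all pixels that map to the
    vertex, instead of direct weighted averaging of their colors."""
    return Counter(pixel_labels).most_common(1)[0][0]

# e.g. a vertex seen in three images with labels [3, 3, 1] -> 3 (building)
```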
In step 903, geometric attributes (attribute information) are calculated. For the 3D model obtained in step 902, the geometric attributes of each triangular face are computed; these may include a height attribute (height information) a_e, a plane attribute (plane information) a_p, and a vertical attribute (vertical information) a_h. Each geometric attribute can be defined by the following procedure.
In some embodiments, the height attribute a_e is defined as a relative height function of the centroid of the triangular face f_i and can be expressed as formula (8), where (z_min, z_max) is the height range of all triangular faces in a local spatial neighborhood. The neighborhood refers to the K points spatially closest to a given point; K can be user-defined, and the larger K is, the larger the neighborhood. The purpose of adopting the square root is to ensure that even small relative height values yield appreciably large height information. The size of the neighborhood can be set according to the coordinate values of the actual model.
In some embodiments, the plane attribute a_p is defined as the planarity of the neighborhood of the triangular face f_i and can be expressed as formula (9). The covariance matrix of the vertices of all triangular faces adjacent to f_i is computed, yielding the corresponding three eigenvalues (λ0, λ1 and λ2), which reflect the planar properties near the triangular face f_i; λ0 represents the minimum eigenvalue of the plane covariance. For a completely planar super-facet, the flatness is 1; for a hyperplane with three identical isotropic eigenvalues, the flatness is 0.
In some embodiments, the vertical attribute a_h measures the deviation of the unit normal vector of the triangular face f_i from the vertical axis and can be expressed as formula (10), where n_z represents the unit normal vector along the z-axis and n_i represents the unit normal vector of the face f_i.
In step 904, an energy function is calculated: on the basis of a 3D model obtained by semantic segmentation mapping of a 2D image, in combination with the above geometric attributes, a Markov Random Field (MRF) model can be used to further constrain and refine the 3D semantic segmentation result, and an energy function U is shown as formula (11):
U(l) = Σ_{i∈S} D_i(l_i) + γ·Σ_{{i,j}∈E} V_ij(l_i, l_j)    (11);
where S represents the set of all triangular faces, l_i represents the preliminary 3D semantic segmentation label obtained in step 902 (i.e. vertex i belongs to class l_i), and E represents all pairs of intersecting triangular faces in the 3D model; D_i is expressed as formula (3). V_ij aims to make the segmentation result of connected regions smoother and to avoid abrupt changes, as shown in formula (5), where C_ij represents the length of the intersecting edge between the two faces, w_ij represents the cosine of the included angle of the two normal vectors, and 1(·) represents the indicative (characteristic) function.
In step 905, the swap algorithm performs optimization iterations. The energy function is minimized iteratively using the α-β swap algorithm, after which the optimized segmentation label corresponding to each vertex is obtained.
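As a rough illustration of the minimization loop (not the graph-cut-based α-β swap itself, which relies on min-cut solvers), a simple iterated-conditional-modes-style sweep over vertex labels might look like this:

```python
def icm_minimize(labels, label_set, energy_fn, max_iters=10):
    """Greedy label sweep: move each vertex to the label that lowers U(l).

    A simplified stand-in for alpha-beta swap that reaches only a local
    minimum; energy_fn(labels) evaluates U(l), e.g. the energy() sketch above.
    """
    labels = list(labels)
    for _ in range(max_iters):
        changed = False
        for i in range(len(labels)):
            current = labels[i]
            best_label, best_energy = current, None
            for cand in label_set:
                labels[i] = cand
                e = energy_fn(labels)
                if best_energy is None or e < best_energy:
                    best_label, best_energy = cand, e
            labels[i] = best_label
            changed = changed or (best_label != current)
        if not changed:
            break
    return labels
```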
Referring to fig. 9C to 9F, fig. 9C is a schematic view of an alternative aerial image according to an embodiment of the present disclosure. After two-dimensional semantic segmentation is performed on the aerial image, a two-dimensional segmentation effect shown in fig. 9D can be obtained. For the sake of identification, the "vegetation" label has been added to each pixel in the area outlined by the solid white line 931 in the figure, and the "road" label has been added to each pixel in the area outlined by the dashed white line 932 in the figure. Fig. 9E is a schematic diagram of an alternative three-dimensional model provided in the embodiments of the present application. After the two-dimensional segmentation result corresponding to fig. 9D is mapped into the three-dimensional model shown in fig. 9E, the three-dimensional segmentation effect shown in fig. 9F can be obtained. For ease of identification, the vertices in the area outlined by solid white lines 941 have been labeled "vegetation", and the vertices in the area outlined by dashed white lines 942 have been labeled "road".
Continuing with the exemplary structure of the semantic segmentation apparatus 555 provided by the embodiments of the present application implemented as software modules, in some embodiments, as shown in fig. 2, the software modules stored in the semantic segmentation apparatus 555 in the memory 550 may include:
a first obtaining module 5551, configured to obtain a presentation operation for the three-dimensional model.
A first display module 5552, configured to display the three-dimensional model on the human-computer interaction interface in response to a display operation; the human-computer interaction interface includes a semantic segmentation option.
A second obtaining module 5553, configured to obtain a selection operation for the semantic segmentation option.
The second display module 5554 is configured to, in response to a selection operation, display a semantic segmentation result of the three-dimensional model on the human-computer interaction interface after obtaining a two-dimensional segmentation result of the two-dimensional image; the semantic segmentation result is determined according to the two-dimensional segmentation result of the two-dimensional image and the model attribute of the three-dimensional model; the two-dimensional image and the three-dimensional model belong to the same scene.
In some embodiments, the semantic segmentation apparatus 555 further includes an obtaining module, a mapping module, a determining module, a building module, and an optimizing module, among others;
the acquisition module is used for acquiring a two-dimensional segmentation result of the two-dimensional image; the two-dimensional segmentation result at least comprises labels corresponding to a plurality of pixel points in the two-dimensional image;
the mapping module is used for mapping a plurality of pixel points of the two-dimensional image with a plurality of vertexes of the three-dimensional model; the three-dimensional model and the two-dimensional image belong to the same scene;
the determining module is used for determining an initial label of each vertex according to the labels corresponding to the multiple pixel points;
the building module is used for building an energy function according to the initial label of each vertex and the attribute information of each sub-surface in the plurality of sub-surfaces of the three-dimensional model;
and the optimization module is used for optimizing the initial segmentation result based on the energy function to obtain a semantic segmentation result.
In some embodiments, the energy function includes a first loss term, a second loss term, and a weight; the three-dimensional model further comprises a plurality of intersecting edges, and the establishing module is further used for determining a first loss term according to each vertex and a plurality of first sub-surfaces corresponding to each vertex; the first loss item is related to the initial label of each vertex and the attribute information of each first sub-surface in the plurality of first sub-surfaces;
establishing a second loss term according to two adjacent vertexes corresponding to each intersecting edge in the plurality of intersecting edges and two adjacent second sub-surfaces; the second loss term is related to the length of the intersecting edge, the included angle between the two second sub-surfaces and the initial label of each of the two adjacent vertexes;
an energy function is established based on the first loss term, the second loss term, and the weight.
In some embodiments, the attribute information includes at least one of: the height information, the plane information, the vertical information, and the area information, the semantic segmentation device 555 further includes an attribute determination module for determining attribute information for each subsurface.
In some embodiments, where the attribute information includes height information, the attribute determination module is further to determine a centroid position for each subsurface; determining a neighborhood centroid set corresponding to each sub-surface according to the centroid position of each sub-surface and a preset local neighborhood range; the neighborhood centroid set comprises a plurality of neighborhood centroids; and determining the height information corresponding to each sub-surface according to the centroid positions of the plurality of neighborhood centroids corresponding to each sub-surface.
In some embodiments, in a case that the attribute information includes plane information, the attribute determination module is further configured to determine a set of adjacent sub-surfaces corresponding to each sub-surface; the adjacent sub-surface comprises a plurality of adjacent sub-surfaces adjacent to the sub-surface; acquiring an adjacent surface vertex set corresponding to each sub-surface according to the adjacent sub-surface set corresponding to each sub-surface; establishing a covariance matrix according to the vertex set of the adjacent surface corresponding to each sub-surface; the covariance matrix corresponds to a plurality of eigenvalues; and determining the plane information corresponding to each sub-surface according to the plurality of characteristic values corresponding to each sub-surface.
In some embodiments, where the attribute information includes vertical information, the attribute determination module is further for determining a unit normal vector for each subsurface; and determining the vertical information corresponding to each sub-surface according to the unit normal vector of each sub-surface and a preset standard unit normal vector.
In some embodiments, the optimization module is further configured to optimize the initial segmentation result by performing minimization iterative processing on the energy function, so as to obtain a semantic segmentation result.
In some embodiments, the obtaining module is further configured to segment the two-dimensional image by using a semantic segmentation model to obtain a segmentation result of the two-dimensional image.
In some embodiments, in the case that the two-dimensional image is an aerial image, the semantic segmentation device 555 further includes a training module, and the training module is configured to obtain a street view sample data set and an aerial image sample data set; the street view sample data set comprises a plurality of street view sample pictures carrying segmentation labels; the aerial photography sample data set comprises a plurality of aerial photography sample pictures carrying segmentation labels; training a preset initial segmentation model according to the street view sample data set to obtain a trained street view segmentation model; and training the street view segmentation model according to the aerial photography sample data set to obtain a semantic segmentation model.
In some embodiments, the mapping module is further configured to determine a conversion relationship between a three-dimensional coordinate system corresponding to the three-dimensional model and a two-dimensional coordinate system corresponding to the two-dimensional image according to the shooting parameters of the two-dimensional image; and mapping the plurality of pixel points and the plurality of vertexes according to the two-dimensional coordinates of each pixel point in the plurality of pixel points, the three-dimensional coordinates of each vertex in the plurality of vertexes and the conversion relation.
In some embodiments, the determining module is further configured to determine an initial label for each vertex from the label corresponding to the pixel point mapped by each vertex.
Embodiments of the present invention provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the semantic segmentation method according to the embodiment of the present invention.
Embodiments of the present application provide a computer-readable storage medium having stored therein executable instructions that, when executed by a processor, cause the processor to perform the semantic segmentation method provided by embodiments of the present application, for example, the method as illustrated in fig. 3, fig. 4, fig. 5A, fig. 6, fig. 7, fig. 8, or fig. 9A.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, the following technical effects can be achieved through the embodiments of the present application:
(1) the method can be used for performing semantic segmentation on the three-dimensional model according to the two-dimensional segmentation result of the two-dimensional image and the model attribute of the three-dimensional model in the process of displaying the three-dimensional model, and compared with a scheme for directly segmenting point cloud data corresponding to the three-dimensional model in the traditional technology, the method reduces the data calculation amount when performing semantic segmentation on the three-dimensional model while ensuring the segmentation accuracy, improves the segmentation efficiency, further can improve the response efficiency of user operation, and improves the user experience.
(2) The initial label of each vertex in the three-dimensional model can be quickly determined by converting the two-dimensional segmentation result of the two-dimensional image into the three-dimensional model of the same scene. Compared with the scheme of directly segmenting the point cloud data corresponding to the three-dimensional model in the traditional technology, this scheme ensures the segmentation accuracy while reducing the data calculation amount of semantic segmentation of the three-dimensional model and improving the segmentation efficiency. In addition, the labels of all vertices of the three-dimensional model are combined with the geometric attributes of all sub-surfaces in the three-dimensional model, the geometric constraints of a Markov Random Field (MRF) are added, and an energy function is established; in the process of optimizing the initial segmentation result through the energy function, the consideration of the geometric attributes of the three-dimensional model makes the final semantic segmentation result more realistic and the segmentation accuracy higher.
(3) By determining the height information, the plane information, the vertical information and the area information of each sub-surface in the three-dimensional model, when a first loss item is defined, the segmentation result of each vertex can be constrained from each geometric dimension, and factors of the length of an intersecting edge, an intersecting angle and a label of an adjacent vertex are considered when a second loss item is defined, so that the authenticity of the segmentation result can be improved through the constraint, and the accuracy of semantic segmentation is improved.
(4) By determining the height information, the plane information, the vertical information and the area information of each sub-surface in the three-dimensional model, the segmentation result of each vertex can be constrained from each geometric dimension in the process of establishing the energy function according to each sub-surface, so that the authenticity of the segmentation result can be improved, and the accuracy of semantic segmentation is improved.
(5) The method comprises the steps of pre-training a preset initial segmentation model through a street view sample data set to obtain a street view segmentation model with certain learning capability, and then learning the pre-trained street view segmentation model again through a small number of aerial photo sample pictures, wherein the obtained semantic segmentation model not only can accurately segment street view pictures, but also can accurately segment aerial photo pictures. Because a large amount of existing street view sample data sets are adopted to pre-train the initial segmentation model, only a small batch of aerial photo sample pictures are needed in the process of obtaining the final semantic segmentation model, and the labor cost for marking the aerial photo sample pictures is reduced; in addition, the model acquisition efficiency can be improved through the training sequence of the streetscape class pictures and the aerial photo class pictures in the road scene.
(6) By establishing a conversion relation of a coordinate system between the two-dimensional image and the three-dimensional model, the segmentation result of each vertex in the three-dimensional model can be quickly determined according to the segmentation result in the two-dimensional image.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (10)

1. A method of semantic segmentation, comprising:
obtaining a display operation aiming at the three-dimensional model;
responding to the display operation, and displaying the three-dimensional model on a human-computer interaction interface; the human-computer interaction interface comprises a semantic segmentation option;
acquiring a selection operation aiming at the semantic segmentation option;
responding to the selection operation, after a two-dimensional segmentation result of the two-dimensional image is obtained, displaying a semantic segmentation result of the three-dimensional model on the human-computer interaction interface; the semantic segmentation result is determined according to the two-dimensional segmentation result of the two-dimensional image and the model attribute of the three-dimensional model; the two-dimensional image and the three-dimensional model belong to the same scene.
2. The method of claim 1, wherein before the human-computer interaction interface presents the semantic segmentation results of the three-dimensional model, the method further comprises: mapping the two-dimensional segmentation result to the three-dimensional model to obtain a semantic segmentation result of the three-dimensional model;
the mapping the two-dimensional segmentation result to the three-dimensional model to obtain a semantic segmentation result of the three-dimensional model includes: acquiring a two-dimensional segmentation result of the two-dimensional image; the two-dimensional segmentation result at least comprises labels corresponding to a plurality of pixel points in the two-dimensional image; mapping a plurality of pixel points of the two-dimensional image with a plurality of vertexes of the three-dimensional model; determining an initial label of each vertex according to labels corresponding to the multiple pixel points; establishing an energy function according to the initial label of each vertex and the attribute information of each sub-surface in a plurality of sub-surfaces of the three-dimensional model; and optimizing the initial segmentation result based on the energy function to obtain a semantic segmentation result.
3. The method of claim 2, wherein the energy function comprises a first loss term, a second loss term, and a weight; the three-dimensional model further comprises a plurality of intersecting edges; said building an energy function from the initial label of each said vertex and the attribute information of each said subsurface of the plurality of sub-surfaces of the three-dimensional model, comprising:
determining the first loss term according to each vertex and a plurality of first sub-surfaces corresponding to each vertex; the first loss term is associated with an initial label for each of the vertices and attribute information for each of the plurality of first sub-surfaces;
establishing the second loss term according to two adjacent vertexes and two adjacent second sub-surfaces corresponding to each intersecting edge in the plurality of intersecting edges; the second loss term is associated with a length of the intersecting edge, an included angle between the two second sub-surfaces, and an initial label of each of the two adjacent vertices;
establishing the energy function according to the first loss term, the second loss term, and the weight.
4. The method of claim 3, wherein the attribute information comprises at least one of: height information, plane information, vertical information, and area information, the method further comprising:
attribute information for each of the sub-surfaces is determined.
5. The method of claim 4, wherein, in the case that the attribute information includes the height information, the determining the attribute information for each of the sub-surfaces comprises:
determining a centroid location for each of said sub-surfaces;
determining a neighborhood centroid set corresponding to each sub-surface according to the centroid position of each sub-surface and a preset local neighborhood range; the neighborhood centroid set comprises a plurality of neighborhood centroids;
and determining the height information corresponding to each sub-surface according to the centroid positions of a plurality of neighborhood centroids corresponding to each sub-surface.
6. The method of claim 4, wherein, in the case that the attribute information includes the plane information, the determining attribute information for each of the sub-surfaces comprises:
determining a set of adjacent sub-surfaces corresponding to each sub-surface; the adjacent sub-surface comprises a plurality of adjacent sub-surfaces adjacent to the sub-surface;
acquiring an adjacent surface vertex set corresponding to each sub-surface according to the adjacent sub-surface set corresponding to each sub-surface;
establishing a covariance matrix according to the vertex set of the adjacent surface corresponding to each sub-surface; the covariance matrix corresponds to a plurality of eigenvalues;
and determining the plane information corresponding to each sub-surface according to a plurality of characteristic values corresponding to each sub-surface.
7. The method of claim 4, wherein, in a case that the attribute information comprises the vertical information, the determining of the attribute information of each of the sub-surfaces comprises:
determining a unit normal vector of each of the sub-surfaces; and
determining the vertical information corresponding to each sub-surface according to the unit normal vector of the sub-surface and a preset standard unit normal vector.
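For illustration only: verticality from the angle between each sub-surface's unit normal and a preset standard unit normal vector, assumed here to be the up direction +z. The exact encoding (cosine versus angle) is an assumption.

```python
import numpy as np

def vertical_info(normals, up=np.array([0.0, 0.0, 1.0])):
    """|cos| of the angle between each face normal and the reference 'up' vector.

    normals: (F, 3) unit normal vector of each sub-surface
    up:      preset standard unit normal vector
    Values near 1 indicate horizontal surfaces (normal parallel to up, e.g.
    ground or roofs); values near 0 indicate vertical surfaces such as facades.
    """
    return np.abs(normals @ up)
```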
8. A semantic segmentation device, comprising:
a first acquisition module, configured to acquire a display operation for a three-dimensional model;
a first display module, configured to display the three-dimensional model on a human-computer interaction interface in response to the display operation, the human-computer interaction interface comprising a semantic segmentation option;
a second acquisition module, configured to acquire a selection operation for the semantic segmentation option; and
a second display module, configured to display, in response to the selection operation and after a two-dimensional segmentation result of a two-dimensional image is obtained, a semantic segmentation result of the three-dimensional model on the human-computer interaction interface, wherein the semantic segmentation result is determined according to the two-dimensional segmentation result of the two-dimensional image and a model attribute of the three-dimensional model, and the two-dimensional image and the three-dimensional model belong to the same scene.
9. A semantic segmentation apparatus, comprising:
a memory, configured to store executable instructions; and
a processor, configured to implement the method of any one of claims 1 to 7 when executing the executable instructions stored in the memory.
10. A computer-readable storage medium having stored thereon executable instructions which, when executed by a processor, implement the method of any one of claims 1 to 7.
CN202010981890.4A 2020-09-17 2020-09-17 Semantic segmentation method, semantic segmentation device, semantic segmentation equipment and computer readable storage medium Active CN112085840B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010981890.4A CN112085840B (en) 2020-09-17 2020-09-17 Semantic segmentation method, semantic segmentation device, semantic segmentation equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010981890.4A CN112085840B (en) 2020-09-17 2020-09-17 Semantic segmentation method, semantic segmentation device, semantic segmentation equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112085840A 2020-12-15
CN112085840B 2024-03-29

Family

ID=73736523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010981890.4A Active CN112085840B (en) 2020-09-17 2020-09-17 Semantic segmentation method, semantic segmentation device, semantic segmentation equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112085840B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130169639A1 (en) * 2012-01-04 2013-07-04 Feng Shi System and method for interactive contouring for 3d medical images
CN110782466A (en) * 2018-07-31 2020-02-11 阿里巴巴集团控股有限公司 Picture segmentation method, device and system
KR20200080970A (en) * 2018-12-27 2020-07-07 포항공과대학교 산학협력단 Semantic segmentation method of 3D reconstructed model using incremental fusion of 2D semantic predictions
CN110245710A (en) * 2019-06-18 2019-09-17 腾讯科技(深圳)有限公司 Training method, the semantic segmentation method and device of semantic segmentation model
CN111190981A (en) * 2019-12-25 2020-05-22 中国科学院上海微系统与信息技术研究所 Method and device for constructing three-dimensional semantic map, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIONG Hanjiang et al., "Semantic segmentation of indoor 3D point cloud models based on 2D-3D semantic transfer", Geomatics and Information Science of Wuhan University, no. 12, pages 2303-2308 *
HUANG Yunqi, "Research and practice of a two-dimensional processing and three-dimensional reconstruction system for medical images", China Master's Theses Full-text Database (Information Science and Technology), no. 01, pages 138-865 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362458A (en) * 2020-12-23 2021-09-07 深圳大学 Three-dimensional model interpretation method for simulating multi-view imaging, terminal and storage medium
CN113362458B (en) * 2020-12-23 2024-04-02 深圳大学 Three-dimensional model interpretation method for simulating multi-view imaging, terminal and storage medium
CN112802083B (en) * 2021-04-15 2021-06-25 成都云天创达科技有限公司 Method for acquiring corresponding two-dimensional image through three-dimensional model mark points
CN112802083A (en) * 2021-04-15 2021-05-14 成都云天创达科技有限公司 Method for acquiring corresponding two-dimensional image through three-dimensional model mark points
CN113239943A (en) * 2021-05-28 2021-08-10 北京航空航天大学 Three-dimensional component extraction and combination method and device based on component semantic graph
CN113450345A (en) * 2021-07-19 2021-09-28 西门子数字医疗科技(上海)有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN113902856B (en) * 2021-11-09 2023-08-25 浙江商汤科技开发有限公司 Semantic annotation method and device, electronic equipment and storage medium
CN113902856A (en) * 2021-11-09 2022-01-07 浙江商汤科技开发有限公司 Semantic annotation method and device, electronic equipment and storage medium
CN114332087A (en) * 2022-03-15 2022-04-12 杭州电子科技大学 Three-dimensional cortical surface segmentation method and system for OCTA image
CN114638974A (en) * 2022-03-29 2022-06-17 中冶赛迪重庆信息技术有限公司 Target object identification method, system, medium and electronic terminal
CN114494668B (en) * 2022-04-13 2022-07-15 腾讯科技(深圳)有限公司 Three-dimensional model expansion method, device, equipment and storage medium
CN114494668A (en) * 2022-04-13 2022-05-13 腾讯科技(深圳)有限公司 Method, apparatus, device, storage medium, and program product for expanding three-dimensional model
CN114969917A (en) * 2022-05-25 2022-08-30 广州山水比德设计股份有限公司 Method and device for generating 3D vegetation map, electronic equipment and storage medium
CN115330985A (en) * 2022-07-25 2022-11-11 埃洛克航空科技(北京)有限公司 Data processing method and device for three-dimensional model optimization
CN115330985B (en) * 2022-07-25 2023-09-08 埃洛克航空科技(北京)有限公司 Data processing method and device for three-dimensional model optimization

Also Published As

Publication number Publication date
CN112085840B (en) 2024-03-29

Similar Documents

Publication Publication Date Title
CN112085840B (en) Semantic segmentation method, semantic segmentation device, semantic segmentation equipment and computer readable storage medium
Zhang et al. A review of deep learning-based semantic segmentation for point cloud
CN112052839B (en) Image data processing method, apparatus, device and medium
Movshovitz-Attias et al. How useful is photo-realistic rendering for visual learning?
Zhang et al. Image engineering
CN111445486B (en) Image processing method, device, equipment and computer readable storage medium
CN112927363B (en) Voxel map construction method and device, computer readable medium and electronic equipment
CN112233124A (en) Point cloud semantic segmentation method and system based on countermeasure learning and multi-modal learning
CN114092697B (en) Building facade semantic segmentation method with attention fused with global and local depth features
CN115578393B (en) Key point detection method, key point training method, key point detection device, key point training device, key point detection equipment, key point detection medium and key point detection medium
CN110738132B (en) Target detection quality blind evaluation method with discriminant perception capability
CN114202622B (en) Virtual building generation method, device, equipment and computer readable storage medium
KR20200136723A (en) Method and apparatus for generating learning data for object recognition using virtual city model
Jiang et al. Local and global structure for urban ALS point cloud semantic segmentation with ground-aware attention
CN114120067A (en) Object identification method, device, equipment and medium
CN110909656B (en) Pedestrian detection method and system integrating radar and camera
Parente et al. Integration of convolutional and adversarial networks into building design: A review
Wang et al. Oblique photogrammetry supporting procedural tree modeling in urban areas
WO2023231793A9 (en) Method for virtualizing physical scene, and electronic device, computer-readable storage medium and computer program product
CN110599587A (en) 3D scene reconstruction technology based on single image
Li et al. Dense Points Aided Performance Evaluation Criterion of Human Obsevation for Image-based 3D Reconstruction
Li A KD-tree and random sample consensus-based 3D reconstruction model for 2D sports stadium images
KR102526189B1 (en) Apparatus and method for modeling three dimensional image
CN113792357B (en) Tree growth model construction method and computer storage medium
Zhang Geometry-Aided 3D Image Processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant