CN114820988A - Three-dimensional modeling method, device, equipment and storage medium - Google Patents

Three-dimensional modeling method, device, equipment and storage medium

Info

Publication number
CN114820988A
CN114820988A (application CN202210594630.0A)
Authority
CN
China
Prior art keywords
image
dimensional
network
sample
dimensional modeling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210594630.0A
Other languages
Chinese (zh)
Inventor
孙大运
唐忠樑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Meiping Meiwu Shanghai Technology Co ltd
Original Assignee
Meiping Meiwu Shanghai Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Meiping Meiwu Shanghai Technology Co., Ltd.
Priority to CN202210594630.0A
Publication of CN114820988A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 - Finite element generation, e.g. wire-frame surface description, tesselation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present application provides a three-dimensional modeling method, apparatus, device, and storage medium. First, a target object is segmented from a two-dimensional image by an instance segmentation network to obtain a mask image of the target object; then, a three-dimensional modeling network performs three-dimensional modeling from the mask image to obtain a three-dimensional mesh model of the target object. The instance segmentation network simplifies complex scenes and prevents other objects in the image to be processed from interfering with the modeling process, which improves the three-dimensional modeling precision of the target object and guarantees the modeling effect. In addition, because three-dimensional reconstruction is performed from a two-dimensional image, neither multi-angle views nor continuous images are required, no complex three-dimensional model library needs to be built, the method can be applied flexibly to a variety of three-dimensional modeling scenarios, and the detail features of the three-dimensional model are preserved, further improving the modeling effect.

Description

Three-dimensional modeling method, device, equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a three-dimensional modeling method, apparatus, device, and storage medium.
Background
With the continuous progress of computer technology, three-dimensional modeling has become an important technology in daily life. It is widely applied in industries such as industrial product design, virtual reality, three-dimensional games, education, film animation, and home decoration, where it helps better display the model effect of a scene or an object.
In the related art, three-dimensional modeling of an object is generally performed with a view-based method. However, this approach requires multi-angle images of the object to be modeled and has difficulty retaining its detailed information.
Disclosure of Invention
Aspects of the present application provide a three-dimensional modeling method, apparatus, device, and storage medium that simplify the three-dimensional modeling of an object while preserving the detailed information of the object to be modeled, thereby ensuring the modeling effect.
In a first aspect, an embodiment of the present application provides a three-dimensional modeling method, including: acquiring an image to be processed, the image containing a target object; inputting the image into an instance segmentation network and obtaining, through it, a mask image containing the target object; and inputting the mask image into a three-dimensional modeling network and obtaining, through it, a three-dimensional mesh model of the target object. The instance segmentation network is trained on sample objects and their mask images; the three-dimensional modeling network is trained on two-dimensional images of the sample objects and their three-dimensional mesh models.
In a second aspect, an embodiment of the present application provides a three-dimensional modeling method, including: in response to a user's three-dimensional modeling request, acquiring the image to be processed corresponding to the request, the image containing at least one target object, the request indicating that the target object in the image is to be modeled in three dimensions; performing instance segmentation on the image to obtain a mask image containing the target object; and performing three-dimensional modeling on the target object based on the mask image and outputting the corresponding three-dimensional mesh model.
In a third aspect, an embodiment of the present application provides a three-dimensional modeling apparatus, including:
an obtaining module, configured to acquire an image to be processed, the image containing a target object;
a processing module, configured to input the image into an instance segmentation network, obtain a mask image of the target object through it, input the mask image into a three-dimensional modeling network, and obtain a three-dimensional mesh model of the target object through it; where the mask image contains the target object, the instance segmentation network is trained on sample objects and their mask images, and the three-dimensional modeling network is trained on two-dimensional images of the sample objects and their three-dimensional mesh models.
In a fourth aspect, an embodiment of the present application provides a three-dimensional modeling apparatus, including:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for responding to a three-dimensional modeling request of a user and acquiring a to-be-processed image corresponding to the three-dimensional modeling request, the to-be-processed image comprises at least one target object, and the three-dimensional modeling request is used for indicating the target object in the to-be-processed image to be subjected to three-dimensional modeling;
the example segmentation module is used for carrying out example segmentation on the image to be processed to obtain a mask image corresponding to the image to be processed, and the mask image comprises a target object;
and the three-dimensional modeling module is used for carrying out three-dimensional modeling on the target object based on the mask image and outputting a three-dimensional grid model corresponding to the target object.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; where the memory stores instructions executable by the at least one processor to enable the electronic device to perform the three-dimensional modeling method of the first aspect and/or the second aspect.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the three-dimensional modeling method according to the first aspect and/or the second aspect.
In a seventh aspect, the present application provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the three-dimensional modeling method according to the first aspect and/or the second aspect is implemented.
In the embodiment of the present application, a target object is first segmented from a two-dimensional image by an instance segmentation network to obtain its mask image; a three-dimensional modeling network then performs three-dimensional modeling on the target object in the mask image, producing its three-dimensional mesh model. The instance segmentation network simplifies complex scenes by segmenting out the target object, preventing other objects in the image to be processed from interfering with the modeling process, which improves the modeling precision of the target object and guarantees the modeling effect. Moreover, because reconstruction proceeds from a single two-dimensional image, neither multi-angle views nor continuous images are required and no complex model library needs to be built, so the method applies flexibly to many three-dimensional modeling scenarios; and compared with the three-dimensional point cloud models of the prior art, the mesh model also preserves the model's detail features, further improving the modeling effect.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic view of a scenario provided by an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of a three-dimensional modeling method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of the instance segmentation network provided by an embodiment of the present application;
FIG. 4 is a flowchart of the training process of the instance segmentation network provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a three-dimensional modeling network provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of a training process of a three-dimensional modeling network according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the patch augmentation network provided by an embodiment of the present application;
FIG. 8 is a flowchart of the training process of the patch augmentation network provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a three-dimensional modeling apparatus provided by an exemplary embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device provided by an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
With the continuous progress of computer technology, three-dimensional modeling has become an important technology in daily life. It is widely applied in industries such as industrial product design, virtual reality, three-dimensional games, education, film animation, and home decoration, where it helps better display the model effect of a scene or object.
Taking the home furnishing industry as an example: during design or manufacturing, designers often need to parametrically model furniture according to the scenes or pictures a user provides, generating a three-dimensional model of the furniture from a two-dimensional picture to better show the scene or model effect.
Three-dimensional modeling methods commonly used in the related art mainly include the following:
(1) Constructing a three-dimensional point cloud model based on a point cloud network. This scheme can only process simple images, and its modeling quality on complex images is poor.
(2) Optimizing different camera positions based on images of continuous frames and constructing a three-dimensional model from the features corresponding to those positions. This approach requires continuous frames and multi-angle images of the object to be modeled, which offers poor flexibility, costs considerable time and effort, and makes it difficult to retain the object's detailed information.
(3) Shooting continuous pictures with a monocular camera and synthesizing the results into a three-dimensional model. This process is inflexible and cannot be applied where continuous pictures are unavailable.
(4) Constructing a three-dimensional model library: when a three-dimensional model of an object is to be built, the object's features are extracted and matched against the feature-map dictionary of each model in the library to select the corresponding three-dimensional model. This retrieves from an existing library rather than reconstructing the actual two-dimensional picture directly; since the library's models are limited, the object's model may not be obtainable, and building the library consumes substantial manpower and material resources.
In view of this, embodiments of the present application provide a three-dimensional modeling method, apparatus, device, and storage medium that segment the target object from a two-dimensional image with an instance segmentation network to obtain its mask image, and then perform three-dimensional modeling on the target object in the mask image with a three-dimensional modeling network to obtain its three-dimensional mesh model. The instance segmentation network simplifies complex scenes: segmenting the target object out of the two-dimensional image extracts exactly the object to be modeled, prevents other objects in the image from interfering with modeling, improves the modeling precision of the target object, guarantees the modeling effect, and allows the method to be applied to building three-dimensional models of complex scenes. Moreover, reconstruction from a single two-dimensional image requires neither multi-angle views nor continuous images nor a complex three-dimensional model library, so the method applies flexibly to many modeling scenarios; and compared with the three-dimensional point cloud models of the prior art, the mesh model preserves the model's detail features, guaranteeing the modeling effect.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic view of a scenario provided in an exemplary embodiment of the present application. As shown in fig. 1, the scenario includes a terminal device.
The terminal device may also be referred to as a User Equipment (UE), a Mobile Station (MS), a mobile terminal (mobile terminal), a terminal (terminal), or the like. In practical applications, the terminal device is, for example: desktop computers, notebooks, Personal Digital Assistants (PDAs), smart phones, tablet computers, vehicle-mounted devices, wearable devices (e.g., smart watches, smart bands), smart home devices (e.g., smart display devices), and the like.
For example, after the terminal device obtains the image to be processed (for example, by shooting with the terminal device, or by having the image uploaded or sent to it in some other way), it can perform three-dimensional modeling on the target object in the image using the method provided in this embodiment, obtaining a three-dimensional model of each target object.
In some optional embodiments, a server may also be included in the scenario. Where a server is a service point that provides data processing, databases, etc., the server may be a unitary server or a distributed server across multiple computers or computer data centers, and the server may include hardware, software, or embedded logic components or a combination of two or more such components for performing the appropriate functions supported or implemented by the server. The server is, for example, a blade server, a cloud server, or the like, or may be a server group composed of a plurality of servers.
The terminal device and the server may communicate with each other through a wired network or a wireless network, and in this embodiment, the server may perform some functions of the terminal device. Illustratively, the image to be processed may be uploaded to a server through a terminal device, and the server performs three-dimensional modeling on a target object in the image to be processed through the three-dimensional modeling method provided in the embodiment of the present application, and then outputs a three-dimensional model corresponding to the target object through the terminal device.
It should be understood that fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application, and the embodiment of the present application does not limit the types and the number of devices included in fig. 1, for example, in the application scenario illustrated in fig. 1, a data storage device may be further included for storing service data, and the data storage device may be an external memory or an internal memory integrated in a terminal device or a server.
The following describes in detail the technical solutions of the embodiments of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
In some embodiments, an execution subject of the three-dimensional modeling method provided by the embodiments of the present application may be a terminal device or a server in fig. 1. Specifically, the three-dimensional modeling method provided by the embodiment of the application comprises the following steps:
(1) In response to a user's three-dimensional modeling request, acquiring the image to be processed corresponding to the request, where the image contains at least one target object and the request indicates that the target object in the image is to be modeled in three dimensions.
In a first implementation, when the execution subject is a terminal device, the user can trigger a three-dimensional modeling operation on any image on the device, generating a three-dimensional modeling request; accordingly, after acquiring the request, the terminal device begins three-dimensional modeling of the target object in the corresponding image.
In a second implementation, when the execution subject is a server, the user triggers a three-dimensional modeling operation on any image on the terminal device, which generates a three-dimensional modeling request from the image to be processed and sends it to the server; accordingly, after acquiring the request, the server begins three-dimensional modeling of the target object in the corresponding image.
The image to be processed is a two-dimensional image and may take various forms, for example a picture or a video.
In this embodiment, the image to be processed contains at least one target object. Different three-dimensional modeling scenes correspond to different types of images and target objects. Taking the home furnishing industry as an example, three-dimensional models are generally built for objects such as furniture and household appliances, so the image to be processed may be a design drawing or a photograph of a home, and the target object may be a sofa, table, chair, bed, wardrobe, computer, television, and so on.
(2) Performing instance segmentation on the image to be processed to obtain the mask image corresponding to it, the mask image containing the target object.
That is, the execution subject extracts the mask image corresponding to each target object from the image to be processed, thereby segmenting the target object out of the image.
Each mask image contains one target object and consists of a number of pixels, each with a pixel value. At a pixel position belonging to the target object, the mask image takes the pixel value of the corresponding position in the image to be processed; at positions not belonging to the target object, the pixel value is 0. This distinguishes the target object's pixels from the rest of the mask image.
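In numpy terms, forming such a mask image is a pixel-wise product; a minimal illustrative sketch (the function name is assumed):

```python
import numpy as np

def apply_instance_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep the original pixel values where the binary mask is 1 and set
    every other position to 0, exactly as the mask image is defined above.
    image: H x W x 3 array; mask: H x W array of 0/1."""
    return image * mask[..., None].astype(image.dtype)
```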
(3) Performing three-dimensional modeling on the target object based on the mask image and outputting the three-dimensional mesh model corresponding to the target object.
It is understood that a mesh here is a polygonal mesh, a data structure used in computer graphics to model all kinds of irregular objects.
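For orientation, a minimal vertex/face representation of such a mesh with per-patch normal vectors (numpy; the values are illustrative):

```python
import numpy as np

# vertices: N x 3 (x, y, z) coordinates; faces: M x 3 vertex indices,
# one row per triangular patch (here: a tetrahedron)
vertices = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])

# one normal vector per patch (orientation follows the vertex winding)
e1 = vertices[faces[:, 1]] - vertices[faces[:, 0]]
e2 = vertices[faces[:, 2]] - vertices[faces[:, 0]]
normals = np.cross(e1, e2)
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
```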
In this embodiment, the target object is reconstructed in three dimensions directly from the two-dimensional mask image. The process needs neither multi-angle views nor continuous images nor a complex model library, so it applies flexibly to many three-dimensional modeling scenarios; and compared with the three-dimensional point cloud models of the prior art, the mesh model preserves the model's detail features, guaranteeing the modeling effect.
Next, the principle of the three-dimensional modeling process will be explained in detail with reference to fig. 2:
fig. 2 is a schematic diagram of a three-dimensional modeling method according to an embodiment of the present disclosure. As shown in fig. 2, the three-dimensional modeling method provided in the embodiment of the present application specifically includes the following steps:
(1) Inputting the image to be processed into an instance segmentation network and obtaining the mask image of the target object through it.
It should be noted that the instance segmentation network is trained on sample objects and their mask images, where the sample objects are similar to the target object; training on different types of sample objects equips the network to segment different types of target objects.
In this embodiment, the instance segmentation network extracts the mask image corresponding to each target object from the image to be processed, thereby segmenting the target object out of the image.
In some optional embodiments, the instance segmentation network also outputs information such as the category information and position information of the target object. The category information indicates the category the object belongs to, for example: sofa, table, chair, bed, wardrobe, computer, television, and so on; the position information indicates the region of the image to be processed where the object lies.
In this embodiment, because the layout of target objects in the image to be processed may be complex, the instance segmentation network simplifies the scene: segmenting the target object out of the two-dimensional image extracts exactly the object to be modeled, prevents other objects in the image from interfering with modeling, improves the modeling precision of the target object, and guarantees the modeling effect.
In the above scenario, since the image to be processed may contain several target objects, several mask images may be obtained, and because each object's layout is complex, a mask image produced by the instance segmentation network may not be accurate enough; for example, its edges may show jagging, or it may contain holes, which affects the modeling effect. Therefore, in some optional embodiments, after the mask image of a target object is obtained, it may be further refined (for example, by edge smoothing or hole filling) to obtain a better mask image. The specific refinement scheme is not detailed in this embodiment.
(2) Inputting the mask image into a three-dimensional modeling network and obtaining the three-dimensional mesh model of the target object through it.
The three-dimensional modeling network may be a neural network model used to construct a three-dimensional mesh model of the object in a two-dimensional image.
In a preferred embodiment, the three-dimensional mesh model may be composed of triangular patches, since the triangular patch is the smallest unit of division in a polygonal mesh and its representation is simple, flexible, and convenient for describing topology. The mesh models generated in this embodiment therefore help improve modeling efficiency, and compared with the three-dimensional point cloud models of the prior art they preserve the model's detail features, guaranteeing the modeling effect.
In some embodiments, the three-dimensional modeling network is trained on two-dimensional images of sample objects and constructed three-dimensional mesh models of those objects. The structure, principle, and training of the instance segmentation network and the three-dimensional modeling network are shown in the following embodiments and not described here.
In this embodiment, the target object is reconstructed in three dimensions directly from the two-dimensional mask image; the process needs neither multi-angle views nor continuous images nor a complex model library, so it applies flexibly to many modeling scenarios, and the mesh model preserves detail features that the point cloud models of the prior art lose.
In the above scenario, a three-dimensional mesh model generated directly from a two-dimensional mask image is relatively coarse, so the model may exhibit intersecting, missing, or sharp meshes. Therefore, after the three-dimensional mesh model is generated by the three-dimensional modeling network, it may be further refined (for example, by deleting intersecting patches and smoothing sharp positions), and the missing patches filled to obtain a closed mesh model, yielding a refined three-dimensional mesh model.
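As one concrete illustration of such refinement (the patent names no tool, so the use of the open-source trimesh library here is an assumption, and removing degenerate and duplicate faces stands in for deleting intersecting patches, which is not built in):

```python
import trimesh

def refine_mesh(mesh: trimesh.Trimesh) -> trimesh.Trimesh:
    mesh.remove_degenerate_faces()        # drop zero-area patches
    mesh.remove_duplicate_faces()         # drop repeated patches
    trimesh.repair.fill_holes(mesh)       # fill missing patches -> closed mesh
    trimesh.smoothing.filter_laplacian(mesh, lamb=0.5, iterations=10)
    return mesh                           # smoothed, hole-free mesh
```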
The inventors also found that, owing to constraints of the actual production environment (for example, the configuration of the user's machine limits the capacity of the three-dimensional modeling network), only a mesh model with a small number of patches can be obtained, whose degree of refinement is too low to capture the target object's detail features. Therefore, in some alternative embodiments, after the three-dimensional mesh model of the target object is obtained, the number of its patches may be increased to produce a finer mesh model.
Specifically, the method comprises the following steps:
and inputting the three-dimensional grid model into a patch augmentation network, and performing patch augmentation processing on the three-dimensional grid model through the patch augmentation network to obtain a target grid model of the target object.
The number of the surface patches in the target grid model is larger than that of the surface patches in the three-dimensional grid model, the surface patch augmentation network is obtained by training the sample grid model and the target sample grid model corresponding to the sample grid model, and the number of the surface patches in the target sample grid model is larger than that of the surface patches in the sample grid model.
In the embodiment of the application, the patch augmentation processing is carried out on the three-dimensional mesh model through the patch augmentation network, so that the technical problem that the three-dimensional mesh model is not accurate enough due to the configuration of a user machine can be solved, and the detailed characteristics of a target object can be better reflected.
In some optional embodiments, when the three-dimensional mesh model is patch augmented by a patch augmentation network, the generated target mesh model may also have mesh intersections, missing meshes, or sharpness. In view of this, in the embodiment of the present application, after the target mesh model is generated, corresponding refinement processing (for example, deleting patch meshes, performing smoothing processing on sharp positions, and the like) may also be performed, so as to solve the problems of patch intersection, sharpness, and the like in the target mesh model, and simultaneously, the missing patches are filled to obtain a closed target mesh model, thereby obtaining a refined target mesh model.
Next, the structure and principle of each network will be described in detail with reference to the following embodiments:
Fig. 3 is a schematic diagram of the instance segmentation network provided by an embodiment of the present application. As shown in fig. 3, the instance segmentation network includes: a first backbone network, a target region extraction layer, a size adjustment layer, and a mask extraction layer.
The first backbone network extracts the image features of the image to be processed, obtaining features at multiple dimensions;
the target region extraction layer identifies the target object in the image to be processed and extracts the target region where it is located, so as to obtain a feature image based on the target region and the image features;
the size adjustment layer adjusts the feature image to a target size;
and the mask extraction layer extracts the mask of the target object at the target size to obtain the mask image corresponding to the target object.
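This layout (multi-scale backbone, region extraction, fixed-size resampling, mask head, plus the fully connected classification head described below) matches the well-known Mask R-CNN family. Purely as an illustration, since the patent names no specific architecture, a minimal sketch using torchvision's off-the-shelf Mask R-CNN as a stand-in:

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Stand-in for the instance segmentation network: a ResNet-FPN backbone
# (first backbone network), region proposals + RoIAlign (target region
# extraction and size adjustment), a mask head, and a fully connected
# box/class head.
model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

image = torch.rand(3, 480, 640)          # placeholder image to be processed
with torch.no_grad():
    out = model([image])[0]              # one result dict per input image

soft_masks = out["masks"]                # N x 1 x H x W mask probabilities
labels = out["labels"]                   # category information
boxes = out["boxes"]                     # position information (regions)

# binarize the first mask and cut the target object out of the image
binary = (soft_masks[0, 0] > 0.5).to(image.dtype)
mask_image = image * binary              # non-target pixels become 0
```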
As shown in fig. 3, acquiring the mask image of the target object through the instance segmentation network specifically includes the following steps:
(1) Inputting the image to be processed into the instance segmentation network and extracting its image features through the first backbone network.
It should be understood that the specific type of the first backbone network is not limited in this embodiment; for example, a structure such as ResNet-FPN may be adopted.
In this embodiment, the first backbone network extracts multiple feature layers of the image to be processed across multiple dimensions. Extracting multi-layer image features facilitates detecting multi-scale and small objects, fully mining the features in the image to be processed for more accurate instance segmentation.
(2) Extracting, through the target region extraction layer, the target region where the target object is located in the image to be processed, and obtaining the feature image corresponding to the target object based on that region.
The feature image contains the target object.
For example, suppose the image to be processed contains target objects a and b. The target region extraction layer divides out a target region for each target object in each feature layer according to the object's position, with each region containing exactly one object. As shown in fig. 3, in feature layer 1, target object a lies within target region a and target object b within target region b.
Then, for each target object, its target region in each feature layer is segmented out of the image to be processed, yielding the feature image corresponding to that object.
Specifically, target regions a and b are segmented from the image to be processed; the segmentation results of target region a across the feature layers form the feature image of target object a, and those of target region b form the feature image of target object b.
(3) The size adjustment layer adjusts the feature image to a target size.
The specific target size is not limited in this embodiment.
(4) The mask extraction layer extracts the mask image from the feature image.
As shown in fig. 3, in some optional embodiments, the instance segmentation network may further include a fully connected layer for identifying the target object in the feature image and obtaining its category information, or for obtaining the region where the target object lies.
Next, the training process is described in detail with reference to the structure of the instance segmentation network.
Fig. 4 is a flowchart of the training process of the instance segmentation network provided by an embodiment of the present application. As shown in fig. 4, the training process includes the following steps:
S401: obtaining a sample image and the annotation information of the sample objects in it.
The sample image contains at least one sample object. Different modeling scenes correspond to different types of sample images and sample objects. In the home furnishing industry, for example, three-dimensional models are generally built for objects such as furniture and household appliances, so the sample image may be a design drawing or a photograph of a home, and the sample objects may be sofas, tables, chairs, beds, wardrobes, computers, televisions, and so on.
In some embodiments, the annotation information includes at least one of: the category of the sample object, the sample region where it is located, and a sample mask image of that region.
In this embodiment, the category of a sample object may be annotated manually or by machine learning, yielding the object's category; likewise, the sample region where the object is located may be annotated manually or by machine learning, yielding the sample region; and the mask of the sample region may be annotated manually or by machine learning, yielding the sample mask image.
In the sample mask image, the mask at a pixel belonging to the sample object may be labeled with that pixel's value, and the mask at every other pixel labeled 0, so as to distinguish the pixels belonging to the sample object.
S402: inputting the sample image into the instance segmentation network and obtaining the segmentation information it outputs for the sample object.
The segmentation information includes at least one of: the predicted category of the sample object, the predicted mask image, and the predicted region where the object is located.
With reference to fig. 3: first, the first backbone network of the instance segmentation network extracts the image features of the sample image, obtaining features at multiple dimensions;
then, the target region extraction layer identifies the sample object in the sample image and extracts the predicted region where it is located, producing a sample feature image based on the predicted region and the image features;
optionally, the size adjustment layer adjusts the sample feature image to the target size;
finally, the mask extraction layer extracts the mask of the sample object from the sample feature image, yielding the predicted mask image corresponding to the sample object.
In addition, the fully connected layer may output the predicted region obtained by the target region extraction layer, and identify the predicted category of the sample object in the sample feature image.
S403: based on a first loss function, obtaining a first loss value from the annotation information and the segmentation information of the sample object.
The first loss function is a weighted combination of at least one of the following losses: a category loss Loss_category, a region loss Loss_region, and a mask loss Loss_mask for the sample region.
For example, the first loss function Loss1 may be computed as:
Loss1 = a11 × Loss_category + a12 × Loss_region + a13 × Loss_mask
where a11, a12, and a13 are the weights of the category, region, and mask losses respectively; their specific values are not limited in this embodiment.
It should be noted that Loss_category and Loss_region may be regression loss functions, while Loss_mask may be a binary cross-entropy loss function.
It should be understood that the methods of computing the region loss from Loss_region, the sample region, and the predicted region; the category loss from Loss_category, the sample category, and the predicted category; and the mask loss from Loss_mask, the sample mask image, and the predicted mask image are not repeated here.
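A minimal sketch of the weighted combination; the patent leaves the weights and the exact loss functions open, so the concrete choices below (cross-entropy for the category term, smooth L1 for the region term, default weights of 1.0) are assumptions:

```python
import torch
import torch.nn.functional as F

def loss1(pred_cls, gt_cls, pred_box, gt_box, pred_mask, gt_mask,
          a11=1.0, a12=1.0, a13=1.0):
    """Loss1 = a11 * Loss_category + a12 * Loss_region + a13 * Loss_mask.
    The weights a11..a13 are left open by the patent; 1.0 is a placeholder."""
    # the patent allows regression losses for the category and region terms;
    # cross-entropy is used for the category term here as a common choice
    l_category = F.cross_entropy(pred_cls, gt_cls)
    l_region = F.smooth_l1_loss(pred_box, gt_box)
    # mask loss: binary cross-entropy, as stated above
    l_mask = F.binary_cross_entropy_with_logits(pred_mask, gt_mask)
    return a11 * l_category + a12 * l_region + a13 * l_mask
```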
S405: adjusting the model parameters of the instance segmentation network based on the first loss value to obtain the trained instance segmentation network.
Specifically, in each training round, the model parameters of the instance segmentation network are adjusted using the first loss value of the current round, and the adjusted network is used to compute a new first loss value per the steps above. When the first loss value meets the preset training requirement, the trained instance segmentation network is obtained from the model parameters of the current round.
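The round-by-round adjustment described here is an ordinary supervised training loop; a generic sketch (the optimizer choice, learning rate, and stopping threshold are assumptions):

```python
import torch

def train_network(network, loader, loss_fn, lr=1e-3,
                  threshold=1e-3, max_epochs=100):
    """Compute the loss, adjust the model parameters, and repeat until
    the loss meets the preset training requirement."""
    optimizer = torch.optim.Adam(network.parameters(), lr=lr)
    for epoch in range(max_epochs):
        for sample, annotation in loader:
            loss = loss_fn(network(sample), annotation)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if loss.item() < threshold:   # preset training requirement
            break
    return network
```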
Fig. 5 is a schematic diagram of the three-dimensional modeling network provided by an embodiment of the present application. As shown in fig. 5, the three-dimensional modeling network includes a second backbone network and at least one graph convolutional neural network.
Inputting the mask image into the three-dimensional modeling network and obtaining the three-dimensional mesh model of the target object through it includes the following steps:
(1) Inputting the mask image into the three-dimensional modeling network and extracting its features through the second backbone network to obtain the coordinate information of the pixels of interest in the mask image.
A pixel of interest is a pixel where the target object is located. The coordinate information includes, but is not limited to, at least one of: the vertex coordinates and the normal-vector coordinates of the patches used to construct the three-dimensional mesh model.
It should be noted that the specific type of the second backbone network is not limited in this embodiment.
(2) Performing three-dimensional reconstruction from the coordinate information through the at least one graph convolutional neural network to obtain the three-dimensional mesh model of the target object.
The three-dimensional mesh model is composed of a number of patches (meshes); each patch is a set of vertices (points), a normal vector, and a face, and together the patches define the three-dimensional characteristics of the target object.
Because a three-dimensional model lives in a non-Euclidean space, it cannot be represented by conventional convolution. In this embodiment, three-dimensional reconstruction is therefore performed on the extracted region of interest with graph convolutional neural networks, yielding the three-dimensional mesh model corresponding to the target object.
Specifically, the input of a graph convolutional neural network is the spatial coordinate information of the points of interest, and the output, refined through the graph convolution layers, is the vertex coordinates of each patch in the three-dimensional mesh model. It should be appreciated that, since points of the mesh lie in a non-Euclidean space, each point connects to a variable number of neighboring points; this situation is naturally represented by a graph structure, and graph convolutional neural networks can perform convolution over graph-structured data.
It should be noted that the number of graph convolutional neural networks in the three-dimensional modeling network is not limited in this embodiment.
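To make the graph-convolution step concrete, a minimal PyTorch sketch of vertex refinement over a mesh graph; the class names (GraphConv, VertexRefiner), the mean-over-neighbours aggregation, and the layer sizes are illustrative assumptions, not the patent's specification:

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph convolution over mesh vertices: every vertex aggregates
    the features of the variable number of vertices it shares an edge
    with, which ordinary (Euclidean) convolution cannot express."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)
        self.w_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: V x in_dim vertex features; adj: dense V x V 0/1 adjacency
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = adj @ x / deg                    # mean over neighbours
        return torch.relu(self.w_self(x) + self.w_neigh(neigh))

class VertexRefiner(nn.Module):
    """Stack of graph convolutions that outputs refined vertex coordinates,
    the role the graph convolutional neural networks play above."""
    def __init__(self, feat_dim=128, n_layers=3):
        super().__init__()
        dims = [3 + feat_dim] + [128] * n_layers
        self.convs = nn.ModuleList(
            GraphConv(dims[i], dims[i + 1]) for i in range(n_layers))
        self.out = nn.Linear(128, 3)             # per-vertex (x, y, z)

    def forward(self, coords, img_feats, adj):
        x = torch.cat([coords, img_feats], dim=1)
        for conv in self.convs:
            x = conv(x, adj)
        return coords + self.out(x)              # refined vertex coordinates
```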
Next, the training process of the three-dimensional modeling network is explained in detail with reference to fig. 6:
fig. 6 is a flowchart illustrating a training process of a three-dimensional modeling network according to an embodiment of the present application. As shown in fig. 6, the training process of the three-dimensional modeling network includes the following steps:
s601, obtaining a sample image and a sample three-dimensional grid model of a sample object in the sample image.
Wherein the sample image includes at least one sample object therein.
It should be noted that the sample image used for training the three-dimensional modeling network and the sample image used for training the example segmentation network may be the same sample image or different sample images, and the embodiment of the present application is not particularly limited.
And S602, inputting the sample image into a three-dimensional modeling network to obtain a predicted three-dimensional grid model output by the three-dimensional modeling network.
Specifically, the specific scheme of the three-dimensional modeling network performing three-dimensional reconstruction through the sample image to obtain the predicted three-dimensional mesh model is similar to the embodiment shown in fig. 5, which may be referred to specifically for the above embodiment, and is not repeated here.
S603: based on a second loss function, obtaining a second loss value from the sample three-dimensional mesh model and the predicted three-dimensional mesh model.
Specifically, the second loss function may be a weighted combination of at least one of the following losses:
an edge loss Loss_edge over the patches of the three-dimensional mesh model;
a vertex-coordinate loss Loss_vertex over the vertex coordinates of the point cloud model corresponding to the three-dimensional mesh model;
and a normal-vector loss Loss_normal over the normal-vector coordinates of the points in the point cloud model.
For example, the second loss function Loss2 may be computed as follows, and the second loss value obtained from it:
Loss2 = a21 × Loss_edge + a22 × Loss_vertex + a23 × Loss_normal
where a21, a22, and a23 are the weights of the edge, vertex-coordinate, and normal-vector losses respectively; their specific values are not limited in this embodiment.
Specifically, the predicted and sample three-dimensional mesh models may each be converted into point cloud models, from which their respective vertex coordinates and normal-vector coordinates are obtained.
Then, based on Loss_vertex, a similarity is computed between the vertex coordinates of the sample and predicted mesh models, giving the vertex-coordinate loss of the three-dimensional modeling network in the current training round;
based on Loss_normal, a similarity is computed between the normal-vector coordinates of the sample and predicted mesh models, giving the normal-vector loss;
and based on Loss_edge, a similarity is computed between the edges of the meshes in the sample model and those in the predicted model, giving the edge loss of the current round.
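A hedged sketch of the three terms; the chamfer-style nearest-neighbour matching for the vertex and normal terms and the mean-edge-length comparison are concrete choices assumed for illustration, since only a "similarity calculation" is specified above:

```python
import torch

def loss2(pred_v, pred_n, pred_edges, gt_v, gt_n, gt_edges,
          a21=1.0, a22=1.0, a23=1.0):
    """Loss2 = a21 * Loss_edge + a22 * Loss_vertex + a23 * Loss_normal.
    The weights are placeholders; pred_v/gt_v are point clouds sampled
    from the two meshes, pred_n/gt_n their normals, *_edges index pairs."""
    d = torch.cdist(pred_v, gt_v)                 # pairwise point distances
    nearest_gt = d.argmin(dim=1)                  # match each predicted point
    # vertex-coordinate loss: symmetric nearest-neighbour distance
    l_vertex = d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
    # normal-vector loss: cosine dissimilarity between matched normals
    l_normal = (1 - torch.cosine_similarity(pred_n, gt_n[nearest_gt], dim=1)).mean()
    # edge loss: compare edge lengths of the two meshes
    p_len = (pred_v[pred_edges[:, 0]] - pred_v[pred_edges[:, 1]]).norm(dim=1)
    g_len = (gt_v[gt_edges[:, 0]] - gt_v[gt_edges[:, 1]]).norm(dim=1)
    l_edge = (p_len.mean() - g_len.mean()).abs()
    return a21 * l_edge + a22 * l_vertex + a23 * l_normal
```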
S604: adjusting the model parameters of the three-dimensional modeling network based on the second loss value to obtain the trained three-dimensional modeling network.
Specifically, in each training round, the model parameters of the three-dimensional modeling network are adjusted using the second loss value of the current round, and the adjusted network is used to compute a new second loss value per the steps above; when the second loss value meets the preset training requirement, the trained three-dimensional modeling network is obtained from the model parameters of the current round.
Fig. 7 is a schematic diagram of the patch augmentation network provided by an embodiment of the present application. As shown in fig. 7, the patch augmentation network includes an encoder and a decoder.
In this embodiment, the patch augmentation network is used to obtain a mesh model with more patches, so as to better reflect the detailed features of the target object. Specifically, performing patch augmentation on the three-dimensional mesh model through the patch augmentation network to obtain the target mesh model of the target object includes the following steps:
(1) Parameterizing the three-dimensional mesh model through the encoder to obtain the high-dimensional features corresponding to the model.
First, the encoder parameterizes the three-dimensional mesh model, establishing a one-to-one correspondence between the three-dimensional points of its patches and points on a two-dimensional parameter domain.
Then, the patches are regularly and discretely resampled in the parameter domain, so that each patch is represented as a height field, yielding the high-dimensional features corresponding to the model; in this way, every point of the three-dimensional mesh can be represented by a one-dimensional height value.
In an optional embodiment, before the parameterization, the three-dimensional mesh model may also be subdivided to reduce the distortion the parameterization introduces.
Then, the high-dimensional features may be encoded with a conventional image-processing algorithm to obtain the corresponding code stream, which is transmitted to the decoder.
(2) Decoding the code stream of high-dimensional features through the decoder and reconstructing the target mesh model from the decoded data.
Specifically, on the decoder side, a pre-trained codebook is used to decode the code stream output by the encoder to recover the high-dimensional features, which are then reconstructed and mapped into a three-dimensional model, yielding the target mesh model.
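The patch augmentation described above is a learned encoder-decoder. Purely to make its goal concrete (more patches, a finer mesh), here is the classical, non-learned midpoint subdivision, which quadruples the patch count on each pass; it is a stand-in, not the patent's network:

```python
import numpy as np

def midpoint_subdivide(vertices, faces):
    """Split every triangular patch into four; midpoints on shared
    edges are deduplicated so the refined mesh stays watertight."""
    verts = [tuple(v) for v in vertices]
    index = {v: i for i, v in enumerate(verts)}
    new_faces = []

    def midpoint(i, j):
        m = tuple((np.asarray(verts[i]) + np.asarray(verts[j])) / 2.0)
        if m not in index:
            index[m] = len(verts)
            verts.append(m)
        return index[m]

    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return np.asarray(verts), np.asarray(new_faces)
```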
Next, the training process of the patch augmentation network is described in detail with reference to fig. 8.
Fig. 8 is a flowchart of the training process of the patch augmentation network provided by an embodiment of the present application. As shown in fig. 8, the training process includes the following steps:
s801, obtaining a sample grid model and a target sample grid model corresponding to the sample grid model.
And the number of patches in the target sample grid model is greater than that in the sample grid model.
S802, inputting the sample grid model into a patch augmentation network, and obtaining a prediction sample grid model output by the patch augmentation network.
It should be noted that the method for obtaining the prediction sample mesh model through the patch augmented network is similar to the method for obtaining the target mesh model in the embodiment shown in fig. 7 in principle, and reference may be specifically made to the above embodiment, and details are not repeated here.
S803, based on the third loss function, obtain a third loss value from the target sample mesh model and the prediction sample mesh model.
The third loss function may be a regression loss function, which is used to indicate the similarity between the target sample mesh model and the prediction sample mesh model.
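As one illustrative possibility, assuming the two meshes have the same number of vertices in corresponding order (this application does not fix the form of the regression loss; a Chamfer-style distance would be an alternative when no correspondence exists):

```python
import torch

# Hypothetical vertex-wise regression loss between the prediction sample
# mesh and the target sample mesh; the per-vertex correspondence is an
# assumption, not something the application specifies.
def third_loss(pred_vertices: torch.Tensor, target_vertices: torch.Tensor) -> torch.Tensor:
    # pred_vertices, target_vertices: tensors of shape (num_vertices, 3)
    return torch.mean((pred_vertices - target_vertices) ** 2)
```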
S804, adjust the model parameters of the patch augmentation network based on the third loss value to obtain the trained patch augmentation network.
Specifically, in each round of training, the third loss value from the current round is used to adjust the model parameters of the patch augmentation network, and the adjusted network is then used to compute a new third loss value according to the above steps, until the third loss value meets the preset training requirement; the trained patch augmentation network is then obtained from the model parameters of the corresponding round.
It should be noted that the steps of the methods provided in the above embodiments may all be executed by the same device, or may be executed by different devices. For example, the execution subject of steps S201 to S203 may be device A; alternatively, the execution subject of steps S201 and S202 may be device A while the execution subject of step S203 is device B; and so on.
In addition, some of the flows described in the above embodiments and drawings include multiple operations in a specific order, but it should be clearly understood that these operations may be executed out of the order presented herein or in parallel. Sequence numbers such as 201 and 202 merely distinguish different operations and do not by themselves represent any execution order, and the flows may include more or fewer operations executed sequentially or in parallel. It should also be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules, and the like; they neither represent a sequential order nor require "first" and "second" to be of different types.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a three-dimensional modeling apparatus according to an exemplary embodiment of the present application. As shown in fig. 9, a three-dimensional modeling apparatus 900 provided in an embodiment of the present application includes:
an obtaining module 901, configured to obtain an image to be processed, where the image to be processed includes a target object;
the processing module 902 is configured to input the image to be processed into an instance segmentation network, obtain a mask image of the target object through the instance segmentation network, input the mask image into a three-dimensional modeling network, and obtain a three-dimensional mesh model of the target object through the three-dimensional modeling network;
wherein the mask image includes the target object, the instance segmentation network is obtained by training with a sample object and a mask image of the sample object, and the three-dimensional modeling network is obtained by training with a two-dimensional image of the sample object and a three-dimensional mesh model of the sample object.
In some optional embodiments, the instance segmentation network includes a first backbone network, a target area extraction layer, a size adjustment layer, and a mask extraction layer. The processing module 902 is specifically configured to: input the image to be processed into the instance segmentation network, and extract image features of the image to be processed through the first backbone network; extract, through the target area extraction layer, the target area where the target object is located in the image to be processed, and obtain a characteristic image corresponding to the target object based on the target area and the image features, where the characteristic image includes the target object; adjust the characteristic image to a target size through the size adjustment layer; and extract a mask image of the characteristic image through the mask extraction layer.
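For orientation only: this backbone, region extraction, size adjustment, and mask extraction pipeline has the same shape as a Mask R-CNN-style detector. The sketch below uses torchvision's off-the-shelf Mask R-CNN as a stand-in; it is not necessarily the network of this application.

```python
import torch
import torchvision

# Illustrative stand-in: torchvision's Mask R-CNN follows the same
# backbone -> region extraction -> ROI resizing -> mask head structure
# described above; the patent's actual architecture may differ.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)   # image to be processed, (C, H, W) in [0, 1]
with torch.no_grad():
    outputs = model([image])      # the model takes a list of images

masks = outputs[0]["masks"]       # (num_objects, 1, H, W) per-object mask images
boxes = outputs[0]["boxes"]       # target areas where detected objects are located
```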
In some alternative embodiments, the three-dimensional modeling network includes a second backbone network and at least one graph convolutional neural network. The processing module 902 is specifically configured to: input the mask image into the three-dimensional modeling network, and perform feature extraction on the mask image through the second backbone network to obtain coordinate information of the pixel points of interest in the mask image; and perform three-dimensional reconstruction according to the coordinate information through the at least one graph convolutional neural network to obtain the three-dimensional mesh model of the target object.
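A minimal sketch of this two-stage design, assuming a small convolutional backbone that regresses initial vertex coordinates from the mask image and a single graph convolution that refines them over a fixed mesh topology (all layer sizes, the vertex count, and the adjacency matrix are assumptions):

```python
import torch
import torch.nn as nn

# Hypothetical mask-to-mesh network: a CNN backbone extracts coordinate
# information from the mask image, and a graph convolution refines the
# per-vertex 3D coordinates over a fixed mesh connectivity.
class SimpleGraphConv(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, vertex_feats, adjacency):
        # Average each vertex's neighbours (adjacency is a float 0/1
        # matrix of shape (V, V)), then apply a shared linear map.
        degree = adjacency.sum(1, keepdim=True).clamp(min=1)
        neighbour_mean = adjacency @ vertex_feats / degree
        return torch.relu(self.linear(vertex_feats + neighbour_mean))

class MaskToMesh(nn.Module):
    def __init__(self, num_vertices: int = 162):
        super().__init__()
        self.backbone = nn.Sequential(                        # second backbone network
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_vertices * 3),
        )
        self.gcn = SimpleGraphConv(3, 3)                      # one graph-conv stage

    def forward(self, mask, adjacency):
        # mask: (1, 1, H, W); batch size 1 for simplicity.
        coords = self.backbone(mask).view(-1, 3)              # initial vertex coordinates
        return self.gcn(coords, adjacency)                    # refined mesh vertices
```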
In some optional embodiments, the processing module 902 is further configured to: inputting the three-dimensional grid model into a patch augmentation network, and performing patch augmentation processing on the three-dimensional grid model through the patch augmentation network to obtain a target grid model of a target object; the number of surface patches in the target grid model is larger than that of surface patches in the three-dimensional grid model, the surface patch augmentation network is obtained by training based on the sample grid model and a target sample grid model corresponding to the sample grid model, and the number of surface patches in the target sample grid model is larger than that of the surface patches in the sample grid model.
In some optional embodiments, the patch augmentation network includes an encoder and a decoder. The processing module 902 is specifically configured to: parameterize the three-dimensional grid model through the encoder to obtain high-dimensional features corresponding to the three-dimensional grid model; and decode the code stream of the high-dimensional features through the decoder, and reconstruct the target grid model from the decoded data.
In some alternative embodiments, the instance segmentation network is obtained by:
acquiring a sample image and annotation information of a sample object in the sample image, wherein the annotation information includes at least one of the category of the sample object, the sample region where the sample object is located, and a mask image of the sample region; inputting the sample image into the instance segmentation network, and obtaining segmentation information of the sample object output by the instance segmentation network, wherein the segmentation information includes at least one of a prediction category of the sample object, a prediction mask, and a prediction region where the sample object is located; obtaining a first loss value according to the annotation information and the segmentation information of the sample object based on the first loss function; and adjusting the model parameters of the instance segmentation network based on the first loss value to obtain the trained instance segmentation network.
In some alternative embodiments, the three-dimensional modeling network is obtained by:
obtaining a sample image and a sample three-dimensional grid model of a sample object in the sample image, wherein the sample image comprises at least one sample object; inputting the sample image into a three-dimensional modeling network to obtain a predicted three-dimensional grid model output by the three-dimensional modeling network; based on a second loss function, obtaining a second loss value according to the sample three-dimensional grid model and the prediction three-dimensional grid model; and adjusting the model parameters of the three-dimensional modeling network based on the second loss value to obtain the trained three-dimensional modeling network.
In some alternative embodiments, the patch augmented network is obtained by:
obtaining a sample grid model and a target sample grid model corresponding to the sample grid model; inputting the sample grid model into a patch augmentation network, and obtaining a prediction sample grid model output by the patch augmentation network, wherein the number of patches in the prediction sample grid model is greater than that in the sample grid model; obtaining a third loss value according to the target sample grid model and the prediction sample grid model based on a third loss function; and adjusting the model parameters of the patch augmentation network based on the third loss value to obtain the trained patch augmentation network.
It should be noted that, the three-dimensional modeling apparatus 900 provided in the embodiment of the present application is used for executing the technical solutions in the corresponding method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
An exemplary embodiment of the present application further provides a three-dimensional modeling apparatus, which is applied to a terminal device or a server, and the three-dimensional modeling apparatus specifically includes:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for responding to a three-dimensional modeling request of a user and acquiring a to-be-processed image corresponding to the three-dimensional modeling request, the to-be-processed image comprises at least one target object, and the three-dimensional modeling request is used for indicating the target object in the to-be-processed image to be subjected to three-dimensional modeling;
an instance segmentation module, configured to perform instance segmentation on the image to be processed to obtain a mask image corresponding to the image to be processed, where the mask image includes the target object;
and a three-dimensional modeling module, configured to perform three-dimensional modeling on the target object based on the mask image and output a three-dimensional mesh model corresponding to the target object.
It should be noted that the three-dimensional modeling apparatus provided in the embodiment of the present application is used for executing the technical solutions in the corresponding method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
Fig. 10 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application. As shown in fig. 10, the electronic device 1000 includes a memory 1003 and a processor 1004.
The electronic device 1000 provided in the embodiment of the present application is illustrated by taking a cloud server as an example, but it is not limited thereto and may also be, for example, a terminal device.
The memory 1003 is used to store a computer program and may be configured to store various other data to support operations on the electronic device. The memory 1003 may be an Object Storage Service (OSS).
The memory 1003 may be implemented by any type or combination of volatile or non-volatile storage devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The processor 1004, coupled to the memory 1003, executes the computer program in the memory 1003 so as to perform the three-dimensional modeling method provided by the above method embodiments.
Further, as shown in fig. 10, the electronic device 1000 also includes a firewall 1001, a load balancer 1002, a communication component 1005, a power component 1006, and other components. Only some components are schematically shown in fig. 10, which does not mean that the electronic device includes only the components shown.
Accordingly, the present application also provides a computer readable storage medium storing a computer program, which when executed by a processor causes the processor to implement the steps in the above method embodiments.
Accordingly, the present application also provides a computer program product, which includes a computer program/instructions, when the computer program/instructions are executed by a processor, the processor is caused to implement the steps of the three-dimensional modeling method in the above method embodiments.
The communication component 1005 of fig. 10 is configured to facilitate wired or wireless communication between the device in which it resides and other devices. The device in which the communication component is located can access a wireless network based on a communication standard, such as WiFi, a 2G, 3G, 4G/LTE, or 5G mobile communication network, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power component 1006 of fig. 10 provides power to the various components of the device in which it resides. The power component may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for that device.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus, device, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (13)

1. A three-dimensional modeling method, comprising:
acquiring an image to be processed, wherein the image to be processed comprises a target object;
inputting the image to be processed into an instance segmentation network, and acquiring a mask image of the target object through the instance segmentation network, wherein the mask image comprises the target object;
inputting the mask image into a three-dimensional modeling network, and acquiring a three-dimensional grid model of the target object through the three-dimensional modeling network;
wherein the instance segmentation network is obtained by training with a sample object and a mask image of the sample object, and the three-dimensional modeling network is obtained by training with a two-dimensional image of the sample object and a three-dimensional grid model of the sample object.
2. The three-dimensional modeling method of claim 1, wherein the instance segmentation network comprises: a first backbone network, a target area extraction layer, a size adjustment layer, and a mask extraction layer;
the inputting the image to be processed into an instance segmentation network and acquiring a mask image of the target object through the instance segmentation network comprises:
inputting the image to be processed into the instance segmentation network, and extracting image features of the image to be processed through the first backbone network;
extracting a target area where the target object is located in the image to be processed through the target area extraction layer, and obtaining a characteristic image corresponding to the target object based on the target area and the image characteristics, wherein the characteristic image comprises the target object;
adjusting the characteristic image to a target size through the size adjusting layer;
and extracting a mask image of the characteristic image through the mask extraction layer.
3. The three-dimensional modeling method of claim 1, wherein the three-dimensional modeling network comprises: a second backbone network and at least one graph convolutional neural network;
inputting the mask image into a three-dimensional modeling network, and acquiring a three-dimensional grid model of the target object through the three-dimensional modeling network, wherein the three-dimensional grid model comprises the following steps:
inputting the mask image into the three-dimensional modeling network, and performing feature extraction on the mask image through the second backbone network to obtain coordinate information of an interested pixel point in the mask image;
and performing three-dimensional reconstruction according to the coordinate information through the at least one graph convolution neural network to obtain a three-dimensional grid model of the target object.
4. The three-dimensional modeling method of any of claims 1-3, further comprising:
inputting the three-dimensional grid model into a patch augmentation network, and performing patch augmentation processing on the three-dimensional grid model through the patch augmentation network to obtain a target grid model of the target object;
the number of patches in the target grid model is greater than that of patches in the three-dimensional grid model, the patch augmentation network is obtained by training based on the sample grid model and a target sample grid model corresponding to the sample grid model, and the number of patches in the target sample grid model is greater than that of patches in the sample grid model.
5. The three-dimensional modeling method of claim 4, wherein the patch augmentation network comprises an encoder and a decoder;
performing patch augmentation processing on the three-dimensional mesh model through the patch augmentation network to obtain a target mesh model of the target object, including:
carrying out parameterization processing on the three-dimensional grid model through the encoder to obtain high-dimensional characteristics corresponding to the three-dimensional grid model;
and decoding the code stream with the high-dimensional characteristics through the decoder, and reconstructing according to data obtained by decoding to obtain the target grid model.
6. A three-dimensional modeling method according to any of claims 1 to 3, characterized in that said instance segmentation network is obtained by:
acquiring a sample image and annotation information of a sample object in the sample image, wherein the annotation information comprises at least one of a category of the sample object, a sample region where the sample object is located and a mask image of the sample region;
inputting the sample image into the instance segmentation network, and obtaining segmentation information of the sample object output by the instance segmentation network, wherein the segmentation information comprises at least one of a prediction category of the sample object, a prediction mask, and a prediction region where the sample object is located;
obtaining a first loss value according to the annotation information and the segmentation information of the sample object based on a first loss function;
and adjusting model parameters of the instance segmentation network based on the first loss value to obtain the trained instance segmentation network.
7. A three-dimensional modeling method according to any of claims 1 to 3, characterized in that said three-dimensional modeling network is obtained by:
obtaining a sample image and a sample three-dimensional grid model of a sample object in the sample image, wherein the sample image comprises at least one sample object;
inputting the sample image into a three-dimensional modeling network to obtain a predicted three-dimensional grid model output by the three-dimensional modeling network;
obtaining a second loss value according to the sample three-dimensional grid model and the prediction three-dimensional grid model based on a second loss function;
and adjusting the model parameters of the three-dimensional modeling network based on the second loss value to obtain the trained three-dimensional modeling network.
8. The three-dimensional modeling method of claim 4, wherein the patch augmentation network is obtained by:
obtaining a sample grid model and a target sample grid model corresponding to the sample grid model;
inputting the sample grid model into a patch augmentation network, and obtaining a prediction sample grid model output by the patch augmentation network, wherein the number of patches in the prediction sample grid model is greater than that in the sample grid model;
obtaining a third loss value according to the target sample grid model and the prediction sample grid model based on a third loss function;
and adjusting the model parameters of the patch augmentation network based on the third loss value to obtain the trained patch augmentation network.
9. A three-dimensional modeling method, comprising:
responding to a three-dimensional modeling request of a user, and acquiring a to-be-processed image corresponding to the three-dimensional modeling request, wherein the to-be-processed image comprises at least one target object, and the three-dimensional modeling request is used for instructing three-dimensional modeling of the target object in the to-be-processed image;
performing instance segmentation on the image to be processed to obtain a mask image corresponding to the image to be processed, wherein the mask image comprises the target object;
and performing three-dimensional modeling on the target object based on the mask image, and outputting a three-dimensional grid model corresponding to the target object.
10. A three-dimensional modeling apparatus, comprising:
an acquisition module, configured to acquire an image to be processed, wherein the image to be processed comprises a target object;
a processing module, configured to input the image to be processed into an instance segmentation network, acquire a mask image of the target object through the instance segmentation network, input the mask image into a three-dimensional modeling network, and acquire a three-dimensional grid model of the target object through the three-dimensional modeling network;
wherein the mask image comprises the target object, the instance segmentation network is obtained by training with a sample object and a mask image of the sample object, and the three-dimensional modeling network is obtained by training with a two-dimensional image of the sample object and a three-dimensional grid model of the sample object.
11. A three-dimensional modeling apparatus, comprising:
an acquisition module, configured to acquire, in response to a three-dimensional modeling request from a user, a to-be-processed image corresponding to the three-dimensional modeling request, wherein the to-be-processed image comprises at least one target object, and the three-dimensional modeling request is used for instructing three-dimensional modeling of the target object in the to-be-processed image;
an instance segmentation module, configured to perform instance segmentation on the image to be processed to obtain a mask image corresponding to the image to be processed, wherein the mask image comprises the target object;
and a three-dimensional modeling module, configured to perform three-dimensional modeling on the target object based on the mask image and output a three-dimensional grid model corresponding to the target object.
12. An electronic device, comprising:
at least one processor;
and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions for execution by the at least one processor to enable the electronic device to perform the three-dimensional modeling method of any of claims 1-8 and/or to perform the three-dimensional modeling method of claim 9.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the three-dimensional modeling method of any one of claims 1 to 8 and/or carries out the three-dimensional modeling method of claim 9.