WO2024066689A1 - Model processing method, and apparatus - Google Patents

Model processing method, and apparatus Download PDF

Info

Publication number
WO2024066689A1
WO2024066689A1 · PCT/CN2023/108396 · CN2023108396W
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional model
scene
target
instance
dimensional
Prior art date
Application number
PCT/CN2023/108396
Other languages
French (fr)
Chinese (zh)
Inventor
宋晗
肖艺
鲍文
柳跃天
曾柏伟
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2024066689A1 publication Critical patent/WO2024066689A1/en

Links

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 — Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T19/00 — Manipulating 3D models or images for computer graphics
    • G06T19/20 — Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts

Definitions

  • the embodiments of the present application relate to the field of media technology, and in particular to a model processing method and device.
  • Three-dimensional reconstruction technology refers to establishing a mathematical model of three-dimensional objects that is suitable for computer representation and processing. It is the basis for processing and operating on those objects and analyzing their properties in a computer environment, and is also a key technology for establishing, in a computer, a virtual reality that expresses the objective world.
  • Three-dimensional reconstruction technology can reconstruct the real scene in three dimensions using the data of the real scene to obtain a three-dimensional model of the real scene.
  • the embodiment of the present application provides a model processing method and device, which can obtain a three-dimensional model with a high degree of similarity to the scene. To achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
  • an embodiment of the present application provides a model processing method, the method comprising: first obtaining a first three-dimensional model of a target instance in a scene and an image of the scene. Then, according to the first three-dimensional model of the target instance and the image of the scene, a target second three-dimensional model of the target instance is determined from a plurality of second three-dimensional models.
  • the target second three-dimensional model is a three-dimensional model that matches the geometric shape of the target instance and has the same style type as the scene where the target instance is located, and the target instance is any object or background in the scene.
  • the model processing method provided in the embodiment of the present application can match a target second three-dimensional model with the same style and similar geometry as the object or background from multiple three-dimensional models through the image of the scene and the three-dimensional model of any object or background in the scene.
  • the model processing method provided in the embodiment of the present application not only considers the geometric shape of each instance in the scene but also considers the style type of the scene when performing three-dimensional reconstruction of the real scene, so as to be able to produce a three-dimensional model with high similarity to the scene (consistent scene style and similar geometry).
  • the style type of the scene may be determined according to the image of the scene, and then a target second three-dimensional model of the target instance is determined from a plurality of second three-dimensional models according to the first three-dimensional model of the target instance and the style type of the scene.
  • the embodiment of the present application can determine the style type of the scene through the image of the scene, and then determine a three-dimensional model that matches the geometric shape of the scene target instance and has the same style type as the scene where the target instance is located from multiple second three-dimensional models according to the style type of the scene and the first three-dimensional model of the target instance. Since the three-dimensional reconstruction of the real scene not only considers the geometric shape of each instance in the scene but also the style type of the scene, a three-dimensional model with a high degree of similarity to the scene (consistent scene style and similar geometric shape) can be obtained.
  • an image of the scene may be input into a first network to determine the style type of the scene.
  • an image of the scene may be input into the first network to determine the style type of the scene from a plurality of preset style types.
  • the embodiment of the present application can determine the style type of the scene by inputting the image of the scene into the first network capable of determining the style type of the scene, and then determine a three-dimensional model that matches the geometric shape of the scene target instance and is the same as the style type of the scene where the target instance is located from multiple second three-dimensional models according to the style type of the scene and the first three-dimensional model of the target instance. Since the three-dimensional reconstruction of the real scene not only considers the geometric shape of each instance in the scene but also the style type of the scene, a three-dimensional model with a high degree of similarity to the scene (consistent scene style and similar geometric shape) can be obtained.
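As an illustrative sketch only (not part of the disclosed embodiments), the first network's selection of a style type from a plurality of preset style types can be pictured as a classifier over image features. The preset styles, the average-colour feature and the style centroids below are all invented for illustration; a trained network would learn its own embedding.

```python
# Hypothetical stand-in for the "first network": classify a scene image
# into one of several preset style types by nearest style centroid.

PRESET_STYLES = ["chinese", "modern", "nordic"]

def image_features(pixels):
    """Average RGB of a list of (r, g, b) pixels — a toy substitute for
    the embedding a trained first network would produce."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

# Assumed style centroids (would be learned from data in practice).
STYLE_CENTROIDS = {
    "chinese": (150.0, 80.0, 60.0),    # warm wood tones
    "modern":  (200.0, 200.0, 200.0),  # neutral greys
    "nordic":  (230.0, 225.0, 210.0),  # light, pale palette
}

def classify_style(pixels):
    """Return the preset style type whose centroid is closest to the
    image's feature vector."""
    feat = image_features(pixels)
    def dist(style):
        return sum((a - b) ** 2 for a, b in zip(feat, STYLE_CENTROIDS[style]))
    return min(PRESET_STYLES, key=dist)
```

The key point the sketch shows is that the output is constrained to the preset style vocabulary, which is what lets the later matching step filter the model library by style.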
  • the first three-dimensional model may be a point cloud model or a mesh model.
  • the plurality of second three-dimensional models may include a computer aided design (CAD) model, a Pro/E (a three-dimensional drawing software) model, a SolidWorks (a three-dimensional drawing software) model or a UG (a three-dimensional drawing software) model.
  • the image of the scene may be a key frame image of the scene, wherein the key frame image of the scene is an image required to generate a Mesh model of the scene.
  • the image of the scene above may be a red, green, blue (RGB) image of the scene.
  • the first three-dimensional model of the target instance and the style type of the scene may be input into a second network to determine a target second three-dimensional model of the target instance from a plurality of second three-dimensional models.
  • the embodiment of the present application can determine the style type of the scene through the image of the scene, and then input the style type of the scene and the first three-dimensional model of the target instance in the scene into the second network to match a three-dimensional model that matches the geometric shape of the target instance of the scene and has the same style type as the scene where the target instance is located from multiple second three-dimensional models. Since the three-dimensional reconstruction of the real scene not only considers the geometric shape of each instance in the scene but also the style type of the scene, it is possible to obtain a three-dimensional model with a high degree of similarity to the scene (consistent scene style and similar geometric shape).
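As a minimal sketch of what the "second network" accomplishes (not the patent's actual implementation), matching can be viewed as retrieval from a model library: filter candidates to the scene's style type, then pick the candidate whose geometry descriptor is closest to the reconstructed first model. The dictionary-based library and the bounding-box `dims` descriptor are assumptions made for illustration.

```python
def match_target_model(first_model_dims, scene_style, library):
    """Pick the target second 3D model for an instance.

    first_model_dims: geometry descriptor of the instance's first 3D model
                      (here, a (w, h, d) bounding-box triple — an assumption).
    scene_style:      style type determined from the scene image.
    library:          list of dicts with "style", "dims" and "name" keys.
    """
    # Keep only models whose style type matches the scene.
    candidates = [m for m in library if m["style"] == scene_style]
    # Among those, choose the geometrically closest model.
    def geo_dist(m):
        return sum((a - b) ** 2 for a, b in zip(first_model_dims, m["dims"]))
    return min(candidates, key=geo_dist)
```

The style filter runs before the geometric comparison, which mirrors the patent's requirement that the matched model both fit the instance's shape and share the scene's style type.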
  • a segmentation operation may be performed on the first three-dimensional model of the scene to obtain the first three-dimensional model of the target instance, where the segmentation operation includes semantic segmentation and/or instance segmentation.
  • semantic segmentation assigns a category to each pixel in the image, but does not distinguish between different objects of the same category.
  • Instance segmentation further distinguishes between individual objects of the same category.
  • Semantic segmentation can be used to segment instances of different categories in a scene. For example, semantic segmentation can be used to segment the sofa and the table in a scene.
  • Instance segmentation can be used to segment instances of the same category in a scene.
  • For example, instance segmentation can be used to separate the individual chairs in a scene, such as distinguishing an office chair from a dining chair.
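The distinction between the two segmentation operations can be sketched on toy 1D data (an illustration only; the invented `gap` threshold stands in for whatever spatial reasoning a real segmentation network performs): semantic segmentation groups elements by category, while instance segmentation additionally splits spatially separated clusters of the same category into distinct instances.

```python
def semantic_segment(points):
    """Group labelled points by category only (all chairs together).

    points: list of (coordinate, category_label) pairs.
    """
    out = {}
    for x, label in points:
        out.setdefault(label, []).append(x)
    return out

def instance_segment(points, gap=1.0):
    """Within each category, split spatially separated clusters into
    distinct instances (chair#0, chair#1, ...). `gap` is an assumed
    distance threshold for deciding where one instance ends."""
    instances = {}
    for label, xs in semantic_segment(points).items():
        xs.sort()
        idx, current = 0, [xs[0]]
        for x in xs[1:]:
            if x - current[-1] > gap:
                instances[f"{label}#{idx}"] = current
                idx, current = idx + 1, [x]
            else:
                current.append(x)
        instances[f"{label}#{idx}"] = current
    return instances
```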
  • the method may further include: determining the plurality of second three-dimensional models according to the plurality of three-dimensional models without material information and the images of the plurality of instances, wherein the plurality of instances include at least two instances of different style types.
  • multiple second three-dimensional models determined based on multiple three-dimensional models without material information and multiple instance images may include multiple second three-dimensional models with the same geometric shapes but different style types and multiple second three-dimensional models with the same style type but different geometric shapes.
  • the method may further include: determining the plurality of second three-dimensional models according to the plurality of three-dimensional models without material information, the plurality of instance images and style classification codes, wherein the style classification codes are used to characterize the style types of the plurality of second three-dimensional models determined.
  • a plurality of three-dimensional models without material information, a plurality of instance images and style classification codes may be input into a network for training, which outputs a plurality of second three-dimensional models.
  • when a style classification code is input into the network for training, the obtained three-dimensional models are of the style type corresponding to that style classification code.
  • the network can be trained to predict the corresponding material classification for each morphological part of a model based on the input style code and a three-dimensional model without material (such as a CAD model), and then generate a highly realistic CAD model of the specified style.
  • only the style code of the desired style classification needs to be input to output a three-dimensional model of the specified style with material information. In this way, a large number of three-dimensional models without material information can be assigned materials according to style classification.
  • for example, three-dimensional models of furniture such as sofas, TV cabinets, wardrobes, dining tables and coffee tables without material information may be input, together with images of multiple pieces of furniture and a Chinese-style classification code; three-dimensional models of Chinese-style sofas, TV cabinets, wardrobes, dining tables and coffee tables are then obtained through network training.
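The per-part material assignment described above can be sketched as a simple lookup conditioned on the style code. The lookup table below merely stands in for what the trained network would predict; its style names, part names and material strings are all invented for illustration.

```python
# Hypothetical stand-in for the trained network's per-part material
# prediction, conditioned on a style classification code.
MATERIAL_TABLE = {
    "chinese": {"frame": "dark walnut", "seat": "rattan"},
    "modern":  {"frame": "brushed steel", "seat": "grey fabric"},
}

def assign_materials(cad_parts, style_code):
    """Attach a material to every morphological part of a material-free
    model (e.g. a CAD model) according to the style code."""
    table = MATERIAL_TABLE[style_code]
    return {part: table.get(part, "default") for part in cad_parts}
```

The point of the sketch is the conditioning: the same material-free geometry yields different textured models depending solely on which style code is supplied.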
  • the second three-dimensional model of the scene may be generated according to the first three-dimensional model of the scene and the target second three-dimensional model of the target instance.
  • the first three-dimensional model of each instance in the first three-dimensional model of the scene may be replaced with the target second three-dimensional model of each instance to obtain the second three-dimensional model of the scene.
  • the target position of the target instance can be determined according to the image of the scene, and then the first three-dimensional model of the target instance in the first three-dimensional model of the scene is deleted, and then the target second three-dimensional model of the target instance is set at the target position of the target instance to generate the second three-dimensional model of the scene.
  • the target position is used to indicate the position of the first three-dimensional model of the target instance in the first three-dimensional model of the scene.
  • the method provided in the embodiments of the present application can replace the first 3D model of each instance in the first 3D model of the scene with a target second 3D model that is similar in geometry to the instance and consistent with the scene style type, thereby obtaining a second 3D model of the scene. Since the 3D reconstruction of the real scene considers not only the geometric shape of each instance in the scene but also the style type of the scene, a 3D model with high similarity to the scene (consistent scene style and similar geometric shape) can be obtained.
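The delete-then-set replacement procedure (determine target position, delete the first model, set the target second model at that position) can be sketched as follows. Representing the scene as a dictionary from instance name to a (position, model) pair is an assumption made purely for illustration.

```python
def generate_scene_model(scene_first_model, target_models):
    """Build the second 3D model of the scene by replacing each
    instance's reconstructed (first) model with its matched target
    second model, at the instance's original target position.

    scene_first_model: dict instance_name -> (position, first_model)
    target_models:     dict instance_name -> target_second_model
    """
    scene = dict(scene_first_model)           # work on a copy
    for inst, new_model in target_models.items():
        position, _old = scene[inst]          # determine the target position
        del scene[inst]                       # delete the first model
        scene[inst] = (position, new_model)   # set the target model there
    return scene
```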
  • the method may further include: receiving an editing operation, the editing operation being used to instruct editing of a target instance in the second three-dimensional model of the scene, and in response to the editing operation, editing the target instance in the second three-dimensional model of the scene.
  • the editing operation may include a moving operation, where the moving operation is used to instruct moving a target instance in the second three-dimensional model of the scene.
  • a target second three-dimensional model of a target instance in the second three-dimensional model of the scene may be moved from a first position to a second position.
  • the 3D model obtained by the related technology is a whole, and each object in the 3D model cannot be edited separately.
  • in the model processing method provided in the embodiments of the present application, since the second 3D model of the scene is obtained by combining the 3D models of each instance in the scene, each instance of the second 3D model of the scene can be moved within the second 3D model of the scene through a move operation.
  • the 3D model obtained in the embodiment of the present application is more flexible.
  • the user can select the dining table in the three-dimensional model of the house by touching the screen with a finger, and move the dining table in the three-dimensional model of the house by moving the finger touching the screen.
  • the editing operation may include a deleting operation, where the deleting operation is used to instruct deleting a target instance in the second three-dimensional model of the scene.
  • the target second three-dimensional model of the target instance in the second three-dimensional model of the scene may be deleted from the second three-dimensional model of the scene.
  • the 3D model obtained by the related art is a whole, and each object in the 3D model cannot be edited separately.
  • in the model processing method provided in the embodiments of the present application, since the second 3D model of the scene is obtained by combining the 3D models of each instance in the scene, each instance of the second 3D model of the scene can be deleted separately through the deletion operation.
  • the 3D model obtained by the embodiment of the present application is more flexible.
  • the user can touch the screen with a finger to select the sofa in the three-dimensional model of the house and drag it outside the three-dimensional model, thereby deleting the sofa in the three-dimensional model of the house.
  • the editing operation may include a replacement operation for indicating replacing a target instance in the second three-dimensional model of the scene with a preset instance.
  • the target second three-dimensional model of the target instance in the second three-dimensional model of the scene may be replaced with the target second three-dimensional model of the preset instance.
  • the 3D model obtained by the related technology is a whole, and each object in the 3D model cannot be edited separately.
  • in the model processing method provided in the embodiments of the present application, since the second 3D model of the scene is obtained by combining the 3D models of each instance in the scene, each instance of the second 3D model of the scene can be replaced separately through a replacement operation.
  • the 3D model obtained by the embodiment of the present application is more flexible.
  • the user can select the Chinese-style tea table in the three-dimensional model of the house by touching the screen with his finger, enter the three-dimensional model library by long pressing the screen, and then select a modern-style tea table from the three-dimensional model library to replace the Chinese-style tea table in the three-dimensional scene of the house with a modern-style tea table.
  • the three-dimensional position of the target second three-dimensional model of the target instance may be determined. Then, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene is deleted. Then, the target second three-dimensional model of the preset instance is set at the three-dimensional position of the target second three-dimensional model of the target instance. The three-dimensional position is used to indicate the position of the target second three-dimensional model of the target instance in the second three-dimensional model of the scene.
  • the 3D model obtained by the related technology is a whole, and each object in the 3D model cannot be edited separately.
  • in the replacement operation, the three-dimensional position of the target second three-dimensional model of the target instance is first determined. Then, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene is deleted, and the target second three-dimensional model of the preset instance is set at that three-dimensional position, so that the target instance in the three-dimensional model can be replaced individually.
  • the three-dimensional model obtained in the embodiment of the present application is more flexible.
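Because each instance retains its own model inside the scene's second 3D model, the three editing operations reduce to straightforward per-instance updates. The sketch below uses the same hypothetical dictionary representation of the scene (instance name → (position, model)) assumed earlier; it is an illustration, not the patent's implementation.

```python
def move_instance(scene, inst, new_pos):
    """Move operation: move the instance's model from its first
    position to a second position."""
    _pos, model = scene[inst]
    scene[inst] = (new_pos, model)

def delete_instance(scene, inst):
    """Deletion operation: remove the instance's model from the
    second 3D model of the scene."""
    del scene[inst]

def replace_instance(scene, inst, preset_model):
    """Replacement operation: determine the 3D position, delete the
    target instance's model, and set the preset instance's model at
    that position."""
    pos, _old = scene[inst]
    del scene[inst]
    scene[inst] = (pos, preset_model)
```

Each operation touches only the edited instance, which is exactly why the combined scene model is more flexible than the monolithic model of the related art.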
  • an embodiment of the present application provides another model processing method, the method comprising: receiving an editing operation.
  • the editing operation is used to indicate editing a target instance in a second three-dimensional model of a scene
  • the second three-dimensional model of the scene includes a target second three-dimensional model of the target instance in the scene
  • the target second three-dimensional model of the target instance is determined from a plurality of second three-dimensional models by a first three-dimensional model of the target instance of the scene and an image of the scene
  • the target second three-dimensional model is a three-dimensional model that matches the geometry of the target instance and is of the same style type as the scene where the target instance is located
  • the target instance is any object or background in the scene.
  • the model processing method provided in the embodiment of the present application can obtain a second three-dimensional model of the scene through a three-dimensional model of an instance that is consistent in style and similar in geometry to the instance in the scene (i.e., the object and the background).
  • the model processing method provided in the embodiment of the present application not only considers the geometric shape of each instance in the scene but also considers the style type of the scene when performing three-dimensional reconstruction of the real scene, so that a three-dimensional model with a high degree of similarity to the scene (consistent in scene style and similar in geometry) can be obtained.
  • the editing operation may include a moving operation, where the moving operation is used to instruct moving a target instance in the second three-dimensional model of the scene.
  • a target second three-dimensional model of a target instance in the second three-dimensional model of the scene may be moved from a first position to a second position.
  • the 3D model obtained by the related technology is a whole, and each object in the 3D model cannot be edited separately.
  • in the model processing method provided in the embodiments of the present application, since the second 3D model of the scene is obtained by combining the 3D models of each instance in the scene, each instance of the second 3D model of the scene can be moved within the second 3D model of the scene through a move operation.
  • the 3D model obtained in the embodiment of the present application is more flexible.
  • the user can select the dining table in the three-dimensional model of the house by touching the screen with a finger, and move the dining table in the three-dimensional model of the house by moving the finger touching the screen.
  • the editing operation may include a deleting operation, where the deleting operation is used to instruct deleting a target instance in the second three-dimensional model of the scene.
  • the target second three-dimensional model of the target instance in the second three-dimensional model of the scene may be deleted from the second three-dimensional model of the scene.
  • the 3D model obtained by the related art is a whole, and each object in the 3D model cannot be edited separately.
  • in the model processing method provided in the embodiments of the present application, since the second 3D model of the scene is obtained by combining the 3D models of each instance in the scene, each instance of the second 3D model of the scene can be deleted separately through the deletion operation.
  • the 3D model obtained by the embodiment of the present application is more flexible.
  • the user can touch the screen with a finger to select the sofa in the three-dimensional model of the house and drag it outside the three-dimensional model, thereby deleting the sofa in the three-dimensional model of the house.
  • the editing operation may include a replacement operation for indicating replacing a target instance in the second three-dimensional model of the scene with a preset instance.
  • the target second three-dimensional model of the target instance in the second three-dimensional model of the scene may be replaced with the target second three-dimensional model of the preset instance.
  • the 3D model obtained by the related technology is a whole, and each object in the 3D model cannot be edited separately.
  • in the model processing method provided in the embodiments of the present application, since the second 3D model of the scene is obtained by combining the 3D models of each instance in the scene, each instance of the second 3D model of the scene can be replaced separately through a replacement operation.
  • the 3D model obtained by the embodiment of the present application is more flexible.
  • the user can select the Chinese-style tea table in the three-dimensional model of the house by touching the screen with his finger, enter the three-dimensional model library by long pressing the screen, and then select a modern-style tea table from the three-dimensional model library to replace the Chinese-style tea table in the three-dimensional scene of the house with a modern-style tea table.
  • the three-dimensional position of the target second three-dimensional model of the target instance can be determined. Then, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene is deleted. Then, the target second three-dimensional model of the preset instance is set at the three-dimensional position of the target second three-dimensional model of the target instance. The three-dimensional position is used to indicate the position of the target second three-dimensional model of the target instance in the second three-dimensional model of the scene.
  • the three-dimensional model obtained by the related technology is a whole, and each object in the three-dimensional model cannot be edited separately.
  • in the replacement operation, the three-dimensional position of the target second three-dimensional model of the target instance is first determined; then the target second three-dimensional model of the target instance in the second three-dimensional model of the scene is deleted, and the target second three-dimensional model of the preset instance is set at that three-dimensional position, so that the target instance in the three-dimensional model can be replaced individually.
  • the three-dimensional model obtained in the embodiment of the present application is more flexible.
  • an embodiment of the present application provides a model processing device, which includes: a transceiver unit and a processing unit.
  • the transceiver unit is used to obtain a first three-dimensional model of a target instance in a scene and an image of the scene, wherein the target instance is any object or background in the scene.
  • the processing unit is used to determine a target second three-dimensional model of the target instance from multiple second three-dimensional models based on the first three-dimensional model of the target instance and the image of the scene, wherein the target second three-dimensional model is a three-dimensional model that matches the geometric shape of the target instance and has the same style type as the scene where the target instance is located.
  • the processing unit is specifically used to: determine the style type of the scene based on the image of the scene; and determine the target second three-dimensional model of the target instance from multiple second three-dimensional models based on the first three-dimensional model of the target instance and the style type of the scene.
  • the processing unit is specifically configured to: input the image of the scene into a first network to determine the style type of the scene.
  • the processing unit is specifically configured to: input the first three-dimensional model of the target instance and the style type of the scene into a second network to determine a target second three-dimensional model of the target instance from a plurality of second three-dimensional models.
  • the transceiver unit is specifically configured to: perform a segmentation operation on the first three-dimensional model of the scene to obtain the first three-dimensional model of the target instance, wherein the segmentation operation includes semantic segmentation and/or instance segmentation.
  • the processing unit is further configured to: determine the plurality of second three-dimensional models according to the plurality of three-dimensional models without material information and the images of the plurality of instances, wherein the plurality of instances include at least two instances of different style types.
  • the processing unit is further configured to: generate a second three-dimensional model of the scene according to the first three-dimensional model of the scene and a target second three-dimensional model of the target instance.
  • the processing unit is specifically configured to: determine a target position of the target instance according to the image of the scene, the target position being used to indicate a position of a first three-dimensional model of the target instance in a first three-dimensional model of the scene, delete the first three-dimensional model of the target instance in the first three-dimensional model of the scene, and set a target second three-dimensional model of the target instance at the target position of the target instance to generate a second three-dimensional model of the scene.
  • the transceiver unit is further used to: receive an editing operation, where the editing operation is used to instruct editing of a target instance in the second three-dimensional model of the scene.
  • the processing unit is further configured to edit the target instance in the second three-dimensional model of the scene in response to the editing operation.
  • the editing operation includes a moving operation, and the moving operation is used to instruct to move a target instance in the second three-dimensional model of the scene.
  • the processing unit is specifically configured to: in response to the movement operation, move a target second three-dimensional model of a target instance in a second three-dimensional model of the scene from a first position to a second position.
  • the editing operation includes a deleting operation, and the deleting operation is used to indicate deleting a target instance in the second three-dimensional model of the scene.
  • the processing unit is specifically configured to: in response to a deletion operation, delete the target second three-dimensional model of the target instance in the second three-dimensional model of the scene from the second three-dimensional model of the scene.
  • the editing operation includes a replacement operation, and the replacement operation is used to instruct to replace a target instance in the second three-dimensional model of the scene with a preset instance.
  • the processing unit is specifically configured to: in response to the replacement operation, replace the target second three-dimensional model of the target instance in the second three-dimensional model of the scene with the target second three-dimensional model of the preset instance.
  • the processing unit is specifically used to: determine a three-dimensional position of the target second three-dimensional model of the target instance, where the three-dimensional position is used to indicate a position of the target second three-dimensional model of the target instance in the second three-dimensional model of the scene; delete the target second three-dimensional model of the target instance in the second three-dimensional model of the scene; and set the target second three-dimensional model of the preset instance at the three-dimensional position of the target second three-dimensional model of the target instance.
  • an embodiment of the present application provides another model processing device, which includes: a transceiver unit and a processing unit.
  • the transceiver unit is used to receive an editing operation.
  • the processing unit is used to edit a target instance in the second three-dimensional model of the scene in response to the editing operation.
  • the editing operation is used to indicate editing the target instance in the second three-dimensional model of the scene
  • the second three-dimensional model of the scene includes a target second three-dimensional model of the target instance in the scene
  • the target second three-dimensional model of the target instance is determined from a plurality of second three-dimensional models by a first three-dimensional model of the target instance of the scene and an image of the scene
  • the target second three-dimensional model is a three-dimensional model that matches the geometry of the target instance and has the same style type as the scene where the target instance is located, and the target instance is any object or background in the scene.
  • the editing operation includes a moving operation, and the moving operation is used to instruct to move a target instance in the second three-dimensional model of the scene.
  • the processing unit is specifically configured to: in response to the movement operation, move a target second three-dimensional model of a target instance in a second three-dimensional model of the scene from a first position to a second position.
  • the editing operation includes a deleting operation, and the deleting operation is used to indicate deleting a target instance in the second three-dimensional model of the scene.
  • the processing unit is specifically configured to: in response to a deletion operation, delete the target second three-dimensional model of the target instance in the second three-dimensional model of the scene from the second three-dimensional model of the scene.
  • the editing operation includes a replacement operation, and the replacement operation is used to instruct to replace a target instance in the second three-dimensional model of the scene with a preset instance.
  • the processing unit is specifically configured to: in response to the replacement operation, replace the target second three-dimensional model of the target instance in the second three-dimensional model of the scene with the target second three-dimensional model of the preset instance.
  • the processing unit is specifically configured to: determine a three-dimensional position of a target second three-dimensional model of the target instance, the three-dimensional position being used to indicate a position of the target second three-dimensional model of the target instance in a second three-dimensional model of a scene; delete the target second three-dimensional model of the target instance in the second three-dimensional model of the scene; and set the target second three-dimensional model of the preset instance at the three-dimensional position of the target second three-dimensional model of the target instance.
  • an embodiment of the present application further provides a model processing device, which includes: at least one processor; when the at least one processor executes program code or instructions, the device implements the method described in the above first aspect or any possible implementation thereof.
  • the model processing device may further include at least one memory, and the at least one memory is used to store the program code or instructions.
  • an embodiment of the present application further provides a chip, comprising: an input interface, an output interface, and at least one processor.
  • the chip further comprises a memory.
  • the at least one processor is used to execute the code in the memory, and when the at least one processor executes the code, the chip implements the method described in the first aspect or any possible implementation thereof.
  • the above chip may also be an integrated circuit.
  • an embodiment of the present application further provides a computer-readable storage medium for storing a computer program, wherein the computer program includes instructions for implementing the method described in the above-mentioned first aspect or any possible implementation thereof.
  • an embodiment of the present application further provides a computer program product comprising instructions, which, when executed on a computer, enables the computer to implement the method described in the first aspect or any possible implementation thereof.
  • the model processing device, computer storage medium, computer program product and chip provided in this embodiment are all used to execute the method provided above. Therefore, the beneficial effects that can be achieved can refer to the beneficial effects in the method provided above and will not be repeated here.
  • FIG1 is a schematic diagram of the structure of a model processing system provided in an embodiment of the present application.
  • FIG2 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.
  • FIG3 is a schematic diagram of the structure of a model processing device provided in an embodiment of the present application.
  • FIG4 is a schematic diagram of the structure of another electronic device provided in an embodiment of the present application.
  • FIG5 is a schematic diagram of a flow chart of a model processing method provided in an embodiment of the present application.
  • FIG6 is a schematic diagram of an editing page provided in an embodiment of the present application.
  • FIG7 is a flow chart of another model processing method provided in an embodiment of the present application.
  • FIG8 is a flow chart of another model processing method provided in an embodiment of the present application.
  • FIG9 is a schematic diagram of the structure of another model processing device provided in an embodiment of the present application.
  • FIG10 is a schematic diagram of the structure of another model processing device provided in an embodiment of the present application.
  • FIG11 is a schematic diagram of the structure of a chip provided in an embodiment of the present application.
  • FIG. 12 is a schematic diagram of the structure of another electronic device provided in an embodiment of the present application.
  • "A and/or B" in this document merely describes an association relationship between the associated objects, indicating that three relationships may exist.
  • "A and/or B" can mean: A exists alone, both A and B exist, or B exists alone.
  • first and second and the like in the description and drawings of the embodiments of the present application are used to distinguish different objects, or to distinguish different processing of the same object, rather than to describe a specific order of objects.
  • Three-dimensional reconstruction technology refers to establishing a mathematical model of a three-dimensional object that is suitable for computer representation and processing. It is the basis for processing and operating on the object and analyzing its properties in a computer environment, and a key technology for establishing, in a computer, a virtual reality that expresses the objective world.
  • Three-dimensional reconstruction technology can reconstruct the real scene in three dimensions using the data of the real scene to obtain a three-dimensional model of the real scene.
  • an embodiment of the present application provides a model processing method that can obtain a three-dimensional model with a high degree of similarity to the scene.
  • the method can be applied to a model processing system.
  • Fig. 1 shows a possible existence form of the above model processing system.
  • the above model processing system includes: a model processing device and a plurality of electronic devices.
  • the electronic device is used to determine a first three-dimensional model of a scene based on data collected by a sensor, and transmit the first three-dimensional model of the scene and an image of the scene to a model processing device.
  • the electronic device can collect data such as the posture of the electronic device, the image of the scene (such as a key frame RGB image, a depth map of the scene) through a sensor and use these data as input to reconstruct the Mesh model of the scene and extract the vertex data of the scene in the Mesh model of the scene. Then, the vertex data of the scene is used as input to output the first three-dimensional model of the scene. Finally, the image of the scene and the first three-dimensional model of the scene are uploaded to the model processing device through the network transmission unit.
  • a model processing device is used to execute the model processing method provided in the embodiment of the present application.
  • the electronic device is also used to receive user operations and edit (such as move, delete and replace) instances in the three-dimensional model of the scene according to the user operations.
  • the electronic device may be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA).
  • FIG. 2 shows a possible existence form of the electronic device mentioned above.
  • the electronic device may include: a sensor unit, a computing unit, a storage unit and a network transmission unit.
  • the sensor unit may include: a visual sensor, a depth sensor and other sensors.
  • the visual sensor is used to obtain image information of the scene.
  • the visual sensor may be a camera or other device with a visual acquisition function.
  • the depth sensor is used to obtain depth information of the scene.
  • the depth sensor can be an indirect time-of-flight (iToF) sensor, a direct time-of-flight (dToF) sensor, or another device with a depth acquisition function.
  • the network transmission unit is used for communicating and exchanging data with other devices (such as a model processing device or other electronic devices).
  • the network transmission unit may include a Wireless Fidelity (WiFi) communication unit, a 4th generation mobile communication technology (4G) communication unit, a 5th generation mobile communication technology (5G) communication unit and other communication units.
  • a computing unit is used to run the operating system of the electronic device and use a reconstruction algorithm to reconstruct the scene based on the data collected by the sensor of the electronic device (such as using a real-time Mesh reconstruction algorithm to reconstruct the scene), and use a depth estimation method (such as a monocular depth estimation method) to obtain the depth information of the image based on the image information of the scene.
  • the computing unit may include a central processing unit (CPU), a graphics processing unit (GPU), a cache, and registers.
  • the storage unit is used to store data of the electronic device.
  • the storage unit may include internal storage and external storage.
  • FIG3 shows a possible existence form of the above-mentioned model processing device.
  • the model processing device may include: a computing unit, a storage unit and a network transmission unit.
  • a computing unit is used to execute the model processing method provided in the embodiment of the present application.
  • the network transmission unit is used for communicating and exchanging data with other devices (such as other electronic devices).
  • the storage unit is used to store data of the model processing device.
  • FIG4 shows another possible existence form of the electronic device mentioned above.
  • the electronic device may include: a computing unit, a storage unit, a display unit, an interaction unit and a network transmission unit.
  • the computing unit is used to run the operating system of the electronic device and edit (such as move, delete and replace) the three-dimensional model of the scene according to the user operation (instruction).
  • the storage unit is used to store data of the electronic device.
  • the network transmission unit is used for communicating and exchanging data with other devices (such as a model processing device or other electronic devices).
  • the display unit is used for visual display.
  • the display unit may be a display screen.
  • the interaction unit is used to receive user instructions.
  • the interaction unit may receive the user instruction through the interaction operation device.
  • the above-mentioned interactive operation device may include a device with interactive operation function such as a mouse, a keyboard or a touch screen.
  • FIG5 shows a model processing method provided by an embodiment of the present application.
  • the method can be executed by a model processing device in the above-mentioned model processing system. As shown in FIG5 , the method includes:
  • S501 Acquire a first three-dimensional model of a target instance in a scene and an image of the scene.
  • the target instance is any object or background in the scene.
  • the model processing device may receive a first three-dimensional model of a target instance in a scene and an image of the scene sent by an electronic device.
  • the model processing device can collect the posture of the model processing device, the image of the scene (such as the key frame RGB image of the scene), the depth map, etc. as input through the sensor unit, and output the Mesh model of the scene. Then, the computing unit outputs the first three-dimensional model of the target instance in the scene according to the vertex data of the Mesh model of the scene. Thus, the first three-dimensional model of the target instance in the scene and the image of the scene are obtained.
  • the above-mentioned model processing device can be a mobile terminal.
  • the embodiment of the present application requires only a mobile terminal, without more complicated operations, to obtain the first three-dimensional model of the scene or of the target instance in the scene.
  • the vertex data of the Mesh model is used to obtain the first three-dimensional model of the scene.
  • It can effectively solve the problems of incomplete reconstruction of weak texture areas by traditional algorithms and missing reconstructed objects due to incomplete scanning areas.
  • the first three-dimensional model may be a point cloud model or a mesh model.
  • the image of the scene may be a key frame image of the scene, wherein the key frame image of the scene is an image required to generate a Mesh model of the scene.
  • the image of the scene may be an RGB image of the scene.
  • a segmentation operation may be performed on the first three-dimensional model of the scene to obtain a first three-dimensional model of the target instance in the scene, and the segmentation operation includes semantic segmentation and/or instance segmentation.
  • semantic segmentation assigns a category to each pixel in the image, but does not distinguish between objects in the same category.
  • Instance segmentation classifies objects in the same category.
  • Semantic segmentation can be used to segment instances of different categories in a scene. For example, semantic segmentation can be used to segment the sofa and table in a scene.
  • Instance segmentation can be used to segment instances of the same category in a scene.
  • for example, instance segmentation can be used to segment chairs of the same category in a scene, such as dividing the chairs into office chairs and dining chairs.
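In practice the segmentation would be produced by a trained network; the following toy sketch only illustrates the difference between the two kinds of label maps described above (the pixel values and class ids are invented):

```python
# Toy illustration of semantic vs. instance segmentation labels.
# Each entry labels one pixel of a 2x4 image; 0 = floor, 1/2 = chairs.

semantic_map = [  # one class id per pixel: both chairs share class 1
    [1, 1, 0, 1],
    [1, 1, 0, 1],
]
instance_map = [  # one instance id per pixel: the two chairs are told apart
    [1, 1, 0, 2],
    [1, 1, 0, 2],
]

def pixels_of(label_map, label):
    """Return the set of (row, col) positions carrying a given label."""
    return {(r, c) for r, row in enumerate(label_map)
            for c, v in enumerate(row) if v == label}

# Semantic segmentation groups both chairs under class 1 ...
assert len(pixels_of(semantic_map, 1)) == 6
# ... while instance segmentation keeps them apart as instances 1 and 2.
assert pixels_of(instance_map, 1) != pixels_of(instance_map, 2)
print(len(pixels_of(instance_map, 1)))  # 4 pixels belong to the first chair
```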
  • S502 Determine a target second three-dimensional model of the target instance from a plurality of second three-dimensional models according to the first three-dimensional model of the target instance and the image of the scene.
  • the target second three-dimensional model of the target instance is a three-dimensional model that matches the geometric shape of the target instance and has the same style type as the scene where the above instance is located.
  • the target second three-dimensional model of the target instance may also be a three-dimensional model that matches the geometric shape of the target instance and has the same style type and the same item category as the scene where the above instance is located.
  • the style type of the scene may be determined according to the image of the scene, and then a target second 3D model of the target instance is determined from a plurality of second 3D models according to the first 3D model of the target instance and the style type of the scene.
  • the embodiment of the present application can determine the style type of the scene through the image of the scene, and then determine a three-dimensional model that matches the geometric shape of the scene target instance and has the same style type as the scene where the target instance is located from multiple second three-dimensional models according to the style type of the scene and the first three-dimensional model of the target instance. Since the three-dimensional reconstruction of the real scene not only considers the geometric shape of each instance in the scene but also the style type of the scene, a three-dimensional model with a high degree of similarity to the scene (consistent scene style and similar geometric shape) can be obtained.
  • the above style types may include classical style, modern style, business style, Chinese style, Nordic style, Japanese style, etc.
  • the plurality of second three-dimensional models may include CAD models, Pro/E models, SolidWorks models or UG models.
  • the target second three-dimensional model of the target instance may be determined from a plurality of CAD models according to the style type of the scene and the first three-dimensional model of the target instance.
  • the style type of the scene can be determined as Chinese style based on the image of the scene. Then, the geometric shape of the target instance is determined based on the first three-dimensional model of the target instance. Then, based on the style type of the scene and the geometric shape of the target instance, a second three-dimensional model of Chinese style matching the geometric shape of the target instance is matched in a model library containing multiple second three-dimensional models.
  • the image of the scene may be input into a first network to determine the style type of the scene.
  • a first network using a ResNext (an image classification network architecture) backbone can take an image of the scene as input and output the style classification of the scene.
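The document's first network is a trained ResNext classifier; as a framework-free stand-in that only illustrates the input/output contract (image feature in, style label out), the sketch below uses a nearest-centroid classifier over a hand-made color histogram. The style names and centroid values are invented for illustration:

```python
import math

# Stand-in for the first network (ResNext backbone in the document):
# maps an image feature vector to a style label. The "feature" here is a
# made-up 3-bin color histogram and the centroids are invented values.
STYLE_CENTROIDS = {
    "Chinese style":  [0.6, 0.3, 0.1],
    "business style": [0.2, 0.3, 0.5],
    "Nordic style":   [0.3, 0.5, 0.2],
}

def classify_style(feature):
    """Return the style whose centroid is nearest (Euclidean) to feature."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(STYLE_CENTROIDS, key=lambda s: dist(STYLE_CENTROIDS[s], feature))

print(classify_style([0.25, 0.35, 0.45]))  # closest to the business centroid
```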
  • the specific method for determining the target second three-dimensional model of the target instance from multiple second three-dimensional models according to the first three-dimensional model of the target instance and the style type of the scene can be processed by any method that can be thought of by a person skilled in the art, and the embodiment of the present application does not specifically limit this.
  • the first three-dimensional model of the target instance and the style type of the scene can be input into the second network to determine the target second three-dimensional model of the target instance from multiple second three-dimensional models.
  • taking a target instance in a scene as an example, the following describes how to input the first 3D model of the target instance and the style type of the scene into a second network to determine a target second 3D model of the target instance from multiple second 3D models.
  • the first 3D model of the target instance is input into the geometric encoding network (such as Point Autoencoder) in the second network to obtain the geometric encoding of the target instance.
  • for each of the multiple second 3D models, the Euclidean distance between its geometric encoding and the geometric encoding of the target instance is calculated, and the N (such as 10) second 3D models with the smallest distance are taken as replacement-candidate second 3D models.
  • each replacement candidate second 3D model is projected onto the 2D image to obtain the corresponding front view, top view, and left view.
  • for each view, the intersection over union (IoU) with the semantic segmentation map containing the target instance is calculated, and the second 3D model with the highest mean IoU over the views is determined as the target second 3D model of the target instance.
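The two-stage retrieval above (keep the N candidates with the nearest geometric encoding, then rank them by multi-view IoU) can be sketched as follows. The geometry codes, view masks, and candidate names are invented toy data, and a single view stands in for the front/top/left projections:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mask_iou(a, b):
    """IoU of two binary masks given as sets of pixel coordinates."""
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(target_code, target_views, candidates, n=2):
    """candidates: {name: (geometry_code, [view masks])}.

    Stage 1: keep the n candidates whose geometry code is nearest to the
    target's. Stage 2: among those, return the one whose projected views
    have the highest mean IoU with the target's segmentation views.
    """
    nearest = sorted(candidates,
                     key=lambda c: euclidean(candidates[c][0], target_code))[:n]
    def mean_iou(c):
        views = candidates[c][1]
        return sum(mask_iou(v, t) for v, t in zip(views, target_views)) / len(views)
    return max(nearest, key=mean_iou)

# Invented toy data: 2-D geometry codes and single-view "masks".
target_code = [0.0, 0.0]
target_views = [{(0, 0), (0, 1)}]
candidates = {
    "chair_a": ([0.1, 0.0], [{(0, 0), (0, 1)}]),   # near and overlapping
    "chair_b": ([0.2, 0.1], [{(5, 5)}]),           # near but no overlap
    "table_c": ([9.0, 9.0], [{(0, 0), (0, 1)}]),   # overlapping but far away
}
print(retrieve(target_code, target_views, candidates))  # chair_a
```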
  • the model processing method is based on a style-consistent three-dimensional model retrieval and matching algorithm, which can overcome the shortcoming that different instances (such as furniture) in the reconstructed scene have inconsistent styles, making the reconstructed scene model poorly coordinated.
  • for example, the overall scene style may be determined to be business style through scene style matching.
  • in that case, a traditional-style chair will be replaced with a business-style one, ensuring the style consistency of the reconstructed scene.
  • the model processing method provided in the embodiment of the present application can match a target second three-dimensional model with the same style and similar geometry as the object or background from multiple three-dimensional models through the image of the scene and the three-dimensional model of any object or background in the scene.
  • the model processing method provided in the embodiment of the present application not only considers the geometric shape of each instance in the scene but also considers the style type of the scene when performing three-dimensional reconstruction of the real scene, so as to obtain a three-dimensional model with high similarity to the scene (consistent scene style and similar geometry).
  • the method provided in the embodiment of the present application may further include:
  • S503 Generate a second three-dimensional model of the scene according to the first three-dimensional model of the scene and the target second three-dimensional model of the target instance.
  • the first three-dimensional model of each instance in the first three-dimensional model of the scene may be replaced with the target second three-dimensional model of each instance to obtain the second three-dimensional model of the scene.
  • the target position of the target instance can be determined according to the image of the scene, and then the first three-dimensional model of the target instance in the first three-dimensional model of the scene is deleted, and then the target second three-dimensional model of the target instance is set at the target position of the target instance to generate the second three-dimensional model of the scene.
  • the target position is used to indicate the position of the first three-dimensional model of the target instance in the first three-dimensional model of the scene.
  • the method provided in the embodiment of the present application can replace the first three-dimensional model of each instance in the first three-dimensional model of the above scene with a target second three-dimensional model that is similar to the instance geometry and consistent with the scene style type. Then, the second three-dimensional model of the scene is obtained through the three-dimensional model of the instance. Since the three-dimensional reconstruction of the real scene not only considers the geometry of each instance in the scene but also the style type of the scene, a three-dimensional model with a high degree of similarity to the scene (consistent scene style and similar geometry) can be obtained.
  • the specific method for determining the target position of the above-mentioned target instance based on the image of the above-mentioned scene can be processed by any method that can be thought of by a person skilled in the art, and the embodiments of the present application do not specifically limit this.
  • a Canonical Voting algorithm can be used to obtain the oriented bounding box of the first three-dimensional model of each instance; then the pose of the target instance in the camera coordinate system is determined based on the image of the scene (a key frame RGB image of the scene), and that pose is converted into a pose in the world coordinate system.
  • the target instance can be back-projected to find the corresponding predicted oriented bounding box to determine the target position of the target instance.
  • the target second three-dimensional model of the target instance can be resized to be similar to the size of the bounding box of the instance, and the point cloud in the bounding box in the first three-dimensional model of the scene can be erased, and the model of the target instance can be placed in the first three-dimensional model of the scene according to the bounding box.
  • Each instance in the first three-dimensional model of the scene is processed as above, and a reconstructed, newly combined scene model (i.e., the second three-dimensional model of the scene) can be output.
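The placement step above (erase the instance's points inside its bounding box, then drop in the retrieved model at that box) might look like the following sketch. For simplicity the box is axis-aligned and the model is merely translated, whereas the document uses oriented bounding boxes and also rescales the model:

```python
def inside(p, box):
    """box = (min_corner, max_corner), axis-aligned for simplicity."""
    lo, hi = box
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))

def place_instance(scene_points, model_points, box):
    """Erase scene points inside the instance's box, then drop in the
    retrieved model, shifted so its origin sits at the box's min corner.
    (A real implementation would also rescale/rotate to the oriented box.)"""
    lo, _ = box
    kept = [p for p in scene_points if not inside(p, box)]
    placed = [tuple(x + o for x, o in zip(p, lo)) for p in model_points]
    return kept + placed

scene = [(0.5, 0.5, 0.5), (5.0, 5.0, 5.0)]   # one point in the box, one out
box = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
model = [(0.2, 0.2, 0.2)]                    # retrieved CAD-style model point
print(place_instance(scene, model, box))     # [(5.0, 5.0, 5.0), (0.2, 0.2, 0.2)]
```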
  • the method may further include:
  • S504 Determine the plurality of second three-dimensional models according to the plurality of three-dimensional models without material information and the plurality of instance images.
  • the multiple instances mentioned above include at least two instances of different style types.
  • the three-dimensional models of a Chinese-style sofa, a business-style sofa, a Chinese-style tea table, and a business-style tea table can be obtained based on the three-dimensional models of a sofa and a tea table without material information and the images of Chinese-style and business-style instances.
  • multiple second three-dimensional models determined based on multiple three-dimensional models without material information and multiple instance images may include multiple second three-dimensional models with the same geometric shapes but different style types and multiple second three-dimensional models with the same style type but different geometric shapes.
  • the plurality of second three-dimensional models may be determined based on a plurality of three-dimensional models without material information, a plurality of instance images, and style classification codes.
  • the style classification codes are used to characterize the style types of the plurality of second three-dimensional models to be determined.
  • a plurality of three-dimensional models without material information, a plurality of instance images and style classification codes may be input into a network training model to output a plurality of second three-dimensional models.
  • if a style classification code is input into the network training, the obtained 3D model is a 3D model of the style type corresponding to that style classification code.
  • the network can be trained to predict the corresponding material classification for each morphological part of the model based on the input style code and a 3D model without material (such as a CAD model), and then generate a highly realistic CAD model of a specified style.
  • only a 3D model without material information and the style code of the desired style classification need to be input to output a 3D model of the specified style with material information. In this way, a large number of 3D models without material information can be assigned materials according to style classification.
  • for example, one can input 3D models of furniture such as sofas, TV cabinets, wardrobes, dining tables and coffee tables without material information, together with images of multiple pieces of furniture and a Chinese-style classification code, and then obtain 3D models of Chinese-style sofas, TV cabinets, wardrobes, dining tables and coffee tables through network training.
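The material-assignment network itself requires training; the stub below only illustrates its input/output contract — (model parts, style code) in, a material per part out — with a lookup table standing in for the trained predictor. The part names, style codes, and materials are all invented:

```python
# Stub for the trained material predictor: maps (part, style code) to a
# material class. Part names, style codes and materials are invented.
MATERIAL_TABLE = {
    ("seat", "chinese"): "rosewood",
    ("seat", "business"): "black leather",
    ("leg", "chinese"): "lacquered wood",
    ("leg", "business"): "brushed steel",
}

def assign_materials(parts, style_code):
    """Return {part: material} for an untextured model and a style code."""
    return {part: MATERIAL_TABLE[(part, style_code)] for part in parts}

print(assign_materials(["seat", "leg"], "business"))
```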
  • the above-mentioned editing operation is used to indicate editing the target instance in the second three-dimensional model of the scene.
  • the editing operation may include a moving operation, where the moving operation is used to instruct moving a target instance in the second three-dimensional model of the scene.
  • a target second three-dimensional model of a target instance in the second three-dimensional model of the scene may be moved from a first position to a second position.
  • the 3D model obtained by the related technology is a whole, and each object in the 3D model cannot be edited separately.
  • the model processing method provided in the embodiment of the present application since the second 3D model of the scene is obtained by combining the 3D models of each instance in the scene, each instance of the second 3D model of the scene can be moved in the second 3D model of the scene through a move operation.
  • the 3D model obtained in the embodiment of the present application is more flexible.
  • the user can select the dining table in the three-dimensional model of the house by touching the screen with a finger, and move the dining table in the three-dimensional model of the house by moving the finger touching the screen.
  • the user can select the dining table in the three-dimensional model of the house with the mouse, and move the dining table in the three-dimensional model of the house with the mouse.
  • the user can modify the three-dimensional coordinates (x, y, z) of the table through the keyboard to move table 1 in the second three-dimensional model of the scene.
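Because the combined scene model keeps a separate model per instance, a move operation reduces to translating one instance's geometry. The sketch below assumes a hypothetical scene structure of per-instance point lists:

```python
# The combined scene model is stored here as {instance name: point list},
# a hypothetical structure; a move edit translates one instance's points.
scene = {
    "table_1": [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
    "sofa_1":  [(4.0, 0.0, 0.0)],
}

def move_instance(scene, name, offset):
    """Translate every point of one instance by offset (dx, dy, dz)."""
    scene[name] = [tuple(x + d for x, d in zip(p, offset))
                   for p in scene[name]]

move_instance(scene, "table_1", (0.5, 0.0, 0.0))
print(scene["table_1"])  # [(1.5, 0.0, 0.0), (1.5, 1.0, 0.0)]
assert scene["sofa_1"] == [(4.0, 0.0, 0.0)]  # other instances untouched
```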
  • the editing operation may include a deleting operation, where the deleting operation is used to instruct deleting a target instance in the second three-dimensional model of the scene.
  • the target second three-dimensional model of the target instance in the second three-dimensional model of the scene may be deleted from the second three-dimensional model of the scene.
  • the 3D model obtained by the related art is a whole, and each object in the 3D model cannot be edited separately.
  • the model processing method provided in the embodiment of the present application since the second 3D model of the scene is obtained by combining the 3D models of each instance in the scene, each instance of the second 3D model of the scene can be deleted separately through the deletion operation.
  • the 3D model obtained by the embodiment of the present application is more flexible.
  • the user can touch the screen with a finger to select the sofa in the three-dimensional model of the house and drag it outside the three-dimensional model, thereby deleting the sofa in the three-dimensional model of the house.
  • the user can delete the sofa 1 in the second three-dimensional model of the scene by clicking the delete symbol “X” on the right side of the screen with the mouse.
  • the editing operation may include a replacement operation for indicating replacing a target instance in the second three-dimensional model of the scene with a preset instance.
  • the target second three-dimensional model of the target instance in the second three-dimensional model of the scene may be replaced with the target second three-dimensional model of the preset instance.
  • the 3D model obtained by the related technology is a whole, and each object in the 3D model cannot be edited separately.
  • the model processing method provided in the embodiment of the present application since the second 3D model of the scene is obtained by combining the 3D models of each instance in the scene, each instance of the second 3D model of the scene can be replaced separately through a replacement operation.
  • the 3D model obtained by the embodiment of the present application is more flexible.
  • the user can select the Chinese-style tea table in the three-dimensional model of the house by touching the screen with a finger, enter the three-dimensional model library by long-pressing the screen, and then select a modern-style tea table from the library to replace the Chinese-style tea table in the three-dimensional scene of the house.
  • the three-dimensional position of the target second three-dimensional model of the target instance may be determined, where the three-dimensional position indicates the position of the target second three-dimensional model of the target instance in the second three-dimensional model of the scene. The target second three-dimensional model of the target instance is then deleted from the second three-dimensional model of the scene, and the target second three-dimensional model of the preset instance is set at that three-dimensional position.
  • the three-dimensional model obtained by the related technology is a whole, and each object in the three-dimensional model cannot be edited separately.
  • for each instance in the second three-dimensional model of the scene, the replacement operation may determine the three-dimensional position of the target second three-dimensional model of the target instance, delete that model from the second three-dimensional model of the scene, and then set the target second three-dimensional model of the preset instance at that three-dimensional position, so that the target instance is replaced separately in the three-dimensional model.
  • the three-dimensional model obtained in the embodiment of the present application is more flexible.
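The three-step replace flow described above (determine the 3D position, delete the target instance's model, set the preset instance's model at that position) can be sketched as follows; the data layout and names are hypothetical:

```python
def replace_instance(scene, instance_id, preset_model):
    """Replace one instance's model while preserving its placement."""
    position = scene[instance_id]["position"]     # step 1: determine the 3D position
    del scene[instance_id]                        # step 2: delete the target model
    scene[instance_id] = {**preset_model,         # step 3: place the preset model
                          "position": position}   #         at the recorded position
    return scene

# The tea-table example: Chinese style swapped for modern style in place.
scene = {"tea_table": {"mesh": "chinese_tea_table.obj", "position": (0.5, 0.0, 1.0)}}
replace_instance(scene, "tea_table", {"mesh": "modern_tea_table.obj"})
```

Recording the position before deletion is what keeps the preset instance exactly where the replaced instance stood.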
  • FIG. 7 shows another model processing method provided by an embodiment of the present application.
  • the method may be executed by an electronic device in the above-mentioned model processing system. As shown in FIG. 7 , the method includes:
  • S701 Receive an editing operation.
  • the editing operation is used to instruct editing of a target instance in the second three-dimensional model of the scene.
  • the editing operation may include a moving operation, where the moving operation is used to instruct moving a target instance in the second three-dimensional model of the scene.
  • the editing operation may include a deleting operation, where the deleting operation is used to instruct deleting a target instance in the second three-dimensional model of the scene.
  • the editing operation may include a replacement operation for indicating replacing a target instance in the second three-dimensional model of the scene with a preset instance.
  • a target second three-dimensional model of a target instance in the second three-dimensional model of the scene may be moved from a first position to a second position.
  • the 3D model obtained by the related technology is a whole, and each object in the 3D model cannot be edited separately.
  • in the model processing method provided in the embodiments of the present application, since the second 3D model of the scene is obtained by combining the 3D models of the instances in the scene, each instance in the second 3D model of the scene can be moved within the second 3D model of the scene through a move operation.
  • the 3D model obtained in the embodiment of the present application is more flexible.
  • the user can select the dining table in the three-dimensional model of the house by touching the screen with a finger, and move the dining table in the three-dimensional model of the house by moving the finger touching the screen.
  • the user can select the dining table in the three-dimensional model of the house with the mouse, and move the dining table in the three-dimensional model of the house with the mouse.
  • the user can modify the three-dimensional coordinates (x, y, z) of the table through the keyboard to move table 1 in the second three-dimensional model of the scene.
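Moving an instance then amounts to updating its (x, y, z) coordinates, as in the keyboard example above. An illustrative sketch under the same assumed per-instance layout (names are hypothetical):

```python
def move_instance(scene, instance_id, new_position):
    """Move: update the instance model's (x, y, z) from its first
    position to the second position."""
    scene[instance_id]["position"] = tuple(new_position)
    return scene

# Touch-drag, mouse-drag, and typed coordinates all reduce to this update.
scene = {"table_1": {"mesh": "table.obj", "position": (3.0, 0.0, 1.5)}}
move_instance(scene, "table_1", (4.0, 0.0, 2.0))
```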
  • the target second three-dimensional model of the target instance in the second three-dimensional model of the scene may be deleted from the second three-dimensional model of the scene.
  • the 3D model obtained by the related art is a whole, and each object in the 3D model cannot be edited separately.
  • in the model processing method provided in the embodiments of the present application, since the second 3D model of the scene is obtained by combining the 3D models of the instances in the scene, each instance in the second 3D model of the scene can be deleted separately through the deletion operation.
  • the 3D model obtained by the embodiment of the present application is more flexible.
  • the user can touch the screen with a finger to select the sofa in the three-dimensional model of the house and drag it outside the three-dimensional model, thereby deleting the sofa in the three-dimensional model of the house.
  • the user can delete the sofa 1 in the second three-dimensional model of the scene by clicking the delete symbol “X” on the right side of the screen with the mouse.
  • the target second three-dimensional model of the target instance in the second three-dimensional model of the scene may be replaced with the target second three-dimensional model of the preset instance.
  • the 3D model obtained by the related technology is a whole, and each object in the 3D model cannot be edited separately.
  • in the model processing method provided in the embodiments of the present application, since the second 3D model of the scene is obtained by combining the 3D models of the instances in the scene, each instance in the second 3D model of the scene can be replaced separately through a replacement operation.
  • the 3D model obtained by the embodiment of the present application is more flexible.
  • the user can select the Chinese-style tea table in the three-dimensional model of the house by touching the screen with a finger, enter the three-dimensional model library by long-pressing the screen, and then select a modern-style tea table from the library to replace the Chinese-style tea table in the three-dimensional scene of the house.
  • the three-dimensional position of the target second three-dimensional model of the target instance may be determined, where the three-dimensional position indicates the position of the target second three-dimensional model of the target instance in the second three-dimensional model of the scene. The target second three-dimensional model of the target instance is then deleted from the second three-dimensional model of the scene, and the target second three-dimensional model of the preset instance is set at that three-dimensional position.
  • the three-dimensional model obtained by the related technology is a whole, and each object in the three-dimensional model cannot be edited separately.
  • for each instance in the second three-dimensional model of the scene, the replacement operation may determine the three-dimensional position of the target second three-dimensional model of the target instance, delete that model from the second three-dimensional model of the scene, and then set the target second three-dimensional model of the preset instance at that three-dimensional position, so that the target instance is replaced separately in the three-dimensional model.
  • the three-dimensional model obtained in the embodiment of the present application is more flexible.
  • the method may further include:
  • the electronic device may receive the second three-dimensional model of the scene sent by other devices (such as a model processing apparatus or other devices).
  • the electronic device may download the second three-dimensional model of the scene from a server, wherein the server is used to store the second three-dimensional model of the scene generated by the model processing device.
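Downloading the scene's second three-dimensional model from the server could, for example, be a plain HTTP request. The sketch below only constructs the request; the endpoint path and server name are purely illustrative assumptions:

```python
from urllib.request import Request

def build_model_download_request(server, scene_id):
    """Construct (but do not send) a download request for the scene's
    second 3D model stored on the server. The URL scheme is hypothetical."""
    url = f"https://{server}/models/{scene_id}/second"
    return Request(url, method="GET")

req = build_model_download_request("example-model-server", "scene_42")
```

In the patent's terms, sending such a request and receiving the response would both go through the electronic device's network transmission unit.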
  • FIG. 8 shows another model processing method provided in an embodiment of the present application.
  • the method is applicable to the above-mentioned model processing system. As shown in FIG. 8, the method includes:
  • a first electronic device sends a first three-dimensional model of a target instance in a scene and an image of the scene to a model processing device.
  • the model processing device receives the first three-dimensional model of the target instance in the scene and the image of the scene sent by the first electronic device.
  • the first electronic device can collect, through its sensor unit, the pose of the electronic device, images of the scene (such as key-frame RGB images of the scene), depth maps, and the like as input, and output a Mesh model of the scene. The computing unit then outputs the first three-dimensional model of the target instance in the scene according to the vertex data of the Mesh model of the scene, thereby obtaining the first three-dimensional model of the target instance in the scene and the image of the scene. The first three-dimensional model of the target instance in the scene and the image of the scene are then sent to the model processing device through the network transmission unit.
  • for the specific implementation of S801, reference may be made to S501 in the above model processing method; details are not repeated here.
  • the model processing device determines a target second three-dimensional model of the target instance in the scene from multiple second three-dimensional models according to the first three-dimensional model of the target instance in the scene and the image of the scene.
  • the model processing device may determine the style type of the scene according to the image of the scene, and then determine the target second 3D model of the target instance from multiple second 3D models according to the first 3D model of the target instance and the style type of the scene.
  • for the specific implementation of S802, reference may be made to S502 in the above model processing method; details are not repeated here.
  • the model processing device generates a second three-dimensional model of the scene according to the first three-dimensional model of the scene and the target second three-dimensional model of the target instance.
  • the model processing device may determine the target position of each instance based on the image of the scene, delete the first three-dimensional model of each instance from the first three-dimensional model of the scene, and set the target second three-dimensional model of each instance at that position. The target position indicates the position of the instance's first three-dimensional model in the first three-dimensional model of the scene.
  • for the specific implementation of S803, reference may be made to S502 in the above model processing method; details are not repeated here.
  • the model processing device sends a second three-dimensional model of the scene to the second electronic device.
  • the second electronic device downloads the second three-dimensional model of the scene from the model processing device.
  • the second electronic device may send a download request for the second three-dimensional model of the scene to the model processing device through the network transmission unit and receive the second three-dimensional model of the scene sent by the model processing device through the network transmission unit.
  • for the specific implementation of S804, reference may be made to S701 in the above model processing method; details are not repeated here.
  • the second electronic device receives an editing operation, and edits the target instance in the second three-dimensional model of the scene in response to the editing operation.
  • the second electronic device may receive a move operation, and in response to the move operation, move the target second three-dimensional model of the target instance in the second three-dimensional model of the scene from the first position to the second position, wherein the move operation is used to indicate moving the target instance in the second three-dimensional model of the scene.
  • the second electronic device may receive a deletion operation, and in response to the deletion operation, delete the target second three-dimensional model of the target instance in the second three-dimensional model of the scene from the second three-dimensional model of the scene.
  • the deletion operation is used to indicate the deletion of the target instance in the second three-dimensional model of the scene.
  • the second electronic device may receive a replacement operation, and in response to the replacement operation, replace the target second three-dimensional model of the target instance in the second three-dimensional model of the scene with the target second three-dimensional model of the preset instance.
  • for the specific implementation of S805, reference may be made to S704 in the above model processing method; details are not repeated here.
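The S801–S805 exchange can be summarized as a pipeline of stubs: capture the scene, match target models, compose the second scene model, then transfer it for editing. Everything below (function names, data layout, values) is an illustrative skeleton, not the patent's implementation:

```python
def capture_scene():
    """S801 (first electronic device): collect pose, key-frame RGB images
    and depth maps; produce the per-instance first 3D models and images."""
    first_model = {"tea_table": {"geometry": "rough_mesh", "position": (0.5, 0.0, 1.0)}}
    images = ["keyframe_0.png"]
    return first_model, images

def match_target_models(first_model, images, library):
    """S802 (model processing device): pick, per instance, a library model
    matching the instance geometry and the scene's style type (stubbed)."""
    return {name: library[name] for name in first_model}

def compose_scene(first_model, targets):
    """S803: replace each instance's first model with its target second
    model, keeping each instance at its original position."""
    return {name: {**targets[name], "position": inst["position"]}
            for name, inst in first_model.items()}

library = {"tea_table": {"mesh": "modern_tea_table.obj"}}
first_model, images = capture_scene()
targets = match_target_models(first_model, images, library)
second_model = compose_scene(first_model, targets)
# S804/S805: second_model is then sent to the second electronic device,
# which edits it in response to move / delete / replace operations.
```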
  • the model processing device for executing the above-mentioned model processing method will be introduced below in conjunction with FIG. 9 .
  • the model processing device includes hardware and/or software modules corresponding to the execution of each function.
  • the embodiments of the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed in the form of hardware or computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application in combination with the embodiments, but such implementation should not be considered to exceed the scope of the embodiments of the present application.
  • the embodiment of the present application can divide the model processing device into functional modules according to the above method example.
  • each functional module can be divided according to each function, or two or more functions can be integrated into one processing module.
  • the above integrated module can be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and is only a logical function division. There may be other division methods in actual implementation.
  • FIG. 9 shows a possible composition diagram of the model processing device involved in the above embodiments.
  • the model processing device 900 may include: a transceiver unit 901 and a processing unit 902 .
  • Processing unit 902 is used to determine a target second three-dimensional model of the target instance from multiple second three-dimensional models based on the first three-dimensional model of the target instance and the image of the scene, wherein the target second three-dimensional model is a three-dimensional model that matches the geometric shape of the target instance and has the same style type as the scene where the target instance is located.
  • the target instance is any object or background in the scene.
  • the processing unit 902 is specifically used to: determine the style type of the scene based on the image of the scene; determine the target second three-dimensional model of the target instance from multiple second three-dimensional models based on the first three-dimensional model of the target instance and the style type of the scene.
  • the processing unit 902 is specifically configured to: input the image of the scene into a first network to determine the style type of the scene.
  • the processing unit 902 is specifically configured to: input the first three-dimensional model of the target instance and the style type of the scene into a second network to determine a target second three-dimensional model of the target instance from multiple second three-dimensional models.
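The two-network retrieval above can be pictured as a style classifier (the first network) followed by a style-filtered geometric nearest-neighbor search (the second network). The stand-in functions below replace the trained networks with trivial heuristics purely for illustration; none of the names or scoring rules come from the patent:

```python
def first_network(scene_image):
    """Stand-in for the style-classification network: map a scene image
    to a style label (a real system would run a trained classifier)."""
    return "modern" if "modern" in scene_image else "chinese"

def second_network(instance_model, style, candidates):
    """Stand-in for the retrieval network: among second 3D models of the
    scene's style, pick the one whose geometry descriptor is closest."""
    same_style = [c for c in candidates if c["style"] == style]
    return min(same_style,
               key=lambda c: abs(c["size"] - instance_model["size"]))

candidates = [
    {"id": "sofa_a", "style": "modern", "size": 2.1},
    {"id": "sofa_b", "style": "modern", "size": 1.4},
    {"id": "sofa_c", "style": "chinese", "size": 2.0},
]
style = first_network("modern_room.png")
target = second_network({"size": 2.0}, style, candidates)
```

The key property, matching the claim language, is that the returned target second 3D model both fits the target instance's geometry and shares the scene's style type.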
  • the transceiver unit 901 is specifically used to: perform a segmentation operation on the first three-dimensional model of the scene to obtain the first three-dimensional model of the target instance, where the segmentation operation includes semantic segmentation and/or instance segmentation.
  • the processing unit 902 is further configured to determine the plurality of second three-dimensional models according to the plurality of three-dimensional models without material information and the images of the plurality of instances, wherein the plurality of instances include at least two instances of different style types.
  • the processing unit 902 is further configured to generate a second three-dimensional model of the scene according to the first three-dimensional model of the scene and the target second three-dimensional model of the target instance.
  • the processing unit 902 is specifically used to: determine a target position of the target instance according to the image of the scene, the target position being used to indicate a position of a first three-dimensional model of the target instance in the first three-dimensional model of the scene. Delete the first three-dimensional model of the target instance in the first three-dimensional model of the scene. Set a target second three-dimensional model of the target instance at the target position of the target instance to generate a second three-dimensional model of the scene.
  • processing unit 902 is further configured to: edit the target instance in the second three-dimensional model of the scene in response to the editing operation.
  • the editing operation includes a moving operation, wherein the moving operation is used to indicate moving a target instance in the second three-dimensional model of the scene.
  • the processing unit 902 is specifically configured to: in response to the movement operation, move a target second three-dimensional model of a target instance in the second three-dimensional model of the scene from a first position to a second position.
  • the editing operation includes a deleting operation, and the deleting operation is used to indicate deleting a target instance in the second three-dimensional model of the scene.
  • the processing unit 902 is specifically configured to: in response to a deletion operation, delete the target second three-dimensional model of the target instance in the second three-dimensional model of the scene from the second three-dimensional model of the scene.
  • the editing operation includes a replacement operation, and the replacement operation is used to indicate replacing a target instance in the second three-dimensional model of the scene with a preset instance.
  • the processing unit 902 is specifically configured to: in response to the replacement operation, replace the target second three-dimensional model of the target instance in the second three-dimensional model of the scene with the target second three-dimensional model of the preset instance.
  • the processing unit 902 is specifically configured to: determine a three-dimensional position of a target second three-dimensional model of the target instance, the three-dimensional position being used to indicate a position of the target second three-dimensional model of the target instance in a second three-dimensional model of a scene. Delete the target second three-dimensional model of the target instance in the second three-dimensional model of the scene. Set the target second three-dimensional model of the preset instance at the three-dimensional position of the target second three-dimensional model of the target instance.
  • FIG. 10 shows another possible composition diagram of the model processing device involved in the above embodiments.
  • the model processing device 1000 may include: a transceiver unit 1001 and a processing unit 1002.
  • the transceiver unit 1001 is used to receive an editing operation.
  • the processing unit 1002 is configured to edit the target instance in the second three-dimensional model of the scene in response to the editing operation.
  • the editing operation is used to indicate editing of the target instance in the second three-dimensional model of the scene.
  • the second three-dimensional model of the scene includes a target second three-dimensional model of the target instance in the scene.
  • the target second three-dimensional model of the target instance is determined from multiple second three-dimensional models based on the first three-dimensional model of the target instance of the scene and the image of the scene.
  • the target second three-dimensional model is a three-dimensional model that matches the geometry of the target instance and has the same style type as the scene where the target instance is located.
  • the target instance is any object or background in the scene.
  • the editing operation includes a moving operation, and the moving operation is used to instruct to move a target instance in the second three-dimensional model of the scene.
  • the processing unit 1002 is specifically configured to: in response to the moving operation, move a target second three-dimensional model of a target instance in a second three-dimensional model of the scene from a first position to a second position.
  • the editing operation includes a deleting operation, and the deleting operation is used to indicate deleting a target instance in the second three-dimensional model of the scene.
  • the processing unit 1002 is specifically configured to: in response to a deletion operation, delete the target second three-dimensional model of the target instance in the second three-dimensional model of the scene from the second three-dimensional model of the scene.
  • the editing operation includes a replacement operation, and the replacement operation is used to instruct to replace a target instance in the second three-dimensional model of the scene with a preset instance.
  • the processing unit 1002 is specifically configured to: in response to the replacement operation, replace the target second three-dimensional model of the target instance in the second three-dimensional model of the scene with the target second three-dimensional model of the preset instance.
  • the processing unit 1002 is specifically used to: determine a three-dimensional position of a target second three-dimensional model of the target instance, the three-dimensional position being used to indicate a position of the target second three-dimensional model of the target instance in a second three-dimensional model of a scene. Delete the target second three-dimensional model of the target instance in the second three-dimensional model of the scene. Set the target second three-dimensional model of the preset instance at the three-dimensional position of the target second three-dimensional model of the target instance.
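The three editing operations handled by processing unit 1002 (move, delete, replace) can be summarized in a single dispatcher. All names and the data layout are illustrative assumptions about one possible implementation:

```python
def apply_edit(scene, op):
    """Dispatch an editing operation to the corresponding per-instance update."""
    kind, instance_id = op["kind"], op["instance"]
    if kind == "move":
        # Move the instance's model from its first position to the second.
        scene[instance_id]["position"] = op["to"]
    elif kind == "delete":
        # Delete the instance's model from the scene's second 3D model.
        scene.pop(instance_id, None)
    elif kind == "replace":
        # Replace the instance's model with the preset instance's model,
        # placed at the same three-dimensional position.
        position = scene[instance_id]["position"]
        scene[instance_id] = {**op["preset"], "position": position}
    return scene

scene = {
    "sofa_1": {"mesh": "sofa.obj", "position": (1.0, 0.0, 2.0)},
    "table_1": {"mesh": "table.obj", "position": (3.0, 0.0, 1.5)},
}
apply_edit(scene, {"kind": "move", "instance": "table_1", "to": (4.0, 0.0, 1.5)})
apply_edit(scene, {"kind": "delete", "instance": "sofa_1"})
```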
  • FIG. 11 shows a schematic diagram of the structure of a chip 1100.
  • the chip 1100 includes one or more processors 1101 and an interface circuit 1102.
  • the chip 1100 may also include a bus 1103.
  • the processor 1101 may be an integrated circuit chip with signal processing capability. In an implementation process, each step of the above model processing method may be completed by an integrated logic circuit of hardware in the processor 1101 or by instructions in the form of software.
  • the processor 1101 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), or another processor.
  • the general purpose processor may be a microprocessor or any conventional processor.
  • the interface circuit 1102 can be used to send or receive data, instructions or information.
  • the processor 1101 can process the data, instructions or other information received by the interface circuit 1102, and can send the processing result through the interface circuit 1102.
  • the chip also includes a memory, which may include a read-only memory and a random access memory, and provides operation instructions and data to the processor.
  • a portion of the memory may also include a non-volatile random access memory (NVRAM).
  • the memory stores executable software modules or data structures.
  • the processor can perform corresponding operations by calling operation instructions stored in the memory (the operation instructions can be stored in the operating system).
  • the chip can be used in the model processing device involved in the embodiment of the present application.
  • the interface circuit 1102 can be used to output the execution result of the processor 1101.
  • for the model processing method provided in one or more embodiments of the present application, reference may be made to the aforementioned embodiments; details are not repeated here.
  • processor 1101 and the interface circuit 1102 can be implemented through hardware design, software design, or a combination of hardware and software, and there is no limitation here.
  • the electronic device 100 may be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a model processing device, or a chip or functional module in a model processing device.
  • FIG. 12 is a schematic diagram of the structure of an electronic device 100 provided in an embodiment of the present application.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a subscriber identification module (SIM) card interface 195, etc.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown in the figure, or combine some components, or split some components, or arrange the components differently.
  • the components shown in the figure may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processor (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • Different processing units may be independent devices or integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100.
  • the controller may generate an operation control signal according to the instruction operation code and the timing signal to complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus.
  • the processor 110 can be coupled to the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to realize the touch function of the electronic device 100.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193.
  • the MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), etc.
  • the processor 110 and the camera 193 communicate via a CSI interface to implement the shooting function of the electronic device 100.
  • the processor 110 and the display screen 194 communicate via a DSI interface to implement the display function of the electronic device 100.
  • the interface connection relationship between the modules illustrated in the embodiment of the present application is only a schematic illustration and does not constitute a structural limitation on the electronic device 100.
  • the electronic device 100 may also adopt different interface connection methods in the above embodiments, or a combination of multiple interface connection methods.
  • the charging management module 140 is used to receive charging input from a charger.
  • the charger can be a wireless charger or a wired charger.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and provides power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
  • the electronic device 100 implements the display function through a GPU, a display screen 194, and an application processor.
  • the GPU is a microprocessor for image processing, which connects the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, etc.
  • the display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light-emitting diodes (QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the electronic device 100 can realize the shooting function through ISP, camera 193, touch sensor, video codec, GPU, display screen 194 and application processor.
  • ISP is used to process the data fed back by camera 193.
  • when shooting, the shutter opens and light is transmitted through the lens to the camera's photosensitive element.
  • the photosensitive element converts the light signal into an electrical signal and transmits it to the ISP, which processes it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • ISP can be set in camera 193.
  • the camera 193 is used to capture still images or videos.
  • an optical image is generated through a lens and projected onto a photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP for conversion into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • the DSP converts the digital image signal into an image signal in a standard RGB, YUV or other format. It should be understood that in the description of the embodiments of the present application, an image in RGB format is used as an example for introduction, and the embodiments of the present application do not limit the image format.
  • the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
  • the digital signal processor is used to process digital signals, and can process not only digital image signals but also other digital signals. For example, when the electronic device 100 is selecting a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy.
  • Video codecs are used to compress or decompress digital videos.
  • the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record videos in a variety of coding formats, such as Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the internal memory 121 can be used to store computer executable program codes, which include instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 121.
  • the internal memory 121 may include a program storage area and a data storage area.
  • the electronic device 100 can implement audio functions such as music playing and recording through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone jack 170D, and the application processor.
  • the button 190 includes a power button, a volume button, etc.
  • the button 190 can be a mechanical button. It can also be a touch button.
  • the electronic device 100 can receive button input and generate key signal input related to the user settings and function control of the electronic device 100.
  • the motor 191 can generate a vibration prompt.
  • the motor 191 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback. For example, touch operations acting on different applications (such as taking pictures, audio playback, etc.) can correspond to different vibration feedback effects. For touch operations acting on different areas of the display screen 194, the motor 191 can also correspond to different vibration feedback effects.
  • the indicator 192 can be an indicator light, which can be used to indicate the charging status, power changes, and can also be used to indicate messages, missed calls, notifications, etc.
  • the SIM card interface 195 is used to connect a SIM card.
  • the electronic device 100 can be a chip system or a device with a similar structure as shown in Figure 12.
  • the chip system can be composed of chips, or it can include chips and other discrete devices.
  • the actions, terms, etc. involved in the various embodiments of the present application can refer to each other without limitation.
  • the message name or parameter name in the message exchanged between the various devices in the embodiments of the present application is only an example, and other names can also be used in the specific implementation without limitation.
  • the component structure shown in Figure 12 does not constitute a limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than those shown in Figure 12, or combine certain components, or arrange the components differently.
  • the processor and transceiver described in the present application can be implemented in an integrated circuit (IC), an analog IC, a radio frequency integrated circuit, a mixed signal IC, an application specific integrated circuit (ASIC), a printed circuit board (PCB), an electronic device, etc.
  • the processor and transceiver can also be manufactured using various IC process technologies, such as complementary metal oxide semiconductor (CMOS), N-type metal oxide semiconductor (NMOS), P-type metal oxide semiconductor (positive channel metal oxide semiconductor, PMOS), bipolar junction transistor (BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), gallium arsenide (GaAs), etc.
  • An embodiment of the present application also provides a model processing device, which includes at least one processor; when the at least one processor executes program code or instructions, it implements the above-mentioned related method steps to implement the model processing method in the above-mentioned embodiments.
  • the device may further include at least one memory, and the at least one memory is used to store the program code or instruction.
  • An embodiment of the present application also provides a computer storage medium in which computer instructions are stored.
  • when the computer instructions run on a model processing device, the model processing device executes the above-mentioned related method steps to implement the model processing method in the above-mentioned embodiments.
  • An embodiment of the present application also provides a computer program product.
  • when the computer program product runs on a computer, the computer is caused to execute the above-mentioned related steps to implement the model processing method in the above-mentioned embodiments.
  • the embodiment of the present application also provides a model processing device, which can be a chip, an integrated circuit, a component or a module.
  • the device may include a connected processor and a memory for storing instructions, or the device includes at least one processor for obtaining instructions from an external memory.
  • the processor can execute instructions so that the chip executes the model processing method in the above-mentioned method embodiments.
  • the serial numbers of the above-mentioned processes do not imply an order of execution.
  • the execution order of each process should be determined by its function and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present application.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the above units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • in addition, the mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical or in other forms.
  • the units described above as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • if the above functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the present application, or the part that contributes to the prior art, or the part of the technical solution can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which can be a personal computer, server, or network device, etc.) to execute all or part of the steps of the above methods in each embodiment of the present application.
  • the aforementioned storage media include various media that can store program code, such as USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical discs.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present application relates to the technical field of media. Disclosed in the embodiments of the present application are a model processing method and an apparatus, which can obtain three-dimensional models having relatively high similarities with scenes. The method comprises: first acquiring a first three-dimensional model of a target instance in a scene and an image of the scene; and then, according to the first three-dimensional model of the target instance and the image of the scene, determining from a plurality of second three-dimensional models a target second three-dimensional model of the target instance, the target second three-dimensional model of the target instance being a three-dimensional model matched with the target instance in respect of the geometric shape and having the same style and type as the scene in which the instance is located, and the target instance being any one object or background in the scene.

Description

Model processing method and device

This application claims priority to the Chinese patent application filed with the China Patent Office on September 29, 2022, with application number 202211197410.0 and application name "Model Processing Method and Device", the entire contents of which are incorporated herein by reference.
Technical Field

The embodiments of the present application relate to the field of media technology, and in particular to a model processing method and device.
Background

Three-dimensional reconstruction technology refers to establishing, for a three-dimensional object, a mathematical model suitable for computer representation and processing. It is the basis for processing and operating on the object and analyzing its properties in a computer environment, and is also a key technology for building, in a computer, a virtual reality that expresses the objective world.

Three-dimensional reconstruction technology can perform three-dimensional reconstruction of a real scene from data of the real scene to obtain a three-dimensional model of the real scene.

However, related three-dimensional reconstruction technologies consider only the geometric structure of the scene when reconstructing a real scene, so the reconstructed three-dimensional model of the real scene differs considerably from the actual real scene.
Summary of the Invention

The embodiments of the present application provide a model processing method and device, which can obtain a three-dimensional model with a high degree of similarity to the scene. To achieve the above purpose, the embodiments of the present application adopt the following technical solutions:

In a first aspect, an embodiment of the present application provides a model processing method. The method includes: first obtaining a first three-dimensional model of a target instance in a scene and an image of the scene; and then determining a target second three-dimensional model of the target instance from a plurality of second three-dimensional models according to the first three-dimensional model of the target instance and the image of the scene. The target second three-dimensional model is a three-dimensional model that matches the geometric shape of the target instance and has the same style type as the scene in which the instance is located, and the target instance is any object or background in the scene.

It can be seen that the model processing method provided in the embodiments of the present application can use the image of a scene and the three-dimensional model of any object or background in the scene to match, from a plurality of three-dimensional models, a target second three-dimensional model that has the same style as, and a geometric shape similar to, that object or background. Whereas related three-dimensional reconstruction technologies consider only the geometric structure of the scene when reconstructing a real scene, the model processing method provided in the embodiments of the present application considers both the geometric shape of each instance in the scene and the style type of the scene, and can therefore obtain a three-dimensional model with a high degree of similarity to the scene (consistent scene style and similar geometric shapes).
In a possible implementation, the style type of the scene may be determined according to the image of the scene, and then the target second three-dimensional model of the target instance may be determined from the plurality of second three-dimensional models according to the first three-dimensional model of the target instance and the style type of the scene.

It can be seen that the embodiments of the present application can determine the style type of the scene from the image of the scene, and then, according to the style type of the scene and the first three-dimensional model of the target instance, determine from the plurality of second three-dimensional models a three-dimensional model that matches the geometric shape of the target instance and has the same style type as the scene in which the target instance is located. Because the three-dimensional reconstruction of the real scene considers both the geometric shape of each instance in the scene and the style type of the scene, a three-dimensional model with a high degree of similarity to the scene (consistent scene style and similar geometric shapes) can be obtained.

In a possible implementation, the image of the scene may be input into a first network to determine the style type of the scene.

For example, the image of the scene may be input into the first network to determine the style type of the scene from a plurality of preset style types.

It can be seen that the embodiments of the present application can determine the style type of the scene by inputting the image of the scene into a first network capable of determining the style type of the scene, and then, according to the style type of the scene and the first three-dimensional model of the target instance, determine from the plurality of second three-dimensional models a three-dimensional model that matches the geometric shape of the target instance and has the same style type as the scene in which the target instance is located. Because the three-dimensional reconstruction of the real scene considers both the geometric shape of each instance in the scene and the style type of the scene, a three-dimensional model with a high degree of similarity to the scene (consistent scene style and similar geometric shapes) can be obtained.
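As an illustration of the style-determination step above, the following sketch stands in for the first network with a nearest-centroid classifier over image feature vectors. The preset style list, the feature layout, and the centroid values are all assumptions made for illustration; the patent does not specify the network's architecture.

```python
# Illustrative sketch only: a nearest-centroid classifier stands in for the
# "first network". PRESET_STYLES and STYLE_CENTROIDS are made-up assumptions.
PRESET_STYLES = ["Chinese", "modern", "nordic"]  # hypothetical preset style types

# Hypothetical per-style reference features (e.g. averaged embeddings of labeled scenes).
STYLE_CENTROIDS = {
    "Chinese": [0.9, 0.1, 0.2],
    "modern":  [0.1, 0.8, 0.3],
    "nordic":  [0.2, 0.3, 0.9],
}

def classify_scene_style(scene_feature):
    """Return the preset style whose centroid is closest to the scene's feature."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(PRESET_STYLES, key=lambda s: dist2(scene_feature, STYLE_CENTROIDS[s]))
```

In a real system the feature would come from a trained image encoder; the classification step itself reduces to choosing among the preset style types, as shown.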
Optionally, the first three-dimensional model may be a point cloud model or a mesh model.

Optionally, the plurality of second three-dimensional models may include a computer aided design (CAD) model, a pore (a three-dimensional drawing software) model, a SolidWorks (a three-dimensional drawing software) model, or a UG (a three-dimensional drawing software) model.

Optionally, the image of the scene may be a key frame image of the scene, where a key frame image of the scene is an image required to generate a Mesh model of the scene.

Optionally, the image of the scene may be a red green blue (RGB) image of the scene.
In a possible implementation, the first three-dimensional model of the target instance and the style type of the scene may be input into a second network to determine the target second three-dimensional model of the target instance from the plurality of second three-dimensional models.

It can be seen that the embodiments of the present application can determine the style type of the scene from the image of the scene, and then input the style type of the scene and the first three-dimensional model of the target instance into the second network to match, from the plurality of second three-dimensional models, a three-dimensional model that matches the geometric shape of the target instance and has the same style type as the scene in which the target instance is located. Because the three-dimensional reconstruction of the real scene considers both the geometric shape of each instance in the scene and the style type of the scene, a three-dimensional model with a high degree of similarity to the scene (consistent scene style and similar geometric shapes) can be obtained.
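The matching step above can be sketched as a retrieval over a model library: filter candidates by style label, then rank the remainder by a geometric descriptor distance. The "second network" is unspecified in the text, so simple bounding-box extents stand in for the geometric match here; the library layout and descriptor are assumptions, not the patent's method.

```python
# Illustrative sketch of geometry-plus-style matching. The descriptor (bounding
# box extents) and library schema are made-up assumptions for illustration.
def match_target_model(instance_extents, scene_style, library):
    """library: list of dicts with 'name', 'style', and 'extents' (w, d, h).
    Returns the same-style model with the closest extents, or None."""
    candidates = [m for m in library if m["style"] == scene_style]
    if not candidates:
        return None
    def geo_dist(m):
        return sum((a - b) ** 2 for a, b in zip(m["extents"], instance_extents))
    return min(candidates, key=geo_dist)
```

Filtering by style first guarantees the returned model is style-consistent with the scene; the geometric ranking then picks the closest shape among the style-consistent candidates.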
In a possible implementation, a segmentation operation may be performed on the first three-dimensional model of the scene to obtain the first three-dimensional model of the target instance, where the segmentation operation includes semantic segmentation and/or instance segmentation.

It should be noted that semantic segmentation assigns a category to each pixel in the image but does not distinguish between objects of the same category, whereas instance segmentation distinguishes individual objects within the same category.

Semantic segmentation can be used to separate instances of different categories in a scene. For example, it can separate the sofa and the table in a scene.

Instance segmentation can be used to separate instances of the same category in a scene. For example, it can separate different chairs of the same category in a scene, such as distinguishing office chairs from dining chairs.
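The semantic/instance distinction above can be shown on a toy four-pixel "image": semantic labels share one id per category, while instance labels additionally separate objects of the same category. The labels themselves are illustrative assumptions.

```python
# Toy sketch of semantic vs. instance segmentation on a 4-pixel "image".
semantic_labels = ["chair", "chair", "table", "chair"]          # per-pixel category
instance_labels = ["chair_0", "chair_0", "table_0", "chair_1"]  # per-pixel object id

def count_segments(labels):
    """Number of distinct segments in a label map."""
    return len(set(labels))

# Semantic segmentation sees 2 segments (chair, table) and cannot tell the two
# chairs apart; instance segmentation sees 3 (two chairs plus one table).
```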
In a possible implementation, the method may further include: determining the plurality of second three-dimensional models according to a plurality of three-dimensional models without material information and images of a plurality of instances, where the plurality of instances include at least two instances of different style types.

It should be noted that the plurality of second three-dimensional models determined from the plurality of three-dimensional models without material information and the images of the plurality of instances may include a plurality of second three-dimensional models with the same geometric shape but different style types, and a plurality of second three-dimensional models with the same style type but different geometric shapes.

In a possible implementation, the method may further include: determining the plurality of second three-dimensional models according to a plurality of three-dimensional models without material information, images of a plurality of instances, and a style classification code, where the style classification code is used to characterize the style type of the plurality of second three-dimensional models to be determined.

Exemplarily, a plurality of three-dimensional models without material information, images of a plurality of instances, and a style classification code may be input into a network training model to output the plurality of second three-dimensional models.

It should be noted that when the style classification code is input into the network training together, the obtained three-dimensional models are also three-dimensional models of the style type corresponding to that style classification code. In this way, the network can be trained to predict, from an input style code and a three-dimensional model without material (such as a CAD model), the corresponding material classification for each morphological part of the model, and then generate a highly realistic CAD model of the specified style. In the inference stage of the network, only a three-dimensional model without material (such as a CAD model) and the style code of the desired style classification need to be input to output a three-dimensional model of the specified style with material information assigned. In this way, materials classified by style can be assigned to a large number of three-dimensional models without material information.

For example, three-dimensional models of furniture without material information, such as sofas, TV cabinets, wardrobes, dining tables and coffee tables, together with images of multiple pieces of furniture and the style classification code of the Chinese style, can be input, and three-dimensional models of Chinese-style sofas, TV cabinets, wardrobes, dining tables, coffee tables and other furniture can then be obtained through network training.
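The inference-stage interface described above (untextured model plus style code in, styled model out) can be sketched as a per-part material assignment. The real method is a trained network; here a lookup table stands in for it, and every table entry (part names, material names) is a made-up assumption for illustration.

```python
# Illustrative sketch: per-part material assignment driven by a style code.
# MATERIAL_TABLE entries are fabricated assumptions, not the patent's data.
MATERIAL_TABLE = {
    ("Chinese", "tabletop"): "dark walnut",
    ("Chinese", "leg"):      "carved rosewood",
    ("modern",  "tabletop"): "tempered glass",
    ("modern",  "leg"):      "brushed steel",
}

def assign_materials(parts, style_code):
    """Return {part: material} for each morphological part of an untextured model."""
    return {p: MATERIAL_TABLE.get((style_code, p), "default") for p in parts}
```

The same geometry thus yields differently styled second models depending only on the style code, which is the property the training scheme above relies on.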
In a possible implementation, the second three-dimensional model of the scene may be generated according to the first three-dimensional model of the scene and the target second three-dimensional model of the target instance.

For example, the first three-dimensional model of each instance in the first three-dimensional model of the scene may be replaced with the target second three-dimensional model of that instance to obtain the second three-dimensional model of the scene.

In a possible implementation, the target position of the target instance may be determined according to the image of the scene; the first three-dimensional model of the target instance may then be deleted from the first three-dimensional model of the scene, after which the target second three-dimensional model of the target instance is set at the target position of the target instance to generate the second three-dimensional model of the scene. The target position is used to indicate the position of the first three-dimensional model of the target instance in the first three-dimensional model of the scene.

It can be seen that the method provided in the embodiments of the present application can replace the first three-dimensional model of each instance in the first three-dimensional model of the scene with a target second three-dimensional model that is geometrically similar to the instance and consistent with the style type of the scene, and then obtain the second three-dimensional model of the scene from the three-dimensional models of the instances. Because the three-dimensional reconstruction of the real scene considers both the geometric shape of each instance in the scene and the style type of the scene, a three-dimensional model with a high degree of similarity to the scene (consistent scene style and similar geometric shapes) can be obtained.
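The delete-and-place step above can be sketched by treating the scene's first model as a mapping from instance id to a (position, model) entry: each matched instance's reconstructed model is swapped for the target second model while its target position is kept. The dict layout is an assumption for illustration only.

```python
# Illustrative sketch of assembling the scene's second model by replacement.
# The scene representation (a dict keyed by instance id) is a made-up assumption.
def build_second_scene_model(first_scene_model, matched_models):
    """first_scene_model: {instance_id: {"position": (x, y, z), "model": ...}}
    matched_models: {instance_id: target_second_model}"""
    second = {}
    for inst_id, entry in first_scene_model.items():
        second[inst_id] = {
            "position": entry["position"],                        # keep the target position
            "model": matched_models.get(inst_id, entry["model"])  # replace when matched
        }
    return second
```

Instances without a matched target model keep their reconstructed first model, so the scene stays complete even under partial matching.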
In a possible implementation, the method may further include: determining the plurality of second three-dimensional models according to a plurality of three-dimensional models without material information and images of a plurality of instances, where the plurality of instances include at least two instances of different style types.

It should be noted that the plurality of second three-dimensional models determined from the plurality of three-dimensional models without material information and the images of the plurality of instances may include a plurality of second three-dimensional models with the same geometric shape but different style types, and a plurality of second three-dimensional models with the same style type but different geometric shapes.
In a possible implementation, the method may further include: receiving an editing operation, where the editing operation is used to instruct editing of a target instance in the second three-dimensional model of the scene; and, in response to the editing operation, editing the target instance in the second three-dimensional model of the scene.

Optionally, the editing operation may include a moving operation, where the moving operation is used to instruct moving the target instance in the second three-dimensional model of the scene.

In a possible implementation, in response to the moving operation, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene may be moved from a first position to a second position.

It should be noted that a three-dimensional model obtained by the related technology is a single whole, and the objects in it cannot be edited individually. In the model processing method provided in the embodiments of the present application, because the second three-dimensional model of the scene is obtained by combining the three-dimensional models of the instances in the scene, each instance of the second three-dimensional model of the scene can be moved within the second three-dimensional model of the scene through a moving operation; compared with a three-dimensional model obtained by the related technology, the three-dimensional model obtained in the embodiments of the present application is therefore more flexible.

Exemplarily, taking the target instance as the dining table in the center of a house as an example, a user can select the dining table in the three-dimensional model of the house by touching the screen with a finger, and move the dining table within the three-dimensional model of the house by moving the finger on the screen.
Optionally, the editing operation may include a deleting operation, where the deleting operation is used to instruct deleting the target instance in the second three-dimensional model of the scene.

In a possible implementation, in response to the deleting operation, the target second three-dimensional model of the target instance may be deleted from the second three-dimensional model of the scene.

It should be noted that a three-dimensional model obtained by the related technology is a single whole, and the objects in it cannot be edited individually. In the model processing method provided in the embodiments of the present application, because the second three-dimensional model of the scene is obtained by combining the three-dimensional models of the instances in the scene, each instance of the second three-dimensional model of the scene can be deleted individually through a deleting operation; compared with a three-dimensional model obtained by the related technology, the three-dimensional model obtained in the embodiments of the present application is therefore more flexible.

Exemplarily, taking the target instance as the sofa in a house as an example, a user can select the sofa in the three-dimensional model of the house by touching the screen with a finger and drag it out of the three-dimensional model, thereby deleting the sofa from the three-dimensional model of the house.
可选地，所述编辑操作可以包括替换操作，所述替换操作用于指示用预设实例替换所述场景的第二三维模型中的目标实例。Optionally, the editing operation may include a replacement operation, where the replacement operation is used to instruct replacing a target instance in the second three-dimensional model of the scene with a preset instance.
在一种可能的实现方式中,可以响应于所述替换操作,将所述场景的第二三维模型中的目标实例的目标第二三维模型替换为所述预设实例的目标第二三维模型。In a possible implementation manner, in response to the replacement operation, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene may be replaced with the target second three-dimensional model of the preset instance.
需要说明的是，相关技术得到的三维模型是一个整体，三维模型中的各物体不可单独编辑。而本申请实施例提供的模型处理方法中，由于场景的第二三维模型是由场景中各实例的三维模型组合得到的，因此场景的第二三维模型中的各个实例可以通过替换操作单独替换。相较于相关技术得到的三维模型，本申请实施例得到的三维模型更具有灵活性。It should be noted that the three-dimensional model obtained by the related art is a whole, and the objects in it cannot be edited separately. In the model processing method provided in the embodiments of the present application, since the second three-dimensional model of the scene is obtained by combining the three-dimensional models of the instances in the scene, each instance in the second three-dimensional model of the scene can be replaced separately through a replacement operation. Compared with the three-dimensional model obtained by the related art, the three-dimensional model obtained in the embodiments of the present application is more flexible.
示例性地,以目标实例为房屋中的中式风格茶几为例,用户可以通过手指触摸屏幕选中房屋的三维模型中的中式风格茶几,并通过长按屏幕进入三维模型库,然后从三维模型库中选择现代风格茶几,将房屋的三维场景中的中式风格茶几替换为现代风格茶几。For example, taking the target instance as a Chinese-style tea table in a house, the user can select the Chinese-style tea table in the three-dimensional model of the house by touching the screen with his finger, enter the three-dimensional model library by long pressing the screen, and then select a modern-style tea table from the three-dimensional model library to replace the Chinese-style tea table in the three-dimensional scene of the house with a modern-style tea table.
在一种可能的实现方式中,可以确定所述目标实例的目标第二三维模型的三维位置。然后删除所述场景的第二三维模型中目标实例的目标第二三维模型。之后在所述目标实例的目标第二三维模型的三维位置设置所述预设实例的目标第二三维模型。其中,所述三维位置用于指示所述目标实例的目标第二三维模型在场景的第二三维模型中的位置。In a possible implementation, the three-dimensional position of the target second three-dimensional model of the target instance may be determined. Then, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene is deleted. Then, the target second three-dimensional model of the preset instance is set at the three-dimensional position of the target second three-dimensional model of the target instance. The three-dimensional position is used to indicate the position of the target second three-dimensional model of the target instance in the second three-dimensional model of the scene.
需要说明的是，相关技术得到的三维模型是一个整体，三维模型中的各物体不可单独编辑。而本申请实施例提供的模型处理方法中，由于场景的第二三维模型是由场景中各实例的三维模型组合得到的，因此可以通过替换操作确定目标实例的目标第二三维模型的三维位置，然后删除场景的第二三维模型中目标实例的目标第二三维模型，之后在目标实例的目标第二三维模型的三维位置设置预设实例的目标第二三维模型，从而对三维模型中的目标实例进行单独替换。相较于相关技术得到的三维模型，本申请实施例得到的三维模型更具有灵活性。It should be noted that the three-dimensional model obtained by the related art is a whole, and the objects in it cannot be edited separately. In the model processing method provided in the embodiments of the present application, since the second three-dimensional model of the scene is obtained by combining the three-dimensional models of the instances in the scene, the replacement operation can determine the three-dimensional position of the target second three-dimensional model of the target instance, then delete the target second three-dimensional model of the target instance from the second three-dimensional model of the scene, and then set the target second three-dimensional model of the preset instance at that three-dimensional position, so that the target instance in the three-dimensional model is replaced individually. Compared with the three-dimensional model obtained by the related art, the three-dimensional model obtained in the embodiments of the present application is more flexible.
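The three-step replace flow described above (determine the three-dimensional position of the target instance's model, delete that model from the scene's second three-dimensional model, then set the preset instance's model at the same position) can be sketched as follows. This is an illustrative sketch only; the class, field, and model names are assumptions, not part of the embodiments:

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Position = Tuple[float, float, float]

@dataclass
class SceneModel:
    # instance id -> (model id, 3D position within the scene's second 3D model)
    instances: Dict[str, Tuple[str, Position]] = field(default_factory=dict)

    def replace_instance(self, target_id: str, preset_model_id: str) -> None:
        # Step 1: determine the 3D position of the target instance's model.
        _, position = self.instances[target_id]
        # Step 2: delete the target instance's model from the scene model.
        del self.instances[target_id]
        # Step 3: set the preset instance's model at that 3D position.
        self.instances[target_id] = (preset_model_id, position)

# Example: replace the Chinese-style tea table with a modern-style one.
scene = SceneModel({"tea_table": ("chinese_style_tea_table", (2.0, 0.0, 3.0))})
scene.replace_instance("tea_table", "modern_style_tea_table")
print(scene.instances["tea_table"])  # ('modern_style_tea_table', (2.0, 0.0, 3.0))
```

The replaced model inherits the original model's position, matching the order of steps stated above.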
第二方面,本申请实施例提供了另一种模型处理方法,该方法包括:接收编辑操作。响应于所述编辑操作,对所述场景的第二三维模型中目标实例进行编辑。其中,所述编辑操作用于指示编辑场景的第二三维模型中的目标实例,所述场景的第二三维模型包括所述场景中目标实例的目标第二三维模型,所述目标实例的目标第二三维模型由所述场景的目标实例的第一三维模型和所述场景的图像从多个第二三维模型中确定得到,所述目标第二三维模型是与所述目标实例几何形状匹配且与所述目标实例所在场景的风格类型相同的三维模型,所述目标实例为所述场景中的任一物体或背景。In a second aspect, an embodiment of the present application provides another model processing method, the method comprising: receiving an editing operation. In response to the editing operation, editing a target instance in a second three-dimensional model of the scene. The editing operation is used to indicate editing a target instance in the second three-dimensional model of the scene, the second three-dimensional model of the scene includes a target second three-dimensional model of the target instance in the scene, the target second three-dimensional model of the target instance is determined from a plurality of second three-dimensional models by a first three-dimensional model of the target instance of the scene and an image of the scene, the target second three-dimensional model is a three-dimensional model that matches the geometry of the target instance and is of the same style type as the scene where the target instance is located, and the target instance is any object or background in the scene.
可以看出，本申请实施例提供的模型处理方法，可以通过与场景中实例（即物体和背景）风格一致且几何形状类似的实例的三维模型得到场景的第二三维模型。相较于相关三维重建技术在对真实场景进行三维重建时仅考虑了场景的几何结构，本申请实施例提供的模型处理方法在对真实场景进行三维重建时不仅考虑场景内各实例的几何形状，还考虑了场景的风格类型，从而能够得到与场景相似度较高（场景风格一致且几何形状类似）的三维模型。It can be seen that the model processing method provided in the embodiments of the present application can obtain the second three-dimensional model of a scene from three-dimensional models of instances that are consistent in style and similar in geometry to the instances (i.e., the objects and the background) in the scene. Whereas related three-dimensional reconstruction technologies consider only the geometric structure of the scene when reconstructing a real scene, the model processing method provided in the embodiments of the present application considers not only the geometry of each instance in the scene but also the style type of the scene, so that a three-dimensional model with a high degree of similarity to the scene (consistent style and similar geometry) can be obtained.
可选地,所述编辑操作可以包括移动操作,所述移动操作用于指示移动所述场景的第二三维模型中的目标实例。Optionally, the editing operation may include a moving operation, where the moving operation is used to instruct moving a target instance in the second three-dimensional model of the scene.
在一种可能的实现方式中,可以响应于所述移动操作,将所述场景的第二三维模型中目标实例的目标第二三维模型从第一位置移动至第二位置。In a possible implementation manner, in response to the movement operation, a target second three-dimensional model of a target instance in the second three-dimensional model of the scene may be moved from a first position to a second position.
需要说明的是，相关技术得到的三维模型是一个整体，三维模型中的各物体不可单独编辑。而本申请实施例提供的模型处理方法中，由于场景的第二三维模型是由场景中各实例的三维模型组合得到的，因此场景的第二三维模型中的各个实例可以通过移动操作在场景的第二三维模型中移动。相较于相关技术得到的三维模型，本申请实施例得到的三维模型更具有灵活性。It should be noted that the three-dimensional model obtained by the related art is a whole, and the objects in it cannot be edited separately. In the model processing method provided in the embodiments of the present application, since the second three-dimensional model of the scene is obtained by combining the three-dimensional models of the instances in the scene, each instance in the second three-dimensional model of the scene can be moved within it through a move operation. Compared with the three-dimensional model obtained by the related art, the three-dimensional model obtained in the embodiments of the present application is more flexible.
示例性地,以目标实例为房屋中央的餐桌为例,用户可以通过手指触摸屏幕选中房屋的三维模型中的餐桌,并通过移动触摸屏幕的手指使房屋的三维模型中的餐桌在房屋的三维模型中移动。For example, taking the target instance as the dining table in the center of the house, the user can select the dining table in the three-dimensional model of the house by touching the screen with a finger, and move the dining table in the three-dimensional model of the house by moving the finger touching the screen.
可选地,所述编辑操作可以包括删除操作,所述删除操作用于指示删除所述场景的第二三维模型中的目标实例。Optionally, the editing operation may include a deleting operation, where the deleting operation is used to instruct deleting a target instance in the second three-dimensional model of the scene.
在一种可能的实现方式中,可以响应于删除操作,将所述场景的第二三维模型中的目标实例的目标第二三维模型从所述场景的第二三维模型中删除。In a possible implementation manner, in response to the deletion operation, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene may be deleted from the second three-dimensional model of the scene.
需要说明的是，相关技术得到的三维模型是一个整体，三维模型中的各物体不可单独编辑。而本申请实施例提供的模型处理方法中，由于场景的第二三维模型是由场景中各实例的三维模型组合得到的，因此场景的第二三维模型中的各个实例可以通过删除操作单独删除。相较于相关技术得到的三维模型，本申请实施例得到的三维模型更具有灵活性。It should be noted that the three-dimensional model obtained by the related art is a whole, and the objects in it cannot be edited separately. In the model processing method provided in the embodiments of the present application, since the second three-dimensional model of the scene is obtained by combining the three-dimensional models of the instances in the scene, each instance in the second three-dimensional model of the scene can be deleted separately through a delete operation. Compared with the three-dimensional model obtained by the related art, the three-dimensional model obtained in the embodiments of the present application is more flexible.
示例性地,以目标实例为房屋的沙发为例,用户可以通过手指触摸屏幕选中房屋的三维模型中的沙发,将其拖移至三维模型外,从而删除房屋的三维模型中的沙发。For example, taking the target instance as a sofa in a house, the user can touch the screen with a finger to select the sofa in the three-dimensional model of the house and drag it outside the three-dimensional model, thereby deleting the sofa in the three-dimensional model of the house.
可选地，所述编辑操作可以包括替换操作，所述替换操作用于指示用预设实例替换所述场景的第二三维模型中的目标实例。Optionally, the editing operation may include a replacement operation, where the replacement operation is used to instruct replacing a target instance in the second three-dimensional model of the scene with a preset instance.
在一种可能的实现方式中,可以响应于所述替换操作,将所述场景的第二三维模型中的目标实例的目标第二三维模型替换为所述预设实例的目标第二三维模型。In a possible implementation manner, in response to the replacement operation, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene may be replaced with the target second three-dimensional model of the preset instance.
需要说明的是，相关技术得到的三维模型是一个整体，三维模型中的各物体不可单独编辑。而本申请实施例提供的模型处理方法中，由于场景的第二三维模型是由场景中各实例的三维模型组合得到的，因此场景的第二三维模型中的各个实例可以通过替换操作单独替换。相较于相关技术得到的三维模型，本申请实施例得到的三维模型更具有灵活性。It should be noted that the three-dimensional model obtained by the related art is a whole, and the objects in it cannot be edited separately. In the model processing method provided in the embodiments of the present application, since the second three-dimensional model of the scene is obtained by combining the three-dimensional models of the instances in the scene, each instance in the second three-dimensional model of the scene can be replaced separately through a replacement operation. Compared with the three-dimensional model obtained by the related art, the three-dimensional model obtained in the embodiments of the present application is more flexible.
示例性地,以目标实例为房屋中的中式风格茶几为例,用户可以通过手指触摸屏幕选中房屋的三维模型中的中式风格茶几,并通过长按屏幕进入三维模型库,然后从三维模型库中选择现代风格茶几,将房屋的三维场景中的中式风格茶几替换为现代风格茶几。For example, taking the target instance as a Chinese-style tea table in a house, the user can select the Chinese-style tea table in the three-dimensional model of the house by touching the screen with his finger, enter the three-dimensional model library by long pressing the screen, and then select a modern-style tea table from the three-dimensional model library to replace the Chinese-style tea table in the three-dimensional scene of the house with a modern-style tea table.
在一种可能的实现方式中，可以确定所述目标实例的目标第二三维模型的三维位置。然后删除所述场景的第二三维模型中目标实例的目标第二三维模型。之后在所述目标实例的目标第二三维模型的三维位置设置所述预设实例的目标第二三维模型。其中，所述三维位置用于指示所述目标实例的目标第二三维模型在场景的第二三维模型中的位置。In a possible implementation, the three-dimensional position of the target second three-dimensional model of the target instance may be determined. Then, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene is deleted. After that, the target second three-dimensional model of the preset instance is set at the three-dimensional position of the target second three-dimensional model of the target instance. The three-dimensional position is used to indicate the position of the target second three-dimensional model of the target instance in the second three-dimensional model of the scene.
需要说明的是，相关技术得到的三维模型是一个整体，三维模型中的各物体不可单独编辑。而本申请实施例提供的模型处理方法中，由于场景的第二三维模型是由场景中各实例的三维模型组合得到的，因此可以通过替换操作确定目标实例的目标第二三维模型的三维位置，然后删除场景的第二三维模型中目标实例的目标第二三维模型，之后在目标实例的目标第二三维模型的三维位置设置预设实例的目标第二三维模型，从而对三维模型中的目标实例进行单独替换。相较于相关技术得到的三维模型，本申请实施例得到的三维模型更具有灵活性。It should be noted that the three-dimensional model obtained by the related art is a whole, and the objects in it cannot be edited separately. In the model processing method provided in the embodiments of the present application, since the second three-dimensional model of the scene is obtained by combining the three-dimensional models of the instances in the scene, the replacement operation can determine the three-dimensional position of the target second three-dimensional model of the target instance, then delete the target second three-dimensional model of the target instance from the second three-dimensional model of the scene, and then set the target second three-dimensional model of the preset instance at that three-dimensional position, so that the target instance in the three-dimensional model is replaced individually. Compared with the three-dimensional model obtained by the related art, the three-dimensional model obtained in the embodiments of the present application is more flexible.
第三方面,本申请实施例提供了一种模型处理装置,该模型处理装置包括:收发单元和处理单元。所述收发单元,用于获取场景中目标实例的第一三维模型和所述场景的图像,所述目标实例为所述场景中的任一物体或背景。所述处理单元,用于根据所述目标实例的第一三维模型和所述场景的图像从多个第二三维模型中确定所述目标实例的目标第二三维模型,所述目标第二三维模型是与所述目标实例的几何形状匹配且与所述目标实例所在场景的风格类型相同的三维模型。In a third aspect, an embodiment of the present application provides a model processing device, which includes: a transceiver unit and a processing unit. The transceiver unit is used to obtain a first three-dimensional model of a target instance in a scene and an image of the scene, wherein the target instance is any object or background in the scene. The processing unit is used to determine a target second three-dimensional model of the target instance from multiple second three-dimensional models based on the first three-dimensional model of the target instance and the image of the scene, wherein the target second three-dimensional model is a three-dimensional model that matches the geometric shape of the target instance and has the same style type as the scene where the target instance is located.
在一种可能的实现方式中,所述处理单元具体用于:根据所述场景的图像确定所述场景的风格类型;根据所述目标实例的第一三维模型和所述场景的风格类型从多个第二三维模型中确定所述目标实例的目标第二三维模型。In a possible implementation, the processing unit is specifically used to: determine the style type of the scene based on the image of the scene; and determine the target second three-dimensional model of the target instance from multiple second three-dimensional models based on the first three-dimensional model of the target instance and the style type of the scene.
在一种可能的实现方式中,所述处理单元具体用于:将所述场景的图像输入第一网络以确定所述场景的风格类型。In a possible implementation manner, the processing unit is specifically configured to: input the image of the scene into a first network to determine the style type of the scene.
在一种可能的实现方式中,所述处理单元具体用于:将所述目标实例的第一三维模型和所述场景的风格类型输入第二网络以从多个第二三维模型中确定所述目标实例的目标第二三维模型。In a possible implementation, the processing unit is specifically configured to: input the first three-dimensional model of the target instance and the style type of the scene into a second network to determine a target second three-dimensional model of the target instance from a plurality of second three-dimensional models.
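As a hedged illustration of the two-stage determination above (the application does not fix the internals of the first or second network here), the following sketch substitutes trivial stand-ins: a dummy style classifier in place of the first network, and a geometry match that filters library candidates by style and ranks them by bounding-box difference in place of the second network. All names, the example data, and the similarity measure are assumptions:

```python
from typing import Dict, List, Tuple

Size = Tuple[float, float, float]

def classify_style(scene_image_features: List[float]) -> str:
    # Stand-in for the first network: here, a trivial threshold rule.
    return "modern" if sum(scene_image_features) > 0 else "chinese"

def match_model(instance_size: Size, style: str,
                library: Dict[str, Tuple[str, Size]]) -> str:
    # Stand-in for the second network: keep only candidates of the scene's
    # style, then pick the one whose bounding box best matches the instance.
    def size_diff(a: Size, b: Size) -> float:
        return sum(abs(x - y) for x, y in zip(a, b))
    candidates = [(mid, size) for mid, (s, size) in library.items() if s == style]
    return min(candidates, key=lambda c: size_diff(c[1], instance_size))[0]

library = {
    "table_modern_a": ("modern", (1.6, 0.75, 0.9)),
    "table_modern_b": ("modern", (1.2, 0.70, 0.7)),
    "table_chinese_a": ("chinese", (1.6, 0.75, 0.9)),
}
style = classify_style([0.4, 0.3])               # -> "modern"
best = match_model((1.5, 0.75, 0.85), style, library)
print(best)  # table_modern_a
```

The point of the sketch is the division of labor: style comes from the scene image, geometry comes from the instance's first three-dimensional model, and only a model satisfying both is returned as the target second three-dimensional model.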
在一种可能的实现方式中,所述收发单元具体用于:对所述场景的第一三维模型进行分割操作以得到所述目标实例的第一三维模型,所述分割操作包括语义分割和/或实例分割。In a possible implementation, the transceiver unit is specifically configured to: perform a segmentation operation on the first three-dimensional model of the scene to obtain the first three-dimensional model of the target instance, wherein the segmentation operation includes semantic segmentation and/or instance segmentation.
在一种可能的实现方式中,所述处理单元还用于:根据多个无材质信息的三维模型和多个实例的图像确定所述多个第二三维模型,所述多个实例至少包括两个风格类型不同的实例。In a possible implementation, the processing unit is further configured to: determine the plurality of second three-dimensional models according to the plurality of three-dimensional models without material information and the images of the plurality of instances, wherein the plurality of instances include at least two instances of different style types.
在一种可能的实现方式中,所述处理单元还用于:根据所述场景的第一三维模型和所述目标实例的目标第二三维模型生成所述场景的第二三维模型。In a possible implementation manner, the processing unit is further configured to: generate a second three-dimensional model of the scene according to the first three-dimensional model of the scene and a target second three-dimensional model of the target instance.
在一种可能的实现方式中,所述处理单元具体用于:根据所述场景的图像确定所述目标实例的目标位置,所述目标位置用于指示所述目标实例的第一三维模型在所述场景的第一三维模型中的位置。删除所述场景的第一三维模型中所述目标实例的第一三维模型。在所述目标实例的目标位置设置所述目标实例的目标第二三维模型以生成所述场景的第二三维模型。In a possible implementation, the processing unit is specifically configured to: determine a target position of the target instance according to the image of the scene, the target position being used to indicate a position of a first three-dimensional model of the target instance in a first three-dimensional model of the scene, delete the first three-dimensional model of the target instance in the first three-dimensional model of the scene, and set a target second three-dimensional model of the target instance at the target position of the target instance to generate a second three-dimensional model of the scene.
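A minimal sketch of the generation step above, assuming simple dictionary structures for the scene models (all names are illustrative and not part of the embodiments): each instance's target position is taken from the scene's first three-dimensional model, the instance's first model is dropped, and the matched target second three-dimensional model is set at that position:

```python
def build_second_scene_model(first_scene_model: dict,
                             matched_models: dict) -> dict:
    """first_scene_model: instance id -> {"model": ..., "position": ...}
    matched_models: instance id -> target second 3D model id."""
    second_scene_model = {}
    for instance_id, entry in first_scene_model.items():
        position = entry["position"]              # target position of the instance
        second_model = matched_models[instance_id]  # replaces the first model
        second_scene_model[instance_id] = {"model": second_model,
                                           "position": position}
    return second_scene_model

first_model = {"sofa": {"model": "sofa_mesh_raw", "position": (0.0, 0.0, 1.0)}}
second = build_second_scene_model(first_model, {"sofa": "sofa_modern_tex"})
print(second["sofa"])  # {'model': 'sofa_modern_tex', 'position': (0.0, 0.0, 1.0)}
```

Because the second scene model is assembled instance by instance, each entry remains individually addressable, which is what enables the per-instance move, delete, and replace operations described elsewhere in this application.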
在一种可能的实现方式中,所述收发单元还用于:接收编辑操作,所述编辑操作用于指示编辑所述场景的第二三维模型中的目标实例。In a possible implementation manner, the transceiver unit is further used to: receive an editing operation, where the editing operation is used to instruct editing of a target instance in the second three-dimensional model of the scene.
在一种可能的实现方式中,所述处理单元,还用于响应于所述编辑操作,对所述场景的第二三维模型中目标实例进行编辑。In a possible implementation, the processing unit is further configured to edit the target instance in the second three-dimensional model of the scene in response to the editing operation.
可选地,所述编辑操作包括移动操作,所述移动操作用于指示移动所述场景的第二三维模型中的目标实例。Optionally, the editing operation includes a moving operation, and the moving operation is used to instruct to move a target instance in the second three-dimensional model of the scene.
在一种可能的实现方式中,所述处理单元具体用于:响应于所述移动操作,将所述场景的第二三维模型中目标实例的目标第二三维模型从第一位置移动至第二位置。In a possible implementation manner, the processing unit is specifically configured to: in response to the movement operation, move a target second three-dimensional model of a target instance in a second three-dimensional model of the scene from a first position to a second position.
可选地,所述编辑操作包括删除操作,所述删除操作用于指示删除所述场景的第二三维模型中的目标实例。Optionally, the editing operation includes a deleting operation, and the deleting operation is used to indicate deleting a target instance in the second three-dimensional model of the scene.
在一种可能的实现方式中,所述处理单元具体用于:响应于删除操作,将所述场景的第二三维模型中的目标实例的目标第二三维模型从所述场景的第二三维模型中删除。In a possible implementation manner, the processing unit is specifically configured to: in response to a deletion operation, delete the target second three-dimensional model of the target instance in the second three-dimensional model of the scene from the second three-dimensional model of the scene.
可选地,所述编辑操作包括替换操作,所述替换操作用于指示用预设实例替换所述场景的第二三维模型中的目标实例。Optionally, the editing operation includes a replacement operation, and the replacement operation is used to instruct to replace a target instance in the second three-dimensional model of the scene with a preset instance.
在一种可能的实现方式中,所述处理单元具体用于:响应于所述替换操作,将所述场景的第二三维模型中的目标实例的目标第二三维模型替换为所述预设实例的目标第二三维模型。In a possible implementation manner, the processing unit is specifically configured to: in response to the replacement operation, replace the target second three-dimensional model of the target instance in the second three-dimensional model of the scene with the target second three-dimensional model of the preset instance.
在一种可能的实现方式中，所述处理单元具体用于：确定所述目标实例的目标第二三维模型的三维位置，所述三维位置用于指示所述目标实例的目标第二三维模型在场景的第二三维模型中的位置。删除所述场景的第二三维模型中目标实例的目标第二三维模型。在所述目标实例的目标第二三维模型的三维位置设置所述预设实例的目标第二三维模型。In a possible implementation, the processing unit is specifically configured to: determine a three-dimensional position of the target second three-dimensional model of the target instance, the three-dimensional position being used to indicate a position of the target second three-dimensional model of the target instance in the second three-dimensional model of the scene; delete the target second three-dimensional model of the target instance in the second three-dimensional model of the scene; and set the target second three-dimensional model of the preset instance at the three-dimensional position of the target second three-dimensional model of the target instance.
第四方面,本申请实施例提供了另一种模型处理装置,该模型处理装置包括:收发单元和处理单元。所述收发单元,用于接收编辑操作。所述处理单元,用于响应于所述编辑操作,对所述场景的第二三维模型中目标实例进行编辑。其中,所述编辑操作用于指示编辑场景的第二三维模型中的目标实例,所述场景的第二三维模型包括所述场景中目标实例的目标第二三维模型,所述目标实例的目标第二三维模型由所述场景的目标实例的第一三维模型和所述场景的图像从多个第二三维模型中确定得到,所述目标第二三维模型是与所述目标实例几何形状匹配且与所述目标实例所在场景的风格类型相同的三维模型,所述目标实例为所述场景中的任一物体或背景。In a fourth aspect, an embodiment of the present application provides another model processing device, which includes: a transceiver unit and a processing unit. The transceiver unit is used to receive an editing operation. The processing unit is used to edit a target instance in the second three-dimensional model of the scene in response to the editing operation. The editing operation is used to indicate the target instance in the second three-dimensional model of the editing scene, and the second three-dimensional model of the scene includes a target second three-dimensional model of the target instance in the scene, and the target second three-dimensional model of the target instance is determined from a plurality of second three-dimensional models by a first three-dimensional model of the target instance of the scene and an image of the scene, and the target second three-dimensional model is a three-dimensional model that matches the geometry of the target instance and has the same style type as the scene where the target instance is located, and the target instance is any object or background in the scene.
可选地,所述编辑操作包括移动操作,所述移动操作用于指示移动所述场景的第二三维模型中的目标实例。Optionally, the editing operation includes a moving operation, and the moving operation is used to instruct to move a target instance in the second three-dimensional model of the scene.
在一种可能的实现方式中,所述处理单元具体用于:响应于所述移动操作,将所述场景的第二三维模型中目标实例的目标第二三维模型从第一位置移动至第二位置。In a possible implementation manner, the processing unit is specifically configured to: in response to the movement operation, move a target second three-dimensional model of a target instance in a second three-dimensional model of the scene from a first position to a second position.
可选地,所述编辑操作包括删除操作,所述删除操作用于指示删除所述场景的第二三维模型中的目标实例。Optionally, the editing operation includes a deleting operation, and the deleting operation is used to indicate deleting a target instance in the second three-dimensional model of the scene.
在一种可能的实现方式中,所述处理单元具体用于:响应于删除操作,将所述场景的第二三维模型中的目标实例的目标第二三维模型从所述场景的第二三维模型中删除。In a possible implementation manner, the processing unit is specifically configured to: in response to a deletion operation, delete the target second three-dimensional model of the target instance in the second three-dimensional model of the scene from the second three-dimensional model of the scene.
可选地,所述编辑操作包括替换操作,所述替换操作用于指示用预设实例替换所述场景的第二三维模型中的目标实例。Optionally, the editing operation includes a replacement operation, and the replacement operation is used to instruct to replace a target instance in the second three-dimensional model of the scene with a preset instance.
在一种可能的实现方式中,所述处理单元具体用于:响应于所述替换操作,将所述场景的第二三维模型中的目标实例的目标第二三维模型替换为所述预设实例的目标第二三维模型。In a possible implementation manner, the processing unit is specifically configured to: in response to the replacement operation, replace the target second three-dimensional model of the target instance in the second three-dimensional model of the scene with the target second three-dimensional model of the preset instance.
在一种可能的实现方式中,所述处理单元具体用于:确定所述目标实例的目标第二三维模型的三维位置,所述三维位置用于指示所述目标实例的目标第二三维模型在场景的第二三维模型中的位置。删除所述场景的第二三维模型中目标实例的目标第二三维模型。在所述目标实例的目标第二三维模型的三维位置设置所述预设实例的目标第二三维模型。In a possible implementation, the processing unit is specifically configured to: determine a three-dimensional position of a target second three-dimensional model of the target instance, the three-dimensional position being used to indicate a position of the target second three-dimensional model of the target instance in a second three-dimensional model of a scene; delete the target second three-dimensional model of the target instance in the second three-dimensional model of the scene; and set the target second three-dimensional model of the preset instance at the three-dimensional position of the target second three-dimensional model of the target instance.
第五方面,本申请实施例还提供一种模型处理装置,该模型处理装置包括:至少一个处理器,当所述至少一个处理器执行程序代码或指令时,实现上述第一方面或其任意可能的实现方式中所述的方法。In a fifth aspect, an embodiment of the present application further provides a model processing device, which includes: at least one processor, when the at least one processor executes program code or instructions, it implements the method described in the above first aspect or any possible implementation method thereof.
可选地,该模型处理装置还可以包括至少一个存储器,该至少一个存储器用于存储该程序代码或指令。Optionally, the model processing device may further include at least one memory, and the at least one memory is used to store the program code or instruction.
第六方面,本申请实施例还提供一种芯片,包括:输入接口、输出接口、至少一个处理器。可选地,该芯片还包括存储器。该至少一个处理器用于执行该存储器中的代码,当该至少一个处理器执行该代码时,该芯片实现上述第一方面或其任意可能的实现方式中所述的方法。In a sixth aspect, an embodiment of the present application further provides a chip, comprising: an input interface, an output interface, and at least one processor. Optionally, the chip further comprises a memory. The at least one processor is used to execute the code in the memory, and when the at least one processor executes the code, the chip implements the method described in the first aspect or any possible implementation thereof.
可选地,上述芯片还可以为集成电路。Optionally, the above chip may also be an integrated circuit.
第七方面，本申请实施例还提供一种计算机可读存储介质，用于存储计算机程序，该计算机程序用于实现上述第一方面或其任意可能的实现方式中所述的方法。In a seventh aspect, an embodiment of the present application further provides a computer-readable storage medium for storing a computer program, where the computer program is used to implement the method described in the first aspect or any possible implementation thereof.
第八方面,本申请实施例还提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机实现上述第一方面或其任意可能的实现方式中所述的方法。In an eighth aspect, an embodiment of the present application further provides a computer program product comprising instructions, which, when executed on a computer, enables the computer to implement the method described in the first aspect or any possible implementation thereof.
本实施例提供的模型处理装置、计算机存储介质、计算机程序产品和芯片均用于执行上文所提供的方法,因此,其所能达到的有益效果可参考上文所提供的方法中的有益效果,此处不再赘述。The model processing device, computer storage medium, computer program product and chip provided in this embodiment are all used to execute the method provided above. Therefore, the beneficial effects that can be achieved can refer to the beneficial effects in the method provided above and will not be repeated here.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请实施例的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required for use in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the embodiments of the present application. For ordinary technicians in this field, other drawings can be obtained based on these drawings without creative work.
图1为本申请实施例提供的一种模型处理系统的结构示意图;FIG1 is a schematic diagram of the structure of a model processing system provided in an embodiment of the present application;
图2为本申请实施例提供的一种电子设备的结构示意图;FIG2 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application;
图3为本申请实施例提供的一种模型处理装置的结构示意图;FIG3 is a schematic diagram of the structure of a model processing device provided in an embodiment of the present application;
图4为本申请实施例提供的另一种电子设备的结构示意图; FIG4 is a schematic diagram of the structure of another electronic device provided in an embodiment of the present application;
图5为本申请实施例提供的一种模型处理方法的流程示意图;FIG5 is a schematic diagram of a flow chart of a model processing method provided in an embodiment of the present application;
图6为本申请实施例提供的一种编辑页面的示意图;FIG6 is a schematic diagram of an editing page provided in an embodiment of the present application;
图7为本申请实施例提供的另一种模型处理方法的流程示意图;FIG7 is a flow chart of another model processing method provided in an embodiment of the present application;
图8为本申请实施例提供的又一种模型处理方法的流程示意图;FIG8 is a flow chart of another model processing method provided in an embodiment of the present application;
图9为本申请实施例提供的另一种模型处理装置的结构示意图;FIG9 is a schematic diagram of the structure of another model processing device provided in an embodiment of the present application;
图10为本申请实施例提供的又一种模型处理装置的结构示意图;FIG10 is a schematic diagram of the structure of another model processing device provided in an embodiment of the present application;
图11为本申请实施例提供的一种芯片的结构示意图;FIG11 is a schematic diagram of the structure of a chip provided in an embodiment of the present application;
图12为本申请实施例提供的又一种电子设备的结构示意图。FIG. 12 is a schematic diagram of the structure of another electronic device provided in an embodiment of the present application.
具体实施方式Detailed ways
下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本申请的一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。The following clearly and completely describes the technical solutions in the embodiments of the present application with reference to the accompanying drawings. Obviously, the described embodiments are merely some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。The term "and/or" in this article is merely a description of the association relationship of associated objects, indicating that three relationships may exist. For example, A and/or B can mean: A exists alone, A and B exist at the same time, and B exists alone.
本申请实施例的说明书以及附图中的术语“第一”和“第二”等是用于区别不同的对象,或者用于区别对同一对象的不同处理,而不是用于描述对象的特定顺序。The terms "first" and "second" and the like in the description and drawings of the embodiments of the present application are used to distinguish different objects, or to distinguish different processing of the same object, rather than to describe a specific order of objects.
此外,本申请实施例的描述中所提到的术语“包括”和“具有”以及它们的任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选的还包括其他没有列出的步骤或单元,或可选的还包括对于这些过程、方法、产品或设备固有的其他步骤或单元。In addition, the terms "including" and "having" and any variations thereof mentioned in the description of the embodiments of the present application are intended to cover non-exclusive inclusions. For example, a process, method, system, product or device including a series of steps or units is not limited to the listed steps or units, but may optionally include other steps or units that are not listed, or may optionally include other steps or units that are inherent to these processes, methods, products or devices.
需要说明的是,本申请实施例的描述中,“示例性地”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性地”或者“例如”的任何实施例或设计方案不应被解释为比其他实施例或设计方案更优先或更具优势。确切而言,使用“示例性地”或者“例如”等词旨在以具体方式呈现相关概念。It should be noted that, in the description of the embodiments of the present application, words such as "exemplarily" or "for example" are used to indicate examples, illustrations or descriptions. Any embodiment or design described as "exemplarily" or "for example" in the embodiments of the present application should not be interpreted as having priority or advantage over other embodiments or designs. Specifically, the use of words such as "exemplarily" or "for example" is intended to present related concepts in a specific way.
在本申请实施例的描述中,除非另有说明,“多个”的含义是指两个或两个以上。In the description of the embodiments of the present application, unless otherwise specified, “plurality” means two or more.
Three-dimensional reconstruction technology refers to establishing, for a three-dimensional object, a mathematical model suitable for computer representation and processing. It is the basis for processing and operating on the object and analyzing its properties in a computer environment, and it is a key technology for building, in a computer, a virtual reality that expresses the objective world.
Three-dimensional reconstruction technology can reconstruct a real scene in three dimensions from data of the real scene to obtain a three-dimensional model of the real scene.
However, related three-dimensional reconstruction technologies consider only the geometric structure of the scene when reconstructing a real scene, so the reconstructed three-dimensional model of the real scene differs considerably from the real scene.
To this end, an embodiment of the present application provides a model processing method capable of obtaining a three-dimensional model with a high degree of similarity to the scene. The method may be applied to a model processing system.
FIG. 1 shows a possible form of the foregoing model processing system. As shown in FIG. 1, the model processing system includes a model processing apparatus and a plurality of electronic devices.
The electronic device is configured to determine a first three-dimensional model of a scene based on data collected by a sensor, and to transmit the first three-dimensional model of the scene and an image of the scene to the model processing apparatus.
For example, the electronic device may use a sensor to collect data such as the pose of the electronic device and images of the scene (for example, key-frame RGB images and a depth map of the scene), take these data as input to reconstruct a Mesh model of the scene, and extract the vertex data of the scene from the Mesh model. The vertex data of the scene is then taken as input to output the first three-dimensional model of the scene. Finally, the image of the scene and the first three-dimensional model of the scene are uploaded to the model processing apparatus through a network transmission unit.
The model processing apparatus is configured to perform the model processing method provided in the embodiments of the present application.
The electronic device is further configured to receive user operations and to edit (for example, move, delete, and replace) instances in the three-dimensional model of the scene according to the user operations.
Optionally, the electronic device may be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA).
FIG. 2 shows a possible form of the foregoing electronic device. As shown in FIG. 2, the electronic device may include a sensor unit, a computing unit, a storage unit, and a network transmission unit.
Optionally, the sensor unit may include a visual sensor, a depth sensor, and other sensors.
The visual sensor is configured to obtain image information of the scene.
Optionally, the visual sensor may be a camera or another device with a visual acquisition capability.
The depth sensor is configured to obtain depth information of the scene.
Optionally, the depth sensor may be an indirect time-of-flight (iToF) sensor, a direct time-of-flight (dToF) sensor, or another device with a depth acquisition capability.
The network transmission unit is configured to communicate and exchange data with other devices (for example, with the model processing apparatus or other electronic devices).
Optionally, the network transmission unit may include a wireless fidelity (WiFi) communication unit, a fourth-generation mobile communication technology (4G) communication unit, a fifth-generation mobile communication technology (5G) communication unit, and other communication units.
The computing unit is configured to run the operating system of the electronic device, to reconstruct the scene using a reconstruction algorithm based on the data collected by the sensors of the electronic device (for example, using a real-time Mesh reconstruction algorithm), and to obtain depth information of an image from the image information of the scene using a depth estimation method (for example, a monocular depth estimation method).
Optionally, the computing unit may include a central processing unit (CPU), a graphics processing unit (GPU), a cache, and registers.
The storage unit is configured to store data of the electronic device.
Optionally, the storage unit may include internal storage and external storage.
FIG. 3 shows a possible form of the foregoing model processing apparatus. As shown in FIG. 3, the model processing apparatus may include a computing unit, a storage unit, and a network transmission unit.
The computing unit is configured to perform the model processing method provided in the embodiments of the present application.
The network transmission unit is configured to communicate and exchange data with other devices (for example, with other electronic devices).
The storage unit is configured to store data of the model processing apparatus.
FIG. 4 shows another possible form of the foregoing electronic device. As shown in FIG. 4, the electronic device may include a computing unit, a storage unit, a display unit, an interaction unit, and a network transmission unit.
The computing unit is configured to run the operating system of the electronic device and to edit (for example, move, delete, and replace) the three-dimensional model of the scene according to user operations (instructions).
The storage unit is configured to store data of the electronic device.
The network transmission unit is configured to communicate and exchange data with other devices (for example, with the model processing apparatus or other electronic devices).
The display unit is configured for visual display.
Optionally, the display unit may be a display screen.
The interaction unit is configured to receive user instructions.
In a possible implementation, the interaction unit may receive user instructions through an interactive operation device.
Optionally, the interactive operation device may include a device with an interactive operation capability, such as a mouse, a keyboard, or a touchscreen.
FIG. 5 shows a model processing method provided in an embodiment of the present application. The method may be performed by the model processing apparatus in the foregoing model processing system. As shown in FIG. 5, the method includes:
S501. Obtain a first three-dimensional model of a target instance in a scene and an image of the scene.
The target instance is any object or background in the scene.
Exemplarily, the model processing apparatus may receive the first three-dimensional model of the target instance in the scene and the image of the scene sent by an electronic device.
In another example, the model processing apparatus may collect, through its sensor unit, the pose of the model processing apparatus, images of the scene (for example, key-frame RGB images of the scene), depth maps, and the like as input, and output a Mesh model of the scene. The computing unit then outputs the first three-dimensional model of the target instance in the scene based on the vertex data of the Mesh model of the scene. The first three-dimensional model of the target instance in the scene and the image of the scene are thereby obtained.
The model processing apparatus may be a mobile terminal. Compared with reconstruction methods in the related art that rely on special instruments (laser scanners, panoramic cameras, and the like) to obtain the first three-dimensional model of the scene or of a target instance in the scene, the embodiment of the present application uses only a mobile terminal and obtains the first three-dimensional model of the scene or of a target instance in the scene without more complicated operations.
Obtaining the first three-dimensional model of the scene from the vertex data of the Mesh model, compared with related-art algorithms based on multi-view stereo matching, can effectively resolve problems of traditional algorithms such as incomplete reconstruction in weakly textured regions and missing reconstructed objects caused by incomplete scanning coverage.
Optionally, the first three-dimensional model may be a point cloud model or a mesh (Mesh) model.
Optionally, the image of the scene may be a key-frame image of the scene, where a key-frame image of the scene is an image required to generate the Mesh model of the scene.
Optionally, the image of the scene may be an RGB image of the scene.
In a possible implementation, a segmentation operation may be performed on the first three-dimensional model of the scene to obtain the first three-dimensional model of the target instance in the scene, where the segmentation operation includes semantic segmentation and/or instance segmentation.
It should be noted that semantic segmentation assigns a category to each pixel in an image but does not distinguish between objects of the same category, whereas instance segmentation distinguishes individual objects within the same category.
Semantic segmentation can separate instances of different categories in the scene. For example, semantic segmentation can separate a sofa and a table in the scene.
Instance segmentation can separate instances of the same category in the scene. For example, instance segmentation can separate individual chairs in the scene, such as distinguishing an office chair from a dining chair.
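As an illustrative sketch (not part of the patent's disclosure), the difference between the two segmentation outputs can be shown on a toy label map: semantic labels share one id per category, while instance ids separate individual objects.

```python
# Toy 4x4 label maps for a scene containing two chairs and one table.
# Semantic segmentation: one id per category (0 background, 1 chair, 2 table).
semantic = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [2, 2, 2, 0],
]
# Instance segmentation: one id per object (0 background, 1 and 2 chairs, 3 table).
instance = [
    [1, 1, 0, 2],
    [1, 1, 0, 2],
    [0, 0, 0, 0],
    [3, 3, 3, 0],
]

def num_regions(label_map):
    """Number of distinct non-background ids in a label map."""
    return len({v for row in label_map for v in row if v != 0})

assert num_regions(semantic) == 2  # two categories: chair, table
assert num_regions(instance) == 3  # three objects: chair, chair, table
```

The two chairs share semantic id 1 but receive distinct instance ids, which is exactly the distinction the method relies on when extracting per-instance models.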
S502. Determine a target second three-dimensional model of the target instance from a plurality of second three-dimensional models based on the first three-dimensional model of the target instance and the image of the scene.
The target second three-dimensional model of the target instance is a three-dimensional model that matches the geometric shape of the target instance and has the same style type as the scene in which the instance is located.
Optionally, the target second three-dimensional model of the target instance may also be a three-dimensional model that matches the geometric shape of the target instance, has the same style type as the scene in which the instance is located, and is of the same item category as the target instance.
In a possible implementation, the style type of the scene may be determined based on the image of the scene. The target second three-dimensional model of the target instance is then determined from the plurality of second three-dimensional models based on the first three-dimensional model of the target instance and the style type of the scene.
It can be seen that the embodiment of the present application can determine the style type of the scene from the image of the scene, and then, based on the style type of the scene and the first three-dimensional model of the target instance, determine from the plurality of second three-dimensional models a three-dimensional model that matches the geometric shape of the target instance and has the same style type as the scene in which the target instance is located. Because the three-dimensional reconstruction of the real scene considers not only the geometric shape of each instance in the scene but also the style type of the scene, a three-dimensional model with a high degree of similarity to the scene (consistent scene style and similar geometric shapes) can be obtained.
The style types may include classical style, modern style, business style, Chinese style, Nordic style, Japanese style, and the like.
Optionally, the plurality of second three-dimensional models may include CAD models, pore models, SolidWorks models, or UG models.
Exemplarily, the target second three-dimensional model of the target instance may be determined from a plurality of CAD models based on the style type of the scene and the first three-dimensional model of the target instance.
Exemplarily, the style type of the scene may be determined to be Chinese style based on the image of the scene. The geometric shape of the target instance is then determined based on the first three-dimensional model of the target instance. Then, based on the style type of the scene and the geometric shape of the target instance, a second three-dimensional model that is of Chinese style and matches the geometric shape of the target instance is matched in a model library containing the plurality of second three-dimensional models.
It should be noted that the style type of the scene may be determined from the image of the scene by any method that can be conceived by a person skilled in the art, which is not specifically limited in the embodiments of the present application.
In a possible implementation, the image of the scene may be input into a first network to determine the style type of the scene.
For example, a first network using a ResNext (an image classification network architecture) as its backbone may take the image of the scene as input and output the style classification of the scene.
It should be noted that the target second three-dimensional model of the target instance may be determined from the plurality of second three-dimensional models based on the first three-dimensional model of the target instance and the style type of the scene by any method that can be conceived by a person skilled in the art, which is not specifically limited in the embodiments of the present application. For example, the first three-dimensional model of the target instance and the style type of the scene may be input into a second network to determine the target second three-dimensional model of the target instance from the plurality of second three-dimensional models.
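As a hedged sketch (the patent does not specify the classifier head), the final step of such a style classification network, mapping the backbone's raw scores to a style label, might look like the following; the style list and logit values are illustrative assumptions.

```python
import math

# Hypothetical set of style classes, matching the styles named in the text.
STYLES = ["classical", "modern", "business", "chinese", "nordic", "japanese"]

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_style(logits):
    """Return (style_name, probability) for the highest-scoring style."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return STYLES[best], probs[best]

# Hypothetical backbone output for one scene image.
label, prob = classify_style([0.2, 1.1, 3.4, 0.7, -0.5, 0.0])
print(label)  # business
```

In a real system the logits would come from the ResNext backbone; only the argmax-over-softmax decision step is shown here.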
Taking a target instance in the scene as an example, the following describes how the first three-dimensional model of the target instance and the style type of the scene are input into the second network to determine the target second three-dimensional model of the target instance from the plurality of second three-dimensional models.
First, the first three-dimensional model of the target instance is input into a geometric-shape encoding network (for example, a Point Autoencoder) in the second network to obtain a geometric-shape encoding of the target instance. The Euclidean distances between the geometric-shape encoding of the target instance and the geometric-shape encodings of the second three-dimensional models in the library that have the same style type as the scene are then computed, and the N second three-dimensional models with the smallest distances (for example, N = 10) are taken as candidate replacement second three-dimensional models.
Each candidate replacement second three-dimensional model is then projected onto a 2D image to obtain corresponding front, top, and left views. The overlap (Intersection over Union, IoU) with the semantic segmentation map containing the target instance is computed for each view, and the second three-dimensional model with the highest average multi-view IoU is determined as the target second three-dimensional model of the target instance.
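A minimal sketch of this candidate retrieval step, assuming the encodings have already been produced by the geometric-shape encoding network (the embedding values and the `top_n_candidates` helper are illustrative, not from the patent):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length encodings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_n_candidates(query_code, library, n=10):
    """library: dict of model_id -> geometric-shape encoding, all of the
    scene's style type. Returns the n ids closest to the query encoding."""
    ranked = sorted(library, key=lambda mid: euclidean(query_code, library[mid]))
    return ranked[:n]

# Hypothetical encodings of the target instance and three library models.
query = [0.1, 0.9, 0.3]
library = {
    "sofa_cn_01": [0.1, 0.9, 0.2],
    "sofa_cn_02": [0.9, 0.1, 0.7],
    "sofa_cn_03": [0.2, 0.9, 0.4],
}
print(top_n_candidates(query, library, n=2))  # ['sofa_cn_01', 'sofa_cn_03']
```

The real encodings would be high-dimensional vectors from the Point Autoencoder; only the distance ranking is shown.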
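The multi-view scoring step can be sketched as follows; here each projected view and the instance's segmentation mask are simplified to axis-aligned rectangles rather than per-pixel masks, which is an assumption for illustration only.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union else 0.0

def mean_multiview_iou(projections, masks):
    """Average IoU over the front, top, and left views of one candidate."""
    return sum(iou(p, m) for p, m in zip(projections, masks)) / len(projections)

# Hypothetical projected views of a candidate vs. the instance's masks.
views = [(0, 0, 4, 4), (0, 0, 4, 2), (1, 1, 3, 3)]
masks = [(0, 0, 4, 4), (0, 0, 4, 2), (1, 1, 3, 3)]
print(mean_multiview_iou(views, masks))  # 1.0
```

The candidate whose `mean_multiview_iou` against the target instance's masks is highest would be selected as the target second three-dimensional model.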
It should be noted that the model processing method provided in the embodiments of the present application is based on a style-consistent three-dimensional model retrieval and matching algorithm, which can resolve shortcomings such as inconsistent styles among different instances (for example, furniture) in the reconstructed scene leading to poor coordination of the reconstructed scene model.
For example, a scene that is overall in a business style may contain a traditional-style chair, which looks incongruous. After processing by the model processing method provided in the embodiments of the present application, scene style matching determines the overall scene style to be the business style. Under the present algorithm, during model retrieval and replacement, the traditional-style chair is replaced with a business-style one, ensuring the style consistency of the reconstructed scene.
It can be seen that the model processing method provided in the embodiments of the present application can use the image of the scene and the three-dimensional model of any object or background in the scene to match, from a plurality of three-dimensional models, a target second three-dimensional model that has the same style as, and a geometric shape similar to, that object or background. Whereas related three-dimensional reconstruction technologies consider only the geometric structure of the scene when reconstructing a real scene, the model processing method provided in the embodiments of the present application considers not only the geometric shape of each instance in the scene but also the style type of the scene, so that a three-dimensional model with a high degree of similarity to the scene (consistent scene style and similar geometric shapes) can be obtained.
Optionally, the method provided in the embodiments of the present application may further include:
S503. Generate a second three-dimensional model of the scene based on the first three-dimensional model of the scene and the target second three-dimensional model of the target instance.
For example, the first three-dimensional model of each instance in the first three-dimensional model of the scene may be replaced with the target second three-dimensional model of that instance to obtain the second three-dimensional model of the scene.
In a possible implementation, the target position of the target instance may be determined based on the image of the scene; the first three-dimensional model of the target instance is then deleted from the first three-dimensional model of the scene, after which the target second three-dimensional model of the target instance is placed at the target position of the target instance to generate the second three-dimensional model of the scene. The target position indicates the position of the first three-dimensional model of the target instance within the first three-dimensional model of the scene.
It can be seen that the method provided in the embodiments of the present application replaces the first three-dimensional model of each instance in the first three-dimensional model of the scene with a target second three-dimensional model that is geometrically similar to the instance and consistent with the style type of the scene, and then obtains the second three-dimensional model of the scene from the three-dimensional models of the instances. Because the three-dimensional reconstruction of the real scene considers not only the geometric shape of each instance in the scene but also the style type of the scene, a three-dimensional model with a high degree of similarity to the scene (consistent scene style and similar geometric shapes) can be obtained.
It should be noted that the target position of the target instance may be determined from the image of the scene by any method that can be conceived by a person skilled in the art, which is not specifically limited in the embodiments of the present application. For example, a Canonical Voting algorithm may be used to obtain an oriented bounding box of the first three-dimensional model of each instance; the pose of the target instance in the camera coordinate system is then determined from the image of the scene (a key-frame RGB image of the scene); that pose is then converted from the camera coordinate system into the world coordinate system; finally, the target instance in the image of the scene is back-projected to find the corresponding predicted oriented bounding box, thereby determining the target position of the target instance. The target second three-dimensional model of the target instance may then be resized so that it is similar in size to the bounding box of the instance; the point cloud inside the bounding box in the first three-dimensional model of the scene is erased, and the model of the target instance is placed into the first three-dimensional model of the scene according to the bounding box.
After each instance in the first three-dimensional model of the scene has undergone the above processing, a reconstructed, newly assembled scene model (that is, the second three-dimensional model of the scene) can be output.
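The camera-to-world pose conversion used in the placement step above can be sketched with a 4x4 homogeneous transform; the pose matrix and the point are illustrative values, not data from the patent.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-vector (both as nested lists)."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

def camera_to_world(T_wc, p_cam):
    """Convert a point from camera coordinates to world coordinates.
    T_wc is the 4x4 camera-to-world pose; p_cam is an (x, y, z) point."""
    x, y, z, _ = mat_vec(T_wc, [p_cam[0], p_cam[1], p_cam[2], 1.0])
    return (x, y, z)

# Hypothetical pose: camera translated by (1, 2, 0) in world space, no rotation.
T_wc = [
    [1, 0, 0, 1],
    [0, 1, 0, 2],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
print(camera_to_world(T_wc, (0.5, 0.0, 3.0)))  # (1.5, 2.0, 3.0)
```

An actual pose matrix would also carry the rotation estimated for the key frame; the homogeneous multiplication itself is unchanged.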
Optionally, the method may further include:
S504. Determine the plurality of second three-dimensional models based on a plurality of three-dimensional models without material information and images of a plurality of instances.
The plurality of instances includes at least two instances of different style types.
For example, three-dimensional models of a Chinese-style sofa, a business-style sofa, a Chinese-style tea table, and a business-style tea table may be obtained from three-dimensional models of a sofa and a tea table without material information, together with images of Chinese-style and business-style instances.
It should be noted that the plurality of second three-dimensional models determined from the plurality of three-dimensional models without material information and the images of the plurality of instances may include second three-dimensional models with the same geometric shape but different style types, as well as second three-dimensional models with the same style type but different geometric shapes.
In a possible implementation, the plurality of second three-dimensional models may be determined based on the plurality of three-dimensional models without material information, the images of the plurality of instances, and style classification codes, where a style classification code characterizes the style type of the second three-dimensional models to be determined.
Exemplarily, the plurality of three-dimensional models without material information, the images of the plurality of instances, and the style classification codes may be input into a network training model, which outputs the plurality of second three-dimensional models.
It should be noted that when a style classification code is input into the network for training, the resulting three-dimensional models are of the style type corresponding to that code. The network can thus be trained to predict, from an input style code and a material-free three-dimensional model (for example, a CAD model), the material classification of each morphological part of the model, and thereby generate a highly realistic CAD model of the specified style. In the inference stage of the network, only a material-free three-dimensional model (for example, a CAD model) and the style code of the desired style classification need to be input to output a three-dimensional model of the specified style with material information assigned. In this way, materials can be assigned, by style classification, to a large number of three-dimensional models without material information.
For example, three-dimensional models of furniture such as sofas, TV cabinets, wardrobes, dining tables, and tea tables without material information, together with images of multiple pieces of furniture and the style classification code for the Chinese style, may be input; three-dimensional models of Chinese-style sofas, TV cabinets, wardrobes, dining tables, and tea tables are then obtained through network training.
S505、接收并响应于编辑操作。S505: Receive and respond to the editing operation.
其中,上述编辑操作用于指示编辑场景的第二三维模型中的目标实例。The above-mentioned editing operation is used to indicate the target instance in the second three-dimensional model of the editing scene.
Optionally, the editing operation may include a move operation, where the move operation is used to instruct moving the target instance in the second three-dimensional model of the scene.
In a possible implementation, in response to the move operation, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene may be moved from a first position to a second position.
It should be noted that the three-dimensional model obtained by the related art is a single whole, and the objects in that model cannot be edited individually. In the model processing method provided in this embodiment of the present application, because the second three-dimensional model of the scene is obtained by combining the three-dimensional models of the instances in the scene, each instance in the second three-dimensional model of the scene can be moved within the model through a move operation. Compared with the three-dimensional model obtained by the related art, the three-dimensional model obtained in this embodiment of the present application is more flexible.
For example, taking the target instance as a dining table in the center of a house, the user may select the dining table in the three-dimensional model of the house by touching the screen with a finger, and move the dining table within the three-dimensional model of the house by moving the finger on the screen.
As another example, still taking the target instance as the dining table in the center of the house, the user may select the dining table in the three-dimensional model of the house with a mouse and move it within the model with the mouse.
As yet another example, as shown in FIG. 6, taking the target instance as table 1 in the scene, the user may modify the three-dimensional coordinates (x, y, z) of the table through the keyboard to move table 1 within the second three-dimensional model of the scene.
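Because the scene's second three-dimensional model is a composition of per-instance models, a move operation reduces to updating one instance's coordinates while the rest of the scene stays untouched. The following minimal Python sketch illustrates this; the `Instance` and `SceneModel` classes and their field names are illustrative assumptions, not part of the application:

```python
from dataclasses import dataclass, field

@dataclass
class Instance:
    """One independently editable instance in the scene's second 3D model."""
    name: str
    position: tuple  # (x, y, z) in scene coordinates

@dataclass
class SceneModel:
    """Second 3D model of the scene: a collection of per-instance models."""
    instances: dict = field(default_factory=dict)

    def move(self, name, new_position):
        # Moving one instance leaves every other instance unchanged.
        self.instances[name].position = new_position

scene = SceneModel({"table1": Instance("table1", (0.0, 0.0, 0.0))})
scene.move("table1", (1.5, 0.0, 2.0))  # e.g. coordinates typed on the keyboard
```

This per-instance structure is what makes the model editable, in contrast to a monolithic reconstruction.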
Optionally, the editing operation may include a delete operation, where the delete operation is used to instruct deleting the target instance in the second three-dimensional model of the scene.
In a possible implementation, in response to the delete operation, the target second three-dimensional model of the target instance may be deleted from the second three-dimensional model of the scene.
It should be noted that the three-dimensional model obtained by the related art is a single whole, and the objects in that model cannot be edited individually. In the model processing method provided in this embodiment of the present application, because the second three-dimensional model of the scene is obtained by combining the three-dimensional models of the instances in the scene, each instance in the second three-dimensional model of the scene can be deleted individually through a delete operation. Compared with the three-dimensional model obtained by the related art, the three-dimensional model obtained in this embodiment of the present application is more flexible.
For example, taking the target instance as a sofa in a house, the user may select the sofa in the three-dimensional model of the house by touching the screen with a finger and drag it out of the model, thereby deleting the sofa from the three-dimensional model of the house.
As another example, as shown in FIG. 6, taking the target instance as sofa 1 in the scene, the user may delete sofa 1 from the second three-dimensional model of the scene by clicking the delete symbol "X" on the right side of the screen with the mouse.
Optionally, the editing operation may include a replacement operation, where the replacement operation is used to instruct replacing the target instance in the second three-dimensional model of the scene with a preset instance.
In a possible implementation, in response to the replacement operation, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene may be replaced with the target second three-dimensional model of the preset instance.
It should be noted that the three-dimensional model obtained by the related art is a single whole, and the objects in that model cannot be edited individually. In the model processing method provided in this embodiment of the present application, because the second three-dimensional model of the scene is obtained by combining the three-dimensional models of the instances in the scene, each instance in the second three-dimensional model of the scene can be replaced individually through a replacement operation. Compared with the three-dimensional model obtained by the related art, the three-dimensional model obtained in this embodiment of the present application is more flexible.
For example, taking the target instance as a Chinese-style tea table in a house, the user may select the Chinese-style tea table in the three-dimensional model of the house by touching the screen with a finger, enter a three-dimensional model library by long-pressing the screen, and then select a modern-style tea table from the library, so that the Chinese-style tea table in the three-dimensional scene of the house is replaced with the modern-style tea table.
In a possible implementation, the three-dimensional position of the target second three-dimensional model of the target instance may first be determined, where the three-dimensional position indicates the position of that model in the second three-dimensional model of the scene. The target second three-dimensional model of the target instance is then deleted from the second three-dimensional model of the scene, and the target second three-dimensional model of the preset instance is set at that three-dimensional position.
It should be noted that the three-dimensional model obtained by the related art is a single whole, and the objects in that model cannot be edited individually. In the model processing method provided in this embodiment of the present application, because the second three-dimensional model of the scene is obtained by combining the three-dimensional models of the instances in the scene, the target instance can be replaced individually: the three-dimensional position of the target second three-dimensional model of the target instance is determined through the replacement operation, that model is deleted from the second three-dimensional model of the scene, and the target second three-dimensional model of the preset instance is set at the same three-dimensional position. Compared with the three-dimensional model obtained by the related art, the three-dimensional model obtained in this embodiment of the present application is more flexible.
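The three replacement steps described above (determine the position, delete the target model, set the preset model at that position) can be sketched in Python as follows; the dictionary layout and the function name are illustrative assumptions, not part of the application:

```python
def replace_instance(scene, target_name, preset_name, preset_model):
    """scene maps instance name -> {'model': ..., 'position': (x, y, z)}."""
    # Step 1: determine the 3D position of the target instance's model.
    position = scene[target_name]["position"]
    # Step 2: delete the target instance's model from the scene model.
    del scene[target_name]
    # Step 3: set the preset instance's model at that 3D position.
    scene[preset_name] = {"model": preset_model, "position": position}
    return scene

scene = {"chinese_tea_table": {"model": "mesh_a", "position": (2.0, 0.0, 1.0)}}
replace_instance(scene, "chinese_tea_table", "modern_tea_table", "mesh_b")
```

Keeping the original position means the preset instance drops into exactly the spot the target instance occupied.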
FIG. 7 shows another model processing method provided by an embodiment of the present application. The method may be executed by an electronic device in the above model processing system. As shown in FIG. 7, the method includes:
S701: Receive an editing operation.
The editing operation is used to instruct editing of a target instance in the second three-dimensional model of the scene.
Optionally, the editing operation may include a move operation, where the move operation is used to instruct moving the target instance in the second three-dimensional model of the scene.
Optionally, the editing operation may include a delete operation, where the delete operation is used to instruct deleting the target instance in the second three-dimensional model of the scene.
Optionally, the editing operation may include a replacement operation, where the replacement operation is used to instruct replacing the target instance in the second three-dimensional model of the scene with a preset instance.
S702: In response to the editing operation, edit the target instance in the second three-dimensional model of the scene.
In a possible implementation, in response to the move operation, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene may be moved from a first position to a second position.
It should be noted that the three-dimensional model obtained by the related art is a single whole, and the objects in that model cannot be edited individually. In the model processing method provided in this embodiment of the present application, because the second three-dimensional model of the scene is obtained by combining the three-dimensional models of the instances in the scene, each instance in the second three-dimensional model of the scene can be moved within the model through a move operation. Compared with the three-dimensional model obtained by the related art, the three-dimensional model obtained in this embodiment of the present application is more flexible.
For example, taking the target instance as a dining table in the center of a house, the user may select the dining table in the three-dimensional model of the house by touching the screen with a finger, and move the dining table within the three-dimensional model of the house by moving the finger on the screen.
As another example, still taking the target instance as the dining table in the center of the house, the user may select the dining table in the three-dimensional model of the house with a mouse and move it within the model with the mouse.
As yet another example, as shown in FIG. 6, taking the target instance as table 1 in the scene, the user may modify the three-dimensional coordinates (x, y, z) of the table through the keyboard to move table 1 within the second three-dimensional model of the scene.
In a possible implementation, in response to the delete operation, the target second three-dimensional model of the target instance may be deleted from the second three-dimensional model of the scene.
It should be noted that the three-dimensional model obtained by the related art is a single whole, and the objects in that model cannot be edited individually. In the model processing method provided in this embodiment of the present application, because the second three-dimensional model of the scene is obtained by combining the three-dimensional models of the instances in the scene, each instance in the second three-dimensional model of the scene can be deleted individually through a delete operation. Compared with the three-dimensional model obtained by the related art, the three-dimensional model obtained in this embodiment of the present application is more flexible.
For example, taking the target instance as a sofa in a house, the user may select the sofa in the three-dimensional model of the house by touching the screen with a finger and drag it out of the model, thereby deleting the sofa from the three-dimensional model of the house.
As another example, as shown in FIG. 6, taking the target instance as sofa 1 in the scene, the user may delete sofa 1 from the second three-dimensional model of the scene by clicking the delete symbol "X" on the right side of the screen with the mouse.
In a possible implementation, in response to the replacement operation, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene may be replaced with the target second three-dimensional model of the preset instance.
It should be noted that the three-dimensional model obtained by the related art is a single whole, and the objects in that model cannot be edited individually. In the model processing method provided in this embodiment of the present application, because the second three-dimensional model of the scene is obtained by combining the three-dimensional models of the instances in the scene, each instance in the second three-dimensional model of the scene can be replaced individually through a replacement operation. Compared with the three-dimensional model obtained by the related art, the three-dimensional model obtained in this embodiment of the present application is more flexible.
For example, taking the target instance as a Chinese-style tea table in a house, the user may select the Chinese-style tea table in the three-dimensional model of the house by touching the screen with a finger, enter a three-dimensional model library by long-pressing the screen, and then select a modern-style tea table from the library, so that the Chinese-style tea table in the three-dimensional scene of the house is replaced with the modern-style tea table.
In a possible implementation, the three-dimensional position of the target second three-dimensional model of the target instance may first be determined, where the three-dimensional position indicates the position of that model in the second three-dimensional model of the scene. The target second three-dimensional model of the target instance is then deleted from the second three-dimensional model of the scene, and the target second three-dimensional model of the preset instance is set at that three-dimensional position.
It should be noted that the three-dimensional model obtained by the related art is a single whole, and the objects in that model cannot be edited individually. In the model processing method provided in this embodiment of the present application, because the second three-dimensional model of the scene is obtained by combining the three-dimensional models of the instances in the scene, the target instance can be replaced individually: the three-dimensional position of the target second three-dimensional model of the target instance is determined through the replacement operation, that model is deleted from the second three-dimensional model of the scene, and the target second three-dimensional model of the preset instance is set at the same three-dimensional position. Compared with the three-dimensional model obtained by the related art, the three-dimensional model obtained in this embodiment of the present application is more flexible.
Optionally, the method may further include:
S703: Obtain the second three-dimensional model of the scene.
For example, the electronic device may receive the second three-dimensional model of the scene sent by another device (such as the model processing apparatus or another device).
As another example, the electronic device may download the second three-dimensional model of the scene from a server, where the server is used to store the second three-dimensional model of the scene generated by the model processing apparatus.
FIG. 8 shows yet another model processing method provided by an embodiment of the present application. The method is applicable to the above model processing system. As shown in FIG. 8, the method includes:
S801: The first electronic device sends, to the model processing apparatus, the first three-dimensional model of the target instance in the scene and the image of the scene.
Correspondingly, the model processing apparatus receives the first three-dimensional model of the target instance in the scene and the image of the scene sent by the first electronic device.
For example, the first electronic device may collect, through a sensor unit, the pose of the electronic device, images of the scene (such as key-frame RGB images of the scene), depth maps, and the like as input, and output a Mesh model of the scene. A computing unit then outputs the first three-dimensional model of the target instance in the scene according to the vertex data of the Mesh model of the scene, so that the first three-dimensional model of the target instance and the image of the scene are obtained. The first three-dimensional model of the target instance and the image of the scene are then sent to the model processing apparatus through a network transmission unit.
For the specific implementation of S801, refer to the specific implementation of S501 in the above model processing method; details are not repeated here.
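Obtaining per-instance first 3D models from the scene Mesh amounts to partitioning the mesh vertices by instance. The toy sketch below stands in for the semantic/instance segmentation step; real per-vertex labels would come from a learned segmentation network, and the function name is an illustrative assumption:

```python
def split_instances(vertices, vertex_instance_labels):
    """Group mesh vertices by per-vertex instance label, so that each
    group forms the first 3D model of one instance in the scene."""
    instances = {}
    for vertex, label in zip(vertices, vertex_instance_labels):
        instances.setdefault(label, []).append(vertex)
    return instances

# Four vertices of the scene Mesh, labeled by the instance they belong to.
verts = [(0, 0, 0), (1, 0, 0), (5, 5, 5), (5, 6, 5)]
labels = ["table", "table", "sofa", "sofa"]
models = split_instances(verts, labels)  # {"table": [...], "sofa": [...]}
```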
S802: The model processing apparatus determines the target second three-dimensional model of the target instance in the scene from multiple second three-dimensional models according to the first three-dimensional model of the target instance and the image of the scene.
For example, the model processing apparatus may determine the style type of the scene according to the image of the scene, and then determine the target second three-dimensional model of the target instance from the multiple second three-dimensional models according to the first three-dimensional model of the target instance and the style type of the scene.
For the specific implementation of S802, refer to the specific implementation of S502 in the above model processing method; details are not repeated here.
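The two-stage selection above (first network classifies the scene style from its image, second network picks the library model whose geometry best matches the instance) can be sketched as follows. The function names, the similarity measure, and the toy stand-ins for the learned networks are assumptions for illustration only:

```python
def determine_style(scene_image, first_network):
    """First network: image of the scene -> style type (e.g. 'chinese')."""
    return first_network(scene_image)

def select_target_model(instance_model, style, library, second_network):
    """Among library models of the given style, pick the one whose geometry
    best matches the instance's first 3D model (highest similarity score)."""
    candidates = [m for m in library if m["style"] == style]
    return max(candidates,
               key=lambda m: second_network(instance_model, m["geometry"]))

# Toy stand-ins so the sketch runs; the real networks are learned models.
first_network = lambda image: "chinese"
second_network = lambda a, b: -abs(a - b)  # higher score = more similar geometry

library = [
    {"name": "sofa_cn", "style": "chinese", "geometry": 10},
    {"name": "sofa_modern", "style": "modern", "geometry": 10},
    {"name": "table_cn", "style": "chinese", "geometry": 3},
]
style = determine_style("scene.jpg", first_network)
target = select_target_model(9, style, library, second_network)
```

Filtering by style first guarantees the selected model matches the scene's style type, and the geometry score then resolves the match within that style.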
S803: The model processing apparatus generates the second three-dimensional model of the scene according to the first three-dimensional model of the scene and the target second three-dimensional model of the target instance.
For example, the model processing apparatus may determine the target position of each instance according to the image of the scene, delete the first three-dimensional model of each instance from the first three-dimensional model of the scene, and set the target second three-dimensional model of each instance at the position of that instance, where the target position indicates the position of the instance's first three-dimensional model in the first three-dimensional model of the scene.
For the specific implementation of S803, refer to the specific implementation of S503 in the above model processing method; details are not repeated here.
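The generation step above (per instance: delete the first 3D model, set the target second 3D model at the same target position) can be sketched in Python as follows; the dictionary layout and function name are illustrative assumptions, not part of the application:

```python
def generate_scene_second_model(instance_first_models,
                                target_second_models, target_positions):
    """For each instance, delete its first 3D model from the scene's first
    3D model and place its target second 3D model at the target position."""
    scene = dict(instance_first_models)  # scene's first 3D model, per instance
    for name, model in target_second_models.items():
        scene.pop(name, None)  # delete the instance's first 3D model
        scene[name] = {"model": model,
                       "position": target_positions[name]}  # set second model
    return scene

first = {"sofa": "untextured_sofa_mesh", "table": "untextured_table_mesh"}
targets = {"sofa": "chinese_sofa_mesh", "table": "chinese_table_mesh"}
positions = {"sofa": (0.0, 0.0, 1.0), "table": (2.0, 0.0, 3.0)}
scene_model = generate_scene_second_model(first, targets, positions)
```

Because the result is keyed per instance, the generated scene model remains individually editable, which is what S805 relies on.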
S804: The model processing apparatus sends the second three-dimensional model of the scene to the second electronic device.
Correspondingly, the second electronic device downloads the second three-dimensional model of the scene from the model processing apparatus.
For example, the second electronic device may send a download request for the second three-dimensional model of the scene to the model processing apparatus through a network transmission unit, and receive, through the network transmission unit, the second three-dimensional model of the scene sent by the model processing apparatus.
For the specific implementation of S804, refer to the specific implementation of S703 in the above model processing method; details are not repeated here.
S805: The second electronic device receives an editing operation and, in response to the editing operation, edits the target instance in the second three-dimensional model of the scene.
For example, the second electronic device may receive a move operation and, in response to the move operation, move the target second three-dimensional model of the target instance in the second three-dimensional model of the scene from a first position to a second position, where the move operation is used to instruct moving the target instance in the second three-dimensional model of the scene.
As another example, the second electronic device may receive a delete operation and, in response to the delete operation, delete the target second three-dimensional model of the target instance from the second three-dimensional model of the scene, where the delete operation is used to instruct deleting the target instance in the second three-dimensional model of the scene.
As yet another example, the second electronic device may receive a replacement operation and, in response to the replacement operation, replace the target second three-dimensional model of the target instance in the second three-dimensional model of the scene with the target second three-dimensional model of the preset instance.
For the specific implementation of S805, refer to the specific implementations of S701 and S702 in the above model processing method; details are not repeated here.
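The end-to-end flow of FIG. 8 can be summarized as plain function calls, with network transport elided. All function names here are illustrative assumptions standing in for S801 through S805, not APIs from the application:

```python
def model_processing_pipeline(first_models, scene_image,
                              determine_target, generate_scene,
                              apply_edit, edits):
    # S801/S802: per-instance first 3D models + scene image
    #            -> a target second 3D model per instance.
    targets = {name: determine_target(model, scene_image)
               for name, model in first_models.items()}
    # S803: combine the target second models into the scene's second 3D model.
    scene = generate_scene(targets)
    # S804/S805: the second device downloads the model and applies edits.
    for edit in edits:
        scene = apply_edit(scene, edit)
    return scene

# Toy stand-ins so the sketch runs end to end.
determine_target = lambda model, image: model.upper()
generate_scene = lambda targets: dict(targets)
apply_edit = lambda scene, name: {k: v for k, v in scene.items()
                                  if k != name}  # a delete operation

result = model_processing_pipeline({"sofa": "mesh_s", "table": "mesh_t"},
                                   "scene.jpg", determine_target,
                                   generate_scene, apply_edit, ["sofa"])
```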
The model processing apparatus for executing the above model processing method is described below with reference to FIG. 9.
It can be understood that, to implement the above functions, the model processing apparatus includes corresponding hardware and/or software modules for executing each function. With reference to the algorithm steps of the examples described in the embodiments disclosed herein, the embodiments of this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application in combination with the embodiments, but such implementation should not be considered to be beyond the scope of the embodiments of this application.
In the embodiments of this application, the model processing apparatus may be divided into functional modules according to the above method examples. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and is merely a logical function division; other division manners may be used in actual implementation.
When each functional module is divided corresponding to each function, FIG. 9 shows a possible schematic composition of the model processing apparatus involved in the above embodiments. As shown in FIG. 9, the model processing apparatus 900 may include a transceiver unit 901 and a processing unit 902.
The transceiver unit 901 is configured to obtain the first three-dimensional model of the target instance in the scene and the image of the scene.
The processing unit 902 is configured to determine the target second three-dimensional model of the target instance from multiple second three-dimensional models according to the first three-dimensional model of the target instance and the image of the scene, where the target second three-dimensional model is a three-dimensional model that matches the geometric shape of the target instance and has the same style type as the scene in which the target instance is located.
The target instance is any object or background in the scene.
In a possible implementation, the processing unit 902 is specifically configured to: determine the style type of the scene according to the image of the scene; and determine the target second three-dimensional model of the target instance from the multiple second three-dimensional models according to the first three-dimensional model of the target instance and the style type of the scene.
In a possible implementation, the processing unit 902 is specifically configured to input the image of the scene into a first network to determine the style type of the scene.
In a possible implementation, the processing unit 902 is specifically configured to input the first three-dimensional model of the target instance and the style type of the scene into a second network to determine the target second three-dimensional model of the target instance from the multiple second three-dimensional models.
In a possible implementation, the transceiver unit 901 is specifically configured to perform a segmentation operation on the first three-dimensional model of the scene to obtain the first three-dimensional model of the target instance, where the segmentation operation includes semantic segmentation and/or instance segmentation.
In a possible implementation, the processing unit 902 is further configured to determine the multiple second three-dimensional models according to multiple three-dimensional models without material information and images of multiple instances, where the multiple instances include at least two instances of different style types.
In a possible implementation, the processing unit 902 is further configured to generate the second three-dimensional model of the scene according to the first three-dimensional model of the scene and the target second three-dimensional model of the target instance.
In a possible implementation, the processing unit 902 is specifically configured to: determine the target position of the target instance according to the image of the scene, where the target position indicates the position of the first three-dimensional model of the target instance in the first three-dimensional model of the scene; delete the first three-dimensional model of the target instance from the first three-dimensional model of the scene; and set the target second three-dimensional model of the target instance at the target position of the target instance to generate the second three-dimensional model of the scene.
In a possible implementation, the processing unit 902 is further configured to edit the target instance in the second three-dimensional model of the scene in response to an editing operation.
Optionally, the editing operation includes a move operation, where the move operation is used to instruct moving the target instance in the second three-dimensional model of the scene.
In a possible implementation, the processing unit 902 is specifically configured to: in response to the move operation, move the target second three-dimensional model of the target instance in the second three-dimensional model of the scene from a first position to a second position.
Optionally, the editing operation includes a delete operation, where the delete operation is used to instruct deleting the target instance in the second three-dimensional model of the scene.
In a possible implementation, the processing unit 902 is specifically configured to: in response to the delete operation, delete the target second three-dimensional model of the target instance from the second three-dimensional model of the scene.
Optionally, the editing operation includes a replacement operation, where the replacement operation is used to instruct replacing the target instance in the second three-dimensional model of the scene with a preset instance.
In a possible implementation, the processing unit 902 is specifically configured to: in response to the replacement operation, replace the target second three-dimensional model of the target instance in the second three-dimensional model of the scene with the target second three-dimensional model of the preset instance.
In a possible implementation, the processing unit 902 is specifically configured to: determine the three-dimensional position of the target second three-dimensional model of the target instance, where the three-dimensional position indicates the position of that model in the second three-dimensional model of the scene; delete the target second three-dimensional model of the target instance from the second three-dimensional model of the scene; and set the target second three-dimensional model of the preset instance at that three-dimensional position.
在采用对应各个功能划分各个功能模块的情况下,图10示出了上述实施例中涉及的模型处理装置的另一种可能的组成示意图,如图10所示,该模型处理装置1000可以包括:收发单元1001和处理单元1002。In the case of dividing each functional module according to each function, Figure 10 shows another possible composition diagram of the model processing device involved in the above embodiment. As shown in Figure 10, the model processing device 1000 may include: a transceiver unit 1001 and a processing unit 1002.
收发单元1001,用于接收编辑操作。The transceiver unit 1001 is used to receive an editing operation.
处理单元1002,用于响应于所述编辑操作,对所述场景的第二三维模型中目标实例进行编辑。The processing unit 1002 is configured to edit the target instance in the second three-dimensional model of the scene in response to the editing operation.
其中,所述编辑操作用于指示编辑场景的第二三维模型中的目标实例,所述场景的第二三维模型包括所述场景中目标实例的目标第二三维模型,所述目标实例的目标第二三维模型由所述场景的目标实例的第一三维模型和所述场景的图像从多个第二三维模型中确定得到,所述目标第二三维模型是与所述目标实例几何形状匹配且与所述目标实例所在场景的风格类型相同的三维模型,所述目标实例为所述场景中的任一物体或背景。The editing operation is used to instruct editing of a target instance in the second three-dimensional model of the scene. The second three-dimensional model of the scene includes a target second three-dimensional model of the target instance in the scene. The target second three-dimensional model of the target instance is determined from a plurality of second three-dimensional models based on the first three-dimensional model of the target instance of the scene and the image of the scene. The target second three-dimensional model is a three-dimensional model that matches the geometry of the target instance and has the same style type as the scene in which the target instance is located. The target instance is any object or background in the scene.
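As an illustrative sketch only (the names `CandidateModel` and `select_target_model`, the style labels, and the plain squared-distance shape descriptors are all assumptions for illustration, not part of this application), selecting a target second three-dimensional model that is geometry-matched to the instance and style-matched to the scene from a set of candidate second three-dimensional models could look like:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CandidateModel:
    mesh_id: str         # identifier of a candidate second 3D model
    style: str           # style type of the model, e.g. "modern" or "cartoon"
    shape: List[float]   # geometric shape descriptor of the model

def select_target_model(instance_shape: List[float], scene_style: str,
                        candidates: List[CandidateModel]) -> Optional[CandidateModel]:
    """Keep candidates whose style type equals the scene's style type,
    then return the one geometrically closest to the instance's first 3D model."""
    same_style = [c for c in candidates if c.style == scene_style]
    if not same_style:
        return None
    def sq_dist(c: CandidateModel) -> float:
        # Squared Euclidean distance between shape descriptors.
        return sum((a - b) ** 2 for a, b in zip(instance_shape, c.shape))
    return min(same_style, key=sq_dist)
```

In the described method, the style type would come from a first network applied to the scene image and the geometric matching from a second network; the hand-written distance above merely stands in for that learned matching.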
可选地,所述编辑操作包括移动操作,所述移动操作用于指示移动所述场景的第二三维模型中的目标实例。Optionally, the editing operation includes a moving operation, and the moving operation is used to instruct to move a target instance in the second three-dimensional model of the scene.
在一种可能的实现方式中,所述处理单元1002具体用于:响应于所述移动操作,将所述场景的第二三维模型中目标实例的目标第二三维模型从第一位置移动至第二位置。In a possible implementation, the processing unit 1002 is specifically configured to: in response to the moving operation, move a target second three-dimensional model of a target instance in a second three-dimensional model of the scene from a first position to a second position.
可选地,所述编辑操作包括删除操作,所述删除操作用于指示删除所述场景的第二三维模型中的目标实例。Optionally, the editing operation includes a deleting operation, and the deleting operation is used to indicate deleting a target instance in the second three-dimensional model of the scene.
在一种可能的实现方式中,所述处理单元1002具体用于:响应于删除操作,将所述场景的第二三维模型中的目标实例的目标第二三维模型从所述场景的第二三维模型中删除。In a possible implementation manner, the processing unit 1002 is specifically configured to: in response to a deletion operation, delete the target second three-dimensional model of the target instance in the second three-dimensional model of the scene from the second three-dimensional model of the scene.
可选地,所述编辑操作包括替换操作,所述替换操作用于指示用预设实例替换所述场景的第二三维模型中的目标实例。Optionally, the editing operation includes a replacement operation, and the replacement operation is used to instruct to replace a target instance in the second three-dimensional model of the scene with a preset instance.
在一种可能的实现方式中,所述处理单元1002具体用于:响应于所述替换操作,将所述场景的第二三维模型中的目标实例的目标第二三维模型替换为所述预设实例的目标第二三维模型。In a possible implementation, the processing unit 1002 is specifically configured to: in response to the replacement operation, replace the target second three-dimensional model of the target instance in the second three-dimensional model of the scene with the target second three-dimensional model of the preset instance.
在一种可能的实现方式中,所述处理单元1002具体用于:确定所述目标实例的目标第二三维模型的三维位置,所述三维位置用于指示所述目标实例的目标第二三维模型在场景的第二三维模型中的位置。删除所述场景的第二三维模型中目标实例的目标第二三维模型。在所述目标实例的目标第二三维模型的三维位置设置所述预设实例的目标第二三维模型。In a possible implementation, the processing unit 1002 is specifically used to: determine a three-dimensional position of a target second three-dimensional model of the target instance, the three-dimensional position being used to indicate a position of the target second three-dimensional model of the target instance in a second three-dimensional model of a scene. Delete the target second three-dimensional model of the target instance in the second three-dimensional model of the scene. Set the target second three-dimensional model of the preset instance at the three-dimensional position of the target second three-dimensional model of the target instance.
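A minimal sketch, using a hypothetical dictionary-based scene representation (the `apply_edit` name, the `op` keys, and the dict layout are assumptions, not part of this application), of the three editing operations described above — move, delete, and replace, where replace first determines the target model's three-dimensional position, then deletes it, and finally sets the preset instance's model at that position:

```python
def apply_edit(scene_models: dict, op: dict) -> dict:
    """Edit the target instance's second 3D model inside the scene's second 3D model."""
    kind = op["type"]
    if kind == "move":
        # Move operation: relocate the target model from a first position to a second position.
        scene_models[op["target"]]["position"] = op["to"]
    elif kind == "delete":
        # Delete operation: remove the target model from the scene's second 3D model.
        del scene_models[op["target"]]
    elif kind == "replace":
        # Replace operation: 1) determine the target model's 3D position,
        # 2) delete the target model, 3) set the preset model at that position.
        position = scene_models[op["target"]]["position"]
        del scene_models[op["target"]]
        scene_models[op["preset_id"]] = {"mesh": op.get("preset_mesh"),
                                         "position": position}
    else:
        raise ValueError(f"unknown editing operation: {kind}")
    return scene_models
```

Keeping the preset model at the recorded position preserves the spatial layout of the scene's second three-dimensional model across a replacement.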
本申请实施例还提供了一种芯片。图11示出了一种芯片1100的结构示意图。芯片1100包括一个或多个处理器1101以及接口电路1102。可选的,上述芯片1100还可以包含总线1103。The embodiment of the present application further provides a chip. FIG11 shows a schematic diagram of the structure of a chip 1100. The chip 1100 includes one or more processors 1101 and an interface circuit 1102. Optionally, the chip 1100 may also include a bus 1103.
处理器1101可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述模型处理方法的各步骤可以通过处理器1101中的硬件的集成逻辑电路或者软件形式的指令完成。The processor 1101 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above model processing method may be completed by an integrated logic circuit of hardware in the processor 1101 or by instructions in the form of software.
可选地,上述的处理器1101可以是通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,可以实现或者执行本申请实施例中公开的各方法、步骤。通用处理器可以是微处理器,或者该处理器也可以是任何常规的处理器等。Optionally, the processor 1101 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and it may implement or execute the methods and steps disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
接口电路1102可以用于数据、指令或者信息的发送或者接收,处理器1101可以利用接口电路1102接收的数据、指令或者其他信息,进行加工,可以将加工完成信息通过接口电路1102发送出去。The interface circuit 1102 can be used to send or receive data, instructions or information. The processor 1101 can use the data, instructions or other information received by the interface circuit 1102 to process, and can send the processing completion information through the interface circuit 1102.
可选的,芯片还包括存储器,存储器可以包括只读存储器和随机存取存储器,并向处理器提供操作指令和数据。存储器的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。Optionally, the chip also includes a memory, which may include a read-only memory and a random access memory, and provides operation instructions and data to the processor. A portion of the memory may also include a non-volatile random access memory (NVRAM).
可选的,存储器存储了可执行软件模块或者数据结构,处理器可以通过调用存储器存储的操作指令(该操作指令可存储在操作系统中),执行相应的操作。Optionally, the memory stores executable software modules or data structures, and the processor can perform corresponding operations by calling operation instructions stored in the memory (the operation instructions can be stored in the operating system).
可选的,芯片可以使用在本申请实施例涉及的模型处理装置中。可选的,接口电路1102可用于输出处理器1101的执行结果。关于本申请的一个或多个实施例提供的模型处理方法,可参考前述各个实施例,这里不再赘述。Optionally, the chip may be used in the model processing device involved in the embodiments of the present application. Optionally, the interface circuit 1102 may be used to output the execution result of the processor 1101. For the model processing method provided in one or more embodiments of the present application, reference may be made to the foregoing embodiments, and details are not repeated here.
需要说明的,处理器1101、接口电路1102各自对应的功能既可以通过硬件设计实现,也可以通过软件设计来实现,还可以通过软硬件结合的方式来实现,这里不作限制。It should be noted that the corresponding functions of the processor 1101 and the interface circuit 1102 can be implemented through hardware design, software design, or a combination of hardware and software, and there is no limitation here.
图12为本申请实施例提供的一种电子设备的结构示意图,电子设备100可以为手机、平板电脑、可穿戴设备、车载设备、增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)设备、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本、个人数字助理(personal digital assistant,PDA)、模型处理装置或者模型处理装置中的芯片或者功能模块。12 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application. The electronic device 100 may be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a model processing device, or a chip or functional module in a model processing device.
示例性地,图12是本申请实施例提供的一例电子设备100的结构示意图。电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。Exemplarily, FIG12 is a schematic diagram of the structure of an electronic device 100 provided in an embodiment of the present application. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a subscriber identification module (SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以硬件,软件或软件和硬件的组合实现。It is to be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown in the figure, or combine some components, or split some components, or arrange the components differently. The components shown in the figure may be implemented in hardware, software, or a combination of software and hardware.
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。The processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processor (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices or integrated in one or more processors.
其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。The controller may be the nerve center and command center of the electronic device 100. The controller may generate an operation control signal according to the instruction operation code and the timing signal to complete the control of fetching and executing instructions.
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the processor 110 may include one or more interfaces. The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
其中,I2C接口是一种双向同步串行总线,处理器110可以通过I2C接口耦合触摸传感器180K,使处理器110与触摸传感器180K通过I2C总线接口通信,实现电子设备100的触摸功能。MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中, 处理器110和摄像头193通过CSI接口通信,实现电子设备100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现电子设备100的显示功能。Among them, the I2C interface is a bidirectional synchronous serial bus. The processor 110 can be coupled to the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to realize the touch function of the electronic device 100. The MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193. The MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), etc. In some embodiments, The processor 110 and the camera 193 communicate via a CSI interface to implement the shooting function of the electronic device 100. The processor 110 and the display screen 194 communicate via a DSI interface to implement the display function of the electronic device 100.
可以理解的是,本申请实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。It is understandable that the interface connection relationship between the modules illustrated in the embodiment of the present application is only a schematic illustration and does not constitute a structural limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt different interface connection methods in the above embodiments, or a combination of multiple interface connection methods.
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。The charging management module 140 is used to receive charging input from a charger. The charger can be a wireless charger or a wired charger. The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and provides power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。The electronic device 100 implements the display function through a GPU, a display screen 194, and an application processor. The GPU is a microprocessor for image processing, which connects the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。The display screen 194 is used to display images, videos, etc. The display screen 194 includes a display panel. The display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Miniled, MicroLed, Micro-oLed, quantum dot light-emitting diodes (QLED), etc. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
电子设备100可以通过ISP,摄像头193,触摸传感器、视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。The electronic device 100 can realize the shooting function through ISP, camera 193, touch sensor, video codec, GPU, display screen 194 and application processor.
其中,ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。Among them, ISP is used to process the data fed back by camera 193. For example, when taking a photo, the shutter is opened, and the light is transmitted to the camera photosensitive element through the lens. The light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to ISP for processing and converts it into an image visible to the naked eye. ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, ISP can be set in camera 193.
摄像头193用于捕获静态图像或视频。实例通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号,应理解,在本申请实施例的描述中,以RGB格式的图像为例进行介绍,本申请实施例对图像格式不作限定。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。The camera 193 is used to capture still images or videos. An instance generates an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element may be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and then transmits the electrical signal to the ISP for conversion into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. It should be understood that the description of the embodiments of the present application uses an image in RGB format as an example; the embodiments of the present application do not limit the image format. In some embodiments, the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。The digital signal processor is used to process digital signals, and can process not only digital image signals but also other digital signals. For example, when the electronic device 100 is selecting a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy.
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。Video codecs are used to compress or decompress digital videos. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record videos in a variety of coding formats, such as Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The internal memory 121 can be used to store computer executable program codes, which include instructions. The processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area.
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。The electronic device 100 can implement audio functions such as music playing and recording through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone jack 170D, and the application processor.
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。SIM卡接口195用于连接SIM卡。 The button 190 includes a power button, a volume button, etc. The button 190 can be a mechanical button. It can also be a touch button. The electronic device 100 can receive button input and generate key signal input related to the user settings and function control of the electronic device 100. The motor 191 can generate a vibration prompt. The motor 191 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback. For example, touch operations acting on different applications (such as taking pictures, audio playback, etc.) can correspond to different vibration feedback effects. For touch operations acting on different areas of the display screen 194, the motor 191 can also correspond to different vibration feedback effects. The indicator 192 can be an indicator light, which can be used to indicate the charging status, power changes, and can also be used to indicate messages, missed calls, notifications, etc. The SIM card interface 195 is used to connect a SIM card.
需要指出的是,电子设备100可以是芯片系统或有图12中类似结构的设备。其中,芯片系统可以由芯片构成,也可以包括芯片和其他分立器件。本申请的各实施例之间涉及的动作、术语等均可以相互参考,不予限制。本申请的实施例中各个设备之间交互的消息名称或消息中的参数名称等只是一个示例,具体实现中也可以采用其他的名称,不予限制。此外,图12中示出的组成结构并不构成对该电子设备100的限定,除图12所示部件之外,该电子设备100可以包括比图12所示更多或更少的部件,或者组合某些部件,或者不同的部件布置。It should be pointed out that the electronic device 100 can be a chip system or a device with a similar structure as shown in Figure 12. Among them, the chip system can be composed of chips, or it can include chips and other discrete devices. The actions, terms, etc. involved in the various embodiments of the present application can refer to each other without limitation. The message name or parameter name in the message exchanged between the various devices in the embodiments of the present application is only an example, and other names can also be used in the specific implementation without limitation. In addition, the component structure shown in Figure 12 does not constitute a limitation on the electronic device 100. In addition to the components shown in Figure 12, the electronic device 100 may include more or fewer components than those shown in Figure 12, or combine certain components, or arrange the components differently.
本申请中描述的处理器和收发器可实现在集成电路(integrated circuit,IC)、模拟IC、射频集成电路、混合信号IC、专用集成电路(application specific integrated circuit,ASIC)、印刷电路板(printed circuit board,PCB)、电子设备等上。该处理器和收发器也可以用各种IC工艺技术来制造,例如互补金属氧化物半导体(complementary metal oxide semiconductor,CMOS)、N型金属氧化物半导体(nMetal-oxide-semiconductor,NMOS)、P型金属氧化物半导体(positive channel metal oxide semiconductor,PMOS)、双极结型晶体管(Bipolar Junction Transistor,BJT)、双极CMOS(BiCMOS)、硅锗(SiGe)、砷化镓(GaAs)等。The processor and transceiver described in the present application can be implemented in an integrated circuit (IC), an analog IC, a radio frequency integrated circuit, a mixed signal IC, an application specific integrated circuit (ASIC), a printed circuit board (PCB), an electronic device, etc. The processor and transceiver can also be manufactured using various IC process technologies, such as complementary metal oxide semiconductor (CMOS), N-type metal oxide semiconductor (NMOS), P-type metal oxide semiconductor (positive channel metal oxide semiconductor, PMOS), bipolar junction transistor (BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), gallium arsenide (GaAs), etc.
本申请实施例还提供一种模型处理装置,该装置包括:至少一个处理器,当上述至少一个处理器执行程序代码或指令时,实现上述相关方法步骤,从而实现上述实施例中的模型处理方法。An embodiment of the present application further provides a model processing apparatus, including at least one processor; when the at least one processor executes program code or instructions, the above related method steps are implemented, thereby implementing the model processing method in the above embodiments.
可选地,该装置还可以包括至少一个存储器,该至少一个存储器用于存储该程序代码或指令。Optionally, the device may further include at least one memory, and the at least one memory is used to store the program code or instruction.
本申请实施例还提供一种计算机存储介质,该计算机存储介质中存储有计算机指令,当该计算机指令在模型处理装置上运行时,使得模型处理装置执行上述相关方法步骤,实现上述实施例中的模型处理方法。An embodiment of the present application further provides a computer storage medium storing computer instructions; when the computer instructions are run on a model processing device, the model processing device is caused to execute the above related method steps to implement the model processing method in the above embodiments.
本申请实施例还提供了一种计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述相关步骤,以实现上述实施例中的模型处理方法。An embodiment of the present application further provides a computer program product; when the computer program product is run on a computer, the computer is caused to execute the above related steps to implement the model processing method in the above embodiments.
本申请实施例还提供一种模型处理装置,这个装置具体可以是芯片、集成电路、组件或模块。具体的,该装置可包括相连的处理器和用于存储指令的存储器,或者该装置包括至少一个处理器,用于从外部存储器获取指令。当装置运行时,处理器可执行指令,以使芯片执行上述各方法实施例中的模型处理方法。The embodiment of the present application also provides a model processing device, which may specifically be a chip, an integrated circuit, a component, or a module. Specifically, the device may include a processor and a memory connected thereto for storing instructions, or the device may include at least one processor for obtaining instructions from an external memory. When the device is running, the processor may execute the instructions so that the chip performs the model processing method in the above method embodiments.
应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。It should be understood that in the various embodiments of the present application, the size of the serial numbers of the above-mentioned processes does not mean the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件,或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。Those of ordinary skill in the art will appreciate that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Professional and technical personnel can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working processes of the systems, devices and units described above can refer to the corresponding processes in the aforementioned method embodiments and will not be repeated here.
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其他的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,上述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其他的形式。In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are only schematic. For example, the division of the above units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or units, which can be electrical, mechanical or other forms.
上述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described above as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
上述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例上述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。If the above functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the above methods in each embodiment of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or any other medium that can store program code.
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以权利要求的保护范围为准。The above is only a specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed in the present application, and such changes or substitutions shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (39)

  1. 一种模型处理方法,其特征在于,包括:A model processing method, characterized by comprising:
    获取场景中目标实例的第一三维模型和所述场景的图像,所述目标实例为所述场景中的任一物体或背景;Acquire a first three-dimensional model of a target instance in a scene and an image of the scene, wherein the target instance is any object or background in the scene;
    根据所述目标实例的第一三维模型和所述场景的图像从多个第二三维模型中确定所述目标实例的目标第二三维模型,所述目标第二三维模型是与所述目标实例的几何形状匹配且与所述目标实例所在场景的风格类型相同的三维模型。A target second three-dimensional model of the target instance is determined from multiple second three-dimensional models based on the first three-dimensional model of the target instance and the image of the scene. The target second three-dimensional model is a three-dimensional model that matches the geometric shape of the target instance and has the same style type as the scene where the target instance is located.
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述目标实例的第一三维模型和所述场景的图像从多个第二三维模型中确定所述目标实例的目标第二三维模型,包括:The method according to claim 1, characterized in that the step of determining a target second three-dimensional model of the target instance from a plurality of second three-dimensional models based on the first three-dimensional model of the target instance and the image of the scene comprises:
    根据所述场景的图像确定所述场景的风格类型;determining a style type of the scene based on the image of the scene;
    根据所述目标实例的第一三维模型和所述场景的风格类型从多个第二三维模型中确定所述目标实例的目标第二三维模型。A target second three-dimensional model of the target instance is determined from a plurality of second three-dimensional models according to the first three-dimensional model of the target instance and the style type of the scene.
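Purely as an illustrative sketch (not part of the claims), the two-stage determination of claims 2 to 4 — first classify the scene's style type from its image, then select, among candidate second three-dimensional models of that style, the one matching the target instance's geometry — might look as follows. All class names, descriptors, and the brightness heuristic standing in for the "first network" are assumptions for illustration only:

```python
# Illustrative sketch of claims 2-4. The claims do not specify data
# structures or matching criteria; everything named here is assumed.
from dataclasses import dataclass

@dataclass
class Model3D:
    instance_id: str
    shape_descriptor: tuple   # stand-in for a geometric feature vector
    style: str                # e.g. "modern", "rustic"

def classify_scene_style(scene_image_pixels) -> str:
    """Stand-in for the 'first network' of claim 3: maps a scene image
    to a style label (here: a trivial brightness heuristic)."""
    avg = sum(scene_image_pixels) / len(scene_image_pixels)
    return "modern" if avg > 128 else "rustic"

def geometric_distance(a: tuple, b: tuple) -> float:
    # Stand-in for geometric-shape matching between two descriptors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def select_target_model(first_model: Model3D, scene_image_pixels,
                        candidates: list) -> Model3D:
    """Determine the scene style from the image, then pick, among the
    candidates of that style, the geometrically closest second model."""
    style = classify_scene_style(scene_image_pixels)
    same_style = [m for m in candidates if m.style == style]
    return min(same_style,
               key=lambda m: geometric_distance(m.shape_descriptor,
                                                first_model.shape_descriptor))
```

In a real system the style classifier and the geometry matcher would each be a trained network (the "first network" and "second network" of claims 3 and 4); only the overall control flow is shown here.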
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述场景的图像确定所述场景的风格类型,包括:The method according to claim 2, characterized in that determining the style type of the scene according to the image of the scene comprises:
    将所述场景的图像输入第一网络以确定所述场景的风格类型。An image of the scene is input into a first network to determine the style type of the scene.
  4. 根据权利要求2或3所述的方法,其特征在于,所述根据所述目标实例的第一三维模型和所述场景的风格类型从多个第二三维模型中确定所述目标实例的目标第二三维模型,包括:The method according to claim 2 or 3, characterized in that the step of determining a target second three-dimensional model of the target instance from a plurality of second three-dimensional models according to the first three-dimensional model of the target instance and the style type of the scene comprises:
    将所述目标实例的第一三维模型和所述场景的风格类型输入第二网络以从多个第二三维模型中确定所述目标实例的目标第二三维模型。The first three-dimensional model of the target instance and the style type of the scene are input into a second network to determine a target second three-dimensional model of the target instance from a plurality of second three-dimensional models.
  5. 根据权利要求1至4中任一项所述的方法,其特征在于,所述获取场景中目标实例的第一三维模型,包括:The method according to any one of claims 1 to 4, characterized in that the step of obtaining a first three-dimensional model of a target instance in a scene comprises:
    对所述场景的第一三维模型进行分割操作以得到所述目标实例的第一三维模型,所述分割操作包括语义分割和/或实例分割。A segmentation operation is performed on the first three-dimensional model of the scene to obtain a first three-dimensional model of the target instance, wherein the segmentation operation includes semantic segmentation and/or instance segmentation.
  6. 根据权利要求1至5中任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1 to 5, characterized in that the method further comprises:
    根据多个无材质信息的三维模型和多个实例的图像确定所述多个第二三维模型,所述多个实例至少包括两个风格类型不同的实例。The plurality of second three-dimensional models are determined according to a plurality of three-dimensional models without material information and images of a plurality of instances, wherein the plurality of instances include at least two instances of different style types.
  7. 根据权利要求1至6中任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1 to 6, characterized in that the method further comprises:
    根据所述场景的第一三维模型和所述目标实例的目标第二三维模型生成所述场景的第二三维模型。A second three-dimensional model of the scene is generated according to the first three-dimensional model of the scene and the target second three-dimensional model of the target instance.
  8. 根据权利要求7所述的方法，其特征在于，所述根据所述场景的第一三维模型和所述目标实例的目标第二三维模型生成所述场景的第二三维模型，包括：The method according to claim 7, wherein the generating a second three-dimensional model of the scene according to the first three-dimensional model of the scene and the target second three-dimensional model of the target instance comprises:
    根据所述场景的图像确定所述目标实例的目标位置,所述目标位置用于指示所述目标实例的第一三维模型在所述场景的第一三维模型中的位置;Determine a target position of the target instance according to the image of the scene, wherein the target position is used to indicate a position of the first three-dimensional model of the target instance in the first three-dimensional model of the scene;
    删除所述场景的第一三维模型中所述目标实例的第一三维模型;Deleting the first three-dimensional model of the target instance in the first three-dimensional model of the scene;
    在所述目标实例的目标位置设置所述目标实例的目标第二三维模型以生成所述场景的第二三维模型。A target second three-dimensional model of the target instance is set at a target position of the target instance to generate a second three-dimensional model of the scene.
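Again purely as an illustrative sketch, the three steps of claim 8 — locate the target instance, delete its first three-dimensional model from the scene model, and set its target second three-dimensional model at the same position — could be expressed as below. Representing the scene model as a mapping from instance identifier to a (model, position) pair is an assumption made only for this sketch; step one (deriving the target position from the scene image) is taken as already done:

```python
# Hedged sketch of claim 8. The dict-based scene representation is an
# assumption, not the claimed data structure.

def generate_scene_second_model(scene_first_model: dict,
                                target_id: str,
                                target_position,
                                target_second_model) -> dict:
    """Delete the target instance's first 3-D model from the scene's
    first 3-D model and place its target second 3-D model at the
    target position, yielding the scene's second 3-D model."""
    scene_second_model = dict(scene_first_model)       # keep other instances
    scene_second_model.pop(target_id)                  # delete first model
    scene_second_model[target_id] = (target_second_model,
                                     target_position)  # set second model
    return scene_second_model
```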
  9. 根据权利要求7或8所述的方法,其特征在于,所述方法还包括:The method according to claim 7 or 8, characterized in that the method further comprises:
    接收编辑操作,所述编辑操作用于指示编辑所述场景的第二三维模型中的目标实例;receiving an editing operation, wherein the editing operation is used to instruct editing of a target instance in a second three-dimensional model of the scene;
    响应于所述编辑操作,对所述场景的第二三维模型中目标实例进行编辑。In response to the editing operation, the target instance in the second three-dimensional model of the scene is edited.
  10. 根据权利要求9所述的方法,其特征在于,所述编辑操作包括移动操作,所述移动操作用于指示移动所述场景的第二三维模型中的目标实例,所述响应于所述编辑操作,对所述场景的第二三维模型中目标实例进行编辑,包括:The method according to claim 9, wherein the editing operation comprises a moving operation, the moving operation is used to indicate moving a target instance in the second three-dimensional model of the scene, and the editing of the target instance in the second three-dimensional model of the scene in response to the editing operation comprises:
    响应于所述移动操作,将所述场景的第二三维模型中目标实例的目标第二三维模型从第一位置移动至第二位置。In response to the moving operation, a target second three-dimensional model of a target instance in a second three-dimensional model of the scene is moved from a first position to a second position.
  11. 根据权利要求9或10所述的方法,所述编辑操作包括删除操作,所述删除操作用于指示删除所述场景的第二三维模型中的目标实例,所述响应于所述编辑操作,对所述场景的第二三维模型中目标实例进行编辑,包括:According to the method of claim 9 or 10, the editing operation includes a deletion operation, the deletion operation is used to indicate deletion of the target instance in the second three-dimensional model of the scene, and the editing of the target instance in the second three-dimensional model of the scene in response to the editing operation comprises:
    响应于删除操作，将所述场景的第二三维模型中的目标实例的目标第二三维模型从所述场景的第二三维模型中删除。In response to the deletion operation, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene is deleted from the second three-dimensional model of the scene.
  12. 根据权利要求9至11中任一项所述的方法,其特征在于,所述编辑操作包括替换操作,所述替换操作用于指示用预设实例替换所述场景的第二三维模型中的目标实例,所述响应于所述编辑操作,对所述场景的第二三维模型中目标实例进行编辑,包括:The method according to any one of claims 9 to 11, characterized in that the editing operation includes a replacement operation, the replacement operation is used to indicate replacing a target instance in the second three-dimensional model of the scene with a preset instance, and the editing of the target instance in the second three-dimensional model of the scene in response to the editing operation comprises:
    响应于所述替换操作,将所述场景的第二三维模型中的目标实例的目标第二三维模型替换为所述预设实例的目标第二三维模型。In response to the replacement operation, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene is replaced with the target second three-dimensional model of the preset instance.
  13. 根据权利要求12所述的方法,其特征在于,所述将所述场景的第二三维模型中的目标实例的目标第二三维模型替换为所述预设实例的目标第二三维模型,包括:The method according to claim 12, characterized in that replacing the target second three-dimensional model of the target instance in the second three-dimensional model of the scene with the target second three-dimensional model of the preset instance comprises:
    确定所述目标实例的目标第二三维模型的三维位置,所述三维位置用于指示所述目标实例的目标第二三维模型在场景的第二三维模型中的位置;Determine a three-dimensional position of a target second three-dimensional model of the target instance, where the three-dimensional position is used to indicate a position of the target second three-dimensional model of the target instance in the second three-dimensional model of the scene;
    删除所述场景的第二三维模型中目标实例的目标第二三维模型;Deleting a target second three-dimensional model of a target instance in a second three-dimensional model of the scene;
    在所述目标实例的目标第二三维模型的三维位置设置所述预设实例的目标第二三维模型。The target second three-dimensional model of the preset instance is set at the three-dimensional position of the target second three-dimensional model of the target instance.
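The three editing operations of claims 10 to 13 (move, delete, replace) can be sketched over the same assumed dict-based scene representation. This is illustrative only; in particular, note how the replace branch follows claim 13 by reusing the three-dimensional position of the old model for the preset instance's model:

```python
# Illustrative sketch of the edit operations in claims 10-13.
# The dict representation and operation encoding are assumptions.

def apply_edit(scene_second_model: dict, op: dict) -> dict:
    scene = dict(scene_second_model)
    kind = op["kind"]
    if kind == "move":                           # claim 10
        model, _old_pos = scene[op["target"]]
        scene[op["target"]] = (model, op["to"])
    elif kind == "delete":                       # claim 11
        scene.pop(op["target"])
    elif kind == "replace":                      # claims 12-13: keep the
        _old_model, pos = scene[op["target"]]    # old model's 3-D position,
        scene[op["target"]] = (op["preset_model"], pos)  # set preset there
    else:
        raise ValueError(f"unknown edit operation: {kind}")
    return scene
```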
  14. 一种模型处理方法,其特征在于,包括:A model processing method, characterized by comprising:
    接收编辑操作,所述编辑操作用于指示编辑场景的第二三维模型中的目标实例,所述场景的第二三维模型包括所述场景中目标实例的目标第二三维模型,所述目标实例的目标第二三维模型由所述场景的目标实例的第一三维模型和所述场景的图像从多个第二三维模型中确定得到,所述目标第二三维模型是与所述目标实例几何形状匹配且与所述目标实例所在场景的风格类型相同的三维模型,所述目标实例为所述场景中的任一物体或背景;receiving an editing operation, the editing operation being used to instruct editing of a target instance in a second three-dimensional model of a scene, the second three-dimensional model of the scene comprising a target second three-dimensional model of the target instance in the scene, the target second three-dimensional model of the target instance being determined from a plurality of second three-dimensional models by using a first three-dimensional model of the target instance of the scene and an image of the scene, the target second three-dimensional model being a three-dimensional model that matches a geometric shape of the target instance and has the same style type as the scene where the target instance is located, and the target instance is any object or background in the scene;
    响应于所述编辑操作,对所述场景的第二三维模型中目标实例进行编辑。In response to the editing operation, the target instance in the second three-dimensional model of the scene is edited.
  15. 根据权利要求14所述的方法,其特征在于,所述编辑操作包括移动操作,所述移动操作用于指示移动所述场景的第二三维模型中的目标实例,所述响应于所述编辑操作,对所述场景的第二三维模型中目标实例进行编辑,包括:The method according to claim 14, wherein the editing operation comprises a moving operation, the moving operation is used to indicate moving a target instance in the second three-dimensional model of the scene, and the editing of the target instance in the second three-dimensional model of the scene in response to the editing operation comprises:
    响应于所述移动操作,将所述场景的第二三维模型中目标实例的目标第二三维模型从第一位置移动至第二位置。In response to the moving operation, a target second three-dimensional model of a target instance in a second three-dimensional model of the scene is moved from a first position to a second position.
  16. 根据权利要求14或15所述的方法,所述编辑操作包括删除操作,所述删除操作用于指示删除所述场景的第二三维模型中的目标实例,所述响应于所述编辑操作,对所述场景的第二三维模型中目标实例进行编辑,包括:According to the method of claim 14 or 15, the editing operation includes a deletion operation, the deletion operation is used to indicate deletion of the target instance in the second three-dimensional model of the scene, and in response to the editing operation, editing the target instance in the second three-dimensional model of the scene comprises:
    响应于删除操作,将所述场景的第二三维模型中的目标实例的目标第二三维模型从所述场景的第二三维模型中删除。In response to the deletion operation, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene is deleted from the second three-dimensional model of the scene.
  17. 根据权利要求14至16中任一项所述的方法,其特征在于,所述编辑操作包括替换操作,所述替换操作用于指示用预设实例替换所述场景的第二三维模型中的目标实例,所述响应于所述编辑操作,对所述场景的第二三维模型中目标实例进行编辑,包括:The method according to any one of claims 14 to 16, characterized in that the editing operation includes a replacement operation, the replacement operation is used to indicate replacing a target instance in the second three-dimensional model of the scene with a preset instance, and the editing of the target instance in the second three-dimensional model of the scene in response to the editing operation comprises:
    响应于所述替换操作,将所述场景的第二三维模型中的目标实例的目标第二三维模型替换为所述预设实例的目标第二三维模型。In response to the replacement operation, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene is replaced with the target second three-dimensional model of the preset instance.
  18. 根据权利要求17所述的方法,其特征在于,所述将所述场景的第二三维模型中的目标实例的目标第二三维模型替换为所述预设实例的目标第二三维模型,包括:The method according to claim 17, characterized in that replacing the target second three-dimensional model of the target instance in the second three-dimensional model of the scene with the target second three-dimensional model of the preset instance comprises:
    确定所述目标实例的目标第二三维模型的三维位置,所述三维位置用于指示所述目标实例的目标第二三维模型在场景的第二三维模型中的位置;Determine a three-dimensional position of a target second three-dimensional model of the target instance, where the three-dimensional position is used to indicate a position of the target second three-dimensional model of the target instance in the second three-dimensional model of the scene;
    删除所述场景的第二三维模型中目标实例的目标第二三维模型;Deleting a target second three-dimensional model of a target instance in a second three-dimensional model of the scene;
    在所述目标实例的目标第二三维模型的三维位置设置所述预设实例的目标第二三维模型。The target second three-dimensional model of the preset instance is set at the three-dimensional position of the target second three-dimensional model of the target instance.
  19. 一种模型处理装置,其特征在于,包括:收发单元和处理单元;A model processing device, characterized in that it comprises: a transceiver unit and a processing unit;
    所述收发单元,用于获取场景中目标实例的第一三维模型和所述场景的图像,所述目标实例为所述场景中的任一物体或背景;The transceiver unit is used to obtain a first three-dimensional model of a target instance in a scene and an image of the scene, wherein the target instance is any object or background in the scene;
    所述处理单元,用于根据所述目标实例的第一三维模型和所述场景的图像从多个第二三维模型中确定所述目标实例的目标第二三维模型,所述目标第二三维模型是与所述目标实例的几何形状匹配且与所述目标实例所在场景的风格类型相同的三维模型。The processing unit is used to determine a target second three-dimensional model of the target instance from multiple second three-dimensional models based on the first three-dimensional model of the target instance and the image of the scene, wherein the target second three-dimensional model is a three-dimensional model that matches the geometric shape of the target instance and has the same style type as the scene where the target instance is located.
  20. 根据权利要求19所述的装置,其特征在于,所述处理单元具体用于:The device according to claim 19, characterized in that the processing unit is specifically used to:
    根据所述场景的图像确定所述场景的风格类型; determining a style type of the scene based on the image of the scene;
    根据所述目标实例的第一三维模型和所述场景的风格类型从多个第二三维模型中确定所述目标实例的目标第二三维模型。A target second three-dimensional model of the target instance is determined from a plurality of second three-dimensional models according to the first three-dimensional model of the target instance and the style type of the scene.
  21. 根据权利要求20所述的装置,其特征在于,所述处理单元具体用于:The device according to claim 20, characterized in that the processing unit is specifically used to:
    将所述场景的图像输入第一网络以确定所述场景的风格类型。An image of the scene is input into a first network to determine the style type of the scene.
  22. 根据权利要求19或20所述的装置,其特征在于,所述处理单元具体用于:The device according to claim 19 or 20, characterized in that the processing unit is specifically used for:
    将所述目标实例的第一三维模型和所述场景的风格类型输入第二网络以从多个第二三维模型中确定所述目标实例的目标第二三维模型。The first three-dimensional model of the target instance and the style type of the scene are input into a second network to determine a target second three-dimensional model of the target instance from a plurality of second three-dimensional models.
  23. 根据权利要求19至22中任一项所述的装置,其特征在于,所述收发单元具体用于:The device according to any one of claims 19 to 22, characterized in that the transceiver unit is specifically used for:
    对所述场景的第一三维模型进行分割操作以得到所述目标实例的第一三维模型,所述分割操作包括语义分割和/或实例分割。A segmentation operation is performed on the first three-dimensional model of the scene to obtain a first three-dimensional model of the target instance, wherein the segmentation operation includes semantic segmentation and/or instance segmentation.
  24. 根据权利要求19至23中任一项所述的装置,其特征在于,所述处理单元还用于:The device according to any one of claims 19 to 23, characterized in that the processing unit is further used for:
    根据多个无材质信息的三维模型和多个实例的图像确定所述多个第二三维模型,所述多个实例至少包括两个风格类型不同的实例。The plurality of second three-dimensional models are determined according to a plurality of three-dimensional models without material information and images of a plurality of instances, wherein the plurality of instances include at least two instances of different style types.
  25. 根据权利要求19至24中任一项所述的装置,其特征在于,所述处理单元还用于:The device according to any one of claims 19 to 24, characterized in that the processing unit is further used for:
    根据所述场景的第一三维模型和所述目标实例的目标第二三维模型生成所述场景的第二三维模型。A second three-dimensional model of the scene is generated according to the first three-dimensional model of the scene and the target second three-dimensional model of the target instance.
  26. 根据权利要求25所述的装置,其特征在于,所述处理单元具体用于:The device according to claim 25, characterized in that the processing unit is specifically used to:
    根据所述场景的图像确定所述目标实例的目标位置,所述目标位置用于指示所述目标实例的第一三维模型在所述场景的第一三维模型中的位置;Determine a target position of the target instance according to the image of the scene, wherein the target position is used to indicate a position of the first three-dimensional model of the target instance in the first three-dimensional model of the scene;
    删除所述场景的第一三维模型中所述目标实例的第一三维模型;Deleting the first three-dimensional model of the target instance in the first three-dimensional model of the scene;
    在所述目标实例的目标位置设置所述目标实例的目标第二三维模型以生成所述场景的第二三维模型。A target second three-dimensional model of the target instance is set at a target position of the target instance to generate a second three-dimensional model of the scene.
  27. 根据权利要求25或26所述的装置,其特征在于,所述收发单元还用于:The device according to claim 25 or 26, characterized in that the transceiver unit is also used for:
    接收编辑操作,所述编辑操作用于指示编辑所述场景的第二三维模型中的目标实例;receiving an editing operation, wherein the editing operation is used to instruct editing of a target instance in a second three-dimensional model of the scene;
    所述处理单元,还用于响应于所述编辑操作,对所述场景的第二三维模型中目标实例进行编辑。The processing unit is further configured to edit the target instance in the second three-dimensional model of the scene in response to the editing operation.
  28. 根据权利要求27所述的装置,其特征在于,所述编辑操作包括移动操作,所述移动操作用于指示移动所述场景的第二三维模型中的目标实例,所述处理单元具体用于:The apparatus according to claim 27, wherein the editing operation comprises a moving operation, the moving operation is used to indicate moving a target instance in the second three-dimensional model of the scene, and the processing unit is specifically used to:
    响应于所述移动操作,将所述场景的第二三维模型中目标实例的目标第二三维模型从第一位置移动至第二位置。In response to the moving operation, a target second three-dimensional model of a target instance in a second three-dimensional model of the scene is moved from a first position to a second position.
  29. 根据权利要求27或28所述的装置,其特征在于,所述编辑操作包括删除操作,所述删除操作用于指示删除所述场景的第二三维模型中的目标实例,所述处理单元具体用于:The device according to claim 27 or 28, characterized in that the editing operation includes a deletion operation, the deletion operation is used to indicate deletion of the target instance in the second three-dimensional model of the scene, and the processing unit is specifically used to:
    响应于删除操作,将所述场景的第二三维模型中的目标实例的目标第二三维模型从所述场景的第二三维模型中删除。In response to the deletion operation, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene is deleted from the second three-dimensional model of the scene.
  30. 根据权利要求27至29中任一项所述的装置,其特征在于,所述编辑操作包括替换操作,所述替换操作用于指示用预设实例替换所述场景的第二三维模型中的目标实例,所述处理单元具体用于:The apparatus according to any one of claims 27 to 29, characterized in that the editing operation includes a replacement operation, the replacement operation is used to indicate replacing a target instance in the second three-dimensional model of the scene with a preset instance, and the processing unit is specifically used to:
    响应于所述替换操作,将所述场景的第二三维模型中的目标实例的目标第二三维模型替换为所述预设实例的目标第二三维模型。In response to the replacement operation, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene is replaced with the target second three-dimensional model of the preset instance.
  31. 根据权利要求30所述的装置,其特征在于,所述处理单元具体用于:The device according to claim 30, characterized in that the processing unit is specifically used to:
    确定所述目标实例的目标第二三维模型的三维位置,所述三维位置用于指示所述目标实例的目标第二三维模型在场景的第二三维模型中的位置;Determine a three-dimensional position of a target second three-dimensional model of the target instance, where the three-dimensional position is used to indicate a position of the target second three-dimensional model of the target instance in the second three-dimensional model of the scene;
    删除所述场景的第二三维模型中目标实例的目标第二三维模型;Deleting a target second three-dimensional model of a target instance in a second three-dimensional model of the scene;
    在所述目标实例的目标第二三维模型的三维位置设置所述预设实例的目标第二三维模型。The target second three-dimensional model of the preset instance is set at the three-dimensional position of the target second three-dimensional model of the target instance.
  32. 一种模型处理装置,其特征在于,包括:收发单元和处理单元;A model processing device, characterized in that it comprises: a transceiver unit and a processing unit;
    所述收发单元,用于接收编辑操作,所述编辑操作用于指示编辑场景的第二三维模型中的目标实例,所述场景的第二三维模型包括所述场景中目标实例的目标第二三维模型,所述目标实例的目标第二三维模型由所述场景的目标实例的第一三维模型和所述场景的图像从多个第二三维模型中确定得到,所述目标第二三维模型是与所述目标实例几何形状匹配且与所述目标实例所在场景的风格类型相同的三维模型,所述目标实例为所述场景中的任一物体或背景;The transceiver unit is used to receive an editing operation, wherein the editing operation is used to instruct to edit a target instance in a second three-dimensional model of a scene, wherein the second three-dimensional model of the scene includes a target second three-dimensional model of the target instance in the scene, wherein the target second three-dimensional model of the target instance is determined from a plurality of second three-dimensional models by using a first three-dimensional model of the target instance of the scene and an image of the scene, wherein the target second three-dimensional model is a three-dimensional model that matches a geometric shape of the target instance and has the same style type as the scene where the target instance is located, and the target instance is any object or background in the scene;
    所述处理单元,用于响应于所述编辑操作,对所述场景的第二三维模型中目标实例进行编辑。 The processing unit is used to edit the target instance in the second three-dimensional model of the scene in response to the editing operation.
  33. 根据权利要求32所述的装置,其特征在于,所述编辑操作包括移动操作,所述移动操作用于指示移动所述场景的第二三维模型中的目标实例,所述处理单元具体用于:The apparatus according to claim 32, wherein the editing operation comprises a moving operation, the moving operation being used to indicate moving a target instance in the second three-dimensional model of the scene, and the processing unit being specifically used to:
    响应于所述移动操作,将所述场景的第二三维模型中目标实例的目标第二三维模型从第一位置移动至第二位置。In response to the moving operation, a target second three-dimensional model of a target instance in a second three-dimensional model of the scene is moved from a first position to a second position.
  34. 根据权利要求32或33所述的装置,其特征在于,所述编辑操作包括删除操作,所述删除操作用于指示删除所述场景的第二三维模型中的目标实例,所述处理单元具体用于:The device according to claim 32 or 33, characterized in that the editing operation includes a deletion operation, the deletion operation is used to indicate deletion of the target instance in the second three-dimensional model of the scene, and the processing unit is specifically used to:
    响应于删除操作,将所述场景的第二三维模型中的目标实例的目标第二三维模型从所述场景的第二三维模型中删除。In response to the deletion operation, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene is deleted from the second three-dimensional model of the scene.
  35. 根据权利要求32至34中任一项所述的装置,其特征在于,所述编辑操作包括替换操作,所述替换操作用于指示用预设实例替换所述场景的第二三维模型中的目标实例,所述处理单元具体用于:The apparatus according to any one of claims 32 to 34, characterized in that the editing operation includes a replacement operation, the replacement operation is used to indicate replacing a target instance in the second three-dimensional model of the scene with a preset instance, and the processing unit is specifically used to:
    响应于所述替换操作,将所述场景的第二三维模型中的目标实例的目标第二三维模型替换为所述预设实例的目标第二三维模型。In response to the replacement operation, the target second three-dimensional model of the target instance in the second three-dimensional model of the scene is replaced with the target second three-dimensional model of the preset instance.
  36. 根据权利要求35所述的装置,其特征在于,所述处理单元具体用于:The device according to claim 35, characterized in that the processing unit is specifically used to:
    确定所述目标实例的目标第二三维模型的三维位置,所述三维位置用于指示所述目标实例的目标第二三维模型在场景的第二三维模型中的位置;Determine a three-dimensional position of a target second three-dimensional model of the target instance, where the three-dimensional position is used to indicate a position of the target second three-dimensional model of the target instance in the second three-dimensional model of the scene;
    删除所述场景的第二三维模型中目标实例的目标第二三维模型;Deleting a target second three-dimensional model of a target instance in a second three-dimensional model of the scene;
    在所述目标实例的目标第二三维模型的三维位置设置所述预设实例的目标第二三维模型。The target second three-dimensional model of the preset instance is set at the three-dimensional position of the target second three-dimensional model of the target instance.
  37. 一种模型处理装置,包括至少一个处理器和存储器,其特征在于,所述至少一个处理器执行存储在存储器中的程序或指令,以使得所述模型处理装置实现上述权利要求1至18中任一项所述的方法。A model processing device comprises at least one processor and a memory, wherein the at least one processor executes a program or instruction stored in the memory so that the model processing device implements the method described in any one of claims 1 to 18.
  38. 一种计算机可读存储介质,用于存储计算机程序,其特征在于,当所述计算机程序在计算机或处理器运行时,使得所述计算机或所述处理器实现上述权利要求1至18中任一项所述的方法。A computer-readable storage medium for storing a computer program, characterized in that when the computer program is executed on a computer or a processor, the computer or the processor implements the method described in any one of claims 1 to 18.
  39. 一种计算机程序产品,所述计算机程序产品中包含指令,其特征在于,当所述指令在计算机或处理器上运行时,使得所述计算机或所述处理器实现上述权利要求1至18中任一项所述的方法。 A computer program product, comprising instructions, wherein when the instructions are executed on a computer or a processor, the computer or the processor implements the method according to any one of claims 1 to 18.
PCT/CN2023/108396 2022-09-29 2023-07-20 Model processing method, and apparatus WO2024066689A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211197410.0A CN117830577A (en) 2022-09-29 2022-09-29 Model processing method and device
CN202211197410.0 2022-09-29

Publications (1)

Publication Number Publication Date
WO2024066689A1 true WO2024066689A1 (en) 2024-04-04

Family

ID=90475908

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/108396 WO2024066689A1 (en) 2022-09-29 2023-07-20 Model processing method, and apparatus

Country Status (2)

Country Link
CN (1) CN117830577A (en)
WO (1) WO2024066689A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200250879A1 (en) * 2019-02-05 2020-08-06 X Development Llc Scene recognition using volumetric substitution of real world objects
CN113436338A (en) * 2021-07-14 2021-09-24 中德(珠海)人工智能研究院有限公司 Three-dimensional reconstruction method and device for fire scene, server and readable storage medium
CN114708385A (en) * 2022-03-29 2022-07-05 网易(杭州)网络有限公司 Virtual building assembly method and device, electronic equipment and storage medium
CN114723883A (en) * 2022-03-31 2022-07-08 广州极飞科技股份有限公司 Three-dimensional scene reconstruction method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN117830577A (en) 2024-04-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23869936

Country of ref document: EP

Kind code of ref document: A1