WO2023024036A1 - Method and apparatus for reconstructing three-dimensional model of person

Info

Publication number: WO2023024036A1
Application number: PCT/CN2021/114840
Authority: WO
Inventors: 白蔚; 李万琦; 胡伟; 于金波
Original assignee: 华为技术有限公司
Priority date: 2021-08-26
Filing date: 2021-08-26
Publication date: 2023-03-02
Also published as: CN116157842A

Abstract

Provided are a method and apparatus for reconstructing a three-dimensional model of a person, which relate to the technical field of media. A more complete three-dimensional model of a person having more abundant detailed information can be obtained, which not only extends the application scenarios of a three-dimensional model of a person, but also improves the quality of a reconstructed three-dimensional model of a person, thereby improving user experience. The method is applied to an electronic device, and comprises: an electronic device obtaining a first scale image and a second scale image of a target person, the first scale image comprising a first part of the target person, the second scale image comprising at least a second part of the target person, and the first part being part of the second part; then determining a first mesh three-dimensional model corresponding to the first scale image and determining a second mesh three-dimensional model corresponding to a second scale image; and fusing the first mesh three-dimensional model and the second mesh three-dimensional model so as to obtain a target three-dimensional model, the target three-dimensional model being used for displaying at least the second part of the target person.

Description

Method and device for reconstructing a three-dimensional model of a person

Technical field

The embodiments of the present application relate to the field of media technologies, and in particular to a method and device for reconstructing a 3D model of a character.

Background art

With the development of augmented reality and virtual reality technology, digital character products (virtual 3D characters) centered on 3D model reconstruction have been widely used in entertainment, education, finance, tourism and other fields.

Currently, reconstruction of a 3D model of a character may include reconstruction of a 3D model of the character's head, face, upper body, or whole body. Taking 3D model reconstruction of the face and of the human body (that is, the whole body) as examples, one method for reconstructing a 3D model of a person works as follows. For the face, a single face image is input into a face voxel regression network, whose analysis and calculation yield a face voxel model (the face voxel model is a three-dimensional model). For the human body, a single human body image is input into a human body voxel regression network, whose analysis and calculation yield a human body voxel model.

The above 3D face model reflects only local information about the person. Many application scenarios need more information about the person (for example, information about the upper body other than the face), so the application scenarios of a 3D face model alone are relatively narrow. The above 3D human body model reflects the overall information of the person, but some detailed information is reconstructed poorly; for example, the details of the face area in the 3D human body model are relatively rough. In other words, the reconstruction quality of the above 3D human body model is poor, resulting in a poor user experience.

Summary of the invention

Embodiments of the present application provide a method and device for reconstructing a three-dimensional model of a character, which can improve the quality of a reconstructed three-dimensional model of a character.

In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:

In a first aspect, an embodiment of the present application provides a method for reconstructing a three-dimensional model of a person, which is applied to an electronic device. The method includes: the electronic device acquires a first scale image and a second scale image of a target person, where the first scale image includes a first part of the target person, the second scale image includes at least a second part of the target person, and the first part is a part of the second part; the electronic device then determines a first grid 3D model corresponding to the first scale image and a second grid 3D model corresponding to the second scale image; and the electronic device fuses the first grid 3D model and the second grid 3D model to obtain a target 3D model, where the target 3D model is used to display at least the second part of the target person.

The method for reconstructing a 3D model of a character provided by the embodiment of the present application combines two advantages: the 3D model of a small-scale image (such as the first scale image) carries rich detailed information at high resolution, and the 3D model of a large-scale image (such as the second scale image) reflects a wide range of the character's features (that is, it is complete). Fusing the 3D models of images of the target person at different scales yields a 3D model of the target person that is more complete and richer in detail, which not only extends the application scenarios of the three-dimensional character model but also improves the quality of the reconstructed three-dimensional model, thereby improving user experience.

In a possible implementation manner, the method for determining the first grid 3D model corresponding to the first scale image may include: determining the first voxel 3D model corresponding to the first scale image based on the first voxel regression network, and converting the first voxel three-dimensional model into the first grid three-dimensional model.

In the embodiment of the present application, the first scale image is input to the first voxel regression network, and the first voxel three-dimensional model corresponding to the first scale image can be obtained. It should be understood that the voxel three-dimensional model is the output of the voxel regression network.

Optionally, the first voxel regression network is a convolutional neural network, and the convolutional neural network may be a stacked hourglass network. The first voxel regression network is obtained by training a preset stacked hourglass network on a training data set that consists of multiple groups of collected two-dimensional images and the voxel 3D model samples corresponding to those two-dimensional images, annotated with real voxel values. It should be understood that when the first scale image is an image of a different part of the target person, the first voxel regression network is trained on the corresponding data set. For example, if the first scale image is a face image of the target person, the first voxel regression network is trained on multiple groups of face images and the face voxel 3D model samples, annotated with real voxel values, that correspond to the face images.

To sum up, it can be understood that when the first scale image is the face image of the target person, the above-mentioned first voxel regression network is a voxel regression network for predicting the voxel three-dimensional model of the face. When the first scale image is the upper body image of the target person, the above-mentioned first voxel regression network is a voxel regression network for predicting the voxel three-dimensional model of the upper body.

Optionally, in the embodiment of the present application, the electronic device may convert the first voxel 3D model into the first grid 3D model as follows: the 3D vertices extracted on the isosurface are connected into polygons (such as triangles) according to a preset rule, thereby forming the first grid three-dimensional model. The preset rule may be to connect the three nearest 3D vertices in sequence, from left to right and from top to bottom.
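As an illustration of turning a voxel 3D model into a triangle mesh, the following is a minimal sketch that assumes binary voxel values (1 means the voxel belongs to the model). It does not implement the preset nearest-three-vertices rule described above. Instead it swaps in a simpler boundary-face extraction: every voxel face that separates an occupied voxel from an empty one becomes two triangles. The names `FACES` and `voxels_to_mesh` are hypothetical.

```python
# Neighbour offset and the four corner offsets of the voxel face shared with
# that neighbour. Corners are listed counter-clockwise when seen from outside.
FACES = (
    ((1, 0, 0), ((1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1))),
    ((-1, 0, 0), ((0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0))),
    ((0, 1, 0), ((0, 1, 0), (0, 1, 1), (1, 1, 1), (1, 1, 0))),
    ((0, -1, 0), ((0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1))),
    ((0, 0, 1), ((0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1))),
    ((0, 0, -1), ((0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0))),
)


def voxels_to_mesh(grid):
    """Convert a binary voxel grid, indexed grid[z][y][x], into (vertices, triangles)."""
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])

    def occupied(x, y, z):
        in_bounds = 0 <= x < nx and 0 <= y < ny and 0 <= z < nz
        return in_bounds and grid[z][y][x] == 1

    index = {}       # corner position -> vertex index, so shared corners are reused
    vertices = []
    triangles = []

    def vertex_id(point):
        if point not in index:
            index[point] = len(vertices)
            vertices.append(point)
        return index[point]

    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not occupied(x, y, z):
                    continue
                for (dx, dy, dz), corners in FACES:
                    if occupied(x + dx, y + dy, z + dz):
                        continue  # interior face, not part of the surface
                    a, b, c, d = (vertex_id((x + cx, y + cy, z + cz))
                                  for cx, cy, cz in corners)
                    # Split the boundary quad into two triangles.
                    triangles.append((a, b, c))
                    triangles.append((a, c, d))
    return vertices, triangles
```

A single occupied voxel yields a closed cube with 8 vertices and 12 triangles. The result is blocky, which is one reason a mesh smoothing step is useful afterwards.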

Optionally, the electronic device may also convert the first voxel 3D model into the first grid 3D model by using a stereo rendering method.

In a possible implementation manner, the method for determining the second grid 3D model corresponding to the second scale image may include: determining the second voxel 3D model corresponding to the second scale image based on the second voxel regression network, and converting the second voxel 3D model into the second grid 3D model.

In the embodiment of the present application, the second scale image is input to the second voxel regression network, and the second voxel three-dimensional model corresponding to the second scale image can be obtained.

Similar in structure to the first voxel regression network, the second voxel regression network may also be a convolutional neural network, and the convolutional neural network may be a stacked hourglass network. The second voxel regression network is obtained by training a preset stacked hourglass network on a training data set that consists of multiple groups of collected two-dimensional images and the voxel 3D model samples corresponding to those two-dimensional images. It should be understood that when the second scale image is an image of a different part of the target person, the second voxel regression network is trained on the corresponding data set. For example, if the second scale image is an upper body image of the target person, the second voxel regression network is trained on multiple groups of upper body images and the upper body voxel 3D model samples, annotated with real voxel values, that correspond to the upper body images.

To sum up, it can be understood that when the second-scale image is the upper body image of the target person, the above-mentioned second voxel regression network is a voxel regression network for predicting the upper body voxel three-dimensional model. When the second-scale image is a whole-body image of the target person, the above-mentioned second voxel regression network is a voxel regression network for predicting a voxel three-dimensional model of a human body.

It should be noted that the method for the electronic device to convert the second voxel 3D model into the second grid 3D model is similar to the method for converting the first voxel 3D model into the first grid 3D model. Therefore, for a detailed description of converting the second voxel 3D model into the second grid 3D model, refer to the above description of converting the first voxel 3D model into the first grid 3D model; details are not repeated here.

In a possible implementation manner, the specific process of fusing the first grid 3D model and the second grid 3D model to obtain the target 3D model includes: converting the first grid 3D model to a 2D plane to obtain a first plane expansion diagram, and converting the second grid 3D model to a 2D plane to obtain a second plane expansion diagram, where a first image area in the first plane expansion diagram corresponds to a second image area in the second plane expansion diagram, and both the first image area and the second image area correspond to the first part of the target person; then cropping the first plane expansion diagram to obtain the first image area, and cropping the second plane expansion diagram to obtain the second image area; replacing the second image area in the second plane expansion diagram with the first image area to obtain a target plane expansion diagram of the target person; and finally performing a three-dimensional transformation on the target plane expansion diagram to obtain the target 3D model.

Exemplarily, if the first grid 3D model is a grid 3D model of a human face and the second grid 3D model is a grid 3D model of an upper body, the first image area in the first plane expansion diagram may be the area corresponding to the face, and the second image area in the second plane expansion diagram is also the area corresponding to the face. That is, the first image area and the second image area both correspond to the first part (the face) of the target person.

In the embodiment of the present application, the electronic device may project the first grid 3D model and the second grid 3D model to a two-dimensional plane by using a two-dimensional parameterization technique to obtain the first plane expansion diagram and the second plane expansion diagram. For example, a cylindrical projection can be used: a cylindrical surface is placed around an ellipsoid so that it is tangent to or cuts the ellipsoid, the latitude and longitude points on the ellipsoid surface are projected onto the cylindrical surface according to certain conditions, and the cylindrical surface is then cut open along one of its generatrices and unrolled into a plane to obtain the first plane expansion diagram or the second plane expansion diagram.
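The cylindrical unwrapping idea can be sketched as follows. This is a simplified illustration, not the exact parameterization of the embodiment: it assumes the model is roughly centred on a vertical y axis, uses the angle around that axis and the height as the plane coordinates, and keeps the radius as a third value so that the plane representation can be transformed back into 3D. The function names are hypothetical.

```python
import math


def unwrap_cylindrical(vertices):
    """Map each 3D vertex (x, y, z) to (theta, height, radius) about the y axis.

    theta is the angle around the axis in (-pi, pi]; the cut line of the
    cylinder (the seam of the unrolled plane) is at theta = +/- pi.
    """
    flat = []
    for x, y, z in vertices:
        theta = math.atan2(x, z)
        radius = math.hypot(x, z)
        flat.append((theta, y, radius))
    return flat


def rewrap_cylindrical(flat):
    """Inverse transform: turn (theta, height, radius) back into (x, y, z)."""
    return [(radius * math.sin(theta), height, radius * math.cos(theta))
            for theta, height, radius in flat]
```

The first two values of each unwrapped point are the coordinates in the plane expansion diagram. Carrying the radius alongside them is a design choice of this sketch that makes the later three-dimensional transformation a direct inverse.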

Taking the first image area being the face part as an example, the electronic device can generate a mask of the face range (also called a face region frame) according to the facial feature points, and then crop the face part out of the first plane expansion diagram and the second plane expansion diagram.
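The masking and region replacement can be sketched as follows, under loud assumptions: both plane expansion diagrams are already aligned in one shared 2D coordinate system, the mask is a plain axis-aligned box around the feature points (a real face mask would follow the face contour), and each diagram is just a list of points whose first two values are plane coordinates. All names here are hypothetical.

```python
def bounding_mask(feature_points, margin=0.0):
    """Return (u_min, v_min, u_max, v_max) enclosing the 2D feature points."""
    us = [u for u, _ in feature_points]
    vs = [v for _, v in feature_points]
    return (min(us) - margin, min(vs) - margin,
            max(us) + margin, max(vs) + margin)


def inside(mask, point):
    """True if the point's plane coordinates fall within the mask box."""
    u_min, v_min, u_max, v_max = mask
    return u_min <= point[0] <= u_max and v_min <= point[1] <= v_max


def replace_region(first_flat, second_flat, mask):
    """Build the target diagram from the two plane expansion diagrams.

    Points of the second (large-scale) diagram that lie inside the mask are
    dropped, and points of the first (small-scale, detailed) diagram that lie
    inside the mask take their place.
    """
    kept = [p for p in second_flat if not inside(mask, p)]
    detailed = [p for p in first_flat if inside(mask, p)]
    return kept + detailed
```

For example, with feature points `[(0, 0), (2, 1)]` and `margin=0.5`, the mask is `(-0.5, -0.5, 2.5, 1.5)`; second-diagram points inside that box are removed and first-diagram points inside it are inserted.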

It should be understood that the electronic device crops the first image area out of the first plane expansion diagram, and crops the second plane expansion diagram to remove the second image area from it. After smoothing the edge of the first image area, and smoothing the edge left in the second plane expansion diagram once the second image area has been removed, the electronic device stitches the edge of the first image area to the edge of the second plane expansion diagram from which the second image area has been removed, thereby obtaining the target plane expansion diagram. Specifically, the electronic device takes the two edges as constraint conditions, finds the closest vertices on the two edges, and then connects those closest vertices to achieve edge stitching.
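The nearest-vertex pairing used for edge stitching can be sketched with a brute-force search. This is only an assumption about how the closest vertices might be found; the embodiment does not specify the search strategy, and the function name is hypothetical.

```python
import math


def stitch_edges(inner_edge, outer_edge):
    """Pair every vertex on inner_edge with its nearest vertex on outer_edge.

    Both arguments are lists of 2D points on the respective boundaries.
    Returns a list of (inner_index, outer_index) pairs; each pair marks two
    vertices that would be connected when the edges are stitched.
    """
    pairs = []
    for i, point in enumerate(inner_edge):
        nearest = min(range(len(outer_edge)),
                      key=lambda k: math.dist(point, outer_edge[k]))
        pairs.append((i, nearest))
    return pairs
```

The search is quadratic in the number of boundary vertices, which is acceptable for short boundaries; a spatial index would be the usual replacement for larger ones.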

In a possible implementation manner, before the first grid 3D model and the second grid 3D model are fused, the method for reconstructing the character 3D model provided in the embodiment of the present application further includes: performing at least one of mesh smoothing or mesh simplification on the first grid 3D model and/or the second grid 3D model.

Optionally, the first grid 3D model and the second grid 3D model obtained by the electronic device may contain noise grids (that is, grids that deviate strongly from the actual model). To obtain a more accurate target 3D model, the electronic device may perform mesh smoothing on the first grid 3D model and/or the second grid 3D model by using a suitable mesh smoothing algorithm, so as to remove the noise grids. The mesh smoothing algorithm may be any one of the Taubin smoothing algorithm, the Laplacian smoothing algorithm, and the mean curvature smoothing algorithm; it is selected according to actual conditions and is not limited in this embodiment of the present application.
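Of the smoothing algorithms listed, Laplacian smoothing is the simplest to illustrate. The sketch below is a minimal uniform-weight version: each vertex moves a fraction `lam` of the way toward the average position of its neighbours. The function name and the default values of `lam` and `iterations` are assumptions made for illustration.

```python
def laplacian_smooth(vertices, triangles, lam=0.5, iterations=1):
    """Return smoothed copies of the vertices of a triangle mesh.

    vertices:  list of (x, y, z) tuples
    triangles: list of (i, j, k) vertex-index tuples
    """
    # Two vertices are neighbours if they share a triangle edge.
    neighbours = [set() for _ in vertices]
    for a, b, c in triangles:
        neighbours[a].update((b, c))
        neighbours[b].update((a, c))
        neighbours[c].update((a, b))

    current = [tuple(v) for v in vertices]
    for _ in range(iterations):
        updated = []
        for i, v in enumerate(current):
            if not neighbours[i]:
                updated.append(v)  # isolated vertex: nothing to average
                continue
            count = len(neighbours[i])
            centroid = [sum(current[j][d] for j in neighbours[i]) / count
                        for d in range(3)]
            updated.append(tuple(v[d] + lam * (centroid[d] - v[d])
                                 for d in range(3)))
        current = updated
    return current
```

With a spike vertex at height 1 above a flat ring of four neighbours, one iteration with `lam=0.5` pulls the spike halfway down to height 0.5. Plain Laplacian smoothing also shrinks the mesh over many iterations, which is the effect Taubin smoothing is designed to counter.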

Optionally, the grids in the first grid 3D model and the second grid 3D model obtained by the electronic device may be relatively dense, so that the fusion processing requires a large amount of calculation and takes a long time. To reduce the amount of calculation in the fusion of the grid 3D models, the electronic device may use a mesh simplification algorithm to simplify the first grid 3D model and/or the second grid 3D model. The mesh simplification algorithm may be either the edge collapse algorithm or a metric-based edge collapse algorithm; it is selected according to the actual situation and is not limited in the embodiment of the present application.

When both mesh smoothing and mesh simplification need to be performed on a grid 3D model, the order of the two processes is not limited. For example, the grid 3D model can be smoothed first and the smoothed grid 3D model then simplified; alternatively, the grid 3D model can be simplified first and the simplified grid 3D model then smoothed.
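A single edge collapse step can be sketched as follows. This is a minimal illustration that uses edge length as the collapse cost, which is an assumption; practical simplifiers typically use a more careful error metric and repeat the step until a target triangle count is reached. The function name is hypothetical.

```python
import math


def collapse_shortest_edge(vertices, triangles):
    """Collapse the shortest edge of a triangle mesh once.

    The edge (u, v) is replaced by a single vertex at its midpoint, stored at
    index u. Triangles that referenced v now reference u, and triangles that
    become degenerate (two equal indices) are removed. Vertex v stays in the
    returned list but is no longer referenced by any triangle.
    """
    edges = set()
    for a, b, c in triangles:
        for p, q in ((a, b), (b, c), (c, a)):
            edges.add((min(p, q), max(p, q)))

    u, v = min(edges, key=lambda e: math.dist(vertices[e[0]], vertices[e[1]]))

    new_vertices = list(vertices)
    new_vertices[u] = tuple((p + q) / 2 for p, q in zip(vertices[u], vertices[v]))

    new_triangles = []
    for tri in triangles:
        remapped = tuple(u if i == v else i for i in tri)
        if len(set(remapped)) == 3:
            new_triangles.append(remapped)
    return new_vertices, new_triangles
```

Each collapse removes the triangles that share the collapsed edge, so the mesh gets coarser while its overall shape is roughly preserved.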

In an implementation manner, the grid smoothing process is first performed on the grid three-dimensional model, and then the grid simplification process is performed on the smoothed grid three-dimensional model, which can better improve the quality of the reconstructed three-dimensional model.

In a possible implementation, the first part is the face of the target person, and the second part is the upper body of the target person; or, the first part is the upper body of the target person, and the second part is the whole body of the target person.

In a second aspect, the embodiment of the present application provides a device for reconstructing a 3D model of a character, which is applied to an electronic device. The device includes an acquisition module, a determination module, and a fusion module. The acquisition module is used to obtain a first scale image and a second scale image of the target person, where the first scale image includes a first part of the target person, the second scale image includes at least a second part of the target person, and the first part is a part of the second part. The determination module is used to determine the first grid 3D model corresponding to the first scale image and the second grid 3D model corresponding to the second scale image. The fusion module is used to fuse the first grid 3D model and the second grid 3D model to obtain a target 3D model, where the target 3D model is used to display at least the second part of the target person.

In a possible implementation, the determination module is specifically configured to determine the first voxel 3D model corresponding to the first scale image based on the first voxel regression network, and convert the first voxel 3D model into the first grid 3D model; and to determine the second voxel 3D model corresponding to the second scale image based on the second voxel regression network, and convert the second voxel 3D model into the second grid 3D model.

In a possible implementation, the fusion module is specifically used to convert the first grid 3D model to a 2D plane to obtain the first plane expansion diagram, and convert the second grid 3D model to a 2D plane to obtain the second plane expansion diagram, where the first image area in the first plane expansion diagram corresponds to the second image area in the second plane expansion diagram, and the first image area and the second image area correspond to the first part; to crop the first plane expansion diagram to obtain the first image area, and crop the second plane expansion diagram to obtain the second image area; to replace the second image area in the second plane expansion diagram with the first image area to obtain the target plane expansion diagram of the target person; and then to perform a three-dimensional transformation on the target plane expansion diagram to obtain the target 3D model.

In a possible implementation manner, the device for reconstructing a 3D model of a character provided in the embodiment of the present application further includes a processing module; the processing module is used to perform at least one of mesh smoothing or mesh simplification on the first grid 3D model and/or the second grid 3D model.

In a possible implementation, the first part of the target person is the face of the target person, and the second part of the target person is the upper body of the target person; or, the first part of the target person is the upper body of the target person, and the second part of the target person is the whole body of the target person.
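The module structure of the device can be sketched as a small container of callables. This is only an illustrative arrangement under assumed interfaces: the class name, field names, and signatures are hypothetical, and the optional processing step stands for the smoothing or simplification module.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class ReconstructionDevice:
    # Acquisition module: returns (first_scale_image, second_scale_image).
    acquire: Callable[[], Tuple[Any, Any]]
    # Determination module: maps the two images to the two mesh 3D models.
    determine: Callable[[Any, Any], Tuple[Any, Any]]
    # Fusion module: merges the two mesh 3D models into the target model.
    fuse: Callable[[Any, Any], Any]
    # Optional processing module: mesh smoothing and/or simplification.
    process: Optional[Callable[[Any], Any]] = None

    def reconstruct(self) -> Any:
        first_image, second_image = self.acquire()
        first_mesh, second_mesh = self.determine(first_image, second_image)
        if self.process is not None:
            first_mesh = self.process(first_mesh)
            second_mesh = self.process(second_mesh)
        return self.fuse(first_mesh, second_mesh)
```

Keeping each module as an injected callable makes it easy to swap, for instance, a face-plus-upper-body configuration for an upper-body-plus-whole-body one without changing the pipeline.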

In a third aspect, an embodiment of the present application provides an electronic device, including a memory and at least one processor connected to the memory. The memory is used to store instructions, and after the instructions are read by the at least one processor, the method described in the first aspect or any one of its possible implementations is executed.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and the computer program is executed by a processor to implement the method described in the first aspect or any one of its possible implementations.

In a fifth aspect, the embodiment of the present application provides a computer program product. The computer program product includes instructions, and when the computer program product is run on a computer, the method described in the first aspect or any one of its possible implementations is executed.

In a sixth aspect, the embodiment of the present application provides a chip, including a memory and a processor. Memory is used to store computer instructions. The processor is used to call and execute the computer instructions from the memory, so as to execute the method described in any one of the first aspect and possible implementations thereof.

It should be understood that, for the beneficial effects achieved by the technical solutions of the second aspect to the sixth aspect of the embodiment of the present application and their corresponding possible implementation manners, reference may be made to the technical effects described above for the first aspect and its corresponding possible implementation manners; they are not repeated here.

Description of drawings

FIG. 1 is a related schematic diagram of a voxel three-dimensional model provided in the embodiment of the present application;

FIG. 2 is a hardware schematic diagram of a mobile phone provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a method for reconstructing a three-dimensional model of a character provided in an embodiment of the present application;

FIG. 4 is a schematic diagram of a facial feature point provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a three-dimensional grid model of a human face provided in an embodiment of the present application;

FIG. 6 is a schematic diagram of another method for reconstructing a three-dimensional model of a character provided in the embodiment of the present application;

FIG. 7 is a schematic diagram of another method for reconstructing a three-dimensional model of a character provided by the embodiment of the present application;

FIG. 8 is a schematic diagram of a plane expansion view after projecting a grid 3D model corresponding to the upper body to a 2D plane provided by the embodiment of the present application;

FIG. 9 is a schematic diagram of the effect of a plane expansion diagram to be stitched, and of the effect of converting that plane expansion diagram into three-dimensional space, provided by the embodiment of the present application;

FIG. 10 is a schematic diagram of another method for reconstructing a 3D model of a character provided in the embodiment of the present application;

FIG. 11 is a schematic diagram of the reconstruction process of a 3D model of the upper body of a target person provided in the embodiment of the present application;

FIG. 12 is a schematic structural diagram of a reconstruction device for a three-dimensional model of a person provided by an embodiment of the present application;

FIG. 13 is a schematic structural diagram of another apparatus for reconstructing a three-dimensional model of a person provided by an embodiment of the present application.

Detailed description

The term "and/or" in this document merely describes an association relationship between associated objects and indicates that three relationships are possible. For example, A and/or B can mean three cases: A exists alone, A and B exist simultaneously, or B exists alone.

The terms "first" and "second" in the description and claims of the embodiments of the present application are used to distinguish different objects, rather than to describe a specific order of objects. For example, the first scale image and the second scale image are used to distinguish different scale images, rather than describing a specific order of the scale images.

In the embodiments of the present application, words such as "exemplary" or "for example" are used to present examples, instances, or illustrations. Any embodiment or design scheme described as "exemplary" or "for example" in the embodiments of the present application shall not be interpreted as being more preferred or more advantageous than other embodiments or design schemes. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete manner.

In the description of the embodiments of the present application, unless otherwise specified, "plurality" means two or more. For example, multiple processing units refer to two or more processing units; multiple systems refer to two or more systems.

The 3D model reconstruction of characters can be used for digital human modeling. A digital human is a digitized virtual 3D character, that is, a digital image of a character, which is then presented on a terminal platform based on technologies such as augmented reality (AR) or virtual reality (VR).

At present, one method for reconstructing a 3D model of a person is based on deformation. For example, a 3D morphable model (3DMM) is used to reconstruct a 3D face model: specifically, the face image is matched with a preset average face model, and the average face model is deformed to obtain the 3D face model. Similarly, a skinned multi-person linear (SMPL) model is used to reconstruct a 3D model of the human body: specifically, the human body image is matched with a preset average human body SMPL model, and the average human body SMPL model is deformed to obtain the 3D human body model.

Another method for reconstructing a 3D model of a person is voxel-based: a voxel regression network is used to predict a 3D voxel model of a human face or human body. The method for reconstructing a 3D model of a character provided in the embodiment of the present application is a voxel-based 3D reconstruction method. The concepts involved in the voxel-based 3D reconstruction process are briefly introduced below.

Voxel: short for volume pixel. Assume a large volume space (volume) that is a cube, as shown in (a) in Figure 1, and that covers the three-dimensional model to be built. The cube consists of multiple small cubes, each of which is a voxel. It should be understood that each voxel corresponds to a point in the space and to a voxel value, and the voxel value indicates whether the voxel belongs to the 3D model to be built.
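The voxel concept can be illustrated with a small sketch that fills a cubic volume with binary voxel values. The binary encoding (1 for inside, 0 for outside), the coordinate range, and the function name are assumptions chosen for illustration; a voxel regression network would predict such values from an image instead of evaluating a known shape.

```python
def make_voxel_grid(n, belongs_to_model):
    """Build an n x n x n voxel grid indexed as grid[z][y][x].

    belongs_to_model(x, y, z) is called with the centre of each voxel, with
    coordinates scaled to the range (-1, 1) on every axis. The voxel value is
    1 if the centre belongs to the model and 0 otherwise.
    """
    def centre(i):
        return (i + 0.5) / n * 2.0 - 1.0

    return [[[1 if belongs_to_model(centre(x), centre(y), centre(z)) else 0
              for x in range(n)]
             for y in range(n)]
            for z in range(n)]


def unit_ball(x, y, z, radius=0.8):
    """Example shape: a ball of the given radius centred in the volume."""
    return x * x + y * y + z * z <= radius * radius
```

Calling `make_voxel_grid(8, unit_ball)` gives an 8 x 8 x 8 grid in which voxels near the centre have value 1 and the corner voxels have value 0.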

Based on voxels, 3D model reconstruction of characters can be performed, such as 3D model reconstruction of human face, 3D model reconstruction of upper body, 3D model reconstruction of human body, etc.

Taking the reconstruction of the 3D face model as an example, a face image is input to the voxel regression network, and the voxel regression network predicts, for the face image, the voxel value of each voxel in the volume space (such as the cube shown in (a) in Figure 1), that is, the representation of the three-dimensional model of the face image in the volume space. (b) in Figure 1 is an example of the voxel values of all the voxels contained in one section of the volume space, that is, of the face voxel 3D model.

Optionally, after the voxel value of each voxel in the volume space has been predicted (that is, after the voxel 3D model of the face has been obtained), a face grid 3D model can be obtained by a stereoscopic rendering method or by extracting a polygonal isosurface at a preset threshold. Taking isosurface extraction as an example, the isosurface can be extracted based on a truncated signed distance function (TSDF) algorithm. For the example of (b) in Figure 1, if the preset threshold is 0.0, all voxels in the three-dimensional space whose voxel values equal 0.0 are extracted, and the polygonal surface formed by these voxels (called the isosurface) gives the face grid 3D model, as shown in (c) in Figure 1.
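The thresholding step can be sketched as follows. In a sampled grid, few voxels hold exactly the threshold value, so a common approach (used here as an assumption, not as the method of the embodiment) is to look for pairs of adjacent voxels whose values lie on opposite sides of the threshold and to interpolate the crossing point along the edge between them. The test data is an untruncated signed distance to a sphere, standing in for network output; the function names are hypothetical.

```python
import math


def signed_distance_grid(n, radius):
    """Sample the signed distance to a sphere centred in an n x n x n grid.

    Values are negative inside the sphere and positive outside; the grid is
    indexed as grid[z][y][x] with unit spacing.
    """
    c = (n - 1) / 2.0
    return [[[math.sqrt((x - c) ** 2 + (y - c) ** 2 + (z - c) ** 2) - radius
              for x in range(n)]
             for y in range(n)]
            for z in range(n)]


def isosurface_vertices(grid, threshold=0.0):
    """Return points where the sampled field crosses the threshold.

    Each grid edge (between a voxel and its +x, +y or +z neighbour) whose end
    values lie on opposite sides of the threshold contributes one point,
    placed by linear interpolation. Duplicate points are not merged.
    """
    n = len(grid)
    points = []
    for z in range(n):
        for y in range(n):
            for x in range(n):
                a = grid[z][y][x] - threshold
                for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
                    x2, y2, z2 = x + dx, y + dy, z + dz
                    if x2 >= n or y2 >= n or z2 >= n:
                        continue
                    b = grid[z2][y2][x2] - threshold
                    if (a < 0) != (b < 0):
                        t = a / (a - b)  # fraction of the edge, in [0, 1]
                        points.append((x + t * dx, y + t * dy, z + t * dz))
    return points
```

The returned points lie close to the true surface and are the kind of isosurface vertices that are subsequently connected into polygons to form the grid 3D model.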

It can be understood that the voxel regression network used for reconstruction of the 3D face model is trained based on a large number of known face images and 3D models corresponding to the face images. Optionally, the voxel regression network can be different types of neural networks, for example, the voxel regression network is a convolutional neural network, and the convolutional neural network can be a stacked hourglass network.

At present, the schemes for 3D model reconstruction of characters support model reconstruction at only a single scale, for example, only reconstruction of a 3D model of a face, only reconstruction of a 3D model of a human body, or only reconstruction of a 3D model of an upper body. Exemplarily, for a solution that supports only 3D face model reconstruction, the obtained 3D face model contains only information about the face and no information about other parts of the human body, so the model lacks completeness. Most applications currently on the market need more information about the person, so the application scenarios of the 3D face model are relatively narrow and its commercialization prospects are relatively limited. For a solution that supports only 3D human body model reconstruction, the obtained 3D human body model includes complete human body information, but local parts such as the face are reconstructed poorly and some detailed information of the face cannot be reflected; that is, the local details are relatively rough.

The background art has two problems: a 3D model of the face alone reflects only local information about the person, so its application scenarios are relatively narrow; and a 3D model of the human body alone reflects the global information of the person, but reconstructs some detailed information poorly. To address these problems, the embodiment of the present application provides a method and device for reconstructing a 3D model of a person. In this method, the first grid 3D model corresponding to the first scale image and the second grid 3D model corresponding to the second scale image are fused to obtain the target 3D model, where the first scale image includes a first part of the target person, the second scale image includes at least a second part of the target person, the first part is a part of the second part, and the target 3D model is used to display at least the second part of the target person. Through this solution, the quality of the reconstructed three-dimensional model of the character can be improved, thereby improving user experience.

The method for reconstructing a three-dimensional model of a character provided in the embodiment of the present application can be applied to electronic devices such as a mobile phone, a tablet computer, or an ultra-mobile personal computer (Ultra-mobile Personal Computer, UMPC). Alternatively, it can also be used in other electronic devices such as desktop, laptop, handheld, wearable, smart home and vehicle-mounted devices, for example netbooks, smart watches, smart cameras, personal digital assistants (Personal Digital Assistant, PDA), portable multimedia players (Portable Multimedia Player, PMP), dedicated media players, or AR (augmented reality) / VR (virtual reality) equipment. The embodiment of the present application does not limit the specific type and structure of the electronic device.

Taking the electronic device as a mobile phone as an example, FIG. 2 is a schematic diagram of the hardware structure of a mobile phone 200 provided in the embodiment of the present application. The mobile phone 200 includes a processor 210, an external memory interface 220, an internal memory 221, a universal serial bus (universal serial bus, USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 270A, a receiver 270B, a microphone 270C, an earphone interface 270D, a sensor module 280, a button 290, a motor 291, an indicator 292, a camera 293, a display screen 294, a subscriber identification module (subscriber identification module, SIM) card interface 295, etc. The sensor module 280 may include a pressure sensor 280A, a gyro sensor 280B, an air pressure sensor 280C, a magnetic sensor 280D, an acceleration sensor 280E, a distance sensor 280F, a proximity light sensor 280G, a fingerprint sensor 280H, a temperature sensor 280J, a touch sensor 280K, an ambient light sensor 280L, a bone conduction sensor 280M, etc.

It can be understood that the structure shown in the embodiment of the present application does not constitute a specific limitation on the mobile phone 200 . In other embodiments of the present application, the mobile phone 200 may include more or fewer components than shown in the figure, or combine certain components, or separate certain components, or arrange different components. The illustrated components can be realized in hardware, software or a combination of software and hardware.

The processor 210 may include one or more processing units. For example, the processor 210 may include an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.

Wherein, the controller may be the nerve center and command center of the mobile phone 200 . The controller can generate an operation control signal according to the instruction opcode and timing signal, and complete the control of fetching and executing the instruction.

A memory may also be provided in the processor 210 for storing instructions and data. In some embodiments, the memory in the processor 210 is a cache memory. The memory may hold instructions or data that the processor 210 has just used or uses repeatedly. If the processor 210 needs to use the instructions or data again, it can call them directly from this memory. Repeated access is avoided and the waiting time of the processor 210 is reduced, thereby improving the efficiency of the system.

In some embodiments, the processor 210 may include one or more interfaces. The interface may include an inter-integrated circuit (inter-integrated circuit, I2C) interface, an inter-integrated circuit sound (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver/transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (general-purpose input/output, GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, etc.

The I2C interface is a bidirectional synchronous serial bus, including a serial data line (serial data line, SDA) and a serial clock line (serial clock line, SCL). In some embodiments, the processor 210 may include multiple sets of I2C buses. The processor 210 can be respectively coupled to the touch sensor 280K, the charger, the flashlight, the camera 293 and so on through different I2C bus interfaces. For example, the processor 210 may be coupled to the touch sensor 280K through the I2C interface, so that the processor 210 and the touch sensor 280K communicate through the I2C bus interface to realize the touch function of the mobile phone 200.

The I2S interface can be used for audio communication. In some embodiments, processor 210 may include multiple sets of I2S buses. The processor 210 may be coupled to the audio module 270 through an I2S bus to implement communication between the processor 210 and the audio module 270 . In some embodiments, the audio module 270 can transmit audio signals to the wireless communication module 260 through the I2S interface, so as to realize the function of answering calls through the Bluetooth headset.

The PCM interface can also be used for audio communication, sampling, quantizing and encoding the analog signal. In some embodiments, the audio module 270 and the wireless communication module 260 may be coupled through a PCM bus interface. In some embodiments, the audio module 270 can also transmit audio signals to the wireless communication module 260 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
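As a minimal illustration of the sampling, quantizing and encoding that PCM refers to, the following Python sketch turns a continuous-time signal into 16-bit little-endian samples. The function name `pcm_encode`, the 8 kHz sampling rate and the 16-bit depth are assumptions chosen for illustration; they are not specified in the present application.

```python
import math
import struct


def pcm_encode(signal, n_samples, sample_rate_hz=8000):
    """Sample signal(t), quantize to 16-bit integers, and pack as little-endian PCM.

    signal is a hypothetical callable returning amplitudes in [-1.0, 1.0].
    """
    max_code = 32767  # largest positive code of a signed 16-bit sample
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                   # sampling instant in seconds
        x = max(-1.0, min(1.0, signal(t)))       # clip to the representable range
        codes.append(round(x * max_code))        # quantization to an integer code
    return struct.pack("<%dh" % len(codes), *codes)  # encoding as bytes


# A 1 kHz test tone sampled at 8 kHz gives 8 samples per period.
tone = lambda t: math.sin(2 * math.pi * 1000 * t)
frame = pcm_encode(tone, n_samples=8)
decoded = struct.unpack("<8h", frame)
```

Each sample occupies two bytes, so the eight samples above form a 16-byte frame that could be carried over an audio bus.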

The UART interface is a universal serial data bus used for asynchronous communication. The bus can be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is generally used to connect the processor 210 and the wireless communication module 260 . For example: the processor 210 communicates with the Bluetooth module in the wireless communication module 260 through the UART interface to realize the Bluetooth function. In some embodiments, the audio module 270 can transmit audio signals to the wireless communication module 260 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.

The MIPI interface can be used to connect the processor 210 with the peripheral devices such as the display screen 294 and the camera 293 . MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc. In some embodiments, the processor 210 communicates with the camera 293 through the CSI interface to realize the shooting function of the mobile phone 200 . The processor 210 communicates with the display screen 294 through the DSI interface to realize the display function of the mobile phone 200 .

The GPIO interface can be configured by software. The GPIO interface can be configured as a control signal or as a data signal. In some embodiments, the GPIO interface can be used to connect the processor 210 with the camera 293 , the display screen 294 , the wireless communication module 260 , the audio module 270 , the sensor module 280 and so on. The GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.

The USB interface 230 is an interface conforming to the USB standard specification, specifically, it may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like. The USB interface 230 can be used to connect a charger to charge the mobile phone 200, and can also be used to transmit data between the mobile phone 200 and peripheral devices. It can also be used to connect headphones and play audio through them. This interface can also be used to connect other electronic devices, such as AR devices.

It can be understood that the interface connection relationship between the modules shown in the embodiment of the present application is only a schematic illustration, and does not constitute a structural limitation of the mobile phone 200 . In other embodiments of the present application, the mobile phone 200 may also adopt different interface connection methods in the above embodiments, or a combination of multiple interface connection methods.

The charging management module 240 is configured to receive charging input from the charger. Wherein, the charger may be a wireless charger or a wired charger. In some embodiments of wired charging, the charging management module 240 can receive the charging input of the wired charger through the USB interface 230 . In some wireless charging embodiments, the charging management module 240 can receive wireless charging input through the wireless charging coil of the mobile phone 200 . While the charging management module 240 is charging the battery 242 , it can also supply power to the electronic device through the power management module 241 .

The power management module 241 is used for connecting the battery 242 , the charging management module 240 and the processor 210 . The power management module 241 receives the input from the battery 242 and/or the charging management module 240 to provide power for the processor 210 , internal memory 221 , external memory, display screen 294 , camera 293 , and wireless communication module 260 . The power management module 241 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance). In some other embodiments, the power management module 241 can also be set in the processor 210 . In some other embodiments, the power management module 241 and the charging management module 240 may also be set in the same device.

The wireless communication function of the mobile phone 200 can be realized by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, the modem processor and the baseband processor.

Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in handset 200 can be used to cover single or multiple communication frequency bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.

The mobile communication module 250 can provide wireless communication solutions including 2G/3G/4G/5G applied on the mobile phone 200 . The mobile communication module 250 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA) and the like. The mobile communication module 250 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and send them to the modem processor for demodulation. The mobile communication module 250 can also amplify the signal modulated by the modem processor, convert it into electromagnetic wave and radiate it through the antenna 1 . In some embodiments, at least part of the functional modules of the mobile communication module 250 may be set in the processor 210 . In some embodiments, at least part of the functional modules of the mobile communication module 250 and at least part of the modules of the processor 210 may be set in the same device.

A modem processor may include a modulator and a demodulator. Wherein, the modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator sends the demodulated low-frequency baseband signal to the baseband processor for processing. The low-frequency baseband signal is passed to the application processor after being processed by the baseband processor. The application processor outputs sound signals through audio equipment (not limited to speaker 270A, receiver 270B, etc.), or displays images or videos through display screen 294 . In some embodiments, the modem processor may be a stand-alone device. In some other embodiments, the modem processor may be independent of the processor 210, and be set in the same device as the mobile communication module 250 or other functional modules.

The wireless communication module 260 can provide wireless communication solutions applied on the mobile phone 200, including wireless local area networks (wireless local area networks, WLAN) (such as a wireless fidelity (Wireless Fidelity, Wi-Fi) network), Bluetooth (bluetooth, BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication (near field communication, NFC), infrared (infrared, IR) technology, and the like. The wireless communication module 260 may be one or more devices integrating at least one communication processing module. The wireless communication module 260 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 210. The wireless communication module 260 can also receive the signal to be sent from the processor 210, frequency-modulate it, amplify it, and convert it into electromagnetic waves that are radiated through the antenna 2.

In some embodiments, the antenna 1 of the mobile phone 200 is coupled to the mobile communication module 250, and the antenna 2 is coupled to the wireless communication module 260, so that the mobile phone 200 can communicate with the network and other devices through wireless communication technology. The wireless communication technology may include global system for mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time-division synchronous code division multiple access (time-division synchronous code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc. The GNSS may include a global positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a Beidou navigation satellite system (beidou navigation satellite system, BDS), a quasi-zenith satellite system (quasi-zenith satellite system, QZSS) and/or satellite based augmentation systems (satellite based augmentation systems, SBAS).

The mobile phone 200 realizes the display function through the GPU, the display screen 294, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 294 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering. Processor 210 may include one or more GPUs that execute program instructions to generate or change display information.

The display screen 294 is used to display images, videos and the like. The display screen 294 includes a display panel. The display panel can be a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light emitting diode (active-matrix organic light emitting diode, AMOLED), a flexible light-emitting diode (flex light-emitting diode, FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light emitting diodes (quantum dot light emitting diodes, QLED), etc. In some embodiments, the mobile phone 200 may include 1 or N display screens 294, where N is a positive integer greater than 1.

The mobile phone 200 can realize the shooting function through ISP, camera 293 , video codec, GPU, display screen 294 and application processor.

The ISP is used for processing the data fed back by the camera 293 . For example, when taking a picture, open the shutter, the light is transmitted to the photosensitive element of the camera through the lens, and the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye. ISP can also perform algorithm optimization on image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene. In some embodiments, the ISP may be located in the camera 293 .

Camera 293 is used to capture still images or video. The object generates an optical image through the lens and projects it to the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the light signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. DSP converts digital image signals into standard RGB, YUV and other image signals. In some embodiments, the mobile phone 200 may include 1 or N cameras 293, where N is a positive integer greater than 1.
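The present application does not state which conversion the DSP applies between RGB and YUV. Purely as an illustration, the following Python sketch assumes the widely used BT.601 luma weights and the analog YUV scaling factors; the function name `rgb_to_yuv` is hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to YUV.

    Assumes BT.601 luma weights; a real DSP may use a different matrix or range.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma: weighted sum of R, G and B
    u = 0.492 * (b - y)                     # blue-difference chroma
    v = 0.877 * (r - y)                     # red-difference chroma
    return y, u, v


white = rgb_to_yuv(1.0, 1.0, 1.0)   # a gray-scale pixel carries no chroma
red = rgb_to_yuv(1.0, 0.0, 0.0)
```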

A digital signal processor is used to process digital signals. In addition to digital image signals, it can also process other digital signals (such as audio signals). For example, when the mobile phone 200 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the energy of the frequency point.
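One possible reading of the frequency-point example is that the DSP evaluates the spectral energy of a sampled signal at a single frequency bin. The Python sketch below computes one discrete Fourier transform coefficient directly and returns its squared magnitude; the function name `bin_energy` and the test signal are assumptions for illustration only.

```python
import cmath
import math


def bin_energy(samples, k):
    """Return |X[k]|^2, the energy of the k-th DFT bin of a real sample sequence."""
    n = len(samples)
    coeff = sum(
        x * cmath.exp(-2j * math.pi * k * i / n)   # correlate with the k-th basis tone
        for i, x in enumerate(samples)
    )
    return abs(coeff) ** 2


# 32 samples of a cosine that completes exactly 3 cycles in the window.
samples = [math.cos(2 * math.pi * 3 * i / 32) for i in range(32)]
energy_at_tone = bin_energy(samples, 3)    # energy concentrates in bin 3
energy_elsewhere = bin_energy(samples, 5)  # other bins are close to zero
```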

Video codecs are used to compress or decompress digital video. The handset 200 may support one or more video codecs. In this way, the mobile phone 200 can play or record videos in various encoding formats, for example: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.

The NPU is a neural-network (NN) computing processor. By referring to the structure of biological neural networks, such as the transfer mode between neurons in the human brain, it can quickly process input information and continuously learn by itself. Applications such as intelligent cognition of the mobile phone 200 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.

The external memory interface 220 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the mobile phone 200. The external memory card communicates with the processor 210 through the external memory interface 220 to implement a data storage function, for example, saving files such as music and videos in the external memory card.

The internal memory 221 may be used to store computer-executable program codes including instructions. The processor 210 executes various functional applications and data processing of the mobile phone 200 by executing instructions stored in the internal memory 221 . The internal memory 221 may include an area for storing programs and an area for storing data. Wherein, the stored program area can store an operating system, at least one application program required by a function (such as a sound playing function, an image playing function, etc.) and the like. The storage data area can store data (such as audio data, phone book, etc.) created during the use of the mobile phone 200 . In addition, the internal memory 221 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (universal flash storage, UFS) and the like.

The mobile phone 200 can realize audio functions, such as music playback and recording, through the audio module 270, the speaker 270A, the receiver 270B, the microphone 270C, the earphone interface 270D, and the application processor.

The audio module 270 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signal. The audio module 270 may also be used to encode and decode audio signals. In some embodiments, the audio module 270 can be set in the processor 210, or some functional modules of the audio module 270 can be set in the processor 210.

The speaker 270A, also referred to as a "horn", is used to convert audio electrical signals into sound signals. The user can listen to music or to a hands-free call through the speaker 270A of the mobile phone 200.

Receiver 270B, also called "earpiece", is used to convert audio electrical signals into audio signals. When the mobile phone 200 receives a call or a voice message, the receiver 270B can be placed close to the human ear to receive the voice.

The microphone 270C, also called a "mic" or "sound transmitter", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 270C to input the sound signal to the microphone 270C. The mobile phone 200 can be provided with at least one microphone 270C. In other embodiments, the mobile phone 200 can be provided with two microphones 270C, which can implement a noise reduction function in addition to collecting sound signals. In some other embodiments, the mobile phone 200 can also be provided with three, four or more microphones 270C to collect sound signals, reduce noise, identify sound sources, realize directional recording functions, and the like.

The earphone interface 270D is used for connecting wired earphones. The earphone interface 270D can be a USB interface 230, or a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.

The pressure sensor 280A is used to sense the pressure signal and can convert the pressure signal into an electrical signal. The gyro sensor 280B is used to determine the movement posture of the mobile phone 200. The air pressure sensor 280C is used to measure the air pressure. The magnetic sensor 280D includes a Hall sensor. The acceleration sensor 280E can detect the acceleration of the mobile phone 200 in various directions (generally three axes). The proximity light sensor 280G can include, for example, a light emitting diode (LED) and a photodetector, such as a photodiode; the light-emitting diode can be an infrared light-emitting diode. The ambient light sensor 280L is used to sense the ambient light brightness. The fingerprint sensor 280H is used to collect fingerprints. The temperature sensor 280J is used to detect temperature. The touch sensor 280K is also called a "touch panel". The touch sensor 280K can be arranged on the display screen 294, and the touch sensor 280K and the display screen 294 together form a touchscreen, also called a "touch screen". The touch sensor 280K is used to detect a touch operation on or near it. The bone conduction sensor 280M can acquire vibration signals. In some embodiments, the bone conduction sensor 280M can acquire the vibration signal of the vibrating bone mass of the human vocal part.

The keys 290 include a power key, a volume key and the like. The key 290 may be a mechanical key. It can also be a touch button. The mobile phone 200 can receive key input and generate key signal input related to user settings and function control of the mobile phone 200 .

The motor 291 can generate a vibrating prompt. The motor 291 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback. For example, touch operations applied to different applications (such as taking pictures, playing audio, etc.) may correspond to different vibration feedback effects. The motor 291 can also correspond to different vibration feedback effects for touch operations acting on different areas of the display screen 294 . Different application scenarios (for example: time reminder, receiving information, alarm clock, games, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also support customization.

The indicator 292 can be an indicator light, which can be used to indicate the charging status, the change of the battery capacity, and also can be used to indicate messages, missed calls, notifications and so on.

The SIM card interface 295 is used for connecting a SIM card. The SIM card can be inserted into the SIM card interface 295 or pulled out from the SIM card interface 295 to realize contact and separation with the mobile phone 200 . The mobile phone 200 can support 1 or N SIM card interfaces, where N is a positive integer greater than 1. SIM card interface 295 can support Nano SIM card, Micro SIM card, SIM card etc. Multiple cards can be inserted into the same SIM card interface 295 at the same time. The types of the multiple cards may be the same or different. The SIM card interface 295 is also compatible with different types of SIM cards. The SIM card interface 295 is also compatible with external memory cards. The mobile phone 200 interacts with the network through the SIM card to implement functions such as calling and data communication. In some embodiments, the mobile phone 200 adopts eSIM, that is, an embedded SIM card. The eSIM card can be embedded in the mobile phone 200 and cannot be separated from the mobile phone 200 .

It can be understood that in the embodiment of the present application, the electronic device (such as the above-mentioned mobile phone 200) can perform some or all of the steps in the embodiment of the present application. These steps or operations are only examples, and the electronic device can also perform other operations or variations of the various operations. In addition, the steps may be performed in an order different from that presented in the embodiment of the present application, and it may not be necessary to perform all of the operations in the embodiment of the present application. Each embodiment of the present application may be implemented independently or in any combination, which is not limited in the present application.

The method for reconstructing a three-dimensional model of a character provided in the embodiment of the present application may be applied to an electronic device having a hardware structure as shown in FIG. 2 or an electronic device having a similar structure. Or it may also be applied to electronic devices with other structures, which is not limited in this embodiment of the present application.

As shown in FIG. 3 , the method for reconstructing a three-dimensional model of a character provided in this embodiment of the present application may include steps 301 to 304 . Step 301, the electronic device acquires a first scale image and a second scale image of a target person. In this embodiment of the present application, it is necessary to reconstruct a 3D model of the target person, wherein the first scale image includes a first part of the target person, the second scale image includes at least a second part of the target person, and the first part is a part of the second part. Optionally, the method for reconstructing a 3D model of a character provided in the embodiment of the present application may be used for 3D model reconstruction of a character at different scales, such as 3D model reconstruction of the upper body, 3D model reconstruction of the whole body, and the like.

In one implementation, if the three-dimensional model of the upper body of the person is to be reconstructed, the first part of the target person is the face of the target person and the second part of the target person is the upper body of the target person. That is to say, the first scale image is a face image of the target person, the second scale image is an upper body image of the target person, and the face image corresponds to the face region of the upper body image.

In the embodiment of the present application, after the electronic device collects the image containing the target person, the electronic device preprocesses the image to obtain the face image of the target person and the upper body image of the target person. Specifically, the electronic device performs face detection on the collected image and performs operations such as cropping and scaling on the image according to the face detection frame and the face feature points to obtain the face image of the target person, that is, the first scale image. The electronic device also performs upper body detection on the collected image and crops and scales the image according to the upper body detection frame and the upper body feature points to obtain the upper body image of the target person, that is, the second scale image.
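The cropping and scaling step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the image is a nested list of pixel values, the detection frame is given as a hypothetical `(left, top, right, bottom)` tuple produced by some detector that is not shown, and scaling uses nearest-neighbor sampling, which the present application does not prescribe.

```python
def crop_and_scale(image, box, out_h, out_w):
    """Crop image to box and resize the crop to out_h x out_w (nearest neighbor).

    image: list of rows, each a list of pixel values.
    box: (left, top, right, bottom) in pixels, right and bottom exclusive.
    """
    height, width = len(image), len(image[0])
    left, top = max(0, box[0]), max(0, box[1])               # clamp the frame to the image
    right, bottom = min(width, box[2]), min(height, box[3])
    crop_h, crop_w = bottom - top, right - left
    if crop_h <= 0 or crop_w <= 0:
        raise ValueError("detection frame lies outside the image")
    return [
        [
            image[top + (y * crop_h) // out_h][left + (x * crop_w) // out_w]
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# A 4x4 test image whose pixel value encodes its position as 10 * row + column.
image = [[10 * y + x for x in range(4)] for y in range(4)]
face_box = (1, 1, 3, 3)                       # hypothetical face detection frame
first_scale = crop_and_scale(image, face_box, 4, 4)
```

Calling the same function with an upper body detection frame would give the second scale image, so both scales come from one collected image.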

Exemplarily, the face feature points may include 68 feature points as shown in FIG. 4 , of course, the face feature points may also include fewer or more feature points, which is not limited in this embodiment of the present application. The upper body feature points may include feature points of the head, neck, left shoulder, right shoulder, left elbow, right elbow, and the like.

In another implementation, if the 3D model of the whole body of the person is to be reconstructed, the first part of the target person is the upper body of the target person and the second part of the target person is the whole body of the target person. That is to say, the first scale image is the upper body image of the target person, and the second scale image is the whole body image of the target person.

Similarly, after the electronic device collects the image containing the target person, the electronic device preprocesses the image to obtain an upper body image of the target person and a whole body image of the target person. Specifically, the electronic device performs upper body detection on the collected image and performs operations such as cropping and scaling on the image according to the upper body detection frame and the upper body feature points to obtain the upper body image of the target person, that is, the first scale image. The electronic device also performs human body detection (that is, whole-body detection) on the collected image and crops and scales the image according to the human body detection frame and the human body feature points to obtain the whole body image of the target person, that is, the second scale image.

Exemplarily, the feature points of the upper body may include feature points of the head, neck, left shoulder, right shoulder, left elbow, right elbow, etc., and the feature points of the human body may include the head, neck, left shoulder, right shoulder, left elbow, right elbow, Left hand, right hand, left waist, right waist, left knee, right knee, left foot, right foot and other feature points.
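How the feature points could drive the crop is not detailed in the present application. The following Python sketch shows one plausible approach: take the bounding box of the feature points, enlarge it by a relative margin, and check that the first-part box lies inside the second-part box. The function names, the margin value and the pixel coordinates are all illustrative assumptions.

```python
import math


def box_from_keypoints(keypoints, margin, image_w, image_h):
    """Return (left, top, right, bottom) enclosing all keypoints plus a relative margin."""
    xs = [x for x, _ in keypoints.values()]
    ys = [y for _, y in keypoints.values()]
    pad_x = margin * (max(xs) - min(xs))
    pad_y = margin * (max(ys) - min(ys))
    left = max(0, math.floor(min(xs) - pad_x))          # clamp to the image borders
    top = max(0, math.floor(min(ys) - pad_y))
    right = min(image_w, math.ceil(max(xs) + pad_x))
    bottom = min(image_h, math.ceil(max(ys) + pad_y))
    return left, top, right, bottom


def contains(outer, inner):
    """True if box inner lies entirely within box outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


# Hypothetical feature point coordinates in a 200x200 image.
upper_body_points = {
    "head": (50, 20), "neck": (50, 40),
    "left_shoulder": (30, 60), "right_shoulder": (70, 60),
    "left_elbow": (25, 100), "right_elbow": (75, 100),
}
body_points = dict(
    upper_body_points,
    left_knee=(40, 160), right_knee=(60, 160),
    left_foot=(40, 190), right_foot=(60, 190),
)
upper_box = box_from_keypoints(upper_body_points, 0.25, 200, 200)
body_box = box_from_keypoints(body_points, 0.25, 200, 200)
```

The containment check mirrors the requirement that the first part is a part of the second part.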

In step 302, the electronic device determines a first grid three-dimensional model corresponding to the first scale image. In the embodiment of the present application, the grid 3D model is one representation of a 3D model. Exemplarily, FIG. 5 is an example of a grid 3D model of a human face. It can be seen from FIG. 5 that the grid consists of polygons (such as triangles) formed by connecting the three-dimensional vertices of the face model. It should be understood that the collection of all three-dimensional vertices corresponding to the model may be called a point cloud.
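A grid 3D model of this kind is commonly stored as a list of vertices plus a list of polygons that index into it. The Python sketch below illustrates that layout with triangles; the class name `TriangleMesh` and its methods are hypothetical and serve only to make the relationship between grid and point cloud concrete.

```python
from dataclasses import dataclass


@dataclass
class TriangleMesh:
    vertices: list    # [(x, y, z), ...] three-dimensional vertices
    triangles: list   # [(i, j, k), ...] indices into vertices

    def point_cloud(self):
        """The point cloud is the set of vertices without connectivity."""
        return list(self.vertices)

    def edges(self):
        """Unique undirected edges implied by the triangles."""
        found = set()
        for i, j, k in self.triangles:
            for a, b in ((i, j), (j, k), (k, i)):
                found.add((min(a, b), max(a, b)))
        return found


# A unit square in the z = 0 plane, split into two triangles.
quad = TriangleMesh(
    vertices=[(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],
    triangles=[(0, 1, 2), (0, 2, 3)],
)
```

The two triangles share the diagonal edge, so the square has five unique edges rather than six.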

Referring to FIG. 3, as shown in FIG. 6, step 302 may be implemented through steps 3021 to 3022. In step 3021, the electronic device determines a first voxel three-dimensional model corresponding to the first scale image based on the first voxel regression network. In the embodiment of the present application, the first scale image is input to the first voxel regression network, and the first voxel three-dimensional model corresponding to the first scale image can be obtained. It should be understood that the voxel three-dimensional model is the output of the voxel regression network.

Optionally, the first voxel regression network is a convolutional neural network, and the convolutional neural network may be a stacked hourglass network. The first voxel regression network is obtained by training a preset stacked hourglass network on a training data set that consists of multiple groups of collected two-dimensional images and the corresponding voxel 3D model samples annotated with real voxel values. It should be understood that when the first scale image is an image of a different part of the target person, the first voxel regression network is trained on the corresponding data set. For example, if the first scale image is the face image of the target person, the first voxel regression network is obtained by training on multiple groups of face images and the corresponding face voxel 3D model samples annotated with real voxel values.

In step 3022, the electronic device converts the first voxel three-dimensional model into a first grid three-dimensional model. Optionally, in the embodiment of the present application, the electronic device may convert the first voxel 3D model into the first grid 3D model as follows: three-dimensional vertices are extracted on the isosurface and connected into polygons (such as triangles) according to a preset rule, thereby forming the first grid three-dimensional model; FIG. 5 is an example of a grid three-dimensional model. The preset rule may be to connect the three nearest three-dimensional vertices sequentially from left to right and from top to bottom. Optionally, the electronic device may also convert the first voxel 3D model into the first grid 3D model by using a stereo rendering method.
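Exemplarily, the conversion from a voxel model to a triangle grid may be sketched as follows. Note that this sketch uses a simpler technique than the isosurface extraction described above: it thresholds the voxel values and emits the exposed faces of occupied voxels, which yields a blocky surface. The threshold of 0.5, the dictionary layout of the volume and the function name are assumptions made only for illustration.

```python
# Corner offsets of each voxel face, listed counter-clockwise when seen from outside.
FACE_CORNERS = {
    (1, 0, 0): [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)],
    (-1, 0, 0): [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0)],
    (0, 1, 0): [(0, 1, 0), (0, 1, 1), (1, 1, 1), (1, 1, 0)],
    (0, -1, 0): [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)],
    (0, 0, 1): [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)],
    (0, 0, -1): [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0)],
}


def voxels_to_mesh(occupied):
    """Build (vertices, triangles) from a set of occupied integer voxel coordinates."""
    vertices, index_of, triangles = [], {}, []

    def vertex_id(point):
        # Share one vertex between all faces that touch the same corner.
        if point not in index_of:
            index_of[point] = len(vertices)
            vertices.append(point)
        return index_of[point]

    for x, y, z in sorted(occupied):
        for (dx, dy, dz), corners in FACE_CORNERS.items():
            if (x + dx, y + dy, z + dz) in occupied:
                continue  # face is hidden between two occupied voxels
            a, b, c, d = (vertex_id((x + cx, y + cy, z + cz)) for cx, cy, cz in corners)
            triangles.append((a, b, c))  # split the exposed quad into two triangles
            triangles.append((a, c, d))
    return vertices, triangles


# Hypothetical regression output: voxel coordinate -> predicted occupancy value.
volume = {(0, 0, 0): 0.9, (1, 0, 0): 0.8, (2, 0, 0): 0.1}
occupied = {cell for cell, value in volume.items() if value >= 0.5}
vertices, triangles = voxels_to_mesh(occupied)
```

A smoother surface would require interpolating vertex positions on the isosurface, which this sketch deliberately omits.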

In step 303, the electronic device determines a second grid three-dimensional model corresponding to the second scale image. As shown in FIG. 6, similar to the above step 302, step 303 may be implemented through steps 3031 to 3032. In step 3031, the electronic device determines a second voxel three-dimensional model corresponding to the second scale image based on the second voxel regression network. In the embodiment of the present application, the second scale image is input to the second voxel regression network, and the second voxel three-dimensional model corresponding to the second scale image can be obtained.

In step 3032, the electronic device converts the second voxel three-dimensional model into a second grid three-dimensional model. It should be noted that the method by which the electronic device converts the second voxel 3D model into the second grid 3D model is similar to the method by which it converts the first voxel 3D model into the first grid 3D model. Therefore, for a detailed description of step 3032, reference may be made to the relevant description of step 3022 in the above embodiments, and details are not repeated here.

In step 304, the electronic device fuses the first grid 3D model and the second grid 3D model to obtain a target 3D model. The target 3D model is used to display at least the second part of the target person.

In the embodiment of the present application, if the first part of the target person is the face of the target person, and the second part of the target person is the upper body of the target person, then the target 3D model is an upper body 3D model of the target person, that is, the target 3D model is used to display the upper body of the target person. If the first part of the target person is the upper body of the target person, and the second part of the target person is the whole body of the target person, then the target 3D model is a human body 3D model of the target person, that is, the target 3D model is used to display the whole body of the target person.

It can be understood that the first grid 3D model and the second grid 3D model determined by the electronic device may not be aligned. Therefore, when fusing the first grid 3D model and the second grid 3D model, the electronic device needs to perform grid alignment processing on the two models. Specifically, according to the introduction of the grid 3D model in the above embodiment, grid alignment processing can be understood as point cloud registration of the two grid 3D models. Grid alignment aligns all the grids in the first grid 3D model with the corresponding grids in the second grid 3D model, that is, the point cloud of the first grid 3D model is registered with the point cloud of the second grid 3D model.

Optionally, in this embodiment of the present application, the electronic device may first perform coarse registration on the first grid 3D model and the second grid 3D model, and then perform fine registration on the first grid 3D model and the second grid 3D model. Coarse registration mainly refers to calculating the affine transformation matrix between the two point clouds when the transformation relationship between the two point clouds is unknown (the affine transformation matrix includes a rotation matrix and a translation matrix); fine registration refers to calculating a more accurate affine transformation matrix on the basis of the affine transformation matrix calculated by coarse registration.
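Exemplarily, when corresponding feature points are available in both point clouds, a rotation matrix and a translation vector may be estimated in the least-squares sense by a singular value decomposition (the Kabsch method). The embodiment does not prescribe a particular coarse registration algorithm, so the following is only a sketch under that assumption; the function name and the landmark values are hypothetical, and the NumPy library is assumed to be available.

```python
import numpy as np


def rigid_transform_from_landmarks(source, target):
    """Least-squares rotation and translation mapping source landmarks onto target landmarks."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    source_center = source.mean(axis=0)
    target_center = target.mean(axis=0)
    # 3x3 cross-covariance of the centered landmark sets.
    covariance = (source - source_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(covariance)
    # Flip the last axis if needed so that the result is a proper rotation (det = +1).
    sign = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    translation = target_center - rotation @ source_center
    return rotation, translation


# Hypothetical landmarks: four non-coplanar points and a rotated, shifted copy of them.
face_landmarks = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
angle = np.radians(30.0)
true_rotation = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                          [np.sin(angle), np.cos(angle), 0.0],
                          [0.0, 0.0, 1.0]])
true_translation = np.array([1.0, 2.0, 3.0])
body_landmarks = face_landmarks @ true_rotation.T + true_translation
rotation, translation = rigid_transform_from_landmarks(face_landmarks, body_landmarks)
```

The resulting rotation and translation can be packed into a single 4x4 homogeneous matrix if one transformation matrix is preferred.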

Exemplarily, taking the case where the first part of the target person is the face of the target person and the second part is the upper body of the target person as an example, in the above coarse registration process, the electronic device may calculate the affine transformation matrix from the point clouds formed by the face feature points in the face image and in the upper body image, for example, the 68 face feature points shown in FIG. 4. After the coarse registration, the affine transformation matrix obtained by the coarse registration is used as the initial affine transformation matrix to calculate a more accurate affine transformation matrix, so as to align the first grid 3D model and the second grid 3D model. Optionally, the fine registration algorithm may be the iterative closest point (ICP) algorithm or any variant of the ICP algorithm, which is not limited in this embodiment of the present application.
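Exemplarily, a basic point-to-point ICP loop that starts from an initial transformation may be sketched as follows. This is an illustrative sketch only: it uses a brute-force nearest-neighbor search that is suitable for small point sets, a fixed iteration count instead of a convergence test, and an identity matrix in place of a real coarse registration result. All names and values are hypothetical, and the NumPy library is assumed to be available.

```python
import numpy as np


def best_rigid_transform(source, target):
    """Kabsch solution for paired points so that target is close to source @ rotation.T + translation."""
    source_center = source.mean(axis=0)
    target_center = target.mean(axis=0)
    covariance = (source - source_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(covariance)
    sign = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    return rotation, target_center - rotation @ source_center


def icp(source, target, rotation, translation, iterations=10):
    """Refine an initial (rotation, translation) by alternating matching and re-fitting."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    for _ in range(iterations):
        moved = source @ rotation.T + translation
        # Pair every moved source point with its closest target point.
        distances = np.linalg.norm(moved[:, None, :] - target[None, :, :], axis=2)
        matched = target[np.argmin(distances, axis=1)]
        rotation, translation = best_rigid_transform(source, matched)
    return rotation, translation


# Hypothetical data: cube corners and a slightly rotated, shifted copy to be aligned back.
cube = np.array([[x, y, z] for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)])
angle = np.radians(5.0)
drift = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
misaligned = cube @ drift.T + np.array([0.05, -0.03, 0.02])
rotation, translation = icp(misaligned, cube, np.eye(3), np.zeros(3))
aligned = misaligned @ rotation.T + translation
```

In practice the initial rotation and translation would come from the coarse registration, which keeps the nearest-neighbor matches close to the true correspondences.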

Optionally, the electronic device fuses the first grid 3D model and the second grid 3D model by projecting each 3D model onto a 2D plane, performing fusion processing on the resulting plane expansion diagrams, and then projecting the fused plane expansion diagram back into three-dimensional space to obtain the target 3D model. Referring to FIG. 6, as shown in FIG. 7, step 304 may be implemented through steps 3041 to 3045.

In step 3041, the electronic device converts the first grid 3D model to a 2D plane to obtain a first plane expansion diagram. In step 3042, the electronic device converts the second grid 3D model to a 2D plane to obtain a second plane expansion diagram. The first image area in the first plane expansion diagram corresponds to the second image area in the second plane expansion diagram, and the first image area and the second image area correspond to the first part of the target person. For example, if the first grid 3D model is a grid 3D model of a human face and the second grid 3D model is a grid 3D model of an upper body, then the first image area in the first plane expansion diagram may be the area corresponding to the face, and the second image area in the second plane expansion diagram is also the area corresponding to the face, that is, the first image area and the second image area correspond to the first part (that is, the face part) of the target person.

In the embodiment of the present application, the electronic device may project the first grid 3D model and the second grid 3D model onto a two-dimensional plane by using a two-dimensional parameterization technique to obtain the first plane expansion diagram and the second plane expansion diagram. For example, a cylindrical projection may be used: a cylindrical surface is placed around the ellipsoid so that it is tangent to or intersects the ellipsoid, the latitude and longitude points on the ellipsoid surface are projected onto the cylindrical surface according to certain conditions, and the cylindrical surface is then cut open along one of its generatrices and developed into a plane to obtain the first plane expansion diagram or the second plane expansion diagram. For example, FIG. 8 is a plane expansion diagram obtained after the grid 3D model corresponding to the upper body is projected onto a 2D plane.
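Exemplarily, a simple cylindrical unwrapping of mesh vertices may be sketched as follows. The sketch assumes that the y axis is vertical, that the cylinder axis passes through a given point in the x-z plane, and that the cylinder is cut open along the generatrix at the back of the model; these conventions and the function name are assumptions made only for illustration.

```python
import math


def cylindrical_unwrap(vertices, axis_origin=(0.0, 0.0)):
    """Map each (x, y, z) vertex to (u, v): u is the angle around the vertical
    axis scaled to [0, 1], and v is the height y."""
    origin_x, origin_z = axis_origin
    unwrapped = []
    for x, y, z in vertices:
        # Angle around the y axis, zero for points straight in front (+z).
        theta = math.atan2(x - origin_x, z - origin_z)
        u = (theta + math.pi) / (2.0 * math.pi)  # the cut lies at the back (-z)
        unwrapped.append((u, y))
    return unwrapped


# Hypothetical vertices: front, one side, and the opposite side of a unit cylinder.
sample_vertices = [(0.0, 0.2, 1.0), (1.0, 0.5, 0.0), (-1.0, 0.8, 0.0)]
plane_points = cylindrical_unwrap(sample_vertices)
```

Because the triangle connectivity is unchanged by the mapping, the same face list can be reused for the plane expansion diagram.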

In step 3043, the electronic device crops the first plane expansion diagram to obtain the first image area, and crops the second plane expansion diagram to obtain the second image area. In the embodiment of the present application, taking the case where the first image area is the face part as an example, the electronic device generates a mask of the face range (also called a face area frame) according to the face feature points, and then uses the face area frame to cut out the face part from the first plane expansion diagram and from the second plane expansion diagram.
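Exemplarily, the cropping by a face area frame may be sketched as follows. The embodiment does not specify the shape of the mask, so this sketch assumes an axis-aligned rectangular frame around the feature points in the plane expansion diagram; the padding value and the function names are hypothetical.

```python
def region_from_feature_points(feature_points, padding=0.02):
    """Axis-aligned frame (u_min, v_min, u_max, v_max) around 2D feature points."""
    us = [u for u, _ in feature_points]
    vs = [v for _, v in feature_points]
    return min(us) - padding, min(vs) - padding, max(us) + padding, max(vs) + padding


def split_faces_by_region(plane_points, faces, region):
    """Separate faces that lie entirely inside the frame from all other faces."""
    u_min, v_min, u_max, v_max = region

    def inside(index):
        u, v = plane_points[index]
        return u_min <= u <= u_max and v_min <= v <= v_max

    inside_faces, outside_faces = [], []
    for face in faces:
        if all(inside(index) for index in face):
            inside_faces.append(face)
        else:
            outside_faces.append(face)
    return inside_faces, outside_faces


# Hypothetical plane expansion diagram with five vertices and three triangles.
plane_points = [(0.4, 0.4), (0.6, 0.4), (0.5, 0.6), (0.9, 0.9), (0.1, 0.1)]
faces = [(0, 1, 2), (1, 2, 3), (0, 2, 4)]
frame = region_from_feature_points([(0.4, 0.4), (0.6, 0.4), (0.5, 0.6)])
face_area, remainder = split_faces_by_region(plane_points, faces, frame)
```

Applied to the first plane expansion diagram, the inside faces form the first image area; applied to the second plane expansion diagram, the outside faces are what remains after the second image area is removed.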

In step 3044, the electronic device replaces the second image area in the second plane expansion diagram with the first image area in the first plane expansion diagram to obtain the target plane expansion diagram of the target person. In the embodiment of the present application, the electronic device stitches the edge of the first image area to the edge left in the second plane expansion diagram after the second image area is removed, to obtain the target plane expansion diagram of the target person. It should be understood that the electronic device cuts the first image area out of the first plane expansion diagram, and cuts the second plane expansion diagram to remove the second image area from it. After smoothing the edge of the first image area and smoothing the edge left in the second plane expansion diagram after the second image area is removed, the electronic device stitches the two edges together to obtain the target plane expansion diagram. Specifically, the electronic device takes the edge of the first image area and the edge left in the second plane expansion diagram after the second image area is removed as constraint conditions, calculates the closest vertices of the two edges, and then joins the closest vertices to achieve edge stitching. Exemplarily, (a) in FIG. 9 is a schematic diagram of the effect of the plane expansion diagram to be stitched, and (b) in FIG. 9 is a schematic diagram of the effect of converting the plane expansion diagram to be stitched into three-dimensional space.
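Exemplarily, the search for the closest vertices of the two edges may be sketched as follows. The sketch covers only the pairing step and assumes that each edge is given as a list of 2D vertices in the plane expansion diagram; the edge smoothing and the creation of the joining triangles are omitted, and the function name is hypothetical.

```python
import math


def nearest_vertex_pairs(inner_edge, outer_edge):
    """For every vertex on the inner edge, find the index of the closest outer edge vertex."""
    pairs = []
    for inner_index, point in enumerate(inner_edge):
        outer_index = min(
            range(len(outer_edge)),
            key=lambda candidate: math.dist(point, outer_edge[candidate]),
        )
        pairs.append((inner_index, outer_index))
    return pairs


# Hypothetical edges: the border of the first image area and the border of the hole
# left in the second plane expansion diagram.
first_area_edge = [(0.4, 0.4), (0.6, 0.4), (0.6, 0.6), (0.4, 0.6)]
hole_edge = [(0.35, 0.35), (0.65, 0.35), (0.65, 0.65), (0.35, 0.65)]
stitch_pairs = nearest_vertex_pairs(first_area_edge, hole_edge)
```

Each returned pair identifies two vertices that may be joined, for example by merging them or by inserting triangles between neighboring pairs.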

In step 3045, the electronic device performs three-dimensional conversion on the target plane expansion diagram to obtain the target 3D model. It should be understood that the three-dimensional conversion of the target plane expansion diagram is the inverse of the process of mapping three-dimensional information to the two-dimensional plane in the above step 3041 or step 3042, and that the resulting target 3D model is a grid 3D model.

Optionally, referring to FIG. 7, as shown in FIG. 10, before the electronic device performs fusion processing on the first grid 3D model and the second grid 3D model (that is, before step 304), the method for reconstructing a 3D model of a person provided by the embodiment of the present application further includes step 305. In step 305, the electronic device performs at least one of mesh smoothing or mesh simplification on the first grid 3D model and/or the second grid 3D model.

In the embodiment of the present application, in one case, the first grid 3D model obtained through the above step 302 and the second grid 3D model obtained through step 303 may contain noise grids (that is, grids that deviate considerably from the actual model). To obtain a more accurate target 3D model, the electronic device may use a grid smoothing algorithm to smooth the first grid 3D model and/or the second grid 3D model so as to remove the noise grids. Exemplarily, the grid smoothing algorithm may be any one of the Taubin smoothing algorithm, the Laplacian smoothing algorithm, and the mean curvature smoothing algorithm, which is selected according to the actual situation and is not limited in the embodiment of the present application.
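Exemplarily, a basic Laplacian smoothing pass, in which each vertex is moved toward the average of its connected neighbors, may be sketched as follows. The uniform neighbor weights, the smoothing factor and the function name are assumptions made only for illustration.

```python
def laplacian_smooth(vertices, faces, factor=0.5, iterations=1):
    """Move every vertex toward the centroid of its neighbors by the given factor."""
    neighbors = {index: set() for index in range(len(vertices))}
    for i, j, k in faces:
        neighbors[i].update((j, k))
        neighbors[j].update((i, k))
        neighbors[k].update((i, j))

    current = [tuple(vertex) for vertex in vertices]
    for _ in range(iterations):
        updated = []
        for index, vertex in enumerate(current):
            linked = neighbors[index]
            if not linked:
                updated.append(vertex)  # isolated vertex: nothing to average
                continue
            centroid = [sum(current[n][axis] for n in linked) / len(linked) for axis in range(3)]
            updated.append(tuple(vertex[axis] + factor * (centroid[axis] - vertex[axis])
                                 for axis in range(3)))
        current = updated
    return current


# Hypothetical fan of four triangles whose center vertex is a noise spike in z.
fan_vertices = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
fan_faces = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 1)]
smoothed = laplacian_smooth(fan_vertices, fan_faces, factor=0.5, iterations=1)
```

Plain Laplacian smoothing also shrinks the model slightly with each pass, which is one reason the Taubin variant mentioned above alternates a shrinking step with an inflating step.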

In the embodiment of the present application, in another case, the grids in the first grid 3D model obtained through the above step 302 and in the second grid 3D model obtained through step 303 may be relatively dense, so that grid fusion processing requires a large amount of calculation and takes a long time. To reduce the amount of calculation in the grid 3D model fusion process, the electronic device may use a grid simplification algorithm to simplify the first grid 3D model and/or the second grid 3D model. Exemplarily, the grid simplification algorithm may be either the edge collapse (Edge Collapse) algorithm or the metric-based edge collapse algorithm, which is selected according to the actual situation and is not limited in the embodiment of the present application.
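Exemplarily, a single edge collapse step may be sketched as follows. The sketch assumes that the shortest edge is collapsed to its midpoint, which is only one possible selection rule, and the function name is hypothetical.

```python
import math


def collapse_shortest_edge(vertices, faces):
    """Merge the two endpoints of the shortest edge into their midpoint and
    drop the triangles that become degenerate."""
    edges = sorted({tuple(sorted(pair))
                    for i, j, k in faces
                    for pair in ((i, j), (j, k), (k, i))})
    keep, drop = min(edges, key=lambda edge: math.dist(vertices[edge[0]], vertices[edge[1]]))
    midpoint = tuple((a + b) / 2.0 for a, b in zip(vertices[keep], vertices[drop]))
    new_vertices = [midpoint if index == keep else vertex
                    for index, vertex in enumerate(vertices) if index != drop]

    def remap(index):
        # Redirect the removed vertex to the kept one, then close the index gap.
        index = keep if index == drop else index
        return index - 1 if index > drop else index

    new_faces = []
    for face in faces:
        remapped = tuple(remap(index) for index in face)
        if len(set(remapped)) == 3:
            new_faces.append(remapped)
    return new_vertices, new_faces


# Hypothetical mesh: two triangles, one of them a thin sliver along a very short edge.
sliver_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.5, 1.0, 0.0)]
sliver_faces = [(0, 1, 3), (1, 2, 3)]
simplified_vertices, simplified_faces = collapse_shortest_edge(sliver_vertices, sliver_faces)
```

Repeating the step until a target vertex count is reached gives a simple simplification loop; a metric-based variant would replace the edge length with an error measure when choosing the edge.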

Optionally, when both mesh smoothing and mesh simplification need to be performed on a grid 3D model, the order of the two processes is not limited. For example, mesh smoothing may be performed on the grid 3D model first, and mesh simplification may then be performed on the smoothed grid 3D model; alternatively, mesh simplification may be performed on the grid 3D model first, and mesh smoothing may then be performed on the simplified grid 3D model. In one implementation, mesh smoothing is performed on the grid 3D model first and mesh simplification is then performed on the smoothed grid 3D model, which can better improve the quality of the reconstructed 3D model.

Optionally, in the embodiment of the present application, after the electronic device stitches the edge of the first image area to the edge left in the second plane expansion diagram after the second image area is removed and obtains the target plane expansion diagram of the target person, the electronic device performs texture recalculation on the target plane expansion diagram to obtain the vertex coordinates of the grid 3D model corresponding to the first part (for example, the face area) of the target person in the target plane expansion diagram. Specifically, the electronic device calculates the vertex mapping relationship between the first grid 3D model and the area of the second grid 3D model that is covered (or replaced) by the first grid 3D model. According to the vertex mapping relationship, barycentric coordinates are used for interpolation to obtain the vertex coordinates of the grid 3D model corresponding to the first part (for example, the face area) of the target person in the target plane expansion diagram.
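Exemplarily, interpolation with barycentric coordinates may be sketched as follows: a point inside a triangle of the plane expansion diagram is expressed as a weighted combination of the three triangle corners, and the same weights are applied to the three-dimensional coordinates of those corners. The function names and the sample values are hypothetical.

```python
def barycentric_weights(point, a, b, c):
    """Weights (w_a, w_b, w_c) such that point = w_a * a + w_b * b + w_c * c in 2D."""
    determinant = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w_a = ((b[1] - c[1]) * (point[0] - c[0]) + (c[0] - b[0]) * (point[1] - c[1])) / determinant
    w_b = ((c[1] - a[1]) * (point[0] - c[0]) + (a[0] - c[0]) * (point[1] - c[1])) / determinant
    return w_a, w_b, 1.0 - w_a - w_b


def interpolate_vertex(point, triangle_2d, triangle_3d):
    """Carry a 2D point inside triangle_2d over to the matching 3D triangle."""
    weights = barycentric_weights(point, *triangle_2d)
    return tuple(sum(weights[corner] * triangle_3d[corner][axis] for corner in range(3))
                 for axis in range(3))


# Hypothetical triangle in the plane expansion diagram and its three-dimensional corners.
triangle_2d = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
triangle_3d = [(0.0, 0.0, 0.0), (2.0, 0.0, 1.0), (0.0, 2.0, 1.0)]
mapped_vertex = interpolate_vertex((0.25, 0.25), triangle_2d, triangle_3d)
```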

Taking the case where the first scale image is the face image of the target person and the second scale image is the upper body image of the target person as an example, the process of reconstructing the upper body 3D model of the target person from the face image and the upper body image may refer to the flow chart shown in FIG. 11. First, preprocessing (that is, face detection, upper body detection, cropping, and so on) is performed on the collected image containing the target person to obtain the face image and the upper body image of the target person. Second, the face image is input to the first voxel regression network to obtain a face voxel 3D model, and the upper body image is input to the second voxel regression network to obtain an upper body voxel 3D model. Third, the face voxel 3D model is converted into a face grid 3D model, and the upper body voxel 3D model is converted into an upper body grid 3D model. Then, at least one of mesh smoothing or mesh simplification is performed on the face grid 3D model and the upper body grid 3D model. Finally, grid fusion processing is performed on the processed face grid 3D model and the processed upper body grid 3D model to obtain the upper body 3D model of the target person.

To sum up, in the method for reconstructing a 3D model of a person provided by the embodiment of the present application, after the electronic device obtains the first scale image and the second scale image of the target person, the electronic device determines the first grid 3D model corresponding to the first scale image and the second grid 3D model corresponding to the second scale image, and then performs fusion processing on the first grid 3D model and the second grid 3D model to obtain a target 3D model, which is used to display at least the second part of the target person. The first scale image includes a first part of the target person, the second scale image includes at least a second part of the target person, and the first part is a part of the second part. The method combines the advantage of the 3D model of a small-scale image (such as the first scale image), namely rich detail and high resolution, with the advantage of the 3D model of a large-scale image (such as the second scale image), namely coverage of a wide range of the person's features (that is, completeness). Fusing the 3D models of images of the target person at different scales yields a 3D model of the target person that is more complete and richer in detail, which not only extends the application scenarios of the 3D model of a person, but also improves the quality of the reconstructed 3D model of the person, thereby improving user experience.

Correspondingly, an embodiment of the present application provides a device for reconstructing a 3D model of a person. The device can be applied to an electronic device, and the functional modules of the device can be divided according to the above method example. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiment of the present invention is schematic and is only a logical function division; there may be other division manners in actual implementation.

In the case of dividing each functional module corresponding to each function, FIG. 12 shows a possible structural schematic diagram of the apparatus for reconstructing a 3D model of a person involved in the above embodiment. As shown in FIG. 12, the device includes an acquisition module 1201, a determination module 1202 and a fusion module 1203. The acquisition module 1201 is configured to acquire a first scale image and a second scale image of the target person, where the first scale image includes a first part of the target person, the second scale image includes at least a second part of the target person, and the first part is a part of the second part; for example, the acquisition module 1201 executes step 301 in the above method embodiment. The determination module 1202 is configured to determine the first grid 3D model corresponding to the first scale image and determine the second grid 3D model corresponding to the second scale image; for example, the determination module 1202 performs steps 302 and 303 in the above method embodiment. The fusion module 1203 is configured to perform fusion processing on the first grid 3D model and the second grid 3D model to obtain a target 3D model, where the target 3D model is used to display at least the second part of the target person; for example, the fusion module 1203 performs step 304 in the above method embodiment.

Optionally, the determination module 1202 is specifically configured to determine a first voxel 3D model corresponding to the first scale image based on the first voxel regression network, and convert the first voxel 3D model into the first grid 3D model; and to determine a second voxel 3D model corresponding to the second scale image based on the second voxel regression network, and convert the second voxel 3D model into the second grid 3D model; for example, the determination module 1202 performs steps 3021 to 3022 and steps 3031 to 3032 in the above method embodiment.

Optionally, the fusion module 1203 is specifically configured to: convert the first grid 3D model to a 2D plane to obtain the first plane expansion diagram; convert the second grid 3D model to a 2D plane to obtain the second plane expansion diagram, where the first image area in the first plane expansion diagram corresponds to the second image area in the second plane expansion diagram, and the first image area and the second image area correspond to the first part; crop the first plane expansion diagram to obtain the first image area, and crop the second plane expansion diagram to obtain the second image area; replace the second image area in the second plane expansion diagram with the first image area to obtain the target plane expansion diagram of the target person; and perform three-dimensional conversion on the target plane expansion diagram to obtain the target 3D model. For example, the fusion module 1203 performs steps 3041 to 3045 in the above method embodiment.

Optionally, the apparatus for reconstructing a 3D model of a person provided in the embodiment of the present application further includes a processing module 1204, which is configured to perform at least one of mesh smoothing or mesh simplification on the first grid 3D model and/or the second grid 3D model; for example, the processing module 1204 executes step 305 in the above method embodiment.

Each module of the above apparatus for reconstructing a 3D model of a person can also be used to perform other actions in the above method embodiment. For all relevant content of each step involved in the above method embodiment, reference may be made to the function description of the corresponding functional module, and details are not repeated here.

In the case of using integrated units, FIG. 13 shows another possible structural schematic diagram of the apparatus for reconstructing a 3D model of a person involved in the above embodiment. As shown in FIG. 13, the apparatus for reconstructing a 3D model of a person provided by the embodiment of the present application may include a processing module 1301 and a communication module 1302. The processing module 1301 can be used to control and manage the actions of the device. For example, the processing module 1301 can be used to support the device in executing steps 301 to 304 and step 305 in the above method embodiment, and/or other processes of the techniques described herein. The communication module 1302 may be used to support communication between the device and other network entities. Optionally, as shown in FIG. 13, the apparatus for reconstructing a 3D model of a person may further include a storage module 1303 for storing program code and data of the apparatus.

The processing module 1301 may be a processor or a controller (such as the above-mentioned processor 210 shown in FIG. 2), for example, a central processing unit (CPU), a general purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules and circuits described in conjunction with the disclosure of the embodiments of the present invention. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 1302 may be a transceiver, a transceiver circuit, or a communication interface (for example, the mobile communication module 250 or the wireless communication module 260 shown in FIG. 2). The storage module 1303 may be a memory (for example, the above-mentioned internal memory 221 shown in FIG. 2).

When the processing module 1301 is a processor, the communication module 1302 is a transceiver, and the storage module 1303 is a memory, the processor, the transceiver, and the memory may be connected through a bus. The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended Industry standard architecture, EISA) bus or the like. The bus can be divided into address bus, data bus, control bus and so on.

For more details about the modules contained in the above-mentioned reconstruction device for a three-dimensional model of a person to realize the above-mentioned functions, please refer to the descriptions in the previous method embodiments, and will not be repeated here.

Each embodiment in this specification is described in a progressive manner, the same and similar parts of each embodiment can be referred to each other, and each embodiment focuses on the differences from other embodiments.

The above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, all or part of the processes or functions according to the embodiments of the present application are generated. The computer can be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a magnetic disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state drive (SSD)), and so on.

Through the description of the above embodiments, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional modules is used as an example for illustration. In practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. For the specific working process of the above-described system, device, and unit, reference may be made to the corresponding process in the foregoing method embodiments, and details are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, device and method can be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the modules or units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.

If the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method described in each embodiment of the present application. The aforementioned storage medium includes flash memory, a mobile hard disk, read-only memory, random access memory, a magnetic disk, an optical disk, and other media capable of storing program code.

The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present invention shall be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention should be determined by the protection scope of the claims.

Claims

1. A method for reconstructing a three-dimensional model of a person, characterized in that the method is applied to an electronic device and comprises:

acquiring a first scale image and a second scale image of a target person, wherein the first scale image includes a first part of the target person, the second scale image includes at least a second part of the target person, and the first part is part of the second part;

determining a first mesh three-dimensional model corresponding to the first scale image;

determining a second mesh three-dimensional model corresponding to the second scale image; and

performing fusion processing on the first mesh three-dimensional model and the second mesh three-dimensional model to obtain a target three-dimensional model, wherein the target three-dimensional model is used to display at least the second part of the target person.
2. The method according to claim 1, characterized in that:

the determining a first mesh three-dimensional model corresponding to the first scale image comprises:

determining, based on a first voxel regression network, a first voxel three-dimensional model corresponding to the first scale image, and converting the first voxel three-dimensional model into the first mesh three-dimensional model; and

the determining a second mesh three-dimensional model corresponding to the second scale image comprises:

determining, based on a second voxel regression network, a second voxel three-dimensional model corresponding to the second scale image, and converting the second voxel three-dimensional model into the second mesh three-dimensional model.
3. The method according to claim 1 or 2, wherein the performing fusion processing on the first mesh three-dimensional model and the second mesh three-dimensional model to obtain a target three-dimensional model comprises:

converting the first mesh three-dimensional model to a two-dimensional plane to obtain a first plane expansion diagram;

converting the second mesh three-dimensional model to a two-dimensional plane to obtain a second plane expansion diagram, wherein a first image area in the first plane expansion diagram corresponds to a second image area in the second plane expansion diagram, and the first image area and the second image area both correspond to the first part;

cropping the first plane expansion diagram to obtain the first image area;

cropping the second plane expansion diagram to obtain the second image area;

replacing the second image area in the second plane expansion diagram with the first image area to obtain a target plane expansion diagram of the target person; and

performing three-dimensional conversion on the target plane expansion diagram to obtain the target three-dimensional model.
4. The method according to any one of claims 1 to 3, wherein before the fusion processing is performed on the first mesh three-dimensional model and the second mesh three-dimensional model, the method further comprises:

performing at least one of mesh smoothing or mesh simplification on the first mesh three-dimensional model and/or the second mesh three-dimensional model.
5. The method according to any one of claims 1 to 4, characterized in that:

The first part is the face of the target person, and the second part is the upper body of the target person; or,

The first part is the upper body of the target person, and the second part is the whole body of the target person.
6. A device for reconstructing a three-dimensional model of a person, characterized in that the device is applied to an electronic device and comprises an acquisition module, a determination module, and a fusion module;

the acquisition module is configured to acquire a first scale image and a second scale image of a target person, wherein the first scale image includes a first part of the target person, the second scale image includes at least a second part of the target person, and the first part is part of the second part;

the determination module is configured to determine a first mesh three-dimensional model corresponding to the first scale image, and determine a second mesh three-dimensional model corresponding to the second scale image; and

the fusion module is configured to perform fusion processing on the first mesh three-dimensional model and the second mesh three-dimensional model to obtain a target three-dimensional model, wherein the target three-dimensional model is used to display at least the second part of the target person.
7. The device according to claim 6, characterized in that:

the determination module is specifically configured to: determine, based on a first voxel regression network, a first voxel three-dimensional model corresponding to the first scale image, and convert the first voxel three-dimensional model into the first mesh three-dimensional model; and determine, based on a second voxel regression network, a second voxel three-dimensional model corresponding to the second scale image, and convert the second voxel three-dimensional model into the second mesh three-dimensional model.
8. The device according to claim 6 or 7, characterized in that:

the fusion module is specifically configured to: convert the first mesh three-dimensional model to a two-dimensional plane to obtain a first plane expansion diagram; convert the second mesh three-dimensional model to a two-dimensional plane to obtain a second plane expansion diagram, wherein a first image area in the first plane expansion diagram corresponds to a second image area in the second plane expansion diagram, and the first image area and the second image area both correspond to the first part; crop the first plane expansion diagram to obtain the first image area; crop the second plane expansion diagram to obtain the second image area; replace the second image area in the second plane expansion diagram with the first image area to obtain a target plane expansion diagram of the target person; and perform three-dimensional conversion on the target plane expansion diagram to obtain the target three-dimensional model.
9. The device according to any one of claims 6 to 8, wherein the device further comprises a processing module;

the processing module is configured to perform at least one of mesh smoothing or mesh simplification on the first mesh three-dimensional model and/or the second mesh three-dimensional model.
10. The device according to any one of claims 6 to 9, characterized in that:

The first part is the face of the target person, and the second part is the upper body of the target person; or,

The first part is the upper body of the target person, and the second part is the whole body of the target person.
11. An electronic device, characterized in that it comprises a memory and at least one processor connected to the memory, wherein the memory is configured to store instructions, and when the instructions are read by the at least one processor, the method according to any one of claims 1 to 5 is executed.
12. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 5.
13. A computer program product, characterized in that the computer program product includes instructions, and when the computer program product is run on a computer, the method according to any one of claims 1 to 5 is executed.
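
For illustration only, the plane-expansion fusion steps recited in the claims above (crop the corresponding image areas, replace the area in the second diagram, convert the result back to three dimensions) can be sketched in Python. This is a minimal sketch under stated assumptions and not the claimed implementation: it assumes each plane expansion diagram is stored as a rectangular pixel grid holding one 3D point per pixel, that the corresponding image areas are axis-aligned rectangles, and that a nearest-neighbour resample is acceptable when the two areas differ in size. The names `Diagram`, `Region`, `crop`, `resize_nearest`, `replace_region`, `diagram_to_mesh` and `fuse` are hypothetical and do not appear in the application.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Diagram = List[List[Point]]         # assumption: one 3D point per pixel
Region = Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive


def crop(diagram: Diagram, region: Region) -> Diagram:
    """Cut a rectangular image area out of a plane expansion diagram."""
    r0, c0, r1, c1 = region
    return [row[c0:c1] for row in diagram[r0:r1]]


def resize_nearest(patch: Diagram, rows: int, cols: int) -> Diagram:
    """Nearest-neighbour resample so the patch fits the target area."""
    src_rows, src_cols = len(patch), len(patch[0])
    return [
        [patch[r * src_rows // rows][c * src_cols // cols] for c in range(cols)]
        for r in range(rows)
    ]


def replace_region(diagram: Diagram, region: Region, patch: Diagram) -> Diagram:
    """Return a copy of `diagram` with `region` overwritten by `patch`."""
    r0, c0, r1, c1 = region
    fitted = resize_nearest(patch, r1 - r0, c1 - c0)
    out = [list(row) for row in diagram]  # leave the input untouched
    for r in range(r0, r1):
        out[r][c0:c1] = fitted[r - r0]
    return out


def diagram_to_mesh(diagram: Diagram):
    """Convert a diagram back to 3D: every pixel becomes a vertex and
    every 2x2 block of neighbouring pixels becomes two triangles."""
    rows, cols = len(diagram), len(diagram[0])
    vertices = [p for row in diagram for p in row]
    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            i = r * cols + c
            triangles.append((i, i + 1, i + cols))
            triangles.append((i + 1, i + cols + 1, i + cols))
    return vertices, triangles


def fuse(first: Diagram, first_area: Region,
         second: Diagram, second_area: Region):
    """Crop the first area, swap it into the second diagram, rebuild 3D."""
    first_patch = crop(first, first_area)
    target = replace_region(second, second_area, first_patch)
    return diagram_to_mesh(target)


# Toy data: a flat 4x4 second diagram (z = 0) and a 2x2 first diagram
# (z = 1) whose points line up with the centre of the second diagram.
second = [[(float(c), float(r), 0.0) for c in range(4)] for r in range(4)]
first = [[(float(c + 1), float(r + 1), 1.0) for c in range(2)] for r in range(2)]
vertices, triangles = fuse(first, (0, 0, 2, 2), second, (1, 1, 3, 3))
```

In this toy run the four centre pixels of the second diagram take their 3D points from the first diagram, while the border pixels keep the points of the second diagram. A real system would also need to align the two models in a common coordinate frame and blend the seam between the areas, which this sketch leaves out.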