WO2022057526A1 - Three-dimensional model reconstruction method, and training method and apparatus for a three-dimensional reconstruction model - Google Patents

Three-dimensional model reconstruction method, and training method and apparatus for a three-dimensional reconstruction model

Info

Publication number
WO2022057526A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature map
feature
target
texture
Prior art date
Application number
PCT/CN2021/112089
Other languages
English (en)
French (fr)
Inventor
赵艳丹
林书恒
曹煊
葛彦昊
汪铖杰
曹玮剑
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority to EP21868354.8A (EP4109412A4)
Publication of WO2022057526A1
Priority to US17/976,259 (US20230048906A1)



Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/529 Depth or shape recovery from texture
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/04 Texture mapping
    • G06T15/50 Lighting effects
    • G06T15/503 Blending, e.g. for anti-aliasing
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T2219/00 Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20 Indexing scheme for editing of 3D models
    • G06T2219/2021 Shape modification

Definitions

  • the present application relates to the technical field of image processing, and in particular, to a three-dimensional model reconstruction method, apparatus, computer equipment and storage medium, and a training method, apparatus, computer equipment and storage medium for a three-dimensional reconstruction model.
  • conventional techniques obtain a shape map, a texture map, and the like from the input image, and reconstruct the three-dimensional model from the obtained shape map and texture map.
  • the shape map and texture map obtained by traditional techniques are prone to distortion, resulting in inaccurate reconstructed 3D models.
  • Embodiments of the present application provide a three-dimensional model reconstruction method, apparatus, computer equipment, and storage medium, and a three-dimensional reconstruction model training method, apparatus, computer equipment, and storage medium.
  • a three-dimensional model reconstruction method includes:
  • acquiring image feature coefficients of an input image;
  • according to the image feature coefficients, respectively acquiring a texture- and shape-based global feature map and an initial local feature map of the input image;
  • performing edge smoothing processing on the initial local feature map to obtain a target local feature map;
  • splicing the global feature map and the target local feature map based on texture and shape, respectively, to obtain a target texture image and a target shape image; and
  • performing three-dimensional model reconstruction processing according to the target texture image and the target shape image to obtain a target three-dimensional model.
  • a method for training a three-dimensional reconstruction model comprising:
  • acquiring image feature coefficients and rendering coefficients of a training image, and inputting the image feature coefficients into a deep-learning-based three-dimensional reconstruction model, so that the three-dimensional reconstruction model: according to the image feature coefficients, respectively obtains a texture- and shape-based global feature map and an initial local feature map of the training image; performs edge smoothing on the initial local feature map to obtain a target local feature map; splices the global feature map and the target local feature map based on texture and shape, respectively, to obtain a target texture image and a target shape image; and performs three-dimensional model reconstruction processing according to the target texture image and the target shape image to obtain a predicted three-dimensional model;
  • performing image rendering processing on the predicted three-dimensional model according to the rendering coefficients to obtain a predicted two-dimensional image; and
  • training the three-dimensional reconstruction model according to the error between the training image and the predicted two-dimensional image until a convergence condition is satisfied, to obtain a trained three-dimensional reconstruction model.
  • a three-dimensional model reconstruction device includes:
  • a first coefficient obtaining module for obtaining image feature coefficients of the input image
  • a feature map acquisition module configured to acquire the texture- and shape-based global feature map and the initial local feature map of the input image according to the image feature coefficients
  • a smoothing processing module for performing edge smoothing processing on the initial local feature map to obtain a target local feature map
  • a feature map splicing module for splicing the global feature map and the target local feature map based on texture and shape, respectively, to obtain a target texture image and a target shape image
  • the first model reconstruction module is configured to perform three-dimensional model reconstruction processing according to the target texture image and the target shape image to obtain a target three-dimensional model.
  • a training device for a three-dimensional reconstruction model comprising:
  • the second coefficient obtaining module is used to obtain the image feature coefficient and rendering coefficient of the training image
  • the second model reconstruction module is used for inputting the image feature coefficients into a 3D reconstruction model based on deep learning, so that the 3D reconstruction model:
  • according to the image feature coefficients, obtains the texture- and shape-based global feature map and the initial local feature map of the training image respectively; performs edge smoothing processing on the initial local feature map to obtain the target local feature map; splices the global feature map and the target local feature map based on texture and shape, respectively, to obtain a target texture image and a target shape image; and performs three-dimensional model reconstruction processing according to the target texture image and the target shape image to obtain a predicted three-dimensional model;
  • an image rendering module configured to perform image rendering processing on the predicted three-dimensional model according to the rendering coefficient to obtain a predicted two-dimensional image
  • the reconstruction model training module is used for training the 3D reconstruction model according to the error between the training image and the predicted 2D image, until the convergence condition is satisfied, and the trained 3D reconstruction model is obtained.
  • a computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the steps of the above three-dimensional model reconstruction method or three-dimensional reconstruction model training method.
  • one or more non-volatile computer-readable storage media having computer-readable instructions stored thereon that, when executed by one or more processors, cause the one or more processors to perform the steps of the above three-dimensional model reconstruction method or three-dimensional reconstruction model training method.
  • a computer program product or computer program, comprising computer-readable instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer-readable instructions from the computer-readable storage medium and executes them, so that the computer device performs the steps of the above three-dimensional model reconstruction method or three-dimensional reconstruction model training method.
  • FIG. 1 is an application environment diagram of a three-dimensional model reconstruction method and a three-dimensional reconstruction model training method in one embodiment
  • FIG. 2 is a schematic flowchart of a three-dimensional model reconstruction method in one embodiment
  • FIG. 3 is a schematic diagram of discontinuous edges of a texture image in one embodiment;
  • FIG. 5 is a schematic diagram of discontinuous edges of a shape image in one embodiment;
  • FIG. 6 is a schematic structural diagram of a gradient smoothing feature map in one embodiment;
  • FIG. 8 is a schematic diagram of the effect of filling and merging feature values of key part feature maps in one embodiment;
  • FIG. 9 is a schematic diagram of a process for obtaining a target facial organ feature map in one embodiment;
  • FIG. 10 is a schematic diagram of a feature map processing flow of multiple feature channels in one embodiment;
  • FIG. 11 is a schematic structural diagram of a convolutional autoencoder in one embodiment;
  • FIG. 12 is a schematic diagram of a process for obtaining a face-swapped image in one embodiment;
  • FIG. 13 is a schematic flowchart of a three-dimensional model reconstruction method in another embodiment;
  • FIG. 15 is a comparison diagram of edge effects of a 2D texture map in one embodiment;
  • FIG. 16 is a comparison diagram of a 2D shape map in one embodiment;
  • FIG. 17 is a schematic flowchart of a training method for a 3D reconstruction model in one embodiment;
  • FIG. 18 is a schematic flowchart of a training method for a 3D reconstruction model in another embodiment;
  • FIG. 19 is a structural block diagram of a three-dimensional model reconstruction apparatus in one embodiment;
  • FIG. 20 is a structural block diagram of a training apparatus for a three-dimensional reconstruction model in another embodiment;
  • FIG. 21 is an internal structure diagram of a computer device in one embodiment;
  • FIG. 22 is an internal structure diagram of a computer device in another embodiment.
  • the three-dimensional model reconstruction method, device, computer equipment and storage medium provided by the embodiments of the present application, as well as the training method, device, computer equipment and storage medium for the three-dimensional reconstruction model, can be implemented based on artificial intelligence (Artificial Intelligence, AI) technology.
  • artificial intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, involving a wide range of fields, including both hardware-level technology and software-level technology.
  • the basic technologies of artificial intelligence generally include technologies such as sensors, special artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • various embodiments of the present application may be implemented based on a computer vision technology (Computer Vision, CV).
  • Computer vision is a science that studies how to make machines "see"; more specifically, it refers to using cameras and computers instead of human eyes to identify, track, and measure targets, and to perform further graphics processing so that the processed result becomes an image more suitable for human observation or for transmission to instruments for detection.
  • computer vision studies related theories and technologies, trying to build artificial intelligence systems that can obtain information from images or multidimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, 3D object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric identification technologies such as face recognition and fingerprint recognition.
  • The artificial intelligence cloud service is also generally referred to as AIaaS (AI as a Service). This service model is similar to an AI-themed mall: all developers can access one or more artificial intelligence services provided by the platform through APIs (Application Programming Interfaces), for example reconstructing an input image in 3D and outputting it in 2D; some experienced developers can also use the AI framework and AI infrastructure provided by the platform to deploy and maintain their own cloud AI services.
  • the terminal 102 communicates with the server 104 through the network.
  • the terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
  • the terminal 102 can be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server 104 can be an independent physical server, or a server cluster or distributed system composed of multiple physical servers.
  • the server 104 can also be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms.
  • Both the terminal 102 and the server 104 can be independently used to execute the three-dimensional model reconstruction method and the three-dimensional reconstruction model training method provided in the embodiments of the present application.
  • the server obtains the image feature coefficients of the input image, and according to the image feature coefficients, respectively obtains the texture- and shape-based global feature map and the initial local feature map of the input image.
  • the server performs edge smoothing processing on the initial local feature map to obtain a target local feature map, and splices the global feature map and the target local feature map based on texture and shape, respectively, to obtain a target texture image and a target shape image.
  • the server performs three-dimensional model reconstruction processing according to the target texture image and the target shape image, and obtains the target three-dimensional model.
  • the server can send the obtained three-dimensional model of the target to the terminal for display on the terminal.
  • the terminal 102 and the server 104 may also be used in cooperation to execute the three-dimensional model reconstruction method and the three-dimensional reconstruction model training method provided in the embodiments of the present application.
  • the server obtains training images from the terminal.
  • the server obtains the image feature coefficients and rendering coefficients of the training image, and the server inputs the image feature coefficients into the 3D reconstruction model based on deep learning, so that the 3D reconstruction model:
  • according to the image feature coefficients, obtains the texture- and shape-based global feature map and the initial local feature map of the training image respectively; performs edge smoothing on the initial local feature map to obtain the target local feature map; splices the global feature map and the target local feature map based on texture and shape, respectively, to obtain the target texture image and the target shape image; and performs 3D model reconstruction processing according to the target texture image and the target shape image to obtain the predicted 3D model.
  • the server performs image rendering processing on the predicted three-dimensional model according to the rendering coefficient, and obtains the predicted two-dimensional image.
  • the server trains the 3D reconstruction model according to the error between the training image and the predicted 2D image until the convergence condition is satisfied, and the trained 3D reconstruction model is obtained.
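  • As a purely illustrative sketch (not taken from the patent), the training procedure above can be expressed in a few lines of PyTorch-style code; the ReconstructionModel and render placeholders, the coefficient dimensions, and the L1 loss are assumptions made only so that the sketch runs end to end:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconstructionModel(nn.Module):
    """Placeholder 3D reconstruction model: maps image feature coefficients
    to a predicted texture image and shape image (both 3 x H x W)."""
    def __init__(self, coeff_dim=256, out_size=64):
        super().__init__()
        self.out_size = out_size
        self.fc = nn.Linear(coeff_dim, 6 * out_size * out_size)

    def forward(self, coeffs):
        x = self.fc(coeffs).view(-1, 6, self.out_size, self.out_size)
        texture, shape = x[:, :3], x[:, 3:]          # split into texture / shape maps
        return texture, shape

def render(texture, shape, rendering_coeffs):
    """Stand-in renderer: a real implementation would project the reconstructed
    3D model (shape + texture) to 2D using pose/illumination coefficients.
    Here the two maps are only blended so the sketch stays differentiable."""
    weight = torch.sigmoid(rendering_coeffs).view(-1, 1, 1, 1)
    return weight * texture + (1.0 - weight) * shape

model = ReconstructionModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(training_image, image_coeffs, rendering_coeffs):
    texture, shape = model(image_coeffs)                       # predicted texture/shape images
    predicted_2d = render(texture, shape, rendering_coeffs)    # predicted 2D image
    loss = F.l1_loss(predicted_2d, training_image)             # error vs. the training image
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# one toy iteration on random data; in practice the step is repeated
# over the training set until the convergence condition is satisfied
img = torch.rand(2, 3, 64, 64)
loss = training_step(img, torch.rand(2, 256), torch.rand(2, 1))
```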
  • the server may reconstruct the three-dimensional model based on the trained three-dimensional reconstruction model.
  • the server can also send the trained three-dimensional reconstruction model to the terminal, and the terminal can reconstruct the three-dimensional model through the trained three-dimensional reconstruction model.
  • In the above method, the texture- and shape-based global feature map and the initial local feature map of the input image are obtained according to the image feature coefficients; edge smoothing is performed on the initial local feature map to obtain the target local feature map, whose edge area is in a smooth state; the global feature map and the target local feature map are spliced based on texture and shape, respectively, to obtain a target texture image and a target shape image with smooth edges; the target 3D model reconstructed from the target texture image and the target shape image is therefore not prone to distortion.
  • a three-dimensional model reconstruction method is provided, and it is exemplified that the method is applied to a computer device, and the computer device may be the terminal 102 or the server 104 in the above-mentioned FIG. 1 .
  • the method includes the following steps:
  • the input image may be an image containing various types of objects, such as a face image, an animal image, a building image, and the like.
  • the input image may be composed of global features and local features. Taking a face image as an example, the global features may be rough features of the entire face, and the local features may be features of facial organs (for example, eyes, nose, mouth, ears, etc.).
  • the input image can be one image or multiple images containing different information. When there are multiple input images, these input images can be processed synchronously or asynchronously, and the corresponding target 3D models can be reconstructed for each of them.
  • the features corresponding to an image may include color features; texture features (visual features that reflect homogeneous phenomena in the image and describe the slowly or periodically changing surface structure, organization, and arrangement of object surfaces); shape features (which may include contour features, mainly concerning the outer boundary of an object, and region features, relating to the entire shape area); spatial relationship features (which may refer to the arrangement relationships between objects in the image); and so on.
  • the image feature coefficients may be coefficients that characterize the image features, and may be coefficients that describe the global, local, texture, shape, and other features of the image.
  • the image feature coefficients may be texture feature coefficients, shape feature coefficients, or the like.
  • the texture feature coefficient may refer to a coefficient describing the texture feature, specifically, it may be a coefficient describing the surface structure, organization, and arrangement attributes of an image;
  • the shape feature coefficient may refer to a coefficient describing the shape feature; specifically, it may be a coefficient describing image contours, image regions, and the like.
  • acquiring the image feature coefficients of the input image may be implemented by a deep learning-based network model.
  • the network model may be an autoencoder or the like.
  • deep learning is a new research direction in the field of machine learning (ML, Machine Learning), which is introduced into machine learning to make it closer to the original goal - artificial intelligence.
  • deep learning learns the inherent laws and representation levels of sample data, and the information obtained in this learning process is of great help for interpreting data such as images; its ultimate goal is to enable machines to analyze and learn like humans and to recognize data such as images.
  • the global feature map can be a feature map used to describe the global information of the input image.
  • the global feature map can represent the overall image information; its size can be the same as or smaller than that of the input image, and it has strong robustness.
  • the local feature map can be a feature map used to describe the local information of the input image, and can be a feature map corresponding to at least one local area. The size of the local feature map can be smaller than or equal to that of the global feature map; the local feature map concerns a smaller area and aims to capture more detail.
  • the global feature map may be a global feature map representing the overall face situation
  • the global feature map of the face contains fuzzy overall face information
  • the local feature map may be a key part feature map representing local areas such as the eyes, mouth, nose, ears, hair, and eyebrows;
  • this key part feature map contains clear local area details, such as eye position, contour, eyeball size, and pupil color.
  • the texture-based global feature map can be a global texture feature map, and the shape-based global feature map can be a global shape feature map;
  • the texture-based initial local feature map can be an initial local texture feature map, and the shape-based initial local feature map can be an initial local shape feature map.
  • the global texture feature map and the initial local texture feature map of the input image may be obtained according to the texture feature coefficients, and the global shape feature map and the initial local shape feature map of the input image may be obtained according to the shape feature coefficients.
  • edge smoothing is performed on the initial local feature map, and the image obtained after the edge smoothing process is used as the target local feature map.
  • edge smoothing processing may be performed on the initial local texture feature map and the initial local shape feature map respectively to obtain the target local texture feature map and the target local shape feature map, which are used as the target local feature map.
  • the edge smoothing process may refer to smoothing the edge region of the image.
  • the smoothing process may be a gradation process of the feature value, for example, the color channel value of the image is gradually reduced in a certain direction.
  • the embodiments of the present invention can implement the processing of global and local features through a dual-branch local detail enhancement (Global-local) model.
  • the local detail enhancement model receives the input image, one branch returns the global information (global feature map), and one branch returns the local information (local feature map).
  • the target texture image is a texture image obtained by integrating global texture features and local texture features
  • the target shape image is a shape image obtained by integrating global shape features and local shape features. Both the target texture image and the target shape image may be two-dimensional images.
  • the global feature map and the target local feature map are spliced to obtain a target image including the global feature and the local feature, and the target image includes the target texture image and the target shape image.
  • this step splices the global feature map and the target local feature map based on texture to obtain the target texture image, and splices the global feature map and the target local feature map based on the shape to obtain the target shape image.
  • edge smoothing may also be performed on the global feature map, and the global feature map after edge smoothing and the target local feature map may be spliced to obtain the corresponding target texture image and target shape image.
  • S205 Perform a three-dimensional model reconstruction process according to the target texture image and the target shape image to obtain a target three-dimensional model.
  • the three-dimensional model refers to a digital model file containing the three-dimensional space coordinates of each sampling point on the surface of the object.
  • the implementation process of S205 may be: reconstructing the spatial contour of the three-dimensional model according to the target shape image, and adding the target texture image to the surface of the spatial contour, that is, obtaining the target three-dimensional model carrying the texture information.
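  • One possible (hypothetical) concrete reading of this step: treat the target shape image as a position map whose pixels store 3D surface coordinates, treat the target texture image as the per-pixel color, and connect neighboring pixels into triangles. The grid layout and the reconstruct_mesh helper below are assumptions for illustration only, not the patent's prescribed data format:

```python
import numpy as np

def reconstruct_mesh(shape_image, texture_image):
    """Build a simple triangle mesh from a shape image (H x W x 3 of xyz
    coordinates) and attach per-vertex colors taken from the texture image
    (H x W x 3 of RGB values)."""
    h, w, _ = shape_image.shape
    vertices = shape_image.reshape(-1, 3)            # one vertex per pixel
    colors = texture_image.reshape(-1, 3)            # color of each vertex

    faces = []
    for y in range(h - 1):
        for x in range(w - 1):
            i = y * w + x                             # index of the top-left pixel of a quad
            faces.append([i, i + 1, i + w])           # upper-left triangle
            faces.append([i + 1, i + w + 1, i + w])   # lower-right triangle
    return vertices, colors, np.asarray(faces, dtype=np.int64)

# toy 4x4 maps: a flat grid with random height and random texture
ys, xs = np.mgrid[0:4, 0:4].astype(np.float32)
shape_img = np.stack([xs, ys, np.random.rand(4, 4).astype(np.float32)], axis=-1)
texture_img = np.random.rand(4, 4, 3).astype(np.float32)
verts, cols, faces = reconstruct_mesh(shape_img, texture_img)
print(verts.shape, cols.shape, faces.shape)   # (16, 3) (16, 3) (18, 3)
```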
  • the size of the local feature map is often smaller than the size of the global feature map.
  • To splice the global feature map and the local feature map, it is necessary to make the two the same size and then fuse the pixels at the same positions to obtain a fused image with global and local features. Based on this, feature value padding must be performed on the local feature map so that its size is consistent with that of the global feature map, for example by filling the local feature map with 0 (that is, setting the feature values of the pixels in the outer region of the local feature map to 0) so that it reaches the same size as the global feature map, and then splicing at the same size. This leads to a problem: the edge of the local feature map is abrupt after splicing.
  • before the convolution kernel moves to the position corresponding to the local feature map, because of the zero filling, the convolution receives only the information of the global feature map.
  • when the convolution kernel moves to the position corresponding to the local feature map, it simultaneously receives the information of the global feature map and the local feature map.
  • the information received by the convolution kernel thus changes from global information only to global information plus local information; this sudden increase in information causes the convolution output at this position to change abruptly relative to the previous positions, which causes discontinuities in the final generated result.
  • FIG. 3 is a texture image obtained by splicing the global texture feature map and the local texture feature map in the traditional technology, wherein the image on the right is an enlargement of the eye edge region 301 in the texture feature map on the left. It can be seen that significant discontinuities are generated at the edge positions of the texture image; extended to the whole face, this discontinuity also causes overall distortion of the texture image.
  • Figure 4 is a schematic diagram of discontinuous edges of the texture image in one embodiment, wherein Figure 4(a) is a normal texture and Figure 4(b) is the distorted texture; it is evident that the edges of the mouth region in Figure 4(b) are distorted.
  • FIG. 5 is a schematic diagram of discontinuous edges of the shape image in one embodiment, wherein Figure 5(a) is a normal shape and Figure 5(b) is a distorted shape, where 501 denotes the mouth area; it can clearly be seen that the edge of the mouth area in Figure 5(b) has collapsed.
  • this discontinuity also produces obvious block artifacts during model reconstruction, so that the final reconstructed target 3D model has abrupt discontinuities, for example obvious color jumps at the corners of the eyes.
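  • The jump can be illustrated with a toy 1-D example (an assumption-based illustration, not from the patent): convolving a row in which a local patch is pasted with a hard zero-padded edge produces a much larger step in the output than convolving the same row after the patch edge has been attenuated by a distance ramp:

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Plain 1-D convolution (valid mode) standing in for a conv layer."""
    n = len(kernel)
    return np.array([np.dot(signal[i:i + n], kernel) for i in range(len(signal) - n + 1)])

global_row = np.full(32, 0.2)                 # one row of the global feature map
local_patch = np.full(8, 1.0)                 # one row of a zero-padded local map

hard = global_row.copy()
hard[12:20] += local_patch                    # paste with a hard edge

ramp = np.minimum(np.arange(1, 9), np.arange(8, 0, -1)) / 4.0
ramp = np.clip(ramp, 0.0, 1.0)                # distance-to-boundary weights in [0, 1]
smooth = global_row.copy()
smooth[12:20] += local_patch * ramp           # paste after edge smoothing

kernel = np.ones(3) / 3.0
jump_hard = np.max(np.abs(np.diff(conv1d_valid(hard, kernel))))
jump_smooth = np.max(np.abs(np.diff(conv1d_valid(smooth, kernel))))
print(jump_hard, jump_smooth)                 # the hard-edged paste shows the larger jump
```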
  • the above three-dimensional model reconstruction method can be implemented based on a three-dimensional reconstruction model (which may be a neural network based on deep learning, and the three-dimensional reconstruction model can reconstruct a three-dimensional model based on an input image).
  • the computer equipment inputs the image of the three-dimensional model to be reconstructed into the trained three-dimensional reconstruction model, and the three-dimensional reconstruction model extracts image feature coefficients of the input image.
  • the computer device inputs the image feature coefficients corresponding to the images of the three-dimensional model to be reconstructed into the trained three-dimensional reconstruction model.
  • after obtaining the image feature coefficients, the 3D reconstruction model generates the texture- and shape-based global feature map and the initial local feature map of the input image according to the image feature coefficients, performs edge smoothing on the initial local feature map to obtain the target local feature map, splices the global feature map and the target local feature map based on texture and shape to obtain the target texture image and the target shape image, performs 3D model reconstruction processing according to the target texture image and the target shape image to obtain the target 3D model corresponding to the input image, and outputs the target 3D model.
  • In the above three-dimensional model reconstruction method, the texture- and shape-based global feature map and initial local feature map of the input image are respectively obtained according to the image feature coefficients; the initial local feature map is edge-smoothed to obtain the target local feature map, whose edge area is in a smooth state; the global feature map and the target local feature map are spliced based on texture and shape to obtain a target texture image and a target shape image with smooth edges; and the target 3D model reconstructed from the target texture image and the target shape image therefore effectively reduces the discontinuity of texture images and shape images and suppresses distortion of the reconstructed target 3D model.
  • performing edge smoothing processing on the initial local feature map to obtain a target local feature map includes: acquiring the boundary of the initial local feature map; acquiring the distance between each pixel in the initial local feature map and the boundary; and performing edge smoothing on the initial local feature map according to the distance to obtain the target local feature map.
  • the boundary of the initial local feature map may refer to the outermost boundary line of the initial local feature map. Since the initial local feature map can have boundaries in different directions, the number of boundaries may be more than one (for example, for box boundaries, there are four boundaries). Based on this, the distance between each pixel point and the boundary in the initial local feature map may refer to the distance between each pixel point and the nearest boundary.
  • edge smoothing is performed on the initial local feature map according to the distance between each pixel point and the boundary.
  • the smoothing process can adjust the feature value of each pixel (the feature value can be a color value such as the RGB value of the pixel, or a brightness value, etc.) according to a certain gradient, for example setting the color to become progressively darker from the center of the image toward the pixels at the image boundary.
  • different distances can be smoothed with different gradients. For example, in a certain direction, a greater degree of feature value adjustment is performed on distant pixels (pixels farther from the boundary), and a lesser degree of feature value adjustment is performed on nearby pixels (pixels closer to the boundary).
  • a pre-built edge smoothing processing model can be used to perform feature learning on the initial local feature map, determine the boundary of the initial local feature map, determine the distance between each pixel in the initial local feature map and the boundary, and then According to the distance, the initial local feature map is edge-smoothed to obtain the target local feature map, so as to solve the discontinuity caused by the convolution stage while ensuring the accuracy.
  • the distance between each pixel in the initial local feature map and the boundary is determined, and edge smoothing is performed according to the distance, so that pixels at different distances have different feature values and the edge transitions smoothly; the convolution kernel therefore has a smooth transition when moving from the position corresponding to the global feature map to the position corresponding to the local feature map, which prevents the local feature map from being discontinuous after splicing.
  • performing edge smoothing processing on the initial local feature map according to the distance to obtain the target local feature map includes: acquiring an edge region of the initial local feature map according to the distance; determining, according to the distance, the feature weight value corresponding to each pixel of the edge region, so that the feature weight value corresponding to a distant pixel is greater than the feature weight value corresponding to a nearby pixel; generating a gradient smoothing feature map according to the feature weight value corresponding to each pixel of the initial local feature map, where the feature weight value corresponding to pixels outside the edge region of the initial local feature map is a preset weight value and the feature value of each pixel in the gradient smoothing feature map is obtained from the corresponding feature weight value; and multiplying the feature value of each pixel in the gradient smoothing feature map by the feature value of the corresponding pixel in the initial local feature map, and obtaining the target local feature map according to the multiplication result.
  • the corresponding gradient smoothing feature map can be determined for each part. Taking initial local feature maps including a left-eye local feature map, a right-eye local feature map, a nose local feature map, and a mouth local feature map as an example, the gradient smoothing feature maps for the left eye, the right eye, the nose, and the mouth can each be determined in a targeted manner.
  • the gradient smoothing feature map f_i can be expressed as a function of each pixel's distance to the nearest boundary, where: the size of f_i is the same as the size of the corresponding initial local feature map L_i (including L_nose, L_mouth, L_left eye, L_right eye); h represents the vertical-axis coordinate of a pixel in the gradient smoothing feature map relative to the reference point (which can be the point at the lower-left corner of the feature map); w represents the horizontal-axis coordinate relative to the reference point; (h, w) represents a pixel in the gradient smoothing feature map; k is the distance between the point (h, w) and the nearest boundary; δ_i represents the evaluation coefficient of the gradient smoothing feature map, used to characterize the width of the edge region, and its size can be determined according to the size of the corresponding initial local feature map; h_i represents the height of the initial local feature map; w_i represents its width; pixels satisfying δ_i ≤ h ≤ h_i - δ_i (and the corresponding condition on w) lie outside the edge region.
  • the feature weight value is determined in a stepwise manner, and the gradient smoothing feature map obtained according to the feature weight values is shown in Figure 6, wherein the outer solid-line box represents the boundary 601, and the area between the outer solid-line box and the inner solid-line box indicates the edge region, which includes three gradient regions 602, 603, and 604; the gray value of the pixels in regions 602, 603, and 604 gradually changes from low to high, forming a gradient smoothing feature map with a gradual edge transition.
  • the weight value can be set in the range of [0, 1].
  • the feature weight value of the farthest pixel point farthest from the boundary in the edge region is set as 0.9
  • the feature weight value of the closest distance pixel to the boundary is set to 0.1.
  • the region outside the edge region in the initial local feature map can be called a non-edge region
  • the feature weight value of the non-edge region can be 1, that is, the feature value adjustment is not performed on the non-edge region.
  • generating the gradient smoothing feature map according to the feature weight values may be done as follows: construct a blank feature map with the same size as the initial local feature map, in which the initial feature value of each pixel is 0, and use the calculated feature weight value as the new feature value of the corresponding pixel; after updating the feature value of each pixel, the gradient smoothing feature map is obtained.
  • the gradient smoothing feature map is multiplied by the feature value of the pixel at the corresponding position in the initial local feature map.
  • the feature values of each pixel in the gradient smoothing feature map of the left eye can be multiplied by the feature values of the corresponding pixels in the local feature map of the left eye, and the left-eye target local feature map is obtained according to the multiplication result.
  • taking the gray value as the feature value, suppose pixel a1 in the gradient smoothing feature map corresponds to pixel a2 in the initial local feature map; if the feature value of a1 is 0.5 and the feature value of a2 is 200, multiplying 0.5 by 200 gives a new gray value of 100, which is the feature value of the corresponding pixel in the target local feature map.
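  • A minimal sketch of this construction, assuming a linear ramp over an edge band of width delta (the stepwise weights of FIG. 6 could be obtained by quantizing the same distance); the function names are illustrative only:

```python
import numpy as np

def gradient_smoothing_map(height, width, delta):
    """Weight map: pixels within `delta` of the nearest boundary get a weight
    proportional to that distance; interior pixels keep a weight of 1."""
    ys, xs = np.mgrid[0:height, 0:width]
    k = np.minimum.reduce([ys, xs, height - 1 - ys, width - 1 - xs])  # distance to nearest boundary
    return np.clip((k + 1) / float(delta + 1), 0.0, 1.0)

def smooth_local_feature_map(local_map, delta=3):
    """Edge smoothing: multiply the gradient smoothing map with the initial
    local feature map pixel by pixel."""
    f = gradient_smoothing_map(local_map.shape[0], local_map.shape[1], delta)
    return local_map * f[..., None] if local_map.ndim == 3 else local_map * f

# toy example: a 16x16 single-channel local feature map with gray value 200
initial_local = np.full((16, 16), 200.0)
target_local = smooth_local_feature_map(initial_local)
print(target_local[0, 0], target_local[8, 8])   # 50.0 at the corner, 200.0 in the interior
```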
  • Fig. 7 shows a schematic diagram of the effect of edge smoothing processing, wherein Fig. 7(a) represents the initial local feature map, and Fig. 7(b) represents the gradient smoothing feature map.
  • the edges of the initial local feature map are more obvious and have strong jumps. If it is directly spliced with the global feature map, it is prone to the problem of discontinuous edges.
  • the edge area in Fig. 7(b) has been processed with edge smoothing, and the edge transitions smoothly, showing the effect of strong center and weak edge, and the effect intensity gradually decreases from the center to the edge.
  • after the initial local feature map is multiplied by the gradient smoothing feature map, the target local feature map shown in Figure 7(c) is obtained. It can be seen that the edge of the target local feature map in Figure 7(c) transitions smoothly, and after splicing it with the global feature map, a target image with continuous edges can be obtained.
  • In this embodiment, a corresponding feature weight value is set for each pixel in the edge area according to its distance, a gradient smoothing feature map is generated accordingly, and the feature values of the corresponding pixels in the initial local feature map are then adjusted through the gradient smoothing feature map to obtain the target local feature map. In this way, edge smoothing of the local feature map is realized and a target local feature map with smooth edges can be obtained quickly, which resolves the discontinuous edges produced when the local feature map and the global feature map are spliced and convolved, and avoids distortion of the final target image.
  • the input image is a face image;
  • the target local feature map includes key part feature maps corresponding to key parts of the face;
  • the global feature map and the The target local feature maps are spliced to obtain the target texture image and the target shape image, including: respectively filling the outer regions of each key part feature map with feature values, so as to obtain a filled key part feature map with the same size as the global feature map;
  • merging the filled key part feature maps to obtain a facial part feature map; and splicing the global feature map and the facial part feature map based on texture and shape, respectively, to obtain the target texture image and the target shape image.
  • the face may refer to a human face, an animal face, or the like.
  • the key parts of the face may be eyes (including left and right eyes), nose, mouth, ears, and the like.
  • FIG. 8 is a schematic diagram of the effect of filling and merging feature values of key part feature maps in one embodiment. After the key part feature maps corresponding to the nose, mouth, left eye, and right eye are respectively filled with feature values (for example, filled with 0), the resulting filled key part feature maps (nose, mouth, left eye, and right eye) are shown as (a), (b), (c), and (d) in Figure 8, respectively, where 801, 802, 803, and 804 represent the areas where the nose, mouth, left eye, and right eye are located.
  • these filled key part feature maps are merged, integrating the areas 801, 802, 803, and 804 where the nose, mouth, left eye, and right eye are located, to obtain the facial organ feature map shown in Figure 8(e).
  • In this embodiment, filled key part feature maps with the same size as the global feature map are obtained; the global feature map and the filled key part feature maps of the same size can then be easily superimposed together and convolved by the convolution kernels to obtain the facial organ feature map.
  • In this way, the global feature map and the facial organ feature map can be effectively spliced to generate target texture images and target shape images with continuous edges.
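  • A possible sketch of the filling-and-merging step, assuming each key part feature map is pasted at a hypothetical location inside a zero canvas of the global feature map's size and the padded maps are merged by an element-wise maximum (the patent does not prescribe a particular merge operator or these positions):

```python
import numpy as np

def pad_to_global(part_map, top_left, global_size):
    """Zero-fill the outer region: place `part_map` (h x w x C) at `top_left`
    inside a zero canvas whose spatial size equals the global feature map."""
    gh, gw = global_size
    canvas = np.zeros((gh, gw, part_map.shape[2]), dtype=part_map.dtype)
    y, x = top_left
    canvas[y:y + part_map.shape[0], x:x + part_map.shape[1]] = part_map
    return canvas

def merge_parts(padded_maps):
    """Merge the filled key-part feature maps into one facial organ feature map."""
    return np.maximum.reduce(padded_maps)

global_size = (64, 64)
parts = {                         # hypothetical positions of the key parts
    "left_eye":  (np.random.rand(10, 14, 3), (18, 12)),
    "right_eye": (np.random.rand(10, 14, 3), (18, 38)),
    "nose":      (np.random.rand(16, 10, 3), (26, 27)),
    "mouth":     (np.random.rand(10, 20, 3), (44, 22)),
}
padded = [pad_to_global(m, pos, global_size) for m, pos in parts.values()]
facial_organ_map = merge_parts(padded)
print(facial_organ_map.shape)     # (64, 64, 3)
```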
  • the input image is a face image
  • the target local feature map includes a left-eye feature map, a right-eye feature map, a nose feature map, and a mouth feature map
  • splicing the global feature map and the target local feature map to obtain a target texture image and a target shape image includes: filling the outer areas of the left-eye feature map, the right-eye feature map, the nose feature map, and the mouth feature map with feature values to obtain a target left-eye feature map, a target right-eye feature map, a target nose feature map, and a target mouth feature map with the same size as the global feature map; combining the target left-eye feature map, the target right-eye feature map, the target nose feature map, and the target mouth feature map to obtain a facial organ feature map; and splicing the global feature map and the facial organ feature map based on texture and shape, respectively, to obtain the target texture image and the target shape image.
  • In this embodiment, the feature maps corresponding to the left eye, right eye, nose, and mouth in the face image are obtained and merged into the facial organ feature map, and the facial organ feature map is spliced with the global feature map to obtain accurate and reliable target texture images and target shape images.
  • edge smoothing processing may be performed on the facial organ feature map to obtain a target facial organ feature map, and the global feature map and the target facial organ feature map are spliced based on texture and shape to obtain the target texture image and the target shape image.
  • FIG. 9 is a schematic diagram of the process of obtaining the target facial organ feature map in one embodiment. The edge smoothing of the facial organ feature map may be implemented as follows: obtain the organ gradient smoothing feature map corresponding to the facial organ feature map [as shown in Figure 9(a)] and the facial organ feature map [as shown in Figure 9(b)], and multiply the feature values of the corresponding pixels in the organ gradient smoothing feature map and the facial organ feature map to obtain the target facial organ feature map shown in Figure 9(c).
  • In this way, a target facial organ feature map with gradually transitioning edges can be obtained, which effectively weakens the sense of separation at the edges of the target texture image and the target shape image, so that the edges of the final target three-dimensional model are continuous, effectively reducing the probability of 3D model distortion.
  • the global feature map includes a global texture feature map and a global shape feature map
  • the target local feature map includes a local texture feature map and a local shape feature map
  • the The global feature map and the target local feature map are spliced to obtain a target texture image and a target shape image, including: splicing the global feature map and the target local feature map based on texture to obtain a target texture image;
  • and splicing the global feature map and the target local feature map based on shape to obtain the target shape image.
  • splicing the global feature map and the target local feature map based on texture to obtain a target texture image includes: splicing the global texture feature map and the local texture feature map, and splicing the obtained feature The image is convolved to integrate global texture features and local texture features to obtain the target texture image.
  • the target shape image is obtained by splicing the global feature map and the target local feature map based on the shape, including: splicing the global shape feature map and the local shape feature map, and convolving the spliced feature map to The global shape feature and the local shape feature are integrated to obtain the target shape image.
  • In this embodiment, the global feature map and the local feature map are spliced, and the spliced feature map is convolved to obtain a target image fused with global features and local features; the target 3D model is then reconstructed according to the target image.
  • the 3D model of the target integrates various aspects of information, which can more comprehensively represent the image information, correspond to the input image as much as possible, and achieve reliable reconstruction of the 3D model.
  • the global feature map and the local feature map may be composed not of a single layer but of multiple layers of feature maps.
  • the global feature map and the local feature map need to be spliced and then a convolution module is used to integrate the global and local information together.
  • both the global feature map and the target local feature map correspond to at least one feature channel; splicing the global feature map and the target local feature map based on texture and shape, respectively, to obtain the target texture image and the target shape image includes: splicing the global feature map and the target local feature map in each of the feature channels based on texture and shape, respectively, and convolving the feature maps obtained by splicing the feature channels to integrate the global features and local features, thereby obtaining the target texture image and the target shape image.
  • the feature map processing flow of multiple feature channels (both texture-based and shape-based processing flow can be implemented through the processing flow of this figure, that is, the processing flow of texture and shape can be the same) as shown in Figure 10.
  • the global feature maps and initial local feature maps (initial local feature maps corresponding to the left eye, right eye, nose, and mouth) on the three feature channels are obtained respectively, and the initial local feature maps on these feature channels are each edge-smoothed to obtain the target local feature maps on the three feature channels; the global feature map and target local feature map on each feature channel are spliced to obtain the spliced feature maps 1001 on the three feature channels; after the spliced feature maps 1001 are stacked together in order, they are input into the convolution module for convolution to obtain the target image.
  • In this embodiment, the global feature map and the local feature map are spliced per feature channel and convolved, which fully fuses the global features and local features on multiple feature channels, so that the obtained target image reflects the features of the input image more comprehensively and a more accurate target 3D model is finally obtained.
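  • A minimal PyTorch-style sketch of the channel-wise splicing and convolution; the channel counts, kernel sizes, and two-layer fusion module are assumptions, and only the concatenate-then-convolve structure reflects the description above:

```python
import torch
import torch.nn as nn

class GlobalLocalFusion(nn.Module):
    """Concatenate the global feature map and the target local feature map along
    the channel dimension and fuse them with a small convolution module."""
    def __init__(self, channels=3, out_channels=3):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, global_map, target_local_map):
        spliced = torch.cat([global_map, target_local_map], dim=1)  # channel-wise splice
        return self.fuse(spliced)                                   # integrated target image

fusion = GlobalLocalFusion()
global_map = torch.rand(1, 3, 64, 64)        # 3 feature channels
target_local_map = torch.rand(1, 3, 64, 64)  # already edge-smoothed and zero-padded
target_image = fusion(global_map, target_local_map)
print(target_image.shape)                     # torch.Size([1, 3, 64, 64])
```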
  • the global feature map and the local feature map may be concatenated and convolved based on texture and shape, respectively, to obtain corresponding target texture images and target shape images.
  • splicing the global texture feature map and the local texture feature map, and convolving the spliced feature map to integrate the global texture features and the local texture features to obtain the target texture image includes: splicing the global texture feature map and the local texture feature map in each of the feature channels, and convolving the feature maps obtained by splicing the feature channels to integrate the global texture features and the local texture features, thereby obtaining the target texture image.
  • the global texture feature map and the local texture feature map are spliced along the channel dimension, and the spliced feature map is convolved, which fully integrates the global texture features and local texture features on multiple feature channels and yields a comprehensive and accurate target texture image.
  • both the global feature map and the target local feature map correspond to at least one feature channel
  • splicing the global shape feature map and the local shape feature map, and convolving the spliced feature map to integrate the global shape features and the local shape features to obtain the target shape image includes: splicing the global shape feature map and the local shape feature map in each of the feature channels, and convolving the feature maps obtained by splicing the feature channels to integrate the global shape features and the local shape features, thereby obtaining the target shape image.
  • the global shape feature map and the local shape feature map are spliced along the channel dimension, and the spliced feature map is convolved, which fully integrates the global shape features and local shape features on multiple feature channels and yields a comprehensive and accurate target shape image.
  • the image feature coefficients may be obtained by an auto-encoder.
  • an autoencoder is a neural network designed to copy the input to the output.
  • the autoencoder may be a convolutional autoencoder.
  • Convolutional autoencoders use convolutional layers instead of fully connected layers; they downsample the input features to provide a lower-dimensional latent representation, forcing the autoencoder to learn a compressed version of the input features, which can be used as the coefficients characterizing the image features.
  • acquiring the image feature coefficients of the input image includes: performing layer-by-layer convolution processing on the input image by using a convolutional autoencoder, and obtaining texture feature coefficients and shape feature coefficients as the image feature coefficients according to the result of the layer-by-layer convolution processing.
  • the structure of the convolutional self-encoder can be shown in Figure 11.
  • the convolutional autoencoder is composed of multiple convolutional layers; these convolutional layers perform layer-by-layer convolution processing on the input image, down-sampling and analyzing the input features layer by layer, and the texture feature coefficients f_a and shape feature coefficients f_s are then obtained from the results of the layer-by-layer convolution processing as the image feature coefficients.
  • the convolutional autoencoder includes a decoder; performing layer-by-layer convolution processing on the input image by using the convolutional autoencoder and obtaining the image feature coefficients according to the result of the layer-by-layer convolution processing includes: performing layer-by-layer convolution processing on the input image through the convolutional autoencoder, and obtaining, by the decoder, the texture feature coefficients and shape feature coefficients as the image feature coefficients.
  • the convolutional autoencoder outputs the result of each convolutional layer through the decoder.
  • the output size shown on the right side of Figure 11 is the size of the output result of the decoder.
  • the 53rd convolutional layer produces a two-dimensional output, that is, the 7 × 7 × (f_s + f_a + 64) output on the right; average pooling (AvgPool) is then performed, after which the values of f_s and f_a, that is, the shape feature coefficients and the texture feature coefficients, are obtained.
  • the above embodiment obtains the image feature coefficients by means of the autoencoder and can fully mine the image features of the input image through layer-by-layer convolution, thereby obtaining accurate image feature coefficients and, in turn, accurate global feature maps and local feature maps.
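  • A rough sketch of the coefficient-extraction stage under stated assumptions: the network of FIG. 11 has 53 convolutional layers ending in a 7 × 7 × (f_s + f_a + 64) output, whereas the stand-in below uses a much smaller stack of strided convolutions followed by average pooling, with placeholder coefficient dimensions:

```python
import torch
import torch.nn as nn

class CoefficientEncoder(nn.Module):
    """Layer-by-layer convolution that down-samples the input image and produces
    shape coefficients f_s and texture coefficients f_a via average pooling."""
    def __init__(self, dim_fs=80, dim_fa=80):
        super().__init__()
        self.dim_fs, self.dim_fa = dim_fs, dim_fa
        self.conv = nn.Sequential(                       # stand-in for the 53-layer encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, dim_fs + dim_fa + 64, 3, stride=2, padding=1),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)              # average pooling over the 2D output

    def forward(self, image):
        feat = self.pool(self.conv(image)).flatten(1)    # (N, dim_fs + dim_fa + 64)
        f_s = feat[:, :self.dim_fs]                      # shape feature coefficients
        f_a = feat[:, self.dim_fs:self.dim_fs + self.dim_fa]  # texture feature coefficients
        return f_s, f_a

encoder = CoefficientEncoder()
f_s, f_a = encoder(torch.rand(1, 3, 224, 224))
print(f_s.shape, f_a.shape)   # torch.Size([1, 80]) torch.Size([1, 80])
```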
  • feature analysis and decoding of the input image may be performed by a local detail enhancement module, which may include a global decoder and a local decoder, through which a global feature map and an initial local feature map can be obtained.
  • acquiring the texture- and shape-based global feature map and the initial local feature map of the input image respectively according to the image feature coefficients includes: performing, by the deconvolution layer in the global decoder, feature decoding on the input image according to the image feature coefficients to obtain the global feature map; and performing, by the deconvolution layer in the local decoder, feature decoding on the input image according to the image feature coefficients to obtain the initial local feature map.
  • both the global decoder and the local decoder may be composed of at least one deconvolution layer, and the size of the convolution kernel of each deconvolution layer may be the same or different.
  • the size of the global feature map of each feature channel can be the same, and the sizes of the initial local feature maps of different parts can be the same or different.
  • the global decoder decodes the input image to obtain the global feature map
  • the local decoder decodes the input image to obtain the initial local feature map; that is, the global decoder and the local decoder obtain the global features and the local features of the input image through two separate branches
  • the global features and local features are then integrated to obtain the target image, so that the target image restores the information of the input image as much as possible and a reliable image reconstruction effect can be achieved based on the target image; a minimal sketch of the two decoding branches is given below.
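  • a minimal sketch of the two decoding branches follows; the layer counts, channel sizes and the seed-map reshaping are assumptions for illustration (the patent's global decoder uses 13 deconvolution layers and each local decoding module uses 10):

```python
import torch
import torch.nn as nn

def deconv_stack(n_layers, in_ch, out_ch):
    """A stack of transposed-convolution ('deconvolution') layers with 3x3 kernels."""
    layers, ch = [], in_ch
    for i in range(n_layers):
        nxt = out_ch if i == n_layers - 1 else max(out_ch, ch // 2)
        layers += [nn.ConvTranspose2d(ch, nxt, 3, stride=2, padding=1, output_padding=1),
                   nn.ReLU()]
        ch = nxt
    return nn.Sequential(*layers)

class GlobalLocalDecoder(nn.Module):
    """Two decoding branches: a global decoder that recovers a global feature map of
    the whole image, and a local decoder that recovers an initial local feature map."""
    def __init__(self, coeff_dim=320, feat_ch=64):
        super().__init__()
        self.to_seed = nn.Linear(coeff_dim, 256 * 3 * 4)   # reshape coefficients into a seed map
        self.global_decoder = deconv_stack(6, 256, feat_ch)
        self.local_decoder = deconv_stack(4, 256, feat_ch)

    def forward(self, coeffs):
        seed = self.to_seed(coeffs).view(-1, 256, 3, 4)
        return self.global_decoder(seed), self.local_decoder(seed)

g_map, l_map = GlobalLocalDecoder()(torch.randn(1, 320))
```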
  • the local decoder includes a facial key part decoder; performing, by the deconvolution layer in the local decoder, feature decoding on the input image according to the image feature coefficients to obtain the initial local feature map includes: performing, by the deconvolution layer in the facial key part decoder, feature decoding on the input image according to the image feature coefficients, and determining the key part feature map obtained by decoding as the initial local feature map.
  • each key part of the face may correspond to at least one decoder, for example, including a left-eye decoder, a right-eye decoder, a nose decoder, and a mouth decoder.
  • These facial key part decoders respectively perform feature decoding on the input image and obtain the corresponding initial local feature maps.
  • these facial key part decoders may be pre-trained with corresponding images of eyes, noses and mouths.
  • feature decoding is thus carried out in a targeted manner by the facial key part decoders, so that initial local feature maps with clear local features are obtained.
  • the facial key part decoder includes a left-eye decoder, a right-eye decoder, a nose decoder and a mouth decoder; performing, by the deconvolution layer in the facial key part decoder, feature decoding on the input image according to the image feature coefficients and determining the key part feature map obtained by decoding as the initial local feature map includes: performing, by the deconvolution layer in the left-eye decoder, feature decoding on the input image according to the image feature coefficients to obtain a left-eye feature map; performing, by the deconvolution layer in the right-eye decoder, feature decoding on the input image according to the image feature coefficients to obtain a right-eye feature map; performing, by the deconvolution layer in the nose decoder, feature decoding on the input image according to the image feature coefficients to obtain a nose feature map; performing, by the deconvolution layer in the mouth decoder, feature decoding on the input image according to the image feature coefficients to obtain a mouth feature map; and taking the left-eye feature map, the right-eye feature map, the nose feature map and the mouth feature map as the initial local feature maps; a minimal sketch of such per-part decoders is given below.
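  • a minimal sketch of such per-part decoders, reusing the deconv_stack helper from the previous sketch (the part names and layer count are assumptions):

```python
import torch.nn as nn

class FacialKeyPartDecoder(nn.Module):
    """One small deconvolution branch per facial key part; each branch decodes the
    coefficient seed map into the initial local feature map of that part."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.parts = nn.ModuleDict({
            part: deconv_stack(3, 256, feat_ch)
            for part in ("left_eye", "right_eye", "nose", "mouth")
        })

    def forward(self, seed):
        # Returns one initial local feature map per key part (L_lefteye, L_righteye, L_nose, L_mouth).
        return {name: branch(seed) for name, branch in self.parts.items()}
```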
  • obtaining the texture- and shape-based global feature map and the initial local feature map of the input image respectively according to the image feature coefficients includes: performing, by the deconvolution layer in the global decoder, feature decoding on the input image according to the texture feature coefficients and shape feature coefficients to obtain the global texture feature map and the global shape feature map; and performing, by the deconvolution layer in the local decoder, feature decoding on the input image according to the texture feature coefficients and shape feature coefficients to obtain the initial texture local feature map and the initial shape local feature map.
  • edge smoothing may be performed on the initial texture local feature map and the initial shape local feature map to obtain a local texture feature map and a local shape feature map, and the obtained local texture feature map and local shape feature map are spliced with the corresponding global feature maps to obtain the corresponding target texture image and target shape image.
  • the global decoder D g consists of 13 deconvolution layers, the size of the convolution kernel of each deconvolution layer is 3*3, and the output of the global decoder can be a c'*h'*w' feature map G, where c' is the number of feature channels and h' and w' are the height and width of the global feature map respectively; h'*w' can be 192*224
  • with the texture coefficient f a and the shape coefficient f s used as the input of the global decoder, the global texture feature map T G and the global shape feature map S G are obtained.
  • the local decoder D l consists of 4 local decoding modules that decode the nose, mouth, left-eye and right-eye regions. Each local decoding module contains 10 deconvolution layers. The corresponding outputs are L nose , L mouth , L lefteye , L righteye , and their corresponding output sizes are c′*h nose *w nose , c′*h mouth *w mouth , c′*h lefteye *w lefteye , c′*h righteye *w righteye .
  • h nose , h mouth , h lefteye and h righteye represent the heights of the nose, mouth, left-eye and right-eye feature maps, respectively
  • w nose , w mouth , w lefteye and w righteye represent the widths of the nose, mouth, left-eye and right-eye feature maps, respectively.
  • the outputs of the global decoder and the local decoder are as described above.
  • the global feature map is obtained by decoding with the global decoder, the feature maps of the key parts of the face are obtained by decoding with the facial key part decoders, and the feature maps are then spliced and convolved to obtain the target image.
  • in this way, in addition to the global feature map reflecting the overall facial features of the input image, feature maps of key facial parts reflecting local information can also be obtained in a targeted manner, so that the obtained target image is sufficiently comprehensive and accurate.
  • the target three-dimensional model generated by the embodiment of the present invention can be used as paired training data for face fusion (face-changing application) research.
  • a result image is obtained, and the 2D feature points of the face in the result image form the so-called paired training data.
  • each frame of the template image and the user's face can be modeled separately, and the posture and expression of each frame of the template image can then be transferred to the user's face, so that a 3D model corresponding to each face-swapped frame is obtained.
  • the paired training data is obtained from the 2D images corresponding to these 3D models, so that the face-swapping model can subsequently be trained.
  • the face-changing model may be a deep-learning-based neural network model that can replace some features of one face with features of another face. For example, two face images A and B are input into the face-changing model; the face-changing model obtains global features from A and local features from B, and then reconstructs a 3D model according to these features to obtain a face model C containing the global features of A and the local features of B. Because the local features contain more detailed information about the facial organs and represent the facial appearance more clearly, it can be understood that face B is at this point swapped onto face A.
  • the input image is a face image
  • the method further includes: obtaining the target three-dimensional model and a template three-dimensional model, and obtaining a face-changing three-dimensional model from them through the face-changing model.
  • the face-changing three-dimensional model includes the global features in the target three-dimensional model and the local features in the template three-dimensional model.
  • the three-dimensional template model can be obtained according to the template face image. Specifically, the template face image is input into the three-dimensional reconstruction model, and the three-dimensional model output by the three-dimensional reconstruction model is the template three-dimensional model. Since the template 3D model is generated by the 3D reconstruction model, in this way, the 3D reconstruction model can directly and quickly output the local feature points of the template 3D model.
  • a preconfigured three-dimensional template model may also be obtained, and feature points of the three-dimensional template model may be extracted to obtain local feature points of the three-dimensional template model.
  • local feature points are obtained from the template three-dimensional model
  • global feature points are obtained from the target three-dimensional model
  • these two feature points are input into the face-changing model in pairs
  • the face-changing three-dimensional model is output by the face-changing model, thereby accurately realizing the face change; a minimal sketch of this flow is given below.
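  • purely as an illustrative sketch of the flow just described (all three callables and their methods are assumptions, not an API defined in this disclosure):

```python
def swap_face(target_model, template_model, face_changing_model):
    """Global feature points come from the target 3D model, local feature points from
    the template 3D model, and the pair is fed to the face-changing model, which
    outputs the face-changing 3D model."""
    global_points = target_model.global_feature_points()
    local_points = template_model.local_feature_points()
    return face_changing_model(global_points, local_points)
```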
  • image rendering processing may be performed on the face-swapped three-dimensional model to obtain a face-swapped image.
  • Fig. 12 is a schematic diagram of the process of obtaining a face-changed image in one embodiment. As shown in Fig. 12, after the integration and rendering of the input image and the template face image, the face-changed image on the right is obtained. It has the posture and outline of the input image, and also has the facial features and expressions of the template face image, realizing the effect of "changing face”.
  • the acquiring a template three-dimensional model includes: acquiring a preset template face image; and acquiring a texture- and shape-based template global feature map and an initial template local feature map of the template face image;
  • edge smoothing is performed on the initial template local feature map to obtain a target template local feature map; based on texture and shape respectively, the template global feature map and the target template local feature map are spliced to obtain a template face texture image and a template face shape image; and three-dimensional model reconstruction processing is performed on the template face texture image and the template face shape image to obtain the template three-dimensional model.
  • the template face image is processed by the three-dimensional reconstruction model, and edge smoothing is performed on the initial template local feature map, so that a template face texture image and a template face shape image with smooth and continuous edges can be obtained.
  • this reduces the distortion of the reconstructed template 3D model, ensures the normal operation of the face-changing application, and yields a reliable face-changing 3D model.
  • the three-dimensional model reconstruction method provided in this application can be applied to various three-dimensional reconstruction processing scenarios, for example, to image processing software, model reconstruction software, PS (Photoshop) software, three-dimensional animation processing software (for example, animation face-pinching software), and the like.
  • FIG. 13 is a schematic flowchart of a three-dimensional model reconstruction method in an embodiment.
  • the application of the three-dimensional model reconstruction method in this application scenario is as follows:
  • a 224*224 face image to be processed is input into the 3D reconstruction software of the terminal (the 3D reconstruction software is equipped with a deep-learning-based 3D reconstruction model), which triggers the 3D reconstruction software to process the input face image as follows to reconstruct the target 3D face model:
  • a decoder D is constructed, which consists of a global decoder D g and a local decoder D l .
  • the global decoder D g consists of 13 deconvolutional layers and outputs a global shape feature map S G .
  • the local decoder D l has 4 local decoding modules that decode the nose, mouth, left-eye and right-eye regions, and each local decoding module contains 10 deconvolution layers.
  • the corresponding initial local feature maps L i are obtained.
  • after the gradient smoothing feature map is generated, it is multiplied with the pixel feature values of the corresponding initial local feature map, and the target local feature maps with edge gradients corresponding to the four parts are obtained. These target local feature maps are merged to obtain a facial organ shape feature map S L ; a minimal sketch of this smoothing-and-merging step is given below.
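  • the following is a minimal sketch of this smoothing-and-merging step; the ramp width, the floor weight and the part positions are assumptions for illustration:

```python
import torch

def edge_smooth(local_map, ramp=8, floor=0.0):
    """Builds a gradient smoothing feature map whose weight grows with the distance
    of a pixel from the feature-map boundary (pixels deep inside keep weight 1,
    pixels within an edge band of `ramp` pixels are attenuated toward `floor`)
    and multiplies it with the initial local feature map."""
    _, _, h, w = local_map.shape
    ys = torch.arange(h, dtype=torch.float32)
    xs = torch.arange(w, dtype=torch.float32)
    dist_y = torch.minimum(ys, (h - 1) - ys)          # distance to top/bottom boundary
    dist_x = torch.minimum(xs, (w - 1) - xs)          # distance to left/right boundary
    dist = torch.minimum(dist_y[:, None], dist_x[None, :])
    weight = floor + (1.0 - floor) * (dist / ramp).clamp(max=1.0)
    return local_map * weight                          # broadcast over batch and channels

def merge_parts(part_maps, positions, out_size):
    """Pads each smoothed key-part map into a zero canvas of the global feature-map
    size and sums them into one facial organ feature map."""
    n, c = part_maps[0].shape[:2]
    canvas = torch.zeros(n, c, *out_size)
    for m, (top, left) in zip(part_maps, positions):
        canvas[..., top:top + m.shape[2], left:left + m.shape[3]] += edge_smooth(m)
    return canvas
```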
  • the global texture feature map T G and the facial organ texture feature map T L are obtained in the same way by the second local detail enhancement module.
  • the first local detail enhancement module outputs a 2D shape map S 2D through a convolutional layer.
  • the second local detail enhancement module outputs a 2D texture map T 2D through a convolutional layer.
  • the 3D reconstruction software on the terminal reconstructs the 3D face model, and finally obtains the target 3D face model. From the analysis of the target three-dimensional face model, it can be found that the three-dimensional face model output by the above embodiment is complete and continuous, and there is no problem of image distortion.
  • FIG. 14 shows a comparison of the 2D texture maps obtained by the traditional technology and by the embodiment of the present invention, wherein FIG. 14(a) shows three 2D texture maps obtained by the traditional technology.
  • each face texture in FIG. 14(a) has a certain degree of distortion.
  • FIG. 14(b) shows three 2D texture maps obtained by the embodiment of the present invention (in the two rows of images a/b, the upper and lower texture images in each column are obtained from the same input image); the face textures in FIG. 14(b) are clear and show no distortion.
  • FIG. 15 shows a comparison of the edge effect of the 2D texture maps obtained by the traditional technology and by the embodiment of the present invention.
  • FIG. 15(a) shows the 2D texture map obtained by the traditional technology.
  • FIG. 15(b) shows the 2D texture map obtained by the embodiment of the present invention, in which the edge transition of the eye edge region 1502 is smooth.
  • the boxed positions in FIG. 15 (including the eye edge regions 1501 and 1502) show the contrast between the sharp edge and the smooth edge.
  • FIG. 16 shows a comparison of the 2D shape maps obtained by the conventional technology and by the embodiment of the present invention, wherein FIG. 16(a) shows three 2D shape maps obtained by the conventional technology; the three mouth edge regions in FIG. 16(a) all have a certain degree of missing area, that is, the edges are discontinuous.
  • FIG. 16(b) shows three 2D shape maps obtained by the embodiment of the present invention (in the two rows of images a/b, the upper and lower shape images in each column are obtained from the same input image); in FIG. 16(b), every area of the face is continuous and the transitions are relatively smooth.
  • the 3D reconstruction software on the terminal can thus, through the 3D model reconstruction method provided by the embodiment of the present invention, effectively solve the problems of 2D texture and 2D shape distortion and output a well-reconstructed 3D face model.
  • a method for training a three-dimensional reconstruction model is also provided. This embodiment is illustrated by applying the method to a computer device.
  • the computer device may be the terminal 102 or the server 104 in FIG. 1 above.
  • FIG. 17 is a schematic flowchart of a training method for a 3D reconstruction model in one embodiment. As shown in FIG. 17 , the method includes the following steps:
  • the training image may be an image containing various types of objects, and the specific implementation can refer to the input image in the foregoing embodiment.
  • the rendering coefficient may refer to a coefficient that can affect the image rendering process, and may be a lighting coefficient, a twist coefficient, or the like.
  • the illumination coefficient may be a coefficient corresponding to illumination intensity, illumination angle, etc.
  • the twist coefficient may be a coefficient corresponding to the pitch angle of the head, the side face angle, and the like.
  • S1702: Input the image feature coefficients into a deep-learning-based three-dimensional reconstruction model, so that the three-dimensional reconstruction model: according to the image feature coefficients, respectively obtains the texture- and shape-based global feature map and initial local feature map of the training image, and performs edge smoothing on the initial local feature map to obtain a target local feature map; splices, based on texture and shape respectively, the global feature map and the target local feature map to obtain the target texture image and target shape image; and performs three-dimensional model reconstruction processing according to the target texture image and the target shape image to obtain a predicted three-dimensional model.
  • the three-dimensional reconstruction model may be a deep learning-based neural network model.
  • S1703 Perform image rendering processing on the predicted three-dimensional model according to the rendering coefficient to obtain a predicted two-dimensional image.
  • image rendering is a process of adjusting parameters such as light, color, and angle of an image.
  • rendering can also refer to the process of adjusting parameters such as light, color, and angle of the image and subsequent two-dimensional conversion, and directly obtaining a two-dimensional image after rendering.
  • the implementation process of S1703 may be: adjusting the illumination direction, pitch angle and the like of the predicted three-dimensional model, performing two-dimensional conversion on the predicted three-dimensional model after the above adjustment, and using the obtained two-dimensional image as the predicted two-dimensional image.
  • the rendering can be realized by a nonlinear rendering method or the like.
  • the predicted 2D image is obtained by performing image rendering processing on the predicted 3D model, and has rendering information such as illumination, color, and angle of the objects in the training image (people, animals, buildings, etc. in the training image). Therefore, the error between the predicted 2D image and the training image carries the rendering information of the object. Based on this, the 3D reconstruction model obtained by training can reliably reconstruct the input image to obtain the target 3D model carrying the rendering information.
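  • purely as a rough sketch of such a rendering step (a real implementation would use a differentiable rasterizer; the per-vertex splatting, the coefficient shapes and the Lambertian shading below are assumptions for illustration):

```python
import torch

def render_predicted_image(vertices, vertex_normals, vertex_colors,
                           twist_m, light_dir, image_size=224):
    """Applies the pose ('twist') transform to the predicted 3D vertices, shades the
    per-vertex texture colors with one directional light, and splats each vertex
    into a 2D image to obtain the predicted two-dimensional image."""
    rotated = vertices @ twist_m[:3, :3].T + twist_m[:3, 3]        # rigid pose transform
    normals = vertex_normals @ twist_m[:3, :3].T
    shade = (normals @ light_dir).clamp(min=0.1)                   # simple Lambertian term
    colors = vertex_colors * shade.unsqueeze(-1)
    image = torch.zeros(image_size, image_size, 3)
    xy = rotated[:, :2].round().long().clamp(0, image_size - 1)    # orthographic projection
    image[xy[:, 1], xy[:, 0]] = colors                             # nearest-vertex splat
    return image
```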
  • a predicted three-dimensional model that is as free of distortion problems as possible can be obtained through the three-dimensional reconstruction model, and a predicted two-dimensional image is obtained by image rendering of the predicted three-dimensional model.
  • the predicted two-dimensional image is reconstructed from the training image and can basically restore the graphic features of the training image.
  • the 3D reconstruction model is trained according to the error between the predicted 2D image and the training image, and an accurate and reliable 3D reconstruction model can be obtained by training.
  • the training of the 3D reconstruction model according to the error between the training image and the predicted 2D image includes: constructing a loss function of the 3D reconstruction model according to the error; performing gradient descent processing on the loss function; and adjusting the model parameters of the 3D reconstruction model according to the result of the gradient descent processing.
  • a loss function of the three-dimensional reconstruction model can be constructed according to the errors between these training images and the predicted two-dimensional images, and the loss function can then be minimized by the gradient descent method to determine the model parameters corresponding to the minimum value of the loss function; the 3D reconstruction model with these model parameters is the adjusted 3D reconstruction model.
  • the gradient descent method is used to process the loss function of the 3D reconstruction model, which can quickly and accurately obtain the minimum value of the loss function, and then adjust the model parameters of the 3D reconstruction model to train the 3D reconstruction model.
  • when the minimum value of the loss function is sufficiently small, the 3D reconstruction model can be considered good enough; at this time, the convergence condition can be considered satisfied, and the corresponding 3D reconstruction model is the trained 3D reconstruction model; a minimal sketch of one training iteration is given below.
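  • a minimal sketch of one training iteration under these steps (the callables, the coefficient ordering and the L1 photometric loss are assumptions; the text above only specifies "the error"):

```python
import torch
import torch.nn.functional as F

def train_step(autoencoder, reconstruction_model, renderer, training_image, optimizer):
    """One iteration: extract coefficients, predict a 3D model, render it into a
    predicted 2D image, and minimize the error to the training image by gradient
    descent on the loss function."""
    f_a, f_s, m, S = autoencoder(training_image)          # image feature + rendering coefficients
    predicted_3d = reconstruction_model(f_a, f_s)         # predicted three-dimensional model
    predicted_2d = renderer(predicted_3d, m, S)           # image rendering processing
    loss = F.l1_loss(predicted_2d, training_image)        # error between prediction and training image
    optimizer.zero_grad()
    loss.backward()                                       # gradient descent processing on the loss
    optimizer.step()                                      # adjust the model parameters
    return loss.item()
```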
  • the acquiring image feature coefficients and rendering coefficients of the training image includes: performing layer-by-layer convolution processing on the training image by using a convolutional autoencoder;
  • the convolutional autoencoder includes a decoder and an encoder;
  • the decoder obtains the texture feature coefficients and shape feature coefficients of the training image according to the result of the layer-by-layer convolution processing, as the image feature coefficients;
  • the encoder obtains the twist coefficient and illumination coefficient of the training image according to the result of the layer-by-layer convolution processing, as the rendering coefficients.
  • the rendering coefficient includes a twist coefficient m and an illumination coefficient S.
  • the convolutional autoencoder includes an encoder; and obtaining the image feature coefficients and the rendering coefficients according to the result of the layer-by-layer convolution processing includes: obtaining, by the encoder according to the result of the layer-by-layer convolution processing, the twist coefficient and the illumination coefficient of the training image as the rendering coefficients.
  • the structure of the convolutional autoencoder can be shown in Figure 11.
  • the convolutional autoencoder can obtain the output information of the previous convolutional layer through the encoder and perform convolution processing.
  • the filter/stride numbers in the middle of Figure 11 show the structure of the encoder. After average pooling is performed on the result of the convolution processing, the values of m and S can be output by the encoder, that is, the twist coefficient and the illumination coefficient can be obtained; a minimal sketch of such an encoder head is given below.
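  • a minimal sketch of such an encoder head (the pooled feature size and the dimensions assumed for m and S are illustrative placeholders):

```python
import torch.nn as nn

class RenderingCoefficientHead(nn.Module):
    """After average pooling of the last convolutional feature map, regresses the
    twist coefficient m (pose) and the illumination coefficient S."""
    def __init__(self, feat_ch=512, dim_m=6, dim_s=27):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(feat_ch, dim_m + dim_s)
        self.dim_m = dim_m

    def forward(self, last_conv_map):
        pooled = self.pool(last_conv_map).flatten(1)
        out = self.head(pooled)
        return out[:, :self.dim_m], out[:, self.dim_m:]   # (m, S)
```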
  • the convolutional autoencoder may also be an internal component of the 3D reconstruction model. That is, image feature coefficients and rendering coefficients are obtained from the convolutional autoencoder in the 3D reconstruction model.
  • the above embodiment obtains image feature coefficients and rendering coefficients by means of a convolutional autoencoder, and can fully mine the image features of the training image through layer-by-layer convolution, obtaining accurate image feature coefficients and rendering coefficients based on the deep learning method, which ensures the reliable operation of the subsequent steps and, in turn, an accurate target 3D model.
  • FIG. 18 is a schematic flowchart of a training method for a three-dimensional reconstruction model in one embodiment.
  • the specific implementation process is as follows:
  • the terminal acquires the training face image, and generates the texture feature coefficient f a , the shape feature coefficient f s , the twist coefficient m and the illumination coefficient S of the training face image through a convolutional autoencoder.
  • f a and f s are input into a 3D reconstruction model, which includes a first local detail enhancement module and a second local detail enhancement module with the same structure.
  • a 2D shape map and a 2D texture map are obtained through the first local detail enhancement module and the second local detail enhancement module, respectively.
  • the 3D reconstruction model generates a predicted 3D face model based on the 2D shape map and the 2D texture map.
  • the terminal triggers the rendering module to render the predicted three-dimensional face model according to the torsion coefficient m and the illumination coefficient S to obtain the predicted two-dimensional face image.
  • the terminal adjusts the 3D reconstruction model based on the error between the training face image and the predicted 2D face image.
  • a trained 3D reconstruction model is obtained, which can reconstruct a 3D model based on the input 2D image.
  • a predicted three-dimensional model is obtained through a three-dimensional reconstruction model, and a predicted two-dimensional image is obtained by image rendering of the predicted three-dimensional model.
  • the predicted two-dimensional image is reconstructed from the training image and can basically restore the graphic features of the training image.
  • the predicted 2D image corresponding to the predicted 3D model is compared with the input training image; the comparison result provides feedback on the reconstruction effect of the 3D reconstruction model, and the 3D reconstruction model is trained according to the comparison result.
  • an accurate and reliable 3D reconstruction model can thus be obtained by training.
  • the present application also provides an application scenario, where the above-mentioned training method for a three-dimensional reconstruction model and a three-dimensional model reconstruction method are applied.
  • the application of these methods in this application scenario is as follows:
  • the terminal receives multiple training face images, and inputs these training face images into the model training software.
  • the model training software obtains the texture feature coefficient f aX , the shape feature coefficient f sX , the twist coefficient m X and the illumination coefficient S X of each training face image.
  • the model training software inputs the texture feature coefficient f aX and the shape feature coefficient f sX into the three-dimensional reconstruction model.
  • the three-dimensional reconstruction model obtains the global texture feature map T GX , the global shape feature map S GX , the local texture feature map T LXO and the local shape feature map S LXO of the training face image according to the texture feature coefficient f aX and the shape feature coefficient f sX , respectively;
  • it performs edge smoothing on the local texture feature map T LXO and the local shape feature map S LXO to obtain the target local texture feature map T LX and the target local shape feature map S LX ;
  • the global texture feature map T GX and the target local texture feature map T LX are spliced to obtain the target texture image T X ;
  • the global shape feature map S GX and the target local shape feature map S LX are spliced to obtain the target shape image S X ;
  • three-dimensional model reconstruction processing is performed according to the target texture image T X and the target shape image S X to obtain a predicted three-dimensional face model.
  • the model training software performs image rendering processing on the predicted three-dimensional face model according to the twist coefficient m X and the illumination coefficient S X to obtain the predicted two-dimensional face image, and constructs a loss function according to the error between the training face image and the predicted two-dimensional face image;
  • the gradient descent algorithm is run on the loss function, and the trained 3D reconstruction model is obtained when the result of the gradient descent algorithm satisfies the convergence condition.
  • the model training software can output the trained 3D reconstruction model to the 3D reconstruction software.
  • the three-dimensional reconstruction software realizes the following steps through the trained three-dimensional reconstruction model when it receives an input face image: according to the texture feature coefficient f aY and the shape feature coefficient f sY of the input face image, it respectively obtains the global texture feature map T GY , the global shape feature map S GY , the local texture feature map T LYO and the local shape feature map S LYO of the input face image;
  • it performs edge smoothing on the local texture feature map T LYO and the local shape feature map S LYO to obtain the target local texture feature map T LY and the target local shape feature map S LY ; the global texture feature map T GY and the target local texture feature map T LY are spliced to obtain the target texture image T Y ;
  • the global shape feature map S GY and the target local shape feature map S LY are spliced to obtain the target shape image S Y ; three-dimensional model reconstruction processing is performed according to the target texture image T Y and the target shape image S Y to obtain the target three-dimensional face model.
  • the 3D reconstruction software converts the target 3D face model into the form of an image and displays it on the display screen.
  • the above method provided by the embodiment of the present invention can realize the training of the 3D model and the reconstruction of the 3D model on the terminal, and the 3D face reconstructed by the 3D reconstruction model can effectively suppress the appearance of image distortion and achieve a reliable face model reconstruction effect.
  • although the steps in the above flowcharts are displayed in sequence according to the arrows, these steps are not necessarily executed in the sequence indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited to that order, and these steps may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or multiple stages, which are not necessarily executed at the same time but may be executed at different times; the execution order of these sub-steps or stages is also not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.
  • the present invention also provides a three-dimensional model reconstruction apparatus and a training apparatus for a three-dimensional reconstruction model, and these apparatuses can be respectively used for executing the above-mentioned three-dimensional model reconstruction method and training method for a three-dimensional reconstruction model.
  • for the convenience of description, in the schematic structural diagrams of the embodiments of the three-dimensional model reconstruction apparatus and the training apparatus for the three-dimensional reconstruction model, only the parts related to the embodiments of the present invention are shown; those skilled in the art can understand that the illustrated structures do not constitute a limitation on the apparatuses, and an apparatus may include more or fewer components than shown, or combine certain components, or have a different arrangement of components.
  • a three-dimensional model reconstruction apparatus 1900 is provided, and the apparatus may adopt software modules or hardware modules, or a combination of the two to become a part of computer equipment, and the apparatus specifically includes: A coefficient acquisition module 1901, a feature map acquisition module 1902, a smoothing processing module 1903, a feature map stitching module 1904 and a first model reconstruction module 1905, wherein:
  • the first coefficient obtaining module 1901 is configured to obtain image characteristic coefficients of the input image.
  • the feature map obtaining module 1902 is configured to obtain, according to the image feature coefficients, a global feature map and an initial local feature map of the input image based on texture and shape, respectively.
  • the smoothing processing module 1903 is configured to perform edge smoothing processing on the initial local feature map to obtain a target local feature map.
  • the feature map splicing module 1904 is used for splicing the global feature map and the target local feature map based on texture and shape, respectively, to obtain a target texture image and a target shape image.
  • the first model reconstruction module 1905 is configured to perform three-dimensional model reconstruction processing according to the target texture image and the target shape image to obtain a target three-dimensional model.
  • the edge smoothing process is performed on the local feature map, so that image distortion is not easily generated, and a target three-dimensional model with smooth edges can be obtained.
  • the smoothing processing module includes: a boundary acquisition sub-module for acquiring the boundary of the initial local feature map; a distance acquisition sub-module for acquiring the relationship between each pixel in the initial local feature map and the The distance between the boundaries; the first edge smoothing sub-module is used to perform edge smoothing processing on the initial local feature map according to the distance to obtain the target local feature map.
  • the first edge smoothing sub-module includes: an edge region obtaining unit, configured to obtain the edge region of the initial local feature map according to the distance; a weight value determining unit, configured to determine, according to the distance, the feature weight value corresponding to each pixel point of the edge region, so that the feature weight value corresponding to a far pixel point is greater than that corresponding to a near pixel point; a feature map construction unit, configured to generate a gradient smoothing feature map according to the feature weight value corresponding to each pixel point of the initial local feature map, where the feature weight value corresponding to pixel points outside the edge region of the initial local feature map is a preset weight value, and the feature value of each pixel point in the gradient smoothing feature map is obtained according to the corresponding feature weight value; and a feature value multiplication unit, configured to multiply the feature value of each pixel point in the gradient smoothing feature map by the feature value of the corresponding pixel point in the initial local feature map, and obtain the target local feature map according to the multiplication result.
  • the input image is a face image
  • the target local feature map includes a key part feature map corresponding to a key part of the face
  • the feature map splicing module includes: a filling sub-module, configured to fill the outer area of each key part feature map with feature values to obtain filled key part feature maps of the same size as the global feature map; and a merging sub-module, configured to merge the filled key part feature maps to obtain a facial organ feature map.
  • the key parts of the face include at least one of left eye, right eye, nose, and mouth.
  • the global feature map includes a global texture feature map and a global shape feature map
  • the target local feature map includes a local texture feature map and a local shape feature map
  • the feature map splicing module includes: a texture convolution sub-module, configured to splice the global texture feature map and the local texture feature map and convolve the spliced feature map to integrate the global texture feature and the local texture feature to obtain the target texture image; and a shape convolution sub-module, configured to splice the global shape feature map and the local shape feature map and convolve the spliced feature map to integrate the global shape feature and the local shape feature to obtain the target shape image.
  • both the global feature map and the target local feature map correspond to at least one feature channel; the texture convolution sub-module is further configured to splice the global texture feature map and the local texture feature map in each of the feature channels, and to convolve the feature maps obtained by splicing in each of the feature channels, so as to integrate the global texture feature and the local texture feature to obtain the target texture image;
  • the shape convolution sub-module is further configured to splice the global shape feature map and the local shape feature map in each of the feature channels, and to convolve the feature maps obtained by splicing in each of the feature channels, so as to integrate the global shape feature and the local shape feature to obtain the target shape image.
  • the first coefficient acquisition module includes: a convolution sub-module, configured to perform layer-by-layer convolution processing on the input image through a convolutional autoencoder; and a feature coefficient acquisition sub-module, configured to obtain the texture feature coefficients and shape feature coefficients of the input image according to the result of the layer-by-layer convolution processing, as the image feature coefficients.
  • the feature map acquisition module includes: a global decoding sub-module, configured to perform, by the deconvolution layer in the global decoder, feature decoding on the input image according to the image feature coefficients to obtain the global feature map; and a local decoding sub-module, configured to perform, by the deconvolution layer in the local decoder, feature decoding on the input image according to the image feature coefficients to obtain the initial local feature map.
  • the local decoder includes a facial key part decoder; the local decoding sub-module is further configured to perform, by the deconvolution layer in the facial key part decoder, feature decoding on the input image according to the image feature coefficients, and to determine the key part feature map obtained by decoding as the initial local feature map.
  • the input image is a face image; the device further includes: a target feature point acquisition module, configured to acquire the global feature points of the two-dimensional image corresponding to the target three-dimensional model, and obtain the target feature points
  • the template model acquisition module is used to obtain the template three-dimensional model; the template feature point acquisition module is used to obtain the local feature points of the two-dimensional image corresponding to the template three-dimensional model to obtain the template feature points; the face-changing module is used to input the target feature points and the template feature points into the face-changing model as paired data, so that the face-changing model outputs the face-changing three-dimensional model; the face-changing three-dimensional model includes the global features in the target three-dimensional model and the local features in the template three-dimensional model.
  • the template model acquisition module includes: a template image acquisition sub-module, configured to acquire a preset template face image; a template feature map acquisition sub-module, configured to acquire a texture- and shape-based template global feature map and an initial template local feature map of the template face image; a second edge smoothing sub-module, configured to perform edge smoothing on the initial template local feature map to obtain a target template local feature map; a template feature map splicing sub-module, configured to splice, based on texture and shape respectively, the template global feature map and the target template local feature map to obtain a template face texture image and a template face shape image; and a template three-dimensional model reconstruction sub-module, configured to perform three-dimensional model reconstruction processing on the template face texture image and the template face shape image to obtain the template three-dimensional model.
  • Each module in the above-mentioned three-dimensional model reconstruction device may be implemented in whole or in part by software, hardware and combinations thereof.
  • the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a training apparatus 2000 for a three-dimensional reconstruction model is provided.
  • the apparatus may adopt software modules or hardware modules, or a combination of the two to become a part of computer equipment.
  • the apparatus specifically includes : the second coefficient acquisition module 2001, the second model reconstruction module 2002, the image rendering module 2003 and the reconstruction model training module 2004, wherein:
  • the second coefficient obtaining module 2001 is configured to obtain image feature coefficients and rendering coefficients of the training image.
  • the second model reconstruction module 2002 is configured to input the image feature coefficients into a deep-learning-based three-dimensional reconstruction model, so that the three-dimensional reconstruction model: according to the image feature coefficients, respectively obtains the texture- and shape-based global feature map and initial local feature map of the training image, and performs edge smoothing on the initial local feature map to obtain a target local feature map; splices, based on texture and shape respectively, the global feature map and the target local feature map to obtain a target texture image and a target shape image; and performs three-dimensional model reconstruction processing according to the target texture image and the target shape image to obtain a predicted three-dimensional model.
  • the image rendering module 2003 is configured to perform image rendering processing on the predicted three-dimensional model according to the rendering coefficient to obtain a predicted two-dimensional image.
  • the reconstruction model training module 2004 is configured to train the 3D reconstruction model according to the error between the training image and the predicted 2D image, until the convergence condition is satisfied, and the trained 3D reconstruction model is obtained.
  • a predicted three-dimensional model without distortion problems can be obtained as much as possible through the three-dimensional reconstruction model, a predicted two-dimensional image corresponding to the predicted three-dimensional model can be determined, and the three-dimensional reconstruction model can be trained according to the error between the predicted two-dimensional image and the training image. Accurate and reliable 3D reconstruction model is obtained by training.
  • the second coefficient obtaining module includes: a layer-by-layer convolution sub-module, configured to perform layer-by-layer convolution processing on the training image through a convolutional autoencoder;
  • the convolutional autoencoder includes a decoder and an encoder;
  • an image feature coefficient acquisition sub-module, configured to obtain, by the decoder according to the result of the layer-by-layer convolution processing, the texture feature coefficients and shape feature coefficients of the training image as the image feature coefficients; and
  • a rendering coefficient obtaining sub-module, configured to obtain, by the encoder according to the result of the layer-by-layer convolution processing, the twist coefficient and the illumination coefficient of the training image as the rendering coefficients.
  • Each module in the above-mentioned training device for the three-dimensional reconstruction model can be implemented in whole or in part by software, hardware, and combinations thereof.
  • the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device in one embodiment, the computer device may be a server, and its internal structure diagram may be as shown in FIG. 21 .
  • the computer device includes a processor, memory, and a network interface connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium, an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions and a database.
  • the internal memory provides an environment for the execution of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer equipment is used to store data such as three-dimensional reconstruction models, convolutional autoencoders, training images, and the like.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions when executed by the processor, implement a three-dimensional model reconstruction method and a three-dimensional reconstruction model training method.
  • a computer device in one embodiment, the computer device may be a terminal, and its internal structure diagram may be as shown in FIG. 22 .
  • the computer equipment includes a processor, memory, a communication interface, a display screen, and an input device connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium, an internal memory.
  • the non-volatile storage medium stores an operating system and computer-readable instructions.
  • the internal memory provides an environment for the execution of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication can be realized by WIFI, operator network, NFC (Near Field Communication) or other technologies.
  • the computer-readable instructions when executed by the processor, implement a three-dimensional model reconstruction method and a three-dimensional reconstruction model training method.
  • the display screen of the computer equipment may be a liquid crystal display screen or an electronic ink display screen
  • the input device of the computer equipment may be a touch layer covered on the display screen, or a button, a trackball or a touchpad set on the shell of the computer equipment , or an external keyboard, trackpad, or mouse.
  • FIGS. 21 and 22 are only block diagrams of partial structures related to the solution of the present application, and do not constitute a limitation on the computer equipment to which the solution of the present application is applied.
  • a device may include more or fewer components than shown in the figures, or combine certain components, or have a different arrangement of components.
  • a computer device is provided, including a memory and one or more processors, where computer-readable instructions are stored in the memory, and the one or more processors, when executing the computer-readable instructions, implement the steps in the above embodiments of the three-dimensional model reconstruction method.
  • a computer device is provided, including a memory and one or more processors, where computer-readable instructions are stored in the memory, and the one or more processors, when executing the computer-readable instructions, implement the steps in the above embodiments of the training method for the three-dimensional reconstruction model.
  • one or more non-transitory computer-readable storage media are provided, having computer-readable instructions stored thereon, where the computer-readable instructions, when executed by one or more processors, implement the steps in the foregoing embodiments of the three-dimensional model reconstruction method.
  • one or more non-transitory computer-readable storage media are provided, having computer-readable instructions stored thereon, where the computer-readable instructions, when executed by one or more processors, implement the steps in the foregoing embodiments of the training method for the three-dimensional reconstruction model.
  • a computer program product or computer program comprising computer readable instructions stored in a computer readable storage medium.
  • the processor of the computer device reads the computer-readable instructions from the computer-readable storage medium, and the processor executes the computer-readable instructions, so that the computer device performs the steps in the foregoing method embodiments.
  • Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, or optical memory, and the like.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • the RAM may be in various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).


Abstract

The present application relates to a three-dimensional model reconstruction method, a training method for a three-dimensional reconstruction model, an apparatus, a computer device and a storage medium. The three-dimensional model reconstruction method includes: acquiring image feature coefficients of an input image; according to the image feature coefficients, respectively acquiring a texture- and shape-based global feature map and an initial local feature map of the input image; performing edge smoothing on the initial local feature map to obtain a target local feature map; based on texture and shape respectively, splicing the global feature map and the target local feature map to obtain a target texture image and a target shape image; and performing three-dimensional model reconstruction processing according to the target texture image and the target shape image to obtain a target three-dimensional model.

Description

Three-dimensional model reconstruction method, and training method and apparatus for three-dimensional reconstruction model
This application claims priority to the Chinese patent application with Application No. 2020109696150, filed with the China Patent Office on September 15, 2020 and entitled "Three-dimensional model reconstruction method, apparatus, computer device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a three-dimensional model reconstruction method, apparatus, computer device and storage medium, and to a training method, apparatus, computer device and storage medium for a three-dimensional reconstruction model.
Background
With the development of image processing technology, artificial intelligence techniques such as 2D (2-dimension) image processing and 3D (3-dimension) model reconstruction have emerged, for example, reconstructing a new 3D face model from an input face image.
Conventional techniques acquire a shape map, a texture map and the like of an input image and reconstruct a three-dimensional model from them. However, the shape map and texture map acquired by conventional techniques are prone to distortion, resulting in an inaccurate reconstructed three-dimensional model.
It should be noted that the information disclosed in the above background section is only intended to enhance the understanding of the background of the present invention, and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary
Embodiments of the present application provide a three-dimensional model reconstruction method, apparatus, computer device and storage medium, and a training method, apparatus, computer device and storage medium for a three-dimensional reconstruction model.
A three-dimensional model reconstruction method, the method including:
acquiring image feature coefficients of an input image;
according to the image feature coefficients, respectively acquiring a texture- and shape-based global feature map and an initial local feature map of the input image;
performing edge smoothing on the initial local feature map to obtain a target local feature map;
based on texture and shape respectively, splicing the global feature map and the target local feature map to obtain a target texture image and a target shape image; and
performing three-dimensional model reconstruction processing according to the target texture image and the target shape image to obtain a target three-dimensional model.
A training method for a three-dimensional reconstruction model, the method including:
acquiring image feature coefficients and rendering coefficients of a training image;
inputting the image feature coefficients into a deep-learning-based three-dimensional reconstruction model, so that the three-dimensional reconstruction model: according to the image feature coefficients, respectively acquires a texture- and shape-based global feature map and an initial local feature map of the training image, and performs edge smoothing on the initial local feature map to obtain a target local feature map; based on texture and shape respectively, splices the global feature map and the target local feature map to obtain a target texture image and a target shape image; and performs three-dimensional model reconstruction processing according to the target texture image and the target shape image to obtain a predicted three-dimensional model;
performing image rendering processing on the predicted three-dimensional model according to the rendering coefficients to obtain a predicted two-dimensional image; and
training the three-dimensional reconstruction model according to an error between the training image and the predicted two-dimensional image until a convergence condition is satisfied, to obtain a trained three-dimensional reconstruction model.
A three-dimensional model reconstruction apparatus, the apparatus including:
a first coefficient acquisition module, configured to acquire image feature coefficients of an input image;
a feature map acquisition module, configured to respectively acquire, according to the image feature coefficients, a texture- and shape-based global feature map and an initial local feature map of the input image;
a smoothing processing module, configured to perform edge smoothing on the initial local feature map to obtain a target local feature map;
a feature map splicing module, configured to splice, based on texture and shape respectively, the global feature map and the target local feature map to obtain a target texture image and a target shape image; and
a first model reconstruction module, configured to perform three-dimensional model reconstruction processing according to the target texture image and the target shape image to obtain a target three-dimensional model.
A training apparatus for a three-dimensional reconstruction model, the apparatus including:
a second coefficient acquisition module, configured to acquire image feature coefficients and rendering coefficients of a training image;
a second model reconstruction module, configured to input the image feature coefficients into a deep-learning-based three-dimensional reconstruction model, so that the three-dimensional reconstruction model: according to the image feature coefficients, respectively acquires a texture- and shape-based global feature map and an initial local feature map of the training image, and performs edge smoothing on the initial local feature map to obtain a target local feature map; based on texture and shape respectively, splices the global feature map and the target local feature map to obtain a target texture image and a target shape image; and performs three-dimensional model reconstruction processing according to the target texture image and the target shape image to obtain a predicted three-dimensional model;
an image rendering module, configured to perform image rendering processing on the predicted three-dimensional model according to the rendering coefficients to obtain a predicted two-dimensional image; and
a reconstruction model training module, configured to train the three-dimensional reconstruction model according to an error between the training image and the predicted two-dimensional image until a convergence condition is satisfied, to obtain a trained three-dimensional reconstruction model.
A computer device, including a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps of the above three-dimensional model reconstruction method and training method for a three-dimensional reconstruction model.
One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above three-dimensional model reconstruction method and training method for a three-dimensional reconstruction model.
A computer program product or computer program, including computer-readable instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer-readable instructions from the computer-readable storage medium and executes them, so that the computer device performs the steps of the above three-dimensional model reconstruction method and training method for a three-dimensional reconstruction model.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is an application environment diagram of a three-dimensional model reconstruction method and a training method for a three-dimensional reconstruction model in one embodiment;
FIG. 2 is a schematic flowchart of a three-dimensional model reconstruction method in one embodiment;
FIG. 3 is a schematic diagram of edge discontinuity of a texture image in one embodiment;
FIG. 4 is a schematic diagram of edge discontinuity of a texture image in another embodiment;
FIG. 5 is a schematic diagram of edge discontinuity of a shape image in one embodiment;
FIG. 6 is a schematic structural diagram of a gradient smoothing feature map in one embodiment;
FIG. 7 is a schematic diagram of the effect of gradient smoothing in one embodiment;
FIG. 8 is a schematic diagram of the effect of feature value filling and merging of key part feature maps in one embodiment;
FIG. 9 is a schematic diagram of the process of obtaining a target facial organ feature map in one embodiment;
FIG. 10 is a schematic diagram of a multi-feature-channel feature map processing flow in one embodiment;
FIG. 11 is a schematic structural diagram of a convolutional autoencoder in one embodiment;
FIG. 12 is a schematic diagram of the process of obtaining a face-swapped image in one embodiment;
FIG. 13 is a schematic flowchart of a three-dimensional model reconstruction method in another embodiment;
FIG. 14 is a comparison diagram of 2D texture maps in one embodiment;
FIG. 15 is a comparison diagram of the edge effect of 2D texture maps in one embodiment;
FIG. 16 is a comparison diagram of 2D shape maps in one embodiment;
FIG. 17 is a schematic flowchart of a training method for a three-dimensional reconstruction model in one embodiment;
FIG. 18 is a schematic flowchart of a training method for a three-dimensional reconstruction model in another embodiment;
FIG. 19 is a structural block diagram of a three-dimensional model reconstruction apparatus in one embodiment;
FIG. 20 is a structural block diagram of a training apparatus for a three-dimensional reconstruction model in another embodiment;
FIG. 21 is an internal structure diagram of a computer device in one embodiment;
FIG. 22 is an internal structure diagram of a computer device in another embodiment.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
The three-dimensional model reconstruction method, apparatus, computer device and storage medium, and the training method, apparatus, computer device and storage medium for a three-dimensional reconstruction model provided in the embodiments of the present application can be implemented based on artificial intelligence (AI) technology. Artificial intelligence is a theory, method, technology and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Specifically, the embodiments of the present application can be implemented based on computer vision (CV) technology. Computer vision is a science that studies how to make machines "see"; more specifically, it refers to machine vision in which cameras and computers replace human eyes to identify, track and measure targets, with further graphics processing so that the processed image is more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies and attempts to build artificial intelligence systems that can obtain information from images or multi-dimensional data. Computer vision technologies usually include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.
In addition, the embodiments of the present application can be applied to artificial intelligence cloud services, generally also referred to as AIaaS (AI as a Service). This is currently a mainstream service mode of artificial intelligence platforms: the AIaaS platform splits several common types of AI services and provides independent or packaged services in the cloud. This service mode is similar to an AI-themed mall: all developers can access one or more artificial intelligence services provided by the platform through API (Application Programming Interface) interfaces, for example, performing three-dimensional reconstruction of an input image and outputting it in two-dimensional form; some experienced developers can also use the AI framework and AI infrastructure provided by the platform to deploy and operate their own dedicated cloud artificial intelligence services.
The three-dimensional model reconstruction method and the training method for a three-dimensional reconstruction model provided in the present application can both be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 through a network. The terminal and the server can be directly or indirectly connected through wired or wireless communication, which is not limited here. The terminal 102 can be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices; the server 104 can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
Both the terminal 102 and the server 104 can be used separately to perform the three-dimensional model reconstruction method and the training method for a three-dimensional reconstruction model provided in the embodiments of the present application.
For example, the server acquires image feature coefficients of an input image, and acquires, according to the image feature coefficients, a texture- and shape-based global feature map and an initial local feature map of the input image respectively. The server performs edge smoothing on the initial local feature map to obtain a target local feature map, and splices, based on texture and shape respectively, the global feature map and the target local feature map to obtain a target texture image and a target shape image. The server performs three-dimensional model reconstruction processing according to the target texture image and the target shape image to obtain a target three-dimensional model. The server can send the obtained target three-dimensional model to the terminal for display.
The terminal 102 and the server 104 can also cooperate to perform the three-dimensional model reconstruction method and the training method for a three-dimensional reconstruction model provided in the embodiments of the present application.
For example, the server acquires a training image from the terminal. The server acquires image feature coefficients and rendering coefficients of the training image and inputs the image feature coefficients into a deep-learning-based three-dimensional reconstruction model, so that the three-dimensional reconstruction model: acquires, according to the image feature coefficients, a texture- and shape-based global feature map and an initial local feature map of the training image respectively, and performs edge smoothing on the initial local feature map to obtain a target local feature map; splices, based on texture and shape respectively, the global feature map and the target local feature map to obtain a target texture image and a target shape image; and performs three-dimensional model reconstruction processing according to the target texture image and the target shape image to obtain a predicted three-dimensional model. The server performs image rendering processing on the predicted three-dimensional model according to the rendering coefficients to obtain a predicted two-dimensional image. The server trains the three-dimensional reconstruction model according to the error between the training image and the predicted two-dimensional image until a convergence condition is satisfied, to obtain a trained three-dimensional reconstruction model. The server can reconstruct three-dimensional models based on the trained three-dimensional reconstruction model, or send the trained three-dimensional reconstruction model to the terminal, which can then reconstruct three-dimensional models with it.
In the above three-dimensional model reconstruction method, apparatus, computer device and storage medium, and training method, apparatus, computer device and storage medium for a three-dimensional reconstruction model, the texture- and shape-based global feature map and initial local feature map of the input image are acquired respectively according to the image feature coefficients; edge smoothing is performed on the initial local feature map to obtain a target local feature map whose edge region is in a smooth state; based on texture and shape respectively, the global feature map and the target local feature map are spliced to obtain a target texture image and a target shape image with smooth edges; and the target three-dimensional model reconstructed from the target texture image and the target shape image is not prone to distortion.
In one embodiment, a three-dimensional model reconstruction method is provided. The method is described by taking its application to a computer device as an example; the computer device can be the terminal 102 or the server 104 in FIG. 1 above.
Specifically, as shown in FIG. 2, the method includes the following steps:
S201,获取输入图像的图像特征系数。
其中,输入图像可以是包含各种类型的对象的图像,例如是人脸图像、动物图像、建筑物图像等。在一个实施例中,输入图像可以由全局特征和局部特征构成,以人脸图像为例,全局特征可以是整张脸的粗略特征,局部特征可以是脸部器官(例如:眼睛、鼻子、嘴巴、耳朵等)的细节特征。另外,输入图像可以是一张图像也可以是包含不同信息的多张图像,当输入图像为多张时,可以通过同步或异步的方式对这些输入图像进行处理,并分别重建得到对应的多个目标三维模型。
在一个实施例中,图像对应的特征(可以称为图像特征)可以是图像的颜色特征、纹理特征(纹理特征是一种反映图像中同质现象的视觉特征,它体现了物体表面的具有缓慢变化或者周期性变化的表面结构组织排列属性)、形状特征(可以包括轮廓特征和区域特征,其中,轮廓特征主要针对物体的外边界,区域特征则关系到整个形状区域)和空间关系特征(可以指图像中物体之间的排布关系)等。图像特征系数可以是表征图像特征的系数,可以是描述图像全局、局部、纹理、形状等特征的系数。在一个实施例中,图像特征系数可以是纹理特征系数、形状特征系数等。其中,纹理特征系数可以指对纹理特征进行描述的系数,具体的,可以是描述图像的表面结构组织排列属性的系数;形状特征系数可以指对形状特征进行描述的系数,具体的,可以是对图像轮廓、图像区域等进行描述的系数。
在一个实施例中,获取输入图像的图像特征系数可以通过基于深度学习的网络模型实现。具体的,该网络模型可以是自编码器等。其中,深度学习(DL,Deep Learning)是机器学习(ML,Machine Learning)领域中一个新的研究方向,它被引入机器学习使其更接近于最初的目标——人工智能。同时,深度学习是学习样本数据的内在规律和表示层次,这些学习过程中获得的信息对诸如图像等数据的解释有很大的帮助。它的最终目标是让机器能够像人一样具有分析学习能力,能够识别图像等数据。
S202,根据所述图像特征系数,分别获取所述输入图像基于纹理和形状的全局特征图和初始局部特征图。
其中,全局特征图可以是用于描述输入图像全局信息的特征图,全局特征图能表征整体的图像信息,其尺寸可以和输入图像一样也可以小于输入图像,其关注图像的整体性,对噪声鲁棒性较强。局部特征图可以是用于描述输入图像局部信息的特征图,可以是至少一个局部区域对应的特征图,局部特征图的尺寸可以小于或等于全局特征图,局部特征图所关注的区域更小,旨在生成更多的细节。在一个实施例中,以脸部图像为例,全局特征图可以是表征整体脸部情况的脸部全局特征图,该脸部全局特征图包含有模糊的整体脸部信息,局部特征图可以是表征眼睛、嘴巴、鼻子、耳朵、头发、眉毛等局部区域情况的关键部位特征图,这个关键部位特征图包含有清晰的局部区域细节信息,例如:眼睛的位置、轮廓、眼球大小、瞳孔颜色等。
在一个实施例中,基于纹理的全局特征图可以为全局纹理特征图,基于形状的全局特征图可以为全局形状特征图;基于纹理的初始局部特征图可以为初始局部纹理特征图,基于形状的初始局部特征图可以为初始局部形状特征图。在一个实施例中,可以根据纹理特征系数获取输入图像的全局纹理特征图和初始局部纹理特征图,根据形状特征系数获取输入图像的全局形状特征图和初始局部形状特征图。
S203,对所述初始局部特征图进行边缘平滑处理,得到目标局部特征图。
本步骤对初始局部特征图进行边缘平滑处理,边缘平滑处理之后得到的图像作为目标局部特征图。具体的,可以是分别对初始局部纹理特征图和初始局部形状特征图进行边缘平滑处理,得到目标局部纹理特征图和目标局部形状特征图,作为该目标局部特征图。
其中,边缘平滑处理可以指对图像的边缘区域进行平滑处理。该平滑处理可以是进行特征值的渐变处理,例如,按照一定方向使得图像的颜色通道值逐步减小。
另外,本发明实施例可以通过一个双分支的局部细节增强(Global-local)模型实现对全局和局部特征的处理。局部细节增强模型接收到输入图像后,一条支路回归全局信息(全局特征图),一条支路回归局部信息(局部特征图)。需要注意的是,一般需要将全局特征图和局部特征图拼接(特征图拼接的过程见S204)后才能得到包含完整信息的目标图像。
S204,分别基于纹理和形状,对所述全局特征图和所述目标局部特征图进行拼接,得到目标纹理图像和目标形状图像。
其中,目标纹理图像为整合了全局纹理特征和局部纹理特征得到的纹理图像,目标形状图像为整合了全局形状特征和局部形状特征得到的形状图像。目标纹理图像和目标形状图像均可以为二维图像。
本步骤对全局特征图和目标局部特征图进行拼接,可以得到包含全局特征和局部特征的目标图像,该目标图形包括目标纹理图像和目标形状图像。在一个实施例中,本步骤基于纹理对全局特征图和目标局部特征图进行拼接得到目标纹理图像,基于形状对全局特征图和目标局部特征图进行拼接得到目标形状图像。
在某些实施例中,可以对全局特征图也进行边缘平滑处理,并对边缘平滑处理之后的全局特征图与目标局部特征图进行拼接,得到对应的目标纹理图像和目标形状图像。
S205,根据所述目标纹理图像和所述目标形状图像进行三维模型重建处理,得到目标三维模型。
其中,三维模型指的是包含物体表面每个采样点的三维空间坐标的数字模型文件。
S205的实现过程可以是:根据目标形状图像重建三维模型的空间轮廓,在该空间轮廓的表面附加上目标纹理图像,即得到携带有纹理信息的目标三维模型。
在一个实施例中,局部特征图的尺寸往往小于全局特征图,而要将全局特征图和局部特征图进行拼接,就需要使得两者尺寸相同,进而对同一位置上的像素点进行融合,得到融合有全局特征和局部特征的图像。基于此,需要对局部特征图进行特征值的填充(padding),以使局部特征图与全局特征图尺寸一致。例如,通过对局部特征图补0(即,将局部特征图的外部区域像素点的特征值设置为0)来达到与全局特征图相同的尺寸,之后在同一尺寸下进行拼接。这就会导致一个问题,即局部特征图在拼接后的边缘突变。局部特征图与全局特征图拼接之后,在卷积过程中,当卷积核未移动到局部特征图对应的位置时,由于补0的原因,其卷积接收的只有全局特征图的信息。当卷积核移动到局部特征图对应的位置时,其能同时接收全局特征图和局部特征图的信息。然而,当卷积核移动到局部特征图的边缘时,卷积核所接收的信息从只有全局信息→全局信息+局部信息,所获信息突然增加,这将导致卷积在此位置的输出较之前产生突变,这种突变会导致最终生成结果的不连续性。图3为传统技术中将全局纹理特征图和局部纹理特征图进行拼接之后得到的纹理图像,其中,右侧的图像为对左侧纹理特征图中的眼部边缘区域301放大后的图像,可以看出在纹理图像的边缘位置产生了明显的不连续性。放大到全脸,这种不连续也将会导致纹理图像产生整体畸变,图4为一个实施例中纹理图像边缘不连续的示意图,其中,图4(a)为正常纹理,图4(b)为产生畸变的纹理,很明显,图4(b)嘴部区域的边缘出现畸变。另一方面,这种不连续也会作用在形状图像上,在局部特征图的边缘位置产生不可控畸变,图5为一个实施例中形状图像边缘不连续的示意图,其中,图5(a)为正常形状,图5(b)为产生畸变的形状,其中,501表示嘴部区域,可以明显发现图5(b)嘴部区域的边缘崩坏。另外,这种不连续也会在模型重建过程中产生明显的分块现象,使得最终重建的目标三维模型存在不连续性突变,例如,眼角区域存在明显的颜色跳跃。
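为了更直观地说明上述补0拼接在卷积阶段引起的边缘突变,下面给出一个简化的示意性片段(其中的特征图尺寸、取值和卷积核均为假设值,仅用于演示现象,并非本申请的实际网络):

import torch
import torch.nn.functional as F

# 构造一个简化的单通道"全局特征图":常数值0.5,尺寸16x16
global_map = torch.full((1, 1, 16, 16), 0.5)

# 构造一个"局部特征图":常数值1.0,尺寸6x6,补0填充到16x16后放在中心位置
local_map = torch.zeros((1, 1, 16, 16))
local_map[:, :, 5:11, 5:11] = 1.0

# 按通道维拼接后,用一个3x3均值卷积核做卷积,模拟卷积核同时接收全局与局部信息
stacked = torch.cat([global_map, local_map], dim=1)      # 形状: (1, 2, 16, 16)
kernel = torch.full((1, 2, 3, 3), 1.0 / (2 * 3 * 3))      # 对两个通道的3x3邻域求均值
out = F.conv2d(stacked, kernel, padding=1)

# 观察同一行上跨越局部特征图边界的输出:在边界附近出现明显跳变
print(out[0, 0, 8, 2:14])

运行该片段可以看到,卷积输出在局部特征图边界两侧出现明显的数值跳变,这正是上文所述不连续性的来源。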
在一个实施例中,上述三维模型重建方法能够基于三维重建模型(可以是基于深度学习的神经网络,该三维重建模型能够基于输入图像重建得到三维模型)实现。具体地,计算机设备将待重建三维模型的图像输入已训练的三维重建模型,三维重建模型提取输入图像的图像特征系数。或者,计算机设备将待重建三维模型的图像对应的图像特征系数输入已训练的三维重建模型。在得到图像特征系数后,三维重建模型根据图像特征系数分别生成输入图像基于纹理和形状的全局特征图和初始局部特征图,对初始局部特征图进行边缘平滑处理,得到目标局部特征图,分别基于纹理和形状,对全局特征图和目标局部特征图进行拼接,得到目标纹理图像和目标形状图像,根据目标纹理图像和目标形状图像进行三维模型重建处理,得到输入图像对应的目标三维模型。三维重建模型输出目标三维模型。
本发明实施例的三维模型重建方法中,根据图像特征系数分别获取输入图像基于纹理和形状的全局特征图和初始局部特征图;对初始局部特征图进行边缘平滑处理得到目标局部特征图,该目标局部特征图的边缘区域处于平滑状态;分别基于纹理和形状,对全局特征图和目标局部特征图进行拼接,得到边缘平滑的目标纹理图像和目标形状图像;根据目标纹理图像和目标形状图像重建得到目标三维模型,有效减少了纹理图像和形状图像的不连续问题,抑制了重建得到的目标三维模型的畸变问题。
在一个实施例中,所述对所述初始局部特征图进行边缘平滑处理,得到目标局部特征图,包括:获取所述初始局部特征图的边界;获取所述初始局部特征图中各个像素点与所述边界之间的距离;根据所述距离对所述初始局部特征图进行边缘平滑处理,得到所述目标局部特征图。
其中,初始局部特征图的边界可以指初始局部特征图最外侧的边界线。由于初始局部特征图在不同方向都可以对应有边界,因此边界的数量可能不止一个(例如,对于方框边界,则存在四条边界)。基于此,初始局部特征图中各个像素点与边界之间的距离可以指各个像素点离最近的边界之间的距离。
本实施例,根据各像素点与边界之间的距离对初始局部特征图进行边缘平滑处理。平滑处理可以是将各个像素点的特征值(该特征值可以是像素点的RGB值等颜色值,也可以是亮度值等)按照一定梯度进行调整,例如:将图像中心到图像边界的像素点的颜色设置为越来越深。
在一个实施例中,可以是不同距离大小进行不同梯度的平滑处理,例如,在某一方向上,对远距离像素点(距离边界更远的像素点)进行程度更高的特征值调整,对近距离像素点进行程度更低的特征值调整。
在一个实施例中,可以通过预先构建的边缘平滑处理模型来对初始局部特征图进行特征学习,确定初始局部特征图的边界,确定初始局部特征图中各个像素点与边界之间的距离,进而根据该距离对初始局部特征图进行边缘平滑处理,得到目标局部特征图,以在保证精度的情况下解决卷积阶段造成的不连续性。
上述实施例,确定初始局部特征图中各像素点与边界之间的距离,根据该距离进行边缘平滑处理,能使得不同距离的像素点具有不同的特征值,实现边缘平滑过渡的效果,使得卷积核在从全局特征图对应的位置移动到局部特征图对应的位置时有一个平缓的过渡,防止出现局部特征图在拼接以后不连续的情况。
在一个实施例中,所述根据所述距离对所述初始局部特征图进行边缘平滑处理,得到所述目标局部特征图,包括:根据所述距离获取所述初始局部特征图的边缘区域;根据所述距离确定所述边缘区域的各个像素点对应的特征权重值,以使远距离像素点对应的特征权重值大于近距离像素点对应的特征权重值;根据所述初始局部特征图的各个像素点对应的特征权重值生成渐变平滑特征图;所述初始局部特征图的边缘区域之外的像素点对应的特征权重值为预设权重值,所述渐变平滑特征图中各个像素点的特征值根据对应的特征权重值得到;将所述渐变平滑特征图中各个像素点的特征值和所述初始局部特征图中对应像素点的特征值进行相乘,根据相乘结果得到所述目标局部特征图。
在初始局部特征图不止一个时,可以针对性地确定对应的渐变平滑特征图。以初始局部特征图包括左眼(left eye)局部特征图、右眼(right eye)局部特征图、鼻子(nose)局部特征图和嘴部(mouth)局部特征图为例,可以针对性地确定左眼渐变平滑特征图、右眼渐变平滑特征图、鼻子渐变平滑特征图和嘴部渐变平滑特征图。
以初始局部特征图为左眼局部特征图、右眼局部特征图、鼻子局部特征图和嘴部局部特征图为例,渐变平滑特征图f i可以表示为以下分段形式:
f i(h,w)=1,当λ i<h<h i-λ i且λ i<w<w i-λ i;对于其他(h,w),f i(h,w)根据该点离最近边界的距离k按线性或阶梯性的方式渐变取值,k越大取值越大。
其中,i∈(nose,mouth,left eye,right eye),f i的尺寸与对应的初始局部特征图L i(包括:L nose,L mouth,L left eye,L right eye)的尺寸相同;h表示渐变平滑特征图中某个像素点相对于参考点(可以是特征图左下角所在的点)之间的纵轴距离;w表示渐变平滑特征图中某个像素点相对于参考点之间的横轴距离;(h,w)表示渐变平滑特征图中的某个像素点;k为(h,w)点离最近边界的距离;λ i表示某个渐变平滑特征图的评价系数,用于表征边缘区域宽度,其大小可以根据对应初始局部特征图的尺寸来确定,例如,按高h i与宽w i的一定比例取值。
h i表示某个初始局部特征图的高;w i表示某个初始局部特征图的宽;λ i<h<h i-λ i且λ i<w<w i-λ i表示对应初始局部特征图中的非边缘区域(也可以称为中心区域),非边缘区域的特征权重值设置为1;other(h,w)表示对应初始局部特征图中的边缘区域,根据各像素点与边界的距离按照线性或者阶梯性的方式确定边缘区域的特征权重值,距离越大特征权重值越大。
通过阶梯性的方式确定特征权重值,根据该特征权重值得到的渐变平滑特征图如图6所示,其中,外侧实线框表示边界601,外侧实线框与内侧实线框之间的区域表示边缘区域,该边缘区域包括3个梯度的区域602、603和604,区域602、603和604中像素点的灰度值逐渐由低变高,形成一个边缘逐渐过渡变化的渐变平滑特征图。
在一个实施例中,对于特征权重值的设置,可以在[0,1]的范围内进行权重值的设置,例如,边缘区域中距离边界最远的最远距离像素点的特征权重值设置为0.9,距离边界最近的最近距离像素点的特征权重值设置为0.1。另外,初始局部特征图中边缘区域之外的区域可以称之为非边缘区域,非边缘区域的特征权重值可以为1,即,不对非边缘区域进行特征值的调整。在一个实施例中,根据特征权重值生成渐变平滑特征图可以是:构建一个与初始局部特征图尺寸相同的空白特征图,该空白特征图中各个像素点的初始特征值为0,将所计算的特征权重值作为对应像素点新的特征值,在更新完各个像素点的特征值之后,即得到渐变平滑特征图。
在得到渐变平滑特征图之后,将渐变平滑特征图与初始局部特征图中对应位置的像素点的特征值进行相乘。以左眼为例,在进行特征值相乘时,可以将左眼渐变平滑特征图中各个像素点的特征值和左眼局部特征图中对应像素点的特征值进行相乘,根据相乘结果得到左眼目标局部特征图。另外,以特征值为灰度值为例,对于某个位置,其在渐变平滑特征图对应像素点a1,在初始局部特征图中对应像素点a2,假设a1的特征值为0.5,a2的特征值为200,则将0.5乘以200得到新的灰度值100,这个灰度值100就是目标局部特征图中对应像素点的特征值。
在一个实施例中,图7示出了边缘平滑处理的效果示意图,其中,图7(a)表示初始局部特征图,图7(b)表示渐变平滑特征图。从图7(a)中可以看出,初始局部特征图的边缘较为明显,跳跃性较强,如果直接将其与全局特征图进行拼接则容易出现边缘不连续的问题。图7(b)中的边缘区域进行了边缘平滑处理,边缘平缓过渡,呈现中心强边缘弱的效果,作用强度由中心向边缘逐步递减。将渐变平滑特征图与初始局部特征图中对应像素点进行相乘运算后,得到如图7(c)所示的目标局部特征图。可以看出,图7(c)中的目标局部特征图边缘过渡平缓,将其与全局特征图进行拼接后,能得到边缘连续的目标图像。
上述实施例,根据距离对边缘区域内各个像素点设置对应的特征权重值,据此生成渐变平滑特征图,进而通过渐变平滑特征图对初始局部特征图中对应像素点的特征值进行调整,调整之后得到目标局部特征图。通过简单直观的方法就实现了对局部特征图的边缘平滑处理,能快速得到边缘平滑的目标局部特征图,进而解决局部特征图和全局特征图进行拼接和卷积之后产生的不连续边缘,减少最终目标图像的畸变问题。
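按照上述描述,渐变平滑特征图的生成与相乘过程可以用如下示意性代码表示(其中边缘区域宽度lam的取值和线性渐变方式均为假设,实际实现也可以采用阶梯性渐变):

import numpy as np

def gradient_smooth_mask(h_i, w_i, lam):
    """按到最近边界的距离k生成渐变平滑特征图:
    非边缘区域(k大于等于lam)权重为1,边缘区域按k/lam线性渐变。"""
    rows = np.arange(h_i).reshape(-1, 1)
    cols = np.arange(w_i).reshape(1, -1)
    # 每个像素点到上下左右四条边界的最小距离k
    k = np.minimum(np.minimum(rows, h_i - 1 - rows),
                   np.minimum(cols, w_i - 1 - cols))
    # 这里采用线性渐变;按上文描述也可以改为阶梯式渐变
    return np.clip(k / float(lam), 0.0, 1.0)

# 假设左眼初始局部特征图尺寸为24x32,边缘区域宽度lam取6(均为假设值)
local_eye = np.random.rand(24, 32).astype(np.float32)   # 用随机值代替真实特征值
mask = gradient_smooth_mask(24, 32, lam=6)
target_local_eye = mask * local_eye                      # 逐像素相乘得到目标局部特征图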
在一个实施例中,所述输入图像为脸部图像;所述目标局部特征图包括脸部关键部位对应的关键部位特征图;所述分别基于纹理和形状,对所述全局特征图和所述目标局部特征图进行拼接,得到目标纹理图像和目标形状图像,包括:分别对各个关键部位特征图的外部区域进行特征值填充,以得到与所述全局特征图尺寸相同的填充关键部位特征图;将各个所述填充关键部位特征图进行合并,得到脸部器官特征图;分别基于纹理和形状,对所述全局特征图和所述脸部器官特征图进行拼接,得到所述目标纹理图像和所述目标形状图像。
其中,脸部可以指人脸、动物的脸等。在一个实施例中,脸部关键部位可以是眼睛(包括左眼和右眼)、鼻子、嘴部、耳朵等。图8为一个实施例中关键部位特征图的特征值填充和合并的效果示意图;分别对鼻子、嘴部、左眼和右眼对应的关键部位特征图进行特征值填充(例如填充为0)之后得到的填充关键部位特征图(鼻子、嘴部、左眼和右眼)可以分别如图8中的(a)、(b)、(c)和(d)所示,其中,801/802/803/804分别表示鼻子、嘴部、左眼和右眼所在的区域。对这些填充关键部位特征图进行合并,将鼻子、嘴部、左眼和右眼所在的区域801、802、803和804整合在一起,可以得到如图8(e)所示的脸部器官特征图。
上述实施例,通过对关键部位特征图进行特征值填充,得到与全局特征图尺寸一致的填充关键部位特征图,同一尺寸的全局特征图和填充关键部位特征图能够方便地叠加在一起,以便卷积核进行卷积处理,进而得到脸部器官特征图;该脸部器官特征图能够与全局特征图进行有效的拼接,生成边缘连续的目标纹理图像和目标形状图像。
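作为特征值填充和合并的一个示意性片段(各关键部位特征图的尺寸及其在全局特征图中的放置位置均为假设值),可以先将各关键部位特征图放入与全局特征图同尺寸、其余位置补0的特征图中,再逐元素相加得到脸部器官特征图:

import torch

def pad_to_global(part_map, global_h, global_w, top, left):
    """将关键部位特征图放入一张与全局特征图同尺寸、其余位置为0的特征图中。"""
    c, h, w = part_map.shape
    padded = torch.zeros((c, global_h, global_w))
    padded[:, top:top + h, left:left + w] = part_map
    return padded

global_h, global_w, c = 192, 224, 8          # 全局特征图尺寸与通道数(通道数为假设值)
parts = {                                    # 各部位特征图尺寸与放置位置均为假设值
    "nose":      (torch.rand(c, 48, 40), 80, 92),
    "mouth":     (torch.rand(c, 40, 64), 136, 80),
    "left_eye":  (torch.rand(c, 32, 48), 56, 40),
    "right_eye": (torch.rand(c, 32, 48), 56, 136),
}

padded_parts = [pad_to_global(m, global_h, global_w, t, l) for (m, t, l) in parts.values()]
face_organ_map = sum(padded_parts)           # 逐元素相加,合并得到脸部器官特征图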
在一个实施例中,所述输入图像为脸部图像;所述目标局部特征图包括左眼特征图、右眼特征图、鼻子特征图和嘴部特征图;所述分别基于纹理和形状,对所述全局特征图和所述目标局部特征图进行拼接,得到目标纹理图像和目标形状图像,包括:对所述左眼特征图、所述右眼特征图、所述鼻子特征图和所述嘴部特征图的外部区域进行特征值填充,以得到与所述全局特征图尺寸相同的目标左眼特征图、目标右眼特征图、目标鼻子特征图和目标嘴部特征图;对所述目标左眼特征图、所述目标右眼特征图、所述目标鼻子特征图和所述目标嘴部特征图进行合并,得到脸部器官特征图;分别基于纹理和形状,对所述全局特征图和所述脸部器官特征图进行拼接,得到所述目标纹理图像和所述目标形状图像。
上述实施例,得到脸部图像中的左眼、右眼、鼻子和嘴部对应的特征图,对这些特征图进行合并以得到脸部器官特征图,并将脸部器官特征图与全局特征图进行拼接,得到准确可靠的目标纹理图像和目标形状图像。
在一个实施例中,可以对脸部器官特征图进行边缘平滑处理,得到目标脸部器官特征图,基于纹理和形状对全局特征图和目标脸部器官特征图进行拼接,得到目标纹理图像和目标形状图像。其中,参照图9,图9为一个实施例中得到目标脸部器官特征图的过程示意图,对脸部器官特征图进行边缘平滑处理得到目标脸部器官特征图的实现过程可以如下:构建与脸部器官特征图对应的器官渐变平滑特征图【如图9(a)所示】,获取脸部器官特征图【如图9(b)所示】,将器官渐变平滑特征图和脸部器官特征图对应像素点的特征值进行相乘处理,得到如图9(c)所示目标脸部器官特征图。
上述实施例,通过对脸部器官特征图进行边缘平滑处理,能得到边缘渐变的目标脸部器官特征图,有效弱化目标纹理图像和目标形状图像边缘的割裂感,使得最终得到的目标三维模型边缘连续,有效降低三维模型畸变的出现概率。
在一个实施例中,所述全局特征图包括全局纹理特征图和全局形状特征图,所述目标局部特征图包括局部纹理特征图和局部形状特征图;所述分别基于纹理和形状,对所述全局特征图和所述目标局部特征图进行拼接,得到目标纹理图像和目标形状图像,包括:基于纹理对所述全局特征图和所述目标局部特征图进行拼接得到目标纹理图像;基于形状对所述全局特征图和所述目标局部特征图进行拼接得到目标形状图像。
在一个实施例中,基于纹理对所述全局特征图和所述目标局部特征图进行拼接得到目标纹理图像,包括:对所述全局纹理特征图和局部纹理特征图进行拼接,对拼接得到的特征图进行卷积以对全局纹理特征和局部纹理特征进行整合,得到所述目标纹理图像。
基于形状对所述全局特征图和所述目标局部特征图进行拼接得到目标形状图像,包括:对所述全局形状特征图和局部形状特征图进行拼接,对拼接得到的特征图进行卷积以对全局形状特征和局部形状特征进行整合,得到所述目标形状图像。
上述实施例,分别基于纹理和形状,对全局特征图和局部特征图进行拼接,并对拼接之后的特征图进行卷积,能够得到融合全局特征和局部特征的目标图像,据此根据目标图像重建得到目标三维模型,该目标三维模型融合了多方面的信息,能够更为全面地表征图像信息,尽可能与输入图像相对应,实现三维模型的可靠重建。
在一个实施例中,全局特征图和局部特征图可以不止一层,而是由多层特征图组成。在这种情况下,全局特征图和局部特征图需要拼接后再利用一个卷积模块才能将全局信息和局部信息整合在一起。在一个实施例中,所述全局特征图和所述目标局部特征图均对应有至少一个特征通道;所述分别基于纹理和形状,对所述全局特征图和所述目标局部特征图进行拼接,得到目标纹理图像和目标形状图像,包括:分别基于纹理和形状,在各个所述特征通道内对所述全局特征图和所述目标局部特征图进行拼接,对各个所述特征通道拼接得到的特征图进行卷积以对全局特征和局部特征进行整合,得到目标纹理图像和目标形状图像。
多个特征通道的特征图处理流程(基于纹理和基于形状的处理流程都可以通过该图的处理流程实现,即纹理和形状的处理流程可以一样)如图10所示。具体地,分别获取三个特征通道上的全局特征图和初始局部特征图(左眼、右眼、鼻子以及嘴巴对应的初始局部特征图),对这些特征通道上的初始局部特征图分别进行边缘平滑处理,得到三个特征通道上的目标局部特征图,对各个特征通道上的全局特征图和目标局部特征图进行拼接得到这三个特征通道上的已拼接特征图1001,将这些已拼接特征图1001按序叠放在一起之后输入卷积模块中进行卷积,得到目标图像。
上述实施例,基于特征通道对全局特征图和局部特征图进行拼接并进行卷积处理,能充分融合多个特征通道上的全局特征和局部特征,使得所得到的目标图像更为全面地反映出输入图像的特征,最终得到更为准确的目标三维模型。
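按通道维拼接并经卷积整合的过程可以参考如下示意性片段(通道数、卷积层数量和输出通道数均为假设值,纹理分支与形状分支可各自使用一份结构相同的卷积模块):

import torch
import torch.nn as nn

c = 8                                           # 单个特征图的通道数(假设值)
global_map = torch.rand(1, c, 192, 224)         # 全局特征图
face_organ_map = torch.rand(1, c, 192, 224)     # 经边缘平滑并合并后的脸部器官特征图

# 按通道维拼接,再用一个卷积层整合全局特征和局部特征
fuse_conv = nn.Conv2d(in_channels=2 * c, out_channels=3, kernel_size=3, padding=1)
concat = torch.cat([global_map, face_organ_map], dim=1)    # 形状: (1, 2c, 192, 224)
target_texture = fuse_conv(concat)                          # 输出目标纹理图像(3通道为假设值)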
在一个实施例中,可以基于纹理和形状分别对全局特征图和局部特征图进行拼接和卷积处理,以得到对应的目标纹理图像和目标形状图像。
在一个实施例中,对于纹理,所述对所述全局纹理特征图和局部纹理特征图进行拼接,对拼接得到的特征图进行卷积以对全局纹理特征和局部纹理特征进行整合,得到所述目标纹理图像,包括:在各个所述特征通道内对所述全局纹理特征图和所述局部纹理特征图进行拼接,对各个所述特征通道拼接得到的特征图进行卷积以对全局纹理特征和局部纹理特征进行整合,得到所述目标纹理图像。上述实施例,按照通道维度对全局纹理特征图和局部纹理特征图进行拼接,对拼接后的特征图进行卷积,充分融合了多个特征通道上的全局纹理特征和局部纹理特征,能得到全面准确的目标纹理图像。
在一个实施例中,对于形状,所述全局特征图和所述目标局部特征图均对应有至少一个特征通道,所述对所述全局形状特征图和局部形状特征图进行拼接,对拼接得到的特征图进行卷积以对全局形状特征和局部形状特征进行整合,得到所述目标形状图像,包括:在各个所述特征通道内对所述全局形状特征图和所述局部形状特征图进行拼接,对各个所述特征通道拼接得到的特征图进行卷积以对全局形状特征和局部形状特征进行整合,得到所述目标形状图像。上述实施例,按照通道维度对全局形状特征图和局部形状特征图进行拼接,对拼接后的特征图进行卷积,充分融合了多个特征通道上的全局形状特征和局部形状特征,能得到全面准确的目标形状图像。
在一个实施例中,可以通过自编码器得到图像特征系数。其中,自编码器(Autoencoder)是一种旨在将输入复制到输出的神经网络。具体的,该自编码器可以是卷积自编码器。卷积自编码器是采用卷积层代替全连接层,对输入的特征进行降采样以提供较小维度的潜在表示,并强制自编码器学习输入特征的压缩版本,可以得到对图像特征进行表征的系数。
在一个实施例中,所述获取输入图像的图像特征系数,包括:通过卷积自编码器对所述输入图像进行逐层卷积处理;根据逐层卷积处理的结果得到所述输入图像的纹理特征系数和形状特征系数,作为所述图像特征系数。
其中,卷积自编码器的结构可以如图11所示,该卷积自编码器由多个卷积层构成,这些卷积层对输入图像进行逐层卷积处理,以对输入图像的特征进行降采样并逐层分析,进而根据逐层卷积处理的结果得到纹理特征系数f a和形状特征系数f s,作为图像特征系数。
具体的,在一个实施例中,所述卷积自编码器包括解码器;所述通过卷积自编码器对所述输入图像进行逐层卷积处理,根据逐层卷积处理的结果得到所述图像特征系数,包括:通过所述卷积自编码器对所述输入图像进行逐层卷积处理;由所述解码器根据逐层卷积处理的结果得到所述输入图像的纹理特征系数和形状特征系数,作为所述图像特征系数。
卷积自编码器通过解码器对每一个卷积层进行输出,如图11右侧输出大小所示即为解码器所输出结果的大小。第53层卷积可以得到二维的输出结果,即右侧的7×7×(f s+f a+64)输出。再进行平均池化(AvgPool)处理,就可以得到f s和f a的值,即得到纹理特征系数和形状特征系数。
上述实施例借助自编码器来得到图像特征系数,能通过逐层卷积的方式充分挖掘输入图像的图像特征,进而得到准确的图像特征系数,以便得到准确的全局特征图和局部特征图。
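下面给出一个通过逐层卷积和平均池化得到纹理特征系数与形状特征系数的示意性片段(网络层数、通道数和系数维度均为简化后的假设值,并非图11所示的完整53层结构):

import torch
import torch.nn as nn

f_s_dim, f_a_dim, extra_dim = 128, 128, 64      # 形状/纹理系数维度与其余系数维度(假设值)

encoder = nn.Sequential(                         # 简化的逐层卷积,仅为示意
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(128, f_s_dim + f_a_dim + extra_dim, 3, stride=2, padding=1),
)

image = torch.rand(1, 3, 224, 224)               # 输入图像
feat = encoder(image)                             # 形状: (1, f_s+f_a+64, 14, 14)
pooled = feat.mean(dim=(2, 3))                    # 平均池化(AvgPool)
f_s, f_a = pooled[:, :f_s_dim], pooled[:, f_s_dim:f_s_dim + f_a_dim]   # 切分得到形状系数与纹理系数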
在一个实施例中,可以通过局部细节增强模块来对输入图像进行特征分析和解码,该局部细节增强模块可以包括全局解码器和局部解码器,通过这两种解码器可以得到全局特征图和初始局部特征图。在一个实施例中,所述根据所述图像特征系数,分别获取所述输入图像基于纹理和形状的全局特征图和初始局部特征图,包括:由全局解码器中的反卷积层根据所述图像特征系数对所述输入图像进行特征解码,得到所述全局特征图;由局部解码器中的反卷积层根据所述图像特征系数对所述输入图像进行特征解码,得到所述初始局部特征图。
其中,全局解码器和局部解码器均可以由至少一个反卷积层构成,各个反卷积层的卷积核尺寸可以相同也可以不同。另外,各个特征通道的全局特征图的尺寸可以相同,而不同部位的初始局部特征图可以相同也可以不同。
上述实施例,由全局解码器对输入图像解码得到全局特征图,由局部解码器对输入图像解码得到初始局部特征图,即通过全局解码器和局部解码器这两个分支分别获取输入图像的整体特征和局部特征,之后基于整体特征和局部特征整合得到目标图像,能够使得目标图像尽可能还原输入图像的信息,以便后续基于目标图像实现可靠的图像重建效果。
在一个实施例中,所述局部解码器包括脸部关键部位解码器;所述由局部解码器中的反卷积层根据所述图像特征系数对所述输入图像进行特征解码,得到所述初始局部特征图,包括:由所述脸部关键部位解码器中的反卷积层,根据所述图像特征系数对所述输入图像进行特征解码,将解码得到的关键部位特征图确定为所述初始局部特征图。
其中,脸部关键部位解码器可以不止一个,同时每个脸部关键部位可以对应至少一个解码器,例如,包括左眼解码器、右眼解码器、鼻子解码器、嘴部解码器等。这些脸部关键部位解码器分别对输入图像进行特征解码并得到对应的初始局部特征图。
在某些实施例中,这些脸部关键部位解码器可以通过对应的眼睛、鼻子、嘴巴的图片预先训练得到。
上述实施例,通过脸部关键部位解码器来针对性地进行特征解码,以得到局部特征清晰的初始局部特征图。
在一个实施例中,所述脸部关键部位解码器包括左眼解码器、右眼解码器、鼻子解码器和嘴部解码器;所述由所述脸部关键部位解码器中的反卷积层,根据所述图像特征系数对所述输入图像进行特征解码,将解码得到的关键部位特征图确定为所述初始局部特征图,包括:由所述左眼解码器中的反卷积层,根据所述图像特征系数对所述输入图像进行特征解码,得到左眼特征图;由所述右眼解码器中的反卷积层,根据所述图像特征系数对所述 输入图像进行特征解码,得到右眼特征图;由所述鼻子解码器中的反卷积层,根据所述图像特征系数对所述输入图像进行特征解码,得到鼻子特征图;由所述嘴部解码器中的反卷积层,根据所述图像特征系数对所述输入图像进行特征解码,得到嘴部特征图;将所述左眼特征图、所述右眼特征图、所述鼻子特征图和所述嘴部特征图,确定为所述初始局部特征图。
在一个实施例中,所述根据所述图像特征系数,分别获取所述输入图像基于纹理和形状的全局特征图和初始局部特征图,包括:由全局解码器中的反卷积层根据纹理特征系数和形状特征系数对所述输入图像进行特征解码,得到所述全局纹理特征图和全局形状特征图;由局部解码器中的反卷积层根据纹理特征系数和形状特征系数对所述输入图像进行特征解码,得到初始纹理局部特征图和初始形状局部特征图。
在一个实施例中,可以对初始纹理局部特征图和初始形状局部特征图进行边缘平滑处理,以得到局部纹理特征图和局部形状特征图,基于所得到的局部纹理特征图和局部形状特征图与对应的全局特征图进行拼接,得到对应的目标纹理图像和目标形状图像。
在一个实施例中,全局解码器D g由13个反卷积层构成,每个反卷积层的卷积核尺寸为3*3,其输出可以为c′*h′*w′的全局特征图G,其中,c′为特征通道数,h′和w′分别为全局特征图的高和宽,h'*w'可以为192*224,纹理系数f a和形状系数f s作为全局解码器的输入,得到全局纹理特征图T G和全局形状特征图S G
局部解码器D l由4个局部解码模块:
Figure PCTCN2021112089-appb-000003
组成,以解码鼻子、嘴巴、左眼和右眼区域。每个局部解码模块都包含10个反卷积层。
Figure PCTCN2021112089-appb-000004
Figure PCTCN2021112089-appb-000005
对应的输出分别为L nose,L mouth,L lefteye,L righteye,它们对应的输出尺寸分别为c′*h nose*w nose,c′*h mouth*w mouth,c′*h lefteye*w lefteye,c′*h righteyere*w righteye。其中,h nose、h mouth、h lefteye和h righteye分别表示鼻子、嘴巴、左眼和右眼特征图的高,w nose、w mouth、w lefteye和w righteye分别表示鼻子、嘴巴、左眼和右眼特征图的宽。
全局解码器和局部解码器按照上述设置,分别输出对应尺寸的全局特征图和各关键部位的局部特征图。
上述实施例,通过全局解码器解码得到全局特征图,通过脸部关键部位解码器解码得到脸部关键部位特征图,进而进行特征图的拼接和卷积,以得到目标图像,既能获取到反映输入图像整体脸部特征的全局特征图,也能针对性地获取到反映局部信息的脸部关键部位特征图,使得所得到的目标图像足够全面和准确。
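全局解码器与局部解码模块的反卷积结构可以参考如下示意性片段(其中各层的通道变化、步长以及起始特征图尺寸均为假设值,仅体现"全局解码器含13个反卷积层、局部解码模块含10个反卷积层"这一层数设置,输出尺寸不一定与上文一致):

import torch
import torch.nn as nn

def deconv_decoder(in_dim, out_channels, num_layers):
    """由若干反卷积层构成的解码器:先放大一次空间尺寸,再逐层细化特征。"""
    layers = [nn.ConvTranspose2d(in_dim, 256, kernel_size=3, stride=2,
                                 padding=1, output_padding=1), nn.ReLU()]
    ch = 256
    for _ in range(num_layers - 2):
        layers += [nn.ConvTranspose2d(ch, max(ch // 2, 32), 3, stride=1, padding=1), nn.ReLU()]
        ch = max(ch // 2, 32)
    layers += [nn.ConvTranspose2d(ch, out_channels, 3, stride=1, padding=1)]
    return nn.Sequential(*layers)

coeff = torch.rand(1, 256, 6, 7)                          # 由f a、f s变换得到的起始特征图(假设尺寸)
global_decoder = deconv_decoder(256, 8, num_layers=13)    # 全局解码器:13个反卷积层
nose_decoder = deconv_decoder(256, 8, num_layers=10)      # 鼻子局部解码模块:10个反卷积层
global_feature = global_decoder(coeff)
nose_feature = nose_decoder(coeff)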
在一个实施例中,本发明实施例生成的目标三维模型可作为人脸融合(换脸应用)研究的成对训练数据。融合图(用户人脸)和模板图换脸后得到结果图,该结果图的人脸2D特征点就是所谓的成对训练数据。可以将模板图中的每一帧和用户人脸分别建模,然后迁移模板图中每一帧的姿态和表情到用户人脸上,可以得到换脸后每一帧对应的三维模型,从而从三维模型对应的2D图中得到成对训练数据,以便后续对换脸模型进行训练。
其中,换脸模型可以是基于深度学习的网络神经模型,它能够将某一脸部中的部分特征替换成其他脸部特征的模型。例如,将A和B两个脸部图像输入到换脸模型中,该换脸模型从A中获取全局特征同时从B中获取局部特征,进而根据这些特征进行三维模型重建,得到包含A的全局特征和B的局部特征的脸部模型C。因为局部特征包含有脸部器官更为细节的信息,能够更为明显地表征脸部特征,因此此时可以理解为将脸部B替换到脸部A上。
具体的,在一个实施例中,所述输入图像为脸部图像;所述根据所述目标纹理图像和所述目标形状图像进行三维模型重建处理,得到目标三维模型之后,还包括:获取所述目标三维模型对应的二维图像的全局特征点,得到目标特征点;获取模板三维模型;获取所述模板三维模型对应的二维图像的局部特征点,得到模板特征点;将所述目标特征点和所述模板特征点作为成对数据输入到换脸模型中,以使所述换脸模型输出已换脸三维模型;所述已换脸三维模型包含所述目标三维模型中的全局特征且包含所述模板三维模型中的局部特征。
其中,模板三维模型可以根据模板脸部图像获取得到。具体的,将模板脸部图像输入三维重建模型中,该三维重建模型输出的三维模型即为模板三维模型。由于模板三维模型是由三维重建模型生成的,因此,通过这种方式,三维重建模型能够直接快速地输出模板三维模型的局部特征点。在一些实施例中,也可以获取预先配置的模板三维模型,通过对模板三维模型进行特征点提取,以得到模板三维模型的局部特征点。
上述实施例,从模板三维模型中获取局部特征点,从目标三维模型中获取全局特征点,并将这两种特征点成对输入到换脸模型中,由换脸模型得到已换脸三维模型,准确地实现换脸。
在一个实施例中,可以对已换脸三维模型进行图像渲染处理,得到已换脸图像。图12为一个实施例中得到已换脸图像的过程示意图,如图12所示,通过输入图像和模板脸部图像的整合、渲染之后,得到右侧的已换脸图像,该已换脸图像拥有输入图像的姿态和轮廓,同时拥有模板脸部图像的五官形态和表情,实现了“换脸”的效果。
在一个实施例中,所述获取模板三维模型,包括:获取预设的模板脸部图像;获取所述模板脸部图像基于纹理和形状的模板全局特征图和初始模板局部特征图;对所述初始模板局部特征图进行边缘平滑处理,得到目标模板局部特征图;分别基于纹理和形状,对所述模板全局特征图和所述目标模板局部特征图进行拼接,得到模板脸部纹理图像和模板脸部形状图像;根据所述模板脸部纹理图像和所述模板脸部形状图像进行三维模型重建处理,得到所述模板三维模型。
上述实施例,通过三维重建模型对模板脸部图像进行处理,并对初始模板局部特征图进行边缘平滑处理,能得到边缘平滑连续的模板脸部纹理图像和模板脸部形状图像,据此能有效减少重建得到的模板三维模型的畸变,保证换脸应用的正常进行,得到可靠的已换脸三维模型。
本申请提供的三维模型重建方法可以应用到各种三维重建处理场景中,比如,可以应用到图像处理软件、模型重建软件、PS(photoshop)软件、三维动画处理软件(例如:动漫捏脸软件)等等。
本申请还提供一种应用场景,该应用场景应用上述的三维模型重建方法。具体地,图13为一个实施例中三维模型重建方法的流程示意图,参照图13,该三维模型重建方法在该应用场景的应用如下:
向终端的三维重建软件(三维重建软件中搭载有基于深度学习的三维重建模型)中输入一张224*224待处理的人脸图像,以触发三维重建软件对输入的人脸图像进行如下处理,以重建得到目标三维人脸模型:
1、通过一个卷积自编码器生成人脸图像对应的纹理系数f a和形状系数f s
2、将f a和f s输入到结构相同的第一局部细节增强模块和第二局部细节增强模块中。其中,第一局部细节增强模块用于输出2D形状图,第二局部细节增强模块用于输出2D纹理图。这两个局部细节增强模块具体的处理流程见步骤3和4。
3、在局部细节增强模块中,构建解码器D,由一个全局解码器D g和一个局部解码器D l构成。
以下以第一局部细节增强模块为例进行说明(第二局部细节增强模块的实现过程一样,在此不再赘述):
其后,全局解码器D g由13个反卷积层构成,输出全局形状特征图S G
局部解码器D l由4个局部解码模块D l nose、D l mouth、D l lefteye、D l righteye组成,以解码鼻子、嘴巴、左眼和右眼区域,每个局部解码模块都包含10个反卷积层。各局部解码模块根据f a和f s进行特征解码,得到对应的初始局部特征图L i(包括L nose、L mouth、L lefteye、L righteye)。
设定渐变平滑特征图f i,f i的尺寸与初始局部特征图L i相同。渐变平滑特征图中各个像素点的特征值按照前文给出的分段方式设置:非边缘区域(λ i<h<h i-λ i且λ i<w<w i-λ i)的特征权重值为1,边缘区域的特征权重值根据像素点到最近边界的距离k由小到大渐变,取值在0到1之间。
渐变平滑特征图生成后,将其与对应的初始局部特征图的像素点特征值进行相乘处理,得到四个部位对应的边缘渐变的目标局部特征图。将这些目标局部特征图进行合并,得到脸部器官形状特征图S L
由第二局部细节增强模块按照同样的方式得到全局纹理特征图T G和脸部器官纹理特征图T L
4、脸部器官纹理特征图T L与全局纹理特征图T G在通道维度拼接后,由第二局部细节增强模块经过一个卷积层输出2D纹理图T 2D。同理,脸部器官形状特征图S L与全局形状特征图S G在通道维度拼接后,由第一局部细节增强模块经过一个卷积层输出2D形状图S 2D。
5、通过上述得到的2D纹理图T 2D和2D形状图S 2D进行三维模型重建,得到目标三维人脸模型。
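把上述步骤1至5串联起来,整体流程可以用如下示意性函数表示(其中encoder、shape_branch、texture_branch、build_mesh_from_shape、attach_texture均为按上文流程假设的占位组件,由调用方提供,仅示意调用顺序,不代表实际接口):

def reconstruct_face(image, encoder, shape_branch, texture_branch,
                     build_mesh_from_shape, attach_texture):
    """按上文步骤1~5的顺序重建三维人脸模型(示意性流程,所有组件由调用方注入)。"""
    f_a, f_s = encoder(image)                   # 步骤1:得到纹理系数f a与形状系数f s
    shape_2d = shape_branch(f_a, f_s)           # 步骤2~4:第一局部细节增强模块输出2D形状图S 2D
    texture_2d = texture_branch(f_a, f_s)       # 第二局部细节增强模块输出2D纹理图T 2D
    mesh = build_mesh_from_shape(shape_2d)      # 步骤5:由形状图恢复三维空间轮廓
    return attach_texture(mesh, texture_2d)     # 在轮廓表面附加纹理,得到目标三维模型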
如图13所示,输入一张人脸图像之后,终端上的三维重建软件进行三维人脸模型的重建,最终得到目标三维人脸模型。对目标三维人脸模型的分析可以发现,通过上述实施例输出的三维人脸模型完整连续,不存在图像畸变的问题。
具体的,对于图像纹理上的处理效果,图14示出了传统技术和本发明实施例得到的2D纹理图的对比图,其中图14(a)表示传统技术得到的3张2D纹理图,图14(a)中各个人脸纹理均存在一定程度的畸变,图14(b)表示本发明实施例得到的3张2D纹理图(其中,a/b两排图像中,上下对应的两个纹理图像是通过同一输入图像得到的),图14(b)中人脸纹理清晰,不存在畸变问题。接下来再具体比对纹理图像的边缘处理效果,图15示出了传统技术和本发明实施例得到的2D纹理图的边缘效果对比图,其中图15(a)表示传统技术得到的2D纹理图,图15(a)中眼部边缘区域1501存在明显的割裂,图15(b)表示本发明实施例得到的2D纹理图,图15(b)中的眼部边缘区域1502边缘过渡平缓,图15中方框(包括眼部边缘区域1501和1502)的位置即边缘突兀与边缘平滑的对比。通过图14和图15的对比可知,对比传统技术生成的纹理图像,本发明实施例生成的2D纹理图的边缘更加平滑,卷积所带来的不连续性大幅减弱,更加接近原始的皮肤色彩。
对于图像形状上的处理效果,图16示出了传统技术和本发明实施例得到的2D形状图的对比图,其中图16(a)表示传统技术得到的3张2D形状图,图16(a)中3个嘴部边缘区域均存在一定程度的缺失,即边缘不连续,图16(b)表示本发明实施例得到的3张2D形状图(其中,a/b两排图像中,上下对应的两个形状图像是通过同一输入图像得到的),图16(b)中脸部各个区域均连续且过渡较为平滑。
通过对上述图像的分析对比结果可以看出,终端上的三维重建软件通过本发明实施例提供的三维模型重建方法,很好地解决了2D纹理和2D形状畸变的问题,进而可以输出良好的三维重建人脸模型。
在一个实施例中,还提供一种三维重建模型的训练方法,本实施例以该方法应用于计算机设备进行举例说明,计算机设备可以是上述图1中的终端102或服务器104。
具体地,图17为一个实施例中三维重建模型的训练方法的流程示意图,如图17所示,所述方法包括以下步骤:
S1701,获取训练图像的图像特征系数和渲染系数。
训练图像可以是包含各种类型的对象的图像,具体实现方式可以参见前述实施例中的输入图像。
渲染系数可以指能够影响图像渲染过程的系数,可以是光照系数、扭转系数等。其中,以脸部为例,光照系数可以是光照强度、光照角度等对应的系数,扭转系数可以是头部的俯仰角度、侧脸角度等对应的系数。
S1702,将所述图像特征系数输入至基于深度学习的三维重建模型中,以使所述三维重建模型:根据所述图像特征系数,分别获取所述训练图像基于纹理和形状的全局特征图和初始局部特征图,对所述初始局部特征图进行边缘平滑处理,得到目标局部特征图;分别基于纹理和形状,对所述全局特征图和所述目标局部特征图进行拼接,得到目标纹理图像和目标形状图像;根据所述目标纹理图像和所述目标形状图像进行三维模型重建处理,得到预测三维模型。
三维重建模型可以是基于深度学习的神经网络模型。
需要说明的是,三维重建模型得到预测三维模型的具体实现过程可以参见前述三维模型重建方法的实施例,在此不再赘述。
S1703,根据所述渲染系数对所述预测三维模型进行图像渲染处理,得到预测二维图像。
其中,图像渲染是对图像的光线、色彩、角度等参数进行调整的过程。在某些情况下,渲染也可以指对图像的光线、色彩、角度等参数进行调整以及后续进行二维转换的过程,渲染之后直接得到二维图像。
在一个实施例中,S1703的实现过程可以是:调整预测三维模型的光照方向、俯仰角度等,将经过上述调整后的预测三维模型进行二维转换,将得到的二维图像作为预测二维图像。具体地,渲染可以通过非线性渲染法等实现。
S1704,根据所述训练图像和所述预测二维图像的误差对所述三维重建模型进行训练,直到满足收敛条件,得到已训练的三维重建模型。
预测二维图像是对预测三维模型进行图像渲染处理得到的,拥有训练图像中对象(训练图像中的人物、动物、建筑物等)的光照、色彩、角度等渲染信息。因此,该预测二维图像与训练图像的误差中携带有对象的渲染信息,基于此,训练得到的三维重建模型能够对所输入的图像进行可靠重建,以得到携带有渲染信息的目标三维模型。
上述实施例,通过三维重建模型能够尽可能得到不存在畸变问题的预测三维模型,对预测三维模型进行图像渲染得到预测二维图像,该预测二维图像是根据训练图像重建得到的,能基本还原训练图像的图形特征。基于此,根据预测二维图像与训练图像的误差对三维重建模型进行训练,能训练得到准确可靠的三维重建模型。
在一个实施例中,所述根据所述训练图像和所述预测二维图像的误差对所述三维重建模型进行训练,包括:根据所述误差构建三维重建模型的损失函数;对所述损失函数进行梯度下降处理;根据梯度下降处理的结果调整所述三维重建模型的模型参数。
具体的,训练图像和预测二维图像均可以为多个,可以根据这些训练图像和预测二维图像的误差构建三维重建模型的损失函数,进而通过梯度下降法对损失函数进行最小化处理,确定损失函数最小值对应的模型参数,该模型参数对应的三维重建模型即为调整以后的三维重建模型。
上述实施例通过梯度下降法实现对三维重建模型损失函数的处理,能够快速准确地得到损失函数的最小值,进而调整三维重建模型的模型参数,以对三维重建模型进行训练。当损失函数的最小值足够小时,可以认为三维重建模型足够好,此时可以认为满足收敛条件,对应的三维重建模型即为已训练的三维重建模型。
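单步训练的流程可以参考如下示意性片段(其中渲染函数renderer为占位组件,损失采用训练图像与预测二维图像的逐像素L1误差,具体误差形式和渲染方式在实际实现中可以不同):

import torch

def train_step(model, renderer, images, coeffs, render_coeffs, optimizer):
    """单步训练:重建 -> 渲染 -> 计算误差 -> 梯度下降更新参数(示意)。"""
    pred_3d = model(coeffs)                      # 三维重建模型输出预测三维模型
    pred_2d = renderer(pred_3d, render_coeffs)   # 按扭转系数、光照系数渲染出预测二维图像
    loss = (pred_2d - images).abs().mean()       # 以训练图像与预测二维图像的误差构建损失
    optimizer.zero_grad()
    loss.backward()                              # 对损失函数进行梯度下降处理
    optimizer.step()                             # 调整三维重建模型的模型参数
    return loss.item()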
在一个实施例中,所述获取训练图像的图像特征系数和渲染系数,包括:通过卷积自编码器对所述训练图像进行逐层卷积处理;所述卷积自编码器包括解码器和编码器;由所述解码器根据逐层卷积处理的结果得到所述训练图像的纹理特征系数和形状特征系数,作为所述图像特征系数;由所述编码器根据逐层卷积处理的结果得到所述训练图像的扭转系数和光照系数,作为所述渲染系数。
其中,渲染系数包括扭转系数m以及光照系数S。
在一个实施例中,所述卷积自编码器包括编码器;所述根据逐层卷积处理的结果得到所述图像特征系数和所述渲染系数,包括:由所述编码器根据逐层卷积处理的结果得到所述训练图像的扭转系数和光照系数,作为所述渲染系数。
卷积自编码器的结构可以如图11所示。卷积自编码器通过编码器可以获取到上一个卷积层的输出信息,进行卷积处理,图11中间的过滤/步数所示即为编码器的结构。对卷积处理的结果进行平均池化处理后,可以由编码器输出m和S的值,即得到扭转系数和光照系数。
在某些实施例中,卷积自编码器也可以属于三维重建模型的内部组成部分。即,由三维重建模型中的卷积自编码器得到图像特征系数和渲染系数。
上述实施例借助卷积自编码器来得到图像特征系数和渲染系数,能通过逐层卷积的方式充分挖掘训练图像的图像特征,基于深度学习的方法得到准确的图像特征系数和渲染系数,能保证后续程序的可靠运行,进而得到准确的目标三维模型。
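编码器输出扭转系数m与光照系数S的方式可以参考如下示意性片段(m与S的维度、上一卷积层的通道数均为假设值,核心是对卷积结果做平均池化后映射并切分为对应系数):

import torch
import torch.nn as nn

m_dim, s_dim = 6, 27                      # 扭转系数与光照系数的维度(假设值)

head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),              # 平均池化(AvgPool)
    nn.Flatten(),
    nn.Linear(256, m_dim + s_dim),        # 由池化结果映射为m和S
)

conv_feat = torch.rand(1, 256, 7, 7)      # 上一卷积层的输出(通道数与尺寸为假设值)
out = head(conv_feat)
m, S = out[:, :m_dim], out[:, m_dim:]     # 切分得到扭转系数m与光照系数S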
在一个实施例中,提供一种三维重建模型的训练方法,本实施例以该方法应用于图1中的终端进行举例说明。实现过程如图18所示,图18为一个实施例中三维重建模型的训练方法的流程示意图。具体实现过程如下:
第一,终端获取训练人脸图像,通过卷积自编码器生成训练人脸图像的纹理特征系数f s、形状特征系数f a、扭转系数m以及光照系数S。
第二,将f a和f s输入三维重建模型中,该三维重建模型包含结构相同的第一局部细节增强模块和第二局部细节增强模块。通过第一局部细节增强模块和第二局部细节增强模块分别得到2D形状图和2D纹理图。三维重建模型根据2D形状图和2D纹理图生成预测三维人脸模型。
第三,终端触发渲染模块根据扭转系数m以及光照系数S对预测三维人脸模型进行渲染,得到预测二维人脸图像。
第四,终端基于训练人脸图像和预测二维人脸图像的误差来对三维重建模型进行调整。在满足收敛条件时,得到已训练的三维重建模型,该模型能够基于输入的二维图像重建得到三维模型。
上述实施例,通过三维重建模型得到预测三维模型,对预测三维模型进行图像渲染得到预测二维图像,该预测二维图像是根据训练图像重建得到的,能基本还原训练图像的图形特征。基于此,将预测三维模型对应的预测二维图像与输入的训练图像进行比对,比对结果能够起到对三维重建模型的重建效果的反馈作用,根据比对结果对三维重建模型进行训练,能训练得到准确可靠三维重建模型。
本申请还提供一种应用场景,该应用场景应用上述的三维重建模型的训练方法和三维模型重建方法。具体地,这些方法在该应用场景的应用如下:
终端接收到多张训练人脸图像,将这些训练人脸图像输入到模型训练软件中。由该模型训练软件获取训练人脸图像的纹理特征系数f sX、形状特征系数f aX、扭转系数m X以及光照系数S X。模型训练软件将纹理特征系数f sX、形状特征系数f aX输入到三维重建模型中。由该三维重建模型根据纹理特征系数f sX、形状特征系数f aX,分别获取训练人脸图像的全局纹理特征图T GX、全局形状特征图S GX、局部纹理特征图T LXO和局部形状特征图S LXO,对局部纹理特征图T LXO和局部形状特征图S LXO进行边缘平滑处理,得到目标局部纹理特征图T LX和目标局部形状特征图S LX;对全局纹理特征图T GX和目标局部纹理特征图T LX进行拼接,得到目标纹理图像T X,对全局形状特征图S GX和目标局部形状特征图S LX进行拼接,得到目标形状图像S X;根据目标纹理图像T X和目标形状图像S X进行三维模型重建处理,得到预测三维人脸模型。模型训练软件根据扭转系数m X以及光照系数S X对预测三维人脸模型进行图像渲染处理,得到预测二维人脸图像;根据训练人脸图像和预测二维人脸图像的误差构建损失函数,对该损失函数运行梯度下降算法,在梯度下降算法的结果满足收敛条件时,得到已训练的三维重建模型。
之后,模型训练软件可以将已训练的三维重建模型输出至三维重建软件中。以使三维重建软件在接收到输入人脸图像时通过该已训练的三维重建模型实现以下步骤:根据输入人脸图像的纹理特征系数f sY和形状特征系数f aY分别获取输入人脸图像的全局纹理特征图T GY、全局形状特征图S GY、局部纹理特征图T LYO和局部形状特征图S LYO,对局部纹理 特征图T LYO和局部形状特征图S LYO进行边缘平滑处理,得到目标局部纹理特征图T LY和目标局部形状特征图S LY;对全局纹理特征图T GY和目标局部纹理特征图T LY进行拼接,得到目标纹理图像T Y,对全局形状特征图S GY和目标局部形状特征图S LY进行拼接,得到目标形状图像S Y;根据目标纹理图像T Y和目标形状图像S Y进行三维模型重建处理,得到目标三维人脸模型。三维重建软件将该目标三维人脸模型转换成图像的形式并显示在显示屏中。
本发明实施例提供的上述方法能够在终端实现三维模型的训练和三维模型的重建,且三维重建模型重建得到的三维人脸能有效抑制图像畸变的出现,实现可靠的人脸模型重建效果。
应该理解的是,虽然上述流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,上述流程图中的至少一部分步骤可以包括多个步骤或者多个阶段,这些步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤中的步骤或者阶段的至少一部分轮流或者交替地执行。
基于与上述实施例中的三维模型重建方法和三维重建模型的训练方法相同的思想,本发明还提供三维模型重建装置和三维重建模型的训练装置,这些装置可分别用于执行上述三维模型重建方法和三维重建模型的训练方法。为了便于说明,三维模型重建装置和三维重建模型的训练装置实施例的结构示意图中,仅仅示出了与本发明实施例相关的部分,本领域技术人员可以理解,图示结构并不构成对装置的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
在一个实施例中,如图19所示,提供了一种三维模型重建装置1900,该装置可以采用软件模块或硬件模块,或者是二者的结合成为计算机设备的一部分,该装置具体包括:第一系数获取模块1901、特征图获取模块1902、平滑处理模块1903、特征图拼接模块1904和第一模型重建模块1905,其中:
第一系数获取模块1901,用于获取输入图像的图像特征系数。
特征图获取模块1902,用于根据所述图像特征系数,分别获取所述输入图像基于纹理和形状的全局特征图和初始局部特征图。
平滑处理模块1903,用于对所述初始局部特征图进行边缘平滑处理,得到目标局部特征图。
特征图拼接模块1904,用于分别基于纹理和形状,对所述全局特征图和所述目标局部特征图进行拼接,得到目标纹理图像和目标形状图像。
第一模型重建模块1905,用于根据所述目标纹理图像和所述目标形状图像进行三维模型重建处理,得到目标三维模型。
上述三维模型重建装置中,对局部特征图进行边缘平滑处理,不容易产生图像畸变,能得到边缘平滑的目标三维模型。
在一个实施例中,平滑处理模块,包括:边界获取子模块,用于获取所述初始局部特征图的边界;距离获取子模块,用于获取所述初始局部特征图中各个像素点与所述边界之间的距离;第一边缘平滑子模块,用于根据所述距离对所述初始局部特征图进行边缘平滑处理,得到所述目标局部特征图。
在一个实施例中,第一边缘平滑子模块,包括:边缘区域获取单元,用于根据所述距离获取所述初始局部特征图的边缘区域;权重值确定单元,用于根据所述距离确定所述边缘区域的各个像素点对应的特征权重值,以使远距离像素点对应的特征权重值大于近距离像素点对应的特征权重值;特征图构建单元,用于根据所述初始局部特征图的各个像素点对应的特征权重值生成渐变平滑特征图;所述初始局部特征图的边缘区域之外的像素点对应的特征权重值为预设权重值,所述渐变平滑特征图中各个像素点的特征值根据对应的特征权重值得到;特征值相乘单元,用于将所述渐变平滑特征图中各个像素点的特征值和所述初始局部特征图中对应像素点的特征值进行相乘,根据相乘结果得到所述目标局部特征图。
在一个实施例中,所述输入图像为脸部图像;所述目标局部特征图包括脸部关键部位对应的关键部位特征图;特征图拼接模块,包括:填充子模块,用于分别对各个关键部位特征图的外部区域进行特征值填充,以得到与所述全局特征图尺寸相同的填充关键部位特征图;合并子模块,用于将各个所述填充关键部位特征图进行合并,得到脸部器官特征图;拼接子模块,用于分别基于纹理和形状,对所述全局特征图和所述脸部器官特征图进行拼接,得到所述目标纹理图像和所述目标形状图像。
在一个实施例中,所述脸部关键部位包括左眼、右眼、鼻子、嘴部中的至少一项。
在一个实施例中,所述全局特征图包括全局纹理特征图和全局形状特征图,所述目标局部特征图包括局部纹理特征图和局部形状特征图;特征图拼接模块,包括:纹理卷积子模块,用于对所述全局纹理特征图和局部纹理特征图进行拼接,对拼接得到的特征图进行卷积以对全局纹理特征和局部纹理特征进行整合,得到所述目标纹理图像;形状卷积子模块,用于对所述全局形状特征图和局部形状特征图进行拼接,对拼接得到的特征图进行卷积以对全局形状特征和局部形状特征进行整合,得到所述目标形状图像。
在一个实施例中,所述全局特征图和所述目标局部特征图均对应有至少一个特征通道;纹理卷积子模块,还用于在各个所述特征通道内对所述全局纹理特征图和所述局部纹理特征图进行拼接,对各个所述特征通道拼接得到的特征图进行卷积以对全局纹理特征和局部纹理特征进行整合,得到所述目标纹理图像;形状卷积子模块,还用于在各个所述特征通道内对所述全局形状特征图和所述局部形状特征图进行拼接,对各个所述特征通道拼接得到的特征图进行卷积以对全局形状特征和局部形状特征进行整合,得到所述目标形状图像。
在一个实施例中,第一系数获取模块,包括:卷积子模块,用于通过卷积自编码器对所述输入图像进行逐层卷积处理;特征系数获取子模块,用于根据逐层卷积处理的结果得到所述输入图像的纹理特征系数和形状特征系数,作为所述图像特征系数。
在一个实施例中,特征图获取模块,包括:全局解码子模块,用于由全局解码器中的反卷积层根据所述图像特征系数对所述输入图像进行特征解码,得到所述全局特征图;局部解码子模块,用于由局部解码器中的反卷积层根据所述图像特征系数对所述输入图像进行特征解码,得到所述初始局部特征图。
在一个实施例中,所述局部解码器包括脸部关键部位解码器;局部解码子模块,还用于由所述脸部关键部位解码器中的反卷积层,根据所述图像特征系数对所述输入图像进行特征解码,将解码得到的关键部位特征图确定为所述初始局部特征图。
在一个实施例中,所述输入图像为脸部图像;所述装置,还包括:目标特征点获取模块,用于获取所述目标三维模型对应的二维图像的全局特征点,得到目标特征点;模板模型获取模块,用于获取模板三维模型;模板特征点获取模块,用于获取所述模板三维模型对应的二维图像的局部特征点,得到模板特征点;换脸模块,用于将所述目标特征点和所述模板特征点作为成对数据输入到换脸模型中,以使所述换脸模型输出已换脸三维模型;所述已换脸三维模型包含所述目标三维模型中的全局特征且包含所述模板三维模型中的局部特征。
在一个实施例中,模板模型获取模块,包括:模板图像获取子模块,用于获取预设的模板脸部图像;模板特征图获取子模块,用于获取所述模板脸部图像基于纹理和形状的模板全局特征图和初始模板局部特征图;第二边缘平滑子模块,用于对所述初始模板局部特征图进行边缘平滑处理,得到目标模板局部特征图;特征图拼接子模块,用于分别基于纹理和形状,对所述模板全局特征图和所述目标模板局部特征图进行拼接,得到模板脸部纹理图像和模板脸部形状图像;模板三维模型重建子模块,用于根据所述模板脸部纹理图像和所述模板脸部形状图像进行三维模型重建处理,得到所述模板三维模型。
关于三维模型重建装置的具体限定可以参见上文中对于三维模型重建方法的限定,在此不再赘述。上述三维模型重建装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
在一个实施例中,如图20所示,提供了一种三维重建模型的训练装置2000,该装置可以采用软件模块或硬件模块,或者是二者的结合成为计算机设备的一部分,该装置具体包括:第二系数获取模块2001、第二模型重建模块2002、图像渲染模块2003和重建模型训练模块2004,其中:
第二系数获取模块2001,用于获取训练图像的图像特征系数和渲染系数。
第二模型重建模块2002,用于将所述图像特征系数输入至基于深度学习的三维重建模型中,以使所述三维重建模型:根据所述图像特征系数,分别获取所述训练图像基于纹理和形状的全局特征图和初始局部特征图,对所述初始局部特征图进行边缘平滑处理,得到目标局部特征图;分别基于纹理和形状,对所述全局特征图和所述目标局部特征图进行拼接,得到目标纹理图像和目标形状图像;根据所述目标纹理图像和所述目标形状图像进行三维模型重建处理,得到预测三维模型。
图像渲染模块2003,用于根据所述渲染系数对所述预测三维模型进行图像渲染处理,得到预测二维图像。
重建模型训练模块2004,用于根据所述训练图像和所述预测二维图像的误差对所述三维重建模型进行训练,直到满足收敛条件,得到已训练的三维重建模型。
上述实施例,通过三维重建模型能够尽可能得到不存在畸变问题的预测三维模型,确定预测三维模型对应的预测二维图像,根据预测二维图像与训练图像的误差对三维重建模型进行训练,能训练得到准确可靠的三维重建模型。
在一个实施例中,第二系数获取模块,包括:逐层卷积子模块,用于通过卷积自编码器对所述训练图像进行逐层卷积处理;所述卷积自编码器包括解码器和编码器;图像特征系数获取子模块,用于由所述解码器根据逐层卷积处理的结果得到所述训练图像的纹理特征系数和形状特征系数,作为所述图像特征系数;渲染系数获取子模块,用于由所述编码器根据逐层卷积处理的结果得到所述训练图像的扭转系数和光照系数,作为所述渲染系数。
关于三维重建模型的训练装置的具体限定可以参见上文中对于三维重建模型的训练方法的限定,在此不再赘述。上述三维重建模型的训练装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
在一个实施例中,提供了一种计算机设备,该计算机设备可以是服务器,其内部结构图可以如图21所示。该计算机设备包括通过系统总线连接的处理器、存储器和网络接口。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机可读指令和数据库。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的数据库用于存储三维重建模型、卷积自编码器、训练图像等数据。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现一种三维模型重建方法和三维重建模型的训练方法。
在一个实施例中,提供了一种计算机设备,该计算机设备可以是终端,其内部结构图可以如图22所示。该计算机设备包括通过系统总线连接的处理器、存储器、通信接口、显示屏和输入装置。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统和计算机可读指令。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的通信接口用于与外部的终端进行有线或无线方式的通信,无线方式可通过WIFI、运营商网络、NFC(近场通信)或其他技术实现。该计算机可读指令被处理器执行时以实现一种三维模型重建方法和三维重建模型的训练方法。
该计算机设备的显示屏可以是液晶显示屏或者电子墨水显示屏,该计算机设备的输入装置可以是显示屏上覆盖的触摸层,也可以是计算机设备外壳上设置的按键、轨迹球或触控板,还可以是外接的键盘、触控板或鼠标等。
本领域技术人员可以理解,图21、22中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
在一个实施例中,还提供了一种计算机设备,包括存储器和一个或多个处理器,存储器中存储有计算机可读指令,该一个或多个处理器执行计算机可读指令时实现上述各三维模型重建方法实施例中的步骤。
在一个实施例中,还提供了一种计算机设备,包括存储器和一个或多个处理器,存储器中存储有计算机可读指令,该一个或多个处理器执行计算机可读指令时实现上述各三维重建模型的训练方法实施例中的步骤。
在一个实施例中,提供了一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,存储有计算机可读指令,该计算机可读指令被一个或多个处理器执行时实现上述各三维模型重建方法实施例中的步骤。
在一个实施例中,提供了一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,存储有计算机可读指令,该计算机可读指令被一个或多个处理器执行时实现上述各三维重建模型的训练方法实施例中的步骤。
在一个实施例中,提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机可读指令,该计算机可读指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机可读指令,处理器执行该计算机可读指令,使得该计算机设备执行上述各方法实施例中的步骤。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一非易失性计算机可读取存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和易失性存储器中的至少一种。非易失性存储器可包括只读存储器(Read-Only Memory,ROM)、磁带、软盘、闪存或光存储器等。易失性存储器可包括随机存取存储器(Random Access Memory,RAM)或外部高速缓冲存储器。作为说明而非局限,RAM可以是多种形式,比如静态随机存取存储器(Static Random Access Memory,SRAM)或动态随机存取存储器(Dynamic Random Access Memory,DRAM)等。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对发明专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (18)

  1. 一种三维模型重建方法,其特征在于,由计算机设备执行,所述方法包括:
    获取输入图像的图像特征系数;
    根据所述图像特征系数,分别获取所述输入图像基于纹理和形状的全局特征图和初始局部特征图;
    对所述初始局部特征图进行边缘平滑处理,得到目标局部特征图;
    分别基于纹理和形状,对所述全局特征图和所述目标局部特征图进行拼接,得到目标纹理图像和目标形状图像;及
    根据所述目标纹理图像和所述目标形状图像进行三维模型重建处理,得到目标三维模型。
  2. 根据权利要求1所述的方法,其特征在于,所述对所述初始局部特征图进行边缘平滑处理,得到目标局部特征图,包括:
    获取所述初始局部特征图的边界;
    获取所述初始局部特征图中各个像素点与所述边界之间的距离;及
    根据所述距离对所述初始局部特征图进行边缘平滑处理,得到所述目标局部特征图。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述距离对所述初始局部特征图进行边缘平滑处理,得到所述目标局部特征图,包括:
    根据所述距离获取所述初始局部特征图的边缘区域;
    根据所述距离确定所述边缘区域的各个像素点对应的特征权重值,以使远距离像素点对应的特征权重值大于近距离像素点对应的特征权重值;
    根据所述初始局部特征图的各个像素点对应的特征权重值生成渐变平滑特征图;所述初始局部特征图的边缘区域之外的像素点对应的特征权重值为预设权重值,所述渐变平滑特征图中各个像素点的特征值根据对应的特征权重值得到;及
    将所述渐变平滑特征图中各个像素点的特征值和所述初始局部特征图中对应像素点的特征值进行相乘,根据相乘结果得到所述目标局部特征图。
  4. 根据权利要求1至3任一项所述的方法,其特征在于,所述输入图像为脸部图像;所述目标局部特征图包括脸部关键部位对应的关键部位特征图;
    所述分别基于纹理和形状,对所述全局特征图和所述目标局部特征图进行拼接,得到目标纹理图像和目标形状图像,包括:
    分别对各个关键部位特征图的外部区域进行特征值填充,以得到与所述全局特征图尺寸相同的填充关键部位特征图;
    将各个所述填充关键部位特征图进行合并,得到脸部器官特征图;及
    分别基于纹理和形状,对所述全局特征图和所述脸部器官特征图进行拼接,得到所述目标纹理图像和所述目标形状图像。
  5. 根据权利要求4所述的方法,其特征在于,所述脸部关键部位包括左眼、右眼、鼻子、嘴部中的至少一项。
  6. 根据权利要求1至3任一项所述的方法,其特征在于,所述全局特征图包括全局纹理特征图和全局形状特征图,所述目标局部特征图包括局部纹理特征图和局部形状特征图;
    所述分别基于纹理和形状,对所述全局特征图和所述目标局部特征图进行拼接,得到目标纹理图像和目标形状图像,包括:
    对所述全局纹理特征图和局部纹理特征图进行拼接,对拼接得到的特征图进行卷积以对全局纹理特征和局部纹理特征进行整合,得到所述目标纹理图像;及
    对所述全局形状特征图和局部形状特征图进行拼接,对拼接得到的特征图进行卷积以对全局形状特征和局部形状特征进行整合,得到所述目标形状图像。
  7. 根据权利要求6所述的方法,其特征在于,所述全局特征图和所述目标局部特征图均对应有至少一个特征通道;
    所述对所述全局纹理特征图和局部纹理特征图进行拼接,对拼接得到的特征图进行卷积以对全局纹理特征和局部纹理特征进行整合,得到所述目标纹理图像,包括:
    在各个所述特征通道内对所述全局纹理特征图和所述局部纹理特征图进行拼接,对各个所述特征通道拼接得到的特征图进行卷积以对全局纹理特征和局部纹理特征进行整合,得到所述目标纹理图像;
    所述对所述全局形状特征图和局部形状特征图进行拼接,对拼接得到的特征图进行卷积以对全局形状特征和局部形状特征进行整合,得到所述目标形状图像,包括:
    在各个所述特征通道内对所述全局形状特征图和所述局部形状特征图进行拼接,对各个所述特征通道拼接得到的特征图进行卷积以对全局形状特征和局部形状特征进行整合,得到所述目标形状图像。
  8. 根据权利要求1至3任一项所述的方法,其特征在于,所述获取输入图像的图像特征系数,包括:
    通过卷积自编码器对所述输入图像进行逐层卷积处理;及
    根据逐层卷积处理的结果得到所述输入图像的纹理特征系数和形状特征系数,作为所述图像特征系数。
  9. 根据权利要求1至3任一项所述的方法,其特征在于,所述根据所述图像特征系数,分别获取所述输入图像基于纹理和形状的全局特征图和初始局部特征图,包括:
    由全局解码器中的反卷积层根据所述图像特征系数对所述输入图像进行特征解码,得到所述全局特征图;及
    由局部解码器中的反卷积层根据所述图像特征系数对所述输入图像进行特征解码,得到所述初始局部特征图。
  10. 根据权利要求9所述的方法,其特征在于,所述局部解码器包括脸部关键部位解码器;
    所述由局部解码器中的反卷积层根据所述图像特征系数对所述输入图像进行特征解码,得到所述初始局部特征图,包括:
    由所述脸部关键部位解码器中的反卷积层,根据所述图像特征系数对所述输入图像进行特征解码,将解码得到的关键部位特征图确定为所述初始局部特征图。
  11. 根据权利要求1至3任一项所述的方法,其特征在于,所述输入图像为脸部图像;所述根据所述目标纹理图像和所述目标形状图像进行三维模型重建处理,得到目标三维模型之后,还包括:
    获取所述目标三维模型对应的二维图像的全局特征点,得到目标特征点;
    获取模板三维模型;
    获取所述模板三维模型对应的二维图像的局部特征点,得到模板特征点;及
    将所述目标特征点和所述模板特征点作为成对数据输入到换脸模型中,以使所述换脸模型输出已换脸三维模型;所述已换脸三维模型包含所述目标三维模型中的全局特征且包含所述模板三维模型中的局部特征。
  12. 根据权利要求11所述的方法,其特征在于,所述获取模板三维模型,包括:
    获取预设的模板脸部图像;
    获取所述模板脸部图像基于纹理和形状的模板全局特征图和初始模板局部特征图;
    对所述初始模板局部特征图进行边缘平滑处理,得到目标模板局部特征图;
    分别基于纹理和形状,对所述模板全局特征图和所述目标模板局部特征图进行拼接,得到模板脸部纹理图像和模板脸部形状图像;及
    根据所述模板脸部纹理图像和所述模板脸部形状图像进行三维模型重建处理,得到所述模板三维模型。
  13. 一种三维重建模型的训练方法,其特征在于,由计算机设备执行,包括:
    获取训练图像的图像特征系数和渲染系数;
    将所述图像特征系数输入至基于深度学习的三维重建模型中,以使所述三维重建模型:根据所述图像特征系数,分别获取所述训练图像基于纹理和形状的全局特征图和初始局部特征图,对所述初始局部特征图进行边缘平滑处理,得到目标局部特征图;分别基于纹理和形状,对所述全局特征图和所述目标局部特征图进行拼接,得到目标纹理图像和目标形状图像;根据所述目标纹理图像和所述目标形状图像进行三维模型重建处理,得到预测三维模型;
    根据所述渲染系数对所述预测三维模型进行图像渲染处理,得到预测二维图像;及
    根据所述训练图像和所述预测二维图像的误差对所述三维重建模型进行训练,直到满足收敛条件,得到已训练的三维重建模型。
  14. 根据权利要求13所述的方法,其特征在于,所述获取训练图像的图像特征系数和渲染系数,包括:
    通过卷积自编码器对所述训练图像进行逐层卷积处理;所述卷积自编码器包括解码器和编码器;
    由所述解码器根据逐层卷积处理的结果得到所述训练图像的纹理特征系数和形状特征系数,作为所述图像特征系数;及
    由所述编码器根据逐层卷积处理的结果得到所述训练图像的扭转系数和光照系数,作为所述渲染系数。
  15. 一种三维模型重建装置,其特征在于,所述装置包括:
    第一系数获取模块,用于获取输入图像的图像特征系数;
    特征图获取模块,用于根据所述图像特征系数,分别获取所述输入图像基于纹理和形状的全局特征图和初始局部特征图;
    平滑处理模块,用于对所述初始局部特征图进行边缘平滑处理,得到目标局部特征图;
    特征图拼接模块,用于分别基于纹理和形状,对所述全局特征图和所述目标局部特征图进行拼接,得到目标纹理图像和目标形状图像;及
    第一模型重建模块,用于根据所述目标纹理图像和所述目标形状图像进行三维模型重建处理,得到目标三维模型。
  16. 一种三维重建模型的训练装置,其特征在于,所述装置包括:
    第二系数获取模块,用于获取训练图像的图像特征系数和渲染系数;
    第二模型重建模块,用于将所述图像特征系数输入至基于深度学习的三维重建模型中,以使所述三维重建模型:根据所述图像特征系数,分别获取所述训练图像基于纹理和形状的全局特征图和初始局部特征图,对所述初始局部特征图进行边缘平滑处理,得到目标局部特征图;分别基于纹理和形状,对所述全局特征图和所述目标局部特征图进行拼接,得到目标纹理图像和目标形状图像;根据所述目标纹理图像和所述目标形状图像进行三维模型重建处理,得到预测三维模型;
    图像渲染模块,用于根据所述渲染系数对所述预测三维模型进行图像渲染处理,得到预测二维图像;及
    重建模型训练模块,用于根据所述训练图像和所述预测二维图像的误差对所述三维重建模型进行训练,直到满足收敛条件,得到已训练的三维重建模型。
  17. 一种计算机设备,包括存储器和一个或多个处理器,所述存储器存储有计算机可读指令,其特征在于,所述计算机可读指令被所述一个或多个处理器执行时,使得所述一个或多个处理器实现权利要求1至12或13至14中任一项所述方法的步骤。
  18. 一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,存储有计算机可读指令,其特征在于,所述计算机可读指令被一个或多个处理器执行时实现权利要求1至12或13至14中任一项所述方法的步骤。
PCT/CN2021/112089 2020-09-15 2021-08-11 三维模型重建方法、三维重建模型的训练方法和装置 WO2022057526A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21868354.8A EP4109412A4 (en) 2020-09-15 2021-08-11 METHOD AND APPARATUS FOR RECONSTRUCTING THREE-DIMENSIONAL MODEL, AND METHOD AND APPARATUS FOR FORMING THREE-DIMENSIONAL RECONSTRUCTION MODEL
US17/976,259 US20230048906A1 (en) 2020-09-15 2022-10-28 Method for reconstructing three-dimensional model, method for training three-dimensional reconstruction model, and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010969615.0 2020-09-15
CN202010969615.0A CN112102477A (zh) 2020-09-15 2020-09-15 三维模型重建方法、装置、计算机设备和存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/976,259 Continuation US20230048906A1 (en) 2020-09-15 2022-10-28 Method for reconstructing three-dimensional model, method for training three-dimensional reconstruction model, and apparatus

Publications (1)

Publication Number Publication Date
WO2022057526A1 true WO2022057526A1 (zh) 2022-03-24

Family

ID=73760189

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/112089 WO2022057526A1 (zh) 2020-09-15 2021-08-11 三维模型重建方法、三维重建模型的训练方法和装置

Country Status (4)

Country Link
US (1) US20230048906A1 (zh)
EP (1) EP4109412A4 (zh)
CN (1) CN112102477A (zh)
WO (1) WO2022057526A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102477A (zh) * 2020-09-15 2020-12-18 腾讯科技(深圳)有限公司 三维模型重建方法、装置、计算机设备和存储介质
CN113350801A (zh) * 2021-07-20 2021-09-07 网易(杭州)网络有限公司 模型处理方法、装置、存储介质及计算机设备
CN114926480A (zh) * 2022-05-30 2022-08-19 腾讯科技(深圳)有限公司 一种训练图像分割模型的方法、装置、设备及存储介质
CN115147526B (zh) * 2022-06-30 2023-09-26 北京百度网讯科技有限公司 服饰生成模型的训练、生成服饰图像的方法和装置
CN115063542A (zh) * 2022-08-18 2022-09-16 江西科骏实业有限公司 一种几何不变量的预测和模型构建方法与系统
CN116993929B (zh) * 2023-09-27 2024-01-16 北京大学深圳研究生院 基于人眼动态变化的三维人脸重建方法、装置及存储介质
CN117496091B (zh) * 2023-12-28 2024-03-15 西南石油大学 一种基于局部纹理的单视图三维重建方法
CN117495867B (zh) * 2024-01-03 2024-05-31 东莞市星火齿轮有限公司 小模数齿轮精度的视觉检测方法及系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955963A (zh) * 2014-04-30 2014-07-30 崔岩 一种基于 Kinect设备的数字化人体三维重建方法及系统
CN107730519A (zh) * 2017-09-11 2018-02-23 广东技术师范学院 一种人脸二维图像到人脸三维重建的方法及系统
CN108765550A (zh) * 2018-05-09 2018-11-06 华南理工大学 一种基于单张图片的三维人脸重建方法
CN111145338A (zh) * 2019-12-17 2020-05-12 桂林理工大学 一种基于单视角rgb图像的椅子模型重建方法及系统
CN111598111A (zh) * 2020-05-18 2020-08-28 商汤集团有限公司 三维模型生成方法、装置、计算机设备及存储介质
CN112102477A (zh) * 2020-09-15 2020-12-18 腾讯科技(深圳)有限公司 三维模型重建方法、装置、计算机设备和存储介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9818232B2 (en) * 2015-08-26 2017-11-14 Adobe Systems Incorporated Color-based depth smoothing of scanned 3D model to enhance geometry in 3D printing
CN109118578A (zh) * 2018-08-01 2019-01-01 浙江大学 一种层次化的多视图三维重建纹理映射方法
CN109325437B (zh) * 2018-09-17 2021-06-22 北京旷视科技有限公司 图像处理方法、装置和系统
CN111027350A (zh) * 2018-10-10 2020-04-17 成都理工大学 一种基于人脸三维重建的改进pca算法
CN109978989B (zh) * 2019-02-26 2023-08-01 腾讯科技(深圳)有限公司 三维人脸模型生成方法、装置、计算机设备及存储介质
CN110738723A (zh) * 2019-10-12 2020-01-31 创新工场(北京)企业管理股份有限公司 一种基于人脸网格模型的纹理贴图生成方法、系统及电子设备

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955963A (zh) * 2014-04-30 2014-07-30 崔岩 一种基于 Kinect设备的数字化人体三维重建方法及系统
CN107730519A (zh) * 2017-09-11 2018-02-23 广东技术师范学院 一种人脸二维图像到人脸三维重建的方法及系统
CN108765550A (zh) * 2018-05-09 2018-11-06 华南理工大学 一种基于单张图片的三维人脸重建方法
CN111145338A (zh) * 2019-12-17 2020-05-12 桂林理工大学 一种基于单视角rgb图像的椅子模型重建方法及系统
CN111598111A (zh) * 2020-05-18 2020-08-28 商汤集团有限公司 三维模型生成方法、装置、计算机设备及存储介质
CN112102477A (zh) * 2020-09-15 2020-12-18 腾讯科技(深圳)有限公司 三维模型重建方法、装置、计算机设备和存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4109412A4

Also Published As

Publication number Publication date
CN112102477A (zh) 2020-12-18
EP4109412A1 (en) 2022-12-28
EP4109412A4 (en) 2023-11-01
US20230048906A1 (en) 2023-02-16

Similar Documents

Publication Publication Date Title
WO2022057526A1 (zh) 三维模型重建方法、三维重建模型的训练方法和装置
WO2021184933A1 (zh) 一种人体三维模型重建方法
EP3992919B1 (en) Three-dimensional facial model generation method and apparatus, device, and medium
WO2022143645A1 (zh) 三维人脸重建的方法、装置、设备和存储介质
US11615587B2 (en) Object reconstruction with texture parsing
CN110458924B (zh) 一种三维脸部模型建立方法、装置和电子设备
US20230100427A1 (en) Face image processing method, face image processing model training method, apparatus, device, storage medium, and program product
CN113822965A (zh) 图像渲染处理方法、装置和设备及计算机存储介质
CN114549291A (zh) 图像处理方法、装置、设备以及存储介质
US20220375179A1 (en) Virtual object construction method, apparatus and storage medium
US20210304514A1 (en) Image processing for updating a model of an environment
US11989846B2 (en) Mixture of volumetric primitives for efficient neural rendering
CN115147261A (zh) 图像处理方法、装置、存储介质、设备及产品
CN114049290A (zh) 图像处理方法、装置、设备及存储介质
US11893681B2 (en) Method for processing two-dimensional image and device for executing method
US20230093827A1 (en) Image processing framework for performing object depth estimation
US11977979B2 (en) Adaptive bounding for three-dimensional morphable models
US20220383582A1 (en) Hybrid differentiable rendering for light transport simulation systems and applications
CN116977539A (zh) 图像处理方法、装置、计算机设备、存储介质和程序产品
RU2757563C1 (ru) Способ визуализации 3d портрета человека с измененным освещением и вычислительное устройство для него
CN113223128B (zh) 用于生成图像的方法和装置
CN115760888A (zh) 图像处理方法、装置、计算机及可读存储介质
CN113570634A (zh) 对象三维重建方法、装置、电子设备及存储介质
CN116229008B (zh) 图像处理方法和装置
US20240029354A1 (en) Facial texture synthesis for three-dimensional morphable models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21868354

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021868354

Country of ref document: EP

Effective date: 20220921

NENP Non-entry into the national phase

Ref country code: DE