WO2022001222A1 - Three-dimensional model generation method, neural network generation method and apparatus

Three-dimensional model generation method, neural network generation method and apparatus

Info

Publication number
WO2022001222A1
WO2022001222A1 PCT/CN2021/082485 CN2021082485W WO2022001222A1 WO 2022001222 A1 WO2022001222 A1 WO 2022001222A1 CN 2021082485 W CN2021082485 W CN 2021082485W WO 2022001222 A1 WO2022001222 A1 WO 2022001222A1
Authority
WO
WIPO (PCT)
Prior art keywords
sphere
position information
image
rendered image
spheres
Prior art date
Application number
PCT/CN2021/082485
Other languages
English (en)
French (fr)
Inventor
汪旻
邱丰
刘文韬
钱晨
马利庄
Original Assignee
上海商汤智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Priority to JP2021573567A (published as JP2022542758A)
Priority to EP21819707.7A (published as EP3971840A4)
Priority to KR1020217042400A (published as KR20220013403A)
Priority to US17/645,446 (published as US20220114799A1)
Publication of WO2022001222A1

Classifications

    • G06N 3/02 Neural networks; G06N 3/045 Combinations of networks; G06N 3/08 Learning methods; G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects; G06T 17/10 Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes; G06T 17/20 Finite element generation, e.g. wire-frame surface description, tessellation
    • G06T 19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T 7/11 Region-based segmentation; G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/242 Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees; G06V 10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation; G06V 10/771 Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • G06T 2207/10004 Still image / Photographic image; G06T 2207/10012 Stereo images
    • G06T 2219/2021 Shape modification

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular, to a method for generating a three-dimensional model, a method for generating a neural network, an apparatus, a device, and a computer-readable storage medium.
  • the embodiments of the present disclosure provide at least a three-dimensional model generation method, a neural network generation method, an apparatus, a device, and a computer-readable storage medium.
  • an embodiment of the present disclosure provides a method for generating a three-dimensional model, comprising: acquiring, based on a first image including a first object, first sphere position information in a camera coordinate system for each of a plurality of first spheres, where the plurality of first spheres respectively represent different parts of the first object; generating a first rendered image based on the first sphere position information of the plurality of first spheres; obtaining gradient information of the first rendered image based on the first rendered image and a semantically segmented image of the first image; and adjusting the first sphere position information of the plurality of first spheres based on the gradient information of the first rendered image, and generating a three-dimensional model of the first object using the adjusted first sphere position information of the plurality of first spheres.
  • In this way, gradient information that represents the degree of correctness of the first sphere position information of the plurality of first spheres is determined, and the first sphere position information corresponding to the plurality of first spheres is readjusted based on that gradient information, so that the adjusted position information of the plurality of first spheres has higher accuracy; that is, the three-dimensional model recovered from the adjusted first sphere position information corresponding to the plurality of first spheres also has higher accuracy.
  • the generating of a first rendered image based on the first sphere position information of the plurality of first spheres includes: determining, based on the first sphere position information, the first three-dimensional position information in the camera coordinate system of the respective vertices of the multiple patches constituting each first sphere; and generating the first rendered image based on that first three-dimensional position information.
  • In this way, the first object can be divided into a plurality of parts represented by different first spheres, and the first rendered image can be generated based on the first three-dimensional position information in the camera coordinate system of the vertices of the plurality of patches constituting the different first spheres; the first rendered image thus includes three-dimensional relationship information between the different parts of the first object, so that the three-dimensional model of the first object can be constrained based on the gradient information determined from the first rendered image, giving the three-dimensional model higher accuracy.
  • the determining, based on the first sphere position information, of the first three-dimensional position information in the camera coordinate system of each vertex of the plurality of patches constituting each first sphere includes: determining the first three-dimensional position information of each such vertex in the camera coordinate system based on the first positional relationship between the template vertices of a plurality of template patches constituting a template sphere and the center point of the template sphere, and on the first sphere position information of each first sphere.
  • In this way, the first spheres are obtained by deforming a template sphere composed of multiple template patches, and the surfaces of the spheres are represented by patches, thereby reducing the complexity of rendering the first rendered image.
  • the first sphere position information of each first sphere includes: second three-dimensional position information of the center point of each first sphere in the camera coordinate system, the lengths corresponding to the three coordinate axes of each first sphere, and the rotation angle of each first sphere relative to the camera coordinate system.
  • the determining, based on the first positional relationship between the template vertices of the plurality of template patches constituting the template sphere and the center point of the template sphere, and on the first sphere position information of each first sphere, of the first three-dimensional position information in the camera coordinate system of each vertex of the plurality of patches constituting each first sphere includes: transforming the template sphere in shape and rotation angle based on the lengths corresponding to the three coordinate axes of each first sphere and the rotation angle of each first sphere relative to the camera coordinate system; determining, based on the result of the shape and rotation angle transformation of the template sphere and the first positional relationship, a second positional relationship between each template vertex and the center point of the transformed template sphere; and determining, based on the second three-dimensional position information of the center point of each first sphere in the camera coordinate system and the second positional relationship, the first three-dimensional position information in the camera coordinate system of each vertex of the plurality of patches constituting each first sphere.
  • the first three-dimensional position information can be obtained quickly.
  • the method further includes: acquiring the projection matrix of the camera of the first image; and the generating of the first rendered image based on the first three-dimensional position information in the camera coordinate system includes: determining a part index and a patch index of each pixel in the first rendered image based on the first three-dimensional position information and the projection matrix; and generating the first rendered image based on the determined part index and patch index of each pixel in the first rendered image; wherein the part index of any pixel identifies the part on the first object corresponding to that pixel, and the patch index of any pixel identifies the patch corresponding to that pixel.
  • the generating of the first rendered image based on the first three-dimensional position information in the camera coordinate system of the respective vertices of the multiple patches constituting each first sphere includes: for each first sphere, generating a first rendered image corresponding to that first sphere according to the first three-dimensional position information of the respective vertices of the plurality of patches constituting it; and the obtaining of gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image includes: for each first sphere, obtaining the gradient information of the first rendered image corresponding to that first sphere according to the first rendered image and the semantically segmented image corresponding to that first sphere.
  • the gradient information of the first rendered image includes a gradient value of each pixel in the first rendered image; and the obtaining of the gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image includes: traversing each pixel in the first rendered image, and determining the gradient value of the traversed pixel according to the first pixel value of the traversed pixel in the first rendered image and the second pixel value of the traversed pixel in the semantically segmented image.
  • In this way, the gradient information of the first rendered image can be obtained from the first rendered image and the semantically segmented image of the first image.
  • the determining of the gradient value of the traversed pixel according to the first pixel value of the traversed pixel in the first rendered image and the second pixel value of the traversed pixel in the semantically segmented image includes: determining the residual of the traversed pixel according to the first pixel value and the second pixel value of the traversed pixel; in the case that the residual is not the first value, determining, based on the second pixel value, a target first sphere corresponding to the traversed pixel from among the plurality of first spheres, and determining a target patch from among the multiple patches constituting the target first sphere; determining target three-dimensional position information in the camera coordinate system of at least one target vertex on the target patch, where, when the at least one target vertex is located at the position identified by the target three-dimensional position information, the residual between the new first pixel value obtained by re-rendering the traversed pixel and the second pixel value corresponding to the traversed pixel is the first value; and obtaining the gradient value of the traversed pixel based on the first three-dimensional position information of the target vertex in the camera coordinate system and the target three-dimensional position information.
  • the acquiring, based on the first image including the first object, of the first sphere position information in the camera coordinate system of each of the plurality of first spheres includes: using a pre-trained position information prediction network to perform position information prediction processing on the first image, so as to obtain the first sphere position information of each first sphere in the camera coordinate system.
  • an embodiment of the present disclosure further provides a method for generating a neural network, including: using a neural network to be trained, performing three-dimensional position information prediction processing on a second object in a second image to obtain second sphere position information in the camera coordinate system of each of a plurality of second spheres respectively representing different parts of the second object; generating a second rendered image based on the second sphere position information corresponding to the plurality of second spheres; obtaining gradient information of the second rendered image based on the second rendered image and a semantically annotated image of the second image; and updating the neural network to be trained based on the gradient information of the second rendered image to obtain the updated neural network.
  • In this way, after the second sphere position information of the plurality of second spheres representing the three-dimensional model of the second object in the second image is obtained, image rendering is performed based on the second sphere position information; based on the rendering result, gradient information representing the degree of correctness of the second sphere position information of the plurality of second spheres is determined, and the neural network to be optimized is updated based on that gradient information to obtain the optimized neural network, so that the optimized neural network has higher prediction accuracy for three-dimensional position information.
  • an embodiment of the present disclosure further provides a three-dimensional model generation apparatus, including: a first acquisition part configured to acquire, based on a first image including a first object, first sphere position information in the camera coordinate system of each of a plurality of first spheres, where the plurality of first spheres respectively represent different parts of the first object; a first generation part configured to generate a first rendered image based on the first sphere position information of the plurality of first spheres; a first gradient determination part configured to obtain gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image; an adjustment part configured to adjust the first sphere position information of the plurality of first spheres based on the gradient information of the first rendered image; and a model generation part configured to generate a three-dimensional model of the first object using the adjusted first sphere position information of the plurality of first spheres.
  • the first generation part is configured to: determine, based on the first sphere position information, first three-dimensional position information in the camera coordinate system of each vertex of the plurality of patches constituting each first sphere; and generate the first rendered image based on the first three-dimensional position information in the camera coordinate system of the respective vertices of the plurality of patches constituting each first sphere.
  • when determining, based on the first sphere position information, the first three-dimensional position information in the camera coordinate system of each vertex of the plurality of patches constituting each first sphere, the first generation part is configured to: determine that first three-dimensional position information based on the first positional relationship between the template vertices of the plurality of template patches constituting the template sphere and the center point of the template sphere, and on the first sphere position information of each first sphere.
  • the first sphere position information of each first sphere includes: second three-dimensional position information of the center point of each first sphere in the camera coordinate system, the lengths corresponding to the three coordinate axes of each first sphere, and the rotation angle of each first sphere relative to the camera coordinate system.
  • when determining, based on the first positional relationship between the template vertices of the plurality of template patches constituting the template sphere and the center point of the template sphere and on the first sphere position information of each first sphere, the first three-dimensional position information in the camera coordinate system of the respective vertices of the plurality of patches constituting each first sphere, the first generation part is configured to: transform the template sphere in shape and rotation angle based on the lengths corresponding to the three coordinate axes of each first sphere and the rotation angle of each first sphere relative to the camera coordinate system; determine, based on the result of the shape and rotation angle transformation of the template sphere and the first positional relationship, the second positional relationship between each template vertex and the center point of the transformed template sphere; and determine, based on the second three-dimensional position information of the center point of each first sphere in the camera coordinate system and the second positional relationship, the first three-dimensional position information in the camera coordinate system of the respective vertices of the plurality of patches constituting each first sphere.
  • the first acquisition part is further configured to acquire the projection matrix of the camera of the first image; and when generating the first rendered image based on the first three-dimensional position information in the camera coordinate system of the respective vertices of the plurality of patches, the first generation part is configured to: determine the part index and patch index of each pixel in the first rendered image based on the first three-dimensional position information and the projection matrix; and generate the first rendered image based on the determined part index and patch index of each pixel in the first rendered image; wherein the part index of any pixel identifies the part on the first object corresponding to that pixel, and the patch index of any pixel identifies the patch corresponding to that pixel.
  • when generating the first rendered image based on the first three-dimensional position information in the camera coordinate system of the respective vertices of the plurality of patches constituting each first sphere, the first generation part is configured to: for each first sphere, generate a first rendered image corresponding to that first sphere according to the first three-dimensional position information in the camera coordinate system of the respective vertices of the plurality of patches constituting it;
  • when obtaining the gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image, the first gradient determination part is configured to: for each first sphere, obtain the gradient information of the first rendered image corresponding to that first sphere according to the first rendered image and the semantically segmented image corresponding to that first sphere.
  • the gradient information of the first rendered image includes a gradient value of each pixel in the first rendered image; when obtaining the gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image, the first gradient determination part is configured to: traverse each pixel in the first rendered image, and determine the gradient value of the traversed pixel according to the first pixel value of the traversed pixel in the first rendered image and the second pixel value of the traversed pixel in the semantically segmented image.
  • when determining the gradient value of the traversed pixel according to the first pixel value of the traversed pixel in the first rendered image and the second pixel value of the traversed pixel in the semantically segmented image, the first gradient determination part is configured to: determine the residual of the traversed pixel according to the first pixel value and the second pixel value of the traversed pixel; in the case that the residual of the traversed pixel is the first value, determine the gradient value of the traversed pixel as the first value; in the case that the residual of the traversed pixel is not the first value, determine, based on the second pixel value of the traversed pixel, a target first sphere corresponding to the traversed pixel from among the plurality of first spheres, and determine a target patch from among the plurality of patches constituting the target first sphere; determine target three-dimensional position information in the camera coordinate system of at least one target vertex on the target patch; and obtain the gradient value of the traversed pixel based on the first three-dimensional position information of the target vertex in the camera coordinate system and the target three-dimensional position information.
  • when acquiring, based on the first image including the first object, the first sphere position information in the camera coordinate system of each of the plurality of first spheres, the first acquisition part is configured to: use a pre-trained position information prediction network to perform position information prediction processing on the first image, so as to obtain the first sphere position information of each of the plurality of first spheres in the camera coordinate system.
  • the embodiments of the present disclosure also provide an apparatus for generating a neural network, including: a second acquisition part configured to use the neural network to be trained to perform three-dimensional position information prediction processing on a second object in a second image, so as to obtain second sphere position information in the camera coordinate system of each of a plurality of second spheres representing different parts of the second object; a second generation part configured to generate a second rendered image based on the second sphere position information corresponding to the plurality of second spheres; a second gradient determination part configured to obtain gradient information of the second rendered image based on the second rendered image and the semantically annotated image of the second image; and an updating part configured to update the neural network to be trained based on the gradient information of the second rendered image to obtain the updated neural network.
  • an optional implementation of the present disclosure further provides an electronic device, including a processor and a memory, where the memory stores machine-readable instructions executable by the processor; the processor is configured to execute the machine-readable instructions stored in the memory, and when the machine-readable instructions are executed by the processor, the processor performs the steps of the above first aspect or any possible implementation of the first aspect, or the steps of the above second aspect or any possible implementation of the second aspect.
  • an optional implementation of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run, it performs the steps of the above first aspect or any possible implementation of the first aspect, or the steps of the above second aspect or any possible implementation of the second aspect.
  • an optional implementation of the present disclosure further provides a computer program, comprising computer-readable code; when the computer-readable code is executed in an electronic device, a processor in the electronic device executes the code to implement the steps of the above first aspect or any possible implementation of the first aspect, or the steps of the above second aspect or any possible implementation of the second aspect.
  • FIG. 1 shows a flowchart of a method for generating a three-dimensional model provided by an embodiment of the present disclosure
  • FIG. 2 shows a schematic diagram of an example of characterizing a human body by a plurality of first spheres provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram showing an example of the structure of a location information prediction network provided by an embodiment of the present disclosure
  • FIG. 4 shows a schematic diagram of an example of transforming a template sphere into a first sphere provided by an embodiment of the present disclosure
  • FIG. 5 shows a flowchart of a method for determining a gradient value of a traversed pixel point provided by an embodiment of the present disclosure
  • FIG. 6 shows several examples, provided by an embodiment of the present disclosure, of determining the target three-dimensional position information when the residual of the traversed pixel is not the first value;
  • FIG. 7 shows a flowchart of a method for generating a neural network provided by an embodiment of the present disclosure
  • FIG. 8 shows a schematic diagram of a three-dimensional model generating apparatus provided by an embodiment of the present disclosure
  • FIG. 9 shows a schematic diagram of a neural network generating apparatus provided by an embodiment of the present disclosure;
  • FIG. 10 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
  • a neural network is generally used to predict the 3D model parameters of an object in a 2D image, and the 3D model is then generated based on those parameters. However, current 3D model generation methods cannot handle the ambiguity caused by occlusion of some parts of the reconstructed object, so the depth and pose of the reconstructed object cannot be accurately recovered, which in turn leads to lower accuracy of the generated 3D model.
  • For this reason, an embodiment of the present disclosure provides a method for generating a three-dimensional model: image rendering is performed based on the first sphere position information of a plurality of first spheres representing the three-dimensional model, gradient information representing the degree of correctness of the first sphere position information of the plurality of first spheres is determined based on the rendering result, and the first sphere position information corresponding to the plurality of first spheres is readjusted based on the gradient information, so that the adjusted position information of the plurality of first spheres has higher accuracy; that is, the three-dimensional model restored based on the first sphere position information corresponding to the plurality of first spheres also has higher accuracy. In particular, since the first sphere position information corresponding to each of the plurality of first spheres is readjusted, the depth information of the first object can be restored with higher accuracy, so the resulting model also has higher accuracy.
  • An embodiment of the present disclosure also provides a method for generating a neural network, which uses the neural network to be optimized to perform three-dimensional position information prediction processing on a second object in a second image, and obtains the second sphere position information of a plurality of second spheres representing the three-dimensional model of the second object in the second image; image rendering is performed based on the second sphere position information, gradient information representing the degree of correctness of the second sphere position information of the plurality of second spheres is determined based on the rendering result, and the neural network to be optimized is updated based on the gradient information to obtain an optimized neural network, so that the optimized neural network has higher prediction accuracy for three-dimensional position information.
  • the execution body of the method for generating a 3D model provided by the embodiment of the present disclosure is generally a computer device with a certain computing capability.
  • the computer device includes, for example, a terminal device, a server, or another processing device; the terminal device may be a user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like.
  • the three-dimensional model generation method may be implemented by a processor invoking computer-readable instructions stored in a memory.
  • FIG. 1 is a flowchart of a method for generating a 3D model provided by an embodiment of the present disclosure
  • the method includes steps S101 to S104 , wherein:
  • S101 Based on a first image including a first object, obtain first sphere position information in the camera coordinate system of each of a plurality of first spheres, where the plurality of first spheres respectively represent different parts of the first object;
  • S102 Generate a first rendered image based on the first sphere position information of the plurality of first spheres;
  • S103 Obtain gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image;
  • S104 Adjust the first sphere position information of the plurality of first spheres based on the gradient information of the first rendered image, and use the adjusted first sphere position information of the plurality of first spheres , generating a three-dimensional model of the first object.
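  • To make the flow of S101 to S104 concrete, the following is a minimal, hedged Python sketch of the optimization loop; the callables (predict_sphere_positions, render_spheres, compute_pixel_gradients, build_mesh), the attribute names on each sphere, the fixed iteration count and the step size are illustrative assumptions, not the disclosed implementation.

```python
def generate_3d_model(first_image, semantic_segmentation,
                      predict_sphere_positions, render_spheres,
                      compute_pixel_gradients, build_mesh,
                      num_iterations=50, lr=0.01):
    """Hedged sketch of S101-S104; all callables are assumed placeholders supplied by the caller."""
    # S101: predict per-part sphere parameters (center, axis lengths, rotation)
    # in the camera coordinate system from the first image.
    spheres = predict_sphere_positions(first_image)

    for _ in range(num_iterations):
        # S102: generate the first rendered image from the sphere position information.
        rendered = render_spheres(spheres)

        # S103: compare the rendered image with the semantically segmented image
        # of the first image to obtain per-pixel gradient information.
        grads = compute_pixel_gradients(rendered, semantic_segmentation)

        # S104: adjust each first sphere's position information against its gradient.
        for sphere, g in zip(spheres, grads):
            sphere.center -= lr * g.center
            sphere.axis_lengths -= lr * g.axis_lengths
            sphere.rotation -= lr * g.rotation

    # Build the final three-dimensional model from the adjusted spheres.
    return build_mesh(spheres)
```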
  • In the above method, after the first sphere position information in the camera coordinate system of each of the plurality of first spheres representing different parts of the first object is acquired, the first object is re-rendered according to that first sphere position information to obtain a first rendered image; gradient information of the first rendered image is then obtained based on the first rendered image and the semantically segmented image of the first image, where the gradient information represents the correctness of the first rendered image obtained by re-rendering the first object based on the first sphere position information. Thus, in the process of adjusting the first sphere position information of each first sphere based on the gradient information, the parts whose first sphere position information was predicted incorrectly are adjusted, so that the adjusted first sphere position information represents the positions of the different parts of the first object in the camera coordinate system more accurately, and the three-dimensional model of the first object generated based on the adjusted first sphere position information of each first sphere has higher precision. Moreover, since the gradient information representing the degree of correctness of the first sphere position information of the plurality of first spheres is used to readjust the first sphere position information corresponding to each of the plurality of first spheres, the depth information of the first object can be restored with higher accuracy, so the resulting three-dimensional model also has higher accuracy.
  • In the embodiments of the present disclosure, the first object is divided into multiple parts, and three-dimensional position information is predicted separately for the different parts of the first object.
  • the three-dimensional position information corresponding to different parts of the first object is represented by the first sphere position information of the first sphere in the camera coordinate system;
  • the first sphere position information of a first sphere in the camera coordinate system includes: the three-dimensional position information (that is, the second three-dimensional position information) of the center point of the first sphere in the camera coordinate system, the lengths corresponding to the three coordinate axes of the first sphere, and the rotation angle of the first sphere relative to the camera coordinate system.
  • Taking the human body as an example, the body can be divided into multiple parts according to its limbs and torso, and each part is represented by a first sphere; each first sphere has three coordinate axes, which respectively represent the length of the bone and the thickness of the part in different directions.
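  • As an illustration of what such per-part sphere parameters might look like in code, here is a minimal, hedged Python data structure; the field names are assumptions chosen to mirror the description above, not identifiers from the disclosure.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FirstSphere:
    """Hedged sketch of one part's sphere parameters in the camera coordinate system."""
    center: np.ndarray        # (3,) second three-dimensional position of the center point
    axis_lengths: np.ndarray  # (3,) bone length and part thicknesses along the three axes
    rotation: np.ndarray      # (3, 3) rotation of the sphere relative to the camera frame
    part_index: int           # which part of the first object this sphere represents
```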
  • an embodiment of the present disclosure provides an example in which a human body is represented by a plurality of first spheres.
  • As shown in FIG. 2, the human body is divided into 20 parts, and the 20 parts are respectively represented by 20 first spheres.
  • The 20 first spheres may be denoted Θ_i, i ∈ {1, …, 20}, with Θ_i = E(R_i, C_i, X_i), where Θ_i represents the first sphere position information of the i-th first sphere in the camera coordinate system, that is, the pose data of the corresponding part in the camera coordinate system; X_i represents the dimensional data of the i-th first sphere, whose parameters include the bone length l_i and the thicknesses of the part in different directions; C_i represents the three-dimensional coordinate value of the center point of the i-th first sphere in the camera coordinate system; and R_i represents the rotation information of the i-th first sphere in the camera coordinate system.
  • the pose data S_i of the i-th first sphere satisfies formula (1), where O_i is the offset vector representing the offset direction from the parent part corresponding to the i-th first sphere to the current part; l_i·O_i represents the local position of the i-th part of the human body in the key point layout; S_parent(i) represents the pose data of the parent part; and R_parent(i) represents the rotation information of the parent part corresponding to the i-th first sphere in the camera coordinate system.
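  • The body of formula (1) is not reproduced above; as a hedged illustration only, a hierarchical (forward-kinematic) pose composition consistent with the definitions of O_i, l_i, S_parent(i) and R_parent(i) could take the following assumed form:

```latex
% Assumed, illustrative composition (not necessarily the exact formula (1)):
% the pose of the i-th part is the parent pose plus the parent-rotated local offset.
S_i = S_{\mathrm{parent}(i)} + R_{\mathrm{parent}(i)} \cdot \left( l_i \, O_i \right)
```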
  • When acquiring the first sphere position information, a pre-trained position information prediction network can be used to perform position information prediction processing on the first image, so as to obtain the first sphere position information of each of the plurality of first spheres in the camera coordinate system.
  • the embodiment of the present disclosure further provides an example of the structure of a position information prediction network, including: a feature extraction sub-network, a key point prediction sub-network, and a three-dimensional position information prediction sub-network.
  • the feature extraction sub-network is used to perform feature extraction processing on the first image to obtain a feature map of the first image.
  • the feature extraction sub-network includes, for example, convolutional neural networks (CNN), and the CNN can perform at least one-level feature extraction processing on the first image to obtain a feature map of the first image.
  • the process of performing at least one-level feature extraction processing on the first image by the CNN can also be regarded as the process of encoding the first image by using the CNN encoder.
  • the key point prediction sub-network is configured to determine, based on the feature map of the first image, two-dimensional coordinate values of multiple key points of the first object in the first image.
  • the key point prediction sub-network can perform at least one level of deconvolution processing based on the feature map of the first image to obtain a heat map of the first image, wherein the size of the heat map is, for example, the same as the size of the first image;
  • the pixel value of any first pixel point in the heat map represents the probability that the second pixel point corresponding to the position of any first pixel point in the first image is a key point of the first object.
  • two-dimensional coordinate values of multiple key points of the first object in the first image can be obtained.
  • the three-dimensional position information prediction sub-network is used to obtain the first sphere position information in the camera coordinate system of each of the plurality of first spheres constituting the first object, according to the two-dimensional coordinate values of the plurality of key points of the first object in the first image and the feature map of the first image.
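  • The following is a minimal, hedged PyTorch-style sketch of such a position information prediction network (feature extraction, key point heat maps, and a three-dimensional position head); the layer sizes, module names, and the assumption of 9 output values per sphere are illustrative, and for brevity the position head here consumes only the encoder features, whereas the description above also feeds it the key point coordinates.

```python
import torch
import torch.nn as nn

class PositionPredictionNet(nn.Module):
    """Hedged sketch: CNN encoder -> key point heat maps -> per-sphere position parameters."""
    def __init__(self, num_keypoints=20, num_spheres=20):
        super().__init__()
        # Feature extraction sub-network (encoder).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Key point prediction sub-network (deconvolution back to heat maps).
        self.keypoint_head = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, num_keypoints, 4, stride=2, padding=1),
        )
        # Three-dimensional position information prediction sub-network:
        # an assumed 9 values per sphere (3 center + 3 axis lengths + 3 rotation angles).
        self.position_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, num_spheres * 9),
        )
        self.num_spheres = num_spheres

    def forward(self, image):
        features = self.encoder(image)
        heatmaps = self.keypoint_head(features)       # per-pixel key point probabilities
        sphere_params = self.position_head(features)  # first sphere position information
        return heatmaps, sphere_params.view(-1, self.num_spheres, 9)
```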
  • the first rendered image may be generated in the following manner, for example:
  • determining, based on the first sphere position information, the first three-dimensional position information in the camera coordinate system of each vertex of the plurality of patches constituting each first sphere; and generating the first rendered image based on the first three-dimensional position information in the camera coordinate system of the respective vertices of the plurality of patches constituting each first sphere.
  • a patch is a collection of vertices and polygons representing the shape of a polyhedron in 3D computer graphics, also known as an unstructured mesh.
  • That is, the first three-dimensional position information in the camera coordinate system of the vertices of the plurality of patches constituting each first sphere can be determined based on the first sphere position information.
  • Specifically, the first three-dimensional position information in the camera coordinate system of each vertex of the plurality of patches constituting each first sphere may be determined based on the first positional relationship between the template vertices of the plurality of template patches constituting the template sphere and the center point of the template sphere, and on the first sphere position information of each first sphere.
  • the template sphere is shown as 41 in FIG. 4, for example, the template sphere includes a plurality of template patches, and the template vertex of each template patch has a certain positional relationship with the center point of the template sphere.
  • the first sphere can be obtained by deforming the template sphere. When deforming the template sphere, for example, the template sphere is transformed in shape and rotation angle based on the lengths corresponding to the three coordinate axes of each first sphere and the rotation angle of each first sphere relative to the camera coordinate system; based on the result of the shape and rotation angle transformation of the template sphere and the first positional relationship, the second positional relationship between each template vertex and the center point of the transformed template sphere is determined; and based on the second three-dimensional position information of the center point of each first sphere in the camera coordinate system and the second positional relationship, the first three-dimensional position information in the camera coordinate system of the respective vertices of the plurality of patches constituting each first sphere is determined.
  • When transforming the shape and rotation angle of the template sphere, the shape can be transformed first, so that the three coordinate axes of the template sphere are respectively equal in length to the three coordinate axes of the first sphere, and then the rotation angle transformation is performed on the result of the shape transformation, so that the directions of the three coordinate axes of the template sphere in the camera coordinate system correspond one-to-one with the directions of the three coordinate axes of the first sphere, completing the transformation of the template sphere.
  • After the transformation, the lengths of the three coordinate axes of the template sphere and its rotation angle in the camera coordinate system are determined. Based on these lengths and the rotation angle, and on the first positional relationship between the template vertices of the template patches constituting the template sphere and the center point of the template sphere, the second positional relationship between the template vertices of each template patch and the center point of the transformed template sphere can be determined. Based on the second positional relationship and the second three-dimensional position information of the center point of the first sphere in the camera coordinate system, the three-dimensional position information of the template vertices of the plurality of template patches in the camera coordinate system is determined; this is exactly the first three-dimensional position information in the camera coordinate system of the respective vertices of the plurality of patches constituting the first sphere.
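  • As a hedged numerical sketch of this transformation, assuming the shape change can be expressed as per-axis scaling, the rotation as a 3x3 matrix, and the translation as the sphere's center point (these are assumptions, not the disclosed formulation):

```python
import numpy as np

def transform_template_vertices(template_vertices, axis_lengths, rotation, center):
    """Hedged sketch: map template-sphere vertices to one first sphere's patch vertices.

    template_vertices: (N, 3) template vertex offsets relative to the template sphere center
                       (the "first positional relationship").
    axis_lengths:      (3,)   lengths of the first sphere's three coordinate axes.
    rotation:          (3, 3) rotation of the first sphere relative to the camera frame.
    center:            (3,)   second three-dimensional position of the first sphere's center.
    """
    # Shape transformation: scale the template offsets to the first sphere's axis lengths.
    scaled = template_vertices * axis_lengths
    # Rotation angle transformation: orient the axes to match the first sphere
    # (giving the "second positional relationship" to the center point).
    rotated = scaled @ rotation.T
    # Translate by the center point to obtain the first three-dimensional position information.
    return rotated + center
```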
  • an embodiment of the present disclosure further provides an example of transforming a template sphere into a first sphere.
  • the template sphere is shown as 41 in FIG. 4 ;
  • the result of the rotation angle transformation is shown as 42; 43 and 44 show the human body formed by the first spheres, wherein 43 is a perspective view of the human body formed by the first spheres.
  • After the first three-dimensional position information is obtained, it is used to perform image rendering processing on the plurality of first spheres constituting the first object, so as to generate the first rendered image.
  • image rendering processing may be performed on the plurality of first spheres constituting the first object in the following manner:
  • the part index of any pixel point identifies the part on the first object corresponding to the any pixel point; the patch index of any pixel point identifies the patch corresponding to the any pixel point.
  • the camera is the camera that acquires the first image;
  • the projection matrix of the camera may be determined based on the position of the camera in the camera coordinate system and the first three-dimensional position information in the camera coordinate system of the vertices of the plurality of patches constituting each first sphere; the plurality of first spheres can then be projected into the image coordinate system based on the projection matrix to obtain the first rendered image.
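  • A minimal, hedged sketch of how a per-pixel part index and patch (face) index might be produced from the projected patches follows; the simple per-face depth buffer and the point-in-triangle test are illustrative assumptions rather than the disclosed renderer.

```python
import numpy as np

def point_in_triangle(p, tri):
    """Barycentric sign test: True if pixel p lies inside triangle tri (3 x 2 array)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, tri[0], tri[1], tri[2]
    d1 = (x - x2) * (y1 - y2) - (x1 - x2) * (y - y2)
    d2 = (x - x3) * (y2 - y3) - (x2 - x3) * (y - y3)
    d3 = (x - x1) * (y3 - y1) - (x3 - x1) * (y - y1)
    neg = (d1 < 0) or (d2 < 0) or (d3 < 0)
    pos = (d1 > 0) or (d2 > 0) or (d3 > 0)
    return not (neg and pos)

def rasterize_indices(vertices_cam, faces, face_part_ids, projection, height, width):
    """Hedged sketch: render part-index and patch-index images from camera-space vertices.

    vertices_cam:  (V, 3) first three-dimensional position information of all patch vertices.
    faces:         (F, 3) vertex indices of each triangular patch.
    face_part_ids: (F,)   part index (which first sphere) each patch belongs to.
    projection:    (3, 3) camera projection (intrinsic) matrix.
    """
    part_index = np.full((height, width), -1, dtype=np.int32)
    face_index = np.full((height, width), -1, dtype=np.int32)
    depth = np.full((height, width), np.inf)

    # Project vertices to pixel coordinates.
    proj = vertices_cam @ projection.T
    pixels = proj[:, :2] / proj[:, 2:3]

    for f, (a, b, c) in enumerate(faces):
        tri = pixels[[a, b, c]]
        z = vertices_cam[[a, b, c], 2].mean()  # crude per-face depth
        x0, y0 = np.floor(tri.min(axis=0)).astype(int)
        x1, y1 = np.ceil(tri.max(axis=0)).astype(int)
        for y in range(max(y0, 0), min(y1 + 1, height)):
            for x in range(max(x0, 0), min(x1 + 1, width)):
                if point_in_triangle((x, y), tri) and z < depth[y, x]:
                    depth[y, x] = z                       # keep the patch closest to the camera
                    face_index[y, x] = f                  # patch index of this pixel
                    part_index[y, x] = face_part_ids[f]   # part index of this pixel
    return part_index, face_index
```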
  • In one approach, the multiple first spheres are rendered jointly based on the first sphere position information corresponding to each of them, so as to obtain a first rendered image containing all the first spheres.
  • In that case, the gradient information of the first rendered image containing all the first spheres is obtained, and the first sphere position information of the plurality of first spheres is adjusted based on that gradient information.
  • In another approach, rendering is performed separately for each of the multiple first spheres, so as to obtain first rendered images respectively corresponding to the multiple first spheres. Then, the gradient information of the first rendered image corresponding to each first sphere is obtained, and the first sphere position information of the multiple first spheres is adjusted based on the gradient information of the first rendered images respectively corresponding to them.
  • a pre-trained semantic segmentation network can be used to perform semantic segmentation processing on the first image to obtain a semantically segmented image of the first image.
  • the pixel values of the corresponding pixels of different first spheres are different when they are rendered to the first rendered image;
  • the pixel value corresponding to any pixel in the semantically segmented image represents the classification value of the part to which the pixel at the corresponding position in the first image belongs.
  • the classification values corresponding to different parts of the first object in the semantically segmented image are also different.
  • When a first sphere is rendered to the first rendered image, the pixel value of its corresponding pixels is the same as the classification value corresponding to that part in the semantically segmented image.
  • the gradient information of the first rendered image is obtained based on the first rendered image and the semantically segmented image of the first image, for example, the following methods may be used:
  • for each first sphere, obtain the gradient information of the first rendered image corresponding to that first sphere according to the first rendered image and the semantically segmented image corresponding to that first sphere;
  • If the predicted first sphere position information of every first sphere is correct, the generated first rendered image and the semantically segmented image have the same pixel value at each corresponding position. If the predicted first sphere position information of any first sphere is incorrect, the pixel values of the pixels at at least some corresponding positions in the first rendered image and the semantically segmented image will differ.
  • Therefore, the first rendered image and the semantically segmented image of the first image can be used to determine the gradient information of the first rendered image, where the gradient information represents the degree of correctness of the first sphere position information of each first sphere in the camera coordinate system.
  • Adjusting the first sphere position information based on this gradient information makes the resulting 3D model of the first object more accurate.
  • the gradient information of the first rendered image includes: a gradient value of each pixel in the first rendered image.
  • Specifically, each pixel in the first rendered image may be traversed, and the gradient value of the traversed pixel is determined according to the first pixel value of the traversed pixel in the first rendered image and the second pixel value of the traversed pixel in the semantically segmented image.
  • an embodiment of the present disclosure further provides a method for determining the gradient value of a traversed pixel point, including:
  • S501 Determine the residual of the traversed pixel point according to the first pixel value of the traversed pixel point and the second pixel value of the traversed pixel point.
  • Here, in the case that the first pixel value and the second pixel value of the traversed pixel are equal, it is considered that the first sphere position information of the first sphere onto which the traversed pixel is projected is predicted correctly; the projected position point is a position point on some patch of the first sphere representing the corresponding part of the first object. In the case that the first pixel value and the second pixel value are not equal, the first sphere position information of the first sphere to which the projected position point of the traversed pixel belongs is predicted incorrectly.
  • the first value is 0, for example.
  • S504 Determine target three-dimensional position information in the camera coordinate system of at least one target vertex on the target patch, wherein, when the at least one target vertex is located at the position identified by the target three-dimensional position information, the residual between the new first pixel value obtained by re-rendering the traversed pixel and the second pixel value corresponding to the traversed pixel is the first value;
  • S505 Obtain the gradient value of the traversed pixel based on the first three-dimensional position information of the target vertex in the camera coordinate system and the target three-dimensional position information.
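  • The following is a minimal, hedged sketch of this per-pixel procedure (S501 to S505) for one coordinate axis; treating the gradient as the residual divided by the vertex displacement needed to flip the pixel, and the helper find_target_vertex_position, are illustrative assumptions rather than the disclosed formulas (2) and (3).

```python
def pixel_gradient(first_value, rendered_value, segmented_value,
                   target_vertex_pos, find_target_vertex_position):
    """Hedged sketch of S501-S505 for one traversed pixel and one coordinate axis."""
    # S501: residual between the first pixel value and the second pixel value.
    residual = rendered_value - segmented_value

    # If the residual equals the first value, the sphere position is correct here:
    # the gradient value of the traversed pixel is the first value.
    if residual == first_value:
        return first_value

    # Otherwise (S504): find where the target vertex would have to move along this
    # axis so that re-rendering the pixel makes the residual equal to the first value.
    target_position = find_target_vertex_position()

    # S505: gradient from the current vertex position and the target position.
    displacement = target_position - target_vertex_pos
    return residual / displacement if displacement != 0 else first_value
```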
  • the patch is a triangular patch, that is, any patch constituting the first sphere includes three edges and three vertices.
  • Let pixel P be the traversed pixel, and let I_P(x) ∈ {0, 1} denote the rendering function of pixel P.
  • 61 represents a target patch, which is the j-th patch in the first sphere representing the i-th part of the first object; the k-th vertex of the target patch is the target vertex in the embodiment of the present disclosure.
  • 62 denotes an occlusion patch covering the target patch in the direction of the camera; the occlusion patch and the target patch belong to different first spheres.
  • In one case, the first pixel value of pixel P should be rendered as the first pixel value corresponding to the target patch; in this example, pixel P is occluded by the occlusion patch 62, and the projection of the target patch 61 in the image coordinate system does not cover pixel P. Therefore, adjusting the position of the target vertex in either the x-axis or the y-axis direction of the camera coordinate system will not make the new first pixel value obtained after re-rendering pixel P equal to the first pixel value corresponding to the target patch.
  • ΔI_P represents the residual of pixel P; x_0 denotes the coordinate value of the target vertex on the x-axis before it is moved along the x-axis, and x_1 denotes its coordinate value on the x-axis after it is moved; the formulas further involve a hyperparameter; and δ(·, ·) represents the distance between two points.
  • In another case, the first pixel value of pixel P should be rendered as the first pixel value corresponding to the target patch; in this example, pixel P is not occluded by the occlusion patch 62, so the target vertex only needs to be moved along the x-axis of the camera coordinate system for the new first pixel value obtained after re-rendering pixel P to equal the first pixel value corresponding to the target patch. Therefore, as shown in b of FIG. 6, the target vertex can be moved in the x-axis direction of the camera coordinate system until the projection of the target patch in the image coordinate system covers pixel P, thereby obtaining the target three-dimensional position information of the target vertex in the camera coordinate system. At this time, the gradient value of pixel P satisfies the above formula (2), and the gradient values of pixel P in the z-axis and y-axis directions are both 0.
  • In a further case, the first pixel value of pixel P should be rendered as the first pixel value corresponding to the target patch; in this example, pixel P is occluded by the occlusion patch 62, but the projection of the target patch 61 in the image coordinate system does cover pixel P. There is therefore no need to adjust the target vertex in the x-axis or y-axis direction of the camera coordinate system; instead, as shown in e of FIG. 6, the position of the target vertex is adjusted in the z-axis direction so that the position point Q of the target patch that projects to pixel P lies in front of the occlusion patch (relative to the position of the camera), and the target three-dimensional position information of the target vertex in the camera coordinate system is thereby obtained. At this time, the gradient value of pixel P satisfies the above formula (3), and the gradient values of pixel P in the x-axis and y-axis directions are both 0.
  • In yet another case, the first pixel value of pixel P should be rendered as a first pixel value different from that of the target patch; in this example, pixel P is not occluded by the occlusion patch 62, and the projection of the target patch 61 in the image coordinate system covers pixel P. In this case, the target vertex needs to be moved along the x-axis direction of the camera coordinate system so that the new first pixel value obtained by re-rendering pixel P differs from the first pixel value corresponding to the target patch. At this time, the gradient value of pixel P satisfies the above formula (2), and the gradient values of pixel P in the y-axis and z-axis directions are both 0.
  • In the above manner, the gradient value of each pixel point in the first rendered image can be obtained; the gradient values of all pixel points in the first rendered image constitute the gradient information of the first rendered image.
  • In the above S104, when the first sphere position information of each first sphere is adjusted based on the gradient information of the first rendered image, at least one item of the first sphere position information may be adjusted; that is, at least one of the second three-dimensional position information of the center point of each first sphere in the camera coordinate system, the lengths corresponding to the three coordinate axes of each first sphere, and the rotation angle of each first sphere relative to the camera coordinate system is adjusted, so that in the new first rendered image generated based on the adjusted first sphere position information, the gradient value of each pixel point changes toward the first numerical value. Through multiple iterations, the first sphere position information gradually approaches the real values, which improves the accuracy of the first sphere position information and ultimately the accuracy of the three-dimensional model of the first object. A minimal sketch of this iterative adjustment is given below.
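  • The following sketch illustrates one possible shape of such an adjustment loop. It is an illustrative sketch only: the `render_and_compare` helper is a hypothetical stand-in for the patch-based rendering and the per-pixel gradient rules described above, and the parameter names are assumptions rather than names used in this disclosure.

```python
# Minimal sketch of the iterative adjustment in S104: sphere parameters are nudged
# so that the pixel gradients of the re-rendered image move toward the first value.
import numpy as np

def render_and_compare(spheres, seg_image):
    """Hypothetical helper: returns a gradient dict per sphere parameter.

    A real implementation would rasterize the spheres' patches, compare the
    rendered image with the semantic segmentation image, and pass the per-pixel
    residuals back to the patch vertices and then to the sphere parameters.
    """
    return [{k: np.zeros_like(v) for k, v in s.items()} for s in spheres]

def adjust_spheres(spheres, seg_image, lr=1e-2, num_iters=100):
    for _ in range(num_iters):
        grads = render_and_compare(spheres, seg_image)
        for sphere, grad in zip(spheres, grads):
            # each item of the sphere position information may be adjusted
            for key in ("center", "axis_lengths", "rotation"):
                sphere[key] = sphere[key] - lr * grad[key]
    return spheres

# toy usage: one first sphere with a center point, per-axis lengths and rotation angles
spheres = [{
    "center": np.array([0.0, 0.0, 2.0]),        # second 3D position of the center point
    "axis_lengths": np.array([0.3, 0.1, 0.1]),  # lengths of the three coordinate axes
    "rotation": np.zeros(3),                    # rotation relative to the camera frame
}]
seg_image = np.zeros((64, 64), dtype=np.int32)  # dummy semantic segmentation image
adjust_spheres(spheres, seg_image)
```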
  • Referring to FIG. 7, an embodiment of the present disclosure further provides a method for generating a neural network, including:
  • S701: Using a neural network to be trained, perform three-dimensional position information prediction processing on a second object in a second image to obtain second sphere position information, in the camera coordinate system, of each second sphere in a plurality of second spheres representing different parts of the second object;
  • S702: Generate a second rendered image based on the second sphere position information respectively corresponding to the plurality of second spheres;
  • S703: Obtain gradient information of the second rendered image based on the second rendered image and the semantically annotated image of the second image;
  • S704: Based on the gradient information of the second rendered image, update the neural network to be trained to obtain an updated neural network. A minimal training-loop sketch corresponding to S701-S704 is given below.
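  • The sketch below shows one possible shape of this training loop. It assumes a network `net` that maps an image to per-sphere position parameters and a hypothetical `render_gradient` helper standing in for S702 and S703; the layer sizes and the assumption of nine values per sphere (three each for the center, the axis lengths and the rotation) are illustrative, not values fixed by this disclosure.

```python
# Minimal training-loop sketch for S701-S704.
import torch

def render_gradient(sphere_params, seg_annotation):
    # hypothetical stand-in: per-parameter gradient derived from rendering the
    # second spheres and comparing with the semantically annotated image
    return torch.zeros_like(sphere_params)

def train_step(net, optimizer, image, seg_annotation):
    sphere_params = net(image)                              # S701: predict sphere positions
    grad = render_gradient(sphere_params, seg_annotation)   # S702 + S703
    optimizer.zero_grad()
    sphere_params.backward(gradient=grad)                   # S704: push gradient into the network
    optimizer.step()

net = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 20 * 9))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
image = torch.rand(1, 3, 64, 64)
seg_annotation = torch.zeros(1, 64, 64, dtype=torch.long)
train_step(net, optimizer, image, seg_annotation)
```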
  • The structure of the neural network provided by the embodiment of the present disclosure is, for example, shown in FIG. 3 and will not be repeated here.
  • In the embodiment of the present disclosure, the neural network to be optimized is used to perform three-dimensional position information prediction processing on the second object in the second image, obtaining the second sphere position information of the plurality of second spheres representing the three-dimensional model of the second object in the second image; image rendering is then performed based on the second sphere position information, and, based on the result of the image rendering, gradient information representing the degree of correctness of the second sphere position information of the plurality of second spheres is determined. The neural network to be optimized is updated based on this gradient information to obtain an optimized neural network, so that the optimized neural network has higher prediction accuracy for three-dimensional position information.
  • The implementation process of the above S702 is similar to that of the above S102, and the implementation process of the above S703 is similar to that of the above S103; they will not be repeated here. In the above S704, after the neural network to be trained is updated based on the gradient information of the second rendered image, in a new second rendered image obtained from the second sphere position information predicted by the updated neural network, the gradient value of each pixel point changes toward the first numerical value; through multiple rounds of optimization, the prediction accuracy of the neural network for the second sphere position information is gradually improved.
  • Based on the above, the embodiment of the present disclosure can transmit the gradient on a given pixel to the Euclidean coordinates of the vertices of the 3D mesh; that is, image information such as the object outline and the semantic segmentation of parts can be used to correct the shape of the 3D object model. An application scenario of the embodiment of the present disclosure is provided below.
  • 1. Forward propagation: from the 3D model mesh to the image pixels.
  • According to the given camera parameters and the pinhole camera imaging principle, the projection of each triangular patch (the above-mentioned patch) on the image plane is calculated. For each pixel on the image plane, within the region where the pixel is located, the index of the triangular patch closest to the camera is computed (that is, which triangular patch this pixel is rendered from during complete rendering); an image that stores, for each pixel, the index of that triangular patch is called the Face Index (the above-mentioned patch index).
  • In addition, whether a pixel point (u, v) belongs to the i-th part is recorded, and this is called the Part Index (the above-mentioned part index). A rendered image is generated, and then, for each part (the above-mentioned part), a portion of pixel values is separately extracted from the complete rendered image, where the pixel coordinates of the extracted portion belong to the current part in the Part Index. A simplified sketch of this forward pass is given below.
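  • The following sketch illustrates the forward pass just described on a toy scale: rasterize triangles into a Face Index and a Part Index by keeping the nearest triangle per pixel, then extract the pixels of one part from the complete rendering. The point-in-triangle test, the use of a single depth value per triangle, and the `face_part` mapping are simplifying assumptions made only for this example.

```python
# Sketch of Face Index / Part Index rasterization and per-part pixel extraction.
import numpy as np

def point_in_triangle(p, a, b, c):
    # p lies inside the triangle if all three cross-product signs agree
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    d1, d2, d3 = cross(p, a, b), cross(p, b, c), cross(p, c, a)
    return (d1 >= 0 and d2 >= 0 and d3 >= 0) or (d1 <= 0 and d2 <= 0 and d3 <= 0)

def rasterize(tri_2d, tri_depth, face_part, h, w):
    face_index = -np.ones((h, w), dtype=np.int64)
    part_index = -np.ones((h, w), dtype=np.int64)
    zbuf = np.full((h, w), np.inf)
    for f, (tri, z) in enumerate(zip(tri_2d, tri_depth)):
        for y in range(h):
            for x in range(w):
                if point_in_triangle((x, y), *tri) and z < zbuf[y, x]:
                    zbuf[y, x] = z                    # nearest triangle wins
                    face_index[y, x] = f              # Face Index of this pixel
                    part_index[y, x] = face_part[f]   # Part Index of this pixel
    return face_index, part_index

def extract_part(rendered, part_index, part_id):
    # keep only the pixels whose Part Index matches the requested part
    mask = part_index == part_id
    out = np.zeros_like(rendered)
    out[mask] = rendered[mask]
    return out

# toy usage with two triangles belonging to two different parts
tris = np.array([[[5, 5], [25, 5], [5, 25]], [[10, 10], [30, 10], [10, 30]]], float)
depths = np.array([2.0, 1.5])   # per-triangle depth, simplified to one constant each
face_idx, part_idx = rasterize(tris, depths, face_part=[0, 1], h=32, w=32)
rendered = (part_idx >= 0).astype(float)
part_1_pixels = extract_part(rendered, part_idx, part_id=1)
```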
  • 2. Backward propagation: passing the gradients of the pixels back to the vertices of the 3D mesh.
  • 1) Since the situations in the x and y directions are the same, the gradient back-propagation in the x direction is taken as an example. The value of a pixel can be an RGB value, a grayscale value, a brightness value, or a binary value; here the binary case is taken as an example, that is, a visible pixel takes the value 1 and an invisible pixel takes the value 0. The gradient on a pixel is therefore either in the positive direction (0 to 1) or in the negative direction (1 to 0). To relate the Euclidean coordinates of a vertex (the above-mentioned vertex) to the gradient of a pixel, it is assumed here that, when a vertex is moved, the pixel value changes linearly rather than abruptly; the gradient of the vertex is then the slope of this change. A small numeric illustration is given below.
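  • Purely as an illustration of this slope idea (the exact form of formulas (2) and (3) is not reproduced here, so the helper below is an assumed finite-difference reading of the x-direction rule, not the formula from this disclosure): the gradient a pixel contributes to a vertex is the desired change of the binary pixel value divided by the vertex displacement along x that flips that value. `find_flip_x`-style search for that displacement is left abstract.

```python
# Finite-difference reading of the x-direction pixel-to-vertex gradient.

def pixel_gradient_x(residual, x0, x1):
    """residual: desired change of the binary pixel value (e.g. 1 - 0 or 0 - 1).
    x0: current x coordinate of the target vertex.
    x1: x coordinate after which re-rendering flips the pixel value.
    """
    if residual == 0 or x1 == x0:
        return 0.0               # nothing to correct, or no displacement found
    return residual / (x1 - x0)  # slope of the assumed linear change

# toy usage: the pixel should turn from 0 to 1 once the vertex moves from 3.0 to 3.4
print(pixel_gradient_x(residual=1, x0=3.0, x1=3.4))  # 2.5
```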
  • In the case of occlusion, because rendering is performed at the part level, a part that is occluded by another part is not rendered at the corresponding pixels; the value of such a pixel in the rendering of that part is therefore 0 regardless of whether the part covers the pixel, and the gradient there stays 0. Finally, all pixels are traversed, the gradient passed back from each pixel to the corresponding 3D model vertex is computed, and, when several pixels pass gradients back to the same vertex, these gradients are accumulated; for acceleration, each pixel can be processed independently in parallel, for example with CUDA or multi-core CPU parallelism. In this way, the gradients of the 3D model vertices under the given supervision information are obtained.
  • With this approach, the supervision information used is no longer limited to the complete rendered image; the semantic segmentation of objects can be used as supervision information. In the case where multiple objects are rendered together, different objects can also be regarded as parts and rendered independently, so that the positional relationship between the different objects can be known.
  • Those skilled in the art can understand that, in the above methods, the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the execution order of the steps should be determined by their functions and possible internal logic.
  • Based on the same inventive concept, an embodiment of the present disclosure further provides a three-dimensional model generation device corresponding to the three-dimensional model generation method. Since the device in the embodiment of the present disclosure is similar to the above three-dimensional model generation method of the embodiment of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated parts are not described again.
  • Referring to FIG. 8, which is a schematic diagram of a three-dimensional model generation device provided by an embodiment of the present disclosure, the device includes: a first acquisition part 81, a first generation part 82, a first gradient determination part 83, an adjustment part 84, and a model generation part 85; wherein,
  • the first acquisition part 81 is configured to acquire, based on a first image including a first object, first sphere position information, in the camera coordinate system, of each first sphere in a plurality of first spheres, the plurality of first spheres respectively representing different parts of the first object;
  • the first generation part 82 is configured to generate a first rendered image based on the first sphere position information of the plurality of first spheres;
  • the first gradient determination part 83 is configured to obtain gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image;
  • the adjustment part 84 is configured to adjust the first sphere position information of the plurality of first spheres based on the gradient information of the first rendered image; and
  • the model generation part 85 is configured to generate a three-dimensional model of the first object using the adjusted first sphere position information of the plurality of first spheres.
  • In some embodiments, the first generation part 82, in the case of generating the first rendered image based on the first sphere position information of the plurality of first spheres, is configured to: determine, based on the first sphere position information, first three-dimensional position information, in the camera coordinate system, of the respective vertices of the plurality of patches constituting each first sphere; and generate the first rendered image based on the first three-dimensional position information, in the camera coordinate system, of the respective vertices of the plurality of patches constituting each first sphere.
  • In some embodiments, the first generation part 82, in the case of determining, based on the first sphere position information, the first three-dimensional position information, in the camera coordinate system, of the respective vertices of the plurality of patches constituting each first sphere, is configured to: determine the first three-dimensional position information based on a first positional relationship between template vertices of a plurality of template patches constituting a template sphere and the center point of the template sphere, and the first sphere position information of each first sphere.
  • In some embodiments, the first sphere position information of each first sphere includes: second three-dimensional position information of the center point of each first sphere in the camera coordinate system, the lengths respectively corresponding to the three coordinate axes of each first sphere, and the rotation angle of each first sphere relative to the camera coordinate system.
  • In some embodiments, the first generation part 82, in the case of determining the first three-dimensional position information based on the first positional relationship between the template vertices of the plurality of template patches constituting the template sphere and the center point of the template sphere and the first sphere position information of each first sphere, is configured to: transform the shape and rotation angle of the template sphere based on the lengths respectively corresponding to the three coordinate axes of each first sphere and the rotation angle of each first sphere relative to the camera coordinate system; determine, based on the result of transforming the shape and rotation angle of the template sphere and the first positional relationship, a second positional relationship between each template vertex and the center point of the transformed template sphere; and determine, based on the second three-dimensional position information of the center point of each first sphere in the camera coordinate system and the second positional relationship, the first three-dimensional position information, in the camera coordinate system, of the respective vertices of the plurality of patches constituting each first sphere. A minimal sketch of this transform is given below.
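  • The sketch below shows one way such a template-sphere transform could be written: scale the template vertices by the three axis lengths, rotate them, then translate them by the center point. The Euler-angle rotation parameterization and the toy vertex list are assumptions made only for the example.

```python
# Sketch of turning template-sphere vertices into first-sphere vertices.
import numpy as np

def euler_to_matrix(rx, ry, rz):
    cx, sx, cy, sy, cz, sz = np.cos(rx), np.sin(rx), np.cos(ry), np.sin(ry), np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def transform_template(template_vertices, axis_lengths, rotation, center):
    scaled = template_vertices * axis_lengths          # shape transform (three axis lengths)
    rotated = scaled @ euler_to_matrix(*rotation).T    # rotation relative to the camera frame
    return rotated + center                            # translate to the sphere's center point

# template sphere: a few sample unit-sphere vertices for illustration
template = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [-1, 0, 0]], dtype=float)
verts = transform_template(template,
                           axis_lengths=np.array([0.3, 0.1, 0.1]),
                           rotation=np.array([0.0, 0.0, np.pi / 4]),
                           center=np.array([0.0, 0.0, 2.0]))
```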
  • the first acquisition part 81 is further configured to: acquire the projection matrix of the camera of the first image;
  • In some embodiments, the first generation part 82, in the case of generating the first rendered image based on the first three-dimensional position information, in the camera coordinate system, of the respective vertices of the plurality of patches constituting each first sphere, is configured to: determine, based on the first three-dimensional position information and the projection matrix, a part index and a patch index of each pixel point in the first rendered image; and generate the first rendered image based on the determined part index and patch index of each pixel point in the first rendered image; where the part index of any pixel point identifies the part on the first object corresponding to that pixel point, and the patch index of any pixel point identifies the patch corresponding to that pixel point.
  • In some embodiments, the first generation part 82, in the case of generating the first rendered image based on the first three-dimensional position information, in the camera coordinate system, of the respective vertices of the plurality of patches constituting each first sphere, is configured to: for each first sphere, generate a first rendered image corresponding to that first sphere according to the first three-dimensional position information, in the camera coordinate system, of the respective vertices of the plurality of patches constituting that first sphere.
  • In some embodiments, the first gradient determination part 83, in the case of obtaining the gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image, is configured to: for each first sphere, obtain the gradient information of the first rendered image corresponding to that first sphere according to the first rendered image and the semantically segmented image corresponding to that first sphere.
  • the gradient information of the first rendered image includes: a gradient value of each pixel in the first rendered image
  • In some embodiments, the first gradient determination part 83, in the case of obtaining the gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image, is configured to: traverse each pixel point in the first rendered image, and determine the gradient value of the traversed pixel point according to a first pixel value of the traversed pixel point in the first rendered image and a second pixel value of the traversed pixel point in the semantically segmented image.
  • In some embodiments, the first gradient determination part 83, in the case of determining the gradient value of the traversed pixel point according to the first pixel value of the traversed pixel point in the first rendered image and the second pixel value of the traversed pixel point in the semantically segmented image, is configured to: determine a residual of the traversed pixel point according to the first pixel value and the second pixel value of the traversed pixel point; in a case where the residual of the traversed pixel point is a first numerical value, determine the gradient value of the traversed pixel point as the first numerical value; in a case where the residual of the traversed pixel point is not the first numerical value, determine, based on the second pixel value of the traversed pixel point, a target first sphere corresponding to the traversed pixel point from the plurality of first spheres, and determine a target patch from the plurality of patches constituting the target first sphere; determine target three-dimensional position information, in the camera coordinate system, of at least one target vertex on the target patch, where, in a case where the at least one target vertex is located at the position identified by the target three-dimensional position information, the residual between a new first pixel value obtained by re-rendering the traversed pixel point and the second pixel value corresponding to the traversed pixel point is the first numerical value; and obtain the gradient value of the traversed pixel point based on the first three-dimensional position information of the target vertex in the camera coordinate system and the target three-dimensional position information.
  • In some embodiments, the first acquisition part 81, in the case of acquiring, based on the first image including the first object, the first sphere position information of each of the plurality of first spheres in the camera coordinate system, is configured to: perform position information prediction processing on the first image using a pre-trained position information prediction network, to obtain the first sphere position information of each of the plurality of first spheres in the camera coordinate system. A rough sketch of such a prediction network is given below.
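  • As an illustration only, the sketch below gives one plausible shape for a FIG. 3-style position information prediction network, with a feature extraction sub-network, a keypoint sub-network and a 3D position sub-network. The layer sizes, the use of 20 parts and 9 parameters per sphere, and the omission of feeding the predicted 2D keypoint coordinates into the position head are all simplifying assumptions rather than the structure fixed by this disclosure.

```python
# Rough sketch of a position information prediction network.
import torch
import torch.nn as nn

class PositionPredictionNet(nn.Module):
    def __init__(self, num_parts=20, params_per_sphere=9):
        super().__init__()
        self.features = nn.Sequential(            # feature extraction sub-network
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.keypoints = nn.Sequential(           # keypoint sub-network (heatmaps)
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_parts, 1),
        )
        self.positions = nn.Sequential(           # 3D position sub-network
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_parts * params_per_sphere),
        )

    def forward(self, image):
        feat = self.features(image)
        heatmaps = self.keypoints(feat)            # 2D keypoint evidence
        sphere_params = self.positions(feat)       # first sphere position information
        return heatmaps, sphere_params

net = PositionPredictionNet()
heatmaps, sphere_params = net(torch.rand(1, 3, 64, 64))
```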
  • Referring to FIG. 9, an embodiment of the present disclosure further provides a neural network generating apparatus, including:
  • the second acquisition part 91 is configured to perform three-dimensional position information prediction processing on a second object in a second image using a neural network to be trained, to obtain second sphere position information, in the camera coordinate system, of each second sphere in a plurality of second spheres representing different parts of the second object;
  • the second generating part 92 is configured to generate a second rendered image based on the second sphere position information corresponding to the plurality of second spheres respectively;
  • the second gradient determination part 93 is configured to obtain the gradient information of the second rendered image based on the second rendered image and the semantically annotated image of the second image;
  • the updating part 94 is configured to update the neural network to be trained based on the gradient information of the second rendered image to obtain an updated neural network.
  • In the embodiments of the present disclosure and other embodiments, a "part" may be a part of a circuit, a part of a processor, a part of a program or software, and the like; it may, of course, also be a unit, and it may be a module or be non-modular.
  • An embodiment of the present disclosure further provides a computer device. As shown in FIG. 10, which is a schematic structural diagram of the computer device provided by the embodiment of the present disclosure, the computer device includes a processor 11 and a memory 12; the memory 12 stores machine-readable instructions executable by the processor 11, and when the computer device runs, the machine-readable instructions are executed by the processor to implement the steps of the three-dimensional model generation method described in the foregoing method embodiments, or the steps of the neural network generation method described in the foregoing method embodiments. For the execution process of the above instructions, reference may be made to the steps of the three-dimensional model generation method and the neural network generation method described in the embodiments of the present disclosure, which will not be repeated here.
  • Embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the three-dimensional model generation method or the neural network generation method described in the foregoing method embodiments are executed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
  • The computer program product of the three-dimensional model generation method or the neural network generation method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be configured to execute the steps of the three-dimensional model generation method or the neural network generation method described in the foregoing method embodiments, to which reference may be made; details are not repeated here.
  • Embodiments of the present disclosure also provide a computer program, which implements any one of the methods in the foregoing embodiments when the computer program is executed by a processor.
  • the computer program product can be implemented in hardware, software, or a combination thereof.
  • the computer program product is embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK) and the like.
  • Embodiments of the present disclosure also provide a computer program including computer-readable codes; when the computer-readable codes run in an electronic device, a processor in the electronic device executes them to implement the above three-dimensional model generation method or the above neural network generation method.
  • Through the embodiments of the present disclosure, in three-dimensional reconstruction tasks, the accuracy of the reconstructed model can be improved and the ambiguity caused by self-occlusion of a high-degree-of-freedom model can be reduced; moreover, in deep learning, the embodiments of the present disclosure link images with three-dimensional space, thereby improving the accuracy of tasks such as semantic segmentation and three-dimensional reconstruction.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • The technical solutions of the present disclosure, in essence, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage medium includes: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other media that can store program codes .
  • Embodiments of the present disclosure provide a three-dimensional model generation method, a neural network generation method, and apparatuses. The three-dimensional model generation method includes: acquiring, based on a first image including a first object, first sphere position information, in the camera coordinate system, of each first sphere in a plurality of first spheres, where the plurality of first spheres are respectively configured to represent different parts of the first object; generating a first rendered image based on the first sphere position information of the plurality of first spheres; obtaining gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image; adjusting the first sphere position information of the plurality of first spheres based on the gradient information of the first rendered image; and generating a three-dimensional model of the first object using the adjusted first sphere position information of the plurality of first spheres.
  • The three-dimensional model generated by the embodiments of the present disclosure has higher precision.

Abstract

一种三维模型生成方法、神经网络生成方法及装置,三维模型生成方法包括:基于包含第一对象的第一图像,获取多个第一球体中每个第一球体在相机坐标系中的第一球体位置信息,多个第一球体分别表示第一对象不同部位(S101);基于多个第一球体的第一球体位置信息,生成第一渲染图像(S102);基于第一渲染图像与第一图像的语义分割图像,得到第一渲染图像的梯度信息(S103);基于第一渲染图像的梯度信息,调整多个第一球体的第一球体位置信息,并利用调整后的多个第一球体的第一球体位置信息,生成第一对象的三维模型(S104)。

Description

三维模型生成方法、神经网络生成方法及装置
相关申请的交叉引用
本申请基于申请号为202010607430.5、申请日为2020年06月29日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。
技术领域
本公开涉及图像处理技术领域,具体而言,涉及一种三维模型生成方法、神经网络生成方法、装置、设备及计算机可读存储介质。
背景技术
基于二维图像的三维模型重建过程中,需要通过深度神经网络获取图像的特征,然后利用图像特征回归得到三维模型的参数,并基于获得的三维模型参数来实现三维模型重建。
当前的三维模型生成方法存在精度低的问题。
发明内容
本公开实施例至少提供一种三维模型生成方法、神经网络生成方法、装置、设备及计算机可读存储介质。
第一方面,本公开实施例提供了一种三维模型生成方法,包括:基于包含第一对象的第一图像,获取多个第一球体中每个第一球体在相机坐标系中的第一球体位置信息,所述多个第一球体分别表示所述第一对象不同部位;基于所述多个第一球体的所述第一球体位置信息,生成第一渲染图像;基于所述第一渲染图像以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息;基于所述第一渲染图像的梯度信息,调整所述多个第一球体的所述第一球体位置信息,并利用调整后的所述多个第一球体的所述第一球体位置信息,生成所述第一对象的三维模型。
这样,通过对表征三维模型的多个第一球体的第一球体位置信息进行图像渲染,并基于第一图像渲染的结果,确定能够表征多个第一球体的第一球体位置信息的正确性程度的梯度信息,并基于该梯度信息对多个第一球体分别对应的第一球体位置信息进行重新调整,从而使得调整后的多个第一球体位置信息具有更高的精度,也即,基于多个第一球体分别对应的第一球体位置信息恢复的三维模型也具有更高的精度。
一种可选的实施方式中,所述基于所述多个第一球体的所述第一球体位置信息,生成第一渲染图像,包括:基于所述第一球体位置信息,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息;基于构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成所述第一渲染图像。
这样,能够通过将第一对象分为多个部位分别表示为不同的第一球体,并基于构成不同球体的多个面片的各个顶点分别在相机坐标系中的第一三维位置信息,生成第一渲染图像,在第一渲染图像中,包含了不同第一对象的部位的三维关系信息,进而能够基于第一渲染图像确定的梯度信息来约束第一对象的三维模型,使得第一对象的三维模型具有更高的精度。
一种可选的实施方式中,所述基于所述第一球体位置信息,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,包括:基于构成模板球体的多个模板面片的模板顶点与所述模板球体的中心点之间的第一位置关系、以及所述每个第一球体的所述第一球体位置信息,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息。
这样,通过多个模板面片变形得到多个第一球体,通过面片来表征球体的表面,降低渲染生成第一渲染图像时的复杂度。
一种可选的实施方式中,所述每个第一球体的所述第一球体位置信息包括:所述每个第一球体的中心点在所述相机坐标系中的第二三维位置信息、所述每个第一球体的三个坐标轴分别对应的长度、以及所述每个第一球体相对于所述相机坐标系的旋转角度。
这样,通过上述三个参数,能够清晰的将各个第一球体在相机坐标系中的位姿表示出来。
一种可选的实施方式中,所述基于构成模板球体的多个模板面片的模板顶点与所述模板球体的中心点之间的第一位置关系、以及所述每个第一球体的第一球体位置信息,确定构成所述每个第一球体的多 个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,包括:基于所述每个第一球体的三个坐标轴分别对应的长度以及所述每个第一球体相对于所述相机坐标系的旋转角度,对所述模板球体进行形状及旋转角度变换;基于对所述模板球体进行形状及旋转角度变换的结果以及所述第一位置关系,确定各个模板顶点与变换后的模板球体的中心点之间的第二位置关系;基于所述每个第一球体的中心点在所述相机坐标系中的第二三维位置信息以及所述第二位置关系,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息。
这样,可以快速获得第一三维位置信息。
一种可选的实施方式中,所述方法还包括:获取所述第一图像的相机的投影矩阵;所述基于构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成第一渲染图像,包括:基于所述第一三维位置信息以及所述投影矩阵,确定第一渲染图像中每个像素点的部位索引以及面片索引;基于确定的第一渲染图像中每个像素点的部位索引以及面片索引,生成所述第一渲染图像;其中,任一像素点的部位索引标识所述任一像素点对应的所述第一对象上的部位;任一像素点的面片索引标识所述任一像素点对应的面片。
一种可选的实施方式中,所述基于构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成第一渲染图像,包括:
针对所述每个第一球体,根据构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成与所述每个第一球体对应的第一渲染图像;
所述基于所述第一渲染图像以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息,包括:
针对所述每个第一球体,根据所述每个第一球体对应的第一渲染图像和语义分割图像,得到与所述每个第一球体对应的第一渲染图像的梯度信息。
这样,有利于简化不同部位对应的分类值的表达,简化在梯度计算过程中的运算复杂度。
一种可选的实施方式中,所述第一渲染图像的梯度信息包括:所述第一渲染图像中每个像素点的梯度值;所述基于所述第一渲染图像以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息,包括:遍历所述第一渲染图像中的各个像素点,针对遍历到的像素点在所述第一渲染图像中的第一像素值,以及所述遍历到的像素点在所述语义分割图像中的第二像素值,确定所述遍历到的像素点的梯度值。
这样,可以通过对第一渲染图像和第一图像的语义分割图像,得到第一渲染图像的梯度信息。
一种可选的实施方式中,所述针对遍历到的像素点在所述第一渲染图像中的第一像素值,以及所述遍历到的像素点在所述语义分割图像中的第二像素值,确定所述遍历到的像素点的梯度值,包括:根据所述遍历到的像素点的所述第一像素值,以及所述遍历到的像素点的所述第二像素值,确定所述遍历到的像素点的残差;在所述遍历到的像素点的残差为第一数值的情况下,将所述遍历到的像素点的梯度值确定为所述第一数值;在所述遍历到的像素点的残差不为所述第一数值的情况下,基于所述遍历到的像素点的所述第二像素值,从所述多个第一球体中确定所述遍历到的像素点对应的目标第一球体,并从构成所述目标第一球体的多个面片中确定目标面片;确定所述目标面片上的至少一个目标顶点在所述相机坐标系中的目标三维位置信息,其中,在所述至少一个目标顶点位于所述目标三维位置信息所标识的位置的情况下,将所述遍历到的像素点进行重新渲染得到的新的第一像素值,和所述遍历到的像素点对应的第二像素值之间的残差确定为所述第一数值;基于所述目标顶点在所述相机坐标系中的第一三维位置信息和所述目标三维位置信息,得到所述遍历到的像素点的梯度值。
这样,可以得到第一渲染图像中每个像素点的梯度值。
一种可选的实施方式中,所述基于包含第一对象的第一图像,获取多个第一球体中每个第一球体在相机坐标系中的第一球体位置信息,包括:利用预先训练的位置信息预测网络,对所述第一图像进行位置信息预测处理,得到所述多个第一球体中每个第一球体在所述相机坐标系中的第一球体位置信息。
第二方面,本公开实施例还提供一种神经网络生成方法,包括:利用待训练的神经网络,对第二图像中的第二对象进行三维位置信息预测处理,得到表征所述第二对象不同部位的多个第二球体中每个第二球体在相机坐标系中的第二球体位置信息;基于多个第二球体分别对应的第二球体位置信息,生成第二渲染图像;基于所述第二渲染图像、以及所述第二图像的语义标注图像,得到所述第二渲染图像梯度信息;基于所述第二渲染图像的梯度信息,更新所述待训练的神经网络,得到更新后的神经网络。
这样,在利用待优化的神经网络对第二图像中的第二对象进行三维位置信息预测处理,得到表征第二图像中第二对象的三维模型的多个第二球体的第二球体位置信息后,基于第二球体位置信息进行图像渲染,并基于图像渲染的结果,确定多个第二球体的第二球体位置信息正确性程度的梯度信息,并基于该梯度信息更新待优化的神经网络的,得到优化后的神经网络,使得优化后的神经网络具有更高的三维 位置信息预测精度。
第三方面,本公开实施例还提供一种三维模型生成装置,包括:第一获取部分,被配置为基于包含第一对象的第一图像,获取多个第一球体中每个第一球体在相机坐标系中的第一球体位置信息,所述多个第一球体分别表示所述第一对象不同部位;第一生成部分,被配置为基于所述多个第一球体的所述第一球体位置信息,生成第一渲染图像;第一梯度确定部分,被配置为基于所述第一渲染图像、以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息;调整部分,被配置为基于所述第一渲染图像的梯度信息,调整所述多个第一球体的所述第一球体位置信息;模型生成部分,被配置为利用调整后的所述多个第一球体的所述第一球体位置信息,生成所述第一对象的三维模型。
一种可能的实施方式中,所述第一生成部分,在基于所述多个第一球体的所述第一球体位置信息,生成第一渲染图像的情况下,被配置为:基于所述第一球体位置信息,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息;基于构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成所述第一渲染图像。
一种可能的实施方式中,所述第一生成部分,在基于所述第一球体位置信息,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息的情况下,被配置为:基于构成模板球体的多个模板面片的模板顶点与所述模板球体的中心点之间的第一位置关系、以及所述每个第一球体的所述第一球体位置信息,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息。
一种可能的实施方式中,所述每个第一球体的所述第一球体位置信息包括:所述每个第一球体的中心点在所述相机坐标系中的第二三维位置信息、所述每个第一球体的三个坐标轴分别对应的长度、以及所述每个第一球体相对于所述相机坐标系的旋转角度。
一种可能的实施方式中,所述第一生成部分,在基于构成模板球体的多个模板面片的模板顶点与所述模板球体的中心点之间的第一位置关系、以及所述每个第一球体的第一球体位置信息,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息的情况下,被配置为:基于所述每个第一球体的三个坐标轴分别对应的长度以及所述每个第一球体相对于所述相机坐标系的旋转角度,对所述模板球体进行形状及旋转角度变换;基于对所述模板球体进行形状及旋转角度变换的结果以及所述第一位置关系,确定各个模板顶点与变换后的模板球体的中心点之间的第二位置关系;基于所述每个第一球体的中心点在所述相机坐标系中的第二三维位置信息以及所述第二位置关系,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息。
一种可能的实施方式中,所述第一获取部分,还被配置为:获取所述第一图像的相机的投影矩阵;所述第一生成部分,在基于构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成第一渲染图像的情况下,被配置为:基于所述第一三维位置信息以及所述投影矩阵,确定第一渲染图像中每个像素点的部位索引以及面片索引;基于确定的第一渲染图像中每个像素点的部位索引以及面片索引,生成所述第一渲染图像;其中,任一像素点的部位索引标识所述任一像素点对应的所述第一对象上的部位;任一像素点的面片索引标识所述任一像素点对应的面片。
一种可选的实施方式中,所述第一生成部分,在基于构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成第一渲染图像的情况下,被配置为:针对所述每个第一球体,根据构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成与所述每个第一球体对应的第一渲染图像;
所述第一梯度确定部分,在基于所述第一渲染图像以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息的情况下,被配置为:针对所述每个第一球体,根据所述每个第一球体对应的第一渲染图像和语义分割图像,得到与所述每个第一球体对应的第一渲染图像的梯度信息。
一种可能的实施方式中,所述第一渲染图像的梯度信息包括:所述第一渲染图像中每个像素点的梯度值;所述第一梯度确定部分,在基于所述第一渲染图像以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息的情况下,被配置为:遍历所述第一渲染图像中的各个像素点,针对遍历到的像素点在所述第一渲染图像中的第一像素值,以及所述遍历到的像素点在所述语义分割图像中的第二像素值,确定所述遍历到的像素点的梯度值。
一种可能的实施方式中,所述第一梯度确定部分,在针对遍历到的像素点在所述第一渲染图像中的第一像素值,以及所述遍历到的像素点在所述语义分割图像中的第二像素值,确定所述遍历到的像素点的梯度值的情况下,被配置为:根据所述遍历到的像素点的所述第一像素值,以及所述遍历到的像素点的所述第二像素值,确定所述遍历到的像素点的残差;在所述遍历到的像素点的残差为第一数值的情况下,将所述遍历到的像素点的梯度值确定为所述第一数值;在所述遍历到的像素点的残差不为所述第一数值的情况下,基于所述遍历到的像素点的所述第二像素值,从所述多个第一球体中确定所述遍历到的 像素点对应的目标第一球体,并从构成所述目标第一球体的多个面片中确定目标面片;确定所述目标面片上的至少一个目标顶点在所述相机坐标系中的目标三维位置信息,其中,在所述至少一个目标顶点位于所述目标三维位置信息所标识的位置的情况下,将所述遍历到的像素点进行重新渲染得到的新的第一像素值,和所述遍历到的像素点对应的第二像素值之间的残差确定为所述第一数值;基于所述目标顶点在所述相机坐标系中的第一三维位置信息和所述目标三维位置信息,得到所述遍历到的像素点的梯度值。
一种可能的实施方式中,所述第一获取部分,在基于包含第一对象的第一图像,获取多个第一球体中每个第一球体在相机坐标系中的第一球体位置信息的情况下,被配置为:利用预先训练的位置信息预测网络,对所述第一图像进行位置信息预测处理,得到所述多个第一球体中每个第一球体在所述相机坐标系中的第一球体位置信息。
第四方面,本公开实施例还提共一种神经网络的生成装置,包括:第二获取部分,被配置为利用待训练的神经网络,对第二图像中的第二对象进行三维位置信息预测处理,得到表征所述第二对象不同部位的多个第二球体中每个第二球体在相机坐标系中的第二球体位置信息;第二生成部分,被配置为基于多个第二球体分别对应的第二球体位置信息,生成第二渲染图像;第二梯度确定部分,被配置为基于所述第二渲染图像、以及所述第二图像的语义标注图像,得到所述第二渲染图像梯度信息;更新部分,被配置为基于所述第二渲染图像的梯度信息,更新所述待训练的神经网络,得到更新后的神经网络。
第五方面,本公开可选实现方式还提供一种电子设备,处理器、存储器,所述存储器存储有所述处理器可执行的机器可读指令,所述处理器被配置为执行所述存储器中存储的机器可读指令,所述机器可读指令被所述处理器执行时,所述机器可读指令被所述处理器执行时执行上述第一方面,或第一方面中任一种可能的实施方式中的步骤;或执行上述第二方面,或第二方面中任一种可能的实施方式中的步骤。
第六方面,本公开可选实现方式还提供一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被运行时执行上述第一方面,或第一方面中任一种可能的实施方式中的步骤;或执行上述第二方面,或第二方面中任一种可能的实施方式中的步骤。
第六方面,本公开可选实现方式还提供一种计算机程序,包括计算机可读代码,在所述计算机可读代码在电子设备中运行的情况下,所述电子设备中的处理器执行时实现上述第一方面,或第一方面中任一种可能的实施方式中的步骤;或实现上述第二方面,或第二方面中任一种可能的实施方式中的步骤。
为使本公开的上述目的、特征和优点能更明显易懂,下文特举较佳实施例,并配合所附附图,作详细说明如下。
附图说明
为了更清楚地说明本公开实施例的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,此处的附图被并入说明书中并构成本说明书中的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开的技术方案。应当理解,以下附图仅示出了本公开的某些实施例,因此不应被看作是对范围的限定,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他相关的附图。
图1示出了本公开实施例所提供的一种三维模型生成方法的流程图;
图2示出了本公开实施例所提供的通过多个第一球体表征人体的示例的示意图;
图3示出了本公开实施例所提供的一种位置信息预测网络的结构的示例的示意图;
图4示出了本公开实施例所提供的一种将模板球体变换为第一球体的示例的示意图;
图5示出了本公开实施例所提供的确定遍历到的像素点的梯度值的方法的流程图;
图6示出了本公开实施例所提供的在遍历到的像素点的残差并非第一数值的情况下,确定目标三维位置信息的多种示例;
图7示出了本公开实施例所提供的一种神经网络生成方法的流程图;
图8示出了本公开实施例所提供的一种三维模型生成装置的示意图;
图9示出了本公开实施例所提供的一种神经网络生成装置的流程图;
图10示出了本公开实施例所提供的一种计算机设备的示意图。
具体实施方式
为使本公开实施例的目的、技术方案和优点更加清楚,下面将结合本公开实施例中附图,对本公开实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本公开一部分实施例,而不 是全部的实施例。通常在此处附图中描述和示出的本公开实施例的组件可以以各种不同的配置来布置和设计。因此,以下对在附图中提供的本公开的实施例的详细描述并非旨在限制要求保护的本公开实施例的范围,而是仅仅表示本公开的选定实施例。基于本公开的实施例,本领域技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例,都属于本公开实施例保护的范围。
在基于二维图像的三维模型生成过程中,一般采用神经网络预测二维图像中生成对象的三维模型参数,并基于三维模型参数来进行三维模型生成。在神经网络训练过程中,需要利用样本图像的监督数据监督训练过程;也即预先将训练过程中用到的各样本图像中的对象的三维模型参数标注出来,并用作对神经网络训练的监督。由于监督数据获取困难,因此很多情况下采用仿真系统获得二维图像、以及二维图像的监督数据;但由于仿真系统得到的二维图像和真实二维图像之间具有一定的差异,这导致了神经网络在基于真实的二维图像进行三维模型生成时精度下降的问题。
另外,当前的三维模型生成方法,无法处理由于三维模型重建对象的部分部位被遮挡所造成的歧义性问题,造成无法准确还原三维模型重建对象在深度上的姿态,进而导致生成的三维模型的精度较低。
针对以上方案所存在的缺陷,均是发明人在经过实践并仔细研究后得出的结果,因此,上述问题的发现过程以及下文中本公开针对上述问题所提出的解决方案,都应是发明人对本公开实施例做出的贡献。
基于上述研究,本公开实施例提供了一种三维模型生成方法,通过对表征三维模型的多个第一球体的第一球体位置信息进行图像渲染,并基于第一图像渲染的结果,确定能够表征多个第一球体的第一球体位置信息的正确性程度的梯度信息,并基于该梯度信息对多个第一球体分别对应的第一球体位置信息进行重新调整,从而使得调整后的多个第一球体位置信息具有更高的精度,也即,基于多个第一球体分别对应的第一球体位置信息恢复的三维模型也具有更高的精度。
另外,本公开实施例提供的三维模型生成方法中,由于是采用表征多个第一球体的第一球体位置信息的正确性程度的梯度信息,来对多个第一球体分别对应的第一球体位置信息进行重新调整,从而能够以更高的精度还原第一对象在深度上的信息,具有更高的精度。
本公开实施例还提供一种神经网络的生成方法,在利用待优化的神经网络对第二图像中的第二对象进行三维位置信息预测处理,得到表征第二图像中第二对象的三维模型的多个第二球体的第二球体位置信息的基础上,基于第二球体位置信息进行图像渲染,并基于图像渲染的结果,确定多个第二球体的第二球体位置信息正确性程度的梯度信息,并基于该梯度信息更新待优化的神经网络的,得到优化后的神经网络,使得优化后的神经网络具有更高的三维位置信息预测精度。
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步定义和解释。
为便于对本实施例进行理解,首先对本公开实施例所公开的一种三维模型生成方法进行详细介绍,本公开实施例所提供的三维模型生成方法的执行主体一般为具有一定计算能力的计算机设备,所述计算机设备例如包括:终端设备或服务器或其它处理设备,终端设备可以为用户设备(User Equipment,UE)、移动设备、用户终端、终端、蜂窝电话、无绳电话、个人数字处理(Personal Digital Assistant,PDA)、手持设备、计算设备、车载设备、可穿戴设备等。在一些可能的实现方式中,所述三维模型生成方法可以通过处理器调用存储器中存储的计算机可读指令的方式来实现。
下面首先对本公开实施例提供的三维模型生成方法加以说明。
参见图1所示,为本公开实施例提供的三维模型生成方法的流程图,所述方法包括步骤S101~S104,其中:
S101:基于包含第一对象的第一图像,获取多个第一球体中每个第一球体在相机坐标系中的第一球体位置信息,所述多个第一球体分别表示所述第一对象不同部位;
S102:基于所述多个第一球体的所述第一球体位置信息,生成第一渲染图像;
S103:基于所述第一渲染图像以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息;
S104:基于所述第一渲染图像的梯度信息,调整所述多个第一球体的所述第一球体位置信息,并利用调整后的所述多个第一球体的所述第一球体位置信息,生成所述第一对象的三维模型。
本公开实施例通过在得到表征第一对象不同部位的多个第一球体中每个第一球体在相机坐标系中的第一球体位置信息的基础上,根据该第一球体位置信息,对第一对象重新渲染,得到第一渲染图像;然后基于第一渲染图像和第一图像的语义分割图像,得到第一渲染图像的梯度信息,该梯度信息表征了基于第一球体位置信息对第一对象进行重新渲染得到的第一渲染图像的正确程度,从而在基于该梯度信息调整每个第一球体的第一球体位置信息的过程中,对第一球体位置信息预测错误的部分进行调整,使得调整后的第一球体位置信息能够更准确的表征第一对象的不同部位在相机坐标系中的位置,进而基于调整后的各个第一球体的第一球体位置信息得到生成第一对象的三维模型,具有更高的精度。
另外,本公开实施例由于是采用表征多个第一球体的第一球体位置信息的正确性程度的梯度信息,来对多个第一球体分别对应的第一球体位置信息进行重新调整,从而能够以更高的精度还原第一对象在深度上的信息,因而得到的三维模型具有更高的精度。
下面分别对上述S101~S104加以详细描述。
在上述S101中,本公开实施例在基于第一对象的二维图像,生成第一对象的三维模型的情况下,是将第一对象分为多个部位,并对第一对象的不同部位分别进行三维位置信息的预测。
示例性的,第一对象的不同部位分别对应的三维位置信息通过第一球体在相机坐标系中的第一球体位置信息来表征;第一球体在相机坐标系中的第一球体位置信息,包括该第一球体的中心点在相机坐标系中的三维位置信息(也即第二三维位置信息)、该第一球体的三个坐标轴分别对应的长度、以及每个该第一球体相对于相机坐标系的旋转角度。
以将人体作为第一对象为例,可以按照人体的肢体和躯干将身体分为多个部位,每个部位采用一个第一球体表示;每个第一球体包括三个坐标轴,分别表示骨骼长度、以及该部位在不同方向的厚度。
示例性的，参见图2所示，本公开实施例提供一种通过多个第一球体表征人体的示例，在该示例中，将人体划分为20个部位，20个部位通过20个第一球体表示，人体M表示为：M={ε_i|i=1,...,20}；
其中，ε_i=E(R_i, C_i, X_i)；
其中，ε_i表示第i个第一球体在相机坐标系下的第一球体位置信息，也即第一球体对应的部位在相机坐标系下的位姿数据；X_i表示第i个第一球体的尺寸数据，其参数包括：骨骼长度l_i，以及在不同方向的部位厚度；C_i表示第i个第一球体的中心点在相机坐标系下的三维坐标值；R_i表示第i个第一球体在相机坐标系中的旋转信息。
第i个第一球体的位姿数据S_i满足下述公式(1)：
S_i = R_parent(i)·(l_i O_i) + S_parent(i)       (1)
其中，O_i为偏移向量，该偏移向量表征从第i个第一球体对应的父部位到当前部位的偏移方向；l_i O_i表示人体的第i个部位在关键点布局中的局部位置。S_parent(i)表示父部位的位姿数据。R_parent(i)表示第i个第一球体对应的父部位在相机坐标系中的旋转信息。上述公式(1)约束了不同第一球体之间的相互连接关系。
在获取多个第一球体中每个球体在相机坐标系中的第一球体位置信息的情况下,例如可以利用预先训练的位置信息预测网络,对所述第一图像进行位置信息预测处理,得到所述多个第一球体中每个第一球体在所述相机坐标系中的第一球体位置信息。
示例性的,参见图3所示,本公开实施例还提供一种位置信息预测网络的结构的示例,包括:特征提取子网络、关键点预测子网络、以及三维位置信息预测子网络。
这里,特征提取子网络,用于对第一图像进行特征提取处理,得到第一图像的特征图。
此处,特征提取子网络例如包括:卷积神经网络(convolutional neural networks,CNN),CNN能够对第一图像进行至少一级特征提取处理,得到第一图像的特征图。CNN对第一图像进行至少一级特征提取处理的过程,又可以看作利用CNN编码器对第一图像进行编码的过程。
关键点预测子网络,用于基于第一图像的特征图,确定第一对象的多个关键点在第一图像中的二维坐标值。
此处,关键点预测子网络,例如可以基于第一图像的特征图进行至少一级反卷积处理,得到第一图像的热图,其中,热图的尺寸例如与第一图像的尺寸相同;热图中任一第一像素点的像素值,表征第一图像中与该任一第一像素点位置对应的第二像素点为第一对象关键点的概率。进而用该热图,能够得到第一对象的多个关键点分别在第一图像中的二维坐标值。
三维位置信息预测子网络,用于基于第一对象的多个关键点分别根据第一图像中的二维坐标值、以及第一图像的特征图,得到构成第一对象的多个第一球体分别在相机坐标系下的第一球体位置信息。
在上述S102中,在得到多个第一球体分别对应的第一球体位置信息后,例如可以采用下述方式生成第一渲染图像:
基于所述第一球体位置信息,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息;基于构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成所述第一渲染图像。
这里,面片是三维计算机图形学中表示多面体形状的顶点与多边形的集合,又称为非结构网格。在 确定构成第一对象的多个第一球体分别对应的第一球体位置信息的基础上,能够基于第一球体位置信息,确定构成第一球体的多个面片分别在相机坐标系中的第一三维位置信息。
这里,可以基于构成模板球体的多个模板面片的模板顶点与所述模板球体的中心点之间的第一位置关系、以及所述每个第一球体的所述第一球体位置信息,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息。
此处,模板球体例如为图4中41所示,模板球体包括多个模板面片,每个模板面片的模板顶点与模板球体的中心点之间具有一定的位置关系。第一球体能够基于模板球体变形得到,在对第一模板球体进行变形的情况下,例如可以基于所述每个第一球体的三个坐标轴分别对应的长度以及所述每个第一球体相对于所述相机坐标系的旋转角度,对所述模板球体进行形状及旋转角度变换;基于对所述模板球体进行形状及旋转角度变换的结果,以及所述第一位置关系,确定各个模板顶点与变换后的模板球体的中心点之间的第二位置关系;基于所述每个第一球体的中心点在所述相机坐标系中的第二三维位置信息以及所述第二位置关系,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息。
此处,在对模板球体进行形状及旋转角度变换的情况下,可以先对模板球体进行形状变换,使得模板球体的三个坐标轴,分别与第一球体的三个坐标轴的长度相等,然后基于对模板球体进行形状变换的结果进行旋转角度变换,使得模板球体的三个坐标轴在相机坐标系中的方向,与第一球体的三个坐标轴的方向一一对应,完成对模板球体的形状及旋转角度变换。
另外,也可以先对模板球体进行旋转角度变换,使得模板球体的三个轴在相机坐标系中的方向,与第一球体的三个坐标轴的方向一一对应;然后基于对模板球体进行旋转角度变换的结果进行形状变换,使得模板球体的三个坐标轴的长度分别与第一球体的三个坐标轴的长度相等,完成对模板球体的形状及旋转角度变换。
在完成对模板球体的形状及旋转角度变换后,也即确定了模板球体中三个坐标轴的长度、以及在相机坐标系中的旋转角度。此时,能够基于坐标轴的长度以及在相机坐标系中的旋转角度、以及构成模板球体的多个模板面片的模板顶点与所述模板球体的中心点之间的第一位置关系,确定多个模板面片的模板顶点与变换后的模板球体的中心点之间的第二位置关系。基于该第二位置关系、以及第一球体的中心点在相机坐标系中的第二三维位置信息,确多个构成多个模板面片的模板顶点在相机坐标系中的三维位置信息。此时,多个模板面片的模板顶点在相机坐标系中的三维位置信息,也即构成第一球体的多个面片的多个顶点分别在相机坐标系中的第一三维位置信息。
示例性的,参见图4所示,本公开实施例还提供一种将模板球体变换为第一球体的示例,在该示例中,模板球体如图4中41所示;将模板球体进行形状和旋转角度变换的结果如42所示;43和44表示由第一球体构成的人体;其中,43为第一球体构成的人体的透视图。
在得到第一球体的多个面片的多个顶点分别在相机坐标系中的第一三维位置信息后,基于构成第一球体的多个面片的多个顶点分别在相机坐标系中的第一三维位置信息,对构成第一对象的多个球体进行图像渲染处理,生成第一渲染图像。
这里,例如可以采用下述方式对构成第一对象的多个第一球体进行图像渲染处理:
基于所述第一三维位置信息以及相机的投影矩阵,确定第一渲染图像中每个像素点的部位索引以及面片索引;
基于确定的第一渲染图像中每个像素点的部位索引以及面片索引,生成所述第一渲染图像;
其中,任一像素点的部位索引标识所述任一像素点对应的所述第一对象上的部位;任一像素点的面片索引标识所述任一像素点对应的面片。
这里,相机为获取第一图像的相机;相机的投影矩阵可以基于相机在相机坐标系中的位置、以及构成第一球体的多个面片的多个顶点分别在相机坐标系中的第一三维位置信息求得。在得到第一相机的投影矩阵后,能够基于该投影矩阵,将多个第一球体映射到相机坐标系中,得到第一渲染图像。
在一种可能的实施方式中,在对构成第一对象的多个球体进行图像渲染处理的情况下,基于多个球体分别对应的第一球体位置信息,将多个第一球体进行集体渲染,得到包括所有第一球体的第一渲染图像。在该种情况下,是得到所有第一球体对应的第一渲染图像的梯度信息,基于该梯度信息,调整多个第一球体的第一球体位置信息。
在另一种可能的实施方式中,在对构成第一对象的多个第一球体进行图像渲染处理的情况下,针对多个第一球体中的每个第一球体分别进行渲染,得到与多个第一球体分别对应的第一渲染图像。在该种情况下,是得到多个第一球体分别对应的第一渲染图像的梯度信息,并基于多个第一球体分别对应的第一渲染图像的梯度信息,调整每个第一球体的第一球体位置。
在上述S103中,例如可以使用预先训练的语义分割网络对第一图像进行语义分割处理,得到第一 图像的语义分割图像。
(1):针对对多个第一球体进行集体渲染的情况,不同第一球体在被渲染至第一渲染图像的情况下对应的像素点的像素值不同;同时,在对第一图像进行语义分割处理,得到第一图像的语义分割图像的情况下,语义分割图像中任一像素点对应的像素值,表征第一图像中对应位置的像素点的所属部位的分类值。其中,第一对象的不同部位在语义分割图像中对应的分类值也不同。
示例性的,针对同一部位,与该部位对应的第一球体在被渲染至第一渲染图像的情况下对应的像素点的像素值,与该部位在语义分割图像中对应的分类值相同。
(2)针对对多个第一球体分别进行渲染的情况,在对第一图像进行语义分割处理的情况下,得到与表征第一对象不同部位的第一球体分别对应的语义分割图像。
在该种情况下,基于所述第一渲染图像以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息,例如可以采用下述方式:
针对每个第一球体,根据所述每个第一球体对应的第一渲染图像和语义分割图像,得到与所述每个第一球体对应的第一渲染图像的梯度信息;
基于与多个第一球体分别对应的第一渲染图像的梯度信息,得到多个第一球体对应的总的梯度信息。
这样,有利于简化不同部位对应的分类值的表达,简化在梯度计算过程中的运算复杂度。
在理论上,在获得的多个第一球体分别对应的第一球体位置信息完全正确的情况下,生成的第一渲染图像和第一语义分割图像中对应位置的像素点的像素值相同。在预测得到的任一第一球体的第一球体位置信息出现错误的情况下,则可能会导致第一渲染图像和第一语义分割图像中至少部分位置对应的像素点的像素值不相同。
基于上述原理,能够通过第一渲染图像和第一图像的语义分割图像,确定第一渲染图像的梯度信息,该梯度信息即表征了多个第一球体中每个第一球体在相机坐标系中的第一球体位置信息的正确性程度。一般地,梯度越大,则表征第一球体位置信息的正确性程度越低;相应的,梯度越小,则表征第一球体位置信息的正确性程度越高;因此可以第一渲染图像的梯度信息,指导第一球体分别对应的第一球体位置信息的调整,使得第一球体位置信息在不断调整的过程中,所得到的第一渲染图像能够逐渐向着正确的方向不断优化,从而使得最终生成的第一对象的三维模型具有更高的精度。
这里,第一渲染图像的梯度信息包括:第一渲染图像中每个像素点的梯度值。
在确定第一渲染图像的梯度信息的情况下,例如可以遍历所述第一渲染图像中的各个像素点,针对遍历到的像素点在所述第一渲染图像中的第一像素值,以及所述遍历到的像素点在所述语义分割图像中的第二像素值,确定所述遍历到的像素点的梯度值。
参见图5所示,本公开实施例还提供一种确定遍历到的像素点的梯度值的方法,包括:
S501:根据所述遍历到的像素点的所述第一像素值以及所述遍历到的像素点的所述第二像素值,确定所述遍历到的像素点的残差。
S502:在所述遍历到的像素点的残差为第一数值的情况下,将所述遍历到的像素点的梯度值确定为所述第一数值。
此处,针对遍历到的像素点,在该遍历到的像素点的第一像素值和第二像素值相等的情况下,则认为以该遍历到的像素点为投影点的位置点所属的第一球体的第一球体位置信息预测正确。此处,该位置点为表征第一对象任一部位的第一球体上任一面片上的位置点。在该遍历到的像素点的第一像素值和第二像素值不相等的情况下,则认为以该遍历到的像素点为投影点的位置点所属的第一球体的第一球体位置信息预测错误。
在一种可能的实施方式中,第一数值例如为0。
S503:在所述遍历到的像素点的残差不为所述第一数值的情况下,基于所述遍历到的像素点的所述第二像素值,从所述多个第一球体中确定所述遍历到的像素点对应的目标第一球体,并从构成所述目标第一球体的多个面片中确定目标面片;
S504:确定所述目标面片上的至少一个目标顶点在所述相机坐标系中的目标三维位置信息,其中,在所述至少一个目标顶点位于所述目标三维位置信息所标识的位置的情况下,将所述遍历到的像素点进行重新渲染得到的新的第一像素值,和所述遍历到的像素点对应的第二像素值之间的残差确定为所述第一数值;
S505:基于所述目标顶点在所述相机坐标系中的第一三维位置信息和所述目标三维位置信息,得到所述遍历到的像素点的梯度值。
在本公开的一些实施例中,参见图6所示,提供了在遍历到的像素点的残差并非第一数值的情况下,确定目标三维位置信息的多种示例。在该示例中,面片为三角面片,也即构成第一球体的任一面片包括三条边以及三个顶点。
在该示例中，像素点P为遍历到的像素点，且P在图像坐标系中的坐标值表示为：P=(u_P, v_P)。I_P(x)∈{0,1}表示像素点P的渲染函数。
在图6中，61表示目标面片；该目标面片为表征第一对象中第i个部位的第一球体中的第j个面片，目标面片中的第k个顶点即本公开实施例中的目标顶点。62表示在相机所在方向将目标面片遮挡住的遮挡面片，将目标面片遮挡住的面片与目标面片属于不同的第一球体。
在图6中的a中，要将像素点P的第一像素值，渲染为与目标面片对应的第一像素值；在该示例中，像素点P被遮挡面片62所遮挡，且目标面片61在图像坐标系中进行投影的情况下不会覆盖像素点P；因此，仅沿相机坐标系的x轴方向或y轴方向调整目标顶点的位置，均不会使得像素点P重新渲染后得到的新的第一像素值与目标面片对应的第一像素值相同。因此，在该种情况下，如图6中a和图6中e所示，可以首先在相机坐标系中沿x轴方向移动目标顶点，使得目标面片在图像坐标系中投影的情况下能够覆盖到像素点P，然后再在z轴方向调整目标顶点的位置，使得目标面片中投影至像素点P的位置点Q能够位于遮挡面片的前方（相对于相机所在位置而言），进而得到目标顶点在所述相机坐标系中的目标三维位置信息。
此处，像素点P的梯度值满足下述公式(2)和公式(3)，其中公式(2)给出像素点P在x轴方向的梯度值，公式(3)给出像素点P在z轴方向的梯度值，像素点P在y轴方向的梯度值为0。在公式(2)和公式(3)中，δI_P表示像素点P的残差；x_0表示将目标顶点沿x轴方向移动前目标顶点在x轴上的坐标值，x_1表示移动后目标顶点在x轴上的坐标值；Δz=z_0-z_1表示目标面片中投影至像素点P的位置点Q与遮挡面片中投影至像素点P的位置点Q'之间的深度差，z_0表示Q的深度值，z_1表示Q'的深度值；目标顶点与Q之间的连线，与另外两点之间的连线相交于点M_0；λ表示超参数；Δ(·,·)表示两点之间的距离。图6中的e还示出了相应顶点在图像坐标系中的投影点。
在图6中的b中，要将像素点P的第一像素值，渲染为与目标面片对应的第一像素值；在该示例中，像素点P未被遮挡面片62所遮挡，因此只需要沿着相机坐标系的x轴方向移动目标顶点的位置，就会使得像素点P重新渲染后得到的新的第一像素值与目标面片对应的第一像素值相同。因此，在该种情况下，如图6中的b所示，可以在相机坐标系中沿x轴方向移动目标顶点，使得目标面片在图像坐标系中投影的情况下能够覆盖到像素点P，得到目标顶点在所述相机坐标系中的目标三维位置信息。
在该种情况下，像素点P的梯度值满足上述公式(2)，像素点P在z轴方向和y轴方向的梯度值均为0。
在图6中的c中，要将像素点P的第一像素值，渲染为与目标面片对应的第一像素值；在该示例中，像素点P被遮挡面片62所遮挡，且目标面片61在图像坐标系中进行投影的情况下会覆盖像素点P，因此不需要在相机坐标系的x轴方向和y轴方向调整目标顶点的位置，只需要如图6中的e所示，在z轴方向调整目标顶点的位置，使得目标面片中投影至像素点P的位置点Q能够位于遮挡面片的前方（相对于相机所在位置而言），进而得到目标顶点在所述相机坐标系中的目标三维位置信息。
在该种情况下，像素点P的梯度值满足上述公式(3)，像素点P在x轴方向和y轴方向的梯度值均为0。
如图6中d所示，要将像素点P的第一像素值，渲染为与目标面片不同的第一像素值；在该示例中，像素点P未被遮挡面片62所遮挡，且目标面片61在图像坐标系中进行投影的情况下会覆盖像素点P；此时，需要沿着相机坐标系的x轴方向移动目标顶点的位置，使得像素点P重新渲染后得到的新的第一像素值与目标面片对应的第一像素值不相同。因此，在该种情况下，如图6中的d所示，可以在相机坐标系中沿x轴方向移动目标顶点，使得目标面片在图像坐标系中投影的情况下不会覆盖到像素点P，得到目标顶点在所述相机坐标系中的目标三维位置信息。
在该种情况下，像素点P的梯度值满足上述公式(2)，像素点P在y轴方向和z轴方向的梯度值均为0。
采用上述方式,即能够得到第一渲染图像中每一个像素点的梯度值;第一渲染图像中所有像素点的梯度值,构成了第一渲染图像的梯度信息。
在上述S104中,在基于第一渲染图像的梯度信息,调整第一球体的第一球体位置信息的情况下,例如可以对第一球体的第一球体位置信息中至少一项进行调整,也可以对所述每个第一球体的中心点在所述相机坐标系中的第二三维位置信息、所述每个第一球体的三个坐标轴分别对应的长度、以及所述每个第一球体相对于所述相机坐标系的旋转角度中的至少一项进行调整,使得基于调整后的第一球体位置信息生成的新的第一渲染图像中,各个像素点的梯度值均向着趋向于第一数值的方向发生变化,进而能够通过多次迭代过程,使得第一球体位置信息逐渐逼近于真实值,提升第一球体位置信息的精度,进而最终提升第一对象的三维模型的精度。
参见图7所示,本公开实施例还提供一种神经网络生成方法,包括:
S701:利用待训练的神经网络,对第二图像中的第二对象进行三维位置信息预测处理,得到表征所述第二对象不同部位的多个第二球体中每个第二球体在相机坐标系中的第二球体位置信息;
S702:基于多个第二球体分别对应的第二球体位置信息,生成第二渲染图像;
S703:基于所述第二渲染图像、以及所述第二图像的语义标注图像,得到所述第二渲染图像梯度信息;
S704:基于所述第二渲染图像的梯度信息,更新所述待训练的神经网络,得到更新后的神经网络。
本公开实施例提供的神经网络的结构例如图3所示,此处不再赘述。
本公开实施例在利用待优化的神经网络对第二图像中的第二对象进行三维位置信息预测处理,得到表征第二图像中第二对象的三维模型的多个第二球体的第二球体位置信息的基础上,基于第二球体位置信息进行图像渲染,并基于图像渲染的结果,确定表征多个第二球体的第二球体位置信息正确性程度的梯度信息,并基于该梯度信息更新待优化的神经网络的,得到优化后的神经网络,使得优化后的神经网络具有更高的三维位置信息预测精度。
上述S702的实现过程与上述S102的实现过程类似;上述S703的实现过程与上述S103的实现过程类似,在此均不再赘述。
上述S704中,在基于第二渲染图像的梯度信息,更新待训练的神经网络的情况下,利用更新后的神经网络获得新的第二球体位置信息的基础上,基于新的第二球体位置信息所获取的新的第二渲染图像中,各个像素点的梯度值均向着趋向于第一数值的方向发生变化,进而能够通过多次对神经网络的优化,逐步提升神经网络对第二球体位置信息的预测精度。
基于上述内容可知,本公开实施例可以把某个像素上的梯度,传递给3D网格上的节点的欧式坐标,即可以使用物体轮廓、部件语义分割等图像信息,纠正3D物体模型的形状。以下提供一种本公开实施例的应用场景:
1、前向传播：从3D模型网格到图像像素；
根据给定的相机参数，利用小孔相机成像原理，计算每一个三角面片（上述的面片）在图像平面上的投影；对于图像平面上的每一个像素，计算这个像素所在的区域中距离相机最近的三角面片的索引（即在完整渲染时，这个像素是被哪一个三角面片渲染得到的）；一张每个像素保存着三角面片索引的图像为三角面索引（Face Index）（上述的面片索引）。此处，还记录像素点(u,v)是否属于第i个部件，并称之为部件索引（Part Index）（上述的部位索引）；生成一张渲染图像，然后针对每一个部件（上述的部位），单独从完整的渲染图像中提取一部分像素值，其中，提取的该部分的像素坐标在部件索引中属于当前部件。
2、反向传播:将像素的梯度回传给3D网格的节点;
1）由于x,y方向的情况相同，此处以x方向上的梯度回传为例进行说明。像素的值可以是RGB值，可以是灰度值，也可以是亮度值或二值，此处以二值的情况为例，即可见的为1，不可见的为0。一个像素上的梯度，要么是正方向（0到1），要么是负方向（1到0）。为了将节点（上述的顶点）的欧式坐标和像素点的梯度联系起来，此处认为，在移动某个节点时，像素的值是线性变化的，而不是突变的。在没有遮挡出现的情况下：比如图6中的图a，目标面片中的第k个顶点向右移动时，三角形（上述的目标面片）的一边覆盖了点P，I_P从0变为1，所以I_P随x的变化量如图6中的图a下方第一个折线图中的黑实线所示，那么该节点的梯度就是这个变化的斜率，如图6中的图a下方第二个折线图所画的黑实线所示。当像素点在某个三角面片的内部时，节点在x上移动，I_P的变化是从1到0，如图6中的图c所示，此时，节点的梯度向左、向右均不相同。综上，对于属于第i个部件的第j个三角面片的节点k，其梯度满足上述公式(2)。而在遮挡的情况下，因为是部件级渲染，所以当前部件由于被其他部件遮挡，值不会被渲染，所以不管这个部件是否覆盖了这个像素点，该像素点在该部件的渲染图中的值为0，参考附图6，面片62不属于当前部件的三角面片，但面片62是最靠近相机平面的三角面片，所以x位于面片62内时，梯度不会发生变化，即恒等于0，如图6中的所有折线图中的虚线所示。
3、根据上述1和2部分,遍历所有像素,计算遍历得到的像素的梯度回传到3D模型的节点上的值;在多个像素对一个节点都有梯度回传的情况下,将所有的梯度进行累加;为了加速,此处可以使用并行加速的方法,可以使用cuda,也可以是CPU并行,独立计算每一个像素;最终通过此方式,得到了给定监督信息下3D模型节点的梯度。
采用上述方法,使用的监督信息不再局限于完整的渲染图片,可以利用物体的语义分割作为监督信息;在多个物体一起渲染的情况下,不同的物体也可以被视为部件,独立渲染,从而可以得知不同物体之间的位置关系。
本领域技术人员可以理解,在上述方法中,各步骤的撰写顺序并不意味着严格的执行顺序而对实施过程构成任何限定,各步骤的执行顺序应当以其功能和可能的内在逻辑确定。
基于同一发明构思,本公开实施例中还提供了与三维模型生成方法对应的三维模型生成装置,由于本公开实施例中的装置与本公开实施例上述三维模型生成方法相似,因此装置的实施可以参见方法的实施,重复之处不再赘述。
参照图8所示,为本公开实施例提供的一种三维模型生成装置的示意图,所述装置包括:第一获取部分81、第一生成部分82、第一梯度确定部分83、调整部分84、以及模型生成部分85;其中,
第一获取部分81,被配置为基于包含第一对象的第一图像,获取多个第一球体中每个第一球体在相机坐标系中的第一球体位置信息,所述多个第一球体分别表示所述第一对象不同部位;
第一生成部分82,被配置为基于所述多个第一球体的所述第一球体位置信息,生成第一渲染图像;
第一梯度确定部分83,被配置为基于所述第一渲染图像以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息;
调整部分84,被配置为基于所述第一渲染图像的梯度信息,调整所述多个第一球体的所述第一球体位置信息;
模型生成部分85,被配置为利用调整后的所述多个第一球体的所述第一球体位置信息,生成所述第一对象的三维模型。
在本公开的一些实施例中,所述第一生成部分82,在基于所述多个第一球体的所述第一球体位置信息,生成第一渲染图像的情况下,被配置为:
基于所述第一球体位置信息,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息;
基于构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成所述第一渲染图像。
在本公开的一些实施例中,所述第一生成部分82,在基于所述第一球体位置信息,确定构成所述 每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息的情况下,被配置为:
基于构成模板球体的多个模板面片的模板顶点与所述模板球体的中心点之间的第一位置关系、以及所述每个第一球体的所述第一球体位置信息,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息。
在本公开的一些实施例中,所述每个第一球体的所述第一球体位置信息包括:所述每个第一球体的中心点在所述相机坐标系中的第二三维位置信息、所述每个第一球体的三个坐标轴分别对应的长度、以及所述每个第一球体相对于所述相机坐标系的旋转角度。
在本公开的一些实施例中,所述第一生成部分82,在基于构成模板球体的多个模板面片的模板顶点与所述模板球体的中心点之间的第一位置关系、以及所述每个第一球体的第一球体位置信息,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息的情况下,被配置为:
基于所述每个第一球体的三个坐标轴分别对应的长度以及所述每个第一球体相对于所述相机坐标系的旋转角度,对所述模板球体进行形状及旋转角度变换;
基于对所述模板球体进行形状及旋转角度变换的结果以及所述第一位置关系,确定各个模板顶点与变换后的模板球体的中心点之间的第二位置关系;
基于所述每个第一球体的中心点在所述相机坐标系中的第二三维位置信息以及所述第二位置关系,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息。
在本公开的一些实施例中,所述第一获取部分81,还被配置为:获取所述第一图像的相机的投影矩阵;
所述第一生成部分82,在基于构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成第一渲染图像的情况下,被配置为:
基于所述第一三维位置信息以及所述投影矩阵,确定第一渲染图像中每个像素点的部位索引以及面片索引;
基于确定的第一渲染图像中每个像素点的部位索引以及面片索引,生成所述第一渲染图像;
其中,任一像素点的部位索引标识所述任一像素点对应的所述第一对象上的部位;任一像素点的面片索引标识所述任一像素点对应的面片。
在本公开的一些实施例中,所述第一生成部分82,在基于构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成第一渲染图像的情况下,被配置为:
针对所述每个第一球体,根据构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成与所述每个第一球体对应的第一渲染图像;
所述第一梯度确定部分83,在基于所述第一渲染图像以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息的情况下,被配置为:
针对所述每个第一球体,根据所述每个第一球体对应的第一渲染图像和语义分割图像,得到与所述每个第一球体对应的第一渲染图像的梯度信息。
在本公开的一些实施例中,所述第一渲染图像的梯度信息包括:所述第一渲染图像中每个像素点的梯度值;
所述第一梯度确定部分83,在基于所述第一渲染图像以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息的情况下,被配置为:
遍历所述第一渲染图像中的各个像素点,针对遍历到的像素点在所述第一渲染图像中的第一像素值,以及所述遍历到的像素点在所述语义分割图像中的第二像素值,确定所述遍历到的像素点的梯度值。
在本公开的一些实施例中,所述第一梯度确定部分83,在针对遍历到的像素点在所述第一渲染图像中的第一像素值以及所述遍历到的像素点在所述语义分割图像中的第二像素值,确定所述遍历到的像素点的梯度值的情况下,被配置为:
根据所述遍历到的像素点的所述第一像素值,以及所述遍历到的像素点的所述第二像素值,确定所述遍历到的像素点的残差;
在所述遍历到的像素点的残差为第一数值的情况下,将所述遍历到的像素点的梯度值确定为所述第一数值;
在所述遍历到的像素点的残差不为所述第一数值的情况下,基于所述遍历到的像素点的所述第二像素值,从所述多个第一球体中确定所述遍历到的像素点对应的目标第一球体,并从构成所述目标第一球体的多个面片中确定目标面片;
确定所述目标面片上的至少一个目标顶点在所述相机坐标系中的目标三维位置信息,其中,在所述至少一个目标顶点位于所述目标三维位置信息所标识的位置的情况下,将所述遍历到的像素点进行重新 渲染得到的新的第一像素值,和所述遍历到的像素点对应的第二像素值之间的残差确定为所述第一数值;
基于所述目标顶点在所述相机坐标系中的第一三维位置信息和所述目标三维位置信息,得到所述遍历到的像素点的梯度值。
在本公开的一些实施例中,所述第一获取部分81,在基于包含第一对象的第一图像,获取多个第一球体中每个第一球体在相机坐标系中的第一球体位置信息的情况下,被配置为:
利用预先训练的位置信息预测网络,对所述第一图像进行位置信息预测处理,得到所述多个第一球体中每个第一球体在所述相机坐标系中的第一球体位置信息。
参见图9所示,本公开实施例还提供一种神经网络生成装置,包括:
第二获取部分91,被配置为利用待训练的神经网络,对第二图像中的第二对象进行三维位置信息预测处理,得到表征所述第二对象不同部位的多个第二球体中每个第二球体在相机坐标系中的第二球体位置信息;
第二生成部分92,被配置为基于多个第二球体分别对应的第二球体位置信息,生成第二渲染图像;
第二梯度确定部分93,被配置为基于所述第二渲染图像、以及所述第二图像的语义标注图像,得到所述第二渲染图像梯度信息;
更新部分94,被配置为基于所述第二渲染图像的梯度信息,更新所述待训练的神经网络,得到更新后的神经网络。
关于装置中的各部分的处理流程、以及各部分之间的交互流程的描述可以参照上述方法实施例中的相关说明,这里不再详述。
在本公开实施例以及其他的实施例中,“部分”可以是部分电路、部分处理器、部分程序或软件等等,当然也可以是单元,还可以是模块也可以是非模块化的。
本公开实施例还提供了一种计算机设备,如图10所示,为本公开实施例提供的计算机设备结构示意图,包括:
处理器11和存储器12;所述存储器12存储有所述处理器11可执行的机器可读指令,当计算机设备运行时,所述机器可读指令被所述处理器执行以实现下述步骤:
基于包含第一对象的第一图像,获取多个第一球体中每个第一球体在相机坐标系中的第一球体位置信息,所述多个第一球体分别表示所述第一对象不同部位;
基于所述多个第一球体的所述第一球体位置信息,生成第一渲染图像;
基于所述第一渲染图像以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息;
基于所述第一渲染图像的梯度信息,调整所述多个第一球体的所述第一球体位置信息,并利用调整后的所述多个第一球体的所述第一球体位置信息,生成所述第一对象的三维模型;
或者,所述机器可读指令被所述处理器执行以实现下述步骤:
利用待训练的神经网络,对第二图像中的第二对象进行三维位置信息预测处理,得到表征所述第二对象不同部位的多个第二球体中每个第二球体在相机坐标系中的第二球体位置信息;
基于多个第二球体分别对应的第二球体位置信息,生成第二渲染图像;
基于所述第二渲染图像、以及所述第二图像的语义标注图像,得到所述第二渲染图像梯度信息;
基于所述第二渲染图像的梯度信息,更新所述待训练的神经网络,得到更新后的神经网络。
上述指令的执行过程可以参考本公开实施例中所述的三维模型生成方法、及神经网络生成方法的步骤,此处不再赘述。
本公开实施例还提供一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器运行时执行上述方法实施例中所述的三维模型生成方法或神经网络生成方法的步骤。其中,所述存储介质可以是易失性或非易失的计算机可读取存储介质。
本公开实施例所提供的三维模型生成方法或神经网络生成方法的计算机程序产品,包括存储了程序代码的计算机可读存储介质,所述程序代码包括的指令可被配置为执行上述方法实施例中所述的三维模型生成方法或神经网络生成方法的步骤,可参见上述方法实施例,在此不再赘述。
本公开实施例还提供一种计算机程序,所述计算机程序被处理器执行时实现前述实施例的任意一种方法。所述计算机程序产品可以通过硬件、软件或其结合的方式实现。在一个可选实施例中,所述计算机程序产品体现为计算机存储介质,在另一个可选实施例中,计算机程序产品体现为软件产品,例如软件开发包(Software Development Kit,SDK)等等。
本公开实施例还提供了一种计算机程序,包括计算机可读代码,在所述计算机可读代码在电子设备中运行的情况下,所述电子设备中的处理器执行时实现如上述的三维模型生成方法,或上述的神经网络生成方法。
通过本公开实施例,在三维重建的任务中,可以优化重建模型的精准度,降低了高自由度模型的自 遮挡产生的歧义性;并且,在深度学习中,通过本公开实施例,可以将图像和三维空间联系起来;从而提升了语义分割,三维重建等任务的准确性。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统和装置的工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。在本公开所提供的几个实施例中,应所述理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,又例如,多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些通信接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本公开各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个处理器可执行的非易失的计算机可读取存储介质中。基于这样的理解,本公开的技术方案本质上或者说对现有技术做出贡献的部分或者所述技术方案的部分可以以软件产品的形式体现出来,所述计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本公开各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
最后应说明的是:以上所述实施例,仅为本公开的具体实施方式,用以说明本公开的技术方案,而非对其限制,本公开的保护范围并不局限于此,尽管参照前述实施例对本公开进行了详细的说明,本领域的普通技术人员应当理解:任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,其依然可以对前述实施例所记载的技术方案进行修改或可轻易想到变化,或者对其中部分技术特征进行等同替换;而这些修改、变化或者替换,并不使相应技术方案的本质脱离本公开实施例技术方案的精神和范围,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应所述以权利要求的保护范围为准。
工业实用性
本公开实施例提供了一种三维模型生成方法、神经网络生成方法及装置,其中,所述三维模型生成方法包括:基于包含第一对象的第一图像,获取多个第一球体中每个第一球体在相机坐标系中的第一球体位置信息,所述多个第一球体分别被配置为表示所述第一对象不同部位;基于所述多个第一球体的所述第一球体位置信息,生成第一渲染图像;基于所述第一渲染图像以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息;基于所述第一渲染图像的梯度信息,调整所述多个第一球体的所述第一球体位置信息,并利用调整后的所述多个第一球体的所述第一球体位置信息,生成所述第一对象的三维模型。本公开实施例生成的三维模型具有更高的精度。

Claims (16)

  1. 一种三维模型生成方法,包括:
    基于包含第一对象的第一图像,获取多个第一球体中每个第一球体在相机坐标系中的第一球体位置信息,所述多个第一球体分别表示所述第一对象不同部位;
    基于所述多个第一球体的所述第一球体位置信息,生成第一渲染图像;
    基于所述第一渲染图像以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息;
    基于所述第一渲染图像的梯度信息,调整所述多个第一球体的所述第一球体位置信息,并利用调整后的所述多个第一球体的所述第一球体位置信息,生成所述第一对象的三维模型。
  2. 根据权利要求1所述三维模型生成方法,其中,所述基于所述多个第一球体的所述第一球体位置信息,生成第一渲染图像,包括:
    基于所述第一球体位置信息,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息;
    基于构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成所述第一渲染图像。
  3. 根据权利要求2所述的三维模型生成方法,其中,所述基于所述第一球体位置信息,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,包括:
    基于构成模板球体的多个模板面片的模板顶点与所述模板球体的中心点之间的第一位置关系、以及所述每个第一球体的所述第一球体位置信息,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息。
  4. 根据权利要求3所述的三维模型生成方法,其中,所述每个第一球体的所述第一球体位置信息包括:所述每个第一球体的中心点在所述相机坐标系中的第二三维位置信息、所述每个第一球体的三个坐标轴分别对应的长度、以及所述每个第一球体相对于所述相机坐标系的旋转角度。
  5. 根据权利要求4所述的三维模型生成方法,其中,所述基于构成模板球体的多个模板面片的模板顶点与所述模板球体的中心点之间的第一位置关系、以及所述每个第一球体的第一球体位置信息,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,包括:
    基于所述每个第一球体的三个坐标轴分别对应的长度以及所述每个第一球体相对于所述相机坐标系的旋转角度,对所述模板球体进行形状及旋转角度变换;
    基于对所述模板球体进行形状及旋转角度变换的结果以及所述第一位置关系,确定各个模板顶点与变换后的模板球体的中心点之间的第二位置关系;
    基于所述每个第一球体的中心点在所述相机坐标系中的第二三维位置信息以及所述第二位置关系,确定构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息。
  6. 根据权利要求2-5任一项所述的三维模型生成方法,其中,
    所述方法还包括:获取所述第一图像的相机的投影矩阵;
    所述基于构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成第一渲染图像,包括:
    基于所述第一三维位置信息以及所述投影矩阵,确定第一渲染图像中每个像素点的部位索引以及面片索引;
    基于确定的第一渲染图像中每个像素点的部位索引以及面片索引,生成所述第一渲染图像;
    其中,任一像素点的部位索引标识所述任一像素点对应的所述第一对象上的部位;任一像素点的面片索引标识所述任一像素点对应的面片。
  7. 根据权利要求2-6任一项所述的三维模型生成方法,其中,所述基于构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成第一渲染图像,包括:
    针对所述每个第一球体,根据构成所述每个第一球体的多个面片的各个顶点分别在所述相机坐标系中的第一三维位置信息,生成与所述每个第一球体对应的第一渲染图像;
    所述基于所述第一渲染图像以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息,包括:
    针对所述每个第一球体,根据所述每个第一球体对应的第一渲染图像和语义分割图像,得到与所述每个第一球体对应的第一渲染图像的梯度信息。
  8. 根据权利要求1-7任一项所述的三维模型生成方法,其中,所述第一渲染图像的梯度信息包括:所述第一渲染图像中每个像素点的梯度值;
    所述基于所述第一渲染图像以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息,包括:
    遍历所述第一渲染图像中的各个像素点,针对遍历到的像素点在所述第一渲染图像中的第一像素值,以及所述遍历到的像素点在所述语义分割图像中的第二像素值,确定所述遍历到的像素点的梯度值。
  9. 根据权利要求8所述的三维模型生成方法,其中,所述针对遍历到的像素点在所述第一渲染图像中的第一像素值,以及所述遍历到的像素点在所述语义分割图像中的第二像素值,确定所述遍历到的像素点的梯度值,包括:
    根据所述遍历到的像素点的所述第一像素值,以及所述遍历到的像素点的所述第二像素值,确定所述遍历到的像素点的残差;
    在所述遍历到的像素点的残差为第一数值的情况下,将所述遍历到的像素点的梯度值确定为所述第一数值;
    在所述遍历到的像素点的残差不为所述第一数值的情况下,基于所述遍历到的像素点的所述第二像素值,从所述多个第一球体中确定所述遍历到的像素点对应的目标第一球体,并从构成所述目标第一球体的多个面片中确定目标面片;
    确定所述目标面片上的至少一个目标顶点在所述相机坐标系中的目标三维位置信息,其中,在所述至少一个目标顶点位于所述目标三维位置信息所标识的位置的情况下,将所述遍历到的像素点进行重新渲染得到的新的第一像素值,和所述遍历到的像素点对应的第二像素值之间的残差确定为所述第一数值;
    基于所述目标顶点在所述相机坐标系中的第一三维位置信息和所述目标三维位置信息,得到所述遍历到的像素点的梯度值。
  10. 根据权利要求1-9任一项所述的三维模型生成方法,其中,所述基于包含第一对象的第一图像,获取多个第一球体中每个第一球体在相机坐标系中的第一球体位置信息,包括:
    利用预先训练的位置信息预测网络,对所述第一图像进行位置信息预测处理,得到所述多个第一球体中每个第一球体在所述相机坐标系中的第一球体位置信息。
  11. 一种神经网络生成方法,包括:
    利用待训练的神经网络,对第二图像中的第二对象进行三维位置信息预测处理,得到表征所述第二对象不同部位的多个第二球体中每个第二球体在相机坐标系中的第二球体位置信息;
    基于多个第二球体分别对应的第二球体位置信息,生成第二渲染图像;
    基于所述第二渲染图像、以及所述第二图像的语义标注图像,得到所述第二渲染图像梯度信息;
    基于所述第二渲染图像的梯度信息,更新所述待训练的神经网络,得到更新后的神经网络。
  12. 一种三维模型生成装置,包括:
    第一获取部分,被配置为基于包含第一对象的第一图像,获取多个第一球体中每个第一球体在相机坐标系中的第一球体位置信息,所述多个第一球体分别表示所述第一对象不同部位;
    第一生成部分,被配置为基于所述多个第一球体的所述第一球体位置信息,生成第一渲染图像;
    第一梯度确定部分,被配置为基于所述第一渲染图像以及所述第一图像的语义分割图像,得到所述第一渲染图像的梯度信息;
    调整部分,被配置为基于所述第一渲染图像的梯度信息,调整所述多个第一球体的所述第一球体位置信息;
    模型生成部分,被配置为利用调整后的所述多个第一球体的所述第一球体位置信息,生成所述第一对象的三维模型。
  13. 一种神经网络生成装置,包括:
    第二获取部分,被配置为利用待训练的神经网络,对第二图像中的第二对象进行三维位置信息预测处理,得到表征所述第二对象不同部位的多个第二球体中每个第二球体在相机坐标系中的第二球体位置信息;
    第二生成部分,被配置为基于多个第二球体分别对应的第二球体位置信息,生成第二渲染图像;
    第二梯度确定部分,被配置为基于所述第二渲染图像、以及所述第二图像的语义标注图像,得到所述第二渲染图像梯度信息;
    更新部分,被配置为基于所述第二渲染图像的梯度信息,更新所述待训练的神经网络,得到更新后的神经网络。
  14. 一种电子设备,包括:处理器、以及存储器,所述存储器存储有所述处理器可执行的机器可读指令,所述处理器被配置为执行所述存储器中存储的机器可读指令,所述机器可读指令被所述处理器执行时,所述处理器执行如权利要求1至11任一项所述的方法的步骤。
  15. 一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被 电子设备运行时,所述电子设备执行如权利要求1至11任意一项所述的方法的步骤。
  16. 一种计算机程序,包括计算机可读代码,在所述计算机可读代码在电子设备中运行的情况下,所述电子设备中的处理器执行时实现权利要求1至11中任意一项所述的方法。
PCT/CN2021/082485 2020-06-29 2021-03-23 三维模型生成方法、神经网络生成方法及装置 WO2022001222A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2021573567A JP2022542758A (ja) 2020-06-29 2021-03-23 3次元モデル生成方法、ニューラルネットワーク生成方法及び装置
EP21819707.7A EP3971840A4 (en) 2020-06-29 2021-03-23 THREE-DIMENSIONAL MODEL GENERATION METHOD, NEURON NETWORK GENERATION METHOD AND ASSOCIATED DEVICES
KR1020217042400A KR20220013403A (ko) 2020-06-29 2021-03-23 3차원 모델 생성 방법, 신경망 생성 방법 및 장치
US17/645,446 US20220114799A1 (en) 2020-06-29 2021-12-21 Three dimensional model generation method and apparatus, and neural network generating method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010607430.5 2020-06-29
CN202010607430.5A CN111739159A (zh) 2020-06-29 2020-06-29 三维模型生成方法、神经网络生成方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/645,446 Continuation US20220114799A1 (en) 2020-06-29 2021-12-21 Three dimensional model generation method and apparatus, and neural network generating method and apparatus

Publications (1)

Publication Number Publication Date
WO2022001222A1 true WO2022001222A1 (zh) 2022-01-06

Family

ID=72652991

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/082485 WO2022001222A1 (zh) 2020-06-29 2021-03-23 三维模型生成方法、神经网络生成方法及装置

Country Status (6)

Country Link
US (1) US20220114799A1 (zh)
EP (1) EP3971840A4 (zh)
JP (1) JP2022542758A (zh)
KR (1) KR20220013403A (zh)
CN (1) CN111739159A (zh)
WO (1) WO2022001222A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739159A (zh) * 2020-06-29 2020-10-02 上海商汤智能科技有限公司 三维模型生成方法、神经网络生成方法及装置
CN112883102B (zh) * 2021-03-05 2024-03-08 北京知优科技有限公司 数据可视化展示的方法、装置、电子设备及存储介质
US11830138B2 (en) * 2021-03-19 2023-11-28 Adobe Inc. Predicting secondary motion of multidimentional objects based on local patch features
CN113239943B (zh) * 2021-05-28 2022-05-31 北京航空航天大学 基于部件语义图的三维部件提取组合方法和装置
KR102553304B1 (ko) * 2022-11-01 2023-07-10 주식회사 로지비 딥러닝 비전 학습 기반 물류 검수 서버 및 그 동작 방법
CN117274473B (zh) * 2023-11-21 2024-02-02 北京渲光科技有限公司 一种多重散射实时渲染的方法、装置及电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1645416A (zh) * 2005-01-20 2005-07-27 上海交通大学 基于肌肉体积不变性的人肢体三维建模方法
CN105303611A (zh) * 2015-12-08 2016-02-03 新疆华德软件科技有限公司 基于旋转抛物面的虚拟人肢体建模方法
CN108648268A (zh) * 2018-05-10 2018-10-12 浙江大学 一种基于胶囊的人体模型逼近方法
CN108846892A (zh) * 2018-06-05 2018-11-20 陈宸 人体模型的确定方法及装置
CN111739159A (zh) * 2020-06-29 2020-10-02 上海商汤智能科技有限公司 三维模型生成方法、神经网络生成方法及装置

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7203844B2 (ja) * 2017-07-25 2023-01-13 達闥機器人股▲分▼有限公司 トレーニングデータの生成方法、生成装置及びその画像のセマンティックセグメンテーション方法
EP3579196A1 (en) * 2018-06-05 2019-12-11 Cristian Sminchisescu Human clothing transfer method, system and device
CN111126242B (zh) * 2018-10-16 2023-03-21 腾讯科技(深圳)有限公司 肺部图像的语义分割方法、装置、设备及存储介质
CN110633628B (zh) * 2019-08-02 2022-05-06 杭州电子科技大学 基于人工神经网络的rgb图像场景三维模型重建方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1645416A (zh) * 2005-01-20 2005-07-27 上海交通大学 基于肌肉体积不变性的人肢体三维建模方法
CN105303611A (zh) * 2015-12-08 2016-02-03 新疆华德软件科技有限公司 基于旋转抛物面的虚拟人肢体建模方法
CN108648268A (zh) * 2018-05-10 2018-10-12 浙江大学 一种基于胶囊的人体模型逼近方法
CN108846892A (zh) * 2018-06-05 2018-11-20 陈宸 人体模型的确定方法及装置
CN111739159A (zh) * 2020-06-29 2020-10-02 上海商汤智能科技有限公司 三维模型生成方法、神经网络生成方法及装置

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MIN WANG; FENG QIU; WENTAO LIU; CHEN QIAN; XIAOWEI ZHOU; LIZHUANG MA: "EllipBody: A Light-weight and Part-based Representation for Human Pose and Shape Recovery", ARXIV.ORG, 24 March 2020 (2020-03-24), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081628098 *
See also references of EP3971840A4 *

Also Published As

Publication number Publication date
EP3971840A4 (en) 2023-01-18
KR20220013403A (ko) 2022-02-04
CN111739159A (zh) 2020-10-02
JP2022542758A (ja) 2022-10-07
EP3971840A1 (en) 2022-03-23
US20220114799A1 (en) 2022-04-14

Similar Documents

Publication Publication Date Title
WO2022001222A1 (zh) 三维模型生成方法、神经网络生成方法及装置
AU2020200811B2 (en) Direct meshing from multiview input using deep learning
Li et al. Monocular real-time volumetric performance capture
CN111325851B (zh) 图像处理方法及装置、电子设备和计算机可读存储介质
US9892506B2 (en) Systems and methods for shape analysis using landmark-driven quasiconformal mapping
CN104637090B (zh) 一种基于单张图片的室内场景建模方法
WO2021253788A1 (zh) 一种人体三维模型构建方法及装置
Kanatani et al. Guide to 3D Vision Computation
Chhatkuli et al. Inextensible non-rigid shape-from-motion by second-order cone programming
CN115439607A (zh) 一种三维重建方法、装置、电子设备及存储介质
CN113936090A (zh) 三维人体重建的方法、装置、电子设备及存储介质
JP2021026759A (ja) オブジェクトの3dイメージングを実施するためのシステムおよび方法
CN115830241A (zh) 一种基于神经网络的真实感三维人脸纹理重建方法
Twarog et al. Playing with puffball: simple scale-invariant inflation for use in vision and graphics
CN112926543A (zh) 图像生成、三维模型生成方法、装置、电子设备及介质
Kazmi et al. Efficient sketch‐based creation of detailed character models through data‐driven mesh deformations
CN113223137B (zh) 透视投影人脸点云图的生成方法、装置及电子设备
Golyanik Robust Methods for Dense Monocular Non-Rigid 3D Reconstruction and Alignment of Point Clouds
Eapen et al. Elementary methods for generating three-dimensional coordinate estimation and image reconstruction from series of two-dimensional images
CN116912433B (zh) 三维模型骨骼绑定方法、装置、设备及存储介质
WO2023233575A1 (ja) 推定装置、学習装置、推定方法、学習方法及びプログラム
US20240161391A1 (en) Relightable neural radiance field model
Wang et al. Genetic-algorithm-based stereo vision with no block partitioning of input images
Kalel Modelling Project Work: 3D reconstruction using Stereo vision
Frigerio et al. Surface reconstruction using 3D morphological operators for objects acquired with a multi-Kinect system

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021573567

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20217042400

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2021819707

Country of ref document: EP

Effective date: 20211215

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21819707

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE