WO2022001222A1 - Three-dimensional model generation method, neural network generation method and apparatus - Google Patents
- Publication number: WO2022001222A1 (application PCT/CN2021/082485)
- Authority: WIPO (PCT)
Classifications
- G06N3/084 — Backpropagation, e.g. using gradient descent
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06T15/00 — 3D [Three Dimensional] image rendering
- G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/10 — Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
- G06T17/20 — Finite element generation, e.g. wire-frame surface description, tesselation
- G06T19/20 — Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
- G06T7/11 — Region-based segmentation
- G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
- G06V10/242 — Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
- G06V10/42 — Global feature extraction by analysis of the whole pattern
- G06V10/771 — Feature selection, e.g. selecting representative features from a multi-dimensional feature space
- G06T2207/10012 — Stereo images
- G06T2219/2021 — Shape modification
Definitions
- the present disclosure relates to the technical field of image processing, and in particular, to a method for generating a three-dimensional model, a method for generating a neural network, an apparatus, a device, and a computer-readable storage medium.
- the embodiments of the present disclosure provide at least a three-dimensional model generation method, a neural network generation method, an apparatus, a device, and a computer-readable storage medium.
- an embodiment of the present disclosure provides a method for generating a three-dimensional model, comprising: acquiring, based on a first image including a first object, first sphere position information in a camera coordinate system for each first sphere among a plurality of first spheres, the plurality of first spheres respectively representing different parts of the first object; generating a first rendered image based on the first sphere position information of the plurality of first spheres; obtaining gradient information of the first rendered image based on the first rendered image and a semantically segmented image of the first image; adjusting the first sphere position information of the plurality of first spheres based on the gradient information of the first rendered image; and generating a three-dimensional model of the first object using the adjusted first sphere position information of the plurality of first spheres.
- in this way, gradient information capable of representing the degree of correctness of the first sphere position information of the plurality of first spheres is determined, and the first sphere position information corresponding to the plurality of first spheres is readjusted based on that gradient information, so that the adjusted position information of the plurality of first spheres has higher accuracy; accordingly, the three-dimensional model recovered from the adjusted first sphere position information of the plurality of first spheres also has higher accuracy.
- the generating a first rendered image based on the first sphere position information of the plurality of first spheres includes: determining, based on the first sphere position information, first three-dimensional position information in the camera coordinate system of the vertices of the plurality of patches constituting each first sphere; and generating the first rendered image based on that first three-dimensional position information.
- in this way, the first object can be divided into a plurality of parts, each represented as a different first sphere, and the first rendered image can be generated based on the first three-dimensional position information, in the camera coordinate system, of the vertices of the plurality of patches constituting the different spheres; the first rendered image thus contains information on the three-dimensional relationships among the parts of the first object, so that the three-dimensional model of the first object can be constrained by the gradient information determined from the first rendered image and thereby achieve higher accuracy.
- the determining, based on the first sphere position information, the first three-dimensional position information in the camera coordinate system of the vertices of the plurality of patches constituting each first sphere includes: determining that first three-dimensional position information based on a first positional relationship between the template vertices of a plurality of template patches constituting a template sphere and the center point of the template sphere, together with the first sphere position information of each first sphere.
- in this way, the first spheres are obtained by deforming the template patches, and the sphere surfaces are represented by patches, which reduces the complexity of rendering the first rendered image.
- the first sphere position information of each first sphere includes: second three-dimensional position information of the center point of each first sphere in the camera coordinate system, the lengths of each first sphere along its three coordinate axes, and the rotation angle of each first sphere relative to the camera coordinate system.
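The per-sphere parameterization just described (a center point, three per-axis lengths, and a rotation relative to the camera frame) can be sketched as a small container type. This is a hedged illustration in Python; the field names are chosen here for clarity and do not come from the patent:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FirstSpherePose:
    """First sphere position information in the camera coordinate system.

    The patent specifies only the three kinds of quantities, not a concrete
    layout; this layout is an assumption.
    """
    center: np.ndarray        # (3,) second 3D position info of the center point
    axis_lengths: np.ndarray  # (3,) lengths along the sphere's three axes
    rotation: np.ndarray      # (3,) rotation angle relative to the camera frame
```

With nine numbers per sphere, a network predicting a fixed set of part spheres only has to regress a short parameter vector per part.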
- the determining, based on the first positional relationship between the template vertices of the plurality of template patches constituting the template sphere and the center point of the template sphere, together with the first sphere position information of each first sphere, the first three-dimensional position information in the camera coordinate system of the vertices of the plurality of patches constituting each first sphere, includes: transforming the template sphere in shape and rotation angle based on the lengths of each first sphere along its three coordinate axes and the rotation angle of each first sphere relative to the camera coordinate system; determining, based on the result of the shape and rotation-angle transformation of the template sphere and the first positional relationship, a second positional relationship between each template vertex and the center point of the transformed template sphere; and determining, based on the second three-dimensional position information of the center point of each first sphere in the camera coordinate system and the second positional relationship, the first three-dimensional position information in the camera coordinate system of the vertices of the plurality of patches constituting each first sphere.
- in this way, the first three-dimensional position information can be obtained quickly.
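The transformation above — scaling the template sphere by the per-axis lengths, rotating it, then translating by the sphere's center point — can be sketched as follows. This is a hedged NumPy illustration, not the patent's implementation; the Euler-angle convention and helper names (`euler_to_rotation`, `sphere_vertices`) are assumptions:

```python
import numpy as np


def euler_to_rotation(rx, ry, rz):
    """Rotation matrix from Euler angles in radians, applied Z*Y*X (an assumption)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx


def sphere_vertices(template_vertices, axis_lengths, rotation, center):
    """Map unit-template-sphere vertices to one first sphere in camera coordinates.

    template_vertices: (N, 3) offsets of template patch vertices from the
                       template sphere's center (the 'first positional relationship').
    axis_lengths:      (3,) lengths along the sphere's three coordinate axes.
    rotation:          (3,) rotation of the sphere relative to the camera frame.
    center:            (3,) center point in the camera coordinate system.
    """
    R = euler_to_rotation(*rotation)
    # Shape + rotation-angle transform yields the 'second positional relationship'
    # between each template vertex and the transformed sphere's center.
    offsets = (R @ (template_vertices * axis_lengths).T).T
    # Translating by the center gives the first 3D position information.
    return offsets + center
```

Because every first sphere reuses the same template vertices, only the nine pose parameters differ between parts, which is what keeps this step cheap.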
- the method further includes: acquiring the projection matrix of the camera of the first image; the generating the first rendered image based on the first three-dimensional position information in the camera coordinate system includes: determining a part index and a patch index of each pixel in the first rendered image based on the first three-dimensional position information and the projection matrix; and generating the first rendered image based on the determined part index and patch index of each pixel in the first rendered image; wherein the part index of any pixel identifies the part on the first object corresponding to that pixel, and the patch index of any pixel identifies the patch corresponding to that pixel.
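One plausible, heavily simplified sketch of producing the per-pixel part index and patch index from the projection matrix is shown below. A real renderer would rasterize full triangles; this hedged illustration splats each patch's centroid into a z-buffer, and all names are assumptions:

```python
import numpy as np


def project(points, P):
    """Project (N, 3) camera-space points with a 3x4 projection matrix P."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    uvw = (P @ homo.T).T
    return uvw[:, :2] / uvw[:, 2:3], uvw[:, 2]  # pixel coords, depth


def render_indices(spheres, P, h, w):
    """Per-pixel part index and patch index via z-buffering.

    spheres: list of (part_index, patches) pairs, where patches is (M, 3, 3):
             M triangular patches, each with 3 vertices in camera coordinates.
    Pixels covered by no patch keep index -1.
    """
    part_map = -np.ones((h, w), dtype=int)
    patch_map = -np.ones((h, w), dtype=int)
    zbuf = np.full((h, w), np.inf)
    for part_idx, patches in spheres:
        centroids = patches.mean(axis=1)          # one splat point per patch
        uv, z = project(centroids, P)
        for patch_idx, ((u, v), depth) in enumerate(zip(uv, z)):
            x, y = int(round(u)), int(round(v))
            if 0 <= y < h and 0 <= x < w and depth < zbuf[y, x]:
                zbuf[y, x] = depth                # nearest patch wins
                part_map[y, x] = part_idx
                patch_map[y, x] = patch_idx
    return part_map, patch_map
```

The z-buffer is what lets the rendered image encode occlusion between parts, which is exactly the information the gradient step later exploits.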
- the generating the first rendered image based on the first three-dimensional position information in the camera coordinate system of the respective vertices of the plurality of patches constituting each first sphere includes: for each first sphere, generating a first rendered image corresponding to that first sphere according to the first three-dimensional position information in the camera coordinate system of the vertices of the plurality of patches constituting it.
- the obtaining gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image includes: for each first sphere, obtaining the gradient information of the first rendered image corresponding to that first sphere according to the first rendered image and the semantically segmented image corresponding to that first sphere.
- the gradient information of the first rendered image includes a gradient value of each pixel in the first rendered image; the obtaining gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image includes: traversing each pixel in the first rendered image, and determining the gradient value of the traversed pixel according to the first pixel value of the traversed pixel in the first rendered image and the second pixel value of the traversed pixel in the semantically segmented image.
- in this way, the gradient information of the first rendered image can be obtained from the first rendered image and the semantically segmented image of the first image.
- the determining the gradient value of the traversed pixel according to the first pixel value of the traversed pixel in the first rendered image and the second pixel value of the traversed pixel in the semantically segmented image includes: determining a residual of the traversed pixel according to the first pixel value and the second pixel value of the traversed pixel; in the case that the residual of the traversed pixel is a first value, determining the gradient value of the traversed pixel as the first value; in the case that the residual of the traversed pixel is not the first value, determining, based on the second pixel value of the traversed pixel, a target first sphere corresponding to the traversed pixel from among the plurality of first spheres, and determining a target patch from among the plurality of patches constituting the target first sphere; determining target three-dimensional position information in the camera coordinate system of at least one target vertex on the target patch, wherein, when the at least one target vertex is located at the position identified by the target three-dimensional position information, the residual between a new first pixel value obtained by re-rendering the traversed pixel and the second pixel value corresponding to the traversed pixel is the first value; and determining the gradient value of the traversed pixel based on the first three-dimensional position information and the target three-dimensional position information.
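The per-pixel logic above can be sketched numerically. This is a hedged illustration only: it takes the "first value" of the residual to be 0, and stands in a caller-supplied search (`target_vertex_fn`) for the patent's procedure of finding the vertex position at which re-rendering zeroes the residual:

```python
import numpy as np


def pixel_gradient(first_px, second_px, vertex, target_vertex_fn):
    """Gradient value of one traversed pixel.

    first_px:         pixel value in the first rendered image.
    second_px:        pixel value in the semantically segmented image.
    vertex:           current 3D position of a target vertex covering the pixel.
    target_vertex_fn: callable returning the 3D position at which re-rendering
                      the pixel would drive the residual to the first value
                      (assumed to be 0 here) -- a stand-in for the search
                      described in the text.
    """
    residual = first_px - second_px
    if residual == 0:                 # residual already equals the first value
        return np.zeros(3)
    target = target_vertex_fn(vertex)
    # The gradient points from the current vertex position toward the target
    # position that would zero the residual, scaled by the residual.
    return residual * (np.asarray(target) - np.asarray(vertex))
```

A nonzero gradient therefore tells the adjustment step both which vertex to move and in which direction, even where the rendering itself is piecewise constant.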
- the acquiring, based on the first image including the first object, the first sphere position information in the camera coordinate system of each first sphere among the plurality of first spheres includes: using a pre-trained position information prediction network to perform position-information prediction processing on the first image to obtain the first sphere position information of each first sphere in the camera coordinate system.
- an embodiment of the present disclosure further provides a method for generating a neural network, including: using a neural network to be trained to perform three-dimensional position information prediction processing on a second object in a second image, obtaining second sphere position information in the camera coordinate system of each of a plurality of second spheres representing different parts of the second object; generating a second rendered image based on the second sphere position information corresponding to the plurality of second spheres; obtaining gradient information of the second rendered image based on the second rendered image and the semantically labeled image of the second image; and updating the neural network to be trained based on the gradient information of the second rendered image to obtain the updated neural network.
- in this way, after the second sphere position information of the plurality of second spheres representing the three-dimensional model of the second object in the second image is obtained, image rendering is performed based on the second sphere position information; based on the rendering result, gradient information representing the degree of correctness of the second sphere position information of the plurality of second spheres is determined, and the neural network to be trained is updated based on that gradient information, so that the resulting neural network has higher prediction accuracy for three-dimensional position information.
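The update loop just described — predict sphere parameters, render, compare against the semantic annotation, push the error back into the network — can be sketched numerically. In this hedged illustration a linear map stands in for the neural network and finite differences stand in for backpropagation; both are assumptions for brevity:

```python
import numpy as np


def train_step(weights, second_image, semantic_labels, render, lr=0.1):
    """One update of the 'network to be trained' (here a plain matrix).

    render:          maps predicted second-sphere parameters to a second
                     rendered image; assumed differentiable (finite
                     differences approximate that here).
    semantic_labels: the semantically labeled image of the second image.
    """
    def loss_fn(w):
        sphere_params = second_image @ w      # predicted second sphere info
        rendered = render(sphere_params)      # second rendered image
        return np.mean((rendered - semantic_labels) ** 2)

    # Gradient information of the second rendered image, propagated back to
    # the network parameters by central finite differences.
    grad = np.zeros_like(weights)
    eps = 1e-6
    for idx in np.ndindex(weights.shape):
        w_plus = weights.copy()
        w_plus[idx] += eps
        w_minus = weights.copy()
        w_minus[idx] -= eps
        grad[idx] = (loss_fn(w_plus) - loss_fn(w_minus)) / (2 * eps)
    return weights - lr * grad                # updated network parameters
```

Each step moves the parameters so the rendering agrees better with the annotation; in a real system an automatic-differentiation framework would replace the finite-difference loop.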
- an embodiment of the present disclosure further provides a three-dimensional model generation apparatus, including: a first acquisition part configured to acquire, based on a first image including a first object, first sphere position information in the camera coordinate system of each of a plurality of first spheres, the plurality of first spheres respectively representing different parts of the first object; a first generation part configured to generate a first rendered image based on the first sphere position information of the plurality of first spheres; a first gradient determination part configured to obtain gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image; an adjustment part configured to adjust the first sphere position information of the plurality of first spheres based on the gradient information of the first rendered image; and a model generation part configured to generate a three-dimensional model of the first object using the adjusted first sphere position information of the plurality of first spheres.
- the first generation part is configured to: determine, based on the first sphere position information, the first three-dimensional position information in the camera coordinate system of the vertices of the plurality of patches constituting each first sphere; and generate the first rendered image based on the first three-dimensional position information in the camera coordinate system of the vertices of the plurality of patches constituting each first sphere.
- when determining, based on the first sphere position information, the first three-dimensional position information in the camera coordinate system of the vertices of the plurality of patches constituting each first sphere, the first generation part is configured to determine that information based on the first positional relationship between the template vertices of the plurality of template patches constituting the template sphere and the center point of the template sphere, together with the first sphere position information of each first sphere.
- the first sphere position information of each first sphere includes: second three-dimensional position information of the center point of each first sphere in the camera coordinate system, the lengths of each first sphere along its three coordinate axes, and the rotation angle of each first sphere relative to the camera coordinate system.
- when determining, based on the first positional relationship between the template vertices of the plurality of template patches constituting the template sphere and the center point of the template sphere, together with the first sphere position information of each first sphere, the first three-dimensional position information in the camera coordinate system of the vertices of the plurality of patches constituting each first sphere, the first generation part is configured to: transform the template sphere in shape and rotation angle based on the lengths of each first sphere along its three coordinate axes and the rotation angle of each first sphere relative to the camera coordinate system; determine, based on the result of the shape and rotation-angle transformation of the template sphere and the first positional relationship, a second positional relationship between each template vertex and the center point of the transformed template sphere; and determine, based on the second three-dimensional position information of the center point of each first sphere in the camera coordinate system and the second positional relationship, the first three-dimensional position information in the camera coordinate system of the vertices of the plurality of patches constituting each first sphere.
- the first acquisition part is further configured to acquire the projection matrix of the camera of the first image; when generating the first rendered image based on the first three-dimensional position information in the camera coordinate system of the vertices of the plurality of patches, the first generation part is configured to: determine a part index and a patch index of each pixel in the first rendered image based on the first three-dimensional position information and the projection matrix; and generate the first rendered image based on the determined part index and patch index of each pixel in the first rendered image; wherein the part index of any pixel identifies the part on the first object corresponding to that pixel, and the patch index of any pixel identifies the patch corresponding to that pixel.
- when generating the first rendered image based on the first three-dimensional position information in the camera coordinate system of the respective vertices of the plurality of patches constituting each first sphere, the first generation part is configured to: for each first sphere, generate a first rendered image corresponding to that first sphere according to the first three-dimensional position information in the camera coordinate system of the respective vertices of the plurality of patches constituting that first sphere.
- when obtaining gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image, the first gradient determination part is configured to: for each first sphere, obtain the gradient information of the first rendered image corresponding to that first sphere according to the first rendered image and the semantically segmented image corresponding to that first sphere.
- the gradient information of the first rendered image includes a gradient value of each pixel in the first rendered image; when obtaining the gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image, the first gradient determination part is configured to: traverse each pixel in the first rendered image, and determine the gradient value of the traversed pixel according to the first pixel value of the traversed pixel in the first rendered image and the second pixel value of the traversed pixel in the semantically segmented image.
- when determining the gradient value of the traversed pixel according to the first pixel value of the traversed pixel in the first rendered image and the second pixel value of the traversed pixel in the semantically segmented image, the first gradient determination part is configured to: determine a residual of the traversed pixel according to the first pixel value and the second pixel value of the traversed pixel; in the case that the residual of the traversed pixel is the first value, determine the gradient value of the traversed pixel as the first value; in the case that the residual of the traversed pixel is not the first value, determine, based on the second pixel value of the traversed pixel, a target first sphere corresponding to the traversed pixel from among the plurality of first spheres, and determine a target patch from among the plurality of patches constituting the target first sphere; and determine target three-dimensional position information in the camera coordinate system of at least one target vertex on the target patch.
- when acquiring, based on the first image including the first object, the first sphere position information in the camera coordinate system of each of the plurality of first spheres, the first acquisition part is configured to: use a pre-trained position information prediction network to perform position-information prediction processing on the first image to obtain the first sphere position information of each of the plurality of first spheres in the camera coordinate system.
- the embodiments of the present disclosure also provide an apparatus for generating a neural network, including: a second acquisition part configured to use the neural network to be trained to perform three-dimensional position information prediction processing on a second object in a second image, obtaining second sphere position information in the camera coordinate system of each of a plurality of second spheres representing different parts of the second object; a second generation part configured to generate a second rendered image based on the second sphere position information respectively corresponding to the plurality of second spheres; a second gradient determination part configured to obtain gradient information of the second rendered image based on the second rendered image and the semantically annotated image of the second image; and an updating part configured to update the neural network to be trained based on the gradient information of the second rendered image to obtain the updated neural network.
- an optional implementation of the present disclosure further provides an electronic device, including a processor and a memory, where the memory stores machine-readable instructions executable by the processor; when the machine-readable instructions are executed by the processor, the processor performs the steps of the above first aspect or any possible implementation of the first aspect, or performs the steps of the above second aspect or any possible implementation of the second aspect.
- an optional implementation of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run, it performs the steps of the above first aspect or any possible implementation of the first aspect, or performs the steps of the above second aspect or any possible implementation of the second aspect.
- an optional implementation of the present disclosure further provides a computer program comprising computer-readable code; when the computer-readable code is executed in an electronic device, a processor in the electronic device executes the code to implement the steps of the above first aspect or any possible implementation of the first aspect, or of the above second aspect or any possible implementation of the second aspect.
- FIG. 1 shows a flowchart of a method for generating a three-dimensional model provided by an embodiment of the present disclosure
- FIG. 2 shows a schematic diagram of an example of characterizing a human body by a plurality of first spheres provided by an embodiment of the present disclosure
- FIG. 3 is a schematic diagram showing an example of the structure of a location information prediction network provided by an embodiment of the present disclosure
- FIG. 4 shows a schematic diagram of an example of transforming a template sphere into a first sphere provided by an embodiment of the present disclosure
- FIG. 5 shows a flowchart of a method for determining a gradient value of a traversed pixel point provided by an embodiment of the present disclosure
- FIG. 6 shows schematic diagrams of examples of determining the target three-dimensional position information when the residual of the traversed pixel is not the first value, provided by an embodiment of the present disclosure;
- FIG. 7 shows a flowchart of a method for generating a neural network provided by an embodiment of the present disclosure
- FIG. 8 shows a schematic diagram of a three-dimensional model generating apparatus provided by an embodiment of the present disclosure
- FIG. 9 shows a schematic diagram of a neural network generating apparatus provided by an embodiment of the present disclosure.
- FIG. 10 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
- in the related art, a neural network is generally used to predict, from a two-dimensional image, the three-dimensional model parameters of the object to be reconstructed, and the three-dimensional model is generated based on those parameters.
- however, current three-dimensional model generation methods cannot resolve the ambiguity caused by occlusion of some parts of the object being reconstructed, so the depth pose of the object cannot be accurately recovered, which in turn leads to lower accuracy of the generated three-dimensional model.
- in view of this, an embodiment of the present disclosure provides a method for generating a three-dimensional model: image rendering is performed based on the first sphere position information of a plurality of first spheres representing the three-dimensional model, gradient information representing the degree of correctness of the first sphere position information of the plurality of first spheres is determined based on the rendering result, and the first sphere position information corresponding to the plurality of first spheres is readjusted based on that gradient information, so that the adjusted position information of the plurality of first spheres has higher accuracy; that is, the three-dimensional model restored from the first sphere position information corresponding to the plurality of first spheres also has higher accuracy.
- since the first sphere position information corresponding to each of the plurality of first spheres is readjusted, the depth information of the first object can also be restored with higher accuracy.
- An embodiment of the present disclosure also provides a method for generating a neural network: the neural network to be optimized performs three-dimensional position information prediction processing on a second object in a second image, obtaining the second sphere position information of a plurality of second spheres representing a three-dimensional model of the second object in the second image.
- Image rendering is performed based on the second sphere position information; based on the rendering result, gradient information characterizing the degree of correctness of the second sphere position information of the plurality of second spheres is determined; and the neural network to be optimized is updated based on the gradient information to obtain an optimized neural network, so that the optimized neural network has higher prediction accuracy for three-dimensional position information.
- the execution body of the method for generating a 3D model provided by the embodiment of the present disclosure is generally a computer device with a certain computing capability.
- The computer device includes, for example, a terminal device, a server, or another processing device; the terminal device may be a user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, etc.
- the three-dimensional model generation method may be implemented by a processor invoking computer-readable instructions stored in a memory.
- FIG. 1 is a flowchart of a method for generating a 3D model provided by an embodiment of the present disclosure
- the method includes steps S101 to S104 , wherein:
- S101: Based on a first image including a first object, obtain first sphere position information in the camera coordinate system of each first sphere in a plurality of first spheres, where the plurality of first spheres respectively represent different parts of the first object;
- S102 Generate a first rendered image based on the first sphere position information of the plurality of first spheres;
- S103 Obtain gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image;
- S104 Adjust the first sphere position information of the plurality of first spheres based on the gradient information of the first rendered image, and use the adjusted first sphere position information of the plurality of first spheres , generating a three-dimensional model of the first object.
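Steps S101 to S104 can be sketched as an optimization loop. The sketch below is illustrative only: the function names are invented, and a finite-difference gradient stands in for the patent's per-pixel gradient rule.

```python
import numpy as np

def refine_spheres(spheres, render, target, n_iters=60, lr=0.1, eps=1e-4):
    """Steps S102-S104 as an optimization loop: render the current sphere
    parameters, compare against the segmentation target, and step the
    parameters along a (finite-difference) gradient of the mismatch."""
    spheres = np.asarray(spheres, dtype=float).copy()
    for _ in range(n_iters):
        base = float(np.sum((render(spheres) - target) ** 2))
        grad = np.zeros_like(spheres)
        for i in range(spheres.size):
            bumped = spheres.copy()
            bumped.flat[i] += eps  # perturb one parameter
            grad.flat[i] = (float(np.sum((render(bumped) - target) ** 2)) - base) / eps
        spheres -= lr * grad
    return spheres

# toy check with an identity "renderer": the parameters converge to the target
identity = lambda s: s
refined = refine_spheres([0.0, 0.0], identity, np.array([1.0, 2.0]))
```

In the real method, `render` would be the sphere rasterizer of S102 and `target` the semantically segmented image of S103.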
- In the above method, first sphere position information in the camera coordinate system is obtained for each of the multiple first spheres representing different parts of the first object, and the first object is re-rendered according to that position information to obtain a first rendered image. Gradient information of the first rendered image is then obtained based on the first rendered image and the semantically segmented image of the first image; this gradient information represents the correctness of the first rendered image obtained by re-rendering the first object from the first sphere position information.
- Consequently, when the first sphere position information of each first sphere is adjusted based on the gradient information, the parts whose position information was predicted incorrectly are corrected, so that the adjusted first sphere position information more accurately represents the positions of different parts of the first object in the camera coordinate system, and the three-dimensional model generated from the adjusted first sphere position information of each first sphere has higher precision.
- Because gradient information representing the degree of correctness of the first sphere position information of the plurality of first spheres is used to readjust the first sphere position information corresponding to each first sphere, the depth information of the first object can be restored with higher accuracy, and the resulting three-dimensional model is more accurate.
- In implementation, the first object is divided into multiple parts, and three-dimensional position information is predicted separately for the different parts of the first object.
- the three-dimensional position information corresponding to different parts of the first object is represented by the first sphere position information of the first sphere in the camera coordinate system;
- The first sphere position information of a first sphere in the camera coordinate system includes: the three-dimensional position information (that is, the second three-dimensional position information) of the center point of the first sphere in the camera coordinate system, the lengths corresponding to the three coordinate axes of the first sphere, and the rotation angle of the first sphere relative to the camera coordinate system.
- Taking a human body as an example, the body can be divided into multiple parts according to the limbs and torso, and each part is represented by a first sphere; each first sphere has three coordinate axes, which respectively represent the bone length and the thicknesses of the part in different directions.
- an embodiment of the present disclosure provides an example in which a human body is represented by a plurality of first spheres.
- The human body is divided into 20 parts, and the 20 parts are represented by 20 first spheres: {S_i | i = 1, ..., 20};
- S_i = E(R_i, C_i, X_i);
- S_i represents the first sphere position information of the i-th first sphere in the camera coordinate system, that is, the pose data of the part corresponding to that first sphere in the camera coordinate system;
- X_i represents the dimensional data of the i-th first sphere, whose parameters include the bone length l_i and the thicknesses of the part in different directions;
- C i represents the three-dimensional coordinate value of the center point of the i-th first sphere in the camera coordinate system;
- R i represents the rotation information of the i-th first sphere in the camera coordinate system.
- The pose data S_i of the i-th first sphere satisfies the following formula (1), reconstructed here from the surrounding symbol definitions:
- S_i = S_parent(i) + R_parent(i) · (l_i O_i)    (1)
- O_i is the offset vector, representing the offset direction from the parent part corresponding to the i-th first sphere to the current part; l_i O_i represents the local position of the i-th part of the human body in the key point layout.
- S parent(i) represents the pose data of the parent part.
- R parent(i) represents the rotation information of the parent part corresponding to the i-th first sphere in the camera coordinate system.
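The parent-relative pose of formula (1) can be illustrated with a short sketch. This is one plausible reading of the formula under the definitions above (the function name and the toy rotation are assumptions, not from the patent):

```python
import numpy as np

def child_pose_position(S_parent, R_parent, l_i, O_i):
    """Plausible reading of formula (1): the i-th part's position equals the
    parent's position plus the parent's rotation applied to the local
    offset l_i * O_i (bone length times unit offset direction)."""
    return np.asarray(S_parent, dtype=float) + \
        np.asarray(R_parent, dtype=float) @ (l_i * np.asarray(O_i, dtype=float))

# toy kinematic chain: parent at the origin, rotated 90 degrees about the z-axis
R_parent = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 1.0]])
child = child_pose_position([0.0, 0.0, 0.0], R_parent, 2.0, [1.0, 0.0, 0.0])
```

Chaining this from the root outward would place every part of the body in the camera coordinate system.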
- A pre-trained position information prediction network can be used to perform position information prediction processing on the first image, to obtain the first sphere position information of each of the plurality of first spheres in the camera coordinate system.
- the embodiment of the present disclosure further provides an example of the structure of a position information prediction network, including: a feature extraction sub-network, a key point prediction sub-network, and a three-dimensional position information prediction sub-network.
- the feature extraction sub-network is used to perform feature extraction processing on the first image to obtain a feature map of the first image.
- the feature extraction sub-network includes, for example, convolutional neural networks (CNN), and the CNN can perform at least one-level feature extraction processing on the first image to obtain a feature map of the first image.
- the process of performing at least one-level feature extraction processing on the first image by the CNN can also be regarded as the process of encoding the first image by using the CNN encoder.
- the key point prediction sub-network is configured to determine, based on the feature map of the first image, two-dimensional coordinate values of multiple key points of the first object in the first image.
- the key point prediction sub-network can perform at least one level of deconvolution processing based on the feature map of the first image to obtain a heat map of the first image, wherein the size of the heat map is, for example, the same as the size of the first image;
- the pixel value of any first pixel point in the heat map represents the probability that the second pixel point corresponding to the position of any first pixel point in the first image is a key point of the first object.
- Based on the heat map, the two-dimensional coordinate values of multiple key points of the first object in the first image can be obtained.
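One common way to read 2D keypoint coordinates out of per-keypoint heat maps is to take the pixel with the maximum probability. The patent does not specify the decoding rule, so this is only an assumed sketch:

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps):
    """Decode each per-keypoint heat map to an (x, y) coordinate by taking
    the pixel with the maximum response (an assumed, common decoding)."""
    coords = []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        coords.append((int(x), int(y)))
    return coords

hm = np.zeros((4, 4))
hm[1, 2] = 0.9          # peak at row 1, column 2
keypoints = keypoints_from_heatmaps([hm])
```

Sub-pixel refinements (e.g. a weighted average around the peak) are also common but omitted here.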
- The three-dimensional position information prediction sub-network is used to obtain the first sphere position information, in the camera coordinate system, of each of the plurality of first spheres constituting the first object, based on the two-dimensional coordinate values of the multiple key points of the first object in the first image and the feature map of the first image.
- the first rendered image may be generated in the following manner, for example:
- Based on the first sphere position information, determine the first three-dimensional position information, in the camera coordinate system, of each vertex of the plurality of patches constituting each first sphere; then generate the first rendered image based on that first three-dimensional position information.
- a patch is a collection of vertices and polygons representing the shape of a polyhedron in 3D computer graphics, also known as an unstructured mesh.
- That is, the first three-dimensional position information in the camera coordinate system of the vertices of the plurality of patches constituting each first sphere can be determined based on the first sphere position information.
- Specifically, this first three-dimensional position information may be determined based on a first positional relationship between the template vertices of a plurality of template patches constituting a template sphere and the center point of the template sphere, together with the first sphere position information of each first sphere.
- the template sphere is shown as 41 in FIG. 4, for example, the template sphere includes a plurality of template patches, and the template vertex of each template patch has a certain positional relationship with the center point of the template sphere.
- Each first sphere can be obtained by deforming the template sphere. When deforming the template sphere, for example, a shape and rotation-angle transformation is performed on the template sphere according to the lengths corresponding to the three coordinate axes of each first sphere and the rotation angle of each first sphere relative to the camera coordinate system. Based on the result of this transformation and the first positional relationship, a second positional relationship between each template vertex and the center point of the transformed template sphere is determined. Then, based on the second three-dimensional position information of the center point of each first sphere in the camera coordinate system and the second positional relationship, the first three-dimensional position information in the camera coordinate system of each vertex of the plurality of patches constituting each first sphere is determined.
- When transforming the shape and rotation angle of the template sphere, the shape can be transformed first, so that the three coordinate axes of the template sphere respectively equal the lengths of the three coordinate axes of the first sphere; the rotation-angle transformation is then performed on the result of the shape transformation, so that the directions of the three coordinate axes of the template sphere in the camera coordinate system correspond one-to-one with the directions of the three coordinate axes of the first sphere, completing the transformation of the template sphere.
- After the transformation, the lengths of the three coordinate axes of the template sphere and its rotation angle in the camera coordinate system are determined.
- From the transformed axis lengths and rotation angle, together with the first positional relationship between the template vertices of the template patches constituting the template sphere and the center point of the template sphere, the second positional relationship between the template vertices of each template patch and the center point of the transformed template sphere can be determined.
- Based on the second positional relationship and the second three-dimensional position information of the center point of the first sphere in the camera coordinate system, the three-dimensional position information of the template vertices of the multiple template patches in the camera coordinate system is obtained; this is the first three-dimensional position information, in the camera coordinate system, of the vertices of the multiple patches constituting the first sphere.
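The template-to-sphere transform above amounts to scale, rotate, then translate. The sketch below shows this pipeline on a couple of template vertices; the function name and row-vector convention are illustrative assumptions:

```python
import numpy as np

def sphere_patch_vertices(template_vertices, axis_lengths, R, center):
    """Sketch of the template-sphere transform: scale unit-template vertices
    by the three coordinate-axis lengths (shape change), rotate by R
    (rotation-angle change), then translate to the sphere's center point
    (the second three-dimensional position information)."""
    scaled = np.asarray(template_vertices, dtype=float) * np.asarray(axis_lengths, dtype=float)
    return scaled @ np.asarray(R, dtype=float).T + np.asarray(center, dtype=float)

# two template vertices on the unit sphere, axes (2, 1, 1), no rotation,
# center at (0, 0, 5)
tmpl = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
verts = sphere_patch_vertices(tmpl, [2.0, 1.0, 1.0], np.eye(3), [0.0, 0.0, 5.0])
```

The fixed first positional relationship (template vertex to template center) means this transform can be applied to every patch vertex of every first sphere in one batched operation.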
- an embodiment of the present disclosure further provides an example of transforming a template sphere into a first sphere.
- the template sphere is shown as 41 in FIG. 4 ;
- The result of the shape and rotation-angle transformation is shown as 42; 43 and 44 represent the human body formed by the first spheres, where 43 is a perspective view of that human body.
- The first three-dimensional position information is used to perform image rendering processing on the plurality of first spheres constituting the first object, to generate the first rendered image.
- image rendering processing may be performed on the plurality of first spheres constituting the first object in the following manner:
- the part index of any pixel point identifies the part on the first object corresponding to the any pixel point; the patch index of any pixel point identifies the patch corresponding to the any pixel point.
- Here, the camera is the camera that acquired the first image.
- Based on the projection matrix of the camera and the first three-dimensional position information, in the camera coordinate system, of the vertices of the multiple patches constituting each first sphere, the plurality of first spheres can be projected into the image coordinate system to obtain the first rendered image.
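The projection step can be illustrated with a pinhole camera model. The patent does not give its projection matrix, so the intrinsics `fx`, `fy`, `cx`, `cy` below are assumptions for the sketch:

```python
import numpy as np

def project_points(points_cam, fx, fy, cx, cy):
    """Pinhole projection of camera-frame 3D points to pixel coordinates --
    the role the camera projection matrix plays when mapping patch vertices
    into the rendered image (intrinsics fx, fy, cx, cy are assumed)."""
    pts = np.asarray(points_cam, dtype=float)
    u = fx * pts[:, 0] / pts[:, 2] + cx   # perspective divide by depth z
    v = fy * pts[:, 1] / pts[:, 2] + cy
    return np.stack([u, v], axis=1)

# a point on the optical axis lands at the principal point (cx, cy)
uv = project_points([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]], 100.0, 100.0, 50.0, 50.0)
```

Rasterizing the projected triangles, keeping the nearest patch per pixel, yields the part index and patch index described earlier.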
- In one approach, the multiple first spheres are rendered collectively based on their respective first sphere position information, obtaining a first rendered image that includes all first spheres.
- In this case, the gradient information of the first rendered image corresponding to all the first spheres is obtained, and the first sphere position information of the plurality of first spheres is adjusted based on that gradient information.
- In another approach, rendering is performed separately for each of the multiple first spheres, obtaining a first rendered image corresponding to each first sphere.
- The gradient information of the first rendered image corresponding to each first sphere is then obtained, and the first sphere position information of each first sphere is adjusted based on the gradient information of its corresponding first rendered image.
- a pre-trained semantic segmentation network can be used to perform semantic segmentation processing on the first image to obtain a semantically segmented image of the first image.
- the pixel values of the corresponding pixels of different first spheres are different when they are rendered to the first rendered image;
- the pixel value corresponding to any pixel in the semantically segmented image represents the classification value of the part to which the pixel at the corresponding position in the first image belongs.
- the classification values corresponding to different parts of the first object in the semantically segmented image are also different.
- When a first sphere is rendered into the first rendered image, the pixel value of its corresponding pixels is the same as the classification value of the corresponding part in the semantically segmented image.
- When obtaining the gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image, for example, the following method may be used:
- For each first sphere, obtain the gradient information of the first rendered image corresponding to that first sphere according to the first rendered image and the semantically segmented image corresponding to that first sphere.
- If the predicted first sphere position information of every first sphere is correct, the generated first rendered image and the semantically segmented image have the same pixel values at corresponding positions; if the predicted first sphere position information of any first sphere is incorrect, the pixel values at at least some corresponding positions in the two images differ.
- Therefore, the first rendered image and the semantically segmented image of the first image can be used to determine the gradient information of the first rendered image, where the gradient information represents the degree of correctness of the first sphere position information of each first sphere in the camera coordinate system.
- Accordingly, adjusting the first sphere position information based on this gradient information makes the generated 3D model of the first object more accurate.
- the gradient information of the first rendered image includes: a gradient value of each pixel in the first rendered image.
- Each pixel in the first rendered image may be traversed, and the gradient value of the traversed pixel is determined from the first pixel value of the traversed pixel in the first rendered image and the second pixel value of the traversed pixel in the semantically segmented image.
- an embodiment of the present disclosure further provides a method for determining the gradient value of a traversed pixel point, including:
- S501 Determine the residual of the traversed pixel point according to the first pixel value of the traversed pixel point and the second pixel value of the traversed pixel point.
- If the first pixel value and the second pixel value of the traversed pixel are equal, the first sphere position information of the first sphere to which the position point projecting to the traversed pixel belongs is considered to be predicted correctly; here, the position point is a point on some patch of the first sphere representing a part of the first object.
- If the first pixel value and the second pixel value of the traversed pixel differ, the first sphere position information of the first sphere to which the position point projecting to the traversed pixel belongs is predicted incorrectly.
- In the case of a correct prediction, the residual of the traversed pixel equals a first value; the first value is, for example, 0.
- S504: Determine target three-dimensional position information, in the camera coordinate system, of at least one target vertex on the target patch, such that when the at least one target vertex is located at the position identified by the target three-dimensional position information, the residual between the new first pixel value obtained by re-rendering the traversed pixel and the second pixel value corresponding to the traversed pixel equals the first value.
- S505: Obtain the gradient value of the traversed pixel based on the first three-dimensional position information of the target vertex in the camera coordinate system and the target three-dimensional position information.
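Step S501 can be sketched directly: compute a per-pixel residual between the rendered label image and the segmentation label image. The sign convention below is an assumption; the patent only requires that a correct pixel yields the first value (0):

```python
import numpy as np

def pixel_residuals(rendered, segmented):
    """S501 sketch: per-pixel residual between the first pixel value (the
    rendered part label) and the second pixel value (the semantic-segmentation
    label). A zero residual -- the 'first value' -- means the sphere whose
    position point projects to that pixel is predicted correctly; the sign
    convention here is an assumption."""
    return np.asarray(segmented, dtype=float) - np.asarray(rendered, dtype=float)

# top-right pixel disagrees, so only it has a nonzero residual
res = pixel_residuals([[1, 0], [0, 0]], [[1, 1], [0, 0]])
```

Steps S504 and S505 then turn each nonzero residual into a gradient on the 3D coordinates of the responsible target vertex.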
- the patch is a triangular patch, that is, any patch constituting the first sphere includes three edges and three vertices.
- In FIG. 6, the pixel point P is the traversed pixel point;
- I_P(x) ∈ {0, 1} represents the rendering function of pixel point P.
- 61 represents a target patch; the target patch is the j-th patch in the first sphere representing the i-th part of the first object. The k-th vertex in the target patch is the target vertex in the embodiment of the present disclosure.
- 62 denotes an occlusion patch covering the target patch in the direction of the camera; the occlusion patch and the target patch belong to different first spheres.
- In one case, the first pixel value of pixel point P should be rendered as the first pixel value corresponding to the target patch; in this example, pixel point P is occluded by the occlusion patch 62, and the target patch 61, when projected into the image coordinate system, does not cover pixel point P. Therefore, adjusting the position of the target vertex in either the x-axis or y-axis direction of the camera coordinate system cannot make the new first pixel value obtained after re-rendering pixel point P equal the first pixel value corresponding to the target patch.
- ΔI_P represents the residual of pixel point P.
- x_0 represents the coordinate value of the target vertex on the x-axis before it is moved along the x-axis; x_1 represents the coordinate value of the target vertex on the x-axis after it is moved along the x-axis.
- σ denotes a hyperparameter.
- δ(·, ·) represents the distance between two points.
- In another case, the first pixel value of pixel point P should be rendered as the first pixel value corresponding to the target patch; in this example, pixel point P is not occluded by the occlusion patch 62, so moving the target vertex along the x-axis of the camera coordinate system suffices to make the new first pixel value obtained after re-rendering pixel point P equal the first pixel value corresponding to the target patch. Therefore, as shown at b in FIG. 6, the target vertex can be moved in the x-axis direction of the camera coordinate system until the target patch, when projected into the image coordinate system, covers pixel point P, thereby obtaining the target three-dimensional position information of the target vertex in the camera coordinate system.
- the gradient value of the pixel point P satisfies the above formula (2), and the gradient values of the pixel point P in the z-axis direction and the y-axis direction are both 0.
- In another case, the first pixel value of pixel point P should be rendered as the first pixel value corresponding to the target patch; in this example, pixel point P is occluded by the occlusion patch 62, but the target patch 61, when projected into the image coordinate system, covers pixel point P. There is therefore no need to adjust the target vertex in the x-axis or y-axis direction of the camera coordinate system; instead, as shown at e in FIG. 6, the position of the target vertex is adjusted in the z-axis direction so that the position point Q in the target patch that projects to pixel point P lies in front of the occlusion patch (relative to the camera position), thereby obtaining the target three-dimensional position information of the target vertex in the camera coordinate system.
- the gradient value of the pixel point P satisfies the above formula (3), and the gradient values of the pixel point P in the x-axis direction and the y-axis direction are both 0.
- In another case, the first pixel value of pixel point P should be rendered as a first pixel value different from that of the target patch; in this example, pixel point P is not occluded by the occlusion patch 62, and the target patch 61, when projected into the image coordinate system, covers pixel point P. The target vertex therefore needs to be moved along the x-axis direction of the camera coordinate system so that the new first pixel value obtained by re-rendering pixel point P differs from the first pixel value corresponding to the target patch.
- the gradient value of the pixel point P satisfies the above formula (2), and the gradient values of the pixel point P in the y-axis direction and the z-axis direction are both 0.
- the gradient value of each pixel in the first rendered image can be obtained; the gradient values of all pixels in the first rendered image constitute gradient information of the first rendered image.
- When adjusting the first sphere position information of each first sphere based on the gradient information of the first rendered image, at least one item of the first sphere position information may be adjusted;
- that is, at least one of the second three-dimensional position information of the center point of each first sphere in the camera coordinate system, the lengths corresponding to the three coordinate axes of each first sphere, and the rotation angle of each first sphere relative to the camera coordinate system is adjusted, so that in the new first rendered image generated from the adjusted first sphere position information, the gradient value of each pixel changes toward the first value.
- Through multiple iterations, the first sphere position information gradually approaches the real value, the accuracy of the first sphere position information improves, and ultimately the accuracy of the three-dimensional model of the first object improves.
- an embodiment of the present disclosure further provides a method for generating a neural network, including:
- S701: Using the neural network to be trained, perform three-dimensional position information prediction processing on the second object in the second image, obtaining second sphere position information, in the camera coordinate system, of each second sphere in a plurality of second spheres representing different parts of the second object;
- S702 Generate a second rendered image based on the second sphere position information corresponding to the plurality of second spheres respectively;
- S703 Obtain gradient information of the second rendered image based on the second rendered image and the semantically annotated image of the second image;
- S704 Based on the gradient information of the second rendered image, update the neural network to be trained to obtain an updated neural network.
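One training update following S701 to S704 can be sketched as follows. Everything here is illustrative: a plain parameter vector stands in for the network, and a finite-difference gradient stands in for the patent's rendering-based gradient rule:

```python
import numpy as np

def train_step(params, predict, render, target_seg, lr=0.05, eps=1e-3):
    """One S701-S704 update on a stand-in parameter vector: predict sphere
    parameters (S701), render them (S702), compare with the annotated
    segmentation (S703), and update along a finite-difference gradient (S704).
    A sketch only, not the patent's exact rule."""
    loss = lambda p: float(np.sum((render(predict(p)) - target_seg) ** 2))
    params = np.asarray(params, dtype=float)
    base = loss(params)
    grad = np.zeros_like(params)
    for i in range(params.size):
        bumped = params.copy()
        bumped.flat[i] += eps
        grad.flat[i] = (loss(bumped) - base) / eps
    return params - lr * grad

# toy check: identity "network" and "renderer", target segmentation value 1.0
identity = lambda x: x
updated = train_step([0.0], identity, identity, np.array([1.0]))
```

Repeating `train_step` over many annotated images would drive the prediction toward the supervision, which is the role of the optimized neural network.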
- The structure of the neural network provided by the embodiment of the present disclosure is shown in FIG. 3 and will not be repeated here.
- In the above method, the neural network to be optimized performs three-dimensional position information prediction processing on the second object in the second image, obtaining the second sphere position information of the plurality of second spheres representing the three-dimensional model of the second object in the second image;
- image rendering is performed based on the second sphere position information, and gradient information representing the degree of correctness of the second sphere position information of the plurality of second spheres is determined from the rendering result; the neural network to be optimized is then updated based on the gradient information,
- obtaining an optimized neural network with higher prediction accuracy for three-dimensional position information.
- the implementation process of the above S702 is similar to the implementation process of the above S102; the implementation process of the above S703 is similar to the implementation process of the above S103, and will not be repeated here.
- In this way, the embodiment of the present disclosure can propagate the gradient at a given pixel back to the Euclidean coordinates of the vertices of the 3D mesh; that is, the shape of the 3D object model can be corrected using image information such as the object outline and the semantic segmentation of parts.
- Forward propagation proceeds from the 3D model mesh to the image pixels:
- for each pixel on the image plane, compute the triangular patch (the above-mentioned patch) that covers the pixel and is closest to the camera (that is, which triangular patch this pixel renders during complete rendering); the image that stores this triangular patch index for each pixel is the face index (the above-mentioned patch index),
- and the image that stores the corresponding part for each pixel is the part index (the above-mentioned part index).
- A complete rendered image is generated, and then, for each part, a portion of pixel values is extracted separately from the complete rendered image, where the pixel coordinates of the extracted portion belong to the current part in the part index.
- The value of a pixel can be an RGB value, a grayscale value, a brightness value, or a binary value.
- Taking the binary value as an example, a visible pixel takes the value 1 and an invisible pixel takes the value 0.
- the gradient on a pixel is either positive (0 to 1) or negative (1 to 0).
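With binary pixel values, the gradient direction at a pixel reduces to the sign of the change needed, as a minimal sketch (the function name and sign convention are illustrative):

```python
def pixel_gradient_sign(rendered_value, target_value):
    """With binary pixel values, the gradient direction at a pixel is just the
    sign of the needed change: +1 to flip 0 -> 1 (positive), -1 to flip
    1 -> 0 (negative), and 0 when the pixel already matches. The sign
    convention is an illustrative assumption."""
    return target_value - rendered_value

signs = [pixel_gradient_sign(0, 1),   # should become visible: positive
         pixel_gradient_sign(1, 0),   # should become invisible: negative
         pixel_gradient_sign(1, 1)]   # already correct: zero
```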
- The supervision information used is therefore no longer limited to the complete rendered image: the semantic segmentation of objects can also serve as supervision, and when multiple objects are rendered together, different objects can be treated as components and rendered independently, so that the positional relationship between different objects can be determined.
- In the flowcharts above, the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the execution order of each step should be determined by its function and possible internal logic.
- The embodiment of the present disclosure also provides a three-dimensional model generation apparatus corresponding to the three-dimensional model generation method. Since the principle of the apparatus is similar to that of the above-mentioned method, its implementation can refer to the implementation of the method, and repeated descriptions are omitted.
- The apparatus includes: a first acquisition part 81, a first generation part 82, a first gradient determination part 83, an adjustment part 84, and a model generation part 85; wherein,
- the first acquisition part 81 is configured to acquire, based on the first image including the first object, the first sphere position information in the camera coordinate system of each of the plurality of first spheres, the plurality of first spheres respectively representing different parts of the first object;
- a first generating part 82 configured to generate a first rendered image based on the first sphere position information of the plurality of first spheres
- a first gradient determination part 83, configured to obtain gradient information of the first rendered image based on the first rendered image and the semantically segmented image of the first image;
- an adjustment part 84 configured to adjust the first sphere position information of the plurality of first spheres based on the gradient information of the first rendered image
- the model generation part 85 is configured to generate a three-dimensional model of the first object using the adjusted first sphere position information of the plurality of first spheres.
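The interplay of parts 81 to 85 can be sketched as a simple fitting loop: predict sphere positions, render, compare with the segmentation, and nudge the positions along the gradient. Everything below (the toy renderer, the squared-error gradient, the numbers) is an illustrative stand-in, not the disclosed implementation:

```python
import numpy as np

def render(centers):
    # toy "renderer": reduce each sphere center to one scalar silhouette value
    return centers.sum(axis=1)

def gradient(rendered, target):
    # toy gradient of a squared residual between the render and the supervision
    return 2.0 * (rendered - target)

centers = np.array([[0.0, 0.0, 2.0], [0.5, 0.0, 2.0]])   # two "first spheres" (part 81)
target = np.array([2.2, 2.7])                             # stand-in segmentation values

for _ in range(100):
    g = gradient(render(centers), target)                 # parts 82-83: render + gradient
    centers -= 0.05 * g[:, None] / 3.0                    # part 84: adjust positions
model = centers                                           # part 85: final model
print(np.round(render(model), 3))
```

After the loop the toy render matches the supervision target, which is the role the adjusted sphere positions play in the real device.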
- the first generation part 82, in the case of generating a first rendered image based on the first sphere position information of the plurality of first spheres, is configured to: determine, based on the first sphere position information, first three-dimensional position information of the respective vertices of the plurality of patches constituting each first sphere in the camera coordinate system; and generate the first rendered image based on the first three-dimensional position information.
- the first generation part 82, in the case of determining, based on the first sphere position information, the first three-dimensional position information of the respective vertices of the plurality of patches constituting each first sphere in the camera coordinate system, is configured to: determine the first three-dimensional position information based on a first positional relationship between template vertices of a plurality of template patches constituting a template sphere and the center point of the template sphere, and the first sphere position information of each first sphere.
- the first sphere position information of each first sphere includes: second three-dimensional position information of the center point of each first sphere in the camera coordinate system, the lengths respectively corresponding to the three coordinate axes of each first sphere, and the rotation angle of each first sphere relative to the camera coordinate system.
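The per-sphere state enumerated above can be held in a small container; the class and field names below are hypothetical, chosen only to mirror the three pieces of information the description lists:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class SpherePose:
    center: Tuple[float, float, float]        # second 3D position info (camera frame)
    axis_lengths: Tuple[float, float, float]  # lengths along the three coordinate axes
    rotation: Tuple[float, float, float]      # rotation angles w.r.t. the camera frame

# one illustrative "first sphere"
pose = SpherePose(center=(0.0, 0.1, 2.5),
                  axis_lengths=(0.1, 0.1, 0.3),
                  rotation=(0.0, 0.0, 0.2))
print(pose.axis_lengths)
```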
- the first generation part 82, in the case of determining, based on the first positional relationship between the template vertices of the plurality of template patches constituting the template sphere and the center point of the template sphere, and the first sphere position information of each first sphere, the first three-dimensional position information of the respective vertices of the plurality of patches constituting each first sphere in the camera coordinate system, is configured to:
- perform shape and rotation angle transformation on the template sphere based on the lengths respectively corresponding to the three coordinate axes of each first sphere and the rotation angle of each first sphere relative to the camera coordinate system; determine a second positional relationship between each template vertex and the center point of the transformed template sphere based on the transformation result and the first positional relationship; and determine, based on the second three-dimensional position information of the center point of each first sphere in the camera coordinate system and the second positional relationship, the first three-dimensional position information of the respective vertices of the plurality of patches constituting each first sphere in the camera coordinate system.
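The transform chain described above (scale the template sphere by the axis lengths, rotate it, translate it by the center point) might be sketched as follows; the single-axis Euler rotation and all numeric values are assumptions made for brevity:

```python
import numpy as np

def transform_template(template_vertices, axis_lengths, angle_z, center):
    scaled = template_vertices * axis_lengths            # shape change per axis
    c, s = np.cos(angle_z), np.sin(angle_z)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    rotated = scaled @ rot.T                              # rotation relative to camera frame
    return rotated + center                               # translate into camera coordinates

# three template vertices of a unit sphere, one per coordinate axis
template = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
verts = transform_template(template,
                           axis_lengths=np.array([2.0, 1.0, 1.0]),
                           angle_z=np.pi / 2,
                           center=np.array([0.0, 0.0, 5.0]))
print(np.round(verts, 3))
```

The fixed template-vertex-to-center relationship plays the role of the "first positional relationship"; the transformed offsets play the role of the "second positional relationship".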
- the first acquisition part 81 is further configured to: acquire the projection matrix of the camera of the first image;
- the first generation part 82, in the case of generating the first rendered image based on the first three-dimensional position information of the respective vertices of the plurality of patches constituting each first sphere in the camera coordinate system, is configured to: determine a part index and a patch index of each pixel point in the first rendered image based on the first three-dimensional position information and the projection matrix; and generate the first rendered image based on the determined part index and patch index of each pixel point.
- the part index of any pixel point identifies the part on the first object corresponding to that pixel point; the patch index of any pixel point identifies the patch corresponding to that pixel point.
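The role of the projection matrix and of the per-pixel part/patch indices can be illustrated with a minimal pinhole projection; the intrinsics values, image size, and index values below are invented for the example:

```python
import numpy as np

K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 32.0],
              [0.0, 0.0, 1.0]])   # assumed intrinsics-style projection matrix

def project(point_cam):
    uvw = K @ point_cam
    return uvw[:2] / uvw[2]        # perspective divide -> pixel coordinates

part_index = np.full((64, 64), -1, dtype=int)    # -1 = background
patch_index = np.full((64, 64), -1, dtype=int)

# one toy camera-space vertex belonging to part 2, patch 7
u, v = project(np.array([0.1, -0.1, 2.0]))
part_index[int(v), int(u)] = 2
patch_index[int(v), int(u)] = 7
print(int(u), int(v))
```

A real renderer would rasterize whole patches and resolve occlusion; here a single vertex suffices to show how each pixel ends up labeled with a part and a patch.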
- the first generation part 82, in the case of generating the first rendered image based on the first three-dimensional position information of the respective vertices of the plurality of patches constituting each first sphere in the camera coordinate system, is configured to: for each first sphere, generate a first rendered image corresponding to that first sphere according to the first three-dimensional position information of the respective vertices of the plurality of patches constituting it.
- the first gradient determination part 83, in the case of obtaining the gradient information of the first rendered image based on the first rendered image and the semantic segmentation of the first image, is configured to: for each first sphere, obtain the gradient information of the first rendered image corresponding to that first sphere according to the first rendered image and the semantic segmentation image corresponding to that first sphere.
- the gradient information of the first rendered image includes: a gradient value of each pixel point in the first rendered image.
- the first gradient determination part 83, in the case of obtaining the gradient information of the first rendered image based on the first rendered image and the semantic segmentation of the first image, is configured to: traverse each pixel point in the first rendered image, and determine the gradient value of the traversed pixel point based on the first pixel value of the traversed pixel point in the first rendered image and the second pixel value of the traversed pixel point in the semantic segmentation image.
- the first gradient determination part 83, in the case of determining the gradient value of the traversed pixel point, is configured to: determine a residual of the traversed pixel point according to its first pixel value and its second pixel value; in the case where the residual is a first value, determine the gradient value of the traversed pixel point as the first value; in the case where the residual is not the first value, determine, based on the second pixel value of the traversed pixel point, a target first sphere corresponding to the traversed pixel point from the plurality of first spheres, and determine a target patch from the plurality of patches constituting the target first sphere; determine target three-dimensional position information of at least one target vertex on the target patch in the camera coordinate system, wherein, in the case where the at least one target vertex is located at the position identified by the target three-dimensional position information, the residual between the new first pixel value obtained by re-rendering the traversed pixel point and the second pixel value corresponding to the traversed pixel point is the first value; and obtain the gradient value of the traversed pixel point based on the first three-dimensional position information of the target vertex in the camera coordinate system and the target three-dimensional position information.
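A schematic version of this per-pixel rule follows. Zero is assumed as the "first value", and `rerender_target` is a hypothetical helper standing in for the search over re-rendered vertex positions; neither name comes from the disclosure:

```python
import numpy as np

FIRST_VALUE = 0.0  # assumption: a zero residual means the pixel already agrees

def rerender_target(vertex_xyz, pixel_residual):
    # toy stand-in: the target position that would re-render the pixel correctly
    return vertex_xyz + np.array([pixel_residual * 0.01, 0.0, 0.0])

def pixel_gradient(first_px, second_px, vertex_xyz):
    residual = second_px - first_px
    if residual == FIRST_VALUE:
        return np.zeros(3)                       # gradient equals the first value
    target_xyz = rerender_target(vertex_xyz, residual)
    return target_xyz - vertex_xyz               # current vs. target vertex position

g = pixel_gradient(first_px=0.0, second_px=1.0,
                   vertex_xyz=np.array([0.0, 0.0, 2.0]))
print(g)
```

The difference between the vertex's current and target positions gives the direction in which moving the target patch would remove the pixel's residual.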
- the first acquisition part 81, in the case of acquiring, based on the first image including the first object, the first sphere position information of each of the plurality of first spheres in the camera coordinate system, is configured to: perform position information prediction processing on the first image by using a pre-trained position information prediction network, to obtain the first sphere position information of each of the plurality of first spheres in the camera coordinate system.
- an embodiment of the present disclosure further provides a neural network generating apparatus, including:
- the second obtaining part 91 is configured to perform three-dimensional position information prediction processing on a second object in a second image by using the neural network to be trained, to obtain second sphere position information, in the camera coordinate system, of each of a plurality of second spheres representing different parts of the second object;
- the second generating part 92 is configured to generate a second rendered image based on the second sphere position information corresponding to the plurality of second spheres respectively;
- the second gradient determination part 93 is configured to obtain the gradient information of the second rendered image based on the second rendered image and the semantically annotated image of the second image;
- the updating part 94 is configured to update the neural network to be trained based on the gradient information of the second rendered image to obtain an updated neural network.
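A toy version of the training loop implied by parts 91 to 94, with a linear "network" and an identity "renderer" standing in for the real components; every value here is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)                     # the "neural network to be trained"
feature = np.array([1.0, 0.5, -0.2])       # stand-in for the second image
target_render = 1.5                        # stand-in for the semantically annotated image

for _ in range(200):
    pred = w @ feature                     # part 91: predict sphere position info
    rendered = pred                        # part 92: (identity) render
    grad_render = 2.0 * (rendered - target_render)   # part 93: gradient of the render
    w -= 0.1 * grad_render * feature       # part 94: update the network weights
print(round(float(w @ feature), 3))
```

After updating, the network's prediction reproduces the supervision target, which is the sense in which the gradient of the second rendered image trains the network.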
- a "part" may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course, a unit, a module or a non-modularity.
- An embodiment of the present disclosure further provides a computer device, which includes a processor and a memory storing machine-readable instructions executable by the processor.
- when the machine-readable instructions are executed by the processor, the steps of the three-dimensional model generation method or the neural network generation method described in the foregoing method embodiments are implemented; for example, the neural network to be trained is updated to obtain an updated neural network.
- Embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the three-dimensional model generation method or the neural network generation method described in the foregoing method embodiments are executed.
- the storage medium may be a volatile or non-volatile computer-readable storage medium.
- the computer program product of the three-dimensional model generation method or the neural network generation method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program codes, and the instructions included in the program codes can be configured to execute the steps of the three-dimensional model generation method or the neural network generation method described in the foregoing method embodiments; for details, reference may be made to the foregoing method embodiments, which will not be repeated here.
- Embodiments of the present disclosure also provide a computer program, which implements any one of the methods in the foregoing embodiments when the computer program is executed by a processor.
- the computer program product can be implemented in hardware, software, or a combination thereof.
- the computer program product is embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK) and the like.
- Embodiments of the present disclosure also provide a computer program, including computer-readable code; when the computer-readable code is run in an electronic device, the processor in the electronic device executes the above three-dimensional model generation method or the above neural network generation method.
- the accuracy of the reconstructed model can be optimized, and the ambiguity caused by self-occlusion of a high-degree-of-freedom model can be reduced; moreover, in deep learning, the embodiments of the present disclosure link the image and the 3D space, thereby improving the accuracy of tasks such as semantic segmentation and 3D reconstruction.
- the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- the technical solutions of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
- the aforementioned storage medium includes various media that can store program codes, such as a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
- Embodiments of the present disclosure provide a three-dimensional model generation method, a neural network generation method, and apparatuses, wherein the three-dimensional model generation method includes: acquiring, based on a first image including a first object, first sphere position information of each of a plurality of first spheres in a camera coordinate system, the plurality of first spheres being respectively configured to represent different parts of the first object; generating a first rendered image based on the first sphere position information of the plurality of first spheres; obtaining gradient information of the first rendered image based on the first rendered image and the semantic segmentation of the first image; adjusting the first sphere position information of the plurality of first spheres based on the gradient information of the first rendered image; and generating a three-dimensional model of the first object by using the adjusted first sphere position information of the plurality of first spheres.
- the three-dimensional model generated by the embodiment of the present disclosure has higher precision.
Claims (16)
- A three-dimensional model generation method, comprising: acquiring, based on a first image including a first object, first sphere position information of each of a plurality of first spheres in a camera coordinate system, the plurality of first spheres respectively representing different parts of the first object; generating a first rendered image based on the first sphere position information of the plurality of first spheres; obtaining gradient information of the first rendered image based on the first rendered image and a semantic segmentation image of the first image; and adjusting the first sphere position information of the plurality of first spheres based on the gradient information of the first rendered image, and generating a three-dimensional model of the first object by using the adjusted first sphere position information of the plurality of first spheres.
- The three-dimensional model generation method according to claim 1, wherein the generating a first rendered image based on the first sphere position information of the plurality of first spheres comprises: determining, based on the first sphere position information, first three-dimensional position information, in the camera coordinate system, of respective vertices of a plurality of patches constituting each first sphere; and generating the first rendered image based on the first three-dimensional position information of the respective vertices of the plurality of patches constituting each first sphere in the camera coordinate system.
- The three-dimensional model generation method according to claim 2, wherein the determining, based on the first sphere position information, the first three-dimensional position information, in the camera coordinate system, of the respective vertices of the plurality of patches constituting each first sphere comprises: determining the first three-dimensional position information based on a first positional relationship between template vertices of a plurality of template patches constituting a template sphere and a center point of the template sphere, and the first sphere position information of each first sphere.
- The three-dimensional model generation method according to claim 3, wherein the first sphere position information of each first sphere comprises: second three-dimensional position information of a center point of each first sphere in the camera coordinate system, lengths respectively corresponding to three coordinate axes of each first sphere, and a rotation angle of each first sphere relative to the camera coordinate system.
- The three-dimensional model generation method according to claim 4, wherein the determining the first three-dimensional position information based on the first positional relationship and the first sphere position information of each first sphere comprises: performing shape and rotation angle transformation on the template sphere based on the lengths respectively corresponding to the three coordinate axes of each first sphere and the rotation angle of each first sphere relative to the camera coordinate system; determining a second positional relationship between each template vertex and the center point of the transformed template sphere based on a result of the shape and rotation angle transformation of the template sphere and the first positional relationship; and determining, based on the second three-dimensional position information of the center point of each first sphere in the camera coordinate system and the second positional relationship, the first three-dimensional position information of the respective vertices of the plurality of patches constituting each first sphere in the camera coordinate system.
- The three-dimensional model generation method according to any one of claims 2 to 5, wherein the method further comprises: acquiring a projection matrix of a camera of the first image; and the generating the first rendered image based on the first three-dimensional position information of the respective vertices of the plurality of patches constituting each first sphere in the camera coordinate system comprises: determining a part index and a patch index of each pixel point in the first rendered image based on the first three-dimensional position information and the projection matrix; and generating the first rendered image based on the determined part index and patch index of each pixel point in the first rendered image; wherein the part index of any pixel point identifies a part on the first object corresponding to the pixel point, and the patch index of any pixel point identifies a patch corresponding to the pixel point.
- The three-dimensional model generation method according to any one of claims 2 to 6, wherein the generating the first rendered image based on the first three-dimensional position information of the respective vertices of the plurality of patches constituting each first sphere in the camera coordinate system comprises: for each first sphere, generating a first rendered image corresponding to the first sphere according to the first three-dimensional position information of the respective vertices of the plurality of patches constituting the first sphere in the camera coordinate system; and the obtaining the gradient information of the first rendered image based on the first rendered image and the semantic segmentation image of the first image comprises: for each first sphere, obtaining gradient information of the first rendered image corresponding to the first sphere according to the first rendered image and the semantic segmentation image corresponding to the first sphere.
- The three-dimensional model generation method according to any one of claims 1 to 7, wherein the gradient information of the first rendered image comprises a gradient value of each pixel point in the first rendered image; and the obtaining the gradient information of the first rendered image based on the first rendered image and the semantic segmentation image of the first image comprises: traversing each pixel point in the first rendered image, and determining a gradient value of the traversed pixel point based on a first pixel value of the traversed pixel point in the first rendered image and a second pixel value of the traversed pixel point in the semantic segmentation image.
- The three-dimensional model generation method according to claim 8, wherein the determining the gradient value of the traversed pixel point based on the first pixel value of the traversed pixel point in the first rendered image and the second pixel value of the traversed pixel point in the semantic segmentation image comprises: determining a residual of the traversed pixel point according to the first pixel value of the traversed pixel point and the second pixel value of the traversed pixel point; in a case where the residual of the traversed pixel point is a first value, determining the gradient value of the traversed pixel point as the first value; in a case where the residual of the traversed pixel point is not the first value, determining, based on the second pixel value of the traversed pixel point, a target first sphere corresponding to the traversed pixel point from the plurality of first spheres, and determining a target patch from the plurality of patches constituting the target first sphere; determining target three-dimensional position information of at least one target vertex on the target patch in the camera coordinate system, wherein, in a case where the at least one target vertex is located at a position identified by the target three-dimensional position information, a residual between a new first pixel value obtained by re-rendering the traversed pixel point and the second pixel value corresponding to the traversed pixel point is the first value; and obtaining the gradient value of the traversed pixel point based on the first three-dimensional position information of the target vertex in the camera coordinate system and the target three-dimensional position information.
- The three-dimensional model generation method according to any one of claims 1 to 9, wherein the acquiring, based on the first image including the first object, the first sphere position information of each of the plurality of first spheres in the camera coordinate system comprises: performing position information prediction processing on the first image by using a pre-trained position information prediction network, to obtain the first sphere position information of each of the plurality of first spheres in the camera coordinate system.
- A neural network generation method, comprising: performing three-dimensional position information prediction processing on a second object in a second image by using a neural network to be trained, to obtain second sphere position information, in a camera coordinate system, of each of a plurality of second spheres representing different parts of the second object; generating a second rendered image based on the second sphere position information respectively corresponding to the plurality of second spheres; obtaining gradient information of the second rendered image based on the second rendered image and a semantically annotated image of the second image; and updating the neural network to be trained based on the gradient information of the second rendered image, to obtain an updated neural network.
- A three-dimensional model generation apparatus, comprising: a first acquisition part, configured to acquire, based on a first image including a first object, first sphere position information of each of a plurality of first spheres in a camera coordinate system, the plurality of first spheres respectively representing different parts of the first object; a first generation part, configured to generate a first rendered image based on the first sphere position information of the plurality of first spheres; a first gradient determination part, configured to obtain gradient information of the first rendered image based on the first rendered image and a semantic segmentation image of the first image; an adjustment part, configured to adjust the first sphere position information of the plurality of first spheres based on the gradient information of the first rendered image; and a model generation part, configured to generate a three-dimensional model of the first object by using the adjusted first sphere position information of the plurality of first spheres.
- A neural network generation apparatus, comprising: a second obtaining part, configured to perform three-dimensional position information prediction processing on a second object in a second image by using a neural network to be trained, to obtain second sphere position information, in a camera coordinate system, of each of a plurality of second spheres representing different parts of the second object; a second generation part, configured to generate a second rendered image based on the second sphere position information respectively corresponding to the plurality of second spheres; a second gradient determination part, configured to obtain gradient information of the second rendered image based on the second rendered image and a semantically annotated image of the second image; and an updating part, configured to update the neural network to be trained based on the gradient information of the second rendered image, to obtain an updated neural network.
- An electronic device, comprising: a processor and a memory, the memory storing machine-readable instructions executable by the processor, the processor being configured to execute the machine-readable instructions stored in the memory, wherein when the machine-readable instructions are executed by the processor, the processor performs the steps of the method according to any one of claims 1 to 11.
- A computer-readable storage medium, on which a computer program is stored, wherein when the computer program is run by an electronic device, the electronic device performs the steps of the method according to any one of claims 1 to 11.
- A computer program, comprising computer-readable code, wherein when the computer-readable code is run in an electronic device, a processor in the electronic device implements the method according to any one of claims 1 to 11.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021573567A JP2022542758A (ja) | 2020-06-29 | 2021-03-23 | 3次元モデル生成方法、ニューラルネットワーク生成方法及び装置 |
EP21819707.7A EP3971840A4 (en) | 2020-06-29 | 2021-03-23 | THREE-DIMENSIONAL MODEL GENERATION METHOD, NEURON NETWORK GENERATION METHOD AND ASSOCIATED DEVICES |
KR1020217042400A KR20220013403A (ko) | 2020-06-29 | 2021-03-23 | 3차원 모델 생성 방법, 신경망 생성 방법 및 장치 |
US17/645,446 US20220114799A1 (en) | 2020-06-29 | 2021-12-21 | Three dimensional model generation method and apparatus, and neural network generating method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010607430.5 | 2020-06-29 | ||
CN202010607430.5A CN111739159A (zh) | 2020-06-29 | 2020-06-29 | 三维模型生成方法、神经网络生成方法及装置 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/645,446 Continuation US20220114799A1 (en) | 2020-06-29 | 2021-12-21 | Three dimensional model generation method and apparatus, and neural network generating method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022001222A1 true WO2022001222A1 (zh) | 2022-01-06 |
Family
ID=72652991
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/082485 WO2022001222A1 (zh) | 2020-06-29 | 2021-03-23 | 三维模型生成方法、神经网络生成方法及装置 |
Country Status (6)
Country | Link |
---|---|
US (1) | US20220114799A1 (zh) |
EP (1) | EP3971840A4 (zh) |
JP (1) | JP2022542758A (zh) |
KR (1) | KR20220013403A (zh) |
CN (1) | CN111739159A (zh) |
WO (1) | WO2022001222A1 (zh) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111739159A (zh) * | 2020-06-29 | 2020-10-02 | 上海商汤智能科技有限公司 | 三维模型生成方法、神经网络生成方法及装置 |
CN112883102B (zh) * | 2021-03-05 | 2024-03-08 | 北京知优科技有限公司 | 数据可视化展示的方法、装置、电子设备及存储介质 |
US11830138B2 (en) * | 2021-03-19 | 2023-11-28 | Adobe Inc. | Predicting secondary motion of multidimentional objects based on local patch features |
CN113239943B (zh) * | 2021-05-28 | 2022-05-31 | 北京航空航天大学 | 基于部件语义图的三维部件提取组合方法和装置 |
KR102553304B1 (ko) * | 2022-11-01 | 2023-07-10 | 주식회사 로지비 | 딥러닝 비전 학습 기반 물류 검수 서버 및 그 동작 방법 |
CN117274473B (zh) * | 2023-11-21 | 2024-02-02 | 北京渲光科技有限公司 | 一种多重散射实时渲染的方法、装置及电子设备 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1645416A (zh) * | 2005-01-20 | 2005-07-27 | 上海交通大学 | 基于肌肉体积不变性的人肢体三维建模方法 |
CN105303611A (zh) * | 2015-12-08 | 2016-02-03 | 新疆华德软件科技有限公司 | 基于旋转抛物面的虚拟人肢体建模方法 |
CN108648268A (zh) * | 2018-05-10 | 2018-10-12 | 浙江大学 | 一种基于胶囊的人体模型逼近方法 |
CN108846892A (zh) * | 2018-06-05 | 2018-11-20 | 陈宸 | 人体模型的确定方法及装置 |
CN111739159A (zh) * | 2020-06-29 | 2020-10-02 | 上海商汤智能科技有限公司 | 三维模型生成方法、神经网络生成方法及装置 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7203844B2 (ja) * | 2017-07-25 | 2023-01-13 | 達闥機器人股▲分▼有限公司 | トレーニングデータの生成方法、生成装置及びその画像のセマンティックセグメンテーション方法 |
EP3579196A1 (en) * | 2018-06-05 | 2019-12-11 | Cristian Sminchisescu | Human clothing transfer method, system and device |
CN111126242B (zh) * | 2018-10-16 | 2023-03-21 | 腾讯科技(深圳)有限公司 | 肺部图像的语义分割方法、装置、设备及存储介质 |
CN110633628B (zh) * | 2019-08-02 | 2022-05-06 | 杭州电子科技大学 | 基于人工神经网络的rgb图像场景三维模型重建方法 |
- 2020-06-29: CN application CN202010607430.5A filed (CN111739159A, not active, withdrawn)
- 2021-03-23: KR application 1020217042400 filed (KR20220013403A, application discontinued)
- 2021-03-23: EP application 21819707.7 filed (EP3971840A4, withdrawn)
- 2021-03-23: JP application 2021573567 filed (JP2022542758A, pending)
- 2021-03-23: PCT application PCT/CN2021/082485 filed (WO2022001222A1)
- 2021-12-21: US application 17/645,446 filed (US20220114799A1, abandoned)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1645416A (zh) * | 2005-01-20 | 2005-07-27 | 上海交通大学 | 基于肌肉体积不变性的人肢体三维建模方法 |
CN105303611A (zh) * | 2015-12-08 | 2016-02-03 | 新疆华德软件科技有限公司 | 基于旋转抛物面的虚拟人肢体建模方法 |
CN108648268A (zh) * | 2018-05-10 | 2018-10-12 | 浙江大学 | 一种基于胶囊的人体模型逼近方法 |
CN108846892A (zh) * | 2018-06-05 | 2018-11-20 | 陈宸 | 人体模型的确定方法及装置 |
CN111739159A (zh) * | 2020-06-29 | 2020-10-02 | 上海商汤智能科技有限公司 | 三维模型生成方法、神经网络生成方法及装置 |
Non-Patent Citations (2)
Title |
---|
MIN WANG; FENG QIU; WENTAO LIU; CHEN QIAN; XIAOWEI ZHOU; LIZHUANG MA: "EllipBody: A Light-weight and Part-based Representation for Human Pose and Shape Recovery", ARXIV.ORG, 24 March 2020 (2020-03-24), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081628098 * |
See also references of EP3971840A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP3971840A4 (en) | 2023-01-18 |
KR20220013403A (ko) | 2022-02-04 |
CN111739159A (zh) | 2020-10-02 |
JP2022542758A (ja) | 2022-10-07 |
EP3971840A1 (en) | 2022-03-23 |
US20220114799A1 (en) | 2022-04-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022001222A1 (zh) | 三维模型生成方法、神经网络生成方法及装置 | |
AU2020200811B2 (en) | Direct meshing from multiview input using deep learning | |
Li et al. | Monocular real-time volumetric performance capture | |
CN111325851B (zh) | 图像处理方法及装置、电子设备和计算机可读存储介质 | |
US9892506B2 (en) | Systems and methods for shape analysis using landmark-driven quasiconformal mapping | |
CN104637090B (zh) | 一种基于单张图片的室内场景建模方法 | |
WO2021253788A1 (zh) | 一种人体三维模型构建方法及装置 | |
Kanatani et al. | Guide to 3D Vision Computation | |
Chhatkuli et al. | Inextensible non-rigid shape-from-motion by second-order cone programming | |
CN115439607A (zh) | 一种三维重建方法、装置、电子设备及存储介质 | |
CN113936090A (zh) | 三维人体重建的方法、装置、电子设备及存储介质 | |
JP2021026759A (ja) | オブジェクトの3dイメージングを実施するためのシステムおよび方法 | |
CN115830241A (zh) | 一种基于神经网络的真实感三维人脸纹理重建方法 | |
Twarog et al. | Playing with puffball: simple scale-invariant inflation for use in vision and graphics | |
CN112926543A (zh) | 图像生成、三维模型生成方法、装置、电子设备及介质 | |
Kazmi et al. | Efficient sketch‐based creation of detailed character models through data‐driven mesh deformations | |
CN113223137B (zh) | 透视投影人脸点云图的生成方法、装置及电子设备 | |
Golyanik | Robust Methods for Dense Monocular Non-Rigid 3D Reconstruction and Alignment of Point Clouds | |
Eapen et al. | Elementary methods for generating three-dimensional coordinate estimation and image reconstruction from series of two-dimensional images | |
CN116912433B (zh) | 三维模型骨骼绑定方法、装置、设备及存储介质 | |
WO2023233575A1 (ja) | 推定装置、学習装置、推定方法、学習方法及びプログラム | |
US20240161391A1 (en) | Relightable neural radiance field model | |
Wang et al. | Genetic-algorithm-based stereo vision with no block partitioning of input images | |
Kalel | Modelling Project Work: 3D reconstruction using Stereo vision | |
Frigerio et al. | Surface reconstruction using 3D morphological operators for objects acquired with a multi-Kinect system |
Legal Events
- ENP (Entry into the national phase): Ref document number 2021573567; Country of ref document: JP; Kind code of ref document: A
- ENP (Entry into the national phase): Ref document number 20217042400; Country of ref document: KR; Kind code of ref document: A
- ENP (Entry into the national phase): Ref document number 2021819707; Country of ref document: EP; Effective date: 20211215
- 121 (Ep: the EPO has been informed by WIPO that EP was designated in this application): Ref document number 21819707; Country of ref document: EP; Kind code of ref document: A1
- NENP (Non-entry into the national phase): Ref country code: DE