WO2022222091A1 - Method for generating a character bas-relief model based on a single photo - Google Patents

Method for generating a character bas-relief model based on a single photo

Info

Publication number
WO2022222091A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
bas
photo
character
skeleton
Prior art date
Application number
PCT/CN2021/088913
Other languages
English (en)
French (fr)
Inventor
周昆
陈翔
杨振杰
Original Assignee
浙江大学
杭州相芯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学, 杭州相芯科技有限公司
Priority to PCT/CN2021/088913
Publication of WO2022222091A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tessellation

Definitions

  • The invention relates to the field of geometric modeling in computer graphics, and in particular to a method for generating a multi-character bas-relief model based on a single photo.
  • Bas-relief is an art form with a long history, vivid in form and profound in meaning.
  • In the digital age, the generation of 3D bas-relief models, combined with AR/VR and digital manufacturing technology, has unique value in social networking, architecture, artistic creation, digital media and other fields.
  • At present, automatic bas-relief generation methods all start from a 3D shape and flatten the height of the model through various adaptive depth compression methods.
  • Bas-relief generation methods that take photos as input often target certain classes of simple objects, or require a large amount of user interaction, which is time-consuming and labor-intensive.
  • Current methods cannot generate high-quality bas-relief models from a single photo of people, a problem that is especially prominent in multi-person photos such as family photos.
  • In multi-person photos there are often complex and emotionally expressive physical interactions between characters. While existing neural-network-based works can efficiently generate rough single-person models, such methods cannot accurately predict the complex spatial occlusion relationships among multiple people.
  • The object of the present invention is to address the deficiencies of the prior art by providing a method for generating a bas-relief model from a single photo of people, which can generate a multi-person bas-relief model with only a small amount of user interaction while ensuring accurate spatial relationships and high fidelity of geometric detail features.
  • A method for generating a character bas-relief model based on a single photo, comprising the following steps:
  • Step 1: Given a single photo containing one or more people, generate the 3D skeletons of all characters and correct, through user interaction, any erroneous occlusion relationships between bones in those skeletons; then, using the corrected occlusion relationships as constraints, solve for the correct 3D skeleton joint coordinates to generate 3D skeletons with correct occlusion relationships; then, based on these skeletons, fit a parametric human body model to each character as a 3D human guidance model;
  • Step 2: Use a non-rigid planar deformation function to align the contour of the 3D human guidance model with the contour of the person in the photo and optimize that function; then use the optimized function to warp the normal map of the 3D human guidance model in image space to obtain a warped normal map, and solve a least-squares problem based on the warped normal map to obtain the low-frequency base model of the human bas-relief; wherein the contour of the 3D human guidance model is the image-space projection of the common boundary between the two groups of triangular faces whose normals face outward and inward relative to the image viewing direction; the energy equation used to optimize the non-rigid planar deformation function appears as equation image PCTCN2021088913-appb-000001 in the Description, in which:
  • $Z_{rq}$ denotes the correlation coefficient between the contour of the 3D human guidance model and the contour of the person in the photo, subject to the constraint $Z_{rq} \in \{0,1\}$ and to two further constraints given as equation images in the Description; $R_r$ is the coordinate of the r-th point of the contour of the person in the photo, $Q_q$ is the coordinate of the q-th point of the contour of the 3D human guidance model, and $K_1$ and $K_2$ are the numbers of points in the contour of the person in the photo and in the contour of the 3D human guidance model, respectively;
  • f is the non-rigid planar deformation function;
  • λ and ξ are weights, both real numbers;
  • ||*|| is the regularization function;
  • L(f) is the constraint term on the smoothness of the non-rigid planar deformation function f.
  • Step 3: Extract high-frequency detail features from the photo and generate a detail normal map, then synthesize the detail normal map with the low-frequency base model generated in Step 2 to obtain the final human bas-relief model; wherein the high-frequency detail features are the gradient information of each level of the photo's grayscale pyramid, and the detail normal map is the mean of the gradient information of all levels.
  • In Step 1, the 3D skeletons of all characters are generated as follows:
  • the optimization energy is as follows (the equation itself appears only as an image in the Description):
  • the optimization variables are the camera intrinsic parameters K and the extrinsic parameters, i.e. the similarity transformation matrix $T_i$ of each character's 3D skeleton, which comprises a scaling coefficient, a rotation matrix and a translation matrix whose entries are the translation parameters along the x, y and z directions;
  • v is a joint coordinate of the 3D skeleton, taken from the set of 3D skeleton joint coordinates of each character, and N is the number of people in the photo;
  • p is the 2D joint coordinate in the photo corresponding to v, and $\pi_K$ is the projection function based on the camera intrinsic parameters K.
  • In Step 1, using the corrected occlusion relationships as constraints to solve for the correct 3D skeleton joint coordinates and generate 3D skeletons with correct occlusion relationships is specifically:
  • the joint positions of the 3D skeletons are recomputed by optimizing the following energy (given as an equation image in the Description), generating 3D skeletons with correct occlusion relationships:
  • L is the matrix corresponding to the graph Laplacian operator;
  • $z^{(0)}$ and z are the z-coordinate vectors of all joints before and after optimization, respectively;
  • α is the interpolation parameter of the intersection point between the two endpoints of a bone;
  • the subscripts 0 and 1 index the coordinates of the two endpoints of a bone;
  • Occpairs denotes the pairs of bones that intersect in the photo;
  • the subscripts f and b denote the front and back bones of an intersecting pair, respectively;
  • ω is the weight;
  • $d_{gap}$ is the depth gap, used to compensate for bone thickness.
  • A parametric human body model is fitted to each character as the 3D human guidance model; specifically, a fitting energy (given as an equation image in the Description) is used to optimize the pose parameters $\theta_i$ and shape parameters $\beta_i$ of each character.
  • In that energy, v is a joint in the set of 3D skeleton joint coordinates of each character with correct occlusion relationships, and its counterpart is the corresponding joint in the parametric human skeleton template.
  • In Step 2, the contour of the person is extracted from the photo using a neural-network-based multi-scale edge detection method.
  • In Step 2, optimizing the non-rigid planar deformation function further includes: the user specifies points that correctly correspond between the contour of the 3D human guidance model and the contour of the person in the photo, and that represent special poses, as key-point pairs; the correlation coefficient between each specified key-point pair is fixed to 1 and added to the optimization as a hard constraint on the variables.
  • In Step 2, the head, hair, hands and feet of the characters in the low-frequency base model are reconstructed by other methods, specifically:
  • the head is reconstructed from a depth map generated with a 3D facial expression model after facial landmarks are extracted from the photo; for hair, hands and feet, masks are drawn in the photo, the boundary gradient information of the masks is estimated and used as the boundary condition of a Laplace problem, and the resulting height field is used as the low-frequency base model of the hair, hands and feet;
  • this low-frequency base model is fused with the other base models using Poisson editing.
  • In Step 3, the detail normal map and the low-frequency base model generated in Step 2 are synthesized to obtain the final human bas-relief model by solving a least-squares problem (given as an equation image in the Description).
  • The beneficial effects of the present invention are: it proposes a method for generating a character bas-relief model from a single photo, which can generate a multi-person bas-relief model with only a small amount of user interaction while ensuring the accuracy of the spatial relationships between characters and high fidelity of geometric detail features.
  • The bas-relief models generated by the method have realistic 3D visual perception, and the method is suitable for all kinds of single-person and multi-person photos, with high universality, robustness and practicability.
  • Figure 1 is a flow chart of the method for generating a character bas-relief model based on a single photo.
  • Figure 2 is a schematic diagram of image occlusion analysis, in which a is the occlusion relationship in the original image, b is an erroneous occlusion relationship, and c is the correct occlusion relationship.
  • Figure 3 is a schematic diagram of bone intersection analysis, in which subscript i denotes bone $l_i$ and subscript j denotes bone $l_j$; a shows no intersection, b shows $l_i$ lying above $l_j$, and c shows $l_j$ lying above $l_i$.
  • Figure 4 is a schematic diagram of contour extraction in (photo) image space and from the 3D human guidance model, in which:
  • a is the contour probability map of the photo obtained by a neural-network-based multi-scale edge detection method;
  • b is the point set sampled from the contour probability map;
  • c is the uniformly sampled contour point set obtained by the k-means algorithm;
  • d is the contour extracted from the 3D human guidance model.
  • Figure 5 is a schematic diagram of point alignment, in which a is the initial state:
  • small-diameter points mark the original positions (contour points of the 3D guidance model) and large-diameter points mark the target positions (2D contour points);
  • the lines connecting them represent the correlation coefficients;
  • b is the correlation coefficient matrix Z, whose last row and last column are extra markers for the key points;
  • c is the alignment result after applying the optimized non-rigid planar deformation function.
  • Figure 6 is a schematic diagram of aligning the contour of the 3D human guidance model with the 2D contour of the photo in image space, in which:
  • a is the initial state, where small-diameter points are contour points of the 3D guidance model and large-diameter points are 2D contour points;
  • b is the contour alignment result without the key-point constraint;
  • c is the contour alignment result with the key-point constraint;
  • d shows the user selecting key points in the user interface, where the contour point at the elbow joint is selected as a key point.
  • Figure 7 is a schematic diagram of the generation of the bas-relief base shape, in which a is the 3D human guidance model, b is the normal map rendered from the 3D human guidance model, c is the warped normal map, d is the base model reconstructed from the warped normal map, e illustrates the generation of the head, hair, hands and feet, and f is the complete base model.
  • Figure 8 is a schematic diagram of bas-relief synthesis from the base model and image details, in which a is the original single photo, b is the base model, c is the detail normal map, and d is the final bas-relief model.
  • Step 1: Given a single photo containing one or more people, generate 3D skeletons with correct occlusion relationships and the 3D human guidance models.
  • This step is one of the cores of the present invention and is divided into the following sub-steps.
  • p is the 2D joint coordinate in the photo corresponding to v;
  • $\pi_K$ is the projection function based on the pinhole camera parameters K;
  • ||*|| is the regularization function.
  • The first term is the reprojection error constraint, which keeps each joint v of the 3D skeleton consistent in image space with its corresponding 2D joint p under the solved camera projection;
  • the second term is a regularization term, which keeps the spatial transformations of the 3D skeletons consistent in depth translation.
  • Minimizing this energy is a nonlinear optimization.
  • In the occlusion energy, the first term constrains the change in the Laplacian coordinates of the 3D skeleton graph structure;
  • the second term constrains the front-back occlusion relationship between bone pairs that overlap in image space;
  • L is the matrix corresponding to the graph Laplacian operator;
  • $z^{(0)}$ and z are the z-coordinate vectors of all joints before and after optimization, respectively.
  • The intersection is illustrated in Figure 3, where α is the interpolation parameter of the intersection point between the two endpoints of a bone, the subscripts 0 and 1 index the coordinates of the two endpoints of a bone, Occpairs denotes the pairs of bones that intersect in the photo, the subscripts f and b index the front and back bones of an intersecting pair, respectively, and ω is the weight, set to 0.1 in this embodiment.
  • $d_{gap}$ is the depth gap, used to compensate for bone thickness, and is set to 15 in this embodiment.
  • Based on the 3D skeletons with correct occlusion relationships, a parametric human body model is fitted to each character as the 3D human guidance model for the bas-relief generation algorithm.
  • The fitting energy is as follows (given as an equation image in the Description):
  • v is a joint of the occlusion-corrected 3D skeleton, and its counterpart is the corresponding joint in the parametric human skeleton template.
  • After the pose parameters and shape parameters are obtained by optimization, a parametric template (the SMPL model in this embodiment) is used to generate the 3D human model of each character, which serves as the 3D human guidance model in the subsequent steps.
  • Step 2: Align the 3D human guidance model with the contour features of the given photo by non-rigid deformation to generate the low-frequency base model of the human bas-relief.
  • This step is one of the cores of the present invention and is divided into the following sub-steps.
  • $Z_{rq}$ denotes the correlation coefficient between the contour of the 3D human guidance model and the 2D contour of the photo, subject to the constraint $Z_{rq} \in \{0,1\}$ and to two further constraints given as equation images in the Description; Z is the matrix composed of the $Z_{rq}$; $R_r$ is the coordinate of the r-th point of the contour of the person in the photo, $Q_q$ is the coordinate of the q-th point of the contour of the 3D human guidance model, and $K_1$ and $K_2$ are the numbers of points in the contour of the person in the photo and in the contour of the 3D human guidance model, respectively; preferably, $K_1 = 1.2K_2$.
  • f is the non-rigid planar deformation function.
  • The first term measures the approximation fidelity between the point sets;
  • the second term constrains the smoothness of the non-rigid planar deformation function f;
  • in the actual optimization it is solved in its thin-plate spline formulation;
  • the third term penalizes the number of outliers;
  • ξ is the weight, set to 0.01 in this embodiment.
  • The key-point pairs specified by the user are added to the optimization as hard constraints on the variables. Specifically, the correlation coefficient between each specified key-point pair is fixed to 1, and every other correlation coefficient involving one point of the pair is fixed to 0. That is, in the matrix Z, the Z value of the key-point pair is fixed to 1 and the other Z values in the row and column containing the key point are set to 0, as shown in Figure 5b.
  • Specifically, as shown in Figure 7, a normal map that excludes the head, hair, hands and feet is first generated from the 3D human guidance model; the non-rigid planar deformation function f is used to warp this normal map in image space; a least-squares problem based on the warped normal map is then solved and combined with the separately generated head, hair, hands and feet to reconstruct the base model of the human bas-relief.
  • The head is a depth map generated with a 3D facial expression model after facial landmarks are extracted from the photo, and this depth map is merged into the base model of the body region; the hair, hands and feet are generated from masks drawn in the image.
  • Step 3: As shown in Figure 8, high-frequency detail features are extracted from the image and synthesized with the low-frequency base model generated in Step 2 to obtain the final human bas-relief model.
  • The low-frequency base model of the bas-relief generated in Step 2 and the high-frequency detail normal map are synthesized by solving a least-squares problem (given as an equation image in the Description).
  • In the whole method, a multi-person bas-relief model can be generated with only a small amount of user interaction (correcting erroneous occlusion relationships, marking key points that represent special poses, and masking hair, hands and feet), while ensuring the accuracy of the spatial relationships between characters and high fidelity of geometric detail features.
  • Figure 9 compares the bas-relief models generated from 10 photos by the method of the present invention and by an existing method (S. Tang, F. Tan, K. Cheng, Z. Li, S. Zhu, and P. Tan, "A neural network for detailed human depth estimation from a single image," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 7750–7759). As the figure shows, the bas-relief models generated by the method of the present invention have realistic 3D visual perception, and the method is suitable for all kinds of single-person and multi-person photos, with high universality, robustness and practicability.

Landscapes

  • Physics & Mathematics
  • Engineering & Computer Science
  • Computer Graphics
  • Geometry
  • Software Systems
  • General Physics & Mathematics
  • Theoretical Computer Science
  • Processing Or Creating Images

Abstract

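The invention discloses a method for generating a character bas-relief model based on a single photo. The method constructs a 3D human skeleton from the input photo, resolves complex human self-occlusion and mutual occlusion relationships at the 3D skeleton level, and builds a 3D human guidance model. The invention also proposes a deformation algorithm based on contour matching, so that the generated low-frequency base shape model is accurately aligned with image space; on this basis, the low-frequency base shape model is fused with the high-frequency detail features of the image to obtain the bas-relief model. The interaction is simple, direct and efficient, allowing ordinary users to quickly build human bas-relief models from photos they have taken or found online. The invention is applicable to a wide variety of single-person and multi-person photos and has high universality and stability. Theoretical analysis and experimental results show that the bas-relief models generated by the invention have realistic 3D visual perception and are highly practical, with good application prospects.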

Description

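Method for generating a character bas-relief model based on a single photo
Technical Field
The present invention relates to the field of geometric modeling in computer graphics, and in particular to a method for generating a multi-character bas-relief model based on a single photo.
Background
Bas-relief is an art form with a long history, vivid in form and profound in meaning. In the digital age, the generation of 3D bas-relief models, combined with AR/VR and digital manufacturing technology, has unique value in social networking, architecture, artistic creation, digital media and other fields.
At present, automatic bas-relief generation methods all start from a 3D shape and flatten the height of the model through various adaptive depth compression methods. Methods that take photos as input often target certain classes of simple objects, or require a large amount of user interaction, which is time-consuming and labor-intensive. Current methods cannot generate high-quality bas-relief models from a single photo of people, and this problem is especially prominent in multi-person photos such as family photos. In multi-person photos there are often complex and emotionally expressive physical interactions between characters. Although existing neural-network-based works can efficiently generate rough single-person models, such methods cannot accurately predict the complex spatial occlusion relationships among multiple people.
Summary of the Invention
The object of the present invention is to address the deficiencies of the prior art by providing a method for generating a bas-relief model from a single photo of people, which can generate a multi-person bas-relief model with only a small amount of user interaction while ensuring accurate spatial relationships and high fidelity of geometric detail features.
The object of the present invention is achieved through the following technical solution:
A method for generating a character bas-relief model based on a single photo, comprising the following steps:
Step 1: Given a single photo containing one or more people, generate the 3D skeletons of all characters and correct, through user interaction, any erroneous occlusion relationships between bones in those skeletons; then, using the corrected occlusion relationships as constraints, solve for the correct 3D skeleton joint coordinates to generate 3D skeletons with correct occlusion relationships; then, based on these skeletons, fit a parametric human body model to each character as a 3D human guidance model;
Step 2: Use a non-rigid planar deformation function to align the contour of the 3D human guidance model with the contour of the person in the photo and optimize that function; then use the optimized function to warp the normal map of the 3D human guidance model in image space to obtain a warped normal map, and solve a least-squares problem based on the warped normal map to obtain the low-frequency base model of the human bas-relief; wherein the contour of the 3D human guidance model is the image-space projection of the common boundary between the two groups of triangular faces whose normals face outward and inward relative to the image viewing direction; the energy equation used to optimize the non-rigid planar deformation function is:
Figure PCTCN2021088913-appb-000001
where $Z_{rq}$ denotes the correlation coefficient between the contour of the 3D human guidance model and the contour of the person in the photo, subject to the constraints $Z_{rq} \in \{0,1\}$,
Figure PCTCN2021088913-appb-000002
and
Figure PCTCN2021088913-appb-000003
$R_r$ is the coordinate of the r-th point of the contour of the person in the photo, $Q_q$ is the coordinate of the q-th point of the contour of the 3D human guidance model, and $K_1$ and $K_2$ are the numbers of points in the contour of the person in the photo and in the contour of the 3D human guidance model, respectively. f is the non-rigid planar deformation function; λ and ξ are weights, both real numbers; ||*|| is the regularization function; and L(f) is the constraint term on the smoothness of the non-rigid planar deformation function f.
Step 3: Extract high-frequency detail features from the photo and generate a detail normal map, then synthesize the detail normal map with the low-frequency base model generated in Step 2 to obtain the final human bas-relief model; wherein the high-frequency detail features are the gradient information of each level of the photo's grayscale pyramid, and the detail normal map is the mean of the gradient information of all levels.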
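Further, in Step 1, the 3D skeletons of all characters are generated as follows:
A neural network is used to obtain the 2D pose of each character from the photo;
A neural network is used to predict a 3D skeleton from the 2D pose of each character;
Based on the 3D skeletons of all characters, the intrinsic and extrinsic camera parameters of the photo are optimized so that the 3D skeletons are aligned with the 2D joints of the photo in image space, yielding the 3D skeletons of all characters;
The optimization energy is as follows:
Figure PCTCN2021088913-appb-000004
The optimization variables are the camera intrinsic parameters K and the extrinsic parameters, i.e. the similarity transformation matrix $T_i$ of each character's 3D skeleton, comprising a scaling coefficient, a rotation matrix and a translation matrix
Figure PCTCN2021088913-appb-000005
whose entries are the translation parameters along the x, y and z directions, respectively; v is a joint coordinate of the 3D skeleton,
Figure PCTCN2021088913-appb-000006
is the set of 3D skeleton joint coordinates of each character, and N is the number of characters in the photo; p is the 2D joint coordinate in the photo corresponding to v, $\pi_K$ is the projection function based on the camera intrinsic parameters K, and ||*|| is the regularization function.
Figure PCTCN2021088913-appb-000007
denotes the mean of the z-direction translation parameters of the N characters in the given photo.
Further, in Step 1, using the corrected occlusion relationships as constraints to solve for the correct 3D skeleton joint coordinates and generate 3D skeletons with correct occlusion relationships is specifically:
The joint positions of the 3D skeletons are recomputed by optimizing the following energy, generating 3D skeletons with correct occlusion relationships:
Figure PCTCN2021088913-appb-000008
where L is the matrix corresponding to the graph Laplacian operator, $z^{(0)}$ and z are the z-coordinate vectors of all joints before and after optimization, respectively,
Figure PCTCN2021088913-appb-000009
is the z coordinate of the intersection point on the j-th bone, α is the interpolation parameter of the intersection point between the two endpoints of a bone, the subscripts 0 and 1 index the coordinates of the two endpoints of a bone, Occpairs denotes the pairs of bones that intersect in the photo, the subscripts f and b denote the front and back bones of an intersecting pair, respectively, ω is the weight, and $d_{gap}$ is the depth gap, used to compensate for bone thickness.
Further, in Step 1, fitting a parametric human body model to each character, based on the 3D skeletons with correct occlusion relationships, as the 3D human guidance model is specifically: a fitting energy is used to optimize the pose parameters $\theta_i$ and shape parameters $\beta_i$ of each character, as follows:
Figure PCTCN2021088913-appb-000010
where v is a joint on
Figure PCTCN2021088913-appb-000011
,
Figure PCTCN2021088913-appb-000012
is the set of 3D skeleton joint coordinates of each character with correct occlusion relationships, and
Figure PCTCN2021088913-appb-000013
is the joint corresponding to v in the parametric human skeleton template.
After the pose parameters
Figure PCTCN2021088913-appb-000014
and the shape parameters
Figure PCTCN2021088913-appb-000015
are obtained by optimization, the parametric template is used to generate the 3D human model of each character
Figure PCTCN2021088913-appb-000016
which serves as the 3D human guidance model.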
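Further, in Step 2, the contour of the person is extracted from the photo using a neural-network-based multi-scale edge detection method.
Further, in Step 2, optimizing the non-rigid planar deformation function further includes: the user specifies points that correctly correspond between the contour of the 3D human guidance model and the contour of the person in the photo, and that represent special poses, as key-point pairs; the correlation coefficient between each specified key-point pair is fixed to 1 and added to the optimization as a hard constraint on the variables.
Further, in Step 2, the head, hair, hands and feet of the characters in the low-frequency base model are reconstructed by other methods, specifically:
The low-frequency base model of the head is reconstructed from a depth map generated with a 3D facial expression model after facial landmarks are extracted from the photo;
Masks of the hair, hand and foot regions are drawn in the photo, the boundary gradient information of the masks is estimated and used as the boundary condition of a Laplace problem, and the resulting height field is used as the low-frequency base model of the hair, hands and feet and fused with the other base models using Poisson editing.
Further, in Step 3, synthesizing the detail normal map with the low-frequency base model generated in Step 2 to obtain the final human bas-relief model is specifically:
Figure PCTCN2021088913-appb-000017
where
Figure PCTCN2021088913-appb-000018
is the synthesized bas-relief model,
Figure PCTCN2021088913-appb-000019
is the height at pixel (u,v) of the low-frequency base model
Figure PCTCN2021088913-appb-000020
,
Figure PCTCN2021088913-appb-000021
is the height at pixel (u,v) of the synthesized bas-relief model
Figure PCTCN2021088913-appb-000022
,
Figure PCTCN2021088913-appb-000023
is the normal vector at pixel (u,v) of the detail normal map
Figure PCTCN2021088913-appb-000024
, and $T_U$ and $T_V$ are the surface tangent vectors, in the U and V directions, of the synthesized bas-relief model
Figure PCTCN2021088913-appb-000025
; δ is a weight and a real number. The height field obtained from the optimization is the final bas-relief model.
The beneficial effects of the present invention are: it proposes a method for generating a character bas-relief model from a single photo, which can generate a multi-person bas-relief model with only a small amount of user interaction while ensuring the accuracy of the spatial relationships between characters and high fidelity of geometric detail features. The bas-relief models generated by the method have realistic 3D visual perception, and the method is suitable for all kinds of single-person and multi-person photos, with high universality, robustness and practicability.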
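Brief Description of the Drawings
Figure 1 is a flow chart of the method for generating a character bas-relief model based on a single photo.
Figure 2 is a schematic diagram of image occlusion analysis, in which a is the occlusion relationship in the original image, b is an erroneous occlusion relationship, and c is the correct occlusion relationship.
Figure 3 is a schematic diagram of bone intersection analysis, in which subscript i denotes bone $l_i$ and subscript j denotes bone $l_j$; a shows no intersection, b shows $l_i$ lying above $l_j$, and c shows $l_j$ lying above $l_i$.
Figure 4 is a schematic diagram of contour extraction in (photo) image space and from the 3D human guidance model, in which a is the contour probability map of the photo obtained by a neural-network-based multi-scale edge detection method, b is the point set sampled from the contour probability map, c is the uniformly sampled contour point set obtained by the k-means algorithm, and d is the contour extracted from the 3D human guidance model.
Figure 5 is a schematic diagram of point alignment, in which a is the initial state, where small-diameter points mark the original positions (contour points of the 3D guidance model), large-diameter points mark the target positions (2D contour points), and the lines connecting them represent the correlation coefficients; b is the correlation coefficient matrix Z, whose last row and last column are extra markers for the key points; c is the alignment result after applying the optimized non-rigid planar deformation function.
Figure 6 is a schematic diagram of aligning the contour of the 3D human guidance model with the 2D contour of the photo in image space. a is the initial state, where small-diameter points are contour points of the 3D guidance model and large-diameter points are 2D contour points; b is the contour alignment result without the key-point constraint; c is the contour alignment result with the key-point constraint; d shows the user selecting key points in the user interface, where the contour point at the elbow joint is selected as a key point.
Figure 7 is a schematic diagram of the generation of the bas-relief base shape, in which a is the 3D human guidance model, b is the normal map rendered from the 3D human guidance model, c is the warped normal map, d is the base model reconstructed from the warped normal map, e illustrates the generation of the head, hair, hands and feet, and f is the complete base model.
Figure 8 is a schematic diagram of bas-relief synthesis from the base model and image details, in which a is the original single photo, b is the base model, c is the detail normal map, and d is the final bas-relief model.
Figure 9 compares the bas-relief models generated from 10 photos by the method of the present invention and by an existing method; the first row shows the original single photos, b shows the bas-relief models generated by the method of the present invention, and c shows the bas-relief models generated by the existing method.
Detailed Description
The present invention is described in detail below with reference to the drawings.
The flow of the method of the present invention for generating a character bas-relief model from a single photo is shown in Figure 1 and specifically includes the following steps: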
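Step 1: Given a single photo containing one or more people, generate 3D skeletons with correct occlusion relationships and the 3D human guidance models.
This step is one of the cores of the present invention and is divided into the following sub-steps.
(1.1) A neural network is used to obtain the 2D pose of each character from the photo. In this embodiment, OpenPose is used for 2D pose estimation;
(1.2) A neural network is used to predict the 3D skeleton of each character from the 2D poses obtained in the previous sub-step, giving the joint coordinate sets
Figure PCTCN2021088913-appb-000026
corresponding to the N characters in the given photo.
(1.3) Based on the 3D skeletons of all characters, the intrinsic and extrinsic camera parameters of the photo are optimized so that the 3D skeletons are aligned with the 2D joints of the photo in image space. The optimization energy is as follows:
Figure PCTCN2021088913-appb-000027
The optimization variables are the pinhole camera parameters K and the extrinsic parameters, i.e. the similarity transformation matrix $T_i$ of each character's 3D skeleton (including scaling, rotation and translation), expressed as $T_i = [s_i R_i \mid t_i]$, where $R_i$ is the rotation matrix,
Figure PCTCN2021088913-appb-000028
holds the translation parameters along the x, y and z directions, and $s_i$ is the scaling scalar. p is the 2D joint coordinate in the photo corresponding to v, $\pi_K$ is the projection function based on the pinhole camera parameters K, and ||*|| is the regularization function. The first term is the reprojection error constraint, which keeps each joint v of the 3D skeleton consistent in image space with its corresponding 2D joint p under the solved camera projection; the second term is a regularization term, which keeps the spatial transformations of the 3D skeletons consistent in depth translation.
Figure PCTCN2021088913-appb-000029
denotes the mean of the translation parameters of the N characters in the given photo along the z direction (the direction perpendicular to the image plane). Minimizing this energy is a nonlinear optimization. In this embodiment, the focal length in the camera parameters K is initialized to 500 and $t = [0,0,400]^T$; $s_i$ is constrained to be no less than 0.3 and is set to 1 in this embodiment.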
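The alignment energy of sub-step (1.3) can be sketched roughly as follows. Because the exact formula is available only as an equation image, the squared-error form, the `reg_weight` parameter and every function name here are illustrative assumptions, not the patent's implementation; only the initial values (focal length 500, t = [0, 0, 400], s = 1) come from the text.

```python
def project(point, focal, cx=0.0, cy=0.0):
    """Pinhole projection of a camera-space 3D point to 2D image coordinates."""
    x, y, z = point
    return (focal * x / z + cx, focal * y / z + cy)


def transform(joint, scale, rotation, translation):
    """Apply a similarity transform T = [s R | t] to one 3D joint."""
    rotated = [sum(rotation[r][c] * joint[c] for c in range(3)) for r in range(3)]
    return [scale * rotated[r] + translation[r] for r in range(3)]


def alignment_energy(skeletons, joints_2d, transforms, focal, reg_weight=1.0):
    """Assumed form: squared reprojection error plus a penalty on each
    character's z translation drifting away from the mean z translation."""
    mean_tz = sum(t["t"][2] for t in transforms) / len(transforms)
    energy = 0.0
    for joints, targets, t in zip(skeletons, joints_2d, transforms):
        for v, p in zip(joints, targets):
            u = project(transform(v, t["s"], t["R"], t["t"]), focal)
            energy += (u[0] - p[0]) ** 2 + (u[1] - p[1]) ** 2
        energy += reg_weight * (t["t"][2] - mean_tz) ** 2
    return energy


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Initial values quoted in the embodiment: focal length 500, t = [0, 0, 400], s = 1.
init = {"s": 1.0, "R": IDENTITY, "t": [0.0, 0.0, 400.0]}
skeleton = [[0.0, 0.0, 0.0], [40.0, 0.0, 0.0]]  # toy two-joint skeleton
targets = [project(transform(v, init["s"], init["R"], init["t"]), 500.0) for v in skeleton]
print(alignment_energy([skeleton], [targets], [init], focal=500.0))
```

In a real pipeline this energy would be handed to a nonlinear least-squares solver over K and all $T_i$; the sketch only evaluates it.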
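(1.4) Based on the 3D skeleton model, erroneous occlusion relationships between bones are corrected with a small amount of user interaction. Because each bone is a rigid body, two bones that intersect in the image have a definite front-back occlusion relationship. When the 3D skeleton obtained in the previous sub-step contains an erroneous bone occlusion relationship, the user can simply swap the front-back order of the two bones, and the system records the relationship of such bone pairs, as shown in Figure 2.
(1.5) Based on the occlusion constraints specified by the user, the correct 3D skeleton joint coordinates are solved. The system optimizes the following energy to recompute the joint positions of the 3D skeleton, obtaining for each character the set of 3D skeleton joint coordinates with correct occlusion relationships
Figure PCTCN2021088913-appb-000030
Figure PCTCN2021088913-appb-000031
where the first term constrains the change in the Laplacian coordinates of the 3D skeleton graph structure and the second term constrains the front-back occlusion relationship between bone pairs that overlap in image space; L is the matrix corresponding to the graph Laplacian operator, and $z^{(0)}$ and z are the z-coordinate vectors of all joints before and after optimization, respectively,
Figure PCTCN2021088913-appb-000032
is the z coordinate of the intersection point on the j-th bone (the intersection is illustrated in Figure 3), α is the interpolation parameter of the intersection point between the two endpoints of a bone, the subscripts 0 and 1 index the coordinates of the two endpoints of a bone, Occpairs denotes the pairs of bones that intersect in the photo, the subscripts f and b index the front and back bones of an intersecting pair, respectively, and ω is the weight, set to 0.1 in this embodiment. $d_{gap}$ is the depth gap, used to compensate for bone thickness, and is set to 15 in this embodiment.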
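The ingredients of the occlusion energy in sub-step (1.5) can be illustrated with a short sketch: the interpolation parameter α at a 2D bone crossing, the interpolated depth, and a penalty that vanishes once the front bone is at least $d_{gap}$ closer to the camera. The hinge-squared penalty, the convention that smaller z means closer to the camera, and all function names are assumptions, since the actual formula appears only as an equation image; ω = 0.1 and $d_{gap}$ = 15 are the embodiment's values.

```python
def intersect_2d(p0, p1, q0, q1):
    """Return (alpha_p, alpha_q), the interpolation parameters at which
    segments p0-p1 and q0-q1 cross in the image, or None if they do not."""
    rx, ry = p1[0] - p0[0], p1[1] - p0[1]
    sx, sy = q1[0] - q0[0], q1[1] - q0[1]
    denom = rx * sy - ry * sx
    if denom == 0:
        return None  # parallel bones have no single crossing point
    dx, dy = q0[0] - p0[0], q0[1] - p0[1]
    alpha_p = (dx * sy - dy * sx) / denom
    alpha_q = (dx * ry - dy * rx) / denom
    if 0.0 <= alpha_p <= 1.0 and 0.0 <= alpha_q <= 1.0:
        return alpha_p, alpha_q
    return None


def depth_at(z0, z1, alpha):
    """Depth of the crossing point, interpolated between the bone endpoints."""
    return (1.0 - alpha) * z0 + alpha * z1


def occlusion_penalty(z_front, z_back, d_gap=15.0):
    """Assumed hinge penalty: zero once the front bone is d_gap closer."""
    return max(0.0, z_front + d_gap - z_back) ** 2


def laplacian_coords(z, edges):
    """Graph-Laplacian coordinates of the joint depths (degree*z_i - sum of neighbors)."""
    out = [0.0] * len(z)
    for i, j in edges:
        out[i] += z[i] - z[j]
        out[j] += z[j] - z[i]
    return out


def depth_energy(z, z0, edges, occ_pairs, omega=0.1, d_gap=15.0):
    """Laplacian-preservation term plus weighted occlusion penalties."""
    smooth = sum((a - b) ** 2
                 for a, b in zip(laplacian_coords(z, edges), laplacian_coords(z0, edges)))
    occ = 0.0
    for (f0, f1, alpha_f), (b0, b1, alpha_b) in occ_pairs:
        occ += occlusion_penalty(depth_at(z[f0], z[f1], alpha_f),
                                 depth_at(z[b0], z[b1], alpha_b), d_gap)
    return smooth + omega * occ


# Two bones (joints 0-1 and 2-3) that cross at their midpoints in the image.
edges = [(0, 1), (2, 3)]
pairs = [((0, 1, 0.5), (2, 3, 0.5))]  # (front bone, back bone) as chosen by the user
z_initial = [100.0, 120.0, 110.0, 110.0]
z_fixed = [100.0, 120.0, 130.0, 130.0]  # back bone pushed 20 units deeper
print(depth_energy(z_initial, z_initial, edges, pairs))
print(depth_energy(z_fixed, z_initial, edges, pairs))
```

Shifting a whole bone in depth leaves its Laplacian coordinates unchanged in this toy graph, which is why the second configuration removes the occlusion violation at no smoothness cost.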
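(1.6) Based on the 3D skeletons with correct occlusion relationships, a parametric human body model is fitted to each character as the 3D human guidance model for the bas-relief generation algorithm. The fitting energy is as follows:
Figure PCTCN2021088913-appb-000033
where v is a joint on
Figure PCTCN2021088913-appb-000034
and
Figure PCTCN2021088913-appb-000035
is the joint corresponding to v in the parametric human skeleton template. After the pose parameters
Figure PCTCN2021088913-appb-000036
and the shape parameters
Figure PCTCN2021088913-appb-000037
are obtained by optimization, a parametric template (the SMPL model in this embodiment) is used to generate the 3D human model of each character
Figure PCTCN2021088913-appb-000038
which serves as the 3D human guidance model in the subsequent steps.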
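Step 2: Align the 3D human guidance model with the contour features of the given photo by non-rigid deformation to generate the low-frequency base model of the human bas-relief.
This step is one of the cores of the present invention and is divided into the following sub-steps.
(2.1) The contour, i.e. the 3D contour, is extracted from the 3D human guidance model based on the normals of its triangular faces and projected into image space using the camera parameters. Specifically, all triangular faces of the 3D human guidance model are divided into two groups according to whether their normals face outward or inward relative to the image viewing direction; the common boundary of the two groups is taken as the 3D contour and projected into image space.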
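A minimal sketch of the contour extraction in sub-step (2.1) follows. It assumes the mesh is already in camera space and that the camera looks along +z, so a face counts as front-facing when the z component of its normal is negative; that simplification, the toy tetrahedron and the function names are illustrative assumptions.

```python
def face_normal_z(vertices, face):
    """z component of the (unnormalized) face normal, from the 2D cross product."""
    a, b, c = (vertices[i] for i in face)
    ux, uy = b[0] - a[0], b[1] - a[1]
    vx, vy = c[0] - a[0], c[1] - a[1]
    return ux * vy - uy * vx


def silhouette_edges(vertices, faces):
    """Edges shared by one front-facing and one back-facing triangle."""
    facing = {}
    for face in faces:
        front = face_normal_z(vertices, face) < 0  # assumed: camera looks along +z
        for k in range(3):
            edge = tuple(sorted((face[k], face[(k + 1) % 3])))
            facing.setdefault(edge, set()).add(front)
    return sorted(edge for edge, sides in facing.items() if sides == {True, False})


# Toy tetrahedron with outward-oriented faces; only the base faces the camera.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.25, 0.25, 1.0)]
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
print(silhouette_edges(vertices, faces))
```

The returned vertex-index pairs would then be projected with the camera parameters to obtain the contour in image space.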
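(2.2) A neural network is used to extract the key contour information, i.e. the 2D contour, from the photo. Specifically, a neural-network-based multi-scale edge detection method first produces a contour probability map of the photo, and the Fisher-Yates shuffle and the k-means algorithm are then used to obtain a uniformly sampled point set of the contour, as shown in Figure 4.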
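The sampling stage of sub-step (2.2) might look like the sketch below. The edge-detection network is out of scope, so the probability map is taken as given; thresholding it at 0.5, seeding k-means from a Fisher-Yates shuffle, and the function names are assumptions about how the two named algorithms are combined.

```python
import random


def fisher_yates_shuffle(items, rng):
    """In-place Fisher-Yates shuffle using the supplied random generator."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        items[i], items[j] = items[j], items[i]
    return items


def kmeans(points, k, rng, iterations=20):
    """Plain Lloyd iterations; initial centers are k shuffled input points."""
    centers = fisher_yates_shuffle(list(points), rng)[:k]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                          + (p[1] - centers[c][1]) ** 2)
            clusters[nearest].append(p)
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centers[c]  # keep the old center if a cluster empties
            for c, cl in enumerate(clusters)
        ]
    return centers


def sample_contour(prob_map, k, threshold=0.5, seed=0):
    """Pick contour pixels above the threshold and return k cluster centers
    as roughly uniform samples. Assumes at least k candidate pixels exist."""
    rng = random.Random(seed)
    candidates = [(x, y) for y, row in enumerate(prob_map)
                  for x, p in enumerate(row) if p >= threshold]
    return kmeans(candidates, k, rng)


# Toy probability map: one horizontal contour line in row 2.
prob_map = [[1.0 if y == 2 else 0.0 for _ in range(8)] for y in range(5)]
samples = sample_contour(prob_map, k=4)
print(samples)
```

Using cluster centers rather than raw pixels spreads the samples along the contour even where the probability map is thick or noisy.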
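(2.3) With a small amount of user interaction, a sparse set of points that correctly correspond between the 3D contour and the 2D contour, and that represent special poses, is specified as key-point pairs, which serve as constraints for the subsequent point matching algorithm.
(2.4) Based on a non-rigid point set matching algorithm, the contour of the 3D human guidance model is aligned with the 2D contour of the person in the photo in image space, as shown in Figures 5 and 6. The optimization minimizes the following energy:
Figure PCTCN2021088913-appb-000039
$Z_{rq}$ denotes the correlation coefficient between the contour of the 3D human guidance model and the 2D contour of the photo, subject to the constraints $Z_{rq} \in \{0,1\}$,
Figure PCTCN2021088913-appb-000040
and
Figure PCTCN2021088913-appb-000041
Z is the matrix composed of the $Z_{rq}$, $R_r$ is the coordinate of the r-th point of the contour of the person in the photo, $Q_q$ is the coordinate of the q-th point of the contour of the 3D human guidance model, and $K_1$ and $K_2$ are the numbers of points in the contour of the person in the photo and in the contour of the 3D human guidance model, respectively; preferably, $K_1 = 1.2K_2$. f is the non-rigid planar deformation function. The first term measures the approximation fidelity between the point sets; the second term constrains the smoothness of the non-rigid planar deformation function f and, in the actual optimization, is solved in its thin-plate spline formulation; the third term penalizes the number of outliers, with weight ξ set to 0.01 in this embodiment. The key-point pairs specified by the user in sub-step (2.3) are added to the optimization as hard constraints on the variables. Specifically, the correlation coefficient between each specified key-point pair is fixed to 1, and every other correlation coefficient involving one point of the pair is fixed to 0. That is, in the matrix Z, the Z value of the key-point pair is fixed to 1 and the other Z values in the row and column containing the key point are set to 0, as shown in Figure 5b.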
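The key-point hard constraint described in sub-step (2.4) amounts to a small operation on the matrix Z, sketched below. The `matching_energy` helper uses an assumed robust-point-matching form (weighted squared distances minus ξ times the number of matches) and omits the smoothness term on f, because the actual energy is available only as an equation image; all names are illustrative.

```python
def apply_keypoint_constraints(Z, keypoint_pairs):
    """Pin each user-specified pair (r, q): Z[r][q] = 1 and every other
    entry in row r and column q = 0. Z is modified in place and returned."""
    for r, q in keypoint_pairs:
        for col in range(len(Z[r])):
            Z[r][col] = 0.0
        for row in range(len(Z)):
            Z[row][q] = 0.0
        Z[r][q] = 1.0
    return Z


def matching_energy(Z, photo_pts, model_pts, f, xi=0.01):
    """Assumed data term plus outlier term; the smoothness term on f is omitted."""
    fidelity = 0.0
    matched = 0.0
    for r, R in enumerate(photo_pts):
        for q, Q in enumerate(model_pts):
            fx, fy = f(Q)
            fidelity += Z[r][q] * ((R[0] - fx) ** 2 + (R[1] - fy) ** 2)
            matched += Z[r][q]
    return fidelity - xi * matched


# Soft 3x3 correspondence matrix before the user pins photo point 1 to model point 2.
Z = [[0.5, 0.5, 0.5] for _ in range(3)]
apply_keypoint_constraints(Z, [(1, 2)])
print(Z)

# With a perfect translation f and one-to-one matches, only the outlier term remains.
model_pts = [(0.0, 0.0), (1.0, 0.0)]
photo_pts = [(1.0, 0.0), (2.0, 0.0)]
shift = lambda Q: (Q[0] + 1.0, Q[1])
print(matching_energy([[1.0, 0.0], [0.0, 1.0]], photo_pts, model_pts, shift))
```

In an alternating solver, the pinned entries would simply be skipped whenever Z is re-estimated, so the constraint stays exact throughout the optimization.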
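(2.5) The non-rigid planar deformation function f obtained from the point matching is used to warp the normal map of the 3D human guidance model in image space, and a least-squares problem based on the warped normal map is solved to reconstruct the base model of the human bas-relief
Figure PCTCN2021088913-appb-000042
Specifically, as shown in Figure 7, a normal map that excludes the head, hair, hands and feet is first generated from the 3D human guidance model; the non-rigid planar deformation function f is used to warp this normal map in image space; a least-squares problem based on the warped normal map is then solved and combined with the separately generated head, hair, hands and feet to reconstruct the base model of the human bas-relief
Figure PCTCN2021088913-appb-000043
The head is a depth map generated with a 3D facial expression model after facial landmarks are extracted from the photo, and this depth map is merged into the base model of the body region. The hair, hands and feet are generated as follows: masks are drawn in the image to cover these regions and their boundaries are smoothed; the boundary gradient information of the masks is then estimated and used as the boundary condition of a Laplace problem, giving an approximate height field; finally, Poisson editing is used to fuse it with the other base models to obtain the final base model
Figure PCTCN2021088913-appb-000044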
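One common way to turn a normal map into a height field by least squares is sketched here. It is not claimed to be the patent's formulation: it assumes each unit normal is proportional to (-dh/dx, -dh/dy, 1), fits forward differences of the heights to the implied gradients, and solves the resulting normal equations with Gauss-Seidel sweeps. Function names, the sweep count and the pinned corner are illustrative choices.

```python
import math


def normals_to_gradients(normals):
    """Convert unit normals (nx, ny, nz) to height gradients (dh/dx, dh/dy)."""
    p = [[-n[0] / n[2] for n in row] for row in normals]
    q = [[-n[1] / n[2] for n in row] for row in normals]
    return p, q


def integrate_heights(p, q, sweeps=2000):
    """Gauss-Seidel solve of
    min sum (h[y][x+1]-h[y][x]-p[y][x])^2 + (h[y+1][x]-h[y][x]-q[y][x])^2."""
    rows, cols = len(p), len(p[0])
    h = [[0.0] * cols for _ in range(rows)]
    for _ in range(sweeps):
        for y in range(rows):
            for x in range(cols):
                if x == 0 and y == 0:
                    continue  # pin one height to remove the free constant offset
                estimates = []
                if x + 1 < cols:
                    estimates.append(h[y][x + 1] - p[y][x])
                if x > 0:
                    estimates.append(h[y][x - 1] + p[y][x - 1])
                if y + 1 < rows:
                    estimates.append(h[y + 1][x] - q[y][x])
                if y > 0:
                    estimates.append(h[y - 1][x] + q[y - 1][x])
                # Each neighbor votes for a height; the least-squares optimum is their mean.
                h[y][x] = sum(estimates) / len(estimates)
    return h


# Toy normal map of the plane h = 0.5*x - 0.25*y on a 4x4 grid.
norm = math.sqrt(0.5 ** 2 + 0.25 ** 2 + 1.0)
plane_normals = [[(-0.5 / norm, 0.25 / norm, 1.0 / norm)] * 4 for _ in range(4)]
p, q = normals_to_gradients(plane_normals)
heights = integrate_heights(p, q)
print(round(heights[3][3], 6))
```

For full-resolution images a sparse linear solver would replace the Gauss-Seidel loop; the sketch favors readability over speed.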
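Step 3: As shown in Figure 8, high-frequency detail features are extracted from the image and synthesized with the low-frequency base model generated in Step 2 to obtain the final human bas-relief model. First, the image is converted into a grayscale pyramid, and the gradient information of each level k is extracted as the detail normal
Figure PCTCN2021088913-appb-000045
where (u,v) is a pixel,
Figure PCTCN2021088913-appb-000046
denotes the gradient information at pixel (u,v) in level k of the grayscale pyramid, and normalize is the normalization function. Next, the detail normals of all levels are combined to obtain the overall detail normal map
Figure PCTCN2021088913-appb-000047
Finally, the low-frequency base model of the bas-relief generated in Step 2 is synthesized with the high-frequency detail normal map by solving the following least-squares problem:
Figure PCTCN2021088913-appb-000048
where
Figure PCTCN2021088913-appb-000049
is the synthesized bas-relief model,
Figure PCTCN2021088913-appb-000050
is the height at pixel (u,v) of the low-frequency base model
Figure PCTCN2021088913-appb-000051
,
Figure PCTCN2021088913-appb-000052
is the height at pixel (u,v) of the synthesized bas-relief model
Figure PCTCN2021088913-appb-000053
,
Figure PCTCN2021088913-appb-000054
is the normal vector at pixel (u,v) of the detail normal map
Figure PCTCN2021088913-appb-000055
, and $T_U$ and $T_V$ are the surface tangent vectors, in the U and V directions, of the synthesized bas-relief model
Figure PCTCN2021088913-appb-000056
. The height field obtained from this optimization is the final bas-relief model. δ is the weight; in this embodiment it is set to 0.4 for the head region and 0.1 for the body region.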
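The detail normal extraction of Step 3 can be sketched as below. The per-level formula is available only as an equation image, so turning a gradient (gx, gy) into the normal normalize((-gx, -gy, 1)), the 2x2 box downsampling, the nearest-pixel lookup across levels and the function names are all assumptions; averaging over levels follows the text's statement that the detail normal map is the mean over all levels.

```python
import math


def downsample(img):
    """Halve the resolution by averaging 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def gradient(img, x, y):
    """Central differences with clamped borders."""
    h, w = len(img), len(img[0])
    gx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
    gy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
    return gx, gy


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def detail_normal_map(gray, levels=3):
    """Average the per-level detail normals of a grayscale pyramid.
    Assumes both image sides are divisible by 2**(levels - 1)."""
    pyramid = [gray]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    result = []
    for y in range(len(gray)):
        row = []
        for x in range(len(gray[0])):
            acc = [0.0, 0.0, 0.0]
            for k, layer in enumerate(pyramid):
                lx = min(x >> k, len(layer[0]) - 1)  # matching pixel at level k
                ly = min(y >> k, len(layer) - 1)
                gx, gy = gradient(layer, lx, ly)
                n = normalize((-gx, -gy, 1.0))
                acc = [a + c for a, c in zip(acc, n)]
            row.append(normalize([a / len(pyramid) for a in acc]))
        result.append(row)
    return result


flat = [[0.5] * 8 for _ in range(8)]
ramp = [[float(x) for x in range(8)] for _ in range(8)]
print(detail_normal_map(flat)[4][4])
print(detail_normal_map(ramp)[4][4])
```

A flat image yields normals pointing straight at the viewer, while a brightness ramp tilts them against the gradient direction, which is the relief cue the synthesis step then blends into the base model.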
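In the whole method of the present invention, a multi-person bas-relief model can be generated with only a small amount of user interaction (correcting erroneous occlusion relationships, marking key points that represent special poses, and masking hair, hands and feet), while ensuring the accuracy of the spatial relationships between characters and high fidelity of geometric detail features.
Figure 9 compares the bas-relief models generated from 10 photos by the method of the present invention and by an existing method (S. Tang, F. Tan, K. Cheng, Z. Li, S. Zhu, and P. Tan, "A neural network for detailed human depth estimation from a single image," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 7750–7759). As the figure shows, the bas-relief models generated by the method of the present invention have realistic 3D visual perception, and the method is suitable for all kinds of single-person and multi-person photos, with high universality, robustness and practicability.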

Claims (8)

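  1. A method for generating a character bas-relief model based on a single photo, characterized by comprising the following steps:
    Step 1: Given a single photo containing one or more people, generate the 3D skeletons of all characters and correct, through user interaction, any erroneous occlusion relationships between bones in those skeletons; then, using the corrected occlusion relationships as constraints, solve for the correct 3D skeleton joint coordinates to generate 3D skeletons with correct occlusion relationships; then, based on these skeletons, fit a parametric human body model to each character as a 3D human guidance model;
    Step 2: Use a non-rigid planar deformation function to align the contour of the 3D human guidance model with the contour of the person in the photo and optimize that function; then use the optimized function to warp the normal map of the 3D human guidance model in image space to obtain a warped normal map, and solve a least-squares problem based on the warped normal map to obtain the low-frequency base model of the human bas-relief; wherein the contour of the 3D human guidance model is the image-space projection of the common boundary between the two groups of triangular faces whose normals face outward and inward relative to the image viewing direction; the energy equation used to optimize the non-rigid planar deformation function is:
    Figure PCTCN2021088913-appb-100001
    where $Z_{rq}$ denotes the correlation coefficient between the contour of the 3D human guidance model and the contour of the person in the photo, subject to the constraints $Z_{rq} \in \{0,1\}$,
    Figure PCTCN2021088913-appb-100002
    and
    Figure PCTCN2021088913-appb-100003
    $R_r$ is the coordinate of the r-th point of the contour of the person in the photo, $Q_q$ is the coordinate of the q-th point of the contour of the 3D human guidance model, and $K_1$ and $K_2$ are the numbers of points in the contour of the person in the photo and in the contour of the 3D human guidance model, respectively. f is the non-rigid planar deformation function; λ and ξ are weights, both real numbers; ||*|| is the regularization function; and L(f) is the constraint term on the smoothness of the non-rigid planar deformation function f.
    Step 3: Extract high-frequency detail features from the photo and generate a detail normal map, then synthesize the detail normal map with the low-frequency base model generated in Step 2 to obtain the final human bas-relief model; wherein the high-frequency detail features are the gradient information of each level of the photo's grayscale pyramid, and the detail normal map is the mean of the gradient information of all levels.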
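  2. The method for generating a character bas-relief model according to claim 1, characterized in that, in Step 1, the 3D skeletons of all characters are generated as follows:
    A neural network is used to obtain the 2D pose of each character from the photo;
    A neural network is used to predict a 3D skeleton from the 2D pose of each character;
    Based on the 3D skeletons of all characters, the intrinsic and extrinsic camera parameters of the photo are optimized so that the 3D skeletons are aligned with the 2D joints of the photo in image space, yielding the 3D skeletons of all characters;
    The optimization energy is as follows:
    Figure PCTCN2021088913-appb-100004
    The optimization variables are the camera intrinsic parameters K and the extrinsic parameters, i.e. the similarity transformation matrix $T_i$ of each character's 3D skeleton, comprising a scaling coefficient, a rotation matrix and a translation matrix
    Figure PCTCN2021088913-appb-100005
    whose entries are the translation parameters along the x, y and z directions, respectively; v is a joint coordinate of the 3D skeleton,
    Figure PCTCN2021088913-appb-100006
    is the set of 3D skeleton joint coordinates of each character, and N is the number of characters in the photo; p is the 2D joint coordinate in the photo corresponding to v, $\pi_K$ is the projection function based on the camera intrinsic parameters K, and ||*|| is the regularization function.
    Figure PCTCN2021088913-appb-100007
    denotes the mean of the z-direction translation parameters of the N characters in the given photo.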
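  3. The method for generating a character bas-relief model according to claim 1, characterized in that, in Step 1, using the corrected occlusion relationships as constraints to solve for the correct 3D skeleton joint coordinates and generate 3D skeletons with correct occlusion relationships is specifically:
    The joint positions of the 3D skeletons are recomputed by optimizing the following energy, generating 3D skeletons with correct occlusion relationships:
    Figure PCTCN2021088913-appb-100008
    where L is the matrix corresponding to the graph Laplacian operator, $z^{(0)}$ and z are the z-coordinate vectors of all joints before and after optimization, respectively,
    Figure PCTCN2021088913-appb-100009
    is the z coordinate of the intersection point on the j-th bone, α is the interpolation parameter of the intersection point between the two endpoints of a bone, the subscripts 0 and 1 index the coordinates of the two endpoints of a bone, Occpairs denotes the pairs of bones that intersect in the photo, the subscripts f and b denote the front and back bones of an intersecting pair, respectively, ω is the weight, and $d_{gap}$ is the depth gap, used to compensate for bone thickness.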
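  4. The method for generating a character bas-relief model according to claim 1, characterized in that, in Step 1, fitting a parametric human body model to each character, based on the 3D skeletons with correct occlusion relationships, as the 3D human guidance model is specifically: a fitting energy is used to optimize the pose parameters $\theta_i$ and shape parameters $\beta_i$ of each character, as follows:
    Figure PCTCN2021088913-appb-100010
    where v is a joint on
    Figure PCTCN2021088913-appb-100011
    ,
    Figure PCTCN2021088913-appb-100012
    is the set of 3D skeleton joint coordinates of each character with correct occlusion relationships, and
    Figure PCTCN2021088913-appb-100013
    is the joint corresponding to v in the parametric human skeleton template.
    After the pose parameters
    Figure PCTCN2021088913-appb-100014
    and the shape parameters
    Figure PCTCN2021088913-appb-100015
    are obtained by optimization, the parametric template is used to generate the 3D human model of each character
    Figure PCTCN2021088913-appb-100016
    which serves as the 3D human guidance model.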
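  5. The method for generating a character bas-relief model according to claim 1, characterized in that, in Step 2, the contour of the person is extracted from the photo using a neural-network-based multi-scale edge detection method.
  6. The method for generating a character bas-relief model according to claim 1, characterized in that, in Step 2, optimizing the non-rigid planar deformation function further includes: the user specifies points that correctly correspond between the contour of the 3D human guidance model and the contour of the person in the photo, and that represent special poses, as key-point pairs; the correlation coefficient between each specified key-point pair is fixed to 1 and added to the optimization as a hard constraint on the variables.
  7. The method for generating a character bas-relief model according to claim 1, characterized in that, in Step 2, the head, hair, hands and feet of the characters in the low-frequency base model are reconstructed by other methods, specifically:
    The low-frequency base model of the head is reconstructed from a depth map generated with a 3D facial expression model after facial landmarks are extracted from the photo;
    Masks of the hair, hand and foot regions are drawn in the photo, the boundary gradient information of the masks is estimated and used as the boundary condition of a Laplace problem, and the resulting height field is used as the low-frequency base model of the hair, hands and feet and fused with the other base models using Poisson editing.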
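  8. The method for generating a character bas-relief model according to claim 1, characterized in that, in Step 3, synthesizing the detail normal map with the low-frequency base model generated in Step 2 to obtain the final human bas-relief model is specifically:
    Figure PCTCN2021088913-appb-100017
    where
    Figure PCTCN2021088913-appb-100018
    is the synthesized bas-relief model,
    Figure PCTCN2021088913-appb-100019
    is the height at pixel (u,v) of the low-frequency base model
    Figure PCTCN2021088913-appb-100020
    ,
    Figure PCTCN2021088913-appb-100021
    is the height at pixel (u,v) of the synthesized bas-relief model
    Figure PCTCN2021088913-appb-100022
    ,
    Figure PCTCN2021088913-appb-100023
    is the normal vector at pixel (u,v) of the detail normal map
    Figure PCTCN2021088913-appb-100024
    , and $T_U$ and $T_V$ are the surface tangent vectors, in the U and V directions, of the synthesized bas-relief model
    Figure PCTCN2021088913-appb-100025
    ; δ is a weight and a real number. The height field obtained from the optimization is the final bas-relief model.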
PCT/CN2021/088913 2021-04-22 2021-04-22 Method for generating a character bas-relief model based on a single photo WO2022222091A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/088913 WO2022222091A1 (zh) 2021-04-22 2021-04-22 Method for generating a character bas-relief model based on a single photo

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/088913 WO2022222091A1 (zh) 2021-04-22 2021-04-22 Method for generating a character bas-relief model based on a single photo

Publications (1)

Publication Number Publication Date
WO2022222091A1 (zh) 2022-10-27

Family

ID=83723363

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/088913 WO2022222091A1 (zh) 2021-04-22 2021-04-22 Method for generating a character bas-relief model based on a single photo

Country Status (1)

Country Link
WO (1) WO2022222091A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
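GB2387731A (en) * 2002-04-18 2003-10-22 Delcam Plc Deriving a 3D model from a scan of an object
CN109523635A (zh) * 2018-11-01 2019-03-26 深圳蒜泥科技投资管理合伙企业(有限合伙) Method and device for non-rigid reconstruction and measurement from 3D human body scans
CN110097626A (zh) * 2019-05-06 2019-08-06 浙江理工大学 Bas-relief object recognition and processing method based on RGB monocular images
CN110751665A (zh) * 2019-10-23 2020-02-04 齐鲁工业大学 Method and system for reconstructing a 3D portrait model from a portrait relief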

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
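CN115862149A (zh) * 2022-12-30 2023-03-28 广州紫为云科技有限公司 Method and system for generating a 3D human skeleton key-point dataset
CN115862149B (zh) * 2022-12-30 2024-03-22 广州紫为云科技有限公司 Method and system for generating a 3D human skeleton key-point dataset
CN117476509A (zh) * 2023-12-27 2024-01-30 联合富士半导体有限公司 Laser engraving device and control method for semiconductor chip products
CN117476509B (zh) * 2023-12-27 2024-03-19 联合富士半导体有限公司 Laser engraving device and control method for semiconductor chip products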

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21937332; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21937332; Country of ref document: EP; Kind code of ref document: A1)