WO2021135627A1 - Method for constructing three-dimensional model of target object, and related apparatus - Google Patents

Method for constructing three-dimensional model of target object, and related apparatus Download PDF

Info

Publication number
WO2021135627A1
WO2021135627A1 PCT/CN2020/126341 CN2020126341W WO2021135627A1 WO 2021135627 A1 WO2021135627 A1 WO 2021135627A1 CN 2020126341 W CN2020126341 W CN 2020126341W WO 2021135627 A1 WO2021135627 A1 WO 2021135627A1
Authority
WO
WIPO (PCT)
Prior art keywords
point
initial images
point cloud
cloud information
points
Prior art date
Application number
PCT/CN2020/126341
Other languages
English (en)
French (fr)
Inventor
林祥凯
Original Assignee
Tencent Technology (Shenzhen) Company Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Company Limited filed Critical Tencent Technology (Shenzhen) Company Limited
Publication of WO2021135627A1 publication Critical patent/WO2021135627A1/zh
Priority to US17/667,399 priority Critical patent/US12014461B2/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75Determining position or orientation of objects or cameras using feature-based methods involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20068Projection on vertical or horizontal image axis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00Indexing scheme for image generation or computer graphics
    • G06T2210/56Particle system, point based geometry or rendering

Definitions

  • This application relates to the field of electronic technology, and relates to the construction of a three-dimensional model of a target object.
  • Face reconstruction technology is a technology that reconstructs a 3D face model from one or more 2D face images.
  • the user stands in the field of view of the shooting lens and rotates the head as instructed so that the shooting lens can capture the user's face from different angles.
  • the instruction information telling the user to turn the head may be issued by the terminal that controls the shooting lens, for example a smartphone or tablet computer with a camera function.
  • Face reconstruction technology is widely used in many fields. In the entertainment field, for example, when a user plays a 3D game, the user's face model can be reconstructed so that the game character takes on the user's appearance, making the construction of 3D game characters more personalized.
  • an embodiment of the present application provides a method for constructing a three-dimensional model of a target, including:
  • Acquire at least two initial images of the target object at multiple shooting angles, where the at least two initial images respectively record depth information of the target object, and the depth information is used to record the distances between multiple points of the target object and a reference position;
  • an embodiment of the present application provides a device for constructing a three-dimensional model of a target, including:
  • the first acquiring unit is configured to acquire at least two initial images of the target at multiple shooting angles, the at least two initial images respectively record the depth information of the target, and the depth information is used for recording The distance between the multiple points of the target and the reference position;
  • a second acquiring unit, configured to acquire, according to the depth information in the at least two initial images acquired by the first acquiring unit, first point cloud information corresponding to each of the at least two initial images;
  • a fusion unit configured to fuse first point cloud information corresponding to the at least two initial images acquired by the second acquisition unit into second point cloud information
  • the construction unit is configured to construct a three-dimensional model of the target object according to the second point cloud information obtained by the fusion unit.
  • an embodiment of the present application provides a computer device.
  • the computer device includes an interactive device, an input/output (I/O) interface, a processor, and a memory.
  • the memory stores program instructions; the interactive device is used to obtain operation instructions input by the user; and the processor is used to execute the program instructions stored in the memory to perform the method described above.
  • an embodiment of the present application provides a storage medium, the storage medium is used to store a computer program, and the computer program is used to execute the method in the above aspect.
  • the embodiments of the present application provide a computer program product including instructions, which when run on a computer, cause the computer to execute the method in the above aspect.
  • the method for constructing a three-dimensional model of a target object includes: acquiring at least two initial images of the target object at multiple shooting angles, where the at least two initial images respectively record depth information of the target object, and the depth information is used to record the distances between multiple points of the target object and a reference position; acquiring, according to the depth information in the at least two initial images, first point cloud information corresponding to each of the at least two initial images; fusing the first point cloud information corresponding to the at least two initial images into second point cloud information; and constructing a three-dimensional model of the target object according to the second point cloud information.
  • the 3D model building process can be realized without the need to establish additional storage space.
  • the 3D model of the target object is constructed directly by point cloud fusion, which maximizes the utilization efficiency of the storage space and enables the terminal to efficiently perform the modeling process of face reconstruction.
  • FIG. 1 is a flowchart of an embodiment of a method for constructing a three-dimensional model of a target provided by an embodiment of the application;
  • FIG. 2 is a flowchart of another embodiment of a method for constructing a three-dimensional model of a target provided by an embodiment of the application;
  • FIG. 3 is a flowchart of another embodiment of a method for constructing a three-dimensional model of a target provided by an embodiment of the application;
  • FIG. 4 is a schematic diagram of an embodiment of a method for constructing a three-dimensional model of a target provided by an embodiment of the application;
  • Fig. 5a is a flowchart of another embodiment of a method for constructing a three-dimensional model of a target provided by an embodiment of the application;
  • FIG. 5b is a schematic diagram of a smoothing algorithm in the method for constructing a three-dimensional model of a target provided by an embodiment of the application;
  • FIG. 5c is a schematic diagram of smoothing processing in the method for constructing a three-dimensional model of a target provided by an embodiment of the application;
  • FIG. 6 is a flowchart of another embodiment of a method for constructing a three-dimensional model of a target provided by an embodiment of the application;
  • FIG. 7 is a schematic diagram of an initial image in a method for constructing a three-dimensional model of a target provided by an embodiment of the application;
  • FIG. 8 is a schematic diagram of second point cloud information in the method for constructing a three-dimensional model of a target provided by an embodiment of the application;
  • FIG. 9 is a schematic diagram of a three-dimensional network in a method for constructing a three-dimensional model of a target provided by an embodiment of the application.
  • FIG. 10 is a schematic diagram of a second projection surface in the method for constructing a three-dimensional model of a target provided by an embodiment of the application;
  • FIG. 11 is a schematic diagram of a three-dimensional network after trimming in the method for constructing a three-dimensional model of a target provided by an embodiment of the application;
  • FIG. 12 is a schematic diagram of a three-dimensional model in a method for constructing a three-dimensional model of a target provided by an embodiment of the application;
  • FIG. 13 is a schematic diagram of a three-dimensional model in a method for constructing a three-dimensional model of a target provided by an embodiment of the application;
  • FIG. 14 is a schematic diagram of a computer device provided by an embodiment of the application.
  • FIG. 15 is a schematic diagram of a device for constructing a three-dimensional model of a target provided by an embodiment of the application.
  • Face reconstruction technology is a technology that reconstructs a 3D face model from one or more 2D face images.
  • the user stands in the field of view of the shooting lens and rotates the head as instructed so that the shooting lens can capture the user's face from different angles.
  • the instruction information telling the user to turn the head may be issued by the terminal that controls the shooting lens, for example a smartphone or tablet computer with a camera function.
  • Face reconstruction technology is widely used in many fields. In the entertainment field, for example, when a user plays a 3D game, the user's face model can be reconstructed so that the game character takes on the user's appearance, making the construction of 3D game characters more personalized.
  • In some fields represented by the entertainment field above, face reconstruction technology has the following characteristics: entertainment and similar uses do not require particularly high accuracy; the 2D images used to construct the 3D model are captured by the user with the terminal itself; and the terminal's computing and storage capabilities are limited, whereas prior-art face reconstruction technology requires a large amount of memory and computing power.
  • the embodiments of the present application provide a method for constructing a three-dimensional model of a target, which can model the acquired two-dimensional image at the terminal to obtain a three-dimensional face model, and the obtained three-dimensional model can be used for Game software, social software and 3D printing and other scenes.
  • the method provided in the embodiments of the present application will be described in detail below with reference to the accompanying drawings.
  • the first embodiment of a method for constructing a three-dimensional model of a target includes the following steps.
  • the specific type of the target object is not limited by the embodiment of the present application.
  • it can be a human face, a human body, or any part of the human body, or various objects, such as a doll or a car.
  • the following embodiments of the present application take a human face as an example for detailed description.
  • the face can be photographed through the shooting lens of the terminal.
  • the specific type of the terminal is not limited in this embodiment. It can be a smart phone or a tablet computer, etc.
  • the terminal prompts the user to photograph the face from different angles, so as to obtain at least two initial images in which the face is recorded from different angles.
  • the initial images may also be obtained by scanning or the like, which is not limited in the embodiment of the present application.
  • the above-mentioned shooting lens needs to be able to record depth information, so that each of the obtained initial images records depth information of the target object, and the depth information is used to record the distance between each point of the target object and the shooting lens.
  • According to the depth information in the at least two initial images, first point cloud information corresponding to each of the at least two initial images is obtained respectively.
  • In an initial image, each pixel records two-dimensional coordinate information; for example, the coordinate value of a pixel A is (x, y). Since the initial image also includes depth information, the distance from pixel A to the lens turns pixel A into a three-dimensional point,
  • and the coordinate value of the three-dimensional point A is (x, y, z).
  • the first point cloud information records the coordinate values of multiple such three-dimensional points of the target object.
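  • As an illustration of how a depth image can be turned into such first point cloud information, the sketch below back-projects every pixel under a pinhole camera model; the intrinsic parameters fx, fy, cx, cy and the function name are assumptions added for illustration and are not taken from this application.

    import numpy as np

    def depth_to_point_cloud(depth, fx, fy, cx, cy):
        """Back-project a depth image (H x W) into an N x 3 point cloud.

        A minimal sketch assuming a pinhole camera with intrinsics fx, fy, cx, cy;
        pixels with zero depth are discarded as invalid.
        """
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
        return points[points[:, 2] > 0]  # keep only pixels with valid depth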
  • each initial image records its own first point cloud information, i.e. the three-dimensional face points generated at the corresponding shooting angle. The first point cloud information corresponding to the initial images captured at different angles is moved and adjusted to the same angle and then fused to obtain second point cloud information, so that the second point cloud information records the point cloud of the target object more accurately.
  • For example, the target object is a human face,
  • and the user, guided by the terminal, captures three initial images: the front face, the left side face and the right side face.
  • After processing, the first point cloud information of the three images is obtained.
  • the first point cloud information of the three images records the point clouds of the user's front face, left side face and right side face respectively.
  • After the first point cloud information corresponding to the three images is fused,
  • the point cloud of the user's whole face is obtained as the second point cloud information.
  • Since the second point cloud information records the three-dimensional coordinates of each point of the target object, a three-dimensional model of the target object can be constructed based on the second point cloud information, thereby obtaining the three-dimensional model of the target object.
  • the method for constructing a three-dimensional model of a target object includes: acquiring at least two initial images of the target object at different shooting angles; acquiring, according to the depth information in each initial image, the first point cloud information corresponding to that initial image; fusing the first point cloud information corresponding to the at least two initial images into second point cloud information; and constructing a three-dimensional model of the target object according to the second point cloud information.
  • the 3D model building process can be realized without the need to establish additional storage space.
  • the 3D model of the target object is constructed directly by point cloud fusion, which maximizes the utilization efficiency of the storage space and enables the terminal to efficiently perform the modeling process of face reconstruction.
  • It should be noted that in step 103 above, when the first point cloud information is fused, it is necessary to know the relationship between the first point cloud information corresponding to the different initial images before the first point cloud information can be fused.
  • a specific implementation method is provided below to solve this problem.
  • the second embodiment of the method for constructing a three-dimensional model of a target includes the following steps.
  • this step can refer to the above step 101, which will not be repeated here.
  • Feature point detection can be implemented through landmark detection; specifically, feature point detection is performed on the initial images by a feature point detection model.
  • the feature point detection model can be obtained through training. For example, feature points are manually marked in multiple face images to serve as training material.
  • the training material marks feature points such as the eye corners, nose tip and mouth corners in the face images.
  • After training on this material, the resulting feature point detection model has the ability to mark face images.
  • When an initial image is input, the feature point detection model can mark the feature points in the initial image, such as the eye corners, nose tip and mouth corners of the face in the initial image.
  • the specific training method of the above model may be any training method in the prior art, which is not limited by this application.
  • After detecting the feature points of each initial image, the terminal can perform semantic recognition of the different parts in each initial image according to the marked feature points, so that the terminal knows the name of each part of the face image in each initial image:
  • the position marked by the first feature point A is an eye corner,
  • the position marked by the second feature point B is the tip of the nose,
  • the position marked by the third feature point C is a mouth corner, and so on.
  • the offset is used to identify the coordinate difference, between different initial images, of feature points at the same position on the target object.
  • For example, in the front face image the feature point of the nose tip is B,
  • while in the left side face image the feature point of the nose tip is B'. The offset between B and B' indicates how far the face has turned between the two images.
  • the camera pose is used to indicate the movement of the target object relative to the shooting lens in different initial images, and the movement includes at least one of rotation and translation.
  • the initial images may be acquired by the user taking multiple two-dimensional photos at different angles under the instruction of the terminal; because the shooting angles differ, the angle of the user's face relative to the shooting lens is different in each initial image, that is, the face is rotated or translated to a different degree relative to the shooting lens.
  • the camera pose of the target object can be obtained from the offsets calculated in the previous step, and the camera pose can be used to characterize the change of the target object between different initial images.
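  • One common way to turn such feature point correspondences into a pose is to estimate a rigid transform between the 3D positions of the same landmarks in two frames. The sketch below uses the Kabsch (SVD) method; this is a generic illustration under that assumption, not the specific pose solver of this application, and the function name is hypothetical.

    import numpy as np

    def estimate_rigid_pose(src, dst):
        """Estimate rotation R and translation t such that dst ≈ R @ src + t.

        src, dst: N x 3 arrays of the same landmarks (e.g. eye corners, nose tip)
        back-projected to 3D in two different initial images.
        """
        src_c = src - src.mean(axis=0)
        dst_c = dst - dst.mean(axis=0)
        u, _, vt = np.linalg.svd(src_c.T @ dst_c)
        d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
        r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
        t = dst.mean(axis=0) - r @ src.mean(axis=0)
        return r, t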
  • the initial images are generally obtained by the user performing shooting under the instruction of the terminal; for example, the user rotates the face relative to the shooting lens at a certain speed, and during this process
  • the terminal controls the shooting lens to shoot at preset time intervals to obtain initial images from different angles. Since the user is not a professional, the speed of the face rotation cannot be guaranteed to be stable and linear; the user may rotate the face slowly at a certain angle, resulting in multiple initial images at nearly the same shooting angle. Therefore, after the initial images are obtained, they can be filtered.
  • the specific implementation is:
  • images whose similarity to another initial image is greater than a preset value are eliminated from the at least two initial images.
  • the similarity between initial images can be judged by the camera pose. When the camera pose similarity between two initial images is greater than the preset value, it can be judged that
  • the shooting angles of the two images are relatively close. In this case, one of the two initial images can be eliminated, thereby avoiding repeated processing of initial images with similar angles in the subsequent process.
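  • A minimal sketch of this frame screening step: frames whose rotation differs from an already kept frame by less than a threshold are dropped. Representing each camera pose by its rotation matrix and the 10° threshold are illustrative assumptions, not values taken from this application.

    import numpy as np

    def filter_similar_frames(poses, min_angle_deg=10.0):
        """Keep only frames whose rotation differs enough from every kept frame.

        poses: list of 3x3 rotation matrices (one camera pose per initial image).
        Returns the indices of the frames to keep.
        """
        kept = []
        for i, r in enumerate(poses):
            too_close = False
            for j in kept:
                # relative rotation angle between pose i and kept pose j
                cos_angle = (np.trace(poses[j].T @ r) - 1.0) / 2.0
                angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
                if angle < min_angle_deg:
                    too_close = True
                    break
            if not too_close:
                kept.append(i)
        return kept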
  • the first point cloud information corresponding to different initial images is a set of three-dimensional coordinate points generated from different shooting angles. Ideally, for the same target object, the sets of three-dimensional coordinate points reconstructed from two-dimensional images shot at different angles should be identical. In practice, however, errors caused by shooting angles or pixel noise introduce certain discrepancies between the sets of three-dimensional coordinate points generated at different shooting angles.
  • Therefore, the first point cloud information corresponding to the different initial images is fused, according to the changes of the target object between different initial images recorded by the camera poses, to obtain the second point cloud information closest to the actual situation of the target object.
  • this step can refer to the above step 104, which will not be repeated here.
  • the semantics of the initial image is acquired through feature point detection, so that the machine can obtain the movement relationship between each initial image according to the detected feature points, and generate the camera pose, and then according to The camera pose performs point cloud fusion, so that the point cloud information generated at different shooting angles can be accurately fused according to the relationship obtained by the camera pose.
  • The embodiments of the present application further provide a more detailed implementation manner to describe the specific process of point cloud fusion.
  • the third embodiment of the method for constructing a three-dimensional model of the target includes the following steps.
  • Steps 301 to 304 can refer to the above steps 201 to 204, which will not be repeated here.
  • one of the acquired at least two initial images is determined as the first frame.
  • For example, from the front face image, the left side face image and the right side face image, the front face image is determined as the first frame;
  • the first frame can be regarded as an initial reference point.
  • different initial images each correspond to their own first point cloud information.
  • For example, the front face image (the first frame) generates point cloud information A,
  • the left side face image generates point cloud information B,
  • and the right side face image generates point cloud information C.
  • According to the camera poses, the shooting angle of the left side face image corresponds to the front face rotated 90° to the right, and that of the right side face image corresponds to the front face rotated 90° to the left. In this step, the point cloud information corresponding to the left and right side face images therefore needs to be moved: point cloud information B of the left side face image is rotated 90° to the left, and point cloud information C of the right side face image is rotated 90° to the right, so that the points corresponding to the left and right side face images are all moved to the angle of the front face image (the first frame).
  • the first point is a point in the first point cloud information,
  • and the second point is a point in the second point cloud information.
  • Each point in the first point cloud information is fused separately to obtain the second point cloud information.
  • For example, in the front face image there is a three-dimensional point A1 at the tip of the user's nose with coordinate value (x1, y1, z1); in the left side face image there is a three-dimensional point B1 at the tip of the user's nose with coordinate value (x2, y2, z2);
  • and in the right side face image there is a three-dimensional point C1 at the tip of the user's nose with coordinate value (x3, y3, z3).
  • After the left and right side face images are rotated according to step 306 above, the three three-dimensional points A1, B1 and C1 coincide; at this point, A1, B1 and C1 are fused to obtain a three-dimensional point D1 (x4, y4, z4), which is the three-dimensional point at the tip of the user's nose in the second point cloud information.
  • In practice, because of the shooting angles of the initial images and image noise, the three three-dimensional points A1, B1 and C1 cannot overlap exactly. Therefore, A1, B1 and C1 need to be weighted.
  • the three three-dimensional points are assigned different weights, which specifically includes the following steps.
  • For example, the initial images include three images, a front face image, a left side face image and a right side face image, and each of the three images includes a point used to represent the tip of the user's nose (as an example,
  • the tip of the nose is the first point).
  • When performing point fusion, different weights need to be assigned to points from different images.
  • Optionally, a weight value can be assigned to the first point based on at least one of the shooting angle, the image noise value, or the normal direction of the initial image in which the first point is located.
  • For example, the shooting angle of the front face image is the most frontal and its accuracy for the nose tip is higher, so the first point of the front face image is assigned a weight of 60%,
  • and the first points of the left side face image and the right side face image are each assigned a weight of 20%.
  • As in the above example, the weight of the first point of the front face image is 60%, and the weights of the first points of the left and right side face images are 20% each; the three-dimensional coordinates of the second point D1 obtained after fusion are then:
  • x4 = (x1*60% + x2*20% + x3*20%)/3;
  • y4 = (y1*60% + y2*20% + y3*20%)/3;
  • z4 = (z1*60% + z2*20% + z3*20%)/3. In this way, the three-dimensional points from different initial images can be fused more accurately according to their different weights.
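  • A small sketch of this fusion step, following the formula exactly as printed above (a weighted sum divided by the number of contributing images); the function name and array layout are assumptions for illustration.

    import numpy as np

    def fuse_overlapping_points(points, weights):
        """Fuse overlapping 3D points from different initial images into one point.

        points:  N x 3 array, the same physical point observed in N initial images
                 (e.g. the nose tip in the front, left and right face images).
        weights: length-N array, e.g. [0.6, 0.2, 0.2].
        The weighted sum is divided by the number of contributing images,
        matching the formula given in the description.
        """
        points = np.asarray(points, dtype=float)
        weights = np.asarray(weights, dtype=float)
        return (weights[:, None] * points).sum(axis=0) / len(points)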
  • It should be noted that a special case may be encountered during the above point cloud fusion. Referring to FIG. 4, the user's front face image includes two three-dimensional points, the right eye corner A1 and the left eye corner A2. In the right side face image, because the user's face has turned 90°, the three-dimensional point B1 representing the right eye corner and the three-dimensional point B2 representing the left eye corner overlap in that image; the machine therefore cannot distinguish B1 from B2 and cannot decide which of the two should be fused with point A1.
  • To handle such cases, the embodiments of the present application provide the following step.
  • In the first initial image, the first point whose depth difference from the first point in the first frame has the smaller absolute value is acquired and fused with the first point in the first frame.
  • For the two points B1 and B2, only the x and y coordinate values overlap; the z coordinate values, which represent their depth, are different. For example, the coordinate value of B1 is (x1, y1, z1)
  • and the coordinate value of B2 is (x2, y2, z2); when B1 and B2 overlap in the initial image, x1 = x2 and y1 = y2, but z1 ≠ z2.
  • The coordinate value of point A1 is (x3, y3, z3). Taking the difference of the depth (z) coordinates gives
  • D1 = |z3 - z1|
  • and D2 = |z3 - z2|. Since B1 and A1 both represent the right eye corner, their depth coordinates are closer, so D1 < D2, and B1 is therefore the point that should be fused with A1.
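  • A one-function sketch of this disambiguation rule: pick the candidate whose depth is closest to the reference point. The names are illustrative.

    def pick_by_depth(reference_point, candidates):
        """Choose the candidate whose depth (z) is closest to the reference point.

        reference_point: (x, y, z) of the point in the first frame, e.g. A1.
        candidates:      list of (x, y, z) points that overlap in the other image,
                         e.g. [B1, B2].
        """
        return min(candidates, key=lambda p: abs(p[2] - reference_point[2]))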
  • this step can refer to the above step 104, which will not be repeated here.
  • the effective point cloud in each initial image is projected into the reference coordinate system (the camera coordinate system of the first frame), and the inner points of the overlapping regions are then fused with weighting, so that the second point cloud information can be obtained more accurately,
  • and a more accurate three-dimensional model can thus be established.
  • After the second point cloud information is obtained, it needs to be further processed to obtain the three-dimensional model of the target object.
  • The following embodiment of the present application provides a specific implementation for obtaining the three-dimensional model of the target object based on the second point cloud information,
  • which is described in detail below with reference to the accompanying drawings for ease of understanding.
  • the fourth embodiment of the method for constructing a three-dimensional model of a target includes the following steps.
  • Steps 501 to 507 can refer to the above steps 301 to 307, which will not be repeated here.
  • the three-dimensional network (mesh) is a surface without holes that connects the points in the second point cloud information.
  • the Poisson reconstruction used in the technical solution provided by this application is the Poisson reconstruction technology of the prior art, which is not limited by the embodiments of this application.
  • the purpose of Poisson reconstruction is to generate a watertight surface without holes. When the Poisson equation is constructed,
  • the input is the point cloud and its corresponding normals.
  • the point cloud comes directly from the fusion result of the previous step.
  • To keep the normal directions consistent, the normals are extracted directly from the depth image of the selected frame; that is, the depth image is regarded as a function Z(x, y).
  • Optionally, the normal extraction for Poisson reconstruction can be implemented by the following formulas:
    dzdx = (z(x+1, y) - z(x-1, y)) / 2.0    (formula 1)
    dzdy = (z(x, y+1) - z(x, y-1)) / 2.0    (formula 2)
    direction = (-dzdx, -dzdy, 1.0)         (formula 3)
    magnitude = sqrt(direction.x**2 + direction.y**2 + direction.z**2)    (formula 4)
    normal = direction / magnitude          (formula 5)
  • In the above formulas, x, y and z are the abscissa, ordinate and depth coordinate of each three-dimensional point, direction is the normal direction, and magnitude is the length of the normal vector.
  • As shown in formula 5, the resulting normal equals the normal direction divided by the normal length, i.e. the norm of the direction vector, which completes the extraction of the normal.
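  • A vectorized sketch of the normal extraction in formulas 1 to 5, treating the depth image as Z(x, y) and using central differences; numpy and the simple border handling are assumptions added for illustration.

    import numpy as np

    def normals_from_depth(z):
        """Compute per-pixel normals from a depth image Z(x, y) (formulas 1-5)."""
        dzdx = np.zeros_like(z, dtype=float)
        dzdy = np.zeros_like(z, dtype=float)
        dzdx[:, 1:-1] = (z[:, 2:] - z[:, :-2]) / 2.0   # formula 1
        dzdy[1:-1, :] = (z[2:, :] - z[:-2, :]) / 2.0   # formula 2
        ones = np.ones_like(z, dtype=float)
        direction = np.stack([-dzdx, -dzdy, ones], axis=-1)            # formula 3
        magnitude = np.linalg.norm(direction, axis=-1, keepdims=True)  # formula 4
        return direction / magnitude                                   # formula 5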
  • the three-dimensional network obtained by Poisson reconstruction already has the prototype of the three-dimensional model.
  • the three-dimensional network may contain some background shapes, and the surface of the three-dimensional network of human faces has some uneven surfaces.
  • This phenomenon requires post-processing of the three-dimensional network.
  • the post-processing steps include clipping and smoothing.
  • the post-processing may specifically include the following steps.
  • the points in the three-dimensional network are projected along the z-axis direction to obtain the first projection surface.
  • The convex hull is a concept in computational geometry (graphics):
  • in a real vector space V, for a given set X, the intersection S of all convex sets containing X is called the convex hull of X.
  • the convex hull of X can be constructed as the convex combinations of all points (X1, ... Xn) in X; in two-dimensional Euclidean space, the convex hull can be imagined as a rubber band that just encloses all the points.
  • the convex hull is constructed by connecting the feature points. Since only the face has a curved surface on which such a convex hull can be constructed, the area where the face is located can be distinguished from the background area in this way, and the resulting second projection surface is the area where the face is located.
  • the three-dimensional network is cut according to the second projection surface to eliminate the three-dimensional network of non-target objects.
  • the part of the projection surface other than the second projection surface is eliminated in the three-dimensional network, so as to achieve clipping, and the three-dimensional network of non-target objects is eliminated, and a three-dimensional network with only the face area is obtained.
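  • A sketch of this trimming step: project the mesh vertices along z, build a convex hull from the projected face landmarks, and keep only vertices whose projections fall inside it. Using scipy's Delaunay point-in-hull test is an assumption for illustration, not the implementation of this application.

    import numpy as np
    from scipy.spatial import Delaunay

    def crop_mesh_to_face(vertices, landmarks_2d):
        """Keep only mesh vertices whose (x, y) projection lies inside the convex
        hull spanned by the projected face feature points.

        vertices:     N x 3 mesh vertex coordinates.
        landmarks_2d: M x 2 projected feature points (e.g. eye corners, nose tip).
        Returns a boolean mask over the vertices; faces that touch removed
        vertices would be dropped in a separate pass.
        """
        hull = Delaunay(landmarks_2d)                     # triangulation of the hull region
        inside = hull.find_simplex(vertices[:, :2]) >= 0  # -1 means outside the hull
        return inside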
  • the smoothing process can be implemented by HC Laplacian Smoothing.
  • Optionally, the smoothing can be implemented by the HC Laplacian Smooth algorithm shown in Figure 5b. As a preferred implementation, in order to preserve more detail,
  • the influence factor factor1 is set to 0.0
  • and factor2 is set to 0.9.
  • the idea of the algorithm shown in Figure 5b is: select an original point and the points connected to it, and obtain a new point through preset rules and weighted averaging to replace the original point.
  • In Figure 5b, p and o both denote original points; "p := o" assigns the coordinate value of o to p, the subsequent code is executed in a "repeat" loop, and "q := p" assigns the coordinate value represented by p to q; i denotes the index of a vertex (i.e. a vertex to be smoothed), Vear denotes all vertices, and "for all i ∈ Vear do" means that the subsequent formula is executed for all vertices to be smoothed; n denotes the number of vertices connected to vertex i,
  • Adj denotes the adjacent points, and "if n ≠ 0 then" means that the subsequent computation is executed only when the number of vertices connected to vertex i is not 0; pi denotes the average point, i.e.
  • the result of the weighted average of all points adjacent to vertex i. By introducing the two parameters α and β, the smoothed model is prevented from shrinking and, at the same time, the balancing effect can converge.
  • The role of bi is to determine a direction, i.e. to pull the point back toward the original point; it can be understood as the line connecting qi and pi, whose length is determined by α. di is determined by the weighted average of the bi of all adjacent points, i.e. its effect is determined by β, and every original point has a corresponding bi.
  • As shown in Figure 5c, after the algorithm flow of Figure 5b, the two points qj2 and qj3 are each smoothed to obtain two smoothed points pj2 and pj3, and the resulting polyline connecting qj1, pj2, qi, pj3 and qj4 is
  • the smoothed curve.
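  • A compact sketch of HC Laplacian smoothing with factor1 (α) = 0.0 and factor2 (β) = 0.9 as stated above, following the published HC scheme the figure is based on; the adjacency representation and function name are assumptions for illustration.

    import numpy as np

    def hc_laplacian_smooth(vertices, adjacency, alpha=0.0, beta=0.9, iterations=10):
        """HC Laplacian smoothing: Laplacian averaging followed by a correction
        step (b_i, d_i) that pulls vertices back toward their original positions
        to prevent shrinkage.

        vertices:  N x 3 array of mesh vertex positions.
        adjacency: list of index lists, adjacency[i] = neighbours of vertex i.
        """
        o = vertices.copy()          # original positions
        p = vertices.copy()
        for _ in range(iterations):
            q = p.copy()
            b = np.zeros_like(p)
            for i, nbrs in enumerate(adjacency):
                if len(nbrs) == 0:
                    continue
                p[i] = q[nbrs].mean(axis=0)                          # Laplacian average
                b[i] = p[i] - (alpha * o[i] + (1.0 - alpha) * q[i])  # deviation from original
            for i, nbrs in enumerate(adjacency):
                if len(nbrs) == 0:
                    continue
                d = beta * b[i] + (1.0 - beta) * b[nbrs].mean(axis=0)
                p[i] -= d                                            # pull back toward original shape
        return p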
  • post-processing is performed through the second point cloud information obtained after the fusion, so as to obtain a trimmed and smooth three-dimensional model, and the reconstruction of the three-dimensional model of the target is completed.
  • the method for constructing a three-dimensional model of a target object provided by the embodiments of this application can be implemented based on the surfel (surface element) model.
  • The method provided by the embodiments of this application is described in detail below in combination with a specific usage scenario.
  • the fifth embodiment of the method for constructing a three-dimensional model of a target includes the following steps.
  • this step can refer to the above step 101, which will not be repeated here.
  • As shown in FIG. 7, the user takes 8 photos at different angles under the guidance of the terminal, thereby obtaining 8 initial images at different shooting angles.
  • the color information comes from the RGB information in the image recorded by the shooting lens.
  • the shooting lens has the ability to record depth information (for example, a depth camera), so that the initial images also include the depth information of the image.
  • this step can refer to the above step 202, which will not be repeated here.
  • the face in the eight initial images shown in FIG. 7 is marked with multiple feature points 701 as feature point information.
  • the offset of each feature point is obtained according to the feature points, and then the camera pose of the target object in each initial image is obtained according to the offsets.
  • For this step, refer to steps 203 to 204 above; details are not repeated here.
  • the camera pose reflects the shooting angle of the target object in each initial image.
  • Based on the camera poses, initial images with similar shooting angles can be identified and eliminated, thereby avoiding repeated processing of initial images with similar shooting angles and realizing the frame screening process.
  • For the specific implementation, please refer to the relevant description of step 204, which is not repeated here.
  • The depth information in each initial image is back-projected to obtain the three-dimensional point information of the target object, that is, the first point cloud information.
  • the second point cloud information includes a plurality of three-dimensional points 801, where each three-dimensional point 801 is the second point described in step 307.
  • the specific implementation of this step can refer to the above step 508, which will not be repeated here.
  • After the Poisson reconstruction is completed, the three-dimensional network (mesh) shown in FIG. 9 is obtained.
  • the three-dimensional network includes a face part 901 and a background part 902.
  • the specific implementation of this step can refer to the above step 509, which will not be repeated here.
  • the three-dimensional network is projected in a direction perpendicular to the lens surface according to the feature points to obtain the first projection surface, and then the feature points are connected to form a convex hull in the first projection surface.
  • the area where the convex hull is acquired is the second projection surface.
  • the obtained second projection surface is shown in FIG. 10, and 1001 in FIG. 10 is the second projection surface of the area where the human face is located.
  • the three-dimensional network is trimmed according to the second projection surface to remove the non-target three-dimensional network, that is, the background part 902 in FIG. 9, leaving the face part 901; the result is the three-dimensional network of the face 1101 shown in FIG. 11.
  • the three-dimensional network is smoothed to obtain the three-dimensional model of the human face 1201 as shown in FIG. 12.
  • the three-dimensional model obtained in the above step 609 is a model with only the shape of a human face and does not include color information.
  • Therefore, texture mapping is performed on the obtained three-dimensional model using the color information obtained from the initial images,
  • so that the three-dimensional model carries color information, and the three-dimensional face model with texture color information shown in Figure 13 is obtained.
  • the three-dimensional model can be rotated at will;
  • Figure 13 shows the front view 1301, side view 1302 and left side view 1303 of the three-dimensional model.
  • the specific implementation manner of the above-mentioned texture mapping is any implementation manner in the prior art, and the embodiment of this application will not be specifically described.
  • the three-dimensional model shown in Figure 12 is finally obtained.
  • the three-dimensional model shown in Figure 13 can be obtained, thereby achieving a 3D model reconstruction based on 2D images.
  • the method for constructing a three-dimensional model of a target object includes: acquiring at least two initial images of the target object at different shooting angles; acquiring, according to the depth information in each initial image, the first point cloud information corresponding to that initial image; fusing the first point cloud information corresponding to the at least two initial images into second point cloud information; and constructing a three-dimensional model of the target object according to the second point cloud information.
  • the 3D model building process can be realized without the need to establish additional storage space.
  • the 3D model of the target object is constructed directly by point cloud fusion, which maximizes the utilization efficiency of the storage space and enables the terminal to efficiently perform the modeling process of face reconstruction.
  • a computer device includes hardware structures and/or software modules corresponding to each function.
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software-driven hardware depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
  • the above method may be implemented by one physical device, or jointly implemented by multiple physical devices, or may be a logical function module in one physical device, which is not specifically limited in the embodiment of the present application.
  • FIG. 14 is a schematic diagram of the hardware structure of a computer device provided by an embodiment of the application.
  • the computer device includes at least one processor 1401, a communication line 1402, a memory 1403, and at least one communication interface 1404.
  • the processor 1401 can be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of this application.
  • the communication line 1402 may include a path to transmit information between the aforementioned components.
  • the communication interface 1404 uses any transceiver-type device to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
  • the memory 1403 may be a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, random access memory (RAM), or other types that can store information and instructions
  • the dynamic storage device can also be electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disk storage, optical disc storage (Including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or can be used to carry or store desired program codes in the form of instructions or data structures and can be used by a computer Any other media accessed, but not limited to this.
  • the memory may exist independently, and is connected to the processor through a communication line 1402. The memory can also be integrated with the processor.
  • the memory 1403 is used to store computer-executable instructions for executing the solution of the present application, and the processor 1401 controls the execution.
  • the processor 1401 is configured to execute computer-executable instructions stored in the memory 1403, so as to implement the method provided in the foregoing embodiment of the present application.
  • the computer-executable instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.
  • the processor 1401 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 14.
  • the computer device may include multiple processors, such as the processor 1401 and the processor 1407 in FIG. 14.
  • each of these processors can be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • the processor here may refer to one or more devices, circuits, and/or processing cores for processing data (for example, computer program instructions).
  • the computer device may further include an output device 1405 and an input device 1406.
  • the output device 1405 communicates with the processor 1401, and can display information in a variety of ways.
  • the output device 1405 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like.
  • the input device 1406 communicates with the processor 1401, and can receive user input in a variety of ways.
  • the input device 1406 may be a mouse, a keyboard, a touch screen device, or a sensor device.
  • the above-mentioned computer device may be a general-purpose device or a special-purpose device.
  • the computer device can be a desktop computer, a portable computer, a network server, a handheld computer (personal digital assistant, PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or a device with a similar structure in Figure 14 .
  • the embodiments of this application do not limit the type of computer equipment.
  • the embodiment of the present application may divide the storage device into functional units according to the foregoing method examples.
  • each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • FIG. 15 shows a schematic diagram of a device for constructing a three-dimensional model of a target object.
  • the device for constructing a three-dimensional model of a target includes:
  • the first acquiring unit 1501 is configured to acquire at least two initial images of a target object at multiple shooting angles, the at least two initial images respectively record the depth information of the target object, and the depth information is used To record the distance between multiple points of the target and the reference position;
  • a second acquiring unit 1502 which is configured to respectively acquire first point cloud information corresponding to the at least two initial images according to the depth information in the at least two initial images acquired by the first acquiring unit 1501;
  • a fusion unit 1503 which is configured to fuse first point cloud information corresponding to the at least two initial images acquired by the second acquisition unit 1502 into second point cloud information;
  • the construction unit 1504 is configured to construct a three-dimensional model of the target object according to the second point cloud information obtained by the fusion unit 1503.
  • the device further includes a feature point detection unit 1505, and the feature point detection unit 1505 is configured to:
  • the camera pose is used to indicate the movement of the target relative to the reference position in different initial images, and the movement includes rotation and translation
  • the reference position is the position of the shooting lens for shooting the target
  • the fusion unit 1503 is also used for:
  • the first point cloud information corresponding to the at least two initial images is fused into the second point cloud information according to the camera pose.
  • the fusion unit 1503 is also used for:
  • the first point overlapping between different initial images is merged into a second point, where the first point is a point in the first point cloud information, and the second point is a point in the second point cloud information .
  • the fusion unit 1503 is also used for:
  • the overlapping first point is merged into the second point according to the weight.
  • the fusion unit 1503 is also used for:
  • the first point in the first initial image whose absolute depth difference from the first point in the first frame is smaller is acquired, and
  • point cloud fusion is performed with the first point in the first frame to obtain the second point, where the first initial image is an image of the at least two initial images that is not the first frame.
  • construction unit 1504 is also used to:
  • the device further includes a screening unit 1506, and the screening unit 1506 is configured to:
  • an embodiment of the present application also provides a storage medium, where the storage medium is used to store a computer program, and the computer program is used to execute the method provided in the foregoing embodiment.
  • the embodiments of the present application also provide a computer program product including instructions, which when run on a computer, cause the computer to execute the method provided in the above-mentioned embodiments.
  • the steps of the method or algorithm described in the embodiments disclosed in this document can be directly implemented by hardware, a software module executed by a processor, or a combination of the two.
  • the software module can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or all areas in the technical field. Any other known storage media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Multimedia (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The method for constructing a three-dimensional model of a target object provided by the embodiments of this application includes: acquiring at least two initial images of the target object at multiple shooting angles, where the at least two initial images respectively record depth information of the target object, and the depth information is used to record the distances between multiple points of the target object and a reference position; acquiring, according to the depth information in the at least two initial images, first point cloud information corresponding to each of the at least two initial images; fusing the first point cloud information corresponding to the at least two initial images into second point cloud information; and constructing a three-dimensional model of the target object according to the second point cloud information. This application further provides an apparatus, a device and a medium. The three-dimensional model can be built without allocating additional storage space: the three-dimensional model of the target object is constructed directly by point cloud fusion, which maximizes the utilization efficiency of the storage space and enables a terminal to perform the modeling process of face reconstruction efficiently.

Description

Method for constructing three-dimensional model of target object, and related apparatus
This application claims priority to Chinese Patent Application No. 202010003052.X, entitled "Method, apparatus, device and medium for constructing a three-dimensional model of a target object", filed with the Chinese Patent Office on January 2, 2020, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of electronic technology, and relates to the construction of a three-dimensional model of a target object.
Background
Face reconstruction technology reconstructs a 3D face model from one or more 2D face images. In a typical workflow, the user stands within the field of view of the shooting lens and turns the head as instructed, so that the lens can capture the user's face from different angles; the instruction information telling the user to turn the head may be issued by the terminal that controls the shooting lens, for example a smartphone or tablet computer with a camera function.
Face reconstruction technology is widely used in many fields. In the entertainment field, for example, when a user plays a 3D game, the user's face model can be reconstructed so that the game character takes on the user's appearance, making the construction of 3D game characters more personalized.
Summary
In view of this, to solve the above problems, the technical solutions provided by this application are as follows:
In one aspect, an embodiment of this application provides a method for constructing a three-dimensional model of a target object, including:
acquiring at least two initial images of the target object at multiple shooting angles, where the at least two initial images respectively record depth information of the target object, and the depth information is used to record the distances between multiple points of the target object and a reference position;
acquiring, according to the depth information in the at least two initial images, first point cloud information corresponding to each of the at least two initial images;
fusing the first point cloud information corresponding to the at least two initial images into second point cloud information; and
constructing a three-dimensional model of the target object according to the second point cloud information.
In another aspect, an embodiment of this application provides an apparatus for constructing a three-dimensional model of a target object, including:
a first acquiring unit, configured to acquire at least two initial images of the target object at multiple shooting angles, where the at least two initial images respectively record depth information of the target object, and the depth information is used to record the distances between multiple points of the target object and a reference position;
a second acquiring unit, configured to acquire, according to the depth information in the at least two initial images acquired by the first acquiring unit, first point cloud information corresponding to each of the at least two initial images;
a fusion unit, configured to fuse the first point cloud information corresponding to the at least two initial images acquired by the second acquiring unit into second point cloud information; and
a construction unit, configured to construct a three-dimensional model of the target object according to the second point cloud information obtained by the fusion unit.
In another aspect, an embodiment of this application provides a computer device, including an interactive apparatus, an input/output (I/O) interface, a processor and a memory, where the memory stores program instructions; the interactive apparatus is configured to obtain operation instructions input by a user; and the processor is configured to execute the program instructions stored in the memory to perform the method of the above aspect.
In another aspect, an embodiment of this application provides a storage medium, where the storage medium is used to store a computer program, and the computer program is used to perform the method of the above aspect.
In yet another aspect, an embodiment of this application provides a computer program product including instructions which, when run on a computer, cause the computer to perform the method of the above aspect.
The method for constructing a three-dimensional model of a target object provided by the embodiments of this application includes: acquiring at least two initial images of the target object at multiple shooting angles, where the at least two initial images respectively record depth information of the target object, and the depth information is used to record the distances between multiple points of the target object and a reference position; acquiring, according to the depth information in the at least two initial images, first point cloud information corresponding to each of the at least two initial images; fusing the first point cloud information corresponding to the at least two initial images into second point cloud information; and constructing a three-dimensional model of the target object according to the second point cloud information. The three-dimensional model can be built without allocating additional storage space: the three-dimensional model of the target object is constructed directly by point cloud fusion, which maximizes the utilization efficiency of the storage space and enables the terminal to perform the modeling process of face reconstruction efficiently.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description are merely embodiments of this application, and a person of ordinary skill in the art may further obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flowchart of one embodiment of the method for constructing a three-dimensional model of a target object provided by an embodiment of this application;
FIG. 2 is a flowchart of another embodiment of the method for constructing a three-dimensional model of a target object provided by an embodiment of this application;
FIG. 3 is a flowchart of another embodiment of the method for constructing a three-dimensional model of a target object provided by an embodiment of this application;
FIG. 4 is a schematic diagram of one embodiment of the method for constructing a three-dimensional model of a target object provided by an embodiment of this application;
FIG. 5a is a flowchart of another embodiment of the method for constructing a three-dimensional model of a target object provided by an embodiment of this application;
FIG. 5b is a schematic diagram of the smoothing algorithm in the method for constructing a three-dimensional model of a target object provided by an embodiment of this application;
FIG. 5c is a schematic diagram of the smoothing processing in the method for constructing a three-dimensional model of a target object provided by an embodiment of this application;
FIG. 6 is a flowchart of another embodiment of the method for constructing a three-dimensional model of a target object provided by an embodiment of this application;
FIG. 7 is a schematic diagram of the initial images in the method for constructing a three-dimensional model of a target object provided by an embodiment of this application;
FIG. 8 is a schematic diagram of the second point cloud information in the method for constructing a three-dimensional model of a target object provided by an embodiment of this application;
FIG. 9 is a schematic diagram of the three-dimensional network in the method for constructing a three-dimensional model of a target object provided by an embodiment of this application;
FIG. 10 is a schematic diagram of the second projection surface in the method for constructing a three-dimensional model of a target object provided by an embodiment of this application;
FIG. 11 is a schematic diagram of the trimmed three-dimensional network in the method for constructing a three-dimensional model of a target object provided by an embodiment of this application;
FIG. 12 is a schematic diagram of the three-dimensional model in the method for constructing a three-dimensional model of a target object provided by an embodiment of this application;
FIG. 13 is a schematic diagram of the three-dimensional model in the method for constructing a three-dimensional model of a target object provided by an embodiment of this application;
FIG. 14 is a schematic diagram of the computer device provided by an embodiment of this application;
FIG. 15 is a schematic diagram of the apparatus for constructing a three-dimensional model of a target object provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings in the embodiments of this application. Apparently, the described embodiments are merely some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The terms "first", "second", "third", "fourth" and the like (if any) in the specification, the claims and the above accompanying drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described here can be implemented in an order other than that illustrated or described here. In addition, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product or device.
Face reconstruction technology reconstructs a 3D face model from one or more 2D face images. In a typical workflow, the user stands within the field of view of the shooting lens and turns the head as instructed, so that the lens can capture the user's face from different angles; the instruction information telling the user to turn the head may be issued by the terminal that controls the shooting lens, for example a smartphone or tablet computer with a camera function.
Face reconstruction technology is widely used in many fields. In the entertainment field, for example, when a user plays a 3D game, the user's face model can be reconstructed so that the game character takes on the user's appearance, making the construction of 3D game characters more personalized.
In some fields represented by the entertainment field above, face reconstruction technology has the following characteristics: entertainment and similar uses do not require particularly high accuracy; the 2D images used to construct the 3D model are captured by the user with the terminal itself; and the computing and storage capabilities of the terminal are limited, whereas the face reconstruction technology of the prior art requires a large amount of memory and computing power.
Therefore, to solve the above problems, the embodiments of this application provide a method for constructing a three-dimensional model of a target object, which can model acquired two-dimensional images on the terminal to obtain a three-dimensional face model; the obtained three-dimensional model can be used in many scenarios such as game software, social software and 3D printing. For ease of understanding, the method provided in the embodiments of this application is described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, as shown in FIG. 1, Embodiment 1 of the method for constructing a three-dimensional model of a target object provided by the embodiments of this application includes the following steps.
101. Acquire at least two initial images of the target object at multiple shooting angles.
In this embodiment, the specific type of the target object is not limited by the embodiments of this application; it may be, for example, a human face, a human body or any part of the human body, or various objects such as a doll or a car. For ease of understanding, the following embodiments of this application take a human face as an example for detailed description.
Further, the face may be photographed through the shooting lens of a terminal. The specific type of the terminal is likewise not limited by the embodiments of this application; it may be a smartphone, a tablet computer or the like. The terminal prompts the user to photograph the face from different angles, so as to obtain at least two initial images in which the face is recorded from different angles. Optionally, the initial images may also be obtained by scanning or the like, which is not limited by the embodiments of this application.
Furthermore, the above shooting lens needs to have the ability to record depth information, so that each of the obtained initial images records depth information of the target object, where the depth information is used to record the distance between each point of the target object and the shooting lens.
102. Acquire, according to the depth information in the at least two initial images, first point cloud information corresponding to each of the at least two initial images.
In this embodiment, for the image recorded in an initial image, every pixel records a piece of two-dimensional coordinate information; for example, the coordinate value of a pixel A is (x, y). Since the initial image also includes depth information, the distance from pixel A to the lens turns the coordinate of pixel A into a three-dimensional point whose coordinate value is (x, y, z). The first point cloud information records the coordinate values of multiple such three-dimensional points of the target object.
103. Fuse the first point cloud information corresponding to each of the at least two initial images into second point cloud information.
In this embodiment, each initial image records its own first point cloud information, i.e. the three-dimensional face points generated at the corresponding shooting angle. The first point cloud information corresponding to the initial images captured at different angles is moved and adjusted to the same angle and then fused to obtain second point cloud information, so that the second point cloud information records the point cloud of the target object more accurately.
For example, the target object is a human face, and the user, guided by the terminal, captures three initial images: the front face, the left side face and the right side face. After the processing of step 102 above, the first point cloud information of the three images is obtained, which records the point clouds of the user's front face, left side face and right side face respectively. After the first point cloud information corresponding to the three images is fused, the point cloud of the user's whole face is obtained as the second point cloud information.
104. Construct a three-dimensional model of the target object according to the second point cloud information.
In this embodiment, since the second point cloud information already records the three-dimensional coordinate information of each point of the target object, the three-dimensional model of the target object can be constructed according to the second point cloud information, thereby obtaining the three-dimensional model of the target object.
The method for constructing a three-dimensional model of a target object provided by the embodiments of this application includes: acquiring at least two initial images of the target object at different shooting angles; acquiring, according to the depth information in each initial image, the first point cloud information corresponding to that initial image; fusing the first point cloud information corresponding to the at least two initial images into second point cloud information; and constructing a three-dimensional model of the target object according to the second point cloud information. The three-dimensional model can be built without allocating additional storage space: the three-dimensional model of the target object is constructed directly by point cloud fusion, which maximizes the utilization efficiency of the storage space and enables the terminal to perform the modeling process of face reconstruction efficiently.
It should be noted that in step 103 above, when the first point cloud information is fused, the relationship between the first point cloud information corresponding to the different initial images needs to be known before the first point cloud information can be fused. For ease of understanding, a specific implementation is provided below to solve this problem.
Referring to FIG. 2, as shown in FIG. 2, Embodiment 2 of the method for constructing a three-dimensional model of a target object provided by the embodiments of this application includes the following steps.
201. Acquire at least two initial images of the target object at multiple shooting angles.
In this embodiment, refer to step 101 above for this step; details are not repeated here.
202. Perform feature point detection on the at least two initial images respectively, so as to obtain, in each of the at least two initial images, at least two feature points used to mark the target object.
In this embodiment, optionally, feature point detection can be implemented through landmark detection; specifically, feature point detection is performed on the initial images by a feature point detection model. The feature point detection model can be obtained through training; for example, feature points are manually marked in multiple face images to serve as training material, and the training material marks feature points such as the eye corners, nose tip and mouth corners in the face images. After the feature point detection model is trained with this training material, the resulting feature point detection model has the ability to mark face images: when an initial image is input, the feature point detection model can, according to its training result, mark the feature points in the initial image, such as the eye corners, nose tip and mouth corners of the face in the initial image. The specific training method of the above model may be any training method in the prior art, which is not limited by this application.
After feature point detection is performed on each initial image, the terminal can perform semantic recognition of the different parts in each initial image according to the marked feature points, so that the terminal can know the name of each part of the face image in each initial image; for example, the position marked by the first feature point A is an eye corner, the position marked by the second feature point B is the tip of the nose, the position marked by the third feature point C is a mouth corner, and so on.
203. Acquire, between the at least two initial images, the offsets between the at least two feature points.
In this embodiment, an offset is used to identify the coordinate difference, between different initial images, of feature points at the same position on the target object. For example, in the front face image the feature point of the nose tip is B, and in the left side face image the feature point of the nose tip is B'; by calculating the offset between feature point B and feature point B', the angle by which the user has turned the face to the right while being photographed can be known. Further, the offsets between other feature points, such as the offsets of the eye corner feature points and the mouth corner feature points, can be calculated in the same way.
204. Acquire, according to the offsets, the camera poses of the target object in the at least two initial images.
In this embodiment, the camera pose is used to indicate the movement of the target object relative to the shooting lens in different initial images, and the movement includes at least one of rotation and translation. In a specific implementation, the initial images may be acquired by the user taking multiple two-dimensional photos at different angles under the instruction of the terminal; because the shooting angles differ, the angle of the user's face relative to the shooting lens is different in each initial image, that is, the face is rotated or translated to different degrees relative to the shooting lens. The camera pose of the target object can be obtained from the offsets calculated in the preceding step, and the camera pose can be used to characterize the change of the target object between different initial images.
Optionally, since the method provided by the embodiments of this application is mainly applied in entertainment scenarios, the initial images are generally obtained by the user performing shooting under the instruction of the terminal; for example, the user rotates the face relative to the shooting lens at a certain speed, and during this process the terminal controls the shooting lens to shoot at preset time intervals to obtain initial images from different angles. Since the user is not a professional, the speed of the face rotation cannot be guaranteed to be stable and linear, and it may happen that the user rotates the face slowly at a certain angle, resulting in multiple initial images at the same shooting angle. Therefore, after the initial images are obtained, they can be filtered. The specific implementation is as follows:
eliminate, from the at least two initial images, images whose similarity is greater than a preset value.
In this embodiment, the similarity between initial images can be judged by the camera pose. When the camera pose similarity between two initial images is greater than the preset value, it can be judged that the shooting angles of the two initial images are relatively close, and one of the two initial images can then be eliminated, thereby avoiding repeated processing of initial images with similar angles in the subsequent process.
205. Fuse, according to the camera poses, the first point cloud information corresponding to each of the at least two initial images into second point cloud information.
In this embodiment, the first point cloud information corresponding to different initial images is a set of three-dimensional coordinate points generated from different shooting angles. Ideally, for the same target object, the sets of three-dimensional coordinate points reconstructed from two-dimensional images shot at different angles should be identical. In practice, however, errors caused by shooting angles, pixel noise and the like introduce certain discrepancies between the sets of three-dimensional coordinate points generated at different shooting angles. Therefore, the first point cloud information corresponding to the different initial images needs to be fused according to the changes of the target object between different initial images recorded by the camera poses, so as to obtain the second point cloud information closest to the actual situation of the target object.
206. Construct a three-dimensional model of the target object according to the second point cloud information.
In this embodiment, refer to step 104 above for this step; details are not repeated here.
In this embodiment, before point cloud fusion is performed, the semantics of the initial images are acquired through feature point detection, so that the machine can obtain the movement relationship between the initial images according to the detected feature points and generate the camera poses, and then perform point cloud fusion according to the camera poses, so that the point cloud information generated at different shooting angles can be accurately fused according to the relationship obtained from the camera poses.
Further, the embodiments of this application provide a more detailed implementation to describe the specific process of point cloud fusion.
请参阅图3,如图3所示,本申请实施例所提供的目标物的三维模型构建方法的实施例三包括以下步骤。
步骤301至304可参阅上述步骤201至204,此处不再赘述。
305、从至少两个初始图像中确定一个初始图像为第一帧。
本实施例中,从获取的至少两个初始图像中确定一个为第一帧,例如,从正脸图像、左侧脸图像和右侧脸图像中确定正脸图像作为第一帧,该第一帧可以视为一个初始的基准点。
306、根据相机位姿将至少两个初始图像中除第一帧以外的其他初始图像的点移动到第一帧的角度。
本实施例中,不同的初始图像分别对应有一个自己的第一点云信息,例如,正脸图像(第一帧)生成了点云信息A,左侧脸图像生成了点云信息B,右侧脸图像生成了点云信息C;其中,根据相机位姿可知,左侧脸图像的拍摄角度为正脸向右旋转90°,右侧脸图像为正脸向左旋转90°。则此步骤中,需要对左侧脸图像和右侧脸图像所对应的点云信息进行移动:将左侧脸图像的点云信息B向左旋转90°,右侧脸图像的点云信息C向右旋转90°,以使得左侧脸图像和右侧脸图像所对应的点全部移动到正脸图像(第一帧的角度)。
307、将不同的初始图像之间重叠的第一点融合为第二点。
本实施例中,第一点为第一点云信息中的点,第二点为第二点云信息中的点。对第一点云信息中的每个点分别进行融合,以得到第二点云信息。例如,正脸图像中用户鼻尖处有一三维点A1,坐标值为(x1,y1,z1);左侧脸图像中用户鼻尖处有一三维点B1,坐标值为(x2,y2,z2);右侧脸图像中用户鼻尖处有一三维点C1,坐标值为(x3,y3,z3);根据上述步骤306对左侧脸图像和右侧脸图像进行旋转后,A1、B1及C1三个三维点重合,此时,对A1、B1及C1执行融合,得到三维点D1(x4,y4,z4),该三维点D1即为第二点云信息中用户鼻尖处的三维点。
需要说明的是,在上述工作的过程中,由于初始图像的拍摄角度及图像噪声等原因,上述A1、B1及C1三个三维点不可能完全重叠,因此,需要对A1、B1及C1三个三维点分配不同的权重后再进行融合,具体包括如下步骤。
1、分别对至少两个初始图像中的第一点分配不同的权重。
本实施例中,例如,初始图像包括正脸图像、左侧脸图像和右侧脸图像三张,三张图像中均包括一个用于表示用户鼻尖的点(作为举例,鼻尖点即为第一点),此时,在执行点融合时,需要对来自不同图像的点分配不同的权重,可选地,可以根据第一点所在初始图像的拍摄角度、图像噪声值或法线方向中的至少一种对第一点分配权重值,例如,正脸图像的拍摄角度最正,对于鼻尖这个点来说准确率较高,因此分配正脸图像的第一点的权重为60%,左侧脸图像和右侧脸图像的第一点的权重分别为20%。
2、根据权重将重叠的第一点融合为第二点。
本实施例中,如上述举例,正脸图像的第一点的权重为60%,左侧脸图像和右侧脸图像的第一点的权重分别为20%,三者权重之和为100%,则融合后所得到的第二点D1的三维坐标为加权平均的结果:x4=x1*60%+x2*20%+x3*20%;y4=y1*60%+y2*20%+y3*20%;z4=z1*60%+z2*20%+z3*20%。从而能够根据不同的权重,对来自不同初始图像的三维点进行更加精确的融合。
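以下为按权重融合重叠第一点的示意代码(权重数值仅为举例):
import numpy as np

def fuse_points(points, weights):
    # points:相互重叠的第一点坐标,M x 3;weights:对应权重
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # 归一化,保证结果为加权平均
    return weights @ points  # 即 sum_i w_i * p_i,得到第二点的坐标

# 例如:正脸、左侧脸、右侧脸三张图像中的鼻尖点按 60%、20%、20% 融合
# D1 = fuse_points([A1, B1, C1], [0.6, 0.2, 0.2])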
需要说明的是,在上述点云融合的过程中,还有可能会遇到一种特殊的情况,请参阅图4,如图4所示,用户的正脸图像中,包括右眼角A1和左眼角A2两个三维点;右侧脸图像中,由于用户的面部左转了90°,此时在右侧脸图像中,用于表示右眼角的三维点B1和用于表示左眼角的三维点B2在右侧脸图像中是重叠的,此时会造成一个问题,即机器无法区分B1和B2两个点,从而无法判断B1和B2两个点中哪一个应该与点A1融合。为此,在遇到此类问题时,本申请实施例提供如下步骤。
当第一初始图像中存在两个重叠的第一点时,在第一初始图像中选取与第一帧中的第一点深度差的绝对值更小的那个第一点,与第一帧中的第一点进行点云融合。
本实施例中,对于B1和B2两个点,只是在x和y两个坐标值上重叠,用于表征二者深度信息的z坐标值是不同的,例如,B1的坐标值为(x1、y1、z1),B2的坐标值为(x2、y2、z2),当B1和B2两个点在初始图像中重叠时,x1=x2,y1=y2,但z1≠z2;A1点的坐标为(x3、y3、z3),此时,对深度信息z坐标作差,得到D1=|z3-z1|和D2=|z3-z2|,由于B1点和A1点实际上都是用于表示右眼角的坐标点,二者的深度值会更接近一些,因此可以得到D1<D2,从而可知,B1才是应该与A1进行融合的点。
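该选取过程可以用如下示意代码表示:
def pick_by_depth(candidates, ref_point):
    # candidates:在 x、y 上重叠的若干第一点,例如 [B1, B2],每个点为 (x, y, z)
    # ref_point:第一帧中的第一点,例如 A1
    return min(candidates, key=lambda p: abs(p[2] - ref_point[2]))  # 返回深度差绝对值更小的点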
308、根据第二点云信息构建目标物的三维模型。
本实施例中,本步骤可参阅上述步骤104,此处不再赘述。
本实施例中,将每个初始图像中的有效点云均投影到参考坐标系下(即第一帧所在的相机坐标系),之后对于重叠区域的内点进行加权融合,从而能够更精准地得到第二点云信息,进而建立更加精确的三维模型。
需要说明的是,在得到第二点云信息后,需要对第二点云信息进行进一步的处理,以得到目标物的三维模型,以下本申请实施例提供一种基于第二点云信息得到目标物的三维模型的具体实施方式,为便于理解,以下结合附图进行详细说明。
请参阅图5a,如图5a所示,本申请实施例所提供的目标物的三维模型构建方法的实施例四包括以下步骤。
步骤501至507可参阅上述步骤301至307,此处不再赘述。
508、对第二点云信息进行泊松重建,以得到目标物的三维网络。
本实施例中,三维网络为连接第二点云信息中各个点的无孔洞表面。本申请所提供的技术方案中所使用的泊松重建为现有技术中的泊松重建技术,对此本申请实施例并不进行限定,泊松重建的目的在于生成一个水密性的无孔洞表面,泊松重建在构建泊松方程时,输入是点云及其对应法线,点云来自上一步直接融合后的结果。为了保证法线方向的一致性,直接从所筛选帧的深度图像上提取法线,即将深度图像看作Z(x,y)函数。
可选地,泊松重建可以通过以下公式来实现。
dzdx=(z(x+1,y)-z(x-1,y))/2.0;(公式1)
dzdy=(z(x,y+1)-z(x,y-1))/2.0;(公式2)
direction=(-dzdx,-dzdy,1.0);(公式3)
magnitude=sqrt(direction.x**2+direction.y**2+direction.z**2)(公式4)
normal=direction/magnitude(公式5)
上述公式中,x、y及z分别为各个三维点的横坐标、纵坐标和深度坐标,direction为法线方向,magnitude为法线向量的大小,如公式5所示,最终得到的法线normal等于法线方向除以法线大小,即法线的模。从而实现了法线的提取。
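按照公式1至公式5,法线提取可以用numpy在深度图像Z(x,y)上示意性地实现如下(边界像素因循环移位并不精确,仅作示意);得到点云及其法线后,泊松重建本身可以借助现有的开源实现完成:
import numpy as np

def normals_from_depth(z):
    # z:H x W 的深度图像,视作函数 Z(x, y)
    dzdx = (np.roll(z, -1, axis=1) - np.roll(z, 1, axis=1)) / 2.0  # 公式1
    dzdy = (np.roll(z, -1, axis=0) - np.roll(z, 1, axis=0)) / 2.0  # 公式2
    direction = np.stack([-dzdx, -dzdy, np.ones_like(z)], axis=-1)  # 公式3:法线方向
    magnitude = np.linalg.norm(direction, axis=-1, keepdims=True)  # 公式4:法线向量的模
    return direction / magnitude  # 公式5:法线 = 方向 / 模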
509、对三维网络进行剪裁和平滑处理,以得到三维模型。
本实施例中,通过泊松重建所得到的三维网络已经具备了三维模型的雏形,此时,该三维网络中可能会包含一些背景的形状,同时人脸三维网络中的表面存在一些不平滑的现象,因此需要对三维网络进行后处理,后处理步骤就包括剪裁和平滑处理。可选地,后处理可以具体包括以下步骤。
1、根据特征点对三维网络沿垂直于镜头面的方向投影,得到第一投影面。
本实施例中,将三维网络中的点沿着z轴方向投影,即可得到第一投影面。
2、在第一投影面中将特征点连接成凸包,获取凸包所在的区域为第二投影面。
本实施例中,凸包(Convex Hull)是一个计算几何(图形学)中的概念。在一个实数向量空间V中,对于给定集合X,所有包含X的凸集的交集S被称为X的凸包。X的凸包可以用X内所有点(X1,...Xn)的凸组合来构造;在二维欧几里得空间中,凸包可想象为一条刚好包住所有点的橡皮圈。通过连接特征点的方式构建凸包,由于只有人脸上才会有能够构建凸包的弧面,因此可以通过此种方式,来区分人脸所在的区域和背景区域,最终所得到的第二投影面即为人脸所在的区域。
3、根据第二投影面对三维网络进行剪裁,以剔除非目标物的三维网络。
本实施例中,根据第二投影面,在三维网络中剔除投影面为第二投影面以外的部分,从而实现剪裁,剔除非目标物的三维网络,得到了一个仅有人脸区域的三维网络。
4、对剪裁后的三维网络进行平滑处理,得到三维模型。
本实施例中,在完成平滑化处理后,即可得到目标物的三维模型,可选地,该平滑处理可以通过HC Laplacian Smooth平滑处理来实现。
可选地,平滑处理可以通过如图5b所示的HC Laplacian Smooth算法实现。作为一种优选的实施方式,为了保留更多的细节,影响因子factor1设为0.0,factor2设为0.9。图5b所示的算法思路为:选中一个原始点,以及与该原始点相连的点,通过预设规则和加权平均得到新的点从而取代原始点。图5b中,p和o均表示原始点,通过“p:=o”,将o的坐标值赋予给p,之后通过“repeat”循环执行后续代码,通过“q:=p”将p代表的坐标值赋给q;i表示某一顶点(即待平滑顶点)的下标,Vear表示所有顶点,“for all i∈Vear do”表示对所有待平滑顶点执行后续公式;n表示与顶点i相连接的顶点个数,Adj表示邻接点,“if n≠0 then”表示当与顶点i相连接的顶点个数不为0时,执行后续算法;pi表示平均点,即与顶点i相连的所有邻接点加权平均后的结果。通过引入α和β两个参数,防止平滑后的模型收缩,同时使得平滑效果可以收敛;bi的作用是确定一个方向,即向原始点靠拢,bi可以理解为qi和pi的连线,其长度由α来决定;di由所有邻接点的bi的加权平均决定,即效果由β决定,每个原始点都有一个对应的bi。如图5c所示,经过图5b的算法流程,qj2和qj3两个点分别经过平滑,得到pj2和pj3两个平滑后的点,得到的qj1、pj2、qi、pj3及qj4的连线即为平滑后的曲线。
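按照图5b的思路,HC Laplacian Smooth的迭代过程可以示意性地实现如下(alpha对应影响因子factor1=0.0,beta对应factor2=0.9,邻接关系来自三维网络的边;仅为便于理解的草图,并非本申请限定的实现):
import numpy as np

def hc_laplacian_smooth(o, adj, alpha=0.0, beta=0.9, iterations=1):
    # o:V x 3 的原始顶点坐标;adj:adj[i] 为与顶点 i 相连的顶点下标列表
    o = np.asarray(o, dtype=float)
    p = o.copy()
    for _ in range(iterations):
        q = p.copy()  # 对应 q := p
        b = np.zeros_like(p)
        for i in range(len(p)):
            if len(adj[i]) != 0:  # 对应 if n ≠ 0 then
                p[i] = q[adj[i]].mean(axis=0)  # pi:邻接点的平均点
            b[i] = p[i] - (alpha * o[i] + (1 - alpha) * q[i])  # bi:向原始点/上一步的点靠拢的方向
        for i in range(len(p)):
            if len(adj[i]) != 0:
                d = beta * b[i] + (1 - beta) * b[adj[i]].mean(axis=0)  # di:由邻接点的 bi 加权平均决定
                p[i] = p[i] - d  # 修正,防止平滑后的模型收缩
    return p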
上述步骤中,通过融合之后得到的第二点云信息进行后处理,从而得到剪裁过后的、平滑的三维模型,完成了目标物三维模型的重建。
需要说明的是,本申请实施例所提供的目标物的三维模型构建方法可以基于面元Surfel模型来实现,以下结合具体使用场景,对本申请实施例所提供的目标物的三维模型构建方法进行详细说明。
请参阅图6,如图6所示,本申请实施例所提供的目标物的三维模型构建方法的实施例五包括以下步骤。
601、获取目标物在不同拍摄角度上的至少两个初始图像。
本实施例中,本步骤的具体实现方式可参阅上述步骤101,此处不再赘述。请参阅图7,如图7所示,用户在终端的指引下拍摄了8张不同角度的照片,从而得到了8张不同拍摄角度上的初始图像。
602、分别获取初始图像中的色彩信息和深度信息。
本实施例中,色彩信息来自于拍摄镜头所记录的图像中的RGB信息,同时,该拍摄镜头具备记录深度信息的能力,例如可以为深度相机,从而初始图像中还包括有图像的深度信息。对于获取到的上述8张初始图像,分别获取其色彩信息和深度信息。
603、对初始图像分别进行特征点检测,得到特征点信息。
本实施例中,本步骤可参阅上述步骤202,此处不再赘述。经过特征点检测,如图7所示的8张初始图像中的人脸被标记有多个特征点701作为特征点信息。
604、根据特征点信息分别获取每个初始图像的相机位姿。
本实施例中,根据特征点获取每个特征点的偏移量,之后根据偏移量获取每个初始图像中目标物的相机位姿。具体实现方式可参阅上述步骤203至204,此处不再赘述。
605、根据相机位姿执行帧筛选。
本实施例中,相机位姿反映了每个初始图像中目标物的拍摄角度,通过相机位姿,可以筛选出拍摄角度相近的初始图像并剔除,避免了对拍摄角度相近的初始图像的重复处理,从而实现了帧筛选的工作过程。详细工作步骤可参阅步骤204的相关记载,此处不再赘述。
606、根据每个初始图像中的深度信息,分别获取每个初始图像对应的第一点云信息。
本实施例中,对深度信息进行反投影,得到目标物的三维点信息,即第一点云信息。具体步骤可参阅上述步骤102,此处不再赘述。
607、根据相机位姿将各个第一点云信息融合为第二点云信息。
本实施例中,执行点云融合的详细实现方式可参阅上述步骤305至307,此处不再赘述。当执行完点云融合后,所得到的第二点云信息如图8所示,第二点云信息中包括多个三维点801,其中,每个三维点801均为步骤307中所述的第二点。
608、对第二点云信息进行泊松重建,以得到目标物的三维网络。
本实施例中,本步骤的具体实现方式可参阅上述步骤508,此处不再赘述。请参阅图9,当完成泊松重建后,得到如图9所示的三维网络(mesh),该三维网络中包括人脸部分901和背景部分902。
609、对三维网络进行剪裁和平滑处理,以得到三维模型。
本实施例中,本步骤的具体实施方式可参阅上述步骤509,此处不再赘述。其中,需要说明的是,在执行剪裁的过程中,根据特征点对三维网络沿垂直于镜头面的方向投影,得到第一投影面,之后在第一投影面中将特征点连接成凸包,获取凸包所在的区域为第二投影面。所得到的第二投影面如图10所示,图10中的1001即为人脸所在区域的第二投影面。之后根据第二投影面对三维网络进行剪裁,以剔除非目标物的三维网络,即图9中的背景部分902,留下人脸部分901,得到如图11所示的只包含人脸1101的三维网络。最后对三维网络进行平滑处理,得到如图12所示的关于人脸1201的三维模型。
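剪裁步骤可以示意性地实现如下(借助scipy对特征点投影构成的凸包做点的内外判断,仅为一种可行的示例做法):
import numpy as np
from scipy.spatial import Delaunay

def crop_by_landmark_hull(vertices, landmarks_3d):
    # vertices:三维网络的全部顶点,V x 3;landmarks_3d:特征点对应的三维点,N x 3
    hull_region = Delaunay(np.asarray(landmarks_3d, dtype=float)[:, :2])  # 特征点沿 z 轴投影并连接成凸包(第二投影面)
    proj = np.asarray(vertices, dtype=float)[:, :2]  # 顶点沿垂直于镜头面的方向投影(第一投影面)
    inside = hull_region.find_simplex(proj) >= 0  # 投影落在凸包内的顶点属于人脸区域
    return np.asarray(vertices)[inside]  # 剔除非目标物部分后剩余的三维网络顶点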
610、根据初始图像的色彩信息对三维模型进行纹理贴图。
本实施例中,上述步骤609所得到的三维模型为一个仅具备人脸外形的模型,不包括色彩信息,本步骤中,通过从初始图像得到的色彩信息对三维模型进行纹理贴图,使得三维模型具备色彩信息,得到如图13所示的具有纹理色彩信息的人脸三维模型,该三维模型可以任意旋转,图13中示出的为该三维模型的正视图1301、右侧视图1302和左侧视图1303。其中,上述纹理贴图的具体实现方式为现有技术中任意一种实现方式,对此本申请实施例不进行具体说明。
本实施例中,通过本申请实施例所提供的方法,基于图7中的初始图像,最终得到如图12所示的三维模型,经过纹理贴图后可得到图13所示的三维模型,从而实现了基于二维图像的三维模型重建。
本申请实施例所提供的目标物的三维模型构建方法,包括:获取目标物在不同拍摄角度上的至少两个初始图像;根据每个初始图像中的深度信息,分别获取所述每个初始图像对应的第一点云信息;将所述至少两个初始图像分别对应的第一点云信息融合为第二点云信息;根据所述第二点云信息构建所述目标物的三维模型。不需要建立额外的存储空间即可实现三维模型的建立过程,直接通过点云融合的方式来构建目标物的三维模型,实现了存储空间利用效率的最大化,使得终端能够高效地执行人脸重建的建模过程。
上述对本申请实施例提供的方案进行了介绍。可以理解的是,计算机设备为了实现上述功能,其包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的模块及算法步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
从硬件结构上来描述,上述方法可以由一个实体设备实现,也可以由多个实体设备共同实现,还可以是一个实体设备内的一个逻辑功能模块,本申请实施例对此不作具体限定。
例如,上述方法均可以通过图14中的计算机设备来实现。图14为本申请实施例提供的计算机设备的硬件结构示意图。该计算机设备包括至少一个处理器1401,通信线路1402,存储器1403以及至少一个通信接口1404。
处理器1401可以是一个通用中央处理器(central processing unit,CPU),微处理器,特定应用集成电路(application-specific integrated circuit,ASIC),或一个或多个用于控制本申请方案程序执行的集成电路。
通信线路1402可包括一通路,在上述组件之间传送信息。
通信接口1404,使用任何收发器一类的装置,用于与其他设备或通信网络通信,如以太网,无线接入网(radio access network,RAN),无线局域网(wireless local area networks,WLAN)等。
存储器1403可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器可以是独立存在,通过通信线路1402与处理器相连接。存储器也可以和处理器集成在一起。
其中,存储器1403用于存储执行本申请方案的计算机执行指令,并由处理器1401来控制执行。处理器1401用于执行存储器1403中存储的计算机执行指令,从而实现本申请上述实施例提供的方法。
可选的,本申请实施例中的计算机执行指令也可以称之为应用程序代码,本申请实施例对此不作具体限定。
在具体实现中,作为一种实施例,处理器1401可以包括一个或多个CPU,例如图14中的CPU0和CPU1。
在具体实现中,作为一种实施例,计算机设备可以包括多个处理器,例如图14中的处理器1401和处理器1407。这些处理器中的每一个可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。这里的处理器可以指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。
在具体实现中,作为一种实施例,计算机设备还可以包括输出设备1405和输入设备1406。输出设备1405和处理器1401通信,可以以多种方式来显示信息。例如,输出设备1405可以是液晶显示器(liquid crystal display,LCD),发光二极管(light emitting diode,LED)显示设备,阴极射线管(cathode ray tube,CRT)显示设备,或投影仪(projector)等。输入设备1406和处理器1401通信,可以以多种方式接收用户的输入。例如,输入设备1406可以是鼠标、键盘、触摸屏设备或传感设备等。
上述的计算机设备可以是一个通用设备或者是一个专用设备。在具体实现中,计算机设备可以是台式机、便携式电脑、网络服务器、掌上电脑(personal digital assistant,PDA)、移动手机、平板电脑、无线终端设备、嵌入式设备或有图14中类似结构的设备。本申请实施例不限定计算机设备的类型。
本申请实施例可以根据上述方法示例对存储设备进行功能单元的划分,例如,可以对应各个功能划分各个功能单元,也可以将两个或两个以上的功能集成在一个处理单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。需要说明的是,本申请实施例中对单元的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
比如,以采用集成的方式划分各个功能单元的情况下,图15示出了一种目标物的三维模型构建装置的示意图。
如图15所示,本申请实施例提供的目标物的三维模型构建装置,包括:
第一获取单元1501,该第一获取单元1501用于获取目标物在多个拍摄角度上的至少两个初始图像,该至少两个初始图像分别记录有该目标物的深度信息,该深度信息用于记录该目标物的多个点与参考位置之间的距离;
第二获取单元1502,该第二获取单元1502用于根据该第一获取单元1501获取的该至少两个初始图像中的深度信息,分别获取该至少两个初始图像对应的第一点云信息;
融合单元1503,该融合单元1503用于将该第二获取单元1502获取的该至少两个初始图像分别对应的第一点云信息融合为第二点云信息;
构建单元1504,该构建单元1504用于根据该融合单元1503得到的该第二点云信息构建该目标物的三维模型。
可选地,该装置还包括特征点检测单元1505,该特征点检测单元1505用于:
对该至少两个初始图像分别进行特征点检测,以在该至少两个初始图像中分别得到用于标记该目标物的至少两个特征点,所述至少两个特征点用于标识所述目标物的同一位置分别在所述至少两个初始图像中的特征点;
在该至少两个初始图像之间,获取该至少两个特征点间的偏移量,该偏移量用于表示该目标物的同一位置的特征点在不同初始图像之间的坐标差值;
根据该偏移量获取该至少两个初始图像中该目标物的相机位姿,该相机位姿用于表示在不同的初始图像中该目标物相对该参考位置的移动,该移动包括旋转和平移中的至少一种,该参考位置为拍摄该目标物的拍摄镜头所在的位置;
该融合单元1503,还用于:
根据该相机位姿将该至少两个初始图像分别对应的第一点云信息融合为该第二点云信息。
可选地,该融合单元1503还用于:
从所述至少两个初始图像中确定一个初始图像为第一帧;
根据该相机位姿将所述至少两个初始图像中除第一帧以外的其他初始图像的点移动到该第一帧的角度;
将不同的该初始图像之间重叠的第一点融合为第二点,其中,该第一点为该第一点云信息中的点,该第二点为该第二点云信息中的点。
可选地,该融合单元1503还用于:
分别对不同该初始图像中的第一点分配不同的权重;
根据该权重将重叠的该第一点融合为该第二点。
可选地,该融合单元1503还用于:
根据该第一点所在初始图像的拍摄角度、图像噪声值或法线方向中的至少一种对该第一点分配权重值。
可选地,该融合单元1503还用于:
当第一初始图像中存在两个重叠的第一点时,在该第一初始图像中获取与该第一帧中的第一点深度差的绝对值更小的第一点与该第一帧中的第一点进行点云融合,以得到该第二点,其中,该第一初始图像为该至少两个初始图像中不为该第一帧的图像。
可选地,该构建单元1504还用于:
对该第二点云信息进行泊松重建,以得到该目标物的三维网络,该三维网络为连接该第二点云信息中各个点的无孔洞表面;
对该三维网络进行剪裁和平滑处理,以得到该三维模型。
可选地,该构建单元1504还用于:
根据该特征点对该三维网络沿垂直于镜头面的方向投影,得到第一投影面;
在该第一投影面中将该特征点连接成凸包,获取凸包所在的区域为第二投影面;
根据该第二投影面对该三维网络进行剪裁,以剔除非该目标物的三维网络;
对剪裁后的该三维网络进行平滑处理,得到该三维模型。
可选地,该装置还包括筛选单元1506,该筛选单元1506用于:
剔除该至少两个初始图像中相似度大于预设值的图像。
另外,本申请实施例还提供了一种存储介质,所述存储介质用于存储计算机程序,所述计算机程序用于执行上述实施例提供的方法。
本申请实施例还提供了一种包括指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述实施例提供的方法。
有关本申请实施例提供的计算机存储介质中存储的程序的详细描述可参照上述实施例,在此不做赘述。
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似部分互相参见即可。对于实施例公开的装置而言,由于其与实施例公开的方法相对应,所以描述的比较简单,相关之处参见方法部分说明即可。
专业人员还可以进一步意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
结合本文中所公开的实施例描述的方法或算法的步骤可以直接用硬件、处理器执行的软件模块,或者二者的结合来实施。软件模块可以置于随机存储器(RAM)、内存、只读存储器(ROM)、电可编程ROM、电可擦除可编程ROM、寄存器、硬盘、可移动磁盘、CD-ROM、或技术领域内所公知的任意其它形式的存储介质中。
对所公开的实施例的上述说明,使本领域专业技术人员能够实现或使用本申请。对这些实施例的多种修改对本领域的专业技术人员来说将是显而易见的,本文中所定义的一般原理可以在不脱离本申请的核心思想或范围的情况下,在其它实施例中实现。因此,本申请将不会被限制于本文所示的这些实施例,而是要符合与本文所公开的原理和新颖特点相一致的最宽的范围。

Claims (16)

  1. 一种目标物的三维模型构建方法,所述方法由终端设备执行,所述方法包括:
    获取目标物在多个拍摄角度上的至少两个初始图像,所述至少两个初始图像分别记录有所述目标物的深度信息,所述深度信息用于记录所述目标物的多个点与参考位置之间的距离;
    根据所述至少两个初始图像中的深度信息,分别获取所述至少两个初始图像对应的第一点云信息;
    将所述至少两个初始图像分别对应的第一点云信息融合为第二点云信息;
    根据所述第二点云信息构建所述目标物的三维模型。
  2. 根据权利要求1所述的方法,所述获取目标物在多个拍摄角度上的至少两个初始图像之后,还包括:
    对所述至少两个初始图像分别进行特征点检测,以在所述至少两个初始图像中分别得到用于标记所述目标物的至少两个特征点,所述至少两个特征点用于标识所述目标物的同一位置分别在所述至少两个初始图像中的特征点;
    在所述至少两个初始图像之间,获取所述至少两个特征点间的偏移量,所述偏移量用于标识所述目标物的同一位置的特征点在不同初始图像之间的坐标差值;
    根据所述偏移量获取所述至少两个初始图像中所述目标物的相机位姿,所述相机位姿用于表示在不同的初始图像中所述目标物相对所述参考位置的移动,所述移动包括旋转和平移中的至少一种,所述参考位置为拍摄所述目标物的拍摄镜头所在的位置;
    所述将所述至少两个初始图像分别对应的第一点云信息融合为第二点云信息,包括:
    根据所述相机位姿将所述至少两个初始图像分别对应的第一点云信息融合为所述第二点云信息。
  3. 根据权利要求2所述的方法,所述根据所述相机位姿将所述至少两个初始图像分别对应的第一点云信息融合为所述第二点云信息,包括:
    从所述至少两个初始图像中确定一个初始图像为第一帧;
    根据所述相机位姿将所述至少两个初始图像中除所述第一帧以外的其他初始图像的点移动到所述第一帧的角度;
    将所述至少两个初始图像之间重叠的第一点融合为第二点,其中,所述第一点为所述第一点云信息中的点,所述第二点为所述第二点云信息中的点。
  4. 根据权利要求3所述的方法,所述将所述初始图像之间重叠的第一点融合为第二点,包括:
    分别对所述至少两个初始图像中的第一点分配权重;
    根据所述权重将重叠的所述第一点融合为所述第二点。
  5. 根据权利要求4所述的方法,所述分别对所述至少两个初始图像中的第一点分配权重,包括:
    根据所述第一点所在初始图像的拍摄角度、图像噪声值或法线方向中的至少一种对所述第一点分配权重值。
  6. 根据权利要求3所述的方法,所述将所述初始图像之间重叠的第一点融合为第二点,包括:
    当第一初始图像中存在两个重叠的第一点时,在所述第一初始图像中获取与所述第一帧中的第一点深度差的绝对值更小的第一点与所述第一帧中的第一点进行点云融合,以得到所述第二点,其中,所述第一初始图像为所述至少两个初始图像中不为所述第一帧的图像。
  7. 根据权利要求1至6任一所述的方法,所述根据所述第二点云信息构建所述目标物的三维模型,包括:
    对所述第二点云信息进行泊松重建,以得到所述目标物的三维网络,所述三维网络为连接所述第二点云信息中各个点的无孔洞表面;
    对所述三维网络进行剪裁和平滑处理,以得到所述三维模型。
  8. 根据权利要求7所述的方法,所述对所述三维网络进行剪裁和平滑处理,以得到所述三维模型,包括:
    根据所述特征点对所述三维网络沿垂直于镜头面的方向投影,得到第一投影面;
    在所述第一投影面中将所述特征点连接成凸包,获取凸包所在的区域为第二投影面;
    根据所述第二投影面对所述三维网络进行剪裁,以剔除非所述目标物的三维网络;
    对剪裁后的所述三维网络进行平滑处理,得到所述三维模型。
  9. 根据权利要求1至6任一所述的方法,所述根据所述至少两个初始图像中的深度信息,分别获取所述至少两个初始图像对应的第一点云信息之前,还包括:
    剔除所述至少两个初始图像中相似度大于预设值的图像。
  10. 一种目标物的三维模型构建装置,包括:
    第一获取单元,所述第一获取单元用于获取目标物在多个拍摄角度上的至少两个初始图像,所述至少两个初始图像分别记录有所述目标物的深度信息,所述深度信息用于记录所述目标物的多个点与参考位置之间的距离;
    第二获取单元,所述第二获取单元用于根据所述第一获取单元获取的所述至少两个初始图像中的深度信息,分别获取所述至少两个初始图像对应的第一点云信息;
    融合单元,所述融合单元用于将所述第二获取单元获取的所述至少两个初始图像分别对应的第一点云信息融合为第二点云信息;
    构建单元,所述构建单元用于根据所述融合单元得到的所述第二点云信息构建所述目标物的三维模型。
  11. 根据权利要求10所述的装置,所述装置还包括特征点检测单元,所述特征点检测单元用于:
    对所述至少两个初始图像分别进行特征点检测,以在所述至少两个初始图像中分别得到用于标记所述目标物的至少两个特征点,所述至少两个特征点用于标识所述目标物的同一位置分别在所述至少两个初始图像中的特征点;
    在所述至少两个初始图像之间,获取所述至少两个特征点间的偏移量,所述偏移量用于标识所述目标物的同一位置的特征点在不同初始图像之间的坐标差值;
    根据所述偏移量获取所述至少两个初始图像中所述目标物的相机位姿,所述相机位姿用于表示在不同的初始图像中所述目标物相对所述参考位置的移动,所述移动包括旋转和平移中的至少一种,所述参考位置为拍摄所述目标物的拍摄镜头所在的位置;
    所述融合单元,还用于:
    根据所述相机位姿将所述至少两个初始图像分别对应的第一点云信息融合为所述第二点云信息。
  12. 根据权利要求11所述的装置,所述融合单元还用于:
    从所述至少两个初始图像中确定一个初始图像为第一帧;
    根据所述相机位姿将所述至少两个初始图像中除所述第一帧以外的其他初始图像的点移动到所述第一帧的角度;
    将所述至少两个初始图像之间重叠的第一点融合为第二点,其中,所述第一点为所述第一点云信息中的点,所述第二点为所述第二点云信息中的点。
  13. 根据权利要求12所述的装置,所述融合单元还用于:
    分别对所述至少两个所述初始图像中的第一点分配权重;
    根据所述权重将重叠的所述第一点融合为所述第二点。
  14. 一种计算机设备,所述计算机设备包括:交互装置、输入/输出(I/O)接口、处理器和存储器,所述存储器中存储有程序指令;
    所述交互装置用于获取用户输入的操作指令;
    所述处理器用于执行存储器中存储的程序指令,执行如权利要求1-9中任意一项所述的方法。
  15. 一种计算机可读存储介质,所述存储介质用于存储计算机程序,所述计算机程序用于执行如权利要求1-9中任意一项所述的方法。
  16. 一种包括指令的计算机程序产品,当其在计算机上运行时,使得所述计算机执行权利要求1-9任意一项所述的方法。
PCT/CN2020/126341 2020-01-02 2020-11-04 一种目标物的三维模型构建方法和相关装置 WO2021135627A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/667,399 US12014461B2 (en) 2020-01-02 2022-02-08 Method for constructing three-dimensional model of target object and related apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010003052.X 2020-01-02
CN202010003052.XA CN111199579B (zh) 2020-01-02 2020-01-02 一种目标物的三维模型构建方法、装置、设备及介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/667,399 Continuation US12014461B2 (en) 2020-01-02 2022-02-08 Method for constructing three-dimensional model of target object and related apparatus

Publications (1)

Publication Number Publication Date
WO2021135627A1 true WO2021135627A1 (zh) 2021-07-08

Family

ID=70745457

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/126341 WO2021135627A1 (zh) 2020-01-02 2020-11-04 一种目标物的三维模型构建方法和相关装置

Country Status (3)

Country Link
US (1) US12014461B2 (zh)
CN (1) CN111199579B (zh)
WO (1) WO2021135627A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117272758A (zh) * 2023-11-20 2023-12-22 埃洛克航空科技(北京)有限公司 基于三角格网的深度估计方法、装置、计算机设备和介质

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199579B (zh) 2020-01-02 2023-01-24 腾讯科技(深圳)有限公司 一种目标物的三维模型构建方法、装置、设备及介质
CN114092898A (zh) * 2020-07-31 2022-02-25 华为技术有限公司 目标物的感知方法及装置
CN112215934B (zh) * 2020-10-23 2023-08-29 网易(杭州)网络有限公司 游戏模型的渲染方法、装置、存储介质及电子装置
CN112164143A (zh) * 2020-10-23 2021-01-01 广州小马慧行科技有限公司 三维模型的构建方法、构建装置、处理器和电子设备
CN112396117A (zh) * 2020-11-24 2021-02-23 维沃移动通信有限公司 图像的检测方法、装置及电子设备
CN112767541A (zh) * 2021-01-15 2021-05-07 浙江商汤科技开发有限公司 三维重建方法及装置、电子设备和存储介质
CN113643788B (zh) * 2021-07-15 2024-04-05 北京复数健康科技有限公司 基于多个图像获取设备确定特征点的方法及系统
CN115775024B (zh) * 2022-12-09 2024-04-16 支付宝(杭州)信息技术有限公司 虚拟形象模型训练方法及装置
CN116386100A (zh) * 2022-12-30 2023-07-04 深圳市宗匠科技有限公司 人脸图像采集方法及皮肤检测方法、装置、设备和介质
CN116503508B (zh) * 2023-06-26 2023-09-08 南昌航空大学 一种个性化模型构建方法、系统、计算机及可读存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050712A (zh) * 2013-03-15 2014-09-17 索尼公司 三维模型的建立方法和装置
US9984499B1 (en) * 2015-11-30 2018-05-29 Snap Inc. Image and point cloud based tracking and in augmented reality systems
CN108717728A (zh) * 2018-07-19 2018-10-30 安徽中科智链信息科技有限公司 一种基于多视角深度摄像机的三维重建装置及方法
CN109304866A (zh) * 2018-09-11 2019-02-05 魏帅 使用3d摄像头自助拍照打印3d人像的一体设备及方法
CN109377551A (zh) * 2018-10-16 2019-02-22 北京旷视科技有限公司 一种三维人脸重建方法、装置及其存储介质
CN109693387A (zh) * 2017-10-24 2019-04-30 三纬国际立体列印科技股份有限公司 基于点云数据的3d建模方法
CN111199579A (zh) * 2020-01-02 2020-05-26 腾讯科技(深圳)有限公司 一种目标物的三维模型构建方法、装置、设备及介质

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102647351B1 (ko) * 2017-01-26 2024-03-13 삼성전자주식회사 3차원의 포인트 클라우드를 이용한 모델링 방법 및 모델링 장치
CN107230225B (zh) * 2017-04-25 2020-06-09 华为技术有限公司 三维重建的方法和装置
TWI634515B (zh) * 2018-01-25 2018-09-01 廣達電腦股份有限公司 三維影像處理之裝置及方法
CN109242961B (zh) * 2018-09-26 2021-08-10 北京旷视科技有限公司 一种脸部建模方法、装置、电子设备和计算机可读介质
TWI677413B (zh) * 2018-11-20 2019-11-21 財團法人工業技術研究院 用於機械手臂系統之校正方法及裝置
CN109658449B (zh) * 2018-12-03 2020-07-10 华中科技大学 一种基于rgb-d图像的室内场景三维重建方法
CN110223387A (zh) * 2019-05-17 2019-09-10 武汉奥贝赛维数码科技有限公司 一种基于深度学习的三维模型重建技术
CN110363858B (zh) * 2019-06-18 2022-07-01 新拓三维技术(深圳)有限公司 一种三维人脸重建方法及系统
CN110458957B (zh) 2019-07-31 2023-03-10 浙江工业大学 一种基于神经网络的图像三维模型构建方法及装置
CN111160232B (zh) * 2019-12-25 2021-03-12 上海骏聿数码科技有限公司 正面人脸重建方法、装置及系统

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050712A (zh) * 2013-03-15 2014-09-17 索尼公司 三维模型的建立方法和装置
US9984499B1 (en) * 2015-11-30 2018-05-29 Snap Inc. Image and point cloud based tracking and in augmented reality systems
CN109693387A (zh) * 2017-10-24 2019-04-30 三纬国际立体列印科技股份有限公司 基于点云数据的3d建模方法
CN108717728A (zh) * 2018-07-19 2018-10-30 安徽中科智链信息科技有限公司 一种基于多视角深度摄像机的三维重建装置及方法
CN109304866A (zh) * 2018-09-11 2019-02-05 魏帅 使用3d摄像头自助拍照打印3d人像的一体设备及方法
CN109377551A (zh) * 2018-10-16 2019-02-22 北京旷视科技有限公司 一种三维人脸重建方法、装置及其存储介质
CN111199579A (zh) * 2020-01-02 2020-05-26 腾讯科技(深圳)有限公司 一种目标物的三维模型构建方法、装置、设备及介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117272758A (zh) * 2023-11-20 2023-12-22 埃洛克航空科技(北京)有限公司 基于三角格网的深度估计方法、装置、计算机设备和介质
CN117272758B (zh) * 2023-11-20 2024-03-15 埃洛克航空科技(北京)有限公司 基于三角格网的深度估计方法、装置、计算机设备和介质

Also Published As

Publication number Publication date
CN111199579A (zh) 2020-05-26
CN111199579B (zh) 2023-01-24
US20220165031A1 (en) 2022-05-26
US12014461B2 (en) 2024-06-18

Similar Documents

Publication Publication Date Title
WO2021135627A1 (zh) 一种目标物的三维模型构建方法和相关装置
JP7249390B2 (ja) 単眼カメラを用いたリアルタイム3d捕捉およびライブフィードバックのための方法およびシステム
KR102523512B1 (ko) 얼굴 모델의 생성
US11423556B2 (en) Methods and systems to modify two dimensional facial images in a video to generate, in real-time, facial images that appear three dimensional
Fried et al. Perspective-aware manipulation of portrait photos
CN103140879B (zh) 信息呈现装置、数字照相机、头戴式显示器、投影仪、信息呈现方法和信息呈现程序
CN107506714A (zh) 一种人脸图像重光照的方法
RU2586566C1 (ru) Способ отображения объекта
JP2019534510A (ja) 表面モデル化システムおよび方法
WO2020133862A1 (zh) 游戏角色模型的生成方法、装置、处理器及终端
WO2019196745A1 (zh) 人脸建模方法及相关产品
CN111652123B (zh) 图像处理和图像合成方法、装置和存储介质
US20220319231A1 (en) Facial synthesis for head turns in augmented reality content
CN112513875A (zh) 眼部纹理修复
CN112766027A (zh) 图像处理方法、装置、设备及存储介质
US10559116B2 (en) Interactive caricature generation from a digital image
CN115803783A (zh) 从2d图像重建3d对象模型
JP6852224B2 (ja) 全視角方向の球体ライトフィールドレンダリング方法
CN110895823A (zh) 一种三维模型的纹理获取方法、装置、设备及介质
CN111460937B (zh) 脸部特征点的定位方法、装置、终端设备及存储介质
US10621788B1 (en) Reconstructing three-dimensional (3D) human body model based on depth points-to-3D human body model surface distance
CN113240811B (zh) 三维人脸模型创建方法、系统、设备及存储介质
Vanakittistien et al. Game‐ready 3D hair model from a small set of images
CN113538655B (zh) 一种虚拟人脸的生成方法及设备
Noh et al. Retouch transfer for 3D printed face replica with automatic alignment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20908693

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20908693

Country of ref document: EP

Kind code of ref document: A1