CN109584362B - Three-dimensional model construction method and device, electronic equipment and storage medium - Google Patents

Three-dimensional model construction method and device, electronic equipment and storage medium

Info

Publication number
CN109584362B
Authority
CN
China
Prior art keywords
target
data
target object
depth
frame
Legal status
Active
Application number
CN201811536471.9A
Other languages
Chinese (zh)
Other versions
CN109584362A (en)
Inventor
朴镜潭
王权
钱晨
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN201811536471.9A
Publication of CN109584362A
Application granted
Publication of CN109584362B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/08 Indexing scheme involving all processing steps from image acquisition to 3D model generation
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present disclosure relates to a three-dimensional model construction method and apparatus, an electronic device, and a storage medium. The method includes: acquiring first pose data of a target object in multiple frames of depth images according to the multiple frames of depth images and multiple frames of color images of the target object; determining a target area of a second key frame in the depth images according to a first target point in a first key frame of the color images; and constructing a three-dimensional model of the target object according to the target area of the second key frame and the first pose data. According to this method, the first pose data is determined from both the color images and the depth images, and the target area of the second key frame is determined from the first target point in the first key frame, which reduces the requirement on the accuracy of the image sensor. Further, the three-dimensional model is constructed only for the target area, so processing of the background area is omitted, the performance requirement on the processor is reduced, and processing efficiency is improved.

Description

Three-dimensional model construction method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for building a three-dimensional model, an electronic device, and a storage medium.
Background
In the related art, data fusion can be performed on multiple frames of color images and multiple frames of depth images to construct a three-dimensional model of a target object such as a human face. However, such methods process rich image information from the full frames and therefore place high demands on processor performance and image sensor precision; as a result, they generalize poorly and produce poor modeling results on processors with limited computing capacity or sensors with low precision.
Disclosure of Invention
The disclosure provides a three-dimensional model construction method and device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a three-dimensional model construction method including:
acquiring first pose data of a target object in a multi-frame depth image according to the multi-frame depth image of the target object and multi-frame color images respectively corresponding to the multi-frame depth image;
respectively determining target areas of a plurality of second key frames in the multi-frame depth image according to first target points in the plurality of first key frames of the multi-frame color image, wherein the plurality of second key frames respectively correspond to the plurality of first key frames, and the first target points are located in contour areas of the target object in the plurality of first key frames;
and constructing a three-dimensional model of the target object according to the target areas of the plurality of second key frames and the first pose data in the multi-frame depth images.
According to the three-dimensional model construction method of the embodiments of the present disclosure, the first pose data of the target object is determined from the color images and the depth images, and the target area of each second key frame of the depth images is determined from the first target point in the corresponding first key frame of the color images, which reduces the requirement on the accuracy of the image sensor. Further, the three-dimensional model of the target object is constructed from the target areas and the first pose data; since the model is constructed only for the target areas, processing of the background area is omitted, the performance requirement on the processor is reduced, processing efficiency is improved, and the universality of the method is improved.
In one possible implementation manner, obtaining first pose data of a target object in a multi-frame depth image according to the multi-frame depth image of the target object and multi-frame color images respectively corresponding to the multi-frame depth image includes:
obtaining target depth data according to a target depth image and a target color image, wherein the target depth image is any one frame of a multi-frame depth image, the target color image is a color image corresponding to the target depth image, and the target depth data comprises coordinate data of pixel points of a target object;
and determining first pose data of the target object in the target depth image according to a second target point in the target color image and the target depth data, wherein the second target point is located in a contour area of the target object in the target color image.
In one possible implementation, determining first pose data of the target object in the target depth image according to the second target point in the target color image and the target depth data includes:
determining pose estimation data of the target object in the target color image according to a second target point in the target color image;
determining depth information of a second target point in the target color image according to the second target point and the target depth data;
and determining the first pose data of the target object in the target depth image according to the depth information of the second target point and the pose estimation data of the target object in the target color image.
In this way, the pose estimation data of the target object can be determined from the second target point in the target color image, which, compared with performing pose estimation using only a color image as in the related art, reduces the requirement on the precision of the image sensor and improves the precision of the first pose data. Further, the pose estimation data is combined with the target depth data to determine the first pose data: the depth information of the second target point, determined from the target depth data, compensates for the missing depth information in the pose estimation data, yielding more accurate first pose data.
In one possible implementation, constructing a three-dimensional model of the target object according to the target areas of the second keyframes and the first pose data in the multi-frame depth image includes:
obtaining second pose data of the target object in the plurality of second key frames according to the first pose data in the multi-frame depth image;
and obtaining a three-dimensional model of the target object according to the second pose data in the plurality of second key frames and the target areas of the plurality of second key frames.
In one possible implementation, obtaining a three-dimensional model of the target object according to the second pose data in the second keyframes and the target areas of the second keyframes includes:
performing point cloud registration on target areas of a plurality of second key frames to obtain three-dimensional depth information of the target object;
determining three-dimensional position information of the target object according to second pose data in the plurality of second keyframes;
and determining a three-dimensional model of the target object according to the three-dimensional position information and the three-dimensional depth information.
By the method, point cloud registration can be performed on the target areas of the second key frames, so that background areas in the second key frames are omitted, processing resources are saved, processing efficiency is improved, and performance requirements on a processor are reduced. The three-dimensional position information determined from the second pose data may reduce the accumulated error and improve the accuracy of the three-dimensional position information of the target object. Furthermore, a three-dimensional model of the target object can be obtained by utilizing three-dimensional depth information and three-dimensional position information obtained by point cloud registration, so that the interference of a background area is reduced, and the accuracy of the constructed three-dimensional model is improved.
In one possible implementation manner, obtaining second pose data of the target object in the plurality of second keyframes according to the first pose data in the multi-frame depth image includes:
registering the first pose data in the multi-frame depth images to obtain three-dimensional pose data;
and registering the first pose data in the second key frame according to the three-dimensional pose data to obtain the second pose data.
In one possible implementation, the method further includes:
and obtaining a color three-dimensional model of the target object according to the three-dimensional model of the target object and the multi-frame color images.
According to another aspect of the present disclosure, there is provided a three-dimensional model building apparatus including:
the acquisition module is used for acquiring first pose data of a target object in a multi-frame depth image according to the multi-frame depth image of the target object and multi-frame color images respectively corresponding to the multi-frame depth image;
a determining module, configured to determine target regions of multiple second key frames in the multi-frame depth image according to first target points in multiple first key frames of the multi-frame color image, where the multiple second key frames correspond to the multiple first key frames, respectively, and the first target points are located in contour regions of the target object in the multiple first key frames;
and the building module is used for building a three-dimensional model of the target object according to the target areas of the plurality of second key frames and the first pose data in the multi-frame depth images.
In one possible implementation, the obtaining module is further configured to:
obtaining target depth data according to a target depth image and a target color image, wherein the target depth image is any one frame of a multi-frame depth image, the target color image is a color image corresponding to the target depth image, and the target depth data comprises coordinate data of pixel points of a target object;
and determining first pose data of the target object in the target depth image according to a second target point in the target color image and the target depth data, wherein the second target point is located in a contour area of the target object in the target color image.
In one possible implementation, the obtaining module is further configured to:
determining pose estimation data of the target object in the target color image according to a second target point in the target color image;
determining depth information of a second target point in the target color image according to the second target point and the target depth data;
and determining the first pose data of the target object in the target depth image according to the depth information of the second target point and the pose estimation data of the target object in the target color image.
In one possible implementation, the building module is further configured to:
obtaining second pose data of the target object in the plurality of second key frames according to the first pose data in the multi-frame depth image;
and obtaining a three-dimensional model of the target object according to the second pose data in the plurality of second key frames and the target areas of the plurality of second key frames.
In one possible implementation, the building module is further configured to:
performing point cloud registration on target areas of a plurality of second key frames to obtain three-dimensional depth information of the target object;
determining three-dimensional position information of the target object according to the second pose data in the plurality of second key frames;
and determining a three-dimensional model of the target object according to the three-dimensional position information and the three-dimensional depth information.
In one possible implementation, the building module is further configured to:
registering the first pose data in the multi-frame depth images to obtain three-dimensional pose data;
and registering the first pose data in the second key frame according to the three-dimensional pose data to obtain the second pose data.
In one possible implementation, the apparatus further includes:
and the color three-dimensional model obtaining module is used for obtaining a color three-dimensional model of the target object according to the three-dimensional model of the target object and the multi-frame color images.
According to another aspect of the present disclosure, there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the above three-dimensional model construction method.
According to another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described three-dimensional model construction method.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a flow diagram of a three-dimensional model building method according to an embodiment of the disclosure;
FIG. 2 shows a flow diagram of a three-dimensional model building method according to an embodiment of the present disclosure;
FIGS. 3A and 3B show application diagrams of a three-dimensional model building method according to an embodiment of the disclosure;
FIG. 4 shows a block diagram of a three-dimensional model building apparatus according to an embodiment of the present disclosure;
FIG. 5 shows a block diagram of a three-dimensional model building apparatus according to an embodiment of the present disclosure;
FIG. 6 shows a block diagram of an electronic device according to an embodiment of the disclosure;
FIG. 7 shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of a, B, and C, and may mean including any one or more elements selected from the group consisting of a, B, and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 shows a flowchart of a three-dimensional model building method according to an embodiment of the present disclosure. As shown in fig. 1, the method includes:
in step S11, obtaining first pose data of a target object in a multi-frame depth image according to the multi-frame depth image of the target object and multi-frame color images respectively corresponding to the multi-frame depth image;
in step S12, respectively determining target regions of a plurality of second key frames in the multi-frame depth image according to first target points in a plurality of first key frames of the multi-frame color image, where the plurality of second key frames respectively correspond to the plurality of first key frames, and the first target points are located in contour regions of target objects of the plurality of first key frames;
in step S13, a three-dimensional model of the target object is constructed according to the target areas of the plurality of second keyframes and the first pose data in the depth images of the plurality of frames.
According to the three-dimensional model building method of the embodiments of the present disclosure, the first pose data of the target object is determined from the color images and the depth images, and the target area of each second key frame of the depth images is determined from the first target point in the corresponding first key frame of the color images, which reduces the requirement on the precision of the image sensor. Further, the three-dimensional model of the target object is built from the target areas and the first pose data; since the model is built only for the target areas, processing of the background area is omitted, the performance requirement on the processor is reduced, processing efficiency is improved, and the universality of the method is improved.
In a possible implementation manner, the three-dimensional model building method may be performed by a terminal device, where the terminal device may be a user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor calling computer-readable instructions stored in a memory. Alternatively, the method may be performed by a server: a terminal device or an image acquisition device (e.g., a camera) acquires the color images (including the RGB information of each pixel point) and the depth images (including the depth information of each pixel point) and transmits them to the server.
In one possible implementation manner, a plurality of frames of depth images of the target object and a plurality of frames of color images respectively corresponding to the plurality of frames of depth images may be acquired by an image acquisition device (e.g., a camera, etc.). In an example, the image capture device may have both a depth image sensor and a color image sensor, and the depth image and the color image of the target object may be obtained simultaneously by the same camera. In an example, the depth image and the color image of multiple angles and/or multiple distances of the target object may be acquired using an image acquisition device, for example, the image acquisition may be performed around the target object using the image acquisition device, and the like, and the present disclosure does not limit the manner of image acquisition. In an example, the target object may include a three-dimensional object having a characteristic shape such as a human face, and the present disclosure does not limit the type of the target object.
In one possible implementation manner, in step S11, obtaining the first pose data of the target object in the multi-frame depth images according to the multi-frame depth images of the target object and the multi-frame color images respectively corresponding to the multi-frame depth images may include: obtaining target depth data according to a target depth image and a target color image, wherein the target depth image is any one frame of the multi-frame depth images, the target color image is the color image corresponding to the target depth image, and the target depth data includes coordinate data of the pixel points of the target object; and determining the first pose data of the target object in the target depth image according to a second target point in the target color image and the target depth data, wherein the second target point is located in a contour area of the target object in the target color image.
In a possible implementation manner, the target depth image is any one frame of the multi-frame depth images, and the target color image is the color image corresponding to the target depth image; that is, the target depth image and the target color image are images of the same target object captured by the image acquisition device at the same time. The correspondence between the target depth image and the target color image can be determined from the internal parameters of the image acquisition device (e.g., focal length, optical center position, lens distortion), through which the pixel points of the two images can be brought into correspondence, so that the depth information corresponding to each pixel point in the target color image, that is, the target depth data, can be obtained. Since the gray scale of a pixel point in the target depth image represents the distance between that point and the camera of the image acquisition device, the coordinate data (i.e., three-dimensional coordinate data) of each pixel point, including the three-dimensional coordinates of the pixel points of the target object, can be obtained from the internal parameters, the target depth image, and the target color image.
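As a concrete illustration of this back-projection, the following minimal Python/NumPy sketch (assuming a simple pinhole model with focal lengths fx, fy and optical center cx, cy, and a depth map already aligned to the color image; the names are illustrative, not from the patent) converts each depth pixel into camera-space three-dimensional coordinates:

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project an aligned depth map (meters) to per-pixel 3D coordinates.

    depth: (H, W) array; zeros mark missing measurements.
    Returns an (H, W, 3) array of camera-space (x, y, z) coordinates.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx   # pinhole model: u = fx * x / z + cx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)
```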
In one possible implementation, the first pose data of the target object in the target depth image may be determined according to the second target point in the target color image and the target depth data. In an example, the second target point in the target color image lies in the contour region of the target object and may be a key point capable of representing the contour and shape of the target object. For example, when the target object is a human face, the second target points may be face key points representing the face contour and the shape of facial features, and the positions of the face key points may be determined by, for example, face key point recognition with a convolutional neural network. The present disclosure does not limit the method of determining the face key points.
In one possible implementation, determining the first pose data of the target object in the target depth image according to the second target point in the target color image and the target depth data includes: determining pose estimation data of the target object in the target color image according to the second target point in the target color image; determining depth information of the second target point according to the second target point and the target depth data; and determining the first pose data of the target object in the target depth image according to the depth information of the second target point and the pose estimation data of the target object in the target color image.
In one possible implementation, the position of the second target point in the target color image may be determined, and the pose estimation data of the target object in the target color image may be determined based on the second target point. The pose estimation data may represent the position of the target object and its angle relative to a standard position. In an example, the target object may be a human face, the frontal face may be taken as the standard position, and the pose estimation data may represent the relative position and angle of the face in the target color image with respect to the frontal face. For example, a certain face key point may be selected from the second target points (face key points), and its relative position and angle with respect to the corresponding key point on the frontal face may be determined as the pose estimation data. The pose estimation data may be represented by a six-dimensional vector (x, y, z₁, α, β, γ), in which the three dimensions (x, y, z₁) represent the relative position and the three dimensions (α, β, γ) represent the relative angles.
In an example, the pose estimation data of the target object may be obtained from the second target points, for example from the positions of the second target points and the distances between them; in an example, the pose estimation data may be obtained from the second target points using the solvePnP function. The present disclosure does not limit the method of obtaining the pose estimation data from the second target points. Since a color image cannot provide the depth information of its pixel points, the depth has to be estimated from features such as edges and shadows in the color image; this estimation places high requirements on the accuracy of the image sensor, and the estimated values have low precision. For example, the value of z₁ is difficult to determine accurately.
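A minimal sketch of this step using OpenCV's solvePnP with the classic six-point head-pose correspondence (the 3D coordinates of the generic "standard" face model and the 2D detections below are illustrative stand-ins; the patent does not enumerate the key points):

```python
import cv2
import numpy as np

# Illustrative 3D key points of a generic frontal face model (millimeters):
# nose tip, chin, eye outer corners, mouth corners.
model_points = np.array([
    (0.0, 0.0, 0.0), (0.0, -330.0, -65.0),
    (-225.0, 170.0, -135.0), (225.0, 170.0, -135.0),
    (-150.0, -150.0, -125.0), (150.0, -150.0, -125.0)], dtype=np.float64)

# Stand-in 2D detections of the same key points in the target color image.
image_points = np.array([
    (359, 391), (399, 561), (337, 297),
    (513, 301), (345, 465), (453, 469)], dtype=np.float64)

fx = fy = 650.0
cx, cy = 320.0, 240.0   # assumed intrinsics of the color camera
camera_matrix = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(model_points, image_points, camera_matrix, None)
# tvec ~ (x, y, z1): z1 is the weakly constrained depth estimate;
# rvec is an axis-angle rotation convertible to (alpha, beta, gamma).
```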
In one possible implementation, the depth information of the second target point in the target color image may be determined based on the second target point and the target depth data. The target depth data includes the depth information of each pixel point in the color image, so the depth information of the second target point can be read from it; alternatively, the target depth data includes the three-dimensional coordinate data of each pixel point, and the position of the second target point in three-dimensional coordinates, that is, its depth information, can be determined. In an example, the target object is a face, the second target points are face key points, and the depth information of the face key points can be determined from the target depth data.
In one possible implementation, the first pose data of the target object in the target depth image may be determined from the depth information of the second target point and the pose estimation data of the target object in the target color image. Since the color image cannot provide the depth of the pixel points, the depth component of the pose estimation data is unreliable; the depth information of the second target point is therefore combined with the pose estimation data to determine the first pose data. In an example, the pose estimation data based on the face key points is the six-dimensional vector (x, y, z₁, α, β, γ), in which the estimated value z₁ is difficult to determine accurately. The depth of the selected face key point can instead be determined from its depth information, that is, the relative position between the selected face key point and the corresponding key point on the frontal face can be determined accurately, so that z can be determined, yielding the six-dimensional vector (x, y, z, α, β, γ) of the first pose data, in which all six parameters are accurately determined. In this way, the first pose data of the target object in each frame of the depth images can be obtained.
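Continuing the solvePnP sketch above, the correction could look like the following (aligned_depth is the target depth data resampled to the color image; treating the nose-tip translation as the depth to replace is an assumption made for illustration):

```python
import numpy as np

def correct_depth(rvec, tvec, image_points, aligned_depth):
    """Replace the weakly constrained z1 with the measured key-point depth.

    rvec, tvec, image_points: as produced by the solvePnP sketch above;
    aligned_depth: (H, W) depth map aligned to the target color image.
    """
    u, v = np.round(image_points[0]).astype(int)   # nose-tip pixel
    measured_z = aligned_depth[v, u]
    if measured_z > 0:                             # 0 marks a missing reading
        tvec = tvec.copy()
        tvec[2, 0] = measured_z
    # First pose data as a six-dimensional vector (x, y, z, alpha, beta, gamma).
    return np.concatenate([tvec.ravel(), rvec.ravel()])
```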
In this way, the attitude estimation data of the target object can be determined through the second target point in the target color image, compared with the method of performing attitude estimation only by using the color image in the related art, the requirement on the precision of the image sensor can be reduced, the precision of the first attitude data is improved, furthermore, the attitude estimation data is combined with the target depth data to determine the first attitude data, the depth information of the second target point can be determined through the target depth data, the deficiency of the depth information in the attitude estimation data is made up, and more accurate first attitude data is obtained.
In one possible implementation, first key frames may be determined in the multi-frame color images, and corresponding second key frames may be determined in the multi-frame depth images. In an example, the color images and depth images are acquired at different angles and/or distances around the target object; an angle threshold may be set during acquisition, and one first key frame and one second key frame may be selected each time the acquisition device rotates by the angle threshold. For example, with an angle threshold of 2°, one first key frame and one second key frame are selected every 2° of rotation. The present disclosure does not limit the angle threshold.
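A sketch of such threshold-based key frame selection, assuming per-frame poses are already available as six-dimensional vectors with angles in degrees (the function name and representation are illustrative, not from the patent):

```python
import numpy as np

def select_keyframes(poses, angle_threshold_deg=2.0):
    """Pick a key frame each time the rotation since the last key frame
    exceeds the threshold. poses: list of (x, y, z, alpha, beta, gamma)."""
    keyframe_ids = [0]
    last_angles = np.asarray(poses[0][3:], dtype=float)
    for i in range(1, len(poses)):
        angles = np.asarray(poses[i][3:], dtype=float)
        if np.linalg.norm(angles - last_angles) >= angle_threshold_deg:
            keyframe_ids.append(i)
            last_angles = angles
    return keyframe_ids
```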
In a possible implementation manner, in step S12, the target regions in the plurality of second key frames may be determined according to the first target points in the plurality of first key frames, respectively. The first target point may be a key point capable of representing the contour and shape of the target object; in an example, the target object is a face, the first target points may be face key points representing the face contour and the shape of facial features, and the positions of the face key points may be determined by, for example, face key point recognition with a convolutional neural network.
In one possible implementation, the contour region of the target object may be determined in a first key frame of the color images according to the first target points; for example, when the target object is a human face, the contour region of the face may be determined from the face key points. Further, the corresponding target region in the second key frame may be determined from the contour region in the first key frame. In an example, the pixel points in the first key frame and the second key frame correspond to each other, so the positions of the pixel points in the target region can be determined from the positions of the pixel points in the contour region, as shown in the sketch below.
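One way to realize this region transfer, sketched in Python with OpenCV (assuming the face key points are available as pixel coordinates and that the color and depth key frames are pixel-aligned, as described above):

```python
import cv2
import numpy as np

def target_region(depth_keyframe, landmarks):
    """Mask the depth key frame down to the contour region spanned by the
    face key points of the corresponding color key frame."""
    hull = cv2.convexHull(np.asarray(landmarks, dtype=np.int32))
    mask = np.zeros(depth_keyframe.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, hull, 255)
    # Zero out background pixels; only the target region is kept.
    return np.where(mask > 0, depth_keyframe, 0)
```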
In one possible implementation manner, in step S13, constructing the three-dimensional model of the target object according to the target areas of the second key frames and the first pose data in the multi-frame depth images includes: obtaining second pose data of the target object in the plurality of second key frames according to the first pose data in the multi-frame depth images; and obtaining the three-dimensional model of the target object according to the second pose data in the plurality of second key frames and the target areas of the plurality of second key frames.
In one possible implementation manner, the second pose data of the target object in the second key frames can be obtained from the first pose data in the multi-frame depth images. First pose data of the target object can be determined from each frame of depth image, and the first pose data of the multiple frames can then be fused. Because the first pose data of different depth images may deviate from one another, errors arise during fusion; if all the first pose data were fused directly, the accumulated error might be large and the pose of the target object could not be represented accurately. In an example, while acquiring color and depth images at different angles and/or distances around the target object, one first key frame and one second key frame may be selected at intervals of the angle threshold; the first pose data in the depth images between two adjacent second key frames may be fused, and the second key frames may be used for calibration to eliminate the accumulated error of the fused first pose data, thereby obtaining the second pose data in the plurality of second key frames.
In one possible implementation manner, obtaining the second pose data of the target object in the plurality of second key frames according to the first pose data in the multi-frame depth images includes: registering the first pose data in the multi-frame depth images to obtain three-dimensional pose data; and registering the first pose data in the second key frame according to the three-dimensional pose data to obtain the second pose data.
In a possible implementation manner, the first pose data in the depth images between two adjacent second key frames may be registered, that is, the first pose data of the multiple depth images may be registered into the same coordinate system, to obtain three-dimensional pose data in that coordinate system. During this process, errors due to shooting angle, coordinate selection, and the like accumulate as the first pose data of successive depth images are registered into the same coordinate system, producing an accumulated error. The three-dimensional pose data registered into the coordinate system may therefore be registered against the first pose data in the second key frame to eliminate the accumulated error, and the registered first pose data in the second key frame may be determined as the second pose data; a crude stand-in for this calibration step is sketched below.
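The patent does not spell out the calibration itself; as a stand-in under the assumptions that poses are six-dimensional vectors and that the drift is spread linearly over the frames between key frames, one might write:

```python
import numpy as np

def calibrate_with_keyframes(fused_poses, keyframe_ids, keyframe_poses):
    """Distribute the residual between the fused (drifted) pose and the
    key-frame pose over the frames since the previous key frame."""
    corrected = np.array(fused_poses, dtype=float)
    prev = 0
    for kf_id, kf_pose in zip(keyframe_ids, keyframe_poses):
        residual = np.asarray(kf_pose, dtype=float) - corrected[kf_id]
        span = kf_id - prev
        for j in range(prev + 1, kf_id + 1):
            corrected[j] += residual * (j - prev) / span  # linear spread
        prev = kf_id
    return corrected
```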
In one possible implementation, obtaining the three-dimensional model of the target object according to the second pose data in the second key frames and the target areas of the second key frames includes: performing point cloud registration on the target areas of the plurality of second key frames to obtain three-dimensional depth information of the target object; determining three-dimensional position information of the target object according to the second pose data in the plurality of second key frames; and determining the three-dimensional model of the target object according to the three-dimensional position information and the three-dimensional depth information.
In a possible implementation manner, the pixel points in the target region of a second key frame of the depth images carry the depth information of the target object, and the angles and/or positions of the plurality of second key frames differ from one another, so the coordinate systems in which the target object lies differ across the second key frames. For example, when the target object is a human face, in the second key frame captured from the frontal angle the frontal direction of the face is parallel to one coordinate axis of that key frame's coordinate system, for example its x axis, while in the second key frame captured from the side the direction of the side face is parallel to that frame's x axis and the frontal direction is perpendicular to it; the coordinate systems of any two second key frames differ. Therefore, the target regions in the multiple second key frames need to be registered into the same coordinate system by point cloud registration, for example into the coordinate system of the second key frame acquired from the frontal angle.
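A sketch of the point cloud registration step using Open3D's point-to-point ICP (a library choice made here for illustration; the patent does not name a registration algorithm, and the 1 cm correspondence distance is an assumed value):

```python
import numpy as np
import open3d as o3d

def register_to_frontal(region_points, frontal_points, init=np.eye(4)):
    """Register one key frame's target region (Nx3 points, meters) into the
    coordinate system of the frontal key frame's target region."""
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(region_points))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(frontal_points))
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, 0.01, init,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation  # 4x4 rigid transform into the frontal frame
```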
In one possible implementation, the three-dimensional position information of the target object may be determined from the second pose data in the plurality of second key frames. The second pose data is pose data with the accumulated error eliminated, and fusing it yields the three-dimensional position information of the target object. For example, the second pose data in each second key frame is a six-dimensional vector, and the pose of the target object with respect to the standard position, that is, its three-dimensional position information, is obtained from the six-dimensional vectors of the plurality of second key frames. In an example where the target object is a human face, the frontal face may be taken as the standard position. A coordinate transformation matrix between the coordinate system of any second key frame and the coordinate system of the second key frame acquired from the frontal angle may be obtained, and the second pose data (the six-dimensional vector) of each second key frame may be transformed by this matrix into the coordinate system of the frontal key frame. Each transformed six-dimensional vector then represents, in the frontal key frame's coordinate system, the position of the plane of the target object captured in that key frame; for example, after the six-dimensional vector of a key frame captured from the side is transformed, it represents the position of the side face in the frontal key frame's coordinate system. Fusing these transformed vectors yields the three-dimensional position information of the face; a sketch of the vector-to-matrix conversion follows.
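For concreteness, a sketch of converting the six-dimensional vector into a coordinate transformation matrix and re-expressing it in the frontal key frame's coordinate system (the XYZ Euler convention and radian units are assumptions; the patent fixes neither):

```python
import numpy as np

def pose6d_to_matrix(pose):
    """(x, y, z, alpha, beta, gamma) -> 4x4 rigid transform (XYZ Euler, rad)."""
    x, y, z, a, b, g = pose
    Rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
    Ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
    Rz = np.array([[np.cos(g), -np.sin(g), 0], [np.sin(g), np.cos(g), 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = (x, y, z)
    return T

# Express a side-view key frame's pose in the frontal key frame's system:
# T_in_frontal = np.linalg.inv(pose6d_to_matrix(frontal_pose)) @ pose6d_to_matrix(side_pose)
```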
In one possible implementation, the three-dimensional model of the target object may be determined from the three-dimensional position information and the three-dimensional depth information. The three-dimensional depth information after point cloud registration is still scattered, unordered point cloud data in the common coordinate system, so the surface of the three-dimensional model is determined from the three-dimensional position information together with the three-dimensional depth information.
In an example, a grid may be constructed in the coordinate space, dividing it into a plurality of cubes, and the surface of the three-dimensional model of the target object may be constructed by assigning a distance field value to each cube. The distance field value of a cube can be determined from the three-dimensional position information, which fixes the position and angle of the cube, and the three-dimensional depth information, which fixes its depth. A distance field value greater than 0 indicates that the cube lies outside the surface of the three-dimensional model, and a value less than 0 indicates that it lies inside; the surface is therefore contained in the cubes whose distance field value equals 0. All such cubes can be found, and their volume can be gradually reduced by successive approximation to determine the precise position of the surface. Further, for each vertex of a cube, the normal vector of the face formed by the vertices adjacent to that point may be taken as the surface normal, so that the orientation of the model surface, that is, the direction of the normal vector, can be determined. In an example, the three-dimensional model of the target object may be constructed using the TSDF (truncated signed distance function) algorithm, the MC (marching cubes) algorithm, or the like. The present disclosure does not limit the method used to determine the surface of the three-dimensional model.
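A minimal TSDF-style sketch of the distance field assignment described above (NumPy only; the voxel layout, truncation distance, and projective signed-distance approximation are simplifying assumptions, and meshing the zero level set is left to, e.g., skimage.measure.marching_cubes):

```python
import numpy as np

def integrate_tsdf(tsdf, weights, voxels, depth, K, T_cam, trunc=0.01):
    """Fuse one depth key frame into a truncated signed distance field.

    tsdf, weights: (N,) running field values and weights per voxel center.
    voxels: (N, 3) voxel centers in the common (frontal) coordinate system.
    K: 3x3 intrinsics; T_cam: 4x4 pose of this key frame in the common system.
    """
    homog = np.c_[voxels, np.ones(len(voxels))]
    cam = (np.linalg.inv(T_cam) @ homog.T).T[:, :3]      # voxels -> camera space
    z = cam[:, 2]
    u = np.round(cam[:, 0] * K[0, 0] / np.maximum(z, 1e-9) + K[0, 2]).astype(int)
    v = np.round(cam[:, 1] * K[1, 1] / np.maximum(z, 1e-9) + K[1, 2]).astype(int)
    h, w = depth.shape
    valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.zeros_like(z)
    d[valid] = depth[v[valid], u[valid]]
    valid &= d > 0
    sdf = np.clip(d - z, -trunc, trunc) / trunc          # >0 outside, <0 inside
    tsdf[valid] = (tsdf[valid] * weights[valid] + sdf[valid]) / (weights[valid] + 1)
    weights[valid] += 1
    return tsdf, weights
```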
By the method, point cloud registration can be performed on the target areas of the second key frames, so that background areas in the second key frames are omitted, processing resources are saved, processing efficiency is improved, and performance requirements on a processor are reduced. The three-dimensional position information determined from the second pose data may reduce the accumulated error and improve the accuracy of the three-dimensional position information of the target object. Furthermore, a three-dimensional model of the target object can be obtained by utilizing three-dimensional depth information and three-dimensional position information obtained by point cloud registration, so that the interference of a background area is reduced, and the accuracy of the constructed three-dimensional model is improved.
FIG. 2 shows a flow chart of a three-dimensional model building method according to an embodiment of the present disclosure. As shown in fig. 2, the method further comprises:
in step S14, a color three-dimensional model of the target object is obtained according to the three-dimensional model of the target object and the multi-frame color image.
In one possible implementation, the color three-dimensional model of the target object may be obtained by adding color to the three-dimensional model according to the RGB information in the color images. In an example, the three-dimensional model is obtained by point cloud registration of the second key frames of the depth images, so each point on the model surface comes from a pixel point (i.e., depth information) in a depth image; since the pixel points in the depth images correspond to the pixel points in the color images, the RGB information of the corresponding color pixel points can be added to the surface of the three-dimensional model to obtain the color three-dimensional model of the target object.
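A sketch of this color transfer by projecting model surface points into a color key frame (pinhole projection with assumed intrinsics K and key-frame pose T_cam; nearest-pixel sampling is a simplification):

```python
import numpy as np

def vertex_colors(vertices, color_image, K, T_cam):
    """Sample an RGB color per model vertex from one color key frame.

    vertices: (N, 3) surface points in the common coordinate system.
    Returns an (N, 3) array of RGB values.
    """
    homog = np.c_[vertices, np.ones(len(vertices))]
    cam = (np.linalg.inv(T_cam) @ homog.T).T[:, :3]
    u = np.round(cam[:, 0] * K[0, 0] / cam[:, 2] + K[0, 2]).astype(int)
    v = np.round(cam[:, 1] * K[1, 1] / cam[:, 2] + K[1, 2]).astype(int)
    u = np.clip(u, 0, color_image.shape[1] - 1)   # nearest-pixel sampling
    v = np.clip(v, 0, color_image.shape[0] - 1)
    return color_image[v, u]
```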
According to the three-dimensional model construction method of the embodiments of the present disclosure, the pose estimation data of the target object is determined from the second target point in the target color image, which reduces the requirement on the precision of the image sensor, and the depth information of the second target point determined from the target depth data compensates for the missing depth information in the pose estimation data, yielding more accurate first pose data. Point cloud registration is performed on the target areas of the plurality of second key frames, so the background area in the second key frames is omitted, which saves processing resources, improves processing efficiency, and reduces the performance requirement on the processor. The second pose data of the second key frames is obtained from the first pose data, and the three-dimensional position information determined from the second pose data reduces the accumulated error and improves the accuracy of the three-dimensional position information of the target object. Further, the three-dimensional model of the target object is obtained from the three-dimensional depth information obtained by point cloud registration together with the three-dimensional position information, which reduces the interference of the background area and improves the accuracy of the constructed three-dimensional model.
Fig. 3A and 3B show application diagrams of a three-dimensional model construction method according to an embodiment of the present disclosure. As shown in fig. 3A, the target object may be a human face, and multiple frames of depth images of the face and the corresponding multiple frames of color images may be acquired by an image capturing apparatus (e.g., a camera) having both a depth image sensor and a color image sensor; for example, images may be captured around the face at various angles.
In one possible implementation manner, the first pose data in the multi-frame depth images can be obtained from the multi-frame depth images and multi-frame color images captured around the face. In an example, the target depth image is any one of the depth images and the target color image is the corresponding color image; the correspondence between the two can be determined from the internal parameters of the image acquisition device (e.g., focal length, optical center position, lens distortion), so that the depth information of each pixel point in the target color image, that is, the target depth data, can be obtained.
In one possible implementation, the face key points in the target color image may be obtained, and the pose estimation data of the face may be obtained from the face key points using the solvePnP function. The depth information of the face key points can be obtained from the target depth data, compensating for the missing depth information in the pose estimation data and yielding the first pose data of the face in the target depth image. In this way, the first pose data of the face in each frame of the depth images can be obtained.
In one possible implementation, first key frames may be determined in the multi-frame color images, and corresponding second key frames may be determined in the multi-frame depth images. For example, an angle threshold (e.g., 2°) may be set, and one first key frame and one second key frame may be selected for each rotation by the angle threshold.
In a possible implementation manner, the contour region of the face can be determined in the first key frame of the color image according to the key points of the face, and the target region in the second key frame can be determined according to the corresponding relationship between the pixel points in the first key frame and the pixel points in the second key frame, so that only the target region can be processed, the background region is omitted, the processing resources are saved, and the processing efficiency is improved.
In one possible implementation manner, the second pose data of the face in the second key frames can be obtained from the first pose data in the multi-frame depth images. For example, the first pose data in the depth images between two adjacent second key frames may be fused and calibrated using the second key frames to eliminate the accumulated error of the fused first pose data, obtaining the second pose data in the plurality of second key frames.
In one possible implementation, point cloud registration may be performed on the target regions of the multiple second key frames to obtain the three-dimensional depth information of the face, that is, the target regions in the multiple second key frames are registered into the same coordinate system, in an example the coordinate system (the XYZ coordinate system in fig. 3B) of the second key frame acquired from the frontal angle.
In one possible implementation, the frontal face may be taken as the standard position, and the second pose data in the second key frames may be fused to obtain the three-dimensional position information of the face, for example in the XYZ coordinate system of fig. 3B.
In a possible implementation manner, the three-dimensional depth information after point cloud registration is still scattered, unordered point cloud data in the coordinate system, and the three-dimensional model needs to be determined from the three-dimensional position information and the three-dimensional depth information; for example, a three-dimensional model of the face (such as that in fig. 3B) can be constructed using the TSDF algorithm.
In one possible implementation, the color three-dimensional model of the target object may be obtained by adding colors to the three-dimensional model according to RGB information in the color image.
It is understood that the above method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from the principles and logic; due to space limitations, details are not repeated in the present disclosure.
In addition, the present disclosure also provides a three-dimensional model building apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the three-dimensional model construction methods provided by the present disclosure; the corresponding technical solutions are described in the method section and are not repeated here.
It will be understood by those skilled in the art that, in the above methods of the present disclosure, the order in which the steps are written does not imply a strict order of execution or any limitation on the implementation; the specific order of execution of the steps should be determined by their function and possible inherent logic.
Fig. 4 shows a block diagram of a three-dimensional model building apparatus according to an embodiment of the present disclosure. As shown in fig. 4, the apparatus includes:
an obtaining module 11, configured to obtain first pose data of a target object in a multi-frame depth image according to the multi-frame depth image of the target object and multi-frame color images respectively corresponding to the multi-frame depth image;
a determining module 12, configured to determine target regions of a plurality of second key frames in the multi-frame depth image according to first target points in a plurality of first key frames of the multi-frame color image, where the plurality of second key frames correspond to the plurality of first key frames, respectively, and the first target points are located in contour regions of target objects of the plurality of first key frames;
and the building module 13 is configured to build a three-dimensional model of the target object according to the target areas of the plurality of second key frames and the first pose data in the plurality of frames of depth images.
In one possible implementation, the obtaining module 11 is further configured to:
obtaining target depth data according to a target depth image and a target color image, wherein the target depth image is any one frame of a multi-frame depth image, the target color image is a color image corresponding to the target depth image, and the target depth data comprises coordinate data of pixel points of a target object;
and determining first pose data of the target object in the target depth image according to a second target point in the target color image and the target depth data, wherein the second target point is located in a contour area of the target object in the target color image.
In one possible implementation, the obtaining module 11 is further configured to:
determining pose estimation data of the target object in the target color image according to a second target point in the target color image;
determining depth information of a second target point in the target color image according to the second target point and the target depth data;
and determining the first pose data of the target object in the target depth image according to the depth information of the second target point and the pose estimation data of the target object in the target color image.
In one possible implementation, the building module 13 is further configured to:
obtaining second pose data of the target object in the plurality of second key frames according to the first pose data in the multi-frame depth image;
and obtaining a three-dimensional model of the target object according to the second pose data in the plurality of second key frames and the target regions of the plurality of second key frames.
In one possible implementation, the building module 13 is further configured to:
performing point cloud registration on the target regions of the plurality of second key frames to obtain three-dimensional depth information of the target object;
determining three-dimensional position information of the target object according to the second pose data in the plurality of second key frames;
and determining a three-dimensional model of the target object according to the three-dimensional position information and the three-dimensional depth information (a minimal registration example follows).
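The disclosure does not fix a particular registration algorithm. As one self-contained sketch, a naive point-to-point ICP over the back-projected target regions might look as follows (point sets as produced by the earlier back-projection snippet are assumed):

```python
import numpy as np
from scipy.spatial import cKDTree

def best_fit_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst,
    computed with the SVD-based Kabsch solution."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # repair a reflection if one appears
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp(src, dst, max_iters=30, tol=1e-6):
    """Naive point-to-point ICP aligning point set src (N, 3)
    to point set dst (M, 3); returns the accumulated (R, t)."""
    tree = cKDTree(dst)
    cur = src.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(max_iters):
        dist, idx = tree.query(cur)          # nearest-neighbor matches
        R, t = best_fit_transform(cur, dst[idx])
        cur = cur @ R.T + t                  # apply the increment
        R_total, t_total = R @ R_total, R @ t_total + t
        err = dist.mean()
        if abs(prev_err - err) < tol:        # converged
            break
        prev_err = err
    return R_total, t_total
```

In practice, point-to-plane ICP or a TSDF fusion pipeline would usually replace this naive loop for robustness and speed.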
In one possible implementation, the building module 13 is further configured to:
registering the first pose data in the multi-frame depth image to obtain three-dimensional pose data;
and registering the first pose data in the plurality of second key frames according to the three-dimensional pose data to obtain the second pose data (one plausible interpretation is sketched below).
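The form of this pose registration is likewise left open. As one plausible interpretation, the per-frame first pose data can be expressed relative to a common reference frame to yield consistent second pose data for the key frames; the helper and its camera-from-object convention below are assumptions:

```python
import numpy as np

def register_keyframe_poses(poses, keyframe_ids, ref=0):
    """Express each key frame's pose relative to a reference frame.

    poses: per-frame first pose data as (R, t) pairs, read here as
           camera-from-object transforms (an assumed convention).
    keyframe_ids: indices of the second key frames.
    """
    R_ref, t_ref = poses[ref]
    registered = []
    for k in keyframe_ids:
        R_k, t_k = poses[k]
        # ref_from_k = ref_from_obj * inverse(k_from_obj)
        R_rel = R_ref @ R_k.T
        t_rel = t_ref - R_rel @ t_k
        registered.append((R_rel, t_rel))
    return registered
```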
Fig. 5 shows a block diagram of a three-dimensional model building apparatus according to an embodiment of the present disclosure. As shown in fig. 5, the apparatus further comprises:
and a color three-dimensional model obtaining module 14, configured to obtain a color three-dimensional model of the target object according to the three-dimensional model of the target object and the multi-frame color images.
In some embodiments, the functions of, or the modules included in, the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the method embodiments above; for specific implementations, reference may be made to the descriptions of those method embodiments, which, for brevity, are not repeated here.
Embodiments of the present disclosure also provide a computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the above-mentioned method. The computer-readable storage medium may be a non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to perform the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 6 is a block diagram illustrating an electronic device 800 in accordance with an example embodiment. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
Referring to fig. 6, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application-Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field-Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components, for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 7 is a block diagram illustrating an electronic device 1900 according to an example embodiment. For example, electronic device 1900 may be provided as a server. Referring to fig. 7, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as a memory 1932, is also provided that includes computer program instructions executable by a processing component 1922 of an electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., an optical pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and a conventional procedural programming language such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field-Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), can execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application, or technical improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (12)

1. A three-dimensional model construction method is characterized by comprising the following steps:
acquiring first pose data of a target object in a multi-frame depth image according to the multi-frame depth image of the target object and multi-frame color images respectively corresponding to the multi-frame depth image;
respectively determining target regions of a plurality of second key frames in the multi-frame depth image according to first target points in a plurality of first key frames of the multi-frame color image, wherein the plurality of second key frames respectively correspond to the plurality of first key frames, and the first target points are located in the contour regions of the target object in the plurality of first key frames;
constructing a three-dimensional model of the target object according to the target regions of the plurality of second key frames and the first pose data in the multi-frame depth image;
wherein the acquiring of the first pose data of the target object in the multi-frame depth image according to the multi-frame depth image of the target object and the multi-frame color images respectively corresponding to the multi-frame depth image comprises the following steps:
determining pose estimation data of the target object in a target color image according to a second target point in the target color image, wherein the target color image is a color image acquired simultaneously with a target depth image, the target depth image is any one frame of the multi-frame depth image, and the second target point is a key point representing the contour and the shape of the target object;
the pose estimation data comprises: a z coordinate of the second target point in a z-axis direction;
determining depth information of the second target point in the target color image according to the second target point and target depth data, wherein the target depth data comprises: depth information of each pixel point in the color image;
and determining the first pose data of the target object in the target depth image according to the depth information of the second target point and the pose estimation data of the target object in the target color image, wherein the first pose data is the pose estimation data after the z coordinate is re-estimated according to the depth information of the second target point.
2. The method of claim 1, wherein constructing the three-dimensional model of the target object according to the target regions of the plurality of second key frames and the first pose data in the multi-frame depth image comprises:
obtaining second pose data of the target object in the plurality of second key frames according to the first pose data in the multi-frame depth image;
and obtaining the three-dimensional model of the target object according to the second pose data in the plurality of second key frames and the target regions of the plurality of second key frames.
3. The method of claim 2, wherein obtaining the three-dimensional model of the target object according to the second pose data in the plurality of second key frames and the target regions of the plurality of second key frames comprises:
performing point cloud registration on the target regions of the plurality of second key frames to obtain three-dimensional depth information of the target object;
determining three-dimensional position information of the target object according to the second pose data in the plurality of second key frames;
and determining the three-dimensional model of the target object according to the three-dimensional position information and the three-dimensional depth information.
4. The method of claim 2, wherein obtaining the second pose data of the target object in the plurality of second key frames according to the first pose data in the multi-frame depth image comprises:
registering the first pose data in the multi-frame depth image to obtain three-dimensional pose data;
and registering the first pose data in the plurality of second key frames according to the three-dimensional pose data to obtain the second pose data.
5. The method of claim 1, further comprising:
and obtaining a color three-dimensional model of the target object according to the three-dimensional model of the target object and the multi-frame color images.
6. A three-dimensional model building apparatus, comprising:
the acquisition module is used for acquiring first pose data of a target object in a multi-frame depth image according to the multi-frame depth image of the target object and multi-frame color images respectively corresponding to the multi-frame depth image;
a determining module, configured to determine target regions of multiple second key frames in the multiple-frame depth image according to first target points in multiple first key frames of the multiple-frame color image, where the multiple second key frames correspond to the multiple first key frames, respectively, and the first target points are located in contour regions of target objects of the multiple first key frames;
the building module is used for building a three-dimensional model of the target object according to the target regions of the plurality of second key frames and the first pose data in the multi-frame depth image;
the obtaining module is further configured to:
determining pose estimation data of the target object in a target color image according to a second target point in the target color image, wherein the target color image is a color image acquired simultaneously with a target depth image, the target depth image is any one frame of the multi-frame depth image, and the second target point is a key point representing the contour and the shape of the target object;
the pose estimation data comprises: a z coordinate of the second target point in a z-axis direction;
determining depth information of the second target point in the target color image according to the second target point and target depth data, wherein the target depth data comprises: depth information of each pixel point in the color image;
and determining the first pose data of the target object in the target depth image according to the depth information of the second target point and the pose estimation data of the target object in the target color image, wherein the first pose data is the pose estimation data after the z coordinate is re-estimated according to the depth information of the second target point.
7. The apparatus of claim 6, wherein the construction module is further configured to:
obtaining second pose data of the target object in the plurality of second key frames according to the first pose data in the multi-frame depth image;
and obtaining the three-dimensional model of the target object according to the second pose data in the plurality of second key frames and the target regions of the plurality of second key frames.
8. The apparatus of claim 7, wherein the construction module is further configured to:
performing point cloud registration on the target regions of the plurality of second key frames to obtain three-dimensional depth information of the target object;
determining three-dimensional position information of the target object according to the second pose data in the plurality of second key frames;
and determining the three-dimensional model of the target object according to the three-dimensional position information and the three-dimensional depth information.
9. The apparatus of claim 7, wherein the construction module is further configured to:
registering the first pose data in the multi-frame depth image to obtain three-dimensional pose data;
and registering the first pose data in the plurality of second key frames according to the three-dimensional pose data to obtain the second pose data.
10. The apparatus of claim 6, further comprising:
and the color three-dimensional model obtaining module is used for obtaining a color three-dimensional model of the target object according to the three-dimensional model of the target object and the multi-frame color images.
11. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to: perform the method of any one of claims 1 to 5.
12. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 5.
CN201811536471.9A 2018-12-14 2018-12-14 Three-dimensional model construction method and device, electronic equipment and storage medium Active CN109584362B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811536471.9A CN109584362B (en) 2018-12-14 2018-12-14 Three-dimensional model construction method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN109584362A CN109584362A (en) 2019-04-05
CN109584362B (en) 2023-03-21

Family

ID=65929638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811536471.9A Active CN109584362B (en) 2018-12-14 2018-12-14 Three-dimensional model construction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109584362B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832338A (en) * 2019-04-16 2020-10-27 北京市商汤科技开发有限公司 Object detection method and device, electronic equipment and storage medium
CN110064200B (en) * 2019-04-25 2022-02-22 腾讯科技(深圳)有限公司 Object construction method and device based on virtual environment and readable storage medium
CN112102223B (en) * 2019-06-18 2024-05-14 通用电气精准医疗有限责任公司 Method and system for automatically setting scan range
CN110348524B (en) * 2019-07-15 2022-03-04 深圳市商汤科技有限公司 Human body key point detection method and device, electronic equipment and storage medium
CN112967311A (en) * 2019-12-12 2021-06-15 浙江商汤科技开发有限公司 Three-dimensional line graph construction method and device, electronic equipment and storage medium
CN113689503B (en) * 2021-10-25 2022-02-25 北京市商汤科技开发有限公司 Target object posture detection method, device, equipment and storage medium
CN114842372A (en) * 2022-03-31 2022-08-02 北京的卢深视科技有限公司 Contact type foul detection method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015188684A1 (en) * 2014-06-12 2015-12-17 深圳奥比中光科技有限公司 Three-dimensional model reconstruction method and system
CN107845134A (en) * 2017-11-10 2018-03-27 浙江大学 A kind of three-dimensional rebuilding method of the single body based on color depth camera

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050712B (en) * 2013-03-15 2018-06-05 索尼公司 The method for building up and device of threedimensional model
CN108549873B (en) * 2018-04-19 2019-12-24 北京华捷艾米科技有限公司 Three-dimensional face recognition method and three-dimensional face recognition system
CN108876708B (en) * 2018-05-31 2022-10-25 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN108898630B (en) * 2018-06-27 2020-12-15 清华-伯克利深圳学院筹备办公室 Three-dimensional reconstruction method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN109584362A (en) 2019-04-05

Similar Documents

Publication Publication Date Title
CN109584362B (en) Three-dimensional model construction method and device, electronic equipment and storage medium
CN109697734B (en) Pose estimation method and device, electronic equipment and storage medium
CN109948494B (en) Image processing method and device, electronic equipment and storage medium
EP3825960A1 (en) Method and device for obtaining localization information
US20210097715A1 (en) Image generation method and device, electronic device and storage medium
CN112001321A (en) Network training method, pedestrian re-identification method, network training device, pedestrian re-identification device, electronic equipment and storage medium
CN110928627B (en) Interface display method and device, electronic equipment and storage medium
CN114019473A (en) Object detection method and device, electronic equipment and storage medium
CN109840917B (en) Image processing method and device and network training method and device
CN110989901B (en) Interactive display method and device for image positioning, electronic equipment and storage medium
CN112945207B (en) Target positioning method and device, electronic equipment and storage medium
CN111860373B (en) Target detection method and device, electronic equipment and storage medium
CN110706339A (en) Three-dimensional face reconstruction method and device, electronic equipment and storage medium
CN112362047A (en) Positioning method and device, electronic equipment and storage medium
CN111563138A (en) Positioning method and device, electronic equipment and storage medium
CN112184787A (en) Image registration method and device, electronic equipment and storage medium
CN112541971A (en) Point cloud map construction method and device, electronic equipment and storage medium
CN112767541A (en) Three-dimensional reconstruction method and device, electronic equipment and storage medium
CN113345000A (en) Depth detection method and device, electronic equipment and storage medium
CN113052900A (en) Position determination method and device, electronic equipment and storage medium
CN111311588B (en) Repositioning method and device, electronic equipment and storage medium
CN111325786B (en) Image processing method and device, electronic equipment and storage medium
CN112330721B (en) Three-dimensional coordinate recovery method and device, electronic equipment and storage medium
CN114549983A (en) Computer vision model training method and device, electronic equipment and storage medium
CN114519794A (en) Feature point matching method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant