WO2023078135A1 - Three-dimensional modeling method and device, computer-readable storage medium, and computer equipment - Google Patents

Three-dimensional modeling method and device, computer-readable storage medium, and computer equipment

Info

Publication number
WO2023078135A1
Authority
WO
WIPO (PCT)
Prior art keywords
key point
information
target object
key
distance
Prior art date
Application number
PCT/CN2022/127595
Other languages
English (en)
French (fr)
Inventor
吴文岩
李世楷
李华
钱晨
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023078135A1 publication Critical patent/WO2023078135A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Definitions

  • the present disclosure relates to the technical field of computer vision, and in particular to a three-dimensional modeling method and device, a computer-readable storage medium, and computer equipment.
  • 3D modeling is an important research content in computer vision and computer graphics.
  • 3D modeling generally refers to reconstructing the 3D model of the target object from the image of the target object.
  • the completeness of the reconstructed three-dimensional model of the target object is relatively low.
  • an embodiment of the present disclosure provides a three-dimensional modeling method, the method including: determining information of at least one first key point in a first region of a target object based on a target image of the target object, the target image including an image corresponding to the first region; determining information of a second key point to be completed based on information of at least one reference key point among the at least one first key point; and performing three-dimensional modeling on the target object based on the information of the at least one first key point and the information of the second key point, to obtain a three-dimensional model of the target object.
  • an embodiment of the present disclosure provides a three-dimensional modeling device, the device including: a first determination module, configured to determine information of at least one first key point of a first region of a target object based on a target image of the target object, the target image including an image corresponding to the first region; a second determination module, configured to determine information of a second key point to be completed based on information of at least one reference key point among the at least one first key point; and a three-dimensional modeling module, configured to perform three-dimensional modeling on the target object based on the information of the at least one first key point and the information of the second key point, to obtain a three-dimensional model of the target object.
  • an embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method described in any embodiment is implemented.
  • an embodiment of the present disclosure provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the program, the method described in any embodiment is implemented.
  • FIG. 1A is a schematic diagram of an image of a target object shown in an embodiment of the present disclosure.
  • FIG. 1B is a schematic diagram of a three-dimensional model of a target object shown in an embodiment of the present disclosure.
  • Fig. 2 is a flowchart of a three-dimensional modeling method according to an embodiment of the present disclosure.
  • FIG. 3A is a schematic diagram of the topological structure of key points of a human body according to an embodiment of the present disclosure.
  • FIG. 3B is a schematic diagram of key points of a building according to an embodiment of the present disclosure.
  • FIG. 4A is a schematic diagram of an image of a target object according to an embodiment of the present disclosure.
  • FIG. 4B is a schematic diagram of an image of a target object according to an embodiment of the present disclosure.
  • FIG. 4C is a schematic diagram of an image of a target object according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of a three-dimensional modeling result of an embodiment of the present disclosure.
  • FIG. 6 is a complete flowchart of an embodiment of the present disclosure.
  • FIG. 7 is a block diagram of a three-dimensional modeling device according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
  • Three-dimensional modeling generally refers to reconstructing a three-dimensional model of the target object from an image of the target object.
  • in related technologies, only the 3D model corresponding to the local area of the target object included in the image can be reconstructed through 3D modeling.
  • for example, the image shown in FIG. 1A only includes the upper body (from the head to the waist) of the target object and does not include the lower body (legs and feet). Therefore, as shown in FIG. 1B, only a 3D model of the upper body of the target object can be obtained through 3D modeling; a 3D model of the lower body cannot be obtained.
  • an embodiment of the present disclosure provides a 3D modeling method, as shown in FIG. 2 , the method includes:
  • Step 201: based on the target image of the target object, determine the information of at least one first key point of the first area of the target object, the target image including an image corresponding to the first area;
  • Step 202: based on the information of at least one reference key point in the at least one first key point, determine the information of the second key point to be completed;
  • Step 204: based on the information of the at least one first key point and the information of the second key point, perform three-dimensional modeling on the target object to obtain a three-dimensional model of the target object.
  • the information of the first key point of the first area of the target object is obtained based on the target image of the target object, and the information of the second key point is then completed based on the information of the reference key point among the first key points. After key point completion, the obtained key point information is more complete; therefore, compared with a 3D model obtained by 3D modeling based only on the information of the first key point, a three-dimensional model obtained by 3D modeling based on both the information of the first key point and the completed information of the second key point has higher completeness.
  • the target object may include a living body such as a person or an animal, or may include a non-living body such as a building or a table.
  • a target object can contain one or more keypoints.
  • the key points of the target object may include bone key points of the target object.
  • the topology of the complete human body may include multiple key points.
  • the multiple key points may include human head key points, neck key points, limb key points, torso key points, etc.
  • Fig. 3A shows a schematic diagram of key point distribution. In the figure, the black dots represent the key points, and the numbers represent the numbers of the key points.
  • the key points and their corresponding numbers are: 0 - nose, 1 - neck, 2 - right shoulder, 3 - right elbow, 4 - right wrist, 5 - left shoulder, 6 - left elbow, 7 - left wrist, 8 - mid hip, 9 - right hip, 10 - right knee, 11 - right ankle, 12 - left hip, 13 - left knee, 14 - left ankle, 15 - right eye, 16 - left eye, 17 - right ear, 18 - left ear, 19 - left big toe, 20 - left little toe, 21 - left heel, 22 - right big toe, 23 - right little toe, 24 - right heel; this numbering can be captured in a lookup table, as sketched below.
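For reference, the numbering above can be held in a simple index-to-name table. This is a minimal illustrative sketch; the identifier names are assumptions, only the numbering comes from the text above:

```python
# Index-to-name mapping for the 25-key-point human topology of Fig. 3A.
BODY_KEYPOINTS = {
    0: "nose", 1: "neck", 2: "right_shoulder", 3: "right_elbow",
    4: "right_wrist", 5: "left_shoulder", 6: "left_elbow", 7: "left_wrist",
    8: "mid_hip", 9: "right_hip", 10: "right_knee", 11: "right_ankle",
    12: "left_hip", 13: "left_knee", 14: "left_ankle", 15: "right_eye",
    16: "left_eye", 17: "right_ear", 18: "left_ear", 19: "left_big_toe",
    20: "left_little_toe", 21: "left_heel", 22: "right_big_toe",
    23: "right_little_toe", 24: "right_heel",
}
```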
  • the multiple key points may include other key points (for example, finger key points) in addition to the above key points.
  • the multiple key points may include some of the above key points and other key points.
  • the corner points of the target object can also be used as the key points of the target object.
  • the corner points can include corner points of the pixel values of the target object on the target image, and can also include inflection points of the geometric shape of the target object.
  • the key points of the building include inflection points of geometric shapes such as A, B, C, and D, and corner points of pixel values such as E and F.
  • a target image of the target object may be acquired.
  • the target image can be an RGB image or a grayscale image.
  • the target image may be an image acquired by an image acquisition device (for example, a camera or a video camera), or an image acquired by hand drawing or other methods.
  • by performing key point detection on the target image, information of at least one first key point in the first region of the target object can be acquired, where the information of the first key point includes at least position information of the first key point. Due to reasons such as shooting angle and occlusion, often only some key points of the target object can be detected from the target image.
  • other regions of the target object may also be included in the target image; these cases are not listed here one by one.
  • the number of target images may be greater than 1, and different target images may include different local regions on the target object, but the union of local regions included in multiple target images may still not include all regions of the target object.
  • the area of the target object included in the target image is referred to as a first area on the target object.
  • the first area includes the head area, neck area, and shoulder area of the target object.
  • some background regions may also be included in the target image.
  • multiple target objects may be included in the target image.
  • the image area where the target object is located can be extracted from the target image by means of image segmentation, and then the subsequent processing is performed.
  • the first key point may include at least one reference key point, and the reference key point may be any one or more first key points.
  • the second key point to be completed may be at least one key point in a second area other than the first area on the target object.
  • for example, the head key points, neck key points, left shoulder key points and right shoulder key points of the target object can be detected from the target image; at least one of these four key points is used as a reference key point, and any key point on the target object other than these four key points can be used as the second key point to be completed.
  • in some embodiments, a first distance estimate between the second key point and the reference key point may be obtained, and the information of the second key point is determined based on the first distance estimate and the information of the reference key point.
  • the first estimated distance value may be predetermined and stored, and called when it is necessary to complete the information of the second key point.
  • the first estimated distance value may be determined based on empirical values.
  • the first estimated distance value may be a fixed value.
  • for example, if the distance between the left shoulder key point and the right shoulder key point of an adult is about 40 cm, the first distance estimate between the left shoulder key point and the right shoulder key point can be determined to be 40 cm.
  • the first estimated distance value may be a dynamic value.
  • for example, if the ratio of the distance between the left shoulder key point and the right shoulder key point of an adult to the adult's height is about 0.225, the first distance estimate between the left shoulder key point and the right shoulder key point can be determined based on the product of the average adult height and 0.225.
  • the first distance estimate may also be selected from a preset interval. Assuming that the distance between the left shoulder key point and the right shoulder key point of an adult lies in the interval [35 cm, 45 cm], a distance value (for example, the maximum value, the minimum value, an intermediate value or another value) can be selected from this interval as the first distance estimate.
  • in some embodiments, the estimated distance between the second sample key point and the first sample key point of each sample object among a plurality of sample objects can be obtained respectively, and the first distance estimate is determined based on the distance estimate values corresponding to the sample objects.
  • the second sample key point is a key point on the sample object corresponding to the second key point
  • the first sample key point is a key point on the sample object corresponding to the reference key point.
  • the average, maximum or minimum value of the estimated distance values corresponding to each sample object may be determined as the first estimated distance value, or the estimated distance value corresponding to one sample object may be randomly selected as the first estimated distance value.
  • in some embodiments, the average size of multiple sample objects may be determined, and the distance between the second sample key point and the first sample key point of a sample object of that average size is used as the first distance estimate.
  • the average size of multiple sample objects refers to the average value of the sizes of the multiple sample objects. Taking the target object being a person as an example, the average height of multiple people can be determined, a person whose height equals the average height is selected as the sample object, and the distance between the second sample key point and the first sample key point on that sample object is used as the first distance estimate.
  • a real object in physical space can be used as a sample object; by measuring the distance between the second sample key point and the first sample key point of the sample object, the estimated distance between the two key points can be obtained.
  • the image of the sample object may also be collected, key point information is obtained by performing key point detection on the image, and then an estimated distance between the second sample key point and the first sample key point is determined based on the detected key point information .
  • the number of sample images may be greater than or equal to 1, and the key points included in each sample image may be the same or different. In some embodiments, the number of sample images is N (for example, 100), and all key points on the sample object can be detected from each sample image.
  • the first estimated distance value may also be determined based on the attributes of the target object.
  • the attributes may include but not limited to height, weight, gender, age and so on.
  • different first distance estimates may be determined for males and females, respectively, and/or different first distance estimates may be determined for adults and minors.
  • the first estimated distance value corresponding to the target object of each attribute may be determined in any of the above-mentioned manners, which will not be repeated here.
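As a minimal sketch of this single-reference completion, assuming image coordinates and an externally supplied offset direction (the text does not fix a direction at this point, so the direction vector below is an assumption):

```python
import numpy as np

def complete_from_reference(ref_xy, first_distance_estimate, direction):
    """Estimate the position of a missing (second) key point from a single
    reference key point and a precomputed first distance estimate.

    direction gives the assumed offset direction in image coordinates,
    e.g. (0.0, 1.0) if the missing point is assumed to lie directly below
    the reference point (y grows downward in image coordinates).
    """
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)  # normalize defensively
    return np.asarray(ref_xy, dtype=float) + first_distance_estimate * d

# Example: the text's 40 cm shoulder-to-shoulder estimate, assumed to map
# to 80 px at this image's scale (both numbers illustrative).
right_shoulder = complete_from_reference((120.0, 60.0), 80.0, (1.0, 0.0))
```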
  • the first key points may include two reference key points.
  • in some embodiments, a first distance between the two reference key points may be acquired; based on the first distance and a predetermined scale factor, a second distance between the second key point and a target key point among the first key points is determined, where the scale factor is used to characterize the proportional relationship between the first distance and the second distance; and the information of the second key point is determined based on the second distance and the information of the target key point.
  • the target key point may be any one of the two reference key points.
  • the target key point can be any one of the two reference key points, or any of the first key points other than the two reference key points.
  • the target key point and the second key point are adjacent key points.
  • two key points being adjacent means that the two key points are directly connected in the key point topology diagram; for example, the key point numbered 8 in Fig. 3A is adjacent to the key point numbered 1, the key point numbered 9, and the key point numbered 12.
  • two adjacent key points include a parent node and a child node
  • the parent node and child node can be determined according to the position of the node.
  • the key points at the top can be determined as parent nodes
  • the key points at the bottom can be determined as child nodes.
  • "above" and "below" can be determined based on the ordinate of the key point position; in different coordinate systems, the relationship between coordinates and orientation may differ.
  • the parent node and child node can be defined according to the actual situation, and will not be repeated here.
  • the target key point may also be a pre-selected key point, for example, a neck key point.
  • the target key point can be a key point that is aligned with the second key point in a certain direction. Two key points being aligned in a certain direction means that their positional offset in directions other than that direction is small.
  • the target keypoint may also be one of a pre-selected set of keypoints, for example, one of the neck keypoint, left eye keypoint, and right shoulder keypoint.
  • the information of the second key point may also be completed in the manner described in the above embodiment.
  • the first distance between the two reference key points can be determined based on the result of the key point detection. For example, the coordinates of the two reference key points can be determined respectively, and then the Euclidean distance between the two reference key points can be determined based on the coordinates of the two reference key points, and the Euclidean distance can be determined as the first distance.
  • the scale factor can be determined based on empirical values. For example, assuming that the two reference key points are the left shoulder key point and the left elbow key point, and the second key point is the left wrist key point, the ratio of the distance between the left shoulder key point and the left elbow key point to the distance between the left elbow key point and the left wrist key point is, according to experience, generally about 1:1; therefore, the scale factor can be determined as 1.
  • in some embodiments, a second distance estimate between the two reference key points may be obtained, and a third distance estimate between the second key point and the target key point may be obtained; the scale factor is determined based on the proportional relationship between the second distance estimate and the third distance estimate.
  • the manner of obtaining the second estimated distance value and the third estimated distance value may be the same as the manner of obtaining the first estimated distance value. For example, it may be determined based on empirical values, or may be determined based on sample objects.
  • in some embodiments, the second distance estimate and the third distance estimate are determined based on sample objects.
  • the second distance estimate may be determined based on the distance between two first sample key points of each sample object among a plurality of sample objects, where the two first sample key points correspond respectively to the two reference key points.
  • the third distance estimate may be determined based on the distance between the second sample key point and the third sample key point of each sample object, where the second sample key point corresponds to the second key point and the third sample key point corresponds to the target key point.
  • for example, if the first distance is denoted as D1 and the scale factor as s, the second distance D2 may be taken as the product D1·s (or the quotient D1/s, depending on how the scale factor is defined).
  • the embodiments of the present disclosure are not limited to using the product or quotient of the first distance and the scale factor directly as the second distance.
  • a random offset may be added to the above product or quotient, or the product or quotient may be scaled according to a preset ratio to obtain the second distance; other methods of determining the second distance may also be used and are not listed here one by one. A sketch of the basic product form follows below.
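A minimal sketch of the scale-factor completion, assuming image coordinates and an assumed unit direction from the target key point to the missing point (the direction is not pinned down by the text):

```python
import numpy as np

def second_keypoint_via_scale(ref_a, ref_b, target_xy, scale_factor, direction):
    """D1 is the Euclidean distance between the two reference key points;
    D2 = D1 * scale_factor is the estimated distance from the target key
    point to the missing second key point, applied along the assumed
    direction. (The text also allows a quotient, a random offset, or a
    preset rescaling instead of the plain product.)"""
    d1 = np.linalg.norm(np.asarray(ref_a, float) - np.asarray(ref_b, float))
    d2 = d1 * scale_factor
    d = np.asarray(direction, float)
    d = d / np.linalg.norm(d)
    return np.asarray(target_xy, float) + d2 * d

# Example from the text: shoulder-to-elbow vs. elbow-to-wrist is roughly
# 1:1, so with scale_factor=1.0 the upper-arm length is mapped onto the
# forearm, starting at the elbow (the target key point).
left_wrist = second_keypoint_via_scale(
    ref_a=(100.0, 50.0),   # left shoulder (illustrative coordinates)
    ref_b=(105.0, 95.0),   # left elbow
    target_xy=(105.0, 95.0),
    scale_factor=1.0,
    direction=(0.1, 1.0),  # assumed forearm direction
)
```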
  • the information of different second key points may be determined based on the same or different scale factors.
  • in some cases, the target object is symmetrical about a certain axis of symmetry. Therefore, the scale factor corresponding to a second key point on one side of the target object may be determined, and the key point symmetrical to that second key point may be assigned the same scale factor; in this way, symmetric second key points can share the same scale factor.
  • if the distribution of multiple second key points is relatively close, for example, the distance between two adjacent second key points among the multiple second key points is relatively small, then the multiple second key points can share the same scale factor.
  • a scaling factor may also be determined for each second key point.
  • the sample object and the target object may be objects of the same category, for example, they are both human beings.
  • the target object and the sample object may also be objects of different categories.
  • the target object is a human and the sample object is a bear.
  • the sample objects can be determined randomly, or determined by the user according to requirements. Since the distances between the key points of different categories of sample objects are often different, using key point information of different categories of sample objects to determine the information of the second key point to be completed yields different second key point information, and therefore the display effect of the 3D model obtained by the final 3D reconstruction also differs.
  • key point A corresponding to another key point (called key point B) may mean that key point A and key point B are key points at the same part, for example, both are left shoulder key points.
  • key point B may also refer to a key point where key point A and key point B are symmetrical.
  • the symmetry of the two key points means that the positions of the two key points on the object to which they belong are symmetrical.
  • key point A is the key point of the left shoulder on the target object
  • key point B is the key point of the right shoulder on the sample object. Since the categories of the sample object and the target object may be different, the positions and numbers of the key points of the sample object and the key points of the target object may be different.
  • the corresponding relationship between the key points of the sample object and the key points of the target object may be established in advance, so as to complement the information of the second key point.
  • the first distance estimate, the second distance estimate, the third distance estimate and/or the scale factor may be acquired and stored in advance, and called directly when it is determined that the information of the second key point needs to be completed.
  • the at least one reference key point includes a first reference key point and a second reference key point
  • the information of the second key point includes the information of the second key point in a first direction and the information of the second key point in a second direction.
  • the first direction and the second direction are different directions.
  • the first direction is orthogonal to the second direction.
  • the first direction is the direction of the line connecting the target object's nose key point to the target object's neck key point
  • the second direction is the direction of the line connecting the target object's neck key point and the right shoulder key point.
  • other directions may also be determined as the first direction and the second direction according to actual needs, and no further examples are given here.
  • the first direction may be a vertical direction
  • the second direction may be a horizontal direction.
  • Information of the second key point in each direction may be determined based on the same information of the first key point, or may be determined based on information of different first key points.
  • a pair of first key points whose relative displacement in the second direction is less than a first threshold can be selected to determine the information of the second key point in the first direction, and a pair of first key points whose relative displacement in the first direction is less than a second threshold can be selected to determine the information of the second key point in the second direction.
  • the way of determining the information of the second key point in each direction is similar to the way of determining the information of the second key point in the previous embodiments; the only difference is that, when determining the distance of the second key point in the first direction, the various distances and scale factors above are the projection distances in the first direction and the scale factor in the first direction, and when determining the distance of the second key point in the second direction, they are the projection distances in the second direction and the scale factor in the second direction.
  • in some embodiments, the first projection distance of the first distance between the two reference key points in the first direction can be obtained; based on the first projection distance and a predetermined scale factor in the first direction, the second projection distance of the second distance between the second key point and the target key point in the first direction is determined, where the scale factor in the first direction is used to characterize the proportional relationship between the first projection distance and the second projection distance. Then, the information of the second key point in the first direction may be determined based on the second projection distance and the information of the target key point. The manner of determining the information of the second key point in the second direction is similar and is not repeated here.
  • for example, if the target key point is the key point numbered 8 and the second key point to be completed is the key point numbered 9, the information of the key point numbered 9 can be determined based only on the scale factor in the horizontal direction.
  • similarly, if the target key point is the key point numbered 9 and the second key point to be completed is the key point numbered 10, the information of the key point numbered 10 can be determined based only on the scale factor in the vertical direction. A per-direction sketch follows below.
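A minimal sketch of the per-direction variant, taking the horizontal and vertical axes as the two directions as in the examples above; the offset signs depend on which side of the target the missing point lies and are assumptions here:

```python
import numpy as np

def complete_per_direction(ref_a, ref_b, target_xy, scale_h, scale_v,
                           sign_h=1.0, sign_v=1.0):
    """Project the first distance onto the horizontal and vertical axes
    separately, apply the per-direction scale factors, and offset the
    target key point in each direction."""
    a, b = np.asarray(ref_a, float), np.asarray(ref_b, float)
    proj_h = abs(a[0] - b[0])  # first-distance projection, horizontal
    proj_v = abs(a[1] - b[1])  # first-distance projection, vertical
    tx, ty = np.asarray(target_xy, float)
    return np.array([tx + sign_h * scale_h * proj_h,
                     ty + sign_v * scale_v * proj_v])

# E.g. hip key point 9 completed from mid-hip key point 8 mainly needs the
# horizontal term (scale_v ~ 0), while knee key point 10 completed from
# hip key point 9 mainly needs the vertical term (scale_h ~ 0).
```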
  • in some embodiments, initial deformation parameters and initial motion parameters of the target object may be determined based on the target image, where the initial deformation parameters are used to characterize the body shape of the target object and the initial motion parameters are used to characterize the action performed by the target object; the initial deformation parameters and the initial motion parameters are optimized based on the difference between the target object in the target image and the three-dimensional model of the target object, the information of the at least one first key point, and the information of the second key point, to obtain optimized deformation parameters and optimized motion parameters; and three-dimensional modeling is performed on the target object based on the optimized deformation parameters and optimized motion parameters.
  • the initial deformation parameters are a set of parameters used to characterize the body shape and size of the target object.
  • the initial action parameters can include the rotation angle of each key point of the target object, and can also include the global rotation angle of the target object.
  • the initial action parameters can determine the action performed by the target object, such as playing, jumping, etc.
  • the initial deformation parameters and initial motion parameters determined based on the target object may have certain errors, so they need to be optimized.
  • the optimization process uses the completed key point information and the difference between the target object in the target image and the 3D model of the target object as constraints, so that the final 3D model is more consistent with the target object in the target image.
  • in some embodiments, the difference between the target object in the target image and the three-dimensional model of the target object is determined by: determining a first mask of the target object based on the target image; performing three-dimensional modeling on the target object with the initial deformation parameters and initial action parameters to obtain an initial three-dimensional model of the target object, and determining a second mask of the target object based on the initial three-dimensional model; and determining the difference between the target object in the target image and the three-dimensional model of the target object based on the first mask and the second mask.
  • the first mask and the second mask may be two images with the same pixel size.
  • the pixel values of the pixels belonging to the target object in the first mask and the second mask are a first pixel value (for example, 0), and the pixel values of the pixels not belonging to the target object are a second pixel value (for example, 255); in this way, the target object determined based on the target image and the target object determined based on the initial 3D model can both be represented by binarized mask images.
  • the intersection-over-union ratio between the first mask and the second mask can be determined, denoted IoU: IoU = S(A∩B) / S(A∪B), where A represents the set of pixels belonging to the target object in the first mask, B represents the set of pixels belonging to the target object in the second mask, and S denotes the area of a pixel set.
  • a difference between the target object in the target image and a three-dimensional model of the target object may be determined based on the intersection-over-union ratio.
  • the embodiments of the present disclosure may also use other methods to determine the difference between the target object in the target image and the 3D model of the target object, which will not be listed here.
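A minimal sketch of computing this mask IoU, assuming two same-sized boolean masks:

```python
import numpy as np

def mask_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union of two same-sized boolean masks, where True
    marks pixels belonging to the target object (first mask: from the
    target image; second mask: rendered from the initial 3D model)."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter) / float(union) if union > 0 else 0.0

# A difference term for the optimization could then be taken as 1 - IoU;
# this particular choice is illustrative, since the text notes that other
# difference measures are possible.
```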
  • the 3D point cloud of the surface of the target object can be determined based on the optimized deformation parameters and the optimized action parameters, and the 3D model of the target object can be obtained after rendering.
  • in some embodiments, the key point information can be input into a pre-trained neural network, where the key point information includes the information of the at least one reference key point and an initial value of the information of the second key point; the information of a key point includes position information and confidence information of the key point, and the initial value of the confidence information of the second key point is less than a preset confidence threshold. The information of the second key point output by the neural network based on the information of the at least one reference key point is then acquired.
  • the position information of a key point can be used to characterize the position of the key point in a preset coordinate system.
  • the preset coordinate system can be the coordinate system corresponding to the target image, or it can be the world coordinate system or other coordinate systems.
  • the coordinate system can be selected based on actual needs, and conversion between coordinate systems can be performed in different situations through a conversion matrix.
  • the position information of the reference key point can be determined based on the target image, and the position information of the second key point needs to be obtained through completion.
  • an initial value such as [0,0] may be set for the position information of the second key point, and the initial value may be corrected by key point completion to obtain the final position information of the second key point.
  • the confidence information of a key point is used to characterize the credibility of the position information of that key point. A value within a certain range may be used to represent the confidence information; the range may be 0 to 1, 0 to 100, or another range.
  • the larger the value of the confidence information, the more credible the position information of the key point; conversely, the smaller the value of the confidence information, the less credible the position information of the key point.
  • the first key point can be directly detected from the target image, or directly included in the target image, therefore, the confidence information of the first key point has a larger value.
  • the initial position information of the second key point is obtained without calculation, and may be quite different from the actual position information of the second key point. Therefore, the value of the initial confidence information of the second key point is relatively small.
  • after completion, the confidence information of the second key point may be set to a value greater than the confidence threshold. In this way, on the one hand, it is convenient to distinguish whether the position information of the second key point is the initial information or the information obtained after completion; on the other hand, the position information of second key points with confidence above the threshold can be used to calculate the position information of other key points.
  • the confidence threshold may be set to 0, that is, the initial confidence information of the second key point is 0.
  • the confidence degree after completion of the second key point may be adjusted to 1. Of course, in practical applications, other values may also be used as the initial confidence level and the adjusted confidence level of the second key point.
  • in some embodiments, after the information of the second key point is obtained, the information of a third key point to be completed can also be determined based on the information of the second key point and the distance between the third key point and the second key point. For the specific manner of determining the information of the third key point, reference may be made to the manner of determining the information of the second key point, which is not repeated here.
  • in some embodiments, the key point information includes a confidence level; the information of the third key point may be determined based on the information of second key points whose confidence level is greater than the confidence threshold and the distance between the third key point to be completed and those second key points.
  • the target object may be three-dimensionally modeled to obtain a three-dimensional model of the target object.
  • the three-dimensional model of the target object obtained by performing three-dimensional modeling based on the target image shown in FIG. 1A in some embodiments is shown in FIG. 5 . It can be seen that by performing 3D modeling in this embodiment, the part of the target object not included in the target image can be reconstructed, thereby improving the integrity of the 3D model obtained by 3D modeling.
  • in related technologies, the finally obtained 3D model can only include the part corresponding to the local area of the target object; compared with 3D modeling based only on the key points obtained from the target image, the 3D modeling method of the present disclosure can improve the completeness of the 3D model.
  • each key point is represented in the format [x, y, confidence information], where x and y are image coordinates and the key point position confidence information ranges from 0 to 1.
  • the key point array is used as input; the array contains undetected key points (i.e., the second key points), and the entry corresponding to an undetected key point is represented by [0,0,0].
  • the output of this step is the key point array after completion: the undetected key points are completed to obtain complete key point information. A sketch of this input format follows below.
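A minimal sketch of building the input array, assuming the 25-key-point topology above; the helper names are illustrative:

```python
import numpy as np

NUM_KEYPOINTS = 25    # per the topology of Fig. 3A
CONF_THRESHOLD = 0.0  # per the text, undetected key points start at 0

def build_keypoint_array(detections):
    """Build the [x, y, confidence] input array described above.

    detections maps key point index -> (x, y, confidence) for detected
    (first) key points; undetected (second) key points keep the [0, 0, 0]
    placeholder so the completion step can recognize them.
    """
    arr = np.zeros((NUM_KEYPOINTS, 3), dtype=float)
    for idx, (x, y, conf) in detections.items():
        arr[idx] = (x, y, conf)
    return arr

def needs_completion(arr):
    """Boolean mask of key points whose confidence is at the threshold."""
    return arr[:, 2] <= CONF_THRESHOLD
```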
  • Adaptive scale estimation: construct a data set containing 100 complete single-person pictures, and calculate the average vertical distance and the average horizontal distance of each group of adjacent human key points, for example of the adjacent pairs {0,1}, {1,2}, {1,5}, {1,8}, and so on. Then divide the average vertical distance between the nose key point and the neck key point by the average vertical distance of each group to obtain the vertical-direction scale factor of each group. Then divide the average horizontal distance of each group of adjacent key points by the average horizontal distance between the right shoulder key point and the neck key point to obtain the horizontal-direction scale factor of each group of adjacent key points.
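A minimal sketch of this adaptive scale estimation, assuming a dataset array of detected key point positions; the division orders follow the text as stated:

```python
import numpy as np

# Adjacent key-point pairs of the topology; truncated, illustrative list.
ADJACENT_PAIRS = [(0, 1), (1, 2), (1, 5), (1, 8)]
EPS = 1e-6  # guard against near-zero average distances

def adaptive_scale_factors(dataset):
    """dataset: (num_images, num_keypoints, 2) array of [x, y] positions
    from complete single-person pictures. Vertical factors are referenced
    to the nose-neck pair (0, 1) and horizontal factors to the right
    shoulder-neck pair (2, 1)."""
    def mean_dist(pair, axis):
        a, b = pair
        return np.abs(dataset[:, a, axis] - dataset[:, b, axis]).mean()

    ref_v = mean_dist((0, 1), 1)  # average vertical nose-neck distance
    ref_h = mean_dist((2, 1), 0)  # average horizontal shoulder-neck distance

    vertical = {p: ref_v / (mean_dist(p, 1) + EPS) for p in ADJACENT_PAIRS}
    horizontal = {p: (mean_dist(p, 0) + EPS) / ref_h for p in ADJACENT_PAIRS}
    return vertical, horizontal
```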
  • 3D modeling is performed.
  • the input of this step is the completed human body key points, the initial motion parameters, and the initial deformation parameters; the output is a point cloud of 6,890 3D points representing the human body, which is rendered to obtain a 3D human body.
  • this step uses the completed human body key point information and the human body mask to adjust the initial motion parameters and initial deformation parameters; after rendering, a complete human body 3D model is finally obtained. A sketch of one possible objective follows below.
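A hypothetical objective for this adjustment step, combining a confidence-weighted key point error with a mask-difference term; project_keypoints and render_mask stand in for a body model and renderer, which the text does not pin down:

```python
import numpy as np

def modeling_loss(shape_params, pose_params, keypoints, target_mask,
                  project_keypoints, render_mask, w_kp=1.0, w_mask=1.0):
    """keypoints: (K, 3) array of completed [x, y, confidence] entries;
    target_mask: boolean human-body mask from the target image. The loss
    weights and the 1 - IoU difference term are illustrative choices."""
    pred_kp = project_keypoints(shape_params, pose_params)  # (K, 2) pixels
    conf = keypoints[:, 2:3]                                # (K, 1)
    kp_term = float((conf * (pred_kp - keypoints[:, :2]) ** 2).sum())

    pred_mask = render_mask(shape_params, pose_params)      # boolean (H, W)
    inter = np.logical_and(pred_mask, target_mask).sum()
    union = np.logical_or(pred_mask, target_mask).sum()
    mask_term = 1.0 - (float(inter) / float(union) if union > 0 else 0.0)

    return w_kp * kp_term + w_mask * mask_term
```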
  • the method can support 3D modeling from partial images to obtain a complete 3D model of the target object, where only some key points of the target object can be detected from the partial images.
  • the reconstructed 3D model can be consistent with the motion in the local image in terms of motion.
  • the reconstructed 3D model can also be consistent with the shape in the local image in terms of shape.
  • the action sequence used for motion transfer may be a key point sequence, that is, an image sequence composed of multiple key point topological maps; it may also be a video sequence of the target object. For example, given a half-length photo of a human body and a reference video of a dancing movement, a dancing video of the full body of the person in the half-length photo can be obtained.
  • This disclosure relates to the field of augmented reality.
  • by acquiring image information of the target object in the real environment and then using various vision-related algorithms to detect or identify the relevant features, states and attributes of the target object, an AR effect combining the virtual and the real that matches the specific application can be obtained.
  • the target object may involve faces, limbs, gestures and actions related to the human body, markers and identifiers related to objects, or sand tables, display areas or display items related to venues or places.
  • Vision-related algorithms may involve visual positioning, SLAM, 3D reconstruction, image registration, background segmentation, object key point extraction and tracking, object pose or depth detection, etc.
  • specific applications can involve not only interactive scenes such as guided tours, navigation, explanation, reconstruction, virtual effect overlay and display related to real scenes or objects, but also special effects processing related to people, such as interactive scenes involving makeup beautification, body beautification, special effect display, and virtual model display.
  • the relevant features, states and attributes of the target object can be detected or identified through the convolutional neural network.
  • the above-mentioned convolutional neural network is a network model obtained by performing model training based on a deep learning framework.
  • the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
  • An embodiment of the present disclosure also provides a three-dimensional modeling device, as shown in FIG. 7, the device includes:
  • the first determining module 701 is configured to determine information of at least one first key point of a first area of the target object based on a target image of the target object, the target image including an image corresponding to the first area;
  • the second determination module 702 is configured to determine the information of the second key point to be completed based on the information of at least one reference key point among the at least one first key point;
  • a three-dimensional modeling module 703, configured to perform three-dimensional modeling on the target object based on the information of the at least one first key point and the information of the second key point, to obtain a three-dimensional model of the target object.
  • information of a first key point in a first region on the target object is obtained based on a target image of the target object, and information of a second key point is then completed based on information of a reference key point among the first key points. After key point completion, the obtained key point information is more complete; therefore, compared with a 3D model obtained by 3D modeling based only on the information of the first key point, a three-dimensional model obtained by 3D modeling based on both the information of the first key point and the completed information of the second key point has higher completeness.
  • the at least one first key point includes a reference key point; the second determination module is configured to: acquire a first distance estimate between the second key point and the reference key point; and determine the information of the second key point based on the first distance estimate and the information of the reference key point.
  • only one reference key point can be used to complete the information of the second key point. The completion process is low in complexity and easy to implement, thereby reducing the complexity of 3D modeling and improving the efficiency of 3D modeling.
  • the at least one first key point includes at least two first key points, among which are two reference key points; the second determination module is configured to: acquire a first distance between the two reference key points; based on the first distance and a predetermined scale factor, determine a second distance between the second key point and a target key point among the at least two first key points, where the scale factor is used to characterize the proportional relationship between the first distance and the second distance, and the target key point is any one of the two reference key points or any of the at least two first key points other than the reference key points; and determine the information of the second key point based on the second distance and the information of the target key point.
  • the proportional relationship between the first distance and the second distance is used as a scale factor, and the first distance between the two reference key points is mapped to obtain the second distance between the second key point and the target key point. Because the ratios of the distances between the key points of the target object generally vary within a small range, the second distance can be accurately determined in this way, so the information of the second key point can be accurately determined, improving 3D modeling accuracy.
  • the device further includes: a first obtaining module, configured to obtain a second estimated distance between the two reference key points; a second obtaining module, configured to obtain the second key point A third estimated distance to the target key point; a third determining module, configured to determine the scaling factor based on a proportional relationship between the second estimated distance and the third estimated distance.
  • a relatively accurate scale factor can be obtained through the proportional relationship between the second estimated distance value and the third estimated distance value, thereby further improving the accuracy of the information of the second key point.
  • the second distance estimate is determined based on the distance between two first sample key points of each sample object among a plurality of sample objects, where the two first sample key points correspond respectively to the two reference key points; the third distance estimate is determined based on the distance between the second sample key point and the third sample key point of each sample object among the plurality of sample objects, where the second sample key point corresponds to the second key point and the third sample key point corresponds to the target key point.
  • the at least one reference key point includes a first reference key point and a second reference key point; the second determining module is configured to: determine the second reference key point based on information of the first reference key point Information of the key point in the first direction; determining information of the second key point in the second direction based on the information of the second reference key point.
  • the device further includes: a fourth determination module, configured to determine the information of the third key point based on the information of the second key point and the distance between the third key point to be completed and the second key point; the three-dimensional modeling module is configured to perform three-dimensional modeling on the target object based on the information of the at least one first key point, the information of the second key point and the information of the third key point. After the information of the second key point is obtained, the information of the third key point can be determined based on it, so as to complete more key points and further improve the completeness of the three-dimensional model.
  • the second determination module is configured to: input key point information into a pre-trained neural network, where the key point information includes the information of the at least one reference key point and an initial value of the information of the second key point, the information of a key point includes position information and confidence information of the key point, and the initial value of the confidence information of the second key point is less than a preset confidence threshold; and acquire the information of the second key point output by the neural network based on the information of the at least one reference key point.
  • the key point completion process can be automatically realized through the neural network, and the confidence of the key points is used in the process of the key point completion by the neural network, so that it can automatically determine which key points are the key points that need to be completed.
  • the device further includes: a setting module, configured to, after the information of the second key point output by the neural network based on the information of the at least one reference key point is obtained, set the confidence information of the second key point to a value greater than the confidence threshold; the three-dimensional modeling module is configured to perform three-dimensional modeling on the target object based on the information of the at least one first key point, the information of second key points whose confidence is greater than the confidence threshold, and the information of the third key point.
  • the information of the completed second key point can be automatically used as known information to complete the third key point without manual selection.
  • the three-dimensional modeling module is configured to: determine initial deformation parameters and initial motion parameters of the target object based on the target image, where the initial deformation parameters are used to characterize the body shape of the target object and the initial motion parameters are used to characterize the action performed by the target object; optimize the initial deformation parameters and the initial motion parameters based on the difference between the target object in the target image and the three-dimensional model of the target object, the information of the at least one first key point, and the information of the second key point, to obtain optimized deformation parameters and optimized motion parameters; and perform three-dimensional modeling on the target object based on the optimized deformation parameters and optimized motion parameters.
  • the device further includes: a fifth determination module, configured to determine the first mask of the target object based on the target image; a sixth determination module, configured to perform three-dimensional modeling on the target object with the initial deformation parameters and initial action parameters to obtain an initial three-dimensional model of the target object, and to determine a second mask of the target object based on the initial three-dimensional model; and a seventh determination module, configured to determine the difference between the target object in the target image and the three-dimensional model of the target object based on the first mask and the second mask.
  • the device further includes: a post-processing module, configured to perform action recognition and/or behavior prediction on the target object based on the three-dimensional model of the target object, and/or to perform motion transfer processing on the three-dimensional model of the target object.
  • the functions or modules included in the device provided by the embodiments of the present disclosure can be used to execute the methods described in the method embodiments above; for specific implementation, reference may be made to the description of the method embodiments above, which, for brevity, is not repeated here.
  • the embodiment of this specification also provides a computer device, which at least includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where, when the processor executes the program, the method described in any of the preceding embodiments is implemented.
  • FIG. 8 shows a schematic diagram of a more specific hardware structure of a computing device provided by the embodiment of this specification.
  • the device may include: a processor 801 , a memory 802 , an input/output interface 803 , a communication interface 804 and a bus 805 .
  • the processor 801 , the memory 802 , the input/output interface 803 and the communication interface 804 are connected to each other within the device through the bus 805 .
  • the processor 801 can be implemented by a general-purpose CPU (Central Processing Unit, central processing unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is used to execute related programs to realize the technical solutions provided by the embodiments of this specification.
  • the processor 801 may also include a graphics card, and the graphics card may be an Nvidia titan X graphics card or a 1080Ti graphics card.
  • the memory 802 can be implemented in the form of ROM (Read Only Memory, read-only memory), RAM (Random Access Memory, random access memory), static storage device, dynamic storage device, etc.
  • the memory 802 can store operating systems and other application programs. When implementing the technical solutions provided by the embodiments of this specification through software or firmware, the relevant program codes are stored in the memory 802 and invoked by the processor 801 for execution.
  • the input/output interface 803 is used to connect the input/output module to realize information input and output.
  • the input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • the input device may include a keyboard, mouse, touch screen, microphone, various sensors, etc.
  • the output device may include a display, a speaker, a vibrator, an indicator light, and the like.
  • the communication interface 804 is used to connect a communication module (not shown in the figure), so as to realize the communication interaction between the device and other devices.
  • the communication module can realize communication through wired means (such as USB, network cable, etc.), and can also realize communication through wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • Bus 805 includes a path for transferring information between the various components of the device (eg, processor 801, memory 802, input/output interface 803, and communication interface 804).
  • although the above device only shows the processor 801, the memory 802, the input/output interface 803, the communication interface 804, and the bus 805, in specific implementation the device may also include other components.
  • the above-mentioned device may only include components necessary to implement the solutions of the embodiments of this specification, and does not necessarily include all the components shown in the figure.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, where the program, when executed by a processor, implements the method described in any one of the foregoing embodiments.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology.
  • Information may be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • As defined herein, computer-readable media exclude transitory computer-readable media, such as modulated data signals and carrier waves.
  • A typical implementing device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a desktop computer, a tablet computer, a wearable device, or a combination of any of these devices.
  • Each embodiment in this specification is described in a progressive manner; for the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments.
  • In particular, since the apparatus embodiments are basically similar to the method embodiments, their description is relatively simple, and for relevant parts, reference may be made to the partial description of the method embodiments.
  • The apparatus embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separated, and the functions of the modules may be integrated in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of this embodiment, which can be understood and implemented by those skilled in the art without creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present disclosure provide a three-dimensional modeling method and apparatus, a computer-readable storage medium, and a computer device. The method includes: determining, based on a target image of a target object, information of a first key point of a first region on the target object, the target image including the first region; determining, based on information of a reference key point among the first key points, information of a second key point to be completed; and performing three-dimensional modeling on the target object based on the information of the first key point and the information of the second key point, to obtain a three-dimensional model of the target object.

Description

Three-dimensional modeling method and apparatus, computer-readable storage medium, and computer device
CROSS-REFERENCE TO RELATED APPLICATION
The present disclosure claims priority to Chinese Patent Application No. 202111289214.1, filed on November 2, 2021, which is incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the technical field of computer vision, and in particular to a three-dimensional modeling method and apparatus, a computer-readable storage medium, and a computer device.
BACKGROUND
Three-dimensional (3D) modeling is an important research topic in computer vision and computer graphics. 3D modeling generally refers to reconstructing a 3D model of a target object from an image of the target object. In the related art, because the image of a target object often captures only the region containing part of the target object's key points, the completeness of the reconstructed 3D model of the target object is relatively low.
SUMMARY
In a first aspect, embodiments of the present disclosure provide a three-dimensional modeling method, including: determining, based on a target image of a target object, information of at least one first key point of a first region of the target object, where the target image includes an image corresponding to the first region; determining, based on information of at least one reference key point among the at least one first key point, information of a second key point to be completed; and performing three-dimensional modeling on the target object based on the information of the at least one first key point and the information of the second key point, to obtain a three-dimensional model of the target object.
In the embodiments of the present disclosure, the information of the first key points of the first region on the target object is first obtained based on the target image of the target object, and the information of the second key point is then completed based on the information of the reference key points among the first key points. After key point completion, the obtained key point information is more complete. Therefore, compared with a 3D model obtained by modeling based only on the information of the first key points, a 3D model obtained by modeling based on both the information of the first key points and the information of the second key point has higher completeness.
In a second aspect, embodiments of the present disclosure provide a three-dimensional modeling apparatus, including: a first determining module, configured to determine, based on a target image of a target object, information of at least one first key point of a first region of the target object, where the target image includes an image corresponding to the first region; a second determining module, configured to determine, based on information of at least one reference key point among the at least one first key point, information of a second key point to be completed; and a three-dimensional modeling module, configured to perform three-dimensional modeling on the target object based on the information of the at least one first key point and the information of the second key point, to obtain a three-dimensional model of the target object.
In a third aspect, embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the method described in any of the embodiments.
In a fourth aspect, embodiments of the present disclosure provide a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, implements the method described in any of the embodiments.
It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure.
FIG. 1A is a schematic diagram of an image of a target object according to an embodiment of the present disclosure.
FIG. 1B is a schematic diagram of a three-dimensional model of a target object according to an embodiment of the present disclosure.
FIG. 2 is a flowchart of a three-dimensional modeling method according to an embodiment of the present disclosure.
FIG. 3A is a schematic diagram of a human-body key point topology according to an embodiment of the present disclosure.
FIG. 3B is a schematic diagram of key points of a building according to an embodiment of the present disclosure.
FIG. 4A is a schematic diagram of an image of a target object according to an embodiment of the present disclosure.
FIG. 4B is a schematic diagram of an image of a target object according to an embodiment of the present disclosure.
FIG. 4C is a schematic diagram of an image of a target object according to an embodiment of the present disclosure.
FIG. 5 is a schematic diagram of a three-dimensional modeling result according to an embodiment of the present disclosure.
FIG. 6 is an overall flowchart of an embodiment of the present disclosure.
FIG. 7 is a block diagram of a three-dimensional modeling apparatus according to an embodiment of the present disclosure.
FIG. 8 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
DETAILED DESCRIPTION
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terms used in the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the present disclosure. The singular forms "a/an", "said", and "the" used in the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used herein refers to and includes any or all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of multiple items, or any combination of at least two of multiple items.
It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various pieces of information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
To enable those skilled in the art to better understand the technical solutions in the embodiments of the present disclosure, and to make the above objectives, features, and advantages of the embodiments of the present disclosure more comprehensible, the technical solutions in the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.
3D modeling generally refers to reconstructing a 3D model of a target object from an image of the target object. In the related art, 3D modeling can only reconstruct a 3D model corresponding to the local region of the target object included in the image. For example, the image shown in FIG. 1A includes only the upper body of the target object (the part from the head to the waist) and does not include its lower body (the legs and feet). Therefore, as shown in FIG. 1B, 3D modeling can only produce a 3D model of the target object's upper body, not of its lower body.
To improve the completeness of the 3D model obtained by 3D modeling, an embodiment of the present disclosure provides a 3D modeling method. Referring to FIG. 2, the method includes:
Step 201: determining, based on a target image of a target object, information of at least one first key point of a first region of the target object, where the target image includes an image corresponding to the first region;
Step 202: determining, based on information of at least one reference key point among the at least one first key point, information of a second key point to be completed;
Step 203: performing three-dimensional modeling on the target object based on the information of the at least one first key point and the information of the second key point, to obtain a three-dimensional model of the target object.
In the embodiments of the present disclosure, the information of the first key points of the first region of the target object is first obtained based on the target image of the target object, and the information of the second key point is then completed based on the information of the reference key points among the first key points. After key point completion, the obtained key point information is more complete. Therefore, compared with a 3D model obtained by modeling based only on the information of the first key points, a 3D model obtained by modeling based on both the information of the first key points and the information of the second key point has higher completeness.
In step 201, the target object may include a living body such as a person or an animal, and may also include a non-living object, for example, a building or a table. The target object may include one or more key points. For a target object that has a skeleton, the key points of the target object may include its skeletal key points. For example, where the target object is a complete human body, the topology of the complete human body may include multiple key points. In some embodiments, the multiple key points may include head key points, neck key points, limb key points, torso key points, and the like. FIG. 3A shows a schematic key point distribution, where black dots denote key points and numerals denote key point indices. The key points and their corresponding indices are: 0 - nose; 1 - neck; 2 - right shoulder; 3 - right elbow; 4 - right wrist; 5 - left shoulder; 6 - left elbow; 7 - left wrist; 8 - mid hip; 9 - right hip; 10 - right knee; 11 - right ankle; 12 - left hip; 13 - left knee; 14 - left ankle; 15 - right eye; 16 - left eye; 17 - right ear; 18 - left ear; 19 - left big toe; 20 - left little toe; 21 - left heel; 22 - right big toe; 23 - right little toe; 24 - right heel. Of course, this numbering is not mandatory, and the key points may be numbered in other ways in practical applications. In other embodiments, the multiple key points may include, in addition to the above key points, other key points (for example, finger key points). Alternatively, the multiple key points may include some of the above key points together with other key points.
For other types of target objects, corner points of the target object may also be used as its key points. The corner points may include corners of the pixel values of the target object in the target image, and may also include inflection points of the geometric shape of the target object. In the image of a building shown in FIG. 3B, the key points of the building include geometric inflection points such as A, B, C, and D, as well as pixel-value corners such as E and F.
To perform 3D modeling on the target object, a target image of the target object may be acquired. The target image may be an RGB image or a grayscale image. The target image may be an image captured by an image acquisition device (for example, a camera), or an image obtained by hand drawing or in other ways. By performing key point detection on the target image, information of at least one first key point of the first region of the target object can be obtained, where the information of a first key point includes at least its position information. Due to factors such as shooting angle and occlusion, often only some of the key points of the target object can be detected from the target image. As shown in FIGS. 4A, 4B, and 4C: from the target image in FIG. 4A, only the head, neck, and shoulder key points of the target object can be detected; from the target image in FIG. 4B, only the key points above the legs can be detected; and from the target image in FIG. 4C, only the key points on the right side of the target object can be detected.
Besides the cases shown in FIGS. 4A-4C, the target image may also include other local regions of the target object, which are not enumerated here. In particular, the number of target images may be greater than one, and different target images may include different local regions of the target object; however, the union of the local regions included in multiple target images may still not cover all regions of the target object.
The region of the target object included in the target image is referred to as the first region on the target object. For example, in the target image shown in FIG. 4A, the first region includes the head region, neck region, and shoulder region of the target object. In addition to the target object, the target image may also include some background regions; alternatively, the target image may include multiple target objects. To facilitate processing and reduce the influence of the background regions and other target objects, the image region where the target object is located may be extracted from the target image by image segmentation or other means before subsequent processing.
In step 202, the first key points may include at least one reference key point, and the reference key point may be any one or more of the first key points. The second key point to be completed may be at least one key point of a second region on the target object other than the first region. For example, for the target image shown in FIG. 4A, the head, neck, left-shoulder, and right-shoulder key points of the target object can be detected from the target image; at least one of these four key points may be taken as a reference key point, and at least any one key point of the target object other than these four key points may be taken as the second key point to be completed.
In a case where the at least one first key point includes one reference key point, a first distance estimate between the second key point and the reference key point may be obtained, and the information of the second key point may be determined based on the first distance estimate and the information of the reference key point. The first distance estimate may be determined and stored in advance, and invoked when the information of the second key point needs to be completed. Several ways of determining the first distance estimate are described below.
In some implementations, the first distance estimate may be determined from empirical values. For example, the first distance estimate may be a fixed value. Where the reference key point is the left shoulder key point and the second key point is the right shoulder key point, it is empirically known that the distance between an adult's left and right shoulder key points is about 40 cm, so the first distance estimate between the left shoulder key point and the right shoulder key point may be set to 40 cm. As another example, the first distance estimate may be a dynamic value. The ratio of the distance between an adult's left and right shoulder key points to the adult's height is about 0.225, so the first distance between the left and right shoulder key points may be determined based on the product of the average adult height and 0.225. As yet another example, the first distance estimate may be selected from a preset interval. Assuming the distance between an adult's left and right shoulder key points lies within the interval [35 cm, 45 cm], a distance value (for example, the maximum, the minimum, the midpoint, or another value) may be selected from this interval as the first distance estimate.
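A minimal Python sketch of this single-reference case, completing one key point from one detected reference key point and a pre-stored first distance estimate. The offset direction and all names here are assumptions of the sketch: the embodiment above fixes only the distance, not the direction.

```python
import numpy as np

def complete_from_single_reference(ref_xy, distance_estimate, direction):
    # ref_xy: (x, y) of the detected reference key point
    # distance_estimate: pre-stored first distance estimate (image units)
    # direction: assumed unit vector from the reference key point toward
    #            the key point to be completed
    direction = np.asarray(direction, dtype=float)
    direction /= np.linalg.norm(direction)
    return np.asarray(ref_xy, dtype=float) + distance_estimate * direction

# e.g., a right shoulder completed from the left shoulder, with the
# 40 cm estimate assumed to map to ~120 px and to point along +x:
right_shoulder = complete_from_single_reference((310.0, 220.0), 120.0, (1.0, 0.0))
```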
In some implementations, a distance estimate between a second sample key point and a first sample key point of each of multiple sample objects may be obtained, and the first distance estimate may be determined based on the distance estimates corresponding to the sample objects. Here, the second sample key point is the key point on a sample object corresponding to the second key point, and the first sample key point is the key point on the sample object corresponding to the reference key point. For example, the average, maximum, or minimum of the distance estimates corresponding to the sample objects may be determined as the first distance estimate, or the distance estimate corresponding to a randomly selected sample object may be used as the first distance estimate. Alternatively, an average size of the multiple sample objects may be determined, and the distance between the second sample key point and the first sample key point of a sample object of the average size may be used as the first distance estimate, where the average size of multiple sample objects is the average of the sizes of the sample objects. Taking a human target object as an example, the average height of multiple persons (that is, the average of their heights) may be determined, a person whose height equals the average height may be selected as the sample object, and the distance between the second sample key point and the first sample key point on that sample object may be used as the first distance estimate.
A real object in physical space may be used as the sample object, and the distance estimate between the second sample key point and the first sample key point may be obtained by measuring the distance between them on the sample object. Alternatively, images of the sample object may be collected, key point detection may be performed on the images to obtain key point information, and the distance estimate between the second sample key point and the first sample key point may be determined based on the detected key point information. The number of sample images may be greater than or equal to one, and the key points included in the sample images may be the same or different. In some embodiments, the number of sample images is N (for example, 100), and all key points of the sample object can be detected from each sample image.
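A minimal sketch of the sample-based estimation above, assuming the sample key points are already stored as an (N, K, 2) coordinate array; the mean is used here, one of the aggregation choices (mean, maximum, minimum) mentioned earlier.

```python
import numpy as np

def mean_pairwise_distance(samples, idx_a, idx_b):
    # samples: (N, K, 2) key point coordinates of N sample objects
    # idx_a, idx_b: indices of the two corresponding sample key points
    diffs = samples[:, idx_a, :] - samples[:, idx_b, :]
    return float(np.mean(np.linalg.norm(diffs, axis=-1)))
```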
Further, to improve the accuracy of the first distance estimate, the first distance estimate may also be determined based on attributes of the target object. Where the target object is a person, the attributes may include, but are not limited to, height, weight, gender, age, and the like. For example, different first distance estimates may be determined for males and females, and/or different first distance estimates may be determined for adults and minors. The first distance estimate corresponding to a target object of each attribute may be determined in any of the ways described above, which is not repeated here.
In a case where the number of the first key points is greater than or equal to two, the first key points may include two reference key points. A first distance between the two reference key points may be obtained; based on the first distance and a predetermined scale factor, a second distance between the second key point and a target key point among the first key points may be determined, where the scale factor characterizes the proportional relationship between the first distance and the second distance; and the information of the second key point may be determined based on the second distance and the information of the target key point.
Where the number of the first key points equals two, the target key point may be either of the two reference key points. Where the number of the first key points is greater than or equal to three, the target key point may be either of the two reference key points, or a first key point other than the two reference key points. In some embodiments, the target key point and the second key point are adjacent key points. Two key points being adjacent means that they are directly connected in the key point topology graph; for example, in FIG. 3A, the key point numbered 8 is adjacent to the key points numbered 1, 9, and 12, respectively. Further, of two adjacent key points, one is a parent node and the other a child node, which can be determined from the positions of the nodes. For example, for the human-body key point topology graph, the key point located above may be determined as the parent node, and the key point located below as the child node. "Above" and "below" may be determined based on the vertical coordinate of a key point's position; the relationship between coordinates and orientation may differ under different coordinate systems. Parent and child nodes may be defined as appropriate for the actual situation, which is not repeated here.
However, the solution of the present disclosure is not limited to the case where the second key point is adjacent to the target key point. In other embodiments, the target key point may also be a preselected key point, for example, the neck key point. Alternatively, the target key point may be a key point aligned with the second key point in a certain direction, where two key points being aligned in a direction means that their positional offset in directions other than that direction is small. Alternatively, the target key point may be one key point of a preselected group of key points, for example, one of the neck key point, the left eye key point, and the right shoulder key point. Where the second key point is not adjacent to the target key point, the information of the second key point may also be completed in the manner described in the above embodiments.
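As a concrete rendering of the adjacency and parent-child relations described above, the FIG. 3A topology can be encoded as a child-to-parent map. The following partial sketch uses the indices listed above; only part of the 25-key-point skeleton is included for brevity.

```python
# Child -> parent map for part of the FIG. 3A topology; the key point
# located "above" is taken as the parent, as described above.
PARENT = {
    2: 1, 3: 2, 4: 3,        # right shoulder/elbow/wrist chain
    5: 1, 6: 5, 7: 6,        # left shoulder/elbow/wrist chain
    8: 1,                    # mid hip -> neck
    9: 8, 10: 9, 11: 10,     # right hip/knee/ankle chain
    12: 8, 13: 12, 14: 13,   # left hip/knee/ankle chain
}

def neighbors(k):
    # Key points directly connected to k in the topology graph
    return ({p for c, p in PARENT.items() if c == k}
            | {c for c, p in PARENT.items() if p == k})
```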
Since the information of each reference key point can be obtained by performing key point detection on the target image, the first distance between the two reference key points may be determined based on the key point detection result. For example, the coordinates of the two reference key points may be determined respectively, the Euclidean distance between the two reference key points may then be determined based on their coordinates, and the Euclidean distance may be taken as the first distance.
In some implementations, the scale factor may be determined based on empirical values. For example, suppose the two reference key points are the left shoulder key point and the left elbow key point, and the second key point is the left wrist key point. It is empirically known that the ratio between the distance from the left shoulder key point to the left elbow key point and the distance from the left elbow key point to the left wrist key point is generally about 1:1; therefore, the scale factor may be set to 1.
Alternatively, in some implementations, a second distance estimate between the two reference key points may be obtained, a third distance estimate between the second key point and the target key point may be obtained, and the scale factor may be determined based on the proportional relationship between the second distance estimate and the third distance estimate.
The second distance estimate and the third distance estimate may be obtained in the same way as the first distance estimate, for example, determined from empirical values, or determined based on sample objects.
In the embodiments where the second and third distance estimates are determined based on sample objects, specifically, the second distance estimate may be determined based on the distance between two first sample key points of each of multiple sample objects, where the two first sample key points correspond to the two reference key points respectively. The third distance estimate may be determined based on the distance between a second sample key point and a third sample key point of each sample object, where the second sample key point corresponds to the second key point and the third sample key point corresponds to the target key point.
Assuming the second and third distance estimates are denoted d2 and d3, respectively, the scale factor s may be written as s = d2/d3. Assuming the first distance is denoted D1, the second distance D2 may be written as D2 = D1/s. Alternatively, the scale factor s may be written as s = d3/d2, in which case the second distance D2 may be written as D2 = D1*s. Of course, the embodiments of the present disclosure are not limited to taking the product or quotient of the first distance and the scale factor directly as the second distance; for example, a random offset may be added to the product or quotient, or the product or quotient may be scaled by a preset ratio to obtain the second distance, or the second distance may be determined in other ways, which are not enumerated here.
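The following sketch puts the two-reference-key-point formulas together: D1 is the Euclidean distance between the two reference key points, s = d2/d3 is the pre-stored scale factor, and D2 = D1/s. As before, the offset direction from the target key point is an assumption of the sketch rather than something the formula fixes.

```python
import numpy as np

def complete_with_scale_factor(ref_a, ref_b, target_xy, d2, d3, direction):
    d1 = np.linalg.norm(np.asarray(ref_a, float) - np.asarray(ref_b, float))
    s = d2 / d3                       # scale factor s = d2 / d3
    second_distance = d1 / s          # D2 = D1 / s
    direction = np.asarray(direction, float)
    direction /= np.linalg.norm(direction)
    return np.asarray(target_xy, float) + second_distance * direction

# e.g., a left wrist completed from shoulder/elbow with the empirical
# 1:1 ratio (s = 1), extending downward along +y:
wrist = complete_with_scale_factor((300, 200), (300, 300), (300, 300),
                                   d2=1.0, d3=1.0, direction=(0.0, 1.0))
```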
There may be one or more second key points to be completed, and the information of different second key points may be determined based on the same or different scale factors. In some embodiments, the target object is symmetric about a certain axis of symmetry; therefore, the scale factor corresponding to a second key point on one side of the target object may be determined, and the key point symmetric to that second key point may be assigned the same scale factor. In this way, symmetric second key points can share the same scale factor. In some embodiments, multiple second key points are distributed close to one another, for example, the distances between adjacent pairs among these second key points are similar, in which case these second key points may share the same scale factor. In some embodiments, to make the obtained information of the second key points more accurate, a scale factor may also be determined separately for each second key point.
In the above embodiments, the sample object and the target object may be objects of the same category, for example, both persons. Alternatively, the target object and the sample object may be objects of different categories; for example, the target object is a person and the sample object is a bear. The sample object may be determined randomly, or by the user according to requirements. Since the distances between key points usually differ across sample objects of different categories, using the key point information of sample objects of different categories to determine the information of the second key point to be completed yields different second key point information, so that the display effect of the 3D model finally obtained by 3D reconstruction also differs.
One key point (called key point A) corresponding to another key point (called key point B) may mean that key point A and key point B are key points of the same body part, for example, both are left shoulder key points. Alternatively, it may mean that key point A and key point B are symmetric key points, where two key points being symmetric means their positions on the objects they belong to are symmetric; for example, key point A is the left shoulder key point on the target object and key point B is the right shoulder key point on the sample object. Since the sample object and the target object may be of different categories, the positions and numbers of their key points may differ. A correspondence between the key points of the sample object and those of the target object may be established in advance to facilitate completing the information of the second key point.
To improve the efficiency of key point completion, the first distance estimate, the second distance estimate, the third distance estimate, and/or the scale factor may be obtained and stored in advance, and invoked directly when the information of the second key point to be completed needs to be determined.
In some embodiments, the at least one reference key point includes a first reference key point and a second reference key point, and the information of the second key point includes the information of the second key point in a first direction and the information of the second key point in a second direction. When determining the information of the second key point to be completed, the information of the second key point in the first direction may be determined based on the information of the first reference key point, and the information of the second key point in the second direction may be determined based on the information of the second reference key point.
The first direction and the second direction are different directions. In some embodiments, the first direction is orthogonal to the second direction; for example, the first direction is the direction of the line connecting the target object's nose key point to its neck key point, and the second direction is the direction of the line connecting the target object's neck key point to its right shoulder key point. In other embodiments, other directions may also be determined as the first and second directions according to actual needs, which are not enumerated here. In practical applications, the first direction may be the vertical direction and the second direction the horizontal direction.
The information of the second key point in each direction may be determined based on the information of the same first key point or of different first key points. To improve processing accuracy, information of first key points whose relative displacement in the second direction is less than a first threshold may be selected to determine the information of the second key point in the first direction, and information of first key points whose relative displacement in the first direction is less than a second threshold may be selected to determine the information of the second key point in the second direction.
The manner of determining the information of the second key point in each direction is similar to the manner of determining the second key point information in the foregoing embodiments, the only difference being that, when determining the distance of the second key point in the first direction, the above distances and scale factor are the projection distances in the first direction and the scale factor in the first direction; and when determining the distance of the second key point in the second direction, the above distances and scale factor are the projection distances in the second direction and the scale factor in the second direction.
For example, in a case where the number of the first key points is greater than or equal to two and the first key points include two reference key points, a first projection distance, in the first direction, of the first distance between the two reference key points may be obtained; based on the first projection distance and a predetermined scale factor in the first direction, a second projection distance, in the first direction, of the second distance between the second key point and the target key point among the first key points may be determined, where the scale factor in the first direction characterizes the proportional relationship between the first projection distance and the second projection distance. Then, the information of the second key point in the first direction may be determined based on the second projection distance and the information of the target key point. The information of the second key point in the second direction is determined similarly, which is not repeated here.
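A sketch of this direction-wise variant, assuming the per-axis scale factors have been pre-computed. The projection distances are taken along the image axes, and the signs of the offsets are assumptions of the sketch (they depend on where the missing key point lies relative to the target key point).

```python
import numpy as np

def complete_per_direction(ref_a, ref_b, target_xy, s_vertical, s_horizontal):
    ref_a, ref_b = np.asarray(ref_a, float), np.asarray(ref_b, float)
    tx, ty = np.asarray(target_xy, float)
    dx = abs(ref_a[0] - ref_b[0]) * s_horizontal   # horizontal projection
    dy = abs(ref_a[1] - ref_b[1]) * s_vertical     # vertical projection
    return np.array([tx + dx, ty + dy])
```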
Where at least two directions are involved, the information of the second key point does not necessarily have to be determined based on the scale factor of every direction. For example, suppose the first direction is the vertical direction and the second direction the horizontal direction. For the key point topology shown in FIG. 3A, where the target key point is the key point numbered 8 and the second key point to be completed is the key point numbered 9, since these two key points are almost aligned in the vertical direction, the information of the key point numbered 9 may be determined based only on the horizontal scale factor. Similarly, where the target key point is the key point numbered 9 and the second key point to be completed is the key point numbered 10, the information of the key point numbered 10 may be determined based only on the vertical scale factor.
In step 203, initial deformation parameters and initial motion parameters of the target object may be determined based on the target image, where the initial deformation parameters characterize the body shape of the target object and the initial motion parameters characterize the action performed by the target object. Based on the difference between the target object in the target image and the 3D model of the target object, the information of the at least one first key point, and the information of the second key point, the initial deformation parameters and initial motion parameters are optimized to obtain optimized deformation parameters and optimized motion parameters; and the target object is 3D-modeled based on the optimized deformation parameters and optimized motion parameters.
The initial deformation parameters are a set of parameters characterizing the size of the target object; where the target object is a person, the initial deformation parameters may characterize how fat or thin the target object's figure is. The initial motion parameters may include the rotation angle of each key point of the target object, and may also include the global rotation angle of the target object; the action performed by the target object, for example, playing ball or jumping, can be determined from the initial motion parameters. The initial deformation parameters and initial motion parameters determined based on the target object may carry some error and therefore need to be optimized. The optimization process is constrained jointly by the completed key point information and by the difference between the target object in the target image and the 3D model of the target object, so that the final 3D model better matches the target object in the target image.
In some embodiments, the difference between the target object in the target image and the 3D model of the target object is determined as follows: determining a first mask of the target object based on the target image; performing 3D modeling on the target object based on the initial deformation parameters and initial motion parameters to obtain an initial 3D model of the target object, and determining a second mask of the target object based on the initial 3D model; and determining, based on the first mask and the second mask, the difference between the target object in the target image and the 3D model of the target object.
In the above embodiments, the first mask and the second mask may be two images of equal pixel size. In the first and second masks, the pixel value of a pixel belonging to the target object is a first pixel value (for example, 0), and the pixel value of a pixel not belonging to the target object is a second pixel value (for example, 255). In this way, binarized mask images can represent the target object determined from the target image and the target object determined from the initial 3D model. The intersection-over-union between the first mask and the second mask, denoted IoU, may be determined as:
IoU = S(A ∩ B) / S(A ∪ B)
where A denotes the set of pixels belonging to the target object in the first mask, B denotes the set of pixels belonging to the target object in the second mask, and S denotes area. The difference between the target object in the target image and the 3D model of the target object may be determined based on the IoU. Besides the IoU, the embodiments of the present disclosure may also determine the difference between the target object in the target image and the 3D model of the target object in other ways, which are not enumerated here.
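The IoU above can be computed directly from the two binarized masks; a minimal sketch, with S(.) realized as a pixel count:

```python
import numpy as np

def mask_iou(mask_a, mask_b):
    # mask_a, mask_b: boolean arrays of equal pixel size in which True
    # marks pixels belonging to the target object (the 0/255 convention
    # above can be converted with `mask == 0`)
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0

# The difference used to constrain the optimization can then be taken,
# for example, as 1 - IoU of the first and second masks.
```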
After the optimized deformation parameters and optimized motion parameters are obtained, a 3D point cloud of the target object's surface may be determined based on them, and the 3D model of the target object is obtained after rendering.
In some embodiments, key point information may be input into a pre-trained neural network, where the key point information includes the information of the at least one reference key point and an initial value of the information of the second key point; the information of one key point includes position information and confidence information of the key point, and the initial value of the confidence information of the second key point is less than a preset confidence threshold. The information of the second key point output by the neural network based on the information of the at least one reference key point is then obtained.
The position information of a key point may characterize the position of the key point in a preset coordinate system, which may be the coordinate system corresponding to the target image, the world coordinate system, or another coordinate system. The coordinate system may be selected according to actual needs, and, in different cases, conversions between coordinate systems may be performed via transformation matrices. The position information of a reference key point may be determined based on the target image, while the position information of the second key point needs to be obtained by completion. In this embodiment, an initial value, for example [0, 0], may be set for the position information of the second key point, and this initial value is corrected by key point completion to obtain the final position information of the second key point.
The confidence information of a key point characterizes how trustworthy the position information of the key point is. The value of the confidence information may be expressed as a number within a certain range, which may be 0 to 1, 0 to 100, or another range. The larger the confidence value, the more trustworthy the key point's position information; conversely, the smaller the confidence value, the less trustworthy it is. Generally, a first key point can be detected directly from the target image, or is directly included in the target image, so the confidence information of a first key point has a large value; the initial position information of a second key point, however, is not obtained by computation and may differ considerably from its actual position information, so the initial confidence information of a second key point has a small value.
After the information of the second key point is obtained, the confidence information of the second key point may be set to a value greater than the confidence threshold. This, on one hand, makes it easy to distinguish whether the position information of a second key point is initial information or information computed after completion, and on the other hand, makes it easy to use the position information of second key points with high confidence to compute the information of other key points. In some embodiments, the confidence threshold may be set to 0, that is, the initial confidence information of the second key point is 0. In other embodiments, the confidence of a second key point after completion may be adjusted to 1. Of course, in practical applications, other values may also be used as the initial confidence and the adjusted confidence of the second key point.
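A sketch of how the confidence convention above can drive the completion step. The network itself is left abstract: `completion_net` is a hypothetical callable (the embodiments do not fix its architecture), and the 0 and 1 confidence values are the choices mentioned above.

```python
import numpy as np

CONF_THRESHOLD = 0.0  # initial confidence of key points to be completed

def complete_with_network(keypoints, completion_net):
    # keypoints: (25, 3) array of [x, y, confidence]; missing key points
    # carry the initial value [0, 0, 0]
    missing = keypoints[:, 2] <= CONF_THRESHOLD
    positions = completion_net(keypoints)       # assumed to return (25, 2)
    keypoints[missing, :2] = positions[missing]
    keypoints[missing, 2] = 1.0                 # mark as completed
    return keypoints
```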
In some embodiments, after the information of the second key point is obtained, the information of a third key point to be completed may further be determined based on the information of the second key point and the distance between the third key point and the second key point. For the specific manner of determining the information of the third key point, reference may be made to the manner of determining the information of the second key point, which is not repeated here.
In a case where the key point information includes confidence, the information of the third key point may be determined based on the information of a second key point whose confidence is greater than the confidence threshold and the distance between the third key point to be completed and that second key point.
Then, the target object may be 3D-modeled based on the information of the at least one first key point, the information of the second key point, and the information of the third key point, to obtain the 3D model of the target object. FIG. 5 shows the 3D model of the target object obtained in some embodiments by 3D modeling based on the target image shown in FIG. 1A. It can be seen that, through the 3D modeling of this embodiment, the parts of the target object not included in the target image can be reconstructed, thereby improving the completeness of the 3D model obtained by 3D modeling. It should be noted that, in the manner of this embodiment, only some of the key points of the target object may be completed; correspondingly, the final 3D model may include only the 3D model corresponding to a partial region of the target object. However, regardless of whether what is obtained after completion is all of the key points on the target object or only some of them, compared with performing 3D modeling based only on the key points obtainable from the target image, the 3D modeling method of the present disclosure can improve the completeness of the 3D model.
The complete pipeline of the present disclosure is described below with reference to FIG. 6, taking a human target object as an example.
For the input target image, key point detection is performed first. The detected key points are stored in a 25*3 two-dimensional array, where each entry has the format [x, y, confidence], x and y are image coordinate positions, and the confidence ranges from 0 to 1.
Then, adaptive key point completion is performed. The key point array is taken as input; it contains undetected key points (that is, second key points), and the array entry corresponding to an undetected key point is represented by [0, 0, 0]. The output of this step is the completed key point array: the undetected key points are completed, yielding complete key point information.
The adaptive key point completion process can be divided into the following three steps:
(1) Adaptive scale estimation. A dataset containing 100 complete single-person images is constructed, and the average vertical distance and average horizontal distance of each group of adjacent human key points are computed; for example, the average vertical and horizontal distances of adjacent key point pairs such as {0, 1}, {1, 2}, {1, 5}, and {1, 8} are computed separately. Then, the average vertical distance of each group of adjacent key points is divided by the average vertical distance between the nose key point and the neck key point, yielding the vertical scale factor of each group of adjacent key points. Next, the average horizontal distance of each group of adjacent key points is divided by the average horizontal distance between the right shoulder key point and the neck key point, yielding the horizontal scale factor of each group of adjacent key points.
(2) Key point completion. The vertical and horizontal scale factors obtained in the previous step are used to complete the missing key points. For a given key point to be completed, its vertical distance relative to its adjacent parent node equals its vertical scale factor multiplied by the vertical distance between the nose key point and the neck key point in the image, and its horizontal distance relative to its adjacent parent node equals its horizontal scale factor multiplied by the horizontal distance between the right shoulder key point and the neck key point in the image.
(3) Confidence detection. The entire key point array is traversed, and every confidence value of 0 in the array is adjusted to 1. Finally, the completed key point array is output. A sketch of these three steps is given below.
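Under the stated assumptions (the FIG. 3A indices, the child-to-parent map sketched earlier, and offset signs left to the concrete embodiment), the three steps can be sketched as follows:

```python
import numpy as np

NOSE, NECK, RSHOULDER = 0, 1, 2  # indices from the FIG. 3A numbering

def estimate_scale_factors(dataset):
    # Step (1): dataset is an (N, 25, 2) array of complete single-person
    # annotations; returns {child: (s_vertical, s_horizontal)} for the
    # child -> parent pairs in PARENT (sketched earlier).
    ref_v = np.abs(dataset[:, NOSE, 1] - dataset[:, NECK, 1]).mean()
    ref_h = np.abs(dataset[:, RSHOULDER, 0] - dataset[:, NECK, 0]).mean()
    scales = {}
    for child, parent in PARENT.items():
        v = np.abs(dataset[:, child, 1] - dataset[:, parent, 1]).mean()
        h = np.abs(dataset[:, child, 0] - dataset[:, parent, 0]).mean()
        scales[child] = (v / ref_v, h / ref_h)
    return scales

def complete_keypoints(kps, scales):
    # Steps (2) and (3): kps is a (25, 3) array of [x, y, confidence];
    # assumes each missing key point's parent is already available.
    ref_v = abs(kps[NOSE, 1] - kps[NECK, 1])
    ref_h = abs(kps[RSHOULDER, 0] - kps[NECK, 0])
    for child, (s_v, s_h) in scales.items():
        if kps[child, 2] == 0:                  # undetected key point
            parent = PARENT[child]
            kps[child, 0] = kps[parent, 0] + s_h * ref_h
            kps[child, 1] = kps[parent, 1] + s_v * ref_v
            kps[child, 2] = 1.0                 # step (3): confidence to 1
    return kps
```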
Then, 3D modeling is performed. The input of this step is the completed human key points, the initial motion parameters, and the initial deformation parameters, and the output is 6890 3D points representing the human body, which, after rendering, yield the 3D human body. In this step, the completed human key point information and the human body mask are used to adjust the initial motion parameters and initial deformation parameters, and the complete 3D human body model is finally obtained after rendering.
The embodiments of the present disclosure have the following advantages:
(1) They support 3D modeling from a partial image, from which only some of the target object's key points can be detected, to obtain a complete 3D model of the target object.
(2) The reconstructed 3D model can remain consistent, in terms of action, with the action in the partial image.
(3) The reconstructed 3D model can also remain consistent, in terms of body shape, with the body shape in the partial image.
The embodiments of the present disclosure are applicable to the following application scenarios:
(1) Performing action recognition and/or behavior prediction on the target object based on its 3D model. In practice, action recognition and behavior prediction are currently based on the full human body. Due to the lack of lower-body information, action recognition and behavior prediction based on a half-body human figure are very complicated. Using the key point completion and 3D modeling of the embodiments of the present disclosure, fairly accurate half-body action recognition and behavior prediction can be achieved.
(2) Obtaining a reference action sequence and performing action transfer processing on the 3D model of the target object based on the reference action sequence. The action sequence may be a key point sequence, that is, an image sequence composed of multiple key point topology graphs, or a video sequence of the target object. For example, with a half-body photo of a person and a reference action video of a dance, a dance video of the full body of the person in the half-body photo can be obtained.
The present disclosure relates to the field of augmented reality. By acquiring image information of a target object in a real environment, and then using various vision-related algorithms to detect or recognize relevant features, states, and attributes of the target object, an AR effect combining the virtual and the real that matches the specific application is obtained. Illustratively, the target object may involve faces, limbs, gestures, and actions related to the human body, or markers and landmarks related to objects, or sand tables, display areas, or display items related to venues or places. Vision-related algorithms may involve visual localization, SLAM, 3D reconstruction, image registration, background segmentation, key point extraction and tracking of objects, pose or depth detection of objects, and so on. Specific applications may involve not only interactive scenarios such as guided tours, navigation, explanation, reconstruction, and virtual-effect overlay display related to real scenes or objects, but also human-related special-effect processing, for example, interactive scenarios such as makeup beautification, body beautification, special-effect display, and virtual model display. Detection or recognition of the relevant features, states, and attributes of the target object may be implemented through a convolutional neural network, which is a network model obtained by model training based on a deep learning framework.
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or impose any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
An embodiment of the present disclosure further provides a 3D modeling apparatus. As shown in FIG. 7, the apparatus includes:
a first determining module 701, configured to determine, based on a target image of a target object, information of at least one first key point of a first region of the target object, where the target image includes an image corresponding to the first region;
a second determining module 702, configured to determine, based on information of at least one reference key point among the at least one first key point, information of a second key point to be completed; and
a three-dimensional modeling module 703, configured to perform three-dimensional modeling on the target object based on the information of the at least one first key point and the information of the second key point, to obtain a three-dimensional model of the target object.
In the embodiments of the present disclosure, the information of the first key points of the first region on the target object is first obtained based on the target image of the target object, and the information of the second key point is then completed based on the information of the reference key points among the first key points. After key point completion, the obtained key point information is more complete. Therefore, compared with a 3D model obtained by modeling based only on the information of the first key points, a 3D model obtained by modeling based on both the information of the first key points and the information of the second key point has higher completeness.
In some implementations, the at least one first key point includes one reference key point, and the second determining module is configured to: obtain a first distance estimate between the second key point and the reference key point; and determine the information of the second key point based on the first distance estimate and the information of the reference key point. This embodiment can complete the information of the second key point using only one reference key point; the completion process has low complexity and is easy to implement, thereby reducing the complexity of 3D modeling and improving 3D modeling efficiency.
In some implementations, the at least one first key point includes at least two first key points, among which there are two reference key points; the second determining module is configured to: obtain a first distance between the two reference key points; determine, based on the first distance and a predetermined scale factor, a second distance between the second key point and a target key point among the at least two first key points, where the scale factor characterizes the proportional relationship between the first distance and the second distance, and the target key point is either of the two reference key points, or a key point among the at least two first key points other than the reference key points; and determine the information of the second key point based on the second distance and the information of the target key point.
In this embodiment, the proportional relationship between the first distance and the second distance is used as the scale factor to map the first distance between the two reference key points to the second distance between the second key point and the target key point. Since the ratios between the distances of the key points of a target object generally vary within a small range, the second distance can be accurately determined in the above manner, so that the information of the second key point can be accurately determined, improving the accuracy of 3D modeling.
In some implementations, the apparatus further includes: a first obtaining module, configured to obtain a second distance estimate between the two reference key points; a second obtaining module, configured to obtain a third distance estimate between the second key point and the target key point; and a third determining module, configured to determine the scale factor based on the proportional relationship between the second distance estimate and the third distance estimate. In this embodiment, a fairly accurate scale factor can be obtained from the proportional relationship between the second and third distance estimates, further improving the accuracy of the information of the second key point.
In some implementations, the second distance estimate is determined based on the distance between two first sample key points of each of multiple sample objects, the two first sample key points corresponding to the two reference key points respectively; the third distance estimate is determined based on the distance between a second sample key point and a third sample key point of each of the multiple sample objects, the second sample key point corresponding to the second key point and the third sample key point corresponding to the target key point.
In some implementations, the at least one reference key point includes a first reference key point and a second reference key point; the second determining module is configured to: determine the information of the second key point in a first direction based on the information of the first reference key point; and determine the information of the second key point in a second direction based on the information of the second reference key point. By determining the information of the second key point in the two directions separately, the accuracy of the information of the second key point can be further improved, thereby improving the accuracy of 3D modeling.
In some implementations, the apparatus further includes: a fourth determining module, configured to determine the information of a third key point to be completed based on the information of the second key point and the distance between the third key point and the second key point; and the 3D modeling module is configured to: perform 3D modeling on the target object based on the information of the at least one first key point, the information of the second key point, and the information of the third key point. After the information of the second key point is obtained, the information of the third key point can be determined on the basis of the information of the second key point, so that more key points are completed, further improving the completeness of the 3D model.
In some implementations, the second determining module is configured to: input key point information into a pre-trained neural network, where the key point information includes the information of the at least one reference key point and an initial value of the information of the second key point, the information of one key point includes position information and confidence information of the key point, and the initial value of the confidence information of the second key point is less than a preset confidence threshold; and obtain the information of the second key point output by the neural network based on the information of the at least one reference key point. The key point completion process can be implemented automatically through the neural network, and since the neural network uses the confidence of the key points during completion, it can automatically determine which key points need to be completed.
In some implementations, the apparatus further includes: a setting module, configured to set the confidence information of the second key point to a value greater than the confidence threshold after the information of the second key point output by the neural network based on the information of the at least one reference key point is obtained; and the 3D modeling module is configured to: perform 3D modeling on the target object based on the information of the at least one first key point, the information of the second key point whose confidence is greater than the confidence threshold, and the information of the third key point. By resetting the confidence of the completed second key point, the information of the completed second key point can be automatically used as known information to complete the third key point, without manual selection.
In some implementations, the 3D modeling module is configured to: determine initial deformation parameters and initial motion parameters of the target object based on the target image, the initial deformation parameters characterizing the body shape of the target object and the initial motion parameters characterizing the action performed by the target object; optimize the initial deformation parameters and initial motion parameters based on the difference between the target object in the target image and the 3D model of the target object, the information of the at least one first key point, and the information of the second key point, to obtain optimized deformation parameters and optimized motion parameters; and perform 3D modeling on the target object based on the optimized deformation parameters and optimized motion parameters. Through the solution of this embodiment, fairly accurate optimized deformation parameters and optimized motion parameters can be obtained, improving the accuracy of 3D modeling.
In some implementations, the apparatus further includes: a fifth determining module, configured to determine a first mask of the target object based on the target image; a sixth determining module, configured to perform 3D modeling on the target object based on the initial deformation parameters and initial motion parameters to obtain an initial 3D model of the target object, and determine a second mask of the target object based on the initial 3D model; and a seventh determining module, configured to determine, based on the first mask and the second mask, the difference between the target object in the target image and the 3D model of the target object.
In some implementations, the apparatus further includes: a post-processing module, configured to perform action recognition and/or behavior prediction on the target object based on the 3D model of the target object, and/or perform action transfer processing on the 3D model of the target object based on an obtained reference action sequence.
In some embodiments, the functions of, or the modules included in, the apparatus provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments. For their specific implementation, reference may be made to the descriptions of the above method embodiments, which, for brevity, are not repeated here.
An embodiment of this specification further provides a computer device, which includes at least a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, implements the method described in any of the preceding embodiments.
FIG. 8 shows a more specific schematic diagram of the hardware structure of a computing device provided by an embodiment of this specification. The device may include: a processor 801, a memory 802, an input/output interface 803, a communication interface 804, and a bus 805, where the processor 801, the memory 802, the input/output interface 803, and the communication interface 804 are communicatively connected to one another within the device through the bus 805.
The processor 801 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute relevant programs to implement the technical solutions provided by the embodiments of this specification. The processor 801 may also include a graphics card, which may be an Nvidia titan X graphics card, a 1080Ti graphics card, or the like.
The memory 802 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 802 may store an operating system and other application programs; when the technical solutions provided by the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 802 and invoked and executed by the processor 801.
The input/output interface 803 is used to connect an input/output module to realize information input and output. The input/output module may be configured in the device as a component (not shown in the figure), or may be externally connected to the device to provide corresponding functions. The input device may include a keyboard, a mouse, a touch screen, a microphone, various sensors, and the like, and the output device may include a display, a speaker, a vibrator, an indicator light, and the like.
The communication interface 804 is used to connect a communication module (not shown in the figure) to realize communication interaction between this device and other devices. The communication module may communicate by wired means (for example, USB or a network cable) or by wireless means (for example, a mobile network, WIFI, or Bluetooth).
The bus 805 includes a path for transferring information between the components of the device (for example, the processor 801, the memory 802, the input/output interface 803, and the communication interface 804).
It should be noted that although the above device only shows the processor 801, the memory 802, the input/output interface 803, the communication interface 804, and the bus 805, in a specific implementation process the device may also include other components necessary for normal operation. In addition, those skilled in the art can understand that the above device may also include only the components necessary to implement the solutions of the embodiments of this specification, without necessarily including all the components shown in the figure.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the method described in any of the preceding embodiments.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media exclude transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
From the description of the above implementations, those skilled in the art can clearly understand that the embodiments of this specification can be implemented by means of software plus a necessary general-purpose hardware platform. Based on such an understanding, the technical solutions of the embodiments of this specification, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the various embodiments of this specification or in certain parts of the embodiments.
The systems, apparatuses, modules, or units illustrated in the above embodiments may specifically be implemented by a computer chip or an entity, or by a product with a certain function. A typical implementing device is a computer, which may specifically take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
The embodiments in this specification are described in a progressive manner; for the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiments are basically similar to the method embodiments, their description is relatively simple, and for relevant parts, reference may be made to the partial description of the method embodiments. The apparatus embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separated, and, when implementing the solutions of the embodiments of this specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of this embodiment, which can be understood and implemented by those of ordinary skill in the art without creative effort.
The above are merely specific implementations of the embodiments of this specification. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the embodiments of this specification, and such improvements and modifications should also be regarded as falling within the protection scope of the embodiments of this specification.

Claims (15)

  1. A three-dimensional modeling method, characterized in that the method comprises:
    determining, based on a target image of a target object, information of at least one first key point of a first region of the target object, wherein the target image includes an image corresponding to the first region;
    determining, based on information of at least one reference key point among the at least one first key point, information of a second key point to be completed; and
    performing three-dimensional modeling on the target object based on the information of the at least one first key point and the information of the second key point, to obtain a three-dimensional model of the target object.
  2. The method according to claim 1, characterized in that the at least one first key point includes one reference key point; and the determining, based on information of at least one reference key point among the at least one first key point, information of a second key point to be completed comprises:
    obtaining a first distance estimate between the second key point and the reference key point; and
    determining the information of the second key point based on the first distance estimate and the information of the reference key point.
  3. The method according to claim 1, characterized in that the at least one first key point includes at least two first key points; the at least two first key points include two reference key points; and the determining, based on information of at least one reference key point among the at least one first key point, information of a second key point to be completed comprises:
    obtaining a first distance between the two reference key points;
    determining, based on the first distance and a predetermined scale factor, a second distance between the second key point and a target key point among the at least two first key points, wherein the scale factor characterizes a proportional relationship between the first distance and the second distance, and the target key point is either of the two reference key points, or the target key point is a key point among the at least two first key points other than the reference key points; and
    determining the information of the second key point based on the second distance and the information of the target key point.
  4. The method according to claim 3, characterized in that the method further comprises:
    obtaining a second distance estimate between the two reference key points;
    obtaining a third distance estimate between the second key point and the target key point; and
    determining the scale factor based on a proportional relationship between the second distance estimate and the third distance estimate.
  5. The method according to claim 4, characterized in that the second distance estimate is determined based on a distance between two first sample key points of each of multiple sample objects, the two first sample key points corresponding to the two reference key points respectively; and
    the third distance estimate is determined based on a distance between a second sample key point and a third sample key point of each of the multiple sample objects, the second sample key point corresponding to the second key point, and the third sample key point corresponding to the target key point.
  6. The method according to claim 1, characterized in that the at least one reference key point includes a first reference key point and a second reference key point; and the determining, based on information of at least one reference key point among the at least one first key point, information of a second key point to be completed comprises:
    determining information of the second key point in a first direction based on information of the first reference key point; and
    determining information of the second key point in a second direction based on information of the second reference key point.
  7. The method according to claim 1, characterized in that the method further comprises:
    determining information of a third key point to be completed based on the information of the second key point and a distance between the third key point and the second key point;
    wherein the performing three-dimensional modeling on the target object based on the information of the at least one first key point and the information of the second key point comprises:
    performing three-dimensional modeling on the target object based on the information of the at least one first key point, the information of the second key point, and the information of the third key point.
  8. The method according to claim 7, characterized in that the determining, based on information of at least one reference key point among the at least one first key point, information of a second key point to be completed comprises:
    inputting key point information into a pre-trained neural network, wherein the key point information includes the information of the at least one reference key point and an initial value of the information of the second key point, information of one key point includes position information and confidence information of the key point, and the initial value of the confidence information of the second key point is less than a preset confidence threshold; and
    obtaining the information of the second key point output by the neural network based on the information of the at least one reference key point.
  9. The method according to claim 8, characterized in that the method further comprises:
    after obtaining the information of the second key point output by the neural network based on the information of the at least one reference key point, setting the confidence information of the second key point to a value greater than the confidence threshold;
    wherein the performing three-dimensional modeling on the target object based on the information of the at least one first key point, the information of the second key point, and the information of the third key point comprises:
    performing three-dimensional modeling on the target object based on the information of the at least one first key point, the information of the second key point whose confidence is greater than the confidence threshold, and the information of the third key point.
  10. The method according to claim 1, characterized in that the performing three-dimensional modeling on the target object based on the information of the at least one first key point and the information of the second key point comprises:
    determining initial deformation parameters and initial motion parameters of the target object based on the target image, the initial deformation parameters characterizing a body shape of the target object, and the initial motion parameters characterizing an action performed by the target object;
    optimizing the initial deformation parameters and the initial motion parameters based on a difference between the target object in the target image and the three-dimensional model of the target object, the information of the at least one first key point, and the information of the second key point, to obtain optimized deformation parameters and optimized motion parameters; and
    performing three-dimensional modeling on the target object based on the optimized deformation parameters and the optimized motion parameters.
  11. The method according to claim 10, characterized in that the method further comprises:
    determining a first mask of the target object based on the target image;
    performing three-dimensional modeling on the target object based on the initial deformation parameters and the initial motion parameters to obtain an initial three-dimensional model of the target object, and determining a second mask of the target object based on the initial three-dimensional model; and
    determining, based on the first mask and the second mask, the difference between the target object in the target image and the three-dimensional model of the target object.
  12. The method according to claim 1, characterized in that the method further comprises:
    performing action recognition and/or behavior prediction on the target object based on the three-dimensional model of the target object;
    and/or
    performing action transfer processing on the three-dimensional model of the target object based on an obtained reference action sequence.
  13. A three-dimensional modeling apparatus, characterized in that the apparatus comprises:
    a first determining module, configured to determine, based on a target image of a target object, information of at least one first key point of a first region of the target object, wherein the target image includes an image corresponding to the first region;
    a second determining module, configured to determine, based on information of at least one reference key point among the at least one first key point, information of a second key point to be completed; and
    a three-dimensional modeling module, configured to perform three-dimensional modeling on the target object based on the information of the at least one first key point and the information of the second key point, to obtain a three-dimensional model of the target object.
  14. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 12.
  15. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 12.
PCT/CN2022/127595 2021-11-02 2022-10-26 Three-dimensional modeling method and apparatus, computer-readable storage medium, and computer device WO2023078135A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111289214.1 2021-11-02
CN202111289214.1A CN113724378B (zh) 2021-11-02 2021-11-02 Three-dimensional modeling method and apparatus, computer-readable storage medium, and computer device

Publications (1)

Publication Number Publication Date
WO2023078135A1 true WO2023078135A1 (zh) 2023-05-11

Family

ID=78686474

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/127595 WO2023078135A1 (zh) Three-dimensional modeling method and apparatus, computer-readable storage medium, and computer device 2021-11-02 2022-10-26

Country Status (2)

Country Link
CN (1) CN113724378B (zh)
WO (1) WO2023078135A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113724378B (zh) 2021-11-02 2022-02-25 北京市商汤科技开发有限公司 Three-dimensional modeling method and apparatus, computer-readable storage medium, and computer device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111107278A (zh) * 2018-10-26 2020-05-05 北京微播视界科技有限公司 Image processing method and apparatus, electronic device, and readable storage medium
CN111611903A (zh) * 2020-05-15 2020-09-01 北京百度网讯科技有限公司 Training method, using method, apparatus, device, and medium for an action recognition model
CN112562083A (zh) * 2020-12-10 2021-03-26 上海影创信息科技有限公司 Depth-camera-based static portrait three-dimensional reconstruction and dynamic face fusion method
US20210118170A1 (en) * 2019-10-18 2021-04-22 Aisin Seiki Kabushiki Kaisha Tiptoe position estimating device and fingertip position estimating device
CN113160418A (zh) * 2021-05-10 2021-07-23 上海商汤智能科技有限公司 Three-dimensional reconstruction method, apparatus, and system, medium, and computer device
CN113554741A (zh) * 2020-04-24 2021-10-26 北京达佳互联信息技术有限公司 Method and apparatus for three-dimensional reconstruction of an object, electronic device, and storage medium
CN113724378A (zh) * 2021-11-02 2021-11-30 北京市商汤科技开发有限公司 Three-dimensional modeling method and apparatus, computer-readable storage medium, and computer device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10692183B2 (en) * 2018-03-29 2020-06-23 Adobe Inc. Customizable image cropping using body key points
CN109949412B (zh) * 2019-03-26 2021-03-02 腾讯科技(深圳)有限公司 Three-dimensional object reconstruction method and apparatus

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111107278A (zh) * 2018-10-26 2020-05-05 北京微播视界科技有限公司 Image processing method and apparatus, electronic device, and readable storage medium
US20210118170A1 (en) * 2019-10-18 2021-04-22 Aisin Seiki Kabushiki Kaisha Tiptoe position estimating device and fingertip position estimating device
CN113554741A (zh) * 2020-04-24 2021-10-26 北京达佳互联信息技术有限公司 Method and apparatus for three-dimensional reconstruction of an object, electronic device, and storage medium
CN111611903A (zh) * 2020-05-15 2020-09-01 北京百度网讯科技有限公司 Training method, using method, apparatus, device, and medium for an action recognition model
CN112562083A (zh) * 2020-12-10 2021-03-26 上海影创信息科技有限公司 Depth-camera-based static portrait three-dimensional reconstruction and dynamic face fusion method
CN113160418A (zh) * 2021-05-10 2021-07-23 上海商汤智能科技有限公司 Three-dimensional reconstruction method, apparatus, and system, medium, and computer device
CN113724378A (zh) * 2021-11-02 2021-11-30 北京市商汤科技开发有限公司 Three-dimensional modeling method and apparatus, computer-readable storage medium, and computer device

Also Published As

Publication number Publication date
CN113724378A (zh) 2021-11-30
CN113724378B (zh) 2022-02-25

Similar Documents

Publication Publication Date Title
JP6911107B2 (ja) Surface modeling systems and methods
JP7249390B2 (ja) Method and system for real-time 3D capture and live feedback with monocular cameras
US10121076B2 (en) Recognizing entity interactions in visual media
JP2023083561A (ja) Fully convolutional interest point detection and description via homographic adaptation
Miksik et al. The semantic paintbrush: Interactive 3d mapping and recognition in large outdoor spaces
KR20210047920A (ko) Generation of a face model
WO2022205762A1 (zh) Three-dimensional human body reconstruction method and apparatus, device, and storage medium
JP2019096113A (ja) Processing device, method, and program for key point data
WO2022147976A1 (zh) Three-dimensional reconstruction and related interaction and measurement methods, and related apparatuses and devices
KR102264803B1 (ko) Method for generating character animation by extracting a character from an image, and apparatus using the same
WO2022237249A1 (zh) Three-dimensional reconstruction method, apparatus, and system, medium, and computer device
CN110717391A (zh) Height measurement method, system, apparatus, and medium based on video images
WO2023078135A1 (zh) Three-dimensional modeling method and apparatus, computer-readable storage medium, and computer device
WO2020134925A1 (zh) Illumination detection method and apparatus for a face image, device, and storage medium
US20210407125A1 (en) Object recognition neural network for amodal center prediction
WO2022237026A1 (zh) Plane information detection method and system
Bueno et al. Metrological evaluation of KinectFusion and its comparison with Microsoft Kinect sensor
CN105205786B (zh) Image depth recovery method and electronic device
CN115280367A (zh) Kinematic interaction system with improved pose tracking
Lin et al. Automatic upright orientation and good view recognition for 3D man-made models
Krishnamurthy Human detection and extraction using kinect depth images
Zhang et al. Visual Error Correction Method for VR Image of Continuous Aerobics
CN112767538B (zh) Three-dimensional reconstruction and related interaction and measurement methods, and related apparatuses and devices
KR20190066804A (ko) Method for generating a spherical image, method for playing back a spherical image, and apparatuses therefor
CN117218279A (zh) Method and apparatus for generating clothing data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22889160

Country of ref document: EP

Kind code of ref document: A1