CN114119913A

CN114119913A - Human body model driving method, device and storage medium

Info

Publication number: CN114119913A
Application number: CN202010876708.9A
Authority: CN
Inventors: 闫浩男; 周润楠; 张胜凯; 郑天祥; 杨超杰; 焦年红; 吴圣杰
Original assignee: Beijing Momo Information Technology Co Ltd
Current assignee: Beijing Momo Information Technology Co Ltd
Priority date: 2020-08-27
Filing date: 2020-08-27
Publication date: 2022-03-01

Abstract

The invention discloses a method for driving a human body model. The method comprises: acquiring posture and body type parameters of a target human body model, wherein the three-dimensional posture and body type parameters correspond to the bone parameters and the plurality of base parameters of a three-dimensional standard human body model; establishing a standard human body model in an initial posture; inputting the acquired groups of base parameters and bone parameters into the three-dimensional standard human body model for fitting, thereby obtaining a three-dimensional human body model with the same body type as the target human body; driving the bones to move from the initial posture to the target posture; and obtaining a three-dimensional target human body model mesh with the same posture as the two-dimensional image of the target human body. The invention optimizes the fitting of the standard human body model to the target human body model by forming an animation sequence from the initial posture to the target posture, and optimizes the bone-driving process with a variable-speed frame interpolation method, so that the fitting process as a whole strikes a good balance between time and effect.

Description

Human body model driving method, device and storage medium

Technical Field

The invention belongs to the field of three-dimensional human body modeling, and relates to a human body model driving method, device and storage medium, in particular to a driving method, device and storage medium for a standard human body model and a target human body model based on frame interpolation technology.

Background

With the development of internet technology, online shopping has become more and more popular. Compared with shopping in a physical store, online shopping offers a wider range of goods and greater convenience. However, some problems are not easy to solve when buying goods online, the most important being that the goods cannot be inspected in person. Among all categories of goods, this problem is most prominent for clothing. In a physical store a customer can try on clothes and see the effect immediately, whereas online clothes shopping cannot provide an image tailored to the consumer. It can only provide pictures of a model wearing the garment, and sometimes no fitting picture at all, so the consumer cannot see intuitively and in real time how well the clothes match his or her own figure. This results in a large number of returns.

In response, operators have tried to provide consumers with simulated fitting effects using virtual fitting techniques. Virtual fitting and outfit-changing techniques can of course also be used in other settings, such as online games. This technology has therefore developed rapidly.

Virtual fitting refers to a technical application in which a user can view the effect of an outfit change on a terminal screen in real time without actually changing clothes. Existing outfit-changing technology mainly comprises planar fitting technology and three-dimensional virtual fitting technology. The former essentially collects pictures of the user and pictures of the clothes, then cuts and splices them to form an image of the dressed user. Because the image processing is simple and crude, the resulting images lack realism; the user's actual body type is not considered at all, and mechanically pasting the clothes onto the user's picture cannot meet the user's needs. The latter usually collects three-dimensional information about the person through a three-dimensional acquisition device and combines it with the characteristics of the clothes, or takes body data entered manually by the user, generates a virtual three-dimensional human body model according to certain rules, and then combines that model with a clothing texture map. Overall, such three-dimensional virtual fitting requires a large amount of data acquisition or three-dimensional computation, has a high hardware cost, and is not easy to popularize among ordinary users.

With the development of cloud computing technology, artificial intelligence technology and the processing capacity of intelligent terminals, two-dimensional virtual fitting technology has emerged. Such techniques essentially comprise three steps: (1) processing the personal body information provided by the user to obtain a target human body model; (2) processing the clothing information to obtain a clothing model; (3) fusing the human body model and the clothing model to generate a simulated image of the person wearing the clothing.

However, because many uncertain factors accumulate, such as the process design, the selection of model parameters and the neural network training method, the quality of the final outfit-change picture is not as good as that of traditional three-dimensional virtual fitting technology. Fitting the human body model is the basic step, and the subsequent dressing process relies on the previously generated human body model. Once the human body model is generated inaccurately, problems such as an excessive difference in body shape between the model and the person trying on the clothes, loss of skin texture and missing body parts easily arise, and these degrade the final outfit-change picture.

In the general field of computer vision, there are many starting points for human body modeling, which generally fall into three major categories: omnidirectional scanning of a real human body with a 3D scanning device, three-dimensional reconstruction based on multi-view depth photography, and three-dimensional reconstruction by combining a given image with a human body model. First, omnidirectional scanning of a real human body with 3D scanning equipment yields the most accurate information, but the equipment is usually expensive, requires a high degree of cooperation from the person being scanned, and places high demands on the processing equipment, so it is generally used only in certain professional fields. Second, the multi-view three-dimensional reconstruction method requires multiple overlapping views of the body to be reconstructed and a spatial transformation relation between the images; several groups of cameras shoot multiple images, which are spliced into a 3D model. The operation is relatively simple, but the computational complexity is still high, and in most cases multi-angle images can only be obtained for people who are present on site. A model spliced from pictures taken by depth cameras in the multi-angle method has no body scale data and cannot provide a basis for 3D perception.
Third, the method of combining a single image with a human body model needs only one image. A neural-network-based method for intelligently generating three-dimensional human body characteristic curves obtains the weights and thresholds that describe the curves of the neck, chest, waist, hip and other parts of the body; a three-dimensional body curve matching the real body shape is then generated directly from size parameters such as the girth, width and thickness of the body cross sections, which yields the predicted human body model. However, because so little information is input, this method still consumes a large amount of computation, and the final model is not satisfactory.

Given the characteristics of internet technology and the network environment, directly outputting the final human body model from a single image is undoubtedly the preferred approach. It is the most convenient: the user does not need to be present on site, and a single photo is enough to complete the whole outfit-changing process. The remaining question is quality: as long as the resulting picture can be guaranteed to be substantially equivalent to a real 3D simulated outfit change, this approach will become mainstream. The key is therefore to obtain, from a single photograph, a human body model that is as close as possible to the real state of the human body.

In the prior art, methods for constructing a human body model generally fall into several types. (1) Regression-based methods reconstruct a voxel representation of the human body through a convolutional neural network. The algorithm first estimates the positions of the main joint points of the human body from the input picture, then places a voxel grid of a specified size according to the key point positions, and describes the reconstructed body shape by the overall shape of the occupied voxels, according to whether each unit voxel in the grid is occupied. (2) Simple human skeleton key points are roughly marked on the image, and a human body model is then initially matched and fitted to these rough key points to obtain the approximate shape of the human body. (3) The human skeleton is represented by 23 skeleton nodes, the posture of the whole body by the rotation of each skeleton node, and the shape of the body by 6890 vertex positions; given the positions of the skeleton nodes, the shape and posture parameters are fitted simultaneously to reconstruct the three-dimensional human body. Alternatively, a CNN model predicts key points on the image, and the SMPL model is then fitted to obtain an initial human body model. The fitted shape parameters are then used to regress bounding boxes for the individual body joints, one bounding box per joint, each represented by an axial length and a radius. Finally, the initial model and the regressed bounding boxes are combined to obtain the three-dimensional human body reconstruction.
These methods suffer from slow modeling speed, insufficient modeling accuracy, and a reconstruction quality that depends strongly on the body shape and posture database that was created.

The prior art discloses a human body modeling method based on body measurement data. As shown in fig. 1, the method includes: acquiring body measurement data; performing linear regression on a pre-established human body model through a pre-trained prediction model according to the body measurement data, and fitting to obtain a predicted human body model, wherein the pre-established human body model comprises a plurality of groups of predefined marked feature points and corresponding standard shape bases, and the body measurement data comprises measurement data corresponding to each group of marked feature points; and obtaining a target human body model from the predicted human body model, wherein the target human body model comprises measurement data, a target shape base and a target shape coefficient. However, this method places very high demands on the body measurement data, which include body length data and girth data such as height, arm length, shoulder width, leg length, calf length, thigh length, foot length, head circumference, chest circumference, waist circumference and thigh circumference, and these must be not only measured but also calculated. The amount of computation is indeed reduced, but the user experience is very poor and the procedure is very cumbersome. In addition, the training of the human body model follows the training approach of the SMPL model.

The SMPL model is a parameterized human body model proposed by the Max Planck Institute, and can be used for arbitrary human body modeling and animation driving. Its biggest difference from traditional linear blend skinning (LBS) is that it models how the body surface shape changes with posture, and can therefore simulate the bulging and hollowing of muscles during limb movement. This avoids surface distortion of the body during motion and accurately depicts the shape of muscles as they stretch and contract. In this method, β and θ are the input parameters: β represents 10 parameters describing how tall, short, fat or thin the body is, the head-to-body ratio and the like, and θ represents 75 parameters describing the overall motion pose of the body and the relative angles of the 24 joints. However, the core of this model generation method is the accumulation of a large amount of training data to learn the relationship between body type and shape bases. The learned relationship is strongly correlated: each shape base cannot be controlled independently, and decoupling is not easy. For example, a certain correlation also exists between the arms and the legs, so that in theory the legs move along with the arms when the arms move. As a result, improvements to the SMPL model aimed at body types with different characteristics are difficult to realize.

The second prior art discloses a 3D human body modeling method based on a single photo, which comprises the following steps: acquiring a photo, analyzing the photo, marking key points of the human body in the photo, and calculating the spatial coordinates of the key points; acquiring the distance between the skeleton points of a pre-created standard human body model and the key points in the photo, aligning the skeleton points with the key points, and generating a basic human body model; acquiring the base texture map of the pre-created standard human body model, calculating the difference between the base texture map and the skin texture of the face in the photo, and fusing them using an edge channel to generate basic texture data; and generating a 3D human body model from the basic human body model and the basic texture data. 3D human body modeling is thus realized from one picture, and because the model is supported by a skeleton and a muscle system, expressions and actions can be generated. However, in this method the target posture is reached only by matching and then adjusting the distances between the key points of the user's picture and the key points of the standard mannequin, and the final human body model is obtained only after difference calculation and fusion between the base texture map and the skin texture in the picture.

The third prior art discloses a method for generating a three-dimensional human body model, which comprises the following steps: acquiring a two-dimensional human body image; and inputting the two-dimensional human body image into a three-dimensional human body parameter model to obtain the three-dimensional human body parameters corresponding to the image. The three-dimensional human body parameter model is obtained by inputting training samples into a neural network for training, which comprises: inputting the standard two-dimensional human body image in the training sample into the neural network to obtain predicted three-dimensional human body parameters corresponding to that image; adjusting a three-dimensional flexible deformable model according to the predicted parameters to obtain a predicted three-dimensional human body model; and obtaining, by reverse mapping from the joint point positions in the predicted three-dimensional human body model, the predicted joint point positions in the standard two-dimensional human body image. In this modeling approach, only joint parameters are used to evaluate the model and, ultimately, the parameters output by the neural network; the mature body shape of the SMPL model is then used to adjust details toward the target human posture. Although the amount of calculation is reduced, the input parameters are few and the adjustment can only be made on the basis of the SMPL prediction model, so it is difficult to output an ideal human body model that is highly consistent with the target human posture.

Therefore, in line with the development trend of the internet industry, the three basic goals always pursued in the virtual fitting field are the least input information, the least computation and the best effect. An optimal balance must be found among the three, so as to provide a human body modeling method that requires only simple input, whose computation does not exceed the capacity of the terminal device, and whose effect is close to that of professional equipment.

Disclosure of Invention

Based on the above problems, the present invention provides a driving method, apparatus and storage medium of a manikin that overcome the above problems.

The invention provides a method for driving a human body model, which comprises the following steps: establishing a standard human body model in an initial posture; acquiring posture and body type parameters of a target human body model, wherein the three-dimensional posture and body type parameters correspond to the bone parameters and the plurality of base parameters of a three-dimensional standard human body model; inputting the acquired groups of base parameters and bone parameters into the three-dimensional standard human body model for fitting; driving the bones to move from the initial posture to the target posture; and obtaining a three-dimensional target human body model mesh with the same posture as the two-dimensional image of the target human body.

Preferably, the driving process further includes obtaining the position coordinates of the initial posture and the target posture; the initial posture parameters are taken from the initial parameters of the standard mannequin model, and the bone information of the target posture is obtained by regression prediction with the neural network model. The driving process further includes generating an animation sequence that moves from the initial posture to the target posture. After the initial state of the bone information and the target posture parameters are obtained, a time series of bone information from the initial posture to the target posture is formed by frame interpolation, using linear interpolation or nearest neighbor interpolation. During driving, a parent bone node is driven first, and its child bone nodes are driven afterwards. The animation sequence is generated by mesh frame interpolation: after the bones are driven in each frame, the vertices of the human body model in the current state, that is, its mesh surface information, are computed from the weight parameters of the standard human body model, and the mesh state of the current human body model is updated, recorded and stored.
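
The bone-driving step described above can be illustrated with a minimal sketch. It is not the implementation of the invention: the planar two-bone skeleton, the function names and the choice of one rotation angle per bone are all assumptions made for illustration, and the weight-based mesh update is omitted. The sketch shows only linear interpolation of bone parameters into a time series and the parent-before-child driving order.

```python
import math

def lerp_pose(start, target, t):
    """Linearly interpolate each bone's local angle; t runs from 0 to 1."""
    return [a + (b - a) * t for a, b in zip(start, target)]

def forward_kinematics(parents, lengths, local_angles):
    """Return the world-space end point of every bone in a planar skeleton.

    Bones are stored so that parents[i] < i, which guarantees that a
    parent bone is driven before any of its children.
    """
    world_angle = [0.0] * len(parents)
    end_point = [(0.0, 0.0)] * len(parents)
    for i, parent in enumerate(parents):
        if parent < 0:   # root bone starts at the origin
            base_angle, base = 0.0, (0.0, 0.0)
        else:            # child bone inherits the parent's transform
            base_angle, base = world_angle[parent], end_point[parent]
        world_angle[i] = base_angle + local_angles[i]
        end_point[i] = (base[0] + lengths[i] * math.cos(world_angle[i]),
                        base[1] + lengths[i] * math.sin(world_angle[i]))
    return end_point

def bone_time_series(start, target, n_frames):
    """Bone parameter sequence from the initial pose to the target pose."""
    return [lerp_pose(start, target, k / (n_frames - 1)) for k in range(n_frames)]

# Hypothetical two-bone arm: upper arm (root) and forearm (child).
PARENTS = [-1, 0]
LENGTHS = [1.0, 1.0]
INITIAL = [0.0, 0.0]          # arm stretched along the x axis
TARGET = [math.pi / 2, 0.0]   # arm raised to point along the y axis

frames = bone_time_series(INITIAL, TARGET, 5)
joint_positions = [forward_kinematics(PARENTS, LENGTHS, pose) for pose in frames]
```

In a full system, each entry of `joint_positions` would be followed by a skinning step that recomputes the mesh vertices from the bone transforms and the model's weight parameters.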

Preferably, the frame interpolation speed is non-uniform: it is set to be slow near the initial point and near the target point and fast during the middle of the movement. That is, the movement per frame is small at the beginning and at the end of the action, and large in the middle of the movement.

Preferably, in the process of generating the animation sequence, the driving speed is reduced as the interpolation approaches the end of the motion, and the model is held still for several frames once it reaches the final target posture, so as to obtain the whole animation sequence.
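
One possible way to realize such a variable-speed schedule is sketched below. The text does not name a specific speed curve, so the smoothstep easing function used here is an assumption, as are the function names and frame counts. The sketch produces an interpolation weight per frame that changes slowly near the start and end, quickly in the middle, and then stays at the target for several hold frames.

```python
def smoothstep(t):
    """Ease-in, ease-out curve: zero slope at t = 0 and t = 1 (an assumed choice)."""
    return t * t * (3.0 - 2.0 * t)

def variable_speed_schedule(n_motion_frames, n_hold_frames):
    """Interpolation weight for each frame, followed by static hold frames."""
    motion = [smoothstep(k / (n_motion_frames - 1)) for k in range(n_motion_frames)]
    hold = [1.0] * n_hold_frames   # model stays at the target posture
    return motion + hold

schedule = variable_speed_schedule(9, 3)

# Per-frame movement amplitude: small at both ends, large in the middle.
steps = [b - a for a, b in zip(schedule, schedule[1:])]
```

Each weight in `schedule` can be passed as the interpolation factor between the initial and target bone parameters, in place of the uniformly spaced factor used in constant-speed interpolation.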

Preferably, the standard human body model is a three-dimensional standard human body model constructed from mathematical models. The three-dimensional standard human body model has a mathematical weight relation between the skeleton points and the model mesh, so that once the skeleton points are determined, the human body model in the target posture can be determined accordingly. The three-dimensional standard human body model is defined by a plurality of body base parameters and a plurality of bone parameters; the body bases together form the whole human body model mesh, and each body base is controlled and changed independently by its own base parameter, without affecting the others.
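
The idea of independently controlled body bases can be illustrated with a toy example. The three-vertex "leg", the base names `thigh_length` and `calf_length`, and the displacement vectors below are all hypothetical; the actual bases of the standard human body model are not specified in this text. The sketch shows only that, when each base contributes an additive displacement scaled by its own parameter, changing one parameter alters the measurement it controls and leaves the other unchanged.

```python
# Hypothetical template: hip, knee and ankle vertices of one leg (x, y, z).
TEMPLATE = [(0.0, 1.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.0)]

# Each base maps vertex index -> displacement per unit of its parameter.
BASES = {
    "thigh_length": {1: (0.0, -1.0, 0.0), 2: (0.0, -1.0, 0.0)},  # moves knee and ankle
    "calf_length": {2: (0.0, -1.0, 0.0)},                        # moves ankle only
}

def apply_bases(template, bases, params):
    """Add each base's displacement, scaled by its own parameter, to the template."""
    vertices = [list(v) for v in template]
    for name, value in params.items():
        for index, offset in bases[name].items():
            for axis in range(3):
                vertices[index][axis] += value * offset[axis]
    return [tuple(v) for v in vertices]

def segment_length(vertices, upper, lower):
    """Vertical distance between two vertices of the toy leg."""
    return vertices[upper][1] - vertices[lower][1]

# Lengthen the thigh only; the calf parameter is left at zero.
shaped = apply_bases(TEMPLATE, BASES, {"thigh_length": 0.25, "calf_length": 0.0})
thigh = segment_length(shaped, 0, 1)
calf = segment_length(shaped, 1, 2)
```

Here the thigh grows from 0.5 to 0.75 while the calf keeps its template length, which mirrors the separate adjustment of thigh and lower leg proportions described in the beneficial effects.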

Preferably, the step of obtaining the parameters of the target human body model further comprises: 1) obtaining a two-dimensional image of the target human body; 2) processing it to obtain a two-dimensional human body contour image of the target human body; 3) feeding the two-dimensional human body contour image into a first neural network trained by deep learning to regress the joint points; 4) obtaining a joint point map of the target human body, semantic segmentation maps of the parts of the human body, body key points and body skeleton points; 5) feeding the generated joint point map, semantic segmentation maps, body skeleton points and key point information of the target human body into a second neural network trained by deep learning to regress the human body posture and body type parameters; 6) obtaining the output three-dimensional human body parameters, which include three-dimensional human body action posture parameters and three-dimensional human body shape parameters.
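
The data flow of steps 1) to 6) can be sketched as follows. This is only an outline of how the stages connect: the function and field names are hypothetical, the two networks are passed in as plain callables, and the stand-in functions at the bottom return placeholder values rather than real predictions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

@dataclass
class BodyParameters:
    pose: List[float]    # three-dimensional action posture parameters
    shape: List[float]   # three-dimensional body shape parameters

def estimate_body_parameters(
    image: Any,
    extract_contour: Callable[[Any], Any],
    first_network: Callable[[Any], Dict[str, Any]],
    second_network: Callable[[Dict[str, Any]], Tuple[List[float], List[float]]],
) -> BodyParameters:
    contour = extract_contour(image)         # steps 1-2: image -> body contour
    features = first_network(contour)        # steps 3-4: joints, segmentation, key points
    pose, shape = second_network(features)   # step 5: regress posture and body type
    return BodyParameters(pose=list(pose), shape=list(shape))   # step 6

# Placeholder stand-ins so the sketch runs; real networks would replace these.
def fake_contour(image):
    return {"contour_of": image}

def fake_first_network(contour):
    return {"joint_map": [], "segmentation": [], "key_points": [], "bone_points": []}

def fake_second_network(features):
    return [0.0, 0.0, 0.0], [0.0, 0.0]

params = estimate_body_parameters(
    "photo.jpg", fake_contour, fake_first_network, fake_second_network
)
```

Passing the networks in as arguments keeps the two stages separable, which matches the description of different networks being trained for different purposes.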

Preferably, the method further comprises acquiring the two-dimensional human body contour image using a target detection algorithm, wherein the target detection algorithm is a fast target region generation network based on a convolutional neural network. Before the two-dimensional human body image is input into the first neural network model, the method further comprises a process of training the neural network; the training samples comprise standard two-dimensional human body images annotated with the original joint point positions, and these positions are annotated manually on the two-dimensional human body images with high accuracy.

A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method steps of any of the preceding claims.

An electronic device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus; a memory for storing a computer program; a processor for implementing any of the above method steps when executing a program stored in the memory.

The invention has the beneficial effects that:

1. The model fitting speed is high, and the fitted posture is restored with high fidelity. To fit the target human posture vividly, the change from the initial posture (position) to the target posture is completed with an optimized frame interpolation method. Compared with the traditional frame interpolation method, the bone information of the target posture is obtained by model regression prediction, and an animation sequence moving from the initial posture to the target posture is generated at the same time; a time series of bone information from the initial posture to the target posture is formed by frame interpolation such as linear interpolation or nearest neighbor interpolation. The animation sequence is generated by mesh frame interpolation. The frame interpolation speed is set to be slow near the initial point and near the target point and fast during the middle of the movement. More importantly, the model remains static for several frames when it reaches the final target posture, so that it obtains an effective buffer between moving at high speed and coming to rest. The complete animation sequence is thus obtained, and the posture fitting accuracy of the human body model is higher. Compared with constant-speed frame interpolation, this method is closer to the laws of motion of the real physical world, gives a better simulation of the clothes and the human posture, and can reduce the corresponding processing time.

2. The user operation is simple. The invention analyzes a whole-body picture of a person through a deep neural network to obtain accurate three-dimensional human body model parameters. Only one ordinary picture is needed to model the human body quickly, which suits the characteristics and trends of the internet era and is simple and fast. The user needs no preparation; uploading one photo is all the user has to do. If the invention is applied to scenarios such as entertainment mini-programs or online shopping, user experience and user retention can be greatly improved. The 3D model, obtained without a depth camera or multiple groups of cameras, corresponds to the real shape of the human body and opens up a wide range of application scenarios for industries such as clothing and health.

3. Hierarchical deep neural networks are used in a novel way. The invention makes full use of the advantages of deep learning networks and can restore the posture and body type of the human body with high precision in a variety of complex scenes. Different neural networks are used for different purposes. By using neural network models with different input conditions and training modes, the invention achieves accurate separation of the human body contour from a complex background, semantic segmentation of the human body, and determination of key points and joint points, eliminates the influence of loose clothes and hairstyles, and approaches the real body type and shape of the human body as closely as possible. Neural network models are also used in the prior art, but their roles and functions differ greatly because of the different input conditions, input parameters and training modes.

4. The neural network model is more scientific and targeted. In the prior art, some image processing methods are overly focused on producing a model in a single direct step and do not spend time refining its details. The mapping from a 2D picture to a 3D body model is completed purely by training on massive image data. Although this is efficient, the processing flow is too simple and the three-dimensional human body model is generated entirely by the neural network model, so the consistency and quality of the body proportions and details are unsatisfactory. This does nothing to help subsequent processing and can become an obstacle that later stages find hard to overcome. In the present invention, the human body contour, human body semantic segmentation, key points and joint points produced by the earlier-stage neural network serve as inputs, so that model parameters can be generated from multiple perspectives. The parameters output by the later-stage neural network fall into two categories, pose and body shape, so that actions and body shape can be controlled separately, and the posture and body shape of the human body can be accurately reproduced in combination with the reference model.

5. The human body model is accurate and controllable. The currently popular human body reconstruction methods based on single images mainly rely on the reconstruction of parameterized human body models. The most commonly used parameterized model is the SMPL model of the Max Planck Institute, which contains two sets of 72 parameters for describing body posture and body shape. For single-picture reconstruction, the positions of the two-dimensional joints are estimated from the picture, and the SMPL parameters are then obtained by optimization, minimizing the projection distance between the three-dimensional joints and the two-dimensional joints in the image plane, which yields the human body. However, the SMPL model is obtained mainly through deep learning and training on a large number of human body model examples. The relationship between body shape and shape bases is a global correlation that is difficult to decouple, so the body part to be controlled cannot be controlled at will, and the generated model cannot achieve high consistency with the real human posture and body shape. In addition, if the SMPL model is further applied to the subsequent dressing process, its ability to represent the geometric details of the body surface is limited, and the detailed texture of clothes on the body surface cannot be reconstructed well. By contrast, the human body model of the present invention is not obtained through training; its parameters are related by mathematical principles, that is, the parameter groups are independent of one another, so the model is more interpretable during transformation and can better represent the shape change of a particular part of the body.
Generally speaking, human bodies vary enormously from person to person, and for many people the ratio of thigh to lower leg does not follow any fixed proportion. By controlling the input parameters, the model can control the thighs and the lower legs and adjust their lengths separately, so as to determine the leg proportions accurately.

6. The model is more suitable for Asian body types. Body modeling typically involves designing a number of standard body models, also called base mannequins. With the self-built standard human body model, the human body can be controlled, that is, driven from the initial posture to the target posture; this work is the basis on which the fitted clothing changes as the human posture changes. The specific process by which the clothing follows the body to the target posture can be calculated only if the body reaches the target posture accurately. For this purpose, a set of standard human body models (standard mannequins) better matched to Asian body types is built in-house, instead of using the SMPL model of the Max Planck Institute, whose base human body models are trained on European body type data. This set of mannequins may preferably comprise 170 bone parameters and 20 body base parameters, which greatly enriches the detail of the human body model and expresses details beyond the SMPL model. Combined with the independent control of each base, every part of the mannequin can be controlled and modified independently and accurately as required, so that each mannequin looks more attractive. In addition, local parts of the mannequin can be manually adjusted at a later stage, for example in the number of vertices and the number of faces, which other models represented by the SMPL model cannot do. Besides the model height, which can be adjusted accurately, other attributes such as overall build, arm length, leg proportions, waist length and waist circumference can all be controlled precisely, so that the mannequin conforms more closely to the user's figure.

The invention optimizes the fitting of the standard human body model to the target human body model, forming an animation sequence from the initial posture to the target posture, and optimizes the process of driving the skeleton by using a variable-speed frame interpolation method, so that the whole fitting process strikes a good balance between time and effect. In addition, body-shape bases suited to the characteristics of Asian physiques are selected for the standard human body model, producing a three-dimensional human body model that is closer to Asian body types than the Max Planck Institute's SMPL model and that can be operated and controlled more independently.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a human model fitting process according to an embodiment;

FIG. 2 is a process flow diagram of a model parameter acquisition module of an embodiment;

FIG. 3 is a flow diagram of a complete modeling process of one embodiment;

FIG. 4 is a schematic diagram of the system of the present invention.

Detailed Description

Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The following describes a method for processing a human body image according to an embodiment of the present invention in detail with reference to the accompanying drawings.

As shown in fig. 3, an embodiment of the present invention provides a method for driving a human body model, where the method includes: establishing a standard human body model in an initial posture; acquiring posture and body type parameters of a target human body model, wherein the three-dimensional human body posture and body type parameters correspond to the skeleton parameters and a plurality of base parameters of a three-dimensional standard human body model; inputting the obtained groups of base and skeleton parameters into the standard three-dimensional human body model for fitting; obtaining a three-dimensional human body model with the same body type as the target human body; driving the skeleton to move from the initial posture to the target posture; and obtaining a three-dimensional target human body model mesh with the same posture as the two-dimensional image of the target human body. Through advanced learning and simulated motion settings, the amount of calculation and the complexity involved in generating the three-dimensional human body model can be greatly reduced, and the result can be far more realistic than human body models generated from 2D pictures by existing methods.

First, the invention discloses a method for modeling a human body model, which roughly comprises three parts.

The first part is to process the acquired human body image to obtain the parameter information needed to generate the human body model. Previously, skeletal key points were usually selected manually, but this is inefficient and unsuited to the fast pace of the internet era. Now that neural networks are prevalent, replacing manual key-point selection with a deep-learning neural network has become the trend. How to use neural networks efficiently, however, still requires further study. In general, the parameter acquisition system is built on the idea of a two-stage neural network plus data refinement. As shown in fig. 2, deep-learning neural networks are used to generate these parameters, mainly through the following sub-steps:

1) acquiring a two-dimensional image of the target human body;

2) processing it to obtain a two-dimensional human body contour image of the target human body;

3) substituting the two-dimensional human body contour image into a first deep-learning neural network for regression of the joint points;

4) obtaining the joint point map of the target human body, the semantic segmentation maps of the parts of the human body, the body key points, and the body bone points;

5) substituting the generated joint point map, semantic segmentation map, body bone points, and key point information of the target human body into a second deep-learning neural network for regression of the human body posture and body type parameters;

6) obtaining the output three-dimensional human body parameters, including three-dimensional action posture parameters and three-dimensional body shape parameters.
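The data flow of the six sub-steps above can be sketched as follows. This is only an illustrative outline: the names `StageOneFeatures`, `BodyParameters`, `estimate_body_parameters`, and the three callables are hypothetical and do not come from the patent, and the networks themselves are passed in as opaque functions rather than implemented.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class StageOneFeatures:
    """Intermediate 2D outputs of the first network (sub-step 4)."""
    joint_map: np.ndarray      # joint point map of the target body
    segmentation: np.ndarray   # semantic segmentation of body parts
    keypoints: np.ndarray      # body key points
    bone_points: np.ndarray    # body bone points


@dataclass
class BodyParameters:
    """3D outputs of the second network (sub-step 6)."""
    pose: np.ndarray    # action posture parameters
    shape: np.ndarray   # body shape parameters


def estimate_body_parameters(
    image: np.ndarray,
    extract_contour: Callable[[np.ndarray], np.ndarray],
    first_network: Callable[[np.ndarray], StageOneFeatures],
    second_network: Callable[[StageOneFeatures], BodyParameters],
) -> BodyParameters:
    """Chain contour extraction and the two regression stages."""
    contour = extract_contour(image)     # sub-steps 1-2: image -> contour
    features = first_network(contour)    # sub-steps 3-4: contour -> 2D features
    return second_network(features)      # sub-steps 5-6: 2D features -> 3D parameters
```

Keeping the two stages as separate callables mirrors the two-stage design described in the text: the first stage can be retrained or replaced without touching the second, as long as the intermediate feature set stays the same.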

The two-dimensional image of the target human body may be any two-dimensional image that contains a human figure, in any posture and in any clothing. The two-dimensional human body contour image is obtained with a target detection algorithm, which is a network based on a convolutional neural network that rapidly generates target regions.

Before the two-dimensional human body image is input into the first neural network model, the method further comprises training the neural network. The training samples comprise standard two-dimensional human body images labeled with the original joint point positions, which are marked manually and with high accuracy on the images. Here, a target image is first acquired, and human body detection is performed on it using a target detection algorithm. Human body detection does not mean measuring a real human body with a measuring instrument. In the invention, it means taking any given image, usually a two-dimensional picture that contains enough information (for example, the face, the four limbs, and the torso all appear in the picture), and searching it with a certain strategy to determine whether it contains a human body; if it does, parameters such as the position and size of the human body are returned. In this embodiment, before the key points of the human body in the target image are acquired, human body detection needs to be performed on the target image to obtain a human body bounding box indicating the position of the human body. Because the image input by a user can be any image, it inevitably contains non-human background such as tables, chairs, trees, cars, and buildings, and this useless background is removed with mature algorithms.

At the same time, semantic segmentation, joint point detection, bone detection, and edge detection are carried out. Collecting this 1D point information and 2D surface information lays a good foundation for generating the 3D human body model later. The first-stage neural network is used to generate the joint point map of the human body; optionally, the target detection algorithm may be a network based on a convolutional neural network that rapidly generates target regions. The first neural network requires training on a massive amount of data: photos collected from the internet are labeled manually and then fed into the neural network for training. After deep learning, the network can, as soon as a photo is input, produce a joint point map whose accuracy and quality are essentially the same as those of manually labeled joint points, while being tens or even hundreds of times more efficient than manual labeling.

In the invention, obtaining the positions of the human joint points in the picture completes only the first step: the 1D point information is obtained, and the 2D surface information is then generated from it. This work can be completed with neural network models and mature algorithms in the prior art. The invention redesigns the workflow and the points at which the neural network models intervene, and sets the various conditions and parameters appropriately, so that parameter generation is more efficient and requires less manual participation. This suits internet application scenarios well. For example, in a virtual outfit-changing application the user obtains the result almost instantly, without waiting, which plays a vital role in making the application attractive to users.

After the relevant 1D point information and 2D surface information are obtained, these results, namely the joint point map, the semantic segmentation map, the body bone points, and/or the key point information of the target human body, can be substituted as inputs into a second deep-learning neural network for regression of the human body posture and body type parameters. Through the regression calculation of the second neural network, several groups of three-dimensional human body parameters, including three-dimensional action posture parameters and three-dimensional body shape parameters, can be output immediately. Preferably, the loss function of the neural network is designed based on the three-dimensional standard human body model (base human body model), the predicted three-dimensional human body model, the standard two-dimensional human body image labeled with the original joint point positions, and the standard two-dimensional human body image containing the predicted joint point positions.

The second part is to design and model a number of base mannequins in advance. The main work is to construct, in combination with a mathematical model, a three-dimensional standard human body model, that is, a standard human body model or base mannequin. The Max Planck Institute's SMPL human body model can avoid surface distortion of the body during motion and can accurately depict the shapes produced by muscle stretching and contraction. In that method, β and θ are the input parameters: β represents 10 parameters describing body shape, such as height, build, and head-to-body ratio, and θ represents 75 parameters describing the overall motion pose of the body and the relative angles of 24 joints. The β parameters are shape blend parameters that control the change of body shape through 10 incremental templates; the change in body shape controlled by each parameter can be depicted with an animation. By studying continuous animations of parameter changes, we can clearly see that a continuous change in any single shape parameter causes local or even global linked changes in the human body model. To reflect the movement of human muscle tissue, a linear change in any parameter of the SMPL model causes mesh changes over a large area. For example, when β1 is adjusted, the model treats the change in β1 as a change in the whole body: you may only want to adjust the proportions of the waist, but the model forces the build of the legs, chest, and even hands to change together. Although this way of working greatly simplifies the process and improves efficiency, it is very inconvenient for a project that pursues modeling quality.

Because the SMPL human body model is trained on Western body images and measurement data, it matches Western body types, and its shape-change rules basically follow the typical variation curves of Western bodies. When it is applied to modeling Asian bodies, many problems arise, for example in the proportions of the arms and legs, the proportion of waist to torso, the proportion of the neck, leg length, and arm length. Our research shows large differences in these respects; if the SMPL human body model is applied rigidly, the final result cannot meet our requirements.
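The linkage effect described above follows from the way a linear shape model combines dense shape directions. The toy sketch below illustrates this under stated assumptions: the mesh is tiny and the shape directions are random numbers rather than learned data, so only the structure (template plus a β-weighted sum of full-mesh displacement fields) is meant to resemble an SMPL-style model. The names `template`, `shape_dirs`, and `shaped_vertices` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vertices, n_betas = 200, 10   # toy mesh; the real SMPL mesh has 6890 vertices

template = rng.normal(size=(n_vertices, 3))              # mean body shape
shape_dirs = rng.normal(size=(n_betas, n_vertices, 3))   # dense directions (random here)


def shaped_vertices(betas):
    """Rest-pose vertices: template plus a linear combination of shape directions."""
    return template + np.tensordot(betas, shape_dirs, axes=1)


betas = np.zeros(n_betas)
base = shaped_vertices(betas)
betas[0] = 1.0                                           # change one coefficient only
moved = np.linalg.norm(shaped_vertices(betas) - base, axis=1) > 1e-9
print(f"{moved.sum()} of {n_vertices} vertices moved")
```

Because every shape direction has non-zero entries over the whole mesh, changing a single coefficient displaces vertices everywhere, which is the behavior the text describes for β1.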

Therefore, a self-built human body model is adopted to improve the result. The core of the method is to build blend-shape body bases that allow accurate, independent control of the human body. The three-dimensional standard human body model has a mathematical weight relation between the skeleton points and the model mesh, so that determining the skeleton points determines the human body model in the target posture. The three-dimensional standard human body model is defined by a plurality of body-shape base parameters and a plurality of skeleton parameters; the body-shape bases together form the whole human body model mesh, and each base is controlled and changed independently by its own base parameter without affecting the others. Preferably, the three-dimensional standard human body model (base mannequin) is composed of 20 body-shape base parameters and 170 skeleton parameters. Accurate control means, on the one hand, that the number of control parameters is increased rather than reusing the ten β control parameters of the Max Planck model. Besides overall build, the adjustable parameters now include arm length, leg length, and the build of the waist, hips, and chest, and the number of skeleton parameters is more than doubled. This greatly enlarges the range of adjustable parameters and provides a good basis for designing a refined standard human body model. Independent control means that each base, such as the waist, legs, hands, or head, is controlled separately, and that the length of each bone can be adjusted separately; the parameters are independent of one another and produce no physical linkage, so the human body model can be fine-tuned more easily. The model is thus no longer a rigid, one-size-fits-all model that cannot be adjusted to the form the designer wants.

The present model embodies a correspondence based on mathematical principles. In effect, the model is redesigned from two sources, human aesthetic judgment and statistical analysis of data, so that it generates, according to its own design rules, a model considered to match Asian body types. This is clearly different from the big-data-trained SMPL human body model: the parameter transformations are more interpretable and represent local body changes better. Moreover, because the changes are based on mathematical principles, the parameters do not influence one another, and parts such as the arms and legs remain completely independent. Designing this many distinct parameters avoids the shortcomings of a human body model trained on big data and allows the model to be controlled accurately in more dimensions, not only in a few indicators such as height, which greatly improves the modeling result. Setting so many independent control parameters is practical only on the premise of self-built body-shape bases; both are indispensable for meeting the designer's requirements for the standard mannequin.
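The independent control described above can be illustrated with bases whose displacement is confined to one body region each. The sketch below is a toy under stated assumptions: it uses five hypothetical regions instead of the 20 bases mentioned in the text, assigns vertices to regions at random, and fills each base with random displacements. The names `REGIONS`, `region_of_vertex`, `bases`, and `shaped_vertices` are illustrative and not taken from the patent.

```python
import numpy as np

REGIONS = ["head", "chest", "waist", "arm", "leg"]   # toy stand-in for the body-shape bases

rng = np.random.default_rng(0)
n_vertices = 100
region_of_vertex = rng.integers(len(REGIONS), size=n_vertices)   # region label per vertex
template = rng.normal(size=(n_vertices, 3))

# One base per region; each base displaces only the vertices of its own region.
bases = np.zeros((len(REGIONS), n_vertices, 3))
for r in range(len(REGIONS)):
    mask = region_of_vertex == r
    bases[r, mask] = rng.normal(size=(mask.sum(), 3))


def shaped_vertices(base_params):
    """Template plus the weighted sum of locally supported bases."""
    return template + np.tensordot(base_params, bases, axes=1)


params = np.zeros(len(REGIONS))
rest = shaped_vertices(params)
params[REGIONS.index("waist")] = 1.0                  # adjust the waist base only
moved = np.linalg.norm(shaped_vertices(params) - rest, axis=1) > 1e-9
```

Here `moved` is true only for vertices labeled as waist, so adjusting one base leaves the arms, legs, chest, and head untouched. A real design would presumably use smooth falloff at region borders rather than hard masks.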

The third part is fitting the human body model parameters to the standard mannequin. As shown in fig. 1, it comprises the following sub-steps: mapping the obtained three-dimensional human body posture and body type parameters to the base parameters and skeleton parameters of the three-dimensional standard human body model; inputting the obtained groups of base and skeleton parameters into the standard three-dimensional human body parametric model for fitting; obtaining a three-dimensional human body model with the same body type as the target human body; driving the skeleton to move from the initial posture to the target posture; and obtaining a three-dimensional target human body model mesh with the same posture as the two-dimensional image of the target human body.

The three-dimensional human body model has a mathematical weight relation between the skeleton points and the model mesh, so that determining the skeleton points determines the human body model in the target posture. In this part, the two types of parameters generated in the first part are substituted into the pre-designed human body model to construct the 3D human body model. The names of these two types of parameters are similar to those of the SMPL model's parameters, but their actual content differs greatly. The two models rest on different foundations: this method uses a self-built three-dimensional standard human body model (base mannequin), whereas the SMPL model uses a standard human body model generated by big-data training. The two are generated and computed in different ways, and although both ultimately yield a 3D human body model, what they contain differs considerably. After this step, a preliminary 3D human body model is obtained, including the bone positions, the bone length information, and the mesh of the human body model.

In this part, the mannequin is also driven from the initial posture to the target posture. Because only one photo is input, the posture of the target human body in the photo usually differs from the posture of the base mannequin, so the change from the initial posture to the target posture must be completed in order to fit the target posture. To simulate the motion of the model more realistically when the groups of base and skeleton parameters are fitted and driven in the standard three-dimensional human body parametric model, the method further comprises the following steps:

1) Obtaining the position coordinates of the initial posture and the target posture; the initial posture parameters are taken from the initial parameters of the standard mannequin model, and the bone information of the target posture is obtained by regression prediction of the neural network model.

2) Generating an animation sequence that moves from the initial posture to the target posture; after the initial state of the bone information and the state parameters of the target posture are obtained, a time sequence of the bone information from the initial posture to the target posture is formed by frame interpolation, such as linear interpolation or nearest neighbor interpolation. In terms of the number of bones driven in each frame, the driving process can follow one of two modes: global linear interpolation, or moving the parent nodes first and then driving the child nodes. Considering how driving works in the physical world, the latter mode is adopted; the animation sequence interpolated in this way matches the real physical world more closely, and the simulation effect is better.

3) Processing the mesh by frame interpolation while the animation sequence is generated; after the skeleton is driven in each frame, the vertex and face information of the human body model in the current state is calculated from the weight parameters of the standard mannequin, and the current mesh state of the human body model is updated, recorded, and stored. This is a key step in ensuring the fidelity of the model, and it reduces unexpected deformation and distortion of the human body model to an acceptable level.

4) Setting the frame interpolation speed to be slow near the initial point and the target point and fast in the middle of the movement; a non-uniform frame interpolation rate is adopted, that is, the per-frame movement is small during the initial and ending phases of the action and large in the intermediate phase. This simulates physical motion in the real world, which has an acceleration phase at the start, keeps a larger inter-frame displacement while in motion, and slows down as the motion ends.

5) Holding the model static for several frames once it has been driven to the final target posture, and obtaining the whole animation sequence. Compared with uniform-speed frame interpolation, this is closer to the motion laws of the real physical world: after moving at high speed, the model is effectively buffered before it comes to rest. The complete animation sequence is thus obtained, with a region of densely interpolated frames near the final posture, so that the posture fitting of the human body model is more accurate and the simulation effect is better.

6) Completing the driving of the skeleton from the initial posture to the target posture; at this point, the standard mannequin has become a mannequin that substantially matches the posture and body type of the target human body, with a natural posture and without mesh interpenetration.
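Steps 2), 4), and 5) above can be sketched together as a pose-sequence generator. This is a minimal illustration under several assumptions that are not in the patent: poses are stored as one 3-vector of rotation parameters per bone and interpolated linearly, the slow-fast-slow rate is modeled with a smoothstep curve, and the parent-first ordering is modeled by delaying each bone's start in proportion to its depth in the hierarchy. The function names and the `n_move`, `n_hold`, and `lag` parameters are hypothetical.

```python
import numpy as np


def ease_in_out(u):
    """Smoothstep easing: small steps near 0 and 1, large steps in the middle."""
    u = np.clip(u, 0.0, 1.0)
    return u * u * (3.0 - 2.0 * u)


def bone_depths(parents):
    """Hierarchy depth per bone; parents[i] is -1 for the root and parents[i] < i."""
    depths = np.zeros(len(parents), dtype=int)
    for i, p in enumerate(parents):
        if p >= 0:
            depths[i] = depths[p] + 1
    return depths


def build_pose_sequence(pose_init, pose_target, parents,
                        n_move=30, n_hold=5, lag=0.15):
    """Return frames of shape (n_move + 1 + n_hold, n_bones, 3)."""
    pose_init = np.asarray(pose_init, dtype=float)
    pose_target = np.asarray(pose_target, dtype=float)
    depths = bone_depths(parents)
    stretch = 1.0 + lag * depths.max()   # ensures the deepest bone finishes at t = 1

    frames = []
    for k in range(n_move + 1):
        t = k / n_move
        local = t * stretch - lag * depths   # per-bone progress, delayed by depth
        w = ease_in_out(local)[:, None]      # variable-speed blend weight per bone
        frames.append(pose_init + w * (pose_target - pose_init))
    # Hold the target posture for a few frames so the motion settles.
    frames.extend(pose_target.copy() for _ in range(n_hold))
    return np.stack(frames)
```

For example, with `parents = [-1, 0, 1]` the root starts moving in the first frame while its grandchild stays at the initial posture until later, and all bones reach the target by the last moving frame. Smoothstep is only one possible easing curve; any monotone curve with zero slope at both ends would give the same slow-fast-slow pattern.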

Once the skeleton data and the mesh data have been obtained, driving the skeleton becomes easier; a linear blend skinning (LBS) algorithm or a dual quaternion skinning (DQS) algorithm can be adopted. Colliders must of course also be considered: because the standard mannequin model starts in a three-dimensional standard posture, the change from the initial posture to the target posture can cause unreasonable interpenetration between parts of the human body model mesh, and only by incorporating colliders can such interpenetration be avoided.
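The LBS option mentioned above can be sketched as follows: each vertex is transformed by every bone and the results are blended with the skinning weights. This is a generic textbook formulation, not the patent's implementation; the helper names, the restriction to rotations about the z axis, and the rest pose with identity rotations are assumptions made to keep the example short. DQS and collision handling are not shown.

```python
import numpy as np


def rotation_z(angle):
    """4x4 homogeneous rotation about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    m = np.eye(4)
    m[:2, :2] = [[c, -s], [s, c]]
    return m


def translation(offset):
    """4x4 homogeneous translation."""
    m = np.eye(4)
    m[:3, 3] = offset
    return m


def global_transforms(parents, offsets, local_rots):
    """Forward kinematics; parents[i] is -1 for the root and parents[i] < i."""
    out = [None] * len(parents)
    for i, p in enumerate(parents):
        local = translation(offsets[i]) @ local_rots[i]
        out[i] = local if p < 0 else out[p] @ local
    return np.stack(out)


def linear_blend_skinning(rest_vertices, weights, parents, offsets, local_rots):
    """Blend per-bone transforms of each rest vertex by its skinning weights."""
    identity = [np.eye(4)] * len(parents)
    g_rest = global_transforms(parents, offsets, identity)
    g_pose = global_transforms(parents, offsets, local_rots)
    skin = g_pose @ np.linalg.inv(g_rest)                 # (bones, 4, 4)
    v_h = np.concatenate(
        [rest_vertices, np.ones((len(rest_vertices), 1))], axis=1)
    per_bone = np.einsum("bij,vj->vbi", skin, v_h)        # (vertices, bones, 4)
    blended = np.einsum("vb,vbi->vi", weights, per_bone)  # weighted sum over bones
    return blended[:, :3]
```

Calling this once per interpolated frame, with the local rotations of that frame, gives the per-frame mesh update described in step 3) above. Each row of `weights` is expected to sum to 1.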

The method of generating a three-dimensional human model according to an embodiment of the present invention described in conjunction with fig. 1 to 3 may be implemented by a human image processing apparatus. Fig. 4 is a diagram illustrating a hardware configuration 300 of an apparatus for processing a human body image according to an embodiment of the present invention.

The invention also discloses a computer readable storage medium, in which a computer program is stored, which, when executed by a processor, implements the driving method and steps described above.

The invention also discloses electronic equipment comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus; the memory is used for storing a computer program; and the processor is used for implementing the driving method and steps described above when executing the program stored in the memory.

As shown in fig. 4, the apparatus 300 for implementing human body fitting in the present embodiment includes: the device comprises a processor 301, a memory 302, a communication interface 303 and a bus 310, wherein the processor 301, the memory 302 and the communication interface 303 are connected through the bus 310 and complete mutual communication.

In particular, the processor 301 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing an embodiment of the present invention.

Memory 302 may include mass storage for data or instructions. By way of example, and not limitation, memory 302 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive or a combination of two or more of these. Memory 302 may include removable or non-removable (or fixed) media, where appropriate. The memory 302 may be internal or external to the human image processing apparatus 300, where appropriate. In a particular embodiment, the memory 302 is a non-volatile solid-state memory. In a particular embodiment, the memory 302 includes Read Only Memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory or a combination of two or more of these.

The communication interface 303 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiment of the present invention.

The bus 310 includes hardware, software, or both to couple the components of the apparatus 300 for processing human body images to each other. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or other suitable bus or a combination of two or more of these. Bus 310 may include one or more buses, where appropriate. Although specific buses have been described and shown in the embodiments of the invention, any suitable buses or interconnects are contemplated by the invention.

That is, the apparatus 300 for processing a human body image shown in fig. 4 may be implemented to include: a processor 301, a memory 302, a communication interface 303, and a bus 310. The processor 301, memory 302 and communication interface 303 are coupled by a bus 310 and communicate with each other. The memory 302 is used to store program code; the processor 301 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 302 for performing the method of three-dimensional mannequin fitting in any of the embodiments of the present invention, thereby implementing the three-dimensional mannequin fitting method and apparatus described in connection with fig. 1-3.

The embodiment of the invention also provides a computer storage medium, wherein the computer storage medium is stored with computer program instructions; the computer program instructions, when executed by a processor, implement the method for processing human body images provided by the embodiments of the present invention.

It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.

The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.

It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.

As described above, only the specific embodiments of the present invention are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present invention, and these modifications or substitutions should be covered within the scope of the present invention.

Claims

1. A method of driving a mannequin, the method comprising:

1) establishing a standard human body model in an initial posture;

2) acquiring posture and body type parameters of a target human body model, wherein the three-dimensional human body posture and body type parameters correspond to the skeleton parameters and a plurality of base parameters of a three-dimensional standard human body model;

3) inputting the obtained groups of base and skeleton parameters into the standard three-dimensional human body model for fitting;

4) driving the skeleton to move from the initial posture to the target posture;

5) obtaining a three-dimensional target human body model mesh with the same posture as the two-dimensional image of the target human body.

2. The method of claim 1, wherein the driving process further comprises obtaining position coordinates of the initial pose and the target pose; the initial pose parameters are obtained from the initial parameters of the standard human body model, and the skeleton information of the target pose is obtained by regression prediction with a neural network model.

3. The method of claim 1, wherein the driving process further comprises generating an animation sequence that moves from an initial pose to a target pose.

4. The method according to claim 3, wherein after the initial-state skeleton information and the target pose state parameters are obtained, a time series of the skeleton information from the initial pose to the target pose is formed by frame interpolation using linear interpolation or nearest neighbor interpolation, and during the driving, the parent skeleton node is driven first and the child skeleton nodes are driven afterwards.

5. The method of claim 3, wherein in the process of generating the animation sequence, frame interpolation is applied to the mesh: after the skeleton is driven in each frame, the vertex (that is, face) information of the human body model in the current state is calculated from the weight parameters of the standard human body model, and the mesh state of the current human body model is updated, recorded and stored.

6. The method according to claim 3, wherein the frame interpolation speed is non-uniform: it is set to be slow near the initial point and the target point and fast during the intermediate movement, that is, the movement amplitude of a single frame is small during the initial and ending actions and large during the intermediate movement.

7. The method of claim 3, wherein during the generation of the animation sequence, the driving speed is reduced as the frame interpolation approaches the end of the motion, and several still frames are held once the final target pose is reached, thereby obtaining the entire animation sequence.

8. The method of claim 1, wherein the standard human body model is a three-dimensional standard human body model constructed in conjunction with a mathematical model; the three-dimensional standard human body model has a mathematical weight relation between its skeleton points and its model mesh, and the determined skeleton points can be associated with the human body model to determine the posture of the target human body; the three-dimensional standard human body model is defined by a plurality of body base parameters and a plurality of skeleton parameters, the body bases together form the whole human body model mesh, and each body base is controlled and changed independently by its base parameters without affecting the others.

9. The method of claim 1, wherein the step of obtaining parameters of the target human body model further comprises:

1) obtaining a two-dimensional image of the target human body;

2) processing the image to obtain a two-dimensional human body contour image of the target human body;

3) inputting the two-dimensional human body contour image into a first neural network trained by deep learning to perform regression of the joint points;

4) obtaining a joint point map of the target human body, semantic segmentation maps of the parts of the human body, body key points, and body skeleton points;

5) inputting the generated joint point map, semantic segmentation maps, body skeleton points and key point information of the target human body into a second neural network trained by deep learning to perform regression of the human body posture and body type parameters;

6) obtaining the output three-dimensional human body parameters, including three-dimensional human body action posture parameters and three-dimensional human body shape parameters.

10. The method of claim 1, further comprising acquiring the two-dimensional human body contour image using a target detection algorithm, the target detection algorithm being a fast target region generation network based on a convolutional neural network; before the two-dimensional human body image is input into the first neural network model, the method further comprises training the neural network, wherein the training samples comprise standard two-dimensional human body images annotated with the positions of original joint points, and the positions of the original joint points are manually annotated on the two-dimensional human body images with high accuracy.

11. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, carries out the method steps of any one of claims 1-10.

12. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus; the memory is configured to store a computer program; and the processor is configured to implement the method steps of any one of claims 1-10 when executing the program stored in the memory.
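
Illustrative note (not part of the claims): the variable-speed frame interpolation recited in claims 4, 6 and 7 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation. Each joint pose is reduced to a single scalar angle for brevity, the cosine timing curve is an assumed choice (the text only requires slow motion near both ends and fast motion in the middle), and the names `ease_in_out`, `order_parent_first` and `build_sequence` are hypothetical.

```python
import math


def ease_in_out(u):
    """Assumed timing curve: maps u in [0, 1] to a weight in [0, 1] that
    changes slowly near 0 and 1 and quickly around the middle."""
    return 0.5 - 0.5 * math.cos(math.pi * u)


def order_parent_first(parents):
    """Return joint indices so that every parent precedes its children.
    parents[j] is the parent index of joint j, or -1 for the root."""
    order, placed = [], set()

    def visit(j):
        if j in placed:
            return
        if parents[j] >= 0:
            visit(parents[j])  # make sure the parent is placed first
        placed.add(j)
        order.append(j)

    for j in range(len(parents)):
        visit(j)
    return order


def build_sequence(initial, target, parents, n_frames=10, n_hold=3):
    """Build a list of poses from `initial` to `target`.

    Each pose is a list with one scalar angle per joint (a simplification).
    The weight follows ease_in_out, so per-frame movement is small at the
    start and end and large in the middle; n_hold still frames are appended
    once the target pose is reached.
    """
    order = order_parent_first(parents)
    frames = []
    for k in range(n_frames + 1):
        w = ease_in_out(k / n_frames)
        pose = list(initial)
        for j in order:  # parent joints are updated before their children
            pose[j] = (1.0 - w) * initial[j] + w * target[j]  # linear interpolation
        frames.append(pose)
    frames.extend(list(target) for _ in range(n_hold))  # held still frames
    return frames


# Hypothetical three-joint chain: root -> joint 1 -> joint 2.
example_frames = build_sequence([0.0, 0.0, 0.0], [90.0, 45.0, -30.0], [-1, 0, 1])
```

In a full pipeline, each pose in the returned sequence would then be used to update the mesh vertices from the model's weight parameters, as described in claim 5; that step is omitted here.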