CN114119911A

CN114119911A - Human body model neural network training method, device and storage medium

Info

Publication number: CN114119911A
Application number: CN202010876685.1A
Authority: CN
Inventors: 张胜凯; 唐杰; 郑天祥; 闫浩男; 焦年红; 吴圣杰; 孙霁泽
Original assignee: Beijing Momo Information Technology Co ltd
Current assignee: Beijing Momo Information Technology Co ltd
Priority date: 2020-08-27
Filing date: 2020-08-27
Publication date: 2022-03-01

Abstract

The invention discloses a human body model neural network training method, comprising: obtaining a two-dimensional image of a target human body; constructing a three-dimensional standard human body model; acquiring three-dimensional human body parameters of a target human body model through a neural network; obtaining a three-dimensional target human body model mesh with the same posture and body type as the target human body; importing the two-dimensional image and the target human body model into three-dimensional modeling software; performing consistency adjustment between the two-dimensional image and the target human body model using the three-dimensional modeling software; and substituting the corrected new target human body model into the neural network as the output result to complete the training of the neural network. In this training method the generated human body model is aligned and adjusted in detail, so that the parameter precision of the deep learning neural network is greatly improved, convergence is markedly accelerated, and the consistency and fidelity of the generated three-dimensional human body model are greatly improved.

Description

Human body model neural network training method, device and storage medium

Technical Field

The invention belongs to the field of three-dimensional human body modeling and relates to a training method, device and storage medium for a neural network model used in human body modeling, and in particular to a method in which a human body model generated by the neural network model is corrected and the corrected human body model is then used to train the neural network model.

Background

With the development of internet technology, online shopping has become increasingly popular. Compared with shopping in a physical store, online shopping offers a wider range of goods and greater convenience. However, some problems of online shopping are not easy to solve, the most important being that the goods cannot be inspected in person before purchase. Among all categories of goods, this problem is most prominent for clothing. In a physical store a shopper can try clothes on and see the effect immediately, whereas an online clothing store cannot provide an image of the effect on the individual consumer; it can only provide pictures of a model wearing the clothes, and sometimes no fitting pictures at all. The consumer therefore cannot intuitively judge, in real time, how well the clothes match his or her own figure, which results in a large number of returns.

In response to this problem, operators have attempted to solve this problem by providing simulated fitting effects for consumers using virtual fitting techniques. Of course, there are other situations in reality where virtual fitting and changing techniques can be used, such as in network games. Therefore, this technology has been developed more rapidly.

Virtual fitting refers to a technical application in which a user can view the effect of wearing different clothes on a terminal screen in real time without actually changing clothes. Existing dressing-change technology mainly comprises planar fitting technology and three-dimensional virtual fitting technology. The former essentially collects pictures of the user and pictures of the clothes and then cuts and splices them to form an image of the dressed user. Because the image processing is simple and crude, the resulting images lack realism: the actual body type of the user is not considered at all, the clothes are merely pasted mechanically onto the user's picture, and the requirements of users cannot be met. The latter usually collects three-dimensional information of the person through a three-dimensional acquisition device and combines it with the characteristics of the clothes, or takes body data entered manually by the user, generates a virtual three-dimensional human body model according to certain rules, and then combines that model with the clothing texture map. Overall, such three-dimensional virtual fitting requires a large amount of data acquisition or three-dimensional computation, has a high hardware cost, and is not easy to popularize among ordinary users.

With the development of cloud computing technology, artificial intelligence technology and intelligent terminal processing capacity, a two-dimensional virtual fitting technology is generated. Such techniques essentially comprise three steps: (1) processing the personal body information provided by the user to obtain a target human body model; (2) processing the clothing information to obtain a clothing model; (3) the human body model and the clothing model are fused together to generate a simulated figure of the clothing worn by a person.

However, because many uncertain factors accumulate, such as the process design, the choice of model parameters and the neural network training method, the quality of the finally generated dressing-change picture is not as good as that of traditional three-dimensional virtual fitting technology. Establishing the human body model is the basic step, and the subsequent dressing process builds on the previously generated human body model. Once the human body model is generated inaccurately, problems such as an excessive difference in body type between the model and the person being fitted, loss of skin texture and loss of body parts easily arise, and the finally generated dressing-change picture suffers.

In the general field of computer vision, human body modeling can start from many points, which generally fall into three major categories: omnidirectional scanning of a real human body with a 3D scanning device, three-dimensional reconstruction based on multi-view depth photography, and three-dimensional reconstruction that combines a given image with a human body model. First, omnidirectional scanning of a real human body with a 3D scanning device yields the most accurate information, but the equipment is usually expensive, a high degree of cooperation is required from the person being scanned, and the whole process places high demands on the processing equipment, so this approach is generally applied only in some professional fields. Second, the multi-view three-dimensional reconstruction method requires multiple overlapping views of the human body to be reconstructed and a spatial transformation relationship among the images; several groups of cameras shoot multiple images, which are stitched into a 3D model. The operation is relatively simple, but the computational complexity is still high, and in most cases multi-angle images can only be obtained of people who are present on site. Moreover, a model stitched from pictures taken by depth cameras at multiple angles has no body scale data and cannot provide a basis for 3D perception. Third, the method of combining a single image with a human body model requires only one image. A neural-network-based method for intelligently generating three-dimensional human body characteristic curves learns the weights and thresholds that describe the curves of the neck, chest, waist, hip and other parts of the body; a three-dimensional body curve matching the real body shape is then generated directly from size parameters of the body cross-sections, such as girth, width and thickness, to obtain the predicted human body model. However, because so little information is input, this method still consumes a large amount of computation, and the final model is not satisfactory.

Given the characteristics of internet technology and the network environment, directly outputting the final human body model from a single image is undoubtedly the preferable approach. It is the most convenient: the user does not need to be present on site and needs only one photo to complete the whole dressing-change process. The remaining question is quality: as long as the resulting picture can be guaranteed to be substantially equivalent to a real 3D simulated dressing change, this approach will become the mainstream. The key is to obtain, from one photograph, a human body model that is as close as possible to the real state of the human body.

In the prior art, methods for constructing a human body model generally fall into several types. (1) Regression-based methods reconstruct a voxel representation of the human body through a convolutional neural network. The algorithm first estimates the positions of the main joint points of the human body from the input picture; then, within a voxel grid of a specified size placed according to the key point positions, it determines whether each unit voxel is occupied, and the overall shape of the occupied voxels describes the reconstructed human body. (2) Simple human skeleton key points are roughly marked on the image, and a human body model is then initially matched and fitted to these rough key points to obtain the approximate shape of the human body. (3) The human skeleton is represented by 23 skeleton nodes, the posture of the whole body is represented by the rotation of each skeleton node, and the shape of the human body is represented by 6890 vertex positions; during fitting the positions of the skeleton nodes are given and the shape and posture parameters are fitted simultaneously to reconstruct the three-dimensional human body. Alternatively, a CNN model predicts key points on the image and the SMPL model is fitted to them to obtain an initial human body model. The shape parameters obtained by fitting are then used to regress bounding boxes for the individual body joints, one bounding box per joint, each represented by an axial length and a radius. Finally, the initial model and the regressed bounding boxes are combined to obtain the three-dimensional human body reconstruction. These methods suffer from low modeling speed and insufficient modeling precision, and their reconstruction quality depends heavily on the body-shape and posture database that has been built.

The first prior art discloses a human body modeling method based on body measurement data. As shown in fig. 1, the method includes: acquiring body measurement data; performing linear regression on a pre-established human body model through a pre-trained prediction model according to the body measurement data, and fitting to obtain a predicted human body model, wherein the pre-established human body model comprises a plurality of groups of predefined marker feature points and corresponding standard shape bases, and the body measurement data comprises measurement data corresponding to each group of marker feature points; and obtaining a target human body model from the predicted human body model, wherein the target human body model comprises measurement data, a target shape base and target shape coefficients. However, this method places very high demands on the body measurement data, which include body length data and girth data such as height, arm length, shoulder width, leg length, calf length, thigh length, foot length, head circumference, chest circumference, waist circumference and thigh circumference, and these data must be not only measured but also calculated. The amount of computation is indeed reduced, but the user experience is very poor and the procedure is very cumbersome. In addition, the training of this human body model follows the training approach of the SMPL model.

The SMPL model is a parameterized human body model proposed by the Max Planck Institute, and it supports modeling and animation of arbitrary human bodies. Its biggest difference from traditional linear blend skinning (LBS) is that it models how the body surface changes with posture, so it can simulate the bulging and hollowing of muscles as the limbs move. Surface distortion of the body during motion can therefore be avoided, and the shape of the muscles as they stretch and contract can be depicted accurately. In this method, β and θ are the input parameters: β represents 10 parameters describing body shape, such as height, build (fat or thin) and head-to-body ratio, and θ represents 75 parameters describing the overall motion pose of the body and the relative angles of 24 joints. However, the core of this model generation method is the accumulation of a large amount of training data to obtain the relationship between body type and shape bases. That relationship is strongly correlated: each shape base cannot be controlled independently, and decoupling is not easy. For example, arms and legs are correlated to some degree, so that in principle the legs change along with the arms when the arms are adjusted, and it is difficult to improve the SMPL model for body types with particular characteristics.

The second prior art discloses a method for generating a three-dimensional human body model, comprising: acquiring a two-dimensional human body image; and inputting the two-dimensional human body image into a three-dimensional standard human body model to obtain three-dimensional human body parameters corresponding to the two-dimensional human body image. The three-dimensional standard human body model is obtained by inputting training samples into a neural network for training, wherein the training comprises: inputting the standard two-dimensional human body image in the training sample into the neural network to obtain predicted three-dimensional human body parameters corresponding to the standard two-dimensional human body image; adjusting a three-dimensional flexible deformable model according to the predicted three-dimensional human body parameters to obtain a predicted three-dimensional human body model; and obtaining, by reverse mapping, the positions of the predicted joint points in the standard two-dimensional human body image from the positions of the joint points in the predicted three-dimensional human body model. In this modeling approach, the parameters finally output by the neural network are judged using joint parameters only, and detail adjustment toward the target posture then relies on the mature body shapes of the SMPL model. Although the amount of computation is reduced, the input parameters are few and the adjustment can only be made on the basis of the SMPL prediction model, so it is difficult to output a human body model that is truly satisfactory and highly consistent with the target human posture.

The third prior art discloses a 3D fitting platform based on machine learning, in which: (1) the user inputs height, weight, bust, waist and hip measurements, gender and skin color, and photo information of the user is captured with a mobile phone camera and sent to the 3D fitting platform; (2) the 3D fitting platform uses machine learning, trained on a large amount of existing real data and models on the platform, to match the model closest to the user's figure according to the height, weight, measurements, gender and skin color input by the user; (3) after the user finds clothes he or she likes online, the user clicks the fitting button and jumps to the 3D fitting platform, and the platform uses 3Dmax technology to match and model the clothes and the human body model and generate an actual effect model. A convolutional neural network is built with the TensorFlow framework, the height, weight, measurements and face shape input by the user are taken as the x vector, and the network is first trained with real sample data. After a large amount of computation, data for the model are obtained, including shoulder width, arm thickness, arm length, leg thickness and leg length together with the chest circumference and hip circumference input by the user, and these data are used as variables for 3D modeling in combination with 3Dmax technology. The biggest disadvantage of this technique is that data on real human bodies are very difficult to obtain: every real sample must actually be measured, and the workload is very uneconomical.

In the current human body modeling field, a method for generating a human body model by using a neural network model is in a more mainstream position due to the existence of massive training pictures and the good effect of the output model. Therefore, the quality of the generated model, that is, the degree of fit with the original image becomes the most critical factor, and how to train the deep-learning neural network to achieve the optimal training effect and training efficiency becomes a problem to be solved urgently.

Disclosure of Invention

In order to match with the development trend of the internet industry, in the subdivision field of virtual fitting, particularly human body modeling, the minimum input information, the minimum calculation amount and the best effect are three basic targets which are always pursued. An optimal balance point needs to be found among the three, and the human body modeling method which can achieve simple input, has the calculated amount not exceeding the bearing capacity of the terminal equipment and has the effect close to that of professional equipment is provided. In order to solve the above problems, the present invention provides a manikin neural network training method, apparatus and storage medium that overcome the above problems.

The invention provides a human body model neural network training method, which comprises the following steps:

1) acquiring a two-dimensional image of a target human body;

2) obtaining parameters of a target human body model by using a neural network, and obtaining a three-dimensional target human body model grid according to the parameters;

3) importing the two-dimensional image and the target human body model into three-dimensional modeling software;

4) carrying out consistency adjustment on the two-dimensional image and the target human body model by using three-dimensional modeling software to obtain a corrected new target human body model;

5) substituting the new target human body model as an output result into a neural network for calculation;

6) and adjusting the model parameters by using the calculation result to finish the training of the neural network.
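Steps 1) to 6) can be illustrated with a minimal sketch, assuming that each two-dimensional image has already been reduced to a feature vector and that each corrected target human body model is represented by its parameter vector. The linear "network", the mean-squared-error loss and the names `train_on_corrected_models`, `features` and `corrected_params` are hypothetical stand-ins chosen for illustration; the patent does not specify the network architecture or the loss function.

```python
import numpy as np

def train_on_corrected_models(features, corrected_params, lr=0.1, epochs=500):
    """Fit a linear map from image features to body-model parameters.

    features:         (N, F) array, one feature vector per 2D image (assumed precomputed).
    corrected_params: (N, P) array, parameters of the manually corrected models
                      produced in steps 3) and 4); used here as training targets.
    Returns the learned weights and the loss recorded at every epoch.
    """
    n, f = features.shape
    p = corrected_params.shape[1]
    weights = np.zeros((f, p))
    losses = []
    for _ in range(epochs):
        predicted = features @ weights              # step 2): network output
        error = predicted - corrected_params        # step 5): compare with corrected model
        losses.append(float(np.mean(error ** 2)))   # mean squared error (assumed loss)
        grad = 2.0 * features.T @ error / (n * p)   # gradient of the MSE w.r.t. weights
        weights -= lr * grad                        # step 6): adjust model parameters
    return weights, losses
```

In this sketch the corrected parameters play the role of the "correct answers" against which the network output is compared; a real implementation would replace the linear map with the deep network actually used.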

The training method further comprises the following steps: constructing a three-dimensional standard human body model on the basis of a mathematical model; acquiring three-dimensional human body parameters of the target human body model through a neural network, wherein the three-dimensional human body parameters comprise three-dimensional human body action posture parameters and three-dimensional human body shape parameters; the obtained posture and body type parameters correspond to several groups of shape-base and skeleton parameters of the three-dimensional standard human body model; and inputting the obtained groups of shape-base and skeleton parameters into the three-dimensional standard human body model for fitting.

Preferably, the step of obtaining parameters of the target human body model further comprises: 1) obtaining a two-dimensional image of the target human body; 2) processing the image to obtain a two-dimensional human body contour image of the target human body; 3) substituting the two-dimensional human body contour image into a first neural network trained by deep learning to perform regression of the joint points; 4) obtaining a joint point map of the target human body, semantic segmentation maps of the parts of the human body, body key points and body bone points; 5) substituting the generated joint point map, semantic segmentation maps, body bone points and key point information of the target human body into a second neural network trained by deep learning to perform regression of the human body posture and body type parameters; 6) obtaining the output three-dimensional human body parameters, including three-dimensional human body action posture parameters and three-dimensional human body shape parameters.
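The data flow of this two-stage arrangement can be sketched as follows. This is only an outline under stated assumptions: the container types `StageOneOutput` and `BodyParameters`, the functions `extract_contour` and `run_pipeline`, and the thresholding used for contour extraction are all hypothetical, and the two networks are passed in as plain callables because the patent does not describe their internals.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

@dataclass
class StageOneOutput:
    """Intermediate results of the first network (step 4)."""
    joint_map: np.ndarray      # (J, 2) two-dimensional joint positions
    segmentation: np.ndarray   # (H, W) integer label per pixel for each body part
    keypoints: np.ndarray      # (K, 2) body key points
    bone_points: np.ndarray    # (B, 2) body bone points

@dataclass
class BodyParameters:
    """Output of the second network (step 6)."""
    pose: np.ndarray           # action posture parameters
    shape: np.ndarray          # body shape parameters

def extract_contour(image: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Step 2: placeholder contour extraction by simple thresholding (an assumption)."""
    return (image > threshold).astype(np.uint8)

def run_pipeline(
    image: np.ndarray,
    first_network: Callable[[np.ndarray], StageOneOutput],
    second_network: Callable[[StageOneOutput], BodyParameters],
) -> BodyParameters:
    contour = extract_contour(image)     # step 2
    stage_one = first_network(contour)   # steps 3 and 4
    return second_network(stage_one)     # steps 5 and 6
```

Keeping the two networks behind separate callables mirrors the text: the first stage can be trained on annotated joint positions independently of the second stage, which regresses posture and body type parameters.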

Preferably, before the two-dimensional human body image is input into the first neural network model, the method further includes a process of training the first neural network, wherein the training samples include standard two-dimensional human body images annotated with original joint point positions, and the original joint point positions are marked manually on the two-dimensional human body images with high accuracy.

Preferably, the three-dimensional standard human body model is composed of the parameters of a plurality of shape bases and the parameters of a plurality of bones. The shape bases together form the whole human body model, and each shape base is controlled and changed independently by its own parameters, without affecting the others. The three-dimensional standard human body model has a mathematical weight relationship between the skeleton points and the model mesh, so that once the skeleton points are determined the human body model can follow them into the target human body posture.
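The two properties described here, independently controlled shape bases and a weight relationship between skeleton points and mesh, can be illustrated with a small sketch. It assumes that each shape base is a per-vertex offset added linearly to a template mesh and that the weight relationship takes the form of linear blend skinning; both are common conventions chosen for illustration and are not stated in the patent. The function names `apply_shape_bases` and `skin_vertices` are hypothetical.

```python
import numpy as np

def apply_shape_bases(template, shape_bases, coefficients):
    """Add independently weighted shape bases to a template mesh.

    template:     (V, 3) vertex positions of the standard model.
    shape_bases:  (K, V, 3) per-vertex offsets, one set per shape base.
    coefficients: (K,) one control parameter per shape base.
    """
    # Sum over the K bases: each coefficient scales only its own offsets.
    return template + np.tensordot(coefficients, shape_bases, axes=1)

def skin_vertices(vertices, skin_weights, bone_transforms):
    """Pose a mesh with linear blend skinning (assumed form of the weight relation).

    vertices:        (V, 3) vertex positions after shape adjustment.
    skin_weights:    (V, B) weight of each bone on each vertex; rows sum to 1.
    bone_transforms: (B, 4, 4) homogeneous transform of each bone.
    """
    ones = np.ones((len(vertices), 1))
    homogeneous = np.concatenate([vertices, ones], axis=1)              # (V, 4)
    per_bone = np.einsum("bij,vj->vbi", bone_transforms, homogeneous)   # (V, B, 4)
    blended = np.einsum("vb,vbi->vi", skin_weights, per_bone)           # (V, 4)
    return blended[:, :3]
```

If two shape bases act on disjoint sets of vertices, changing the coefficient of one leaves the vertices governed by the other untouched, which is the kind of decoupled control the text describes.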

Preferably, the consistency adjustment is performed manually. The posture of the imported target human body model is made consistent with the two-dimensional image of the target human body by adjusting the skeleton of the human body model, and the body type is made consistent by adjusting the shape bases of the human body model. After the corrected new target human body model is obtained, three views of the human body model are rendered in the software and compared with the picture, and supplementary corrections and adjustments are made.

A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method steps of any of the preceding claims.

An electronic device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus; a memory for storing a computer program; a processor for implementing any of the above method steps when executing a program stored in the memory.

The invention has the beneficial effects that:

1. The training effect is excellent, because the neural network is trained with the most correct answers. As is well known, neural networks come with many algorithms and many parameters, and loss functions differ from project to project; only through continuous training and iteration can a network come closer to producing the most correct "answer". In practical tests we found that, for a neural network that obtains a three-dimensional human body model directly from a picture, the accuracy of the result is not very high because of the many parameters and loss functions involved, especially for complex human postures. The prior art has no particularly good method for training effectively against such errors: the current practice of feeding the output result back into the model converges slowly, direct adjustment of the parameters is largely a matter of chance, and the convergence rate is unstable. In the present method, the target human body model generated by the neural network, which does not accurately reflect the real posture and body type of the person in the photo, is imported into mainstream three-dimensional modeling software together with the original photo, and the human body model is manipulated until it visually fits the photo, yielding a corrected new target human body model. The new model is closer to the true posture and body type in the photo than the model output directly by the network. Substituting the corrected model into the neural network for training is equivalent to training the network with correct answers, so the accuracy of the parameters and the rationality of the loss can be greatly improved, the neural network model converges at a satisfactory speed, and after training a target human body model highly consistent with the posture and body type in the photo can be obtained.

2. The neural network model is more scientific and targeted. Some prior-art image processing methods focus too much on simply outputting a model directly and spend no time refining its details; the mapping from a 2D picture to a 3D body model is learned purely from massive image data. Although this is efficient, the processing flow is too simple and the three-dimensional human body model is generated entirely by the neural network model, so the consistency of body proportions and details is unsatisfactory. Such a model is of no help to subsequent processing and can even become an obstacle that later stages find hard to overcome. In the present method, inaccurate human body models are slightly adjusted and then used as good answers to train the neural network and adjust its parameters, so that the network approaches its optimal output state step by step.

3. Deep neural networks are used extensively. The invention makes full use of the advantages of deep learning networks and can restore the posture and body type of the human body with high precision in a variety of complex scenes. Different neural networks are used for different purposes: networks with different input conditions and training modes achieve accurate separation of the human contour from a complex background, semantic segmentation of the human body, and determination of key points and joint points, so that the influence of loose clothes and hairstyle is eliminated and the real body type and shape are approached as closely as possible. Neural network models are also used in the prior art, but their roles and functions differ greatly because their input conditions, input parameters and training modes differ. In the invention, the human contour, semantic segmentation, key points and joint points produced by the first-stage neural network serve as inputs from which model parameters are generated from multiple angles. The parameters output by the second-stage neural network fall into two categories, posture and body type, so that action and body type can be controlled separately, and in combination with the reference model the posture and body type of the human body can be reproduced accurately.

4. The human body model is accurate and controllable. The currently popular methods for reconstructing a human body from a single image mainly reconstruct a parameterized human body model. The most commonly used parameterized model is the SMPL model of the Max Planck Institute, which contains two sets of 72 parameters describing body posture and body shape. For reconstruction from a single picture, the positions of the two-dimensional joints are first estimated from the picture, and the SMPL parameters are then obtained by minimizing the distance between the projected three-dimensional joints and the two-dimensional joints, which yields the human body. However, the SMPL model is obtained mainly by deep learning and training on a large number of human body model examples. The relationship between its body shape and its shape bases is a global association that is hard to decouple, so the body part to be controlled cannot be controlled at will, and the generated model cannot achieve high consistency with the real posture and body type. In addition, if the SMPL model is applied to the subsequent dressing process, its ability to represent the geometric details of the body surface is limited, and the detailed texture of clothes on the body surface cannot be reconstructed well. By contrast, the human body model of the invention is not obtained through training. Its parameters are related on the basis of mathematical principles, that is, each group of parameters is independent and does not interfere with the others, so the model is more interpretable during transformation and can better represent a change in the shape of a particular part of the body. In general, by establishing our own standard human body model and using the parameters corresponding to 20 shape bases and 17 bones (θ), a three-dimensional human body model is generated that is closer to the body type of Asian people than the SMPL model of the Max Planck Institute and that offers better independent operation and controllability. Human body dimensions vary greatly from person to person, and for many people the ratio of thigh to calf does not follow any fixed proportion; by controlling the input parameters, the model can control and adjust the lengths of the thigh and the calf separately so as to set the leg proportions accurately. In fact, only such a human body model fits the whole training method well, because the training method requires that each part of the human body model be adjustable as independently as possible. Only then can the advantages of the training method be fully exploited, so that the human body model fits the human body in the photo at every part visible to the naked eye.
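The single-picture fitting described above for the SMPL model, in which parameters are chosen to minimize the distance between projected three-dimensional joints and estimated two-dimensional joints, can be written as a small objective function. The sketch below assumes a scaled orthographic projection for simplicity; the patent does not name the camera model, and the function names `project_orthographic` and `reprojection_error` are hypothetical.

```python
import numpy as np

def project_orthographic(joints_3d, scale=1.0, translation=(0.0, 0.0)):
    """Project (J, 3) joints onto the image plane by dropping depth (assumed camera model)."""
    return scale * joints_3d[:, :2] + np.asarray(translation)

def reprojection_error(joints_3d, joints_2d, scale=1.0, translation=(0.0, 0.0)):
    """Sum of squared distances between projected 3D joints and estimated 2D joints."""
    projected = project_orthographic(joints_3d, scale, translation)
    return float(np.sum((projected - joints_2d) ** 2))
```

An optimizer would vary the model parameters that determine `joints_3d` until this error stops decreasing. Because only joint positions enter the objective, body type away from the joints is weakly constrained, which is consistent with the limitation noted in the text.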

5. User operation is simple. The invention analyzes a whole-body picture of a person through a deep neural network to obtain accurate parameters of a three-dimensional human body model. Only one ordinary picture is needed to model the human body quickly, which suits the characteristics and trends of the internet era and is simple and fast. The user needs no preparation; uploading one photo is all the user has to do. If the invention is applied to scenarios such as entertainment mini-programs or online shopping, user experience and user stickiness can be greatly enhanced. A 3D model that corresponds to the real shape of the human body is obtained without a depth camera or multiple groups of cameras, which opens up a wide range of applications in various industries, such as clothing and health.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.

FIG. 1 is an overall process flow diagram of one embodiment;

FIG. 2 is a process flow diagram of a model parameter acquisition module of an embodiment;

FIG. 3 is a schematic view of an embodiment of a mannequin correction process;

FIG. 4 is a schematic diagram of the system of the present invention.

Detailed Description

Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The following describes a method for processing a human body image according to an embodiment of the present invention in detail with reference to the accompanying drawings.

As shown in fig. 1, an embodiment of the present invention provides a training method for a three-dimensional human body model neural network. By aligning the generated human body model with the photo and adjusting its details, the method greatly improves the parameter accuracy of the deep learning neural network, significantly accelerates convergence, and greatly improves the consistency and fidelity of the generated three-dimensional human body model.

Firstly, the invention discloses a training method of a human body model neural network, which comprises the following steps:

1) acquiring a two-dimensional image of a target human body;

The training method generally comprises three partial steps. Firstly, obtaining a three-dimensional target human body model with the same posture and body type as a target human body; secondly, performing consistency adjustment on the two-dimensional image and the target human body model by using three-dimensional modeling software; and thirdly, training the neural network by using the corrected human body model.

The first part processes the acquired human body image to obtain the parameter information needed to generate the human body model. Previously, the skeletal key points were usually selected manually, but this is inefficient and does not suit the fast pace of the Internet era. Now that neural networks are widely used, replacing manual key point selection with a deep-learning neural network has become the trend. How to use the neural network efficiently, however, requires further study. In general, we construct the parameter acquisition system on the idea of a two-stage neural network plus data refinement. As shown in fig. 2, we use deep-learning neural networks to generate these parameters, which mainly includes the following sub-steps:

1) acquiring a two-dimensional image of a target human body;

2) processing the image to obtain a two-dimensional human body contour image of the target human body;

3) substituting the two-dimensional human body contour image into a first neural network subjected to deep learning to carry out regression of the joint points;

4) obtaining a joint point map of the target human body, semantic segmentation maps of each part of the human body, body key points, and body bone points;

5) substituting the generated joint point map, semantic segmentation maps, body bone points and key point information of the target human body into a second neural network subjected to deep learning to carry out regression of the human body posture and body type parameters;

6) obtaining the output three-dimensional human body parameters, including three-dimensional human body action posture parameters and three-dimensional human body shape parameters.
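By way of non-limiting illustration, the six sub-steps can be read as a two-stage pipeline. The sketch below only shows the data flow; the class names, field names and the three toy callables are hypothetical stand-ins, not the networks of the embodiment, and the length of 20 for the shape vector is simply a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Image = List[List[int]]  # toy image: rows of pixel intensities


@dataclass
class StageOneOutput:
    """Intermediate results of sub-step 4 (field names are illustrative)."""
    joint_map: List[Point]
    segmentation: Image
    key_points: List[Point]
    bone_points: List[Point]


@dataclass
class BodyParameters:
    """Output of sub-step 6."""
    pose: List[float]   # action posture parameters
    shape: List[float]  # body shape parameters


def estimate_body_parameters(
    image: Image,
    extract_contour: Callable[[Image], Image],
    first_network: Callable[[Image], StageOneOutput],
    second_network: Callable[[StageOneOutput], BodyParameters],
) -> BodyParameters:
    contour = extract_contour(image)    # sub-step 2
    stage_one = first_network(contour)  # sub-steps 3 and 4
    return second_network(stage_one)    # sub-steps 5 and 6


# Toy stand-ins so that the sketch runs; trained networks would replace them.
def toy_contour(image: Image) -> Image:
    return [[1 if px > 0 else 0 for px in row] for row in image]


def toy_first_network(contour: Image) -> StageOneOutput:
    points = [
        (float(c), float(r))
        for r, row in enumerate(contour)
        for c, value in enumerate(row)
        if value
    ]
    return StageOneOutput(points, contour, points, points)


def toy_second_network(stage_one: StageOneOutput) -> BodyParameters:
    # Placeholder sizes: three pose values per joint, 20 shape values.
    return BodyParameters(
        pose=[0.0] * (3 * len(stage_one.joint_map)),
        shape=[0.0] * 20,
    )


params = estimate_body_parameters(
    [[0, 5], [7, 0]], toy_contour, toy_first_network, toy_second_network
)
print(len(params.pose), len(params.shape))
```

Passing the stages in as callables keeps the first and second networks replaceable, which matches the way the text treats them as separately trained components.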

The two-dimensional human body contour image is acquired with a target detection algorithm, namely a network based on a convolutional neural network that quickly generates target regions. Before the two-dimensional human body image is input into the first neural network model, the method further includes a process of training that neural network. The training samples comprise standard two-dimensional human body images annotated with the original joint point positions, which are marked manually and with high accuracy on the two-dimensional human body images. Here, a target image is first acquired, and human body detection is performed on it with the target detection algorithm. Human body detection does not mean measuring a real human body with a measuring instrument. In the invention it means that, for any given image (usually a two-dimensional picture containing enough information, for example one that includes the person's face, limbs and torso), a certain strategy is used to search the image to determine whether it contains a human body and, if so, to return parameters such as the position and size of the human body. In this embodiment, before the key points of the human body are acquired, human body detection must be performed on the target image to obtain a human body frame indicating the position of the human body in the image. Because the image input by the user can be any image, it inevitably contains non-human background such as tables, chairs, trees, cars and buildings, and this useless background is removed by mature existing algorithms.
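As a minimal sketch of the last step described above, and assuming that the detector has already produced a binary mask in which non-zero pixels belong to the person, the human body frame can be derived from the mask and used to crop away the background. The function names and the mask representation are assumptions made for illustration; the embodiment does not specify them.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), all inclusive


def bounding_box(mask: Mask) -> Optional[Box]:
    """Return the tightest box around all non-zero pixels, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # no human body found in the image
    cols = [c for row in mask for c, value in enumerate(row) if value]
    return rows[0], min(cols), rows[-1], max(cols)


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Keep only the pixels inside the human body frame."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
box = bounding_box(mask)
person = crop(image, box)
print(box, person)
```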

At the same time, semantic segmentation, joint point detection, bone detection and edge detection are carried out. Collecting this 1D point information and 2D surface information lays a good foundation for the later generation of the 3D human body model. A first-stage neural network is used to generate the joint point map of the human body; optionally, the target detection algorithm is a network based on a convolutional neural network that quickly generates target regions. The first neural network requires training on massive data: photos collected from the Internet are labeled manually and then input into the neural network for training. After deep learning, the network can output, immediately after a photo is input, a joint point map whose accuracy and quality are essentially the same as those of manually labeled joint points, while being tens or even hundreds of times more efficient than manual labeling. Human joint points generally serve as key points of the human body.

In the invention, obtaining the positions of the human joint points in the picture completes only the first step. Once this 1D point information is obtained, 2D surface information is generated from it; this work can be completed with neural network models and mature algorithms of the prior art. The invention redesigns the workflow and the points at which the neural network models intervene, and sets the various conditions and parameters reasonably, which makes parameter generation more efficient and reduces the degree of manual participation. This suits Internet application scenarios well: in a virtual outfit-changing program, for example, the user obtains the result almost instantly without waiting, which is vital to the program's appeal to users.

After the relevant 1D point information and 2D surface information are obtained, these results, namely the joint point map, the semantic segmentation maps, the body bone points and the key point information of the target human body, are substituted as inputs into a second neural network subjected to deep learning to carry out regression of the human body posture and body type parameters. Through the regression calculation of the second neural network, several groups of three-dimensional human body parameters, including three-dimensional human body action posture parameters and three-dimensional human body shape parameters, are output immediately. Preferably, the loss function of the neural network is designed on the basis of the three-dimensional standard human body model (base mannequin), the predicted three-dimensional human body model, the standard two-dimensional human body image in which the original joint point positions are labeled, and the standard two-dimensional human body image containing the predicted joint point positions.
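The text names the quantities on which the loss is based but does not give its formula. Purely as an assumed example of how such quantities could be combined, the sketch below adds a mesh term (mean squared distance between predicted and reference vertices) and a joint term (mean squared distance between predicted and labeled 2D joint positions). The weighting, the function names and the choice of squared distance are all assumptions.

```python
from typing import Sequence


def mean_squared_distance(
    a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]
) -> float:
    """Average squared Euclidean distance between paired points."""
    if len(a) != len(b) or not a:
        raise ValueError("point lists must be non-empty and of equal length")
    total = sum(
        sum((x - y) ** 2 for x, y in zip(p, q)) for p, q in zip(a, b)
    )
    return total / len(a)


def combined_loss(
    pred_vertices, ref_vertices, pred_joints_2d, labeled_joints_2d,
    w_mesh: float = 1.0, w_joint: float = 1.0,
) -> float:
    """Weighted sum of a 3D mesh term and a 2D joint term (assumed form)."""
    mesh_term = mean_squared_distance(pred_vertices, ref_vertices)
    joint_term = mean_squared_distance(pred_joints_2d, labeled_joints_2d)
    return w_mesh * mesh_term + w_joint * joint_term


loss = combined_loss(
    pred_vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    ref_vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)],
    pred_joints_2d=[(0.0, 0.0)],
    labeled_joints_2d=[(3.0, 4.0)],
)
print(loss)
```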

In this part, we also design and model some base mannequins in advance. The main work is to construct, in combination with a mathematical model, a three-dimensional standard human body model, i.e. the base mannequin. Animations of continuously varying parameters show clearly that, in the Max Planck Institute's SMPL human body model, a continuous change of any single body-shape parameter causes local or even global linked changes of the model; in order to reflect the movement of human muscle tissue, a linear change of any SMPL parameter causes mesh changes over a large area. For example, when the parameter β1 is adjusted, the model treats the change as a change of the whole body: you may only want to adjust the proportions of the waist, but the model forces the thickness of the legs, chest and even hands to change together. Although this way of working greatly simplifies the workflow and improves efficiency, it is very inconvenient for a project that pursues modeling quality. Moreover, because the SMPL model was trained on photos and measurement data of Western bodies and conforms to Western body types, its rules of body variation basically follow the typical variation curves of Western people. When it is applied to modeling Asian bodies, many problems arise, for example in the proportions of the arms and legs, of the waist and torso, and of the neck, as well as in leg length and arm length. Our research shows large differences in these respects, and if the SMPL model is applied rigidly, the final result cannot meet our requirements.

Therefore, we improve the result by building our own human body model. The core is to construct blend-shape body bases so that the human body can be controlled accurately and independently. The three-dimensional standard human body model (base mannequin) is composed of the parameters of 20 shape bases and 170 bone parameters. The bases together form the whole human body model, and each shape base is controlled and changed independently by its parameters without affecting the others. Independent control means that each base, such as the waist, legs, hands or head, is controlled separately, and that the length of each bone can be adjusted separately. They are independent of one another and have no physical linkage, so the human body model can be fine-tuned better. The present model embodies a correspondence based on mathematical principles and represents local body changes of the human body model well; because these changes follow mathematical principles, the parameters do not influence one another, and parts such as the arms and the legs remain completely independent.
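To illustrate what independent control of shape bases can mean in practice, the following minimal sketch assumes that each base stores displacements only for the vertices of its own body region, so that changing one parameter cannot move vertices belonging to another region. The four-vertex "mesh", the base names and the displacement values are invented for the example and are not the 20 bases of the embodiment.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def apply_shape_bases(
    template: List[Vec3],
    bases: Dict[str, Dict[int, Vec3]],
    params: Dict[str, float],
) -> List[Vec3]:
    """Add weight * displacement for every base listed in params.

    bases maps a base name to {vertex index: displacement}; a base only
    lists the vertices of its own region, which is what keeps it local.
    """
    out = [list(v) for v in template]
    for name, weight in params.items():
        for index, delta in bases[name].items():
            for axis in range(3):
                out[index][axis] += weight * delta[axis]
    return [(v[0], v[1], v[2]) for v in out]


# Toy mesh: vertices 0-1 belong to the waist, vertices 2-3 to the leg.
template = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (0.5, 0.0, 0.0), (0.5, -1.0, 0.0)]
bases = {
    "waist": {0: (1.0, 0.0, 0.0), 1: (-1.0, 0.0, 0.0)},  # widen the waist
    "leg": {2: (0.0, -1.0, 0.0), 3: (0.0, -1.0, 0.0)},   # lengthen the leg
}

wider_waist = apply_shape_bases(template, bases, {"waist": 0.5})
print(wider_waist)
```

Because the waist base lists no leg vertices, the leg vertices in `wider_waist` are identical to those of the template, which is the behaviour the paragraph contrasts with the global coupling of SMPL shape parameters.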

It should be emphasized that our human body model is not obtained by training. The correspondence between the parameters and the model rests on mathematical principles; that is, our groups of parameters have no mutual influence and are independent of one another, so our model is more interpretable during transformation and better represents the shape change of a particular part of the body. In general, by establishing our own standard human body model and using the parameters corresponding to 20 shape bases and 170 bones, we generate a three-dimensional human body model that is closer to Asian body types than the Max Planck Institute's SMPL model and offers better independent operation and controllability. By controlling the input parameters, the model can control and adjust the lengths of the thigh and the shank separately, so that the proportions of the leg are determined accurately. In fact, only such a human body model matches the whole training method well, because the training method requires that each part of the human body model be adjustable as independently as possible. Only then can the advantages of the training method be fully exploited, so that the human body model fits the human body in the photo at every part visible to the naked eye.
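The separate adjustment of thigh and shank length can be pictured with a simple two-bone chain. The sketch below is a simplified 2D assumption (hip, knee and ankle in one plane, each bone described by a length and an angle); the numeric lengths are invented and the embodiment's actual bone parameterization is not specified in the text.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def chain_positions(root: Point, bones: List[Tuple[float, float]]) -> List[Point]:
    """Walk a bone chain and return the position of every joint.

    Each bone is (length, angle_deg), with the angle measured from the
    +x axis; every bone starts where the previous one ends.
    """
    x, y = root
    positions = [(x, y)]
    for length, angle_deg in bones:
        angle = math.radians(angle_deg)
        x += length * math.cos(angle)
        y += length * math.sin(angle)
        positions.append((x, y))
    return positions


# Leg hanging straight down from the hip: thigh first, then shank.
before = chain_positions((0.0, 0.0), [(0.45, -90.0), (0.40, -90.0)])
# Lengthen only the shank; the thigh parameters are untouched.
after = chain_positions((0.0, 0.0), [(0.45, -90.0), (0.44, -90.0)])
print(before[1] == after[1], before[2], after[2])
```

Only the ankle moves between `before` and `after`; the knee stays where it was, so the thigh-to-shank ratio can be tuned without disturbing the rest of the chain.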

The first part also includes fitting the parameters to the human body model to generate the target human body model. As shown in fig. 2, this comprises the following sub-steps: mapping the obtained three-dimensional human body posture and body type parameters to the groups of base and bone parameters of the three-dimensional standard human body model; and inputting the obtained groups of base and bone parameters into the three-dimensional standard human body model for fitting. The three-dimensional human body model has a mathematical weight relation between the bone points and the model mesh, so that determining the bone points determines, through this association, the posture of the target human body model. In this part, the two types of parameters generated in the previous part are substituted into the pre-designed human body model to construct the 3D human body model. The two types of parameters have names similar to those of the parameters of the Max Planck Institute's SMPL model, but their actual contents differ greatly. The two models rest on different foundations: the method uses a self-built three-dimensional standard human body model (base mannequin), whereas the SMPL model uses a standard human body model generated by training on big data. The two models are generated and computed differently, and although both ultimately take the form of a generated 3D human body model, they differ considerably in substance. After this step, a preliminary 3D human body model is obtained, comprising the bone positions and a model mesh that carries length information.
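The text states only that a mathematical weight relation links the bone points to the model mesh. One widely used relation of this kind is linear blend skinning, in which each vertex follows a weighted mixture of the bones that influence it. The sketch below applies that technique in 2D as an assumed stand-in; it is not asserted to be the relation used in the embodiment, and the bone names and weights are invented.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def rotate_about(point: Point, pivot: Point, angle_deg: float) -> Point:
    """Rotate point around pivot by angle_deg (counter-clockwise)."""
    a = math.radians(angle_deg)
    dx, dy = point[0] - pivot[0], point[1] - pivot[1]
    return (
        pivot[0] + dx * math.cos(a) - dy * math.sin(a),
        pivot[1] + dx * math.sin(a) + dy * math.cos(a),
    )


def skin_vertex(
    vertex: Point,
    weights: Dict[str, float],
    pivots: Dict[str, Point],
    angles_deg: Dict[str, float],
) -> Point:
    """Blend the vertex positions produced by each influencing bone."""
    x = y = 0.0
    for bone, weight in weights.items():
        px, py = rotate_about(vertex, pivots[bone], angles_deg[bone])
        x += weight * px
        y += weight * py
    return (x, y)


pivots = {"upper_arm": (0.0, 0.0), "forearm": (0.0, 0.0)}
angles = {"upper_arm": 0.0, "forearm": 90.0}

# A vertex near the elbow is influenced equally by both bones.
elbow_vertex = skin_vertex(
    (1.0, 0.0), {"upper_arm": 0.5, "forearm": 0.5}, pivots, angles
)
# A vertex bound only to the unrotated upper arm does not move.
rigid_vertex = skin_vertex((1.0, 0.0), {"upper_arm": 1.0}, pivots, angles)
print(elbow_vertex, rigid_vertex)
```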

In the second part, we complete the correction of the manikin.

The human body model output by the neural network model we designed has substantially the same posture as the human body in the original picture, and is generally called the target human body model. Because the network is trained continuously on manually labeled skeleton joint maps, its output has a certain consistency. Extreme cases, such as a photo in which the arm is raised while the arm of the human body model hangs down, rarely occur. In general, the difference in posture and body type between the target human body model and the person in the photo, as judged by us, is not very obvious. Previously, however, no one further processed these imperfect human body models. The usual approach was to keep adjusting the parameters and weight settings of the neural network by other means and to keep outputting human body models with the corrected parameters; if the posture became closer, the parameter adjustment was deemed successful. Such adjustment, however, tends to be mediocre in both effect and efficiency.

The training idea of the invention is to intervene manually and adjust the imperfect human body model generated by the neural network until it is highly consistent with the human body posture in the photo, to treat the corrected model as the human body model that the photo should have produced, and to substitute it into the neural network for computation, so as to obtain the parameters the neural network should have in order to output the correct answer. Compared with the various methods of acquiring real human body model data in the prior art, the method is simple and effective: the whole correction and training process can be completed with only one picture, and the time and equipment cost of acquiring data are greatly reduced. It should also be emphasized that the best correction is obtained with a self-built standard human body model in which each part can be controlled independently. If the Max Planck Institute's SMPL model is used, the best result of independently controlling the model to fit the picture cannot be achieved.

Referring to fig. 3, in the present invention, to obtain a human body model with a "perfect" posture, the generated model is first imported into existing mainstream three-dimensional modeling software such as Maya. Second, the original two-dimensional photo is imported into the modeling software, and the photo and the human body model are displayed in the same task frame. Third, the bones and joint points of the human body model are adjusted within that single task frame in Maya: an art designer manipulates the joint points and bones bound to the human body model and, taking the original picture as reference, brings the parts that do not match closer to the posture and body type in the picture. For example, if the arm in the picture is raised 30 degrees toward the front left while the arm of the generated model is at 25 degrees toward the front left, the adjustment is completed simply by moving the arm 5 degrees manually. In more complicated cases, the direction, thickness and length of the arm, the rotation of the hand and so on may all be inconsistent, and the art designer must adjust each of them. The adjustment can use the usual methods in Maya, such as dragging a whole joint chain in IK mode or directly editing the XYZ coordinates and Euler angles. Manual participation in this step improves the accuracy of the model and the final working efficiency considerably; the "fitting" can also be carried out automatically by software. Although manual adjustment and automatic software adjustment share the same final goal, namely that the human body model completely "covers" the human body in the original picture, we still prefer manual adjustment. It should be emphasized, however, that this choice has little influence on the implementation of the training method claimed in the present invention: whichever method is used, the aim is to bring the generated human body model as close as possible to the posture and body type of the human body in the original picture. Providing a model correction step greatly improves the accuracy of training and allows the various parameters to be adjusted more accurately, thereby generating a target human body model closer to the real condition of the human body in the photo.
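The 30-degree versus 25-degree arm example amounts to measuring the angle of the same bone in the photo and in the model and applying the difference. The sketch below works on 2D joint positions with the y axis pointing up; this coordinate convention and the function names are assumptions, and the code runs outside Maya rather than using any Maya command.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def bone_angle_deg(parent: Point, child: Point) -> float:
    """Direction of the bone parent -> child, in degrees from the +x axis."""
    return math.degrees(math.atan2(child[1] - parent[1], child[0] - parent[0]))


def angle_correction_deg(
    model_parent: Point, model_child: Point,
    photo_parent: Point, photo_child: Point,
) -> float:
    """Rotation that brings the model bone onto the photo bone."""
    diff = bone_angle_deg(photo_parent, photo_child) - bone_angle_deg(
        model_parent, model_child
    )
    return (diff + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)


shoulder = (0.0, 0.0)
model_elbow = (math.cos(math.radians(25.0)), math.sin(math.radians(25.0)))
photo_elbow = (math.cos(math.radians(30.0)), math.sin(math.radians(30.0)))

correction = angle_correction_deg(shoulder, model_elbow, shoulder, photo_elbow)
print(round(correction, 6))
```

Wrapping the difference into the range from -180 to 180 degrees means the suggested rotation is always the shorter way round, which is also how an artist would naturally move the joint.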

After the adjustment is finished, a checking and repair step is added: three views of the human body model are rendered in Maya and compared with the photo. If they are highly consistent, the model can be used as a result to train the neural network. If anything is inconsistent, the model is taken back into the software and the inconsistent places are fine-tuned again. The model and the photo are aligned by eye in Maya's virtual 3D space, and although this alignment is fairly accurate, small errors are hard to avoid. Once a two-dimensional picture of the model has been generated, the inconsistencies between the photo and the model can be examined carefully in two-dimensional space from a static viewpoint. Inconsistencies that are not easy to find in 3D space, such as the shape of the elbow, a limb bent backwards, or a body leaning forwards or backwards, are then found more easily, which further helps in correcting toward a more perfect model.
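In the text this check is made visually. As an assumed numerical aid, and not as part of the described method, the overlap between the silhouette of a rendered view and the silhouette of the person in the photo could be scored with intersection over union, flagging views that fall below a threshold. The mask representation and the 0.95 threshold are illustrative choices.

```python
from typing import List

Mask = List[List[int]]


def silhouette_iou(rendered: Mask, photo: Mask) -> float:
    """Intersection over union of two binary masks of equal size."""
    if len(rendered) != len(photo) or any(
        len(a) != len(b) for a, b in zip(rendered, photo)
    ):
        raise ValueError("masks must have the same dimensions")
    intersection = union = 0
    for row_a, row_b in zip(rendered, photo):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return intersection / union if union else 1.0


def needs_rework(rendered: Mask, photo: Mask, threshold: float = 0.95) -> bool:
    """True when the overlap is too low and the model should be re-adjusted."""
    return silhouette_iou(rendered, photo) < threshold


rendered_view = [[1, 1], [0, 0]]
photo_mask = [[1, 0], [0, 0]]
score = silhouette_iou(rendered_view, photo_mask)
print(score, needs_rework(rendered_view, photo_mask))
```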

In the third part, the corrected human body model is fed back into the neural network as the output result to train the network, so that the neural network learns what the most accurate match between the input picture and the output model is. There are many training methods for neural networks, but the basic flow and idea are almost the same, namely letting the neural network know where it is wrong. A common deep learning neural network training method generally includes:

(1) preprocessing the data;

(2) inputting the data into the neural network and propagating it forward (each neuron computes a weighted sum of its input values and passes it through an activation function to produce the neuron's output value) to obtain a "score" or "result";

(3) inputting the "score" or "result" into an error function (with a regularization penalty to prevent over-fitting) and comparing it with the expected value to obtain an error; where there are several errors they are summed, and the error indicates the degree of recognition (the smaller the loss value, the better);

(4) determining the gradient vectors by back-propagation (differentiating backwards through the error function and each activation function in the neural network, with the final goal of minimizing the error);

(5) adjusting each weight according to the gradient vectors, so that the error tends toward 0 or converges;

(6) repeating the above process until a set number of iterations is reached or the average loss error no longer decreases (the lowest point);

(7) finishing the training.
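The generic loop of steps (2) to (6) can be shown on the smallest possible case. The sketch below fits a single linear neuron by gradient descent with an optional L2 penalty and stops either after a fixed number of epochs or when the loss stops dropping. It is a teaching example under these assumptions and not the network of the embodiment; the data are assumed to be already preprocessed (step (1)), and all hyperparameter values are arbitrary.

```python
from typing import List, Tuple


def train_linear(
    samples: List[Tuple[float, float]],
    lr: float = 0.1,
    l2: float = 0.0,
    max_epochs: int = 2000,
    patience: int = 10,
    tol: float = 1e-12,
) -> Tuple[float, float, float]:
    """Fit y = w * x + b; return (w, b, best loss seen)."""
    w, b = 0.0, 0.0
    n = len(samples)
    best_loss = float("inf")
    stale_epochs = 0
    for _ in range(max_epochs):
        # Step (2): forward pass.
        preds = [w * x + b for x, _ in samples]
        # Step (3): mean squared error plus regularization penalty.
        errors = [p - y for p, (_, y) in zip(preds, samples)]
        loss = sum(e * e for e in errors) / n + l2 * w * w
        # Step (4): gradients of the loss with respect to w and b.
        grad_w = sum(2 * e * x for e, (x, _) in zip(errors, samples)) / n + 2 * l2 * w
        grad_b = sum(2 * e for e in errors) / n
        # Step (5): weight update.
        w -= lr * grad_w
        b -= lr * grad_b
        # Step (6): stop when the loss no longer drops.
        if best_loss - loss > tol:
            best_loss = loss
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return w, b, best_loss


# Targets generated from y = 2x + 1, standing in for "corrected" answers.
samples = [(float(x), 2.0 * x + 1.0) for x in range(4)]
w, b, final_loss = train_linear(samples)
print(round(w, 3), round(b, 3))
```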

It can be seen that most neural network training processes rely on a large amount of raw data. When training starts, the neural network model performs poorly in many scenes, so a large number of different bad cases must be labeled and added to the training set, so that the neural network learns what their true values should be and can predict accurately when it later encounters similar images. In practice, however, this kind of training is not very efficient, and training is really an iterative process. If the model can be trained with results that are almost consistent with the standard answers, fewer bad cases appear later on, the convergence rate increases greatly, and the model performs better.
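One way to picture the iteration described here is a selection step that compares the network's predictions with the corrected answers and adds only the samples with a large error to the next training round. The sketch below is an assumption about how such bookkeeping could look; the error measure, the threshold and the sample identifiers are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Params = List[float]


def max_abs_error(predicted: Params, corrected: Params) -> float:
    """Largest per-parameter deviation between prediction and correction."""
    return max(abs(p - c) for p, c in zip(predicted, corrected))


def collect_bad_cases(
    predictions: Dict[str, Params],
    corrected: Dict[str, Params],
    error_fn: Callable[[Params, Params], float],
    threshold: float,
) -> List[Tuple[str, Params]]:
    """Return (sample id, corrected parameters) for poorly predicted samples."""
    return [
        (sample_id, corrected[sample_id])
        for sample_id, predicted in predictions.items()
        if error_fn(predicted, corrected[sample_id]) > threshold
    ]


predictions = {"photo_a": [0.0, 0.0], "photo_b": [1.0, 1.0]}
corrected = {"photo_a": [0.0, 0.1], "photo_b": [2.0, 3.0]}

training_additions = collect_bad_cases(
    predictions, corrected, max_abs_error, threshold=0.5
)
print(training_additions)
```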

The training method flow for generating the three-dimensional human model neural network according to the embodiment of the invention is described in conjunction with fig. 1 to 3. Fig. 4 is a diagram illustrating a hardware configuration 300 of an apparatus for processing a human body image according to an embodiment of the present invention.

The invention also discloses a computer readable storage medium having a computer program stored therein, which when executed by a processor implements the training method and steps described above.

The invention also discloses electronic equipment comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus; the memory is used for storing a computer program; and the processor is used for implementing the training method and steps described above when executing the program stored in the memory.

As shown in fig. 4, the apparatus 300 for implementing the neural network training method in this embodiment includes: the device comprises a processor 301, a memory 302, a communication interface 303 and a bus 310, wherein the processor 301, the memory 302 and the communication interface 303 are connected through the bus 310 and complete mutual communication.

In particular, the processor 301 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing an embodiment of the present invention.

Memory 302 may include mass storage for data or instructions. By way of example, and not limitation, memory 302 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive or a combination of two or more of these. Memory 302 may include removable or non-removable (or fixed) media, where appropriate. The memory 302 may be internal or external to the human image processing apparatus 300, where appropriate. In a particular embodiment, the memory 302 is a non-volatile solid-state memory. In a particular embodiment, the memory 302 includes Read Only Memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory or a combination of two or more of these.

The communication interface 303 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiment of the present invention.

The bus 310 includes hardware, software, or both to couple the components of the apparatus 300 for processing human body images to each other. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable bus or a combination of two or more of these. Bus 310 may include one or more buses, where appropriate. Although specific buses have been described and shown in the embodiments of the invention, any suitable buses or interconnects are contemplated by the invention.

That is, the apparatus 300 for processing a human body image shown in fig. 4 may be implemented to include: a processor 301, a memory 302, a communication interface 303, and a bus 310. The processor 301, memory 302 and communication interface 303 are coupled by a bus 310 and communicate with each other. The memory 302 is used to store program code; the processor 301 executes a program corresponding to the executable program code by reading the executable program code stored in the memory 302 for executing the neural network training method in any embodiment of the present invention, thereby implementing the neural network training method described in conjunction with fig. 1 to 3.

The embodiment of the invention also provides a computer storage medium, wherein the computer storage medium is stored with computer program instructions; the computer program instructions, when executed by a processor, implement a neural network training method provided by embodiments of the present invention.

It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.

The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.

It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.

As described above, only the specific embodiments of the present invention are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present invention, and these modifications or substitutions should be covered within the scope of the present invention.

Claims

1. A mannequin neural network training method, the method comprising:

1) acquiring a two-dimensional image of a target human body;

2. The method of claim 1, wherein the training method further comprises: combining the mathematical model to construct a three-dimensional standard human body model; acquiring three-dimensional human body parameters of a target human body model through a neural network, wherein the three-dimensional human body parameters comprise three-dimensional human body action posture parameters and three-dimensional human body shape parameters; mapping the obtained three-dimensional human body posture and body type parameters to a plurality of base and bone parameters of the three-dimensional standard human body model; and inputting the obtained groups of base and bone parameters into the three-dimensional standard human body model for fitting.

3. The method according to claim 2, wherein the three-dimensional standard human body model is composed of parameters of a plurality of shape bases and parameters of a plurality of bones; the plurality of shape bases together form the whole human body model, and each shape base is independently controlled and varied by its own base parameter without affecting the others; the three-dimensional standard human body model has a mathematical weight relation between the skeleton points and the model mesh, so that the determined skeleton points can be associated with the human body model to determine the target human body posture.

4. The method of claim 1, wherein the step of obtaining parameters of the target human body model further comprises: 1) obtaining a two-dimensional image of the target human body; 2) processing the image to obtain a two-dimensional human body contour image of the target human body; 3) substituting the two-dimensional human body contour image into a first neural network trained by deep learning to perform regression of the joint points; 4) obtaining a joint point map of the target human body, semantic segmentation maps of the parts of the human body, body key points and body skeleton points; 5) substituting the generated joint point map, semantic segmentation maps, body skeleton points and key point information of the target human body into a second neural network trained by deep learning to perform regression of the human body posture and body shape parameters; and 6) obtaining the output three-dimensional human body parameters, including three-dimensional human body action posture parameters and three-dimensional human body shape parameters.

5. The method of claim 4, further comprising a process of training the first neural network before inputting the two-dimensional human body image into the first neural network, wherein the training samples comprise standard two-dimensional human body images annotated with original joint point positions, the original joint point positions being manually labeled on the two-dimensional human body images with high accuracy.

6. The method of claim 1, wherein the consistency adjustment is performed manually.

7. The method of claim 6, wherein the consistency of the pose between the imported target human model and the two-dimensional image of the target human body is achieved by adjusting the skeleton of the human model, and the consistency of the body shape is achieved by adjusting the body shape base of the human model.

8. The method of claim 1, wherein after the corrected new target human body model is obtained, three views of the human body model are rendered in the software and compared with the two-dimensional image, and supplementary correction is performed.

9. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, carries out the method steps of any one of claims 1 to 8.

10. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus; the memory is used for storing a computer program; and the processor is used for implementing the method steps of any one of claims 1 to 8 when executing the program stored in the memory.
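
For illustration only, the parametric model recited in claims 2 and 3 (independently controlled shape bases plus a weight relation between skeleton points and the model mesh) can be sketched as below. This is a minimal sketch under stated assumptions, not the implementation of the invention: it assumes that each shape base is a set of per-vertex offsets added linearly to a template mesh, and that the weight relation is linear blend skinning, neither of which the claims specify. All function names, the toy three-vertex mesh, and the restriction to rotations about the z-axis are hypothetical simplifications.

```python
import math

def apply_shape_bases(template, bases, coeffs):
    """Add each shape base's per-vertex offsets, scaled by its own coefficient.

    Assumption: bases combine linearly, so changing one coefficient
    never alters the contribution of another base.
    """
    shaped = [list(v) for v in template]
    for base, c in zip(bases, coeffs):
        for v, offset in zip(shaped, base):
            for k in range(3):
                v[k] += c * offset[k]
    return shaped

def rotate_z(point, pivot, angle):
    """Rotate a 3D point about a z-axis passing through the pivot (a skeleton point)."""
    x, y = point[0] - pivot[0], point[1] - pivot[1]
    c, s = math.cos(angle), math.sin(angle)
    return [pivot[0] + c * x - s * y, pivot[1] + s * x + c * y, point[2]]

def skin(vertices, joints, angles, weights):
    """Linear blend skinning: each posed vertex is the weighted mix of the
    positions it would take under every bone's transform."""
    posed = []
    for v, w_row in zip(vertices, weights):
        out = [0.0, 0.0, 0.0]
        for joint, angle, w in zip(joints, angles, w_row):
            moved = rotate_z(v, joint, angle)
            for k in range(3):
                out[k] += w * moved[k]
        posed.append(out)
    return posed

# Hypothetical toy data: three mesh vertices, two shape bases, two bones.
template = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
bases = [
    [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.0]],  # base 0 moves vertex 1
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],  # base 1 moves vertex 2
]
coeffs = [0.0, 1.0]                          # body shape parameters
joints = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]  # skeleton points
angles = [0.0, math.pi / 2]                  # posture parameters, one per bone
weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # rows sum to 1

shaped = apply_shape_bases(template, bases, coeffs)
posed = skin(shaped, joints, angles, weights)
print(posed)
```

In this sketch the shape coefficients play the role of the body shape parameters and the per-bone angles play the role of the posture parameters; a manual consistency adjustment as in claims 6 and 7 would correspond to editing `coeffs` and `angles` before regenerating the mesh.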