CN114693570A - Human body model image fusion processing method, device and storage medium


Info

Publication number
CN114693570A
CN114693570A (application CN202011609643.8A)
Authority
CN
China
Prior art keywords: human body, model, image, dimensional, target
Prior art date
Legal status
Pending
Application number
CN202011609643.8A
Other languages
Chinese (zh)
Inventor
吴圣杰
杨超杰
周润楠
张胜凯
郑天祥
焦年红
Current Assignee
Beijing Momo Information Technology Co Ltd
Original Assignee
Beijing Momo Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Momo Information Technology Co Ltd
Priority to CN202011609643.8A
Publication of CN114693570A

Classifications

    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06Q30/0643 Electronic shopping: graphical representation of items or shoppers
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T7/11 Region-based segmentation
    • G06T7/13 Edge detection
    • G06T7/33 Determination of transform parameters for the alignment of images (image registration) using feature-based methods
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20221 Image fusion; Image merging
    • G06T2207/30196 Human being; Person


Abstract

The invention discloses a fusion processing method for human body model images, comprising the following steps: building a basic mannequin; obtaining a three-dimensional clothing model; fitting the three-dimensional clothing model onto the basic mannequin; acquiring secondary information of the human body model from a two-dimensional image of the target human body; obtaining the posture and body-type parameters of the target human body model through regression prediction; obtaining the target posture and target body type; driving the human body model to move from the initial posture to the target posture; obtaining the target human body model with the garment change completed; fusing the head image from the original picture with the target human body model image; restoring the skin color of the exposed parts of the body; and obtaining the final 2D picture after the garment change. The invention generates an animation sequence by frame interpolation, so that the whole fitting and driving process performs better, and the image produced after the model is generated is optimized, improving the quality of the final composite picture.

Description

Human body model image fusion processing method, device and storage medium
Technical Field
The invention belongs to the field of virtual garment change (virtual dressing), and particularly relates to an image fusion processing method applied after the matching of a human body model and a clothing model is completed, in particular to a method, device, and storage medium for fusing the spliced regions and the skin color of exposed skin.
Background
With the development of internet technology, online shopping has become more and more popular. Compared with shopping in a physical store, online shopping offers more product variety and greater convenience. However, some problems of online shopping are not easy to solve, the most important being that the goods to be purchased cannot be inspected on the spot. Among all product categories, this problem is most prominent for clothing. Whereas in a physical store the customer can try clothes on and check the effect in real time, online clothing shopping cannot provide an effect picture specific to the consumer; it can only provide pictures of a model trying on the garment, or sometimes no fitting picture at all, so the consumer cannot intuitively judge in real time how well a garment matches his or her own figure. This results in a large number of returns.
In response to this problem, vendors have attempted to provide consumers with simulated fitting effects using virtual fitting technology. There are, of course, other situations where virtual fitting and dressing techniques are useful, such as in online games. As a result, this technology has developed rapidly.
Virtual fitting refers to a technology by which a user can check the effect of a garment change on a terminal screen in real time without actually putting on the clothes. Existing garment-change technology mainly comprises planar fitting and three-dimensional virtual fitting. The former essentially collects pictures of the user and of the clothes, then cuts and splices them to form a dressed image; because of this simple and crude image processing, the resulting images look unrealistic, the user's actual body shape is not considered at all, and simply pasting rigid clothing images onto the user's picture cannot meet user demands. The latter usually collects three-dimensional information of the person through a three-dimensional acquisition device combined with the characteristics of the clothes, or manually inputs body data provided by the user, virtually generates a three-dimensional human body model according to certain rules, and then combines the model with the clothing texture. Overall, such three-dimensional virtual fitting requires a large amount of data acquisition or three-dimensional computation; its hardware cost is high, and it is not easy to popularize among ordinary users.
With the development of cloud computing, artificial intelligence, and the processing capacity of intelligent terminals, a technology has emerged that generates a three-dimensional human body model from a two-dimensional human body image and realizes virtual fitting after dressing it with a three-dimensional clothing model. Such techniques mainly comprise several major steps: (1) processing the personal body information provided by the user to obtain a target human body model; (2) processing the clothing information to obtain a clothing model; (3) matching the human body model and the clothing model together; (4) fusing the generated image of the dressed person with the original image.
However, owing to the limitations of the garment-change principle, the generated dressed body model does not carry the head and face characteristics of the person changing clothes. Therefore, after the dressed model is generated, its two-dimensional picture is superimposed on the original two-dimensional picture, so that the synthesized picture shows the person's own head together with the body wearing the virtual clothes. Consequently, any defect in the superposition produces visible distortion and implausibility in the picture; in particular, problems such as misalignment between the human body model and the head in the original picture, missing skin texture or inconsistent skin color, missing body parts, and missing background easily arise, degrading the final garment-change picture.
In the general field of computer vision, there are many starting points for human body modeling. They generally fall into three major categories: omnidirectional scanning of a real human body with a 3D scanning device; three-dimensional reconstruction based on multi-view depth photography; and three-dimensional reconstruction from a given single image or multiple images combined with a neural network model and a standard human body model.
In the prior art, methods for constructing a human body model from two-dimensional images generally fall into several types: (1) regression-based reconstruction of a voxel-represented human body model through a convolutional neural network; (2) methods that first roughly mark simple human skeleton key points on the image and then perform initial matching and fitting of a human model against these rough key points to obtain the approximate shape of the human body; (3) methods that predict key points on the image with a CNN model, fit an initial human body model with the SMPL model, and finally combine the initial model with a regressed bounding box to obtain the three-dimensional human body reconstruction.
The prior art also discloses a human body modeling method based on body measurement data, which includes: acquiring body measurement data; performing linear regression on a pre-established human body model through a pre-trained prediction model according to the body measurement data, and fitting to obtain a predicted human body model, wherein the pre-established human body model comprises a plurality of groups of predefined marking feature points and corresponding standard shape bases, and the body measurement data comprise the measurement data corresponding to each group of marking feature points; and obtaining a target human body model according to the predicted human body model, wherein the target human body model comprises measurement data, a target shape base, and a target shape coefficient. However, this method places high demands on the body measurement data; although it saves computation, the user experience is poor and the procedure is cumbersome.
It should be noted that the SMPL model is a parameterized human body model, a human body modeling method proposed by the Max Planck Institute in Germany that supports arbitrary human body modeling and animation driving. Its biggest difference from traditional LBS (linear blend skinning) is that its way of modeling body-surface morphology as a function of pose can simulate the bulging and hollowing of human muscles during limb movement. It therefore avoids surface distortion of the body during motion and can accurately depict the shapes of stretching and contracting muscles. In this method, beta and theta are the input parameters: beta comprises 10 parameters describing attributes such as height, weight, and head-to-body ratio, while theta comprises 75 parameters describing the overall motion pose and the relative angles of 24 joints. In this kind of model generation, however, the core is obtaining the relationship between body type and shape bases by accumulating a large amount of training data; because of the strong correlation within the training data, the shape bases cannot be controlled independently and are not easy to decouple. During driving, these characteristics still seriously affect the final driving effect: the model moves as a whole, unlike the human body model adopted by the present invention, whose individual parts can be controlled independently, so the driving effect leaves much room for improvement.
The second prior art discloses an image processing method for a virtual fitting model, which includes: determining the set of pixels in the face region whose colors fall within a skin-color range; taking the average color of all pixels in that set as the average human skin color of a preselected region in the reference image, and calculating the ratio of this average to the average skin color of the virtual fitting model image; and multiplying each pixel value of the body area in the virtual fitting model image by the ratio, using the result as the new pixel values of the body area. Before the ratio is calculated, the method also computes the average skin color of the virtual fitting model image; determining the average human skin color of the preselected region further comprises receiving information that specifies the preselected region in the reference image. This method simplifies an otherwise complex calculation, but it considers only the average skin color of the face region and substitutes that value for the skin color of all other parts, which easily makes the new skin color look unnatural.
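A minimal numpy sketch of the ratio-based adjustment described above may help orient the reader; the function and variable names are illustrative and not taken from the cited application:

```python
import numpy as np

def match_skin_tone(model_img, body_mask, ref_face_pixels):
    """Hedged sketch: scale the model's body pixels so their mean
    matches the mean skin color of the reference face region."""
    model = model_img.astype(np.float32)
    # Average skin color of the face region in the reference image.
    ref_mean = ref_face_pixels.reshape(-1, 3).mean(axis=0)
    # Average skin color of the body area in the model image.
    body = model[body_mask]
    ratio = ref_mean / np.maximum(body.mean(axis=0), 1e-6)
    # Multiply each body pixel by the per-channel ratio.
    model[body_mask] = np.clip(body * ratio, 0, 255)
    return model.astype(np.uint8)
```

As the text notes, applying one face-derived ratio uniformly to every other body part is precisely what tends to make the new skin color look unnatural.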
The third prior art discloses a method for synthesizing a virtual object, comprising: acquiring a target user image and a virtual object image containing clothing features; extracting the user's head features from the target user image; performing skin-color processing on the user head features and on the virtual object image according to the skin-color features of a reference object image, to obtain user head features and a virtual object image with matched skin color; acquiring neck features from the reference object image and fusing them into the skin-color-processed virtual object image; and integrating the skin-color-processed user head features into the virtual object image fused with the neck features, thereby synthesizing a virtual object comprising the clothing features, the neck features, and the processed user head features. The skin-color processing works as follows: pixel values representing skin color are collected from the user head features and the virtual object image and arranged into pixel matrices; the skin-color features of the reference object image are processed into a skin-color mapping matrix; applying this mapping matrix to the pixel matrix of the user head features yields head features matched to the skin-color features, and applying it to the pixel matrix of the virtual object image yields a virtual object image matched to the same skin-color features. This method is in effect a head-swapping operation that virtually replaces the model's head; it extracts only the skin-color features of the head and cannot properly reflect the condition of the other exposed limbs.
Therefore, in the field of virtual garment change, when the human body model has moved from its initial position to the target posture and has been dressed, the generated composite two-dimensional image still shows many visual defects, and most existing methods focus on increasing processing speed to meet the current demands of animation or games. In some applications, however, the simulation quality of the final dressed image matters more. How to provide a fusion processing method for human body model and clothing model images whose computation does not exceed the capacity of terminal equipment yet delivers an excellent result is a problem to be solved urgently.
Disclosure of Invention
In view of the above problems, the present invention provides a human body model image fusion processing method, device, and storage medium that overcome them. By repairing and fusing the junction regions of the superimposed pictures and extracting the skin color of each exposed part of the original two-dimensional image, the original picture can be fused cleanly with the two-dimensional picture output after the human body model is dressed, and the final synthesized garment-change picture comes closer to the original picture in both form and realism.
The invention provides a fusion processing method for human body model images, comprising the following steps: building a three-dimensional basic mannequin in an initial posture, wherein the initial posture parameters are determined by the initialization parameters of the basic mannequin model; obtaining a three-dimensional clothing model; fitting the three-dimensional clothing model onto the three-dimensional basic mannequin in the initial posture; acquiring secondary information of the human body model from the target two-dimensional human body image using a neural network model; obtaining the posture and body-type parameters of the target human body model through neural network regression prediction from the secondary information, wherein the three-dimensional posture and body-type parameters correspond to the bones and the several base parameters of the three-dimensional basic mannequin; inputting the obtained sets of base and skeleton parameters into the basic mannequin model for fitting, to obtain the target posture and target body type; driving the skeleton of the human body model to move from the initial posture to the target posture; obtaining a three-dimensional target human body model that has the same posture as the target two-dimensional image and has completed the change into the three-dimensional clothing model; fusing the head image from the picture with the target human body model image, and processing any limb regions that are not fully aligned; restoring the skin color of the exposed parts of the body; and obtaining a 2D picture composed of the target person's head, limbs, the background of the target image, and the new clothes worn on the target human body model.
Preferably, the fusion process further comprises: (1) performing portrait matting on the target two-dimensional human body image while keeping the head image; (2) processing the dressed three-dimensional human body model and outputting it as a two-dimensional image of the same size as the two-dimensional human body image, with no background and with the head image removed; (3) superimposing the two-dimensional image output by the human body model onto the target two-dimensional human body image, and checking whether the target person's head image is consistent with the neck of the model's output image; (4) if they are inconsistent, splicing the junction regions by pixel fusion.
Preferably, for the part of the model's output two-dimensional image that connects to the head of the original two-dimensional image, fusion adjustment is performed according to the detected facial skin color; for the other exposed limb parts of the model's output image, fusion adjustment is performed according to the skin color of exposed skin in the adjacent regions of the original two-dimensional image.
Preferably, when the superimposed image after matting contains regions with missing pixels, those regions are filled according to the background pixel colors within a certain area around the missing part.
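The patent does not name an algorithm for this fill; as one plausible realization, OpenCV's diffusion-based inpainting propagates the surrounding background colors into the hole:

```python
import cv2

def fill_missing(composite_bgr, missing_mask):
    """Assumed realization: fill pixel-missing regions from the
    surrounding background via Telea inpainting.
    missing_mask: uint8 image, 255 where pixels are missing."""
    return cv2.inpaint(composite_bgr, missing_mask,
                       inpaintRadius=5, flags=cv2.INPAINT_TELEA)
```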
Preferably, the driving process further includes: driving the skeleton of the human body model from the initial posture to the target posture by frame interpolation; the movement of the skeleton drives the following movement of the human body model mesh; the clothing model is driven frame by frame to follow the human body model to the target posture; and the cloth solving of the clothing model proceeds in step with the movement of the human body model mesh, the physical simulation of the cloth being computed after all bones have completed each frame's movement.
Preferably, after the initial skeleton state and the target posture parameters are obtained, the skeleton is driven from the initial posture to the target posture, and the time series of skeleton states from the initial to the target posture is formed by frame interpolation using linear or nearest-neighbor interpolation.
Preferably, while the animation sequence is generated, the human model mesh also moves by frame interpolation: after the skeleton moves in each frame, the vertices of the human model in the current state, i.e., the surface information, are obtained from the weight parameters of the standard human model, and the current state of the mesh is updated, recorded, and stored.
Preferably, the step of obtaining the parameters of the target human body model further comprises: 1) acquiring a two-dimensional image of the target human body; 2) processing it to obtain a two-dimensional body contour image of the target; 3) feeding the contour image into a first deep-learning neural network for joint-point regression; 4) obtaining the joint-point map of the target human body, semantic segmentation maps of the body parts, body key points, and body bone points; 5) feeding the generated joint-point map, semantic segmentation map, bone points, and key-point information into a second deep-learning neural network for regression of the posture and body-type parameters; 6) obtaining the output three-dimensional human body parameters, including three-dimensional action posture parameters and three-dimensional body shape parameters.
Furthermore, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, carries out the method steps of any of the foregoing.
An electronic device comprises a processor, a communication interface, a memory, and a communication bus, through which the processor, the communication interface, and the memory communicate with one another; the memory stores a computer program; and the processor implements any of the above method steps when executing the program stored in the memory.
The invention has the beneficial effects that:
1. The final composite picture is of high quality. Our garment-change procedure is in fact a process that goes from a two-dimensional picture back to a two-dimensional picture: the user inputs an original two-dimensional image of himself or herself, selects the clothes to change into, and directly obtains the changed picture without any extra operation. In it, the user's head, hands, exposed skin color, and picture background are consistent with the original two-dimensional image; the only difference is that the previous clothes have been replaced by the selected ones. To achieve a fully restored and highly natural picture, after the human body model is dressed we add a superposition and fusion stage between the changed picture and the original: first the limb regions that are not aligned are adjusted, then the skin colors of the head and neck are unified, and finally the skin color of the feet is fused and unified. The synthesized picture is more real and natural, without obvious artifacts.
2. The model fitting effect is good, with little mesh penetration. To drive a model to fully fit the posture of the target human body, the change from the initial posture (position) to the target posture is usually performed in a single overall step. That approach has a small computation load, but the boundary and position conditions of the model change drastically, realism drops sharply, and mesh penetration (clipping) occurs easily. We note that during the movement of the human body model, the meshes of the different limb parts follow different motion rules: some meshes change markedly and move violently, while others barely change or move. If the target position is reached from the initial position in one step, some parts show obvious distortion or deformation even while others are reproduced well enough; the overall visual effect is poor, and unreasonable details are easy to spot. Accordingly, to fit the motion posture of the target body faithfully and to better match the cloth simulation, the change of the skeleton from the initial to the target posture is completed with an optimized frame-interpolation method: the skeleton information of the target posture is obtained by model regression prediction, and at the same time an animation sequence moving from the initial to the target posture is generated; a time series of skeleton states from the initial to the target posture is formed by interpolation such as linear or nearest-neighbor interpolation. We creatively use the accumulation of a multi-frame animation sequence and avoid the one-step driving mode, performing the cloth computation frame by frame while the human body model is driven a small amount in each frame. The overall cloth-solving result of the clothing model is far better than the one-step method; although speed is affected, the simulation is more realistic and the restoration better.
3. The human body model is accurate and controllable. In the frame-by-frame driving process, our self-made basic mannequin can control more details, so the realism and fidelity of details are better than with a traditional human body model. Currently popular single-image human body reconstruction mainly reconstructs parameterized human body models such as SMPL, trained by deep learning on a large number of body examples; the relation between body type and shape bases is a global association that is hard to decouple, so an individual body part cannot be controlled at will, the generated model cannot closely match the real posture and body type, and, when further applied to the subsequent dressing process, the capacity to represent geometric detail of the body surface is limited and the fine texture of clothes on the surface cannot be reconstructed well. Our human body model, by contrast, is not obtained by training; its parameters correspond to one another by mathematical construction, i.e., the parameter groups are mutually unrelated and independent. The model is therefore more interpretable during transformation and can better represent the change of shape and position of a particular body part. In other words, as the human skeleton and mesh move frame by frame, our basic mannequin can better reproduce real-world limb motion and the way clothes follow it.
4. Staged deep neural networks are used creatively. The prior art also uses neural network models, but their roles differ greatly depending on input conditions, input parameters, and training methods. For acquiring the secondary information of the human body model and the body data of the model, we use different neural networks for different purposes. By exploiting networks with different inputs and training regimes, we achieve accurate contour separation of the human body against complex backgrounds, semantic segmentation of the body, and determination of key points and joint points, eliminating the influence of loose clothes and hairstyles and approximating the true body type and shape to the greatest extent. The advantages of deep learning are fully exploited, and the posture and body type of the human body can be recovered with high precision in a variety of complex scenes. The parameters output by the later-stage network include both a pose category and a shape category, so that action and body shape can be controlled separately, and, combined with the reference model, the posture and body type of the human body can be copied accurately.
The invention forms an animation sequence from the initial posture to the target posture, completes the whole process of the skeleton driving the body-surface mesh in a frame-by-frame manner, and, after the model is produced, optimizes the two-dimensional image it generates with image processing methods, so that the overall garment-change result reaches a high standard.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments described in the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram illustrating a fusion process of a complete human body model and a garment model according to an embodiment;
FIG. 2 is a schematic diagram of a process for obtaining a target mannequin and a garment model according to one embodiment;
FIG. 3 is a schematic process flow diagram of a model parameter acquisition module of an embodiment;
FIG. 4 is a schematic diagram of the system of the present invention.
Detailed Description
Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between them. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The method described in the embodiments of the present invention is explained in detail below with reference to the accompanying figs. 1 to 3.
As shown in figs. 1-3, the present invention provides a fusion processing method for human body model images, the method comprising: building a three-dimensional basic mannequin in an initial posture, wherein the initial posture parameters are determined by the initialization parameters of the basic mannequin model; obtaining a three-dimensional clothing model; fitting the three-dimensional clothing model onto the three-dimensional basic mannequin in the initial posture; acquiring secondary information of the human body model from the target two-dimensional human body image using a neural network model; obtaining the posture and body-type parameters of the target human body model through neural network regression prediction from the secondary information, wherein the three-dimensional posture and body-type parameters correspond to the bones and the several base parameters of the three-dimensional basic mannequin; inputting the obtained sets of base and skeleton parameters into the basic mannequin model for fitting, to obtain the target posture and target body type; driving the skeleton of the human body model to move from the initial posture to the target posture; obtaining a three-dimensional target human body model that has the same posture as the target two-dimensional image and has completed the change into the three-dimensional clothing model; fusing the head image from the picture with the target human body model, and processing any limb regions that are not fully aligned; restoring the skin color of the exposed parts of the body; and obtaining a 2D picture composed of the target person's head, limbs, the background of the target image, and the new clothes worn on the target human body model. Of course, a strict sequence among these steps need not be observed, because some of them are independent preparation steps whose placement has no decisive influence on the final product.
It can be seen that the flow of the present invention involves roughly five parts. First, the three-dimensional basic mannequin (standard human body model) is generated; second, the three-dimensional clothing model is fitted onto the basic mannequin; third, the parameters of the target-posture human body model are obtained; fourth, the body type and posture of the standard human body model are fitted to match the target human body model, and the clothing model is moved to the target model's position; fifth, the two-dimensional picture output by the human body model is fused with the original target two-dimensional picture.
The first part is the advance design and modeling of basic mannequins. The main work is to construct, in combination with a mathematical model, a three-dimensional basic mannequin, i.e., the base human platform. The Max Planck SMPL human body model avoids surface distortion of the body during motion and accurately depicts the shapes of stretching and contracting muscles. In that method, beta and theta are the input parameters: beta comprises 10 parameters for attributes such as height, weight, and head-to-body ratio, and theta comprises 75 parameters for the overall motion pose and the relative angles of 24 joints. The beta parameters are shape blend-shape parameters: 10 incremental templates control the change of body shape. However, because SMPL was trained on Western body pictures and measurement data and conforms to Western body types, its body-variation rules basically follow the common variation curves of Westerners; applying it to the modeling of Asian bodies raises a number of problems, such as the proportions of arms and legs, of waist and torso, and of the neck, as well as leg and arm lengths.
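For orientation, a schematic numpy version of this SMPL-style parameterization (shape blend shapes plus linear blend skinning) is sketched below; it is a simplification for illustration, not the patent's own mannequin:

```python
import numpy as np

def smpl_like_forward(T_template, shape_dirs, beta, joint_transforms, weights):
    """Illustrative SMPL-style forward pass (not the patent's mannequin).
    T_template       : (V, 3)     template mesh vertices
    shape_dirs       : (V, 3, B)  shape blend-shape basis, B ~ 10 (beta)
    beta             : (B,)       shape coefficients
    joint_transforms : (J, 4, 4)  per-joint rigid transforms from theta
    weights          : (V, J)     linear-blend-skinning weights
    """
    # Shape blend shapes: offset the template by the beta-weighted basis.
    v_shaped = T_template + shape_dirs @ beta
    v_h = np.concatenate([v_shaped, np.ones((len(v_shaped), 1))], axis=1)
    # Linear blend skinning: blend the joint transforms per vertex.
    T = np.einsum('vj,jab->vab', weights, joint_transforms)
    return np.einsum('vab,vb->va', T, v_h)[:, :3]
```

Because every beta coefficient moves the whole mesh through a globally trained basis, the shape bases are coupled, which is exactly the decoupling difficulty the text describes.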
Therefore, we improve technical feasibility by making our own human body model. Its core is building a set of human blend body-type bases to realize accurate, independent control of the body. The three-dimensional basic mannequin has a mathematical weight relation between skeleton points and the model mesh, and the determined skeleton points can be associated with the human body model that determines the target posture. The mannequin is defined by a number of body-base parameters and a number of skeleton parameters: the body bases compose the whole mesh of the human model, and each base is controlled and varied independently by its own parameter without affecting the others. Preferably, the three-dimensional basic mannequin (base human platform) consists of 20 body-base parameters and 170 skeleton parameters. "Accurate control" means, on the one hand, that the number of control parameters is increased: we do not continue with SMPL's ten beta control parameters, so besides overall build we can also adjust arm length, leg length, and the girth of waist, hips, and chest, and the skeleton parameters are more than doubled, greatly enriching the adjustable range and providing a good basis for a finely designed basic mannequin. "Independent control" means that each base (waist, legs, hands, head, and so on) is controlled separately; the length of each bone can be adjusted independently without physical linkage, so fine adjustment of the human body model is easier to realize. This model embodies its correspondences by mathematical principle, clearly different from the big-data-trained SMPL model, so its parameter transformations are more interpretable and local body changes can be represented better; moreover, since the changes rest on mathematical principle, the parameters do not influence one another, and the arms and legs remain completely independent.
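The patent describes the bases only at the level of counts (20 body bases, 170 skeleton parameters) and the independence property; the following sketch shows one way such independent, region-masked bases could be organized, with all structural details being assumptions:

```python
import numpy as np

class IndependentBaseMannequin:
    """Sketch of the 'independent body bases' idea: each base offsets
    only its own vertex region, so the parameters never interact.
    The counts (20 bases, 170 bone parameters) follow the text; the
    data layout here is an assumption."""
    def __init__(self, template, base_dirs, base_masks):
        self.template = template      # (V, 3) base mesh
        self.base_dirs = base_dirs    # 20 offset fields, each (V, 3)
        self.base_masks = base_masks  # 20 vertex masks, each (V,) in {0, 1}

    def shape(self, params):          # params: (20,) body-base coefficients
        v = self.template.copy()
        for p, d, m in zip(params, self.base_dirs, self.base_masks):
            v += p * d * m[:, None]   # each base edits only its own region
        return v                      # arms, legs, waist... stay decoupled
```

Because each base's offset field is masked to its own region, changing the leg-length parameter cannot perturb the arms, which is the decoupling behavior the text contrasts with SMPL.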
The second part mainly generates the three-dimensional clothing model. In the prior art there are several approaches. The traditional method builds the three-dimensional clothes model from two-dimensional garment patterns that are designed and then sewn together; it requires garment expertise to design the patterns. A newer modeling method is based on hand drawing: a simple clothing model is generated from lines sketched by the user. Yet another method, starting from clothing picture information, combines image processing with graphics simulation to finally generate a virtual three-dimensional clothing model: the outline and size of the garment are obtained by contour detection and classification; edges and their key points are found from the outline by machine learning; sewing information is generated from the correspondence of the key points; and finally physical sewing simulation in three-dimensional space yields the realistic effect of the garment worn on a body. There are also mapping methods, mathematical-model simulation, and so on. The invention places no particular limitation on this part; the overall requirement is that, starting from a clothing model matched to the standard human body model, cloth physical simulation matches it to the human body model in the target posture while keeping the garment natural and plausible.
The third part processes the acquired human body image to obtain the parameter information required to generate the human body model. Previously, skeletal key points were usually selected manually, but this is inefficient and ill-suited to the fast pace of the internet era; now that neural networks are prevalent, using deep-learning networks instead of manual key-point selection is the trend. How to use neural networks efficiently, however, is a problem that needs further study. In general, we adopt the idea of two successive neural networks plus data refinement to construct the parameter acquisition system. As shown in figs. 2-3, we use deep-learning neural networks to generate these parameters through the following sub-steps: 1) acquiring a two-dimensional image of the target human body; 2) processing it to obtain a two-dimensional body contour image of the target; 3) feeding the contour image into the first deep-learning neural network for joint-point regression; 4) obtaining the joint-point map of the target human body, semantic segmentation maps of the body parts, body key points, and body bone points; 5) feeding the generated joint-point map, semantic segmentation map, bone points, and key-point information into the second deep-learning neural network for regression of the posture and body-type parameters; 6) obtaining the output three-dimensional human body parameters, including three-dimensional action posture parameters and three-dimensional body shape parameters.
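A schematic of this two-stage data flow is sketched below; the layer choices are stand-ins, since the patent specifies the inputs and outputs of each network but not its architecture:

```python
import torch
import torch.nn as nn

class Stage1(nn.Module):
    """Contour image -> intermediate maps. Backbone and head sizes
    are placeholders; the patent fixes the data flow, not the layers."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.joint_head = nn.Conv2d(32, 24, 1)  # 24 joint heatmaps
        self.seg_head = nn.Conv2d(32, 15, 1)    # 15 body-part classes

    def forward(self, contour):
        feats = self.backbone(contour)
        return self.joint_head(feats), self.seg_head(feats)

class Stage2(nn.Module):
    """Intermediate maps -> pose (theta) and shape (beta) parameters."""
    def __init__(self, n_pose=75, n_shape=10):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(8)
        self.fc = nn.Linear((24 + 15) * 8 * 8, n_pose + n_shape)
        self.n_pose = n_pose

    def forward(self, joints, seg):
        x = self.pool(torch.cat([joints, seg], dim=1)).flatten(1)
        out = self.fc(x)
        return out[:, :self.n_pose], out[:, self.n_pose:]  # theta, beta
```

The parameter counts (24 joints, 75 pose parameters, 10 shape parameters) echo the SMPL discussion above and are likewise illustrative for the self-made mannequin.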
Before the two-dimensional human body image is input into the first neural network model, the network must be trained. The training samples are standard two-dimensional human body images annotated with the original joint-point positions, which are marked on the images manually with high accuracy. Here a target image is first acquired and human body detection is run on it with a target detection algorithm. Human detection here does not mean measuring a real human body with instruments; in the invention it means that, for any given image, usually a two-dimensional picture containing enough information (face, limbs, and torso all present in the picture), a certain search strategy is applied to determine whether the image contains a human body and, if so, to output parameters such as its position and size. In this embodiment, before the body key points are extracted from the target image, human body detection is performed to obtain a bounding box indicating the body's position; since the image input by the user can be any image, backgrounds unrelated to the body, such as tables and chairs, trees, cars, and buildings, are unavoidable, and these useless backgrounds are removed with mature algorithms.
At the same time, semantic segmentation, joint-point detection, bone detection, and edge detection are carried out; gathering this 1D point information and 2D surface information lays a good foundation for the later generation of the 3D human body model. A first-stage neural network is used to generate the joint-point map of the human body; alternatively, a region-proposal target detection algorithm based on a convolutional neural network can be used. The first neural network needs massive training data: photos collected from the web are labeled manually and then fed into the network for training. After deep learning, the network can, immediately upon receiving a photo, produce a joint-point map with essentially the same accuracy as manual annotation, while being tens or even hundreds of times more efficient.
After the relevant 1D point and 2D surface information is obtained, these parameters or results, namely the joint-point map, the semantic segmentation map, the body bone points, and/or the key-point information of the target human body, are fed as inputs into the second deep-learning neural network for regression of the posture and body-type parameters. Through this regression, several groups of three-dimensional human body parameters, including three-dimensional action posture parameters and three-dimensional body shape parameters, are output immediately. Preferably, the loss function of the neural network is designed from the three-dimensional base model, the predicted three-dimensional body model, the standard two-dimensional body image annotated with the original joint positions, and the standard two-dimensional body image containing the predicted joint positions.
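The exact form of the loss is not given; a common shape for such a loss, offered here only as an assumption, combines a 2D joint term against the manual annotations with light regularization of the predicted parameters:

```python
import torch

def fitting_loss(pred_joints_2d, gt_joints_2d, theta, beta,
                 w_joint=1.0, w_reg=1e-3):
    """Assumed loss shape: penalize the distance between predicted and
    manually labeled 2D joint positions, plus regularization of the
    predicted pose/shape parameters. Weights are illustrative."""
    joint_term = torch.mean((pred_joints_2d - gt_joints_2d) ** 2)
    reg_term = torch.mean(theta ** 2) + torch.mean(beta ** 2)
    return w_joint * joint_term + w_reg * reg_term
```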
The fourth part fits the obtained parameters to the human body model and matches the clothing model to the fitted human body model.
Preferably, the driving process further includes: driving the skeleton of the human body model from the initial posture to the target posture by frame interpolation; the movement of the skeleton drives the following movement of the human body model mesh; the clothing model is driven frame by frame to follow the human body model to the target posture; and the cloth solving of the clothing model proceeds in step with the movement of the human body model mesh, the physical simulation of the cloth being computed after all bones have completed each frame's movement.
To fit the motion posture of the target body faithfully and to better match the cloth simulation, a frame-interpolation method is designed for the change of the skeleton from the initial to the target posture, giving the clothing model high realism and fidelity at the level of the simulation mechanism. Driving the human body model is in fact a process of repeated computation and repeated verification: when parameters are substituted into the model for computation, a small step keeps the simulated result close to reality, whereas the larger the computation span, the faster the model's distortion grows. In our invention, the movement from the initial position to the target position is decomposed into several small-amplitude actions, and an animation sequence from the initial to the target pose is generated by frame interpolation over a time series. Linear interpolation, nearest-neighbor interpolation, and the like can be chosen to form the time series of skeleton states from the initial to the target posture, on which a chronological sequence of process animations is built. The whole action is decomposed into a many-frame animation sequence; the model moves very little in each frame, and the large-range action is accumulated, avoiding the one-step driving mode. Through interpolation between the initial and target poses, cloth computation, collision-body computation, and verification are carried out frame by frame while the model is driven a small amount per frame. When the target position is reached, the overall cloth-solving result of the clothing model is therefore far better than the one-step method, and the final visual effect of the clothing model is more realistic and better restored.
Preferably, after the initial skeleton state and the target posture parameters are obtained, the skeleton is driven from the initial posture to the target posture, and the time series of skeleton states from the initial to the target posture is formed by frame interpolation using linear or nearest-neighbor interpolation. This ensures that the movement of all skeletal joint points satisfies the requirement of small motion amplitude, so that the motion change better matches reality and facilitates the subsequent following of the body mesh and the small-amplitude changes of the clothing model.
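A minimal sketch of this interpolation step follows; the array shapes and frame count are illustrative:

```python
import numpy as np

def interpolate_pose(theta_init, theta_target, n_frames=30, mode="linear"):
    """Build the time series of bone parameters from the initial
    posture to the target posture by frame interpolation."""
    t = np.linspace(0.0, 1.0, n_frames)[:, None]
    if mode == "linear":
        return (1 - t) * theta_init + t * theta_target
    if mode == "nearest":
        return np.where(t < 0.5, theta_init, theta_target)
    raise ValueError(f"unknown mode: {mode}")
```

Each frame of this sequence drives the skeleton a small step; the mesh follows through the skinning weights, and the cloth simulation is solved once per frame, as described above.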
Preferably, while the animation sequence is generated, the human model mesh also moves by frame interpolation: after the skeleton moves in each frame, the vertices of the human model in the current state, i.e., the surface information, are obtained from the weight parameters of the standard human model, and the current state of the mesh is updated, recorded, and stored.
The fifth part fuses the two-dimensional picture output by the human body model with the original target two-dimensional picture, and is the core part of the invention.
The whole process is in fact a process that goes from a two-dimensional picture back to a two-dimensional picture: the user inputs an original two-dimensional image of himself or herself and selects the garment to change into, and a composite photo after the garment change can be obtained directly, in which the user's head, hands, exposed skin color, and picture background are consistent with the original two-dimensional image, the only difference being that the clothes have been changed into the selected ones.
Preferably, the fusion process further comprises: (1) performing portrait matting on the target two-dimensional human body image while keeping the head image; (2) processing the dressed three-dimensional human body model and outputting it as a two-dimensional image of the same size as the two-dimensional human body image, with no background and with the head image removed; (3) superimposing the two-dimensional image output by the human body model onto the target two-dimensional human body image, and checking whether the target person's head image is consistent with the neck of the model's output image; (4) if they are inconsistent, splicing the junction regions by pixel fusion.
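One simple way to splice the junction by pixel fusion, sketched here as an assumption (the patent does not fix the blending operator), is to feather a seam mask and alpha-blend the two layers across it:

```python
import cv2
import numpy as np

def blend_neck_seam(original, model_out, seam_mask, ksize=21):
    """Feather the seam mask and alpha-blend the model's output over
    the original picture across the head/neck junction.
    seam_mask: float32 in [0, 1], 1 where the model layer should win."""
    alpha = cv2.GaussianBlur(seam_mask, (ksize, ksize), 0)[..., None]
    out = alpha * model_out.astype(np.float32) \
        + (1.0 - alpha) * original.astype(np.float32)
    return out.astype(np.uint8)
```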
Put simply, image stitching involves the following steps: (1) extracting feature points from each image; (2) matching the feature points; (3) performing image registration; (4) copying one image to a specific location in the other; (5) special processing of the overlap boundary. Image stitching is realized through image registration and image fusion, which together handle the slight overlap of two images within the same picture and finally stitch a high-resolution multi-image result. Image registration mainly solves the problem of transforming two images from their respective coordinates into one common coordinate frame, and image fusion mainly solves the gray-value problem of the pixels of the stitched image.
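Steps (1)-(3) correspond to a standard feature-based registration pipeline; a hedged OpenCV sketch follows (the patent does not prescribe SIFT here, which is used only as a familiar example):

```python
import cv2
import numpy as np

def register_pair(img_a, img_b):
    """Steps (1)-(3): extract SIFT features, match them, and estimate
    the homography mapping img_b into img_a's coordinate frame."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_b, des_a, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    src = np.float32([kp_b[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_a[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H  # step (4) then warps img_b by H; step (5) blends the seam
```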
Owing to limitations of hardware design, captured images carry many different kinds of noise and may not meet image quality requirements, so the original image needs preprocessing such as denoising and correction. The precision of this preprocessing stage has a great influence on the quality of the final stitched image; its goal is to enhance the detail information of the image, suppress noise, and improve image quality. Common preprocessing methods include the following:
(1) Image smoothing and edge sharpening. Because the images are shot from different viewing angles, undergo different perspective distortions and contain random noise, image sequences with overlapping areas are never exactly identical in the details of the overlapped parts; therefore only the contour or other salient edges should be selected as the basis for feature matching.
(2) Phase correlation algorithm. If one image is a translation of the other, the translation can be converted to a phase difference in the frequency domain. The inverse Fourier transform of this phase difference is an impulse at the translational offset, so the displacement between the two images can be obtained by searching for the position of the maximum value. A minimal sketch of this method follows the list.
(3) Gray-scale projection algorithm. If the translation in the vertical direction is negligible and the translation in the horizontal direction is small, the gray projection algorithm can roughly position two adjacent images, reducing errors and narrowing the search range for the subsequent precise registration. The color image is first converted to gray scale (or further to a binary image), the gray values of the pixels in each column are accumulated by projecting them in the vertical direction, and the approximate matching position is found by comparing the projection curves of adjacent images.
(4) Screening of video sequence subsets. When stitching is based on video, the video sequence images must first be screened. Adjacent frames of a video sequence carry a substantial amount of overlapping information and are displaced from each other only slightly; therefore, to reduce registration errors and discontinuities in the stitched image and to cut the amount of computation, only a subset of the frames is selected rather than the whole sequence.
(5) Template-matching-based algorithm. A block in the overlapping region of one image is taken as a template, and the other image is searched for a corresponding block with the same or similar values, thereby determining the overlapping range of the two images. In general, the larger the template area, the higher the accuracy of the algorithm, but also the higher the computational complexity.
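A minimal sketch of the phase correlation method (2), assuming OpenCV; the translation is recovered as the peak of the inverse-transformed phase difference:

    import cv2
    import numpy as np

    def estimate_shift(img_a, img_b):
        a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY).astype(np.float32)
        b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY).astype(np.float32)
        (dx, dy), response = cv2.phaseCorrelate(a, b)
        return dx, dy, response   # peak position = translation; response = confidence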
The key to image stitching is to accurately find the position of the overlapping part in two adjacent images and then determine the transformation relation between the two images, i.e. image registration. Feature-based image registration has the advantages of being insensitive to brightness and noise and of handling large misalignment between images, so it is the approach generally adopted at present.
The image registration scheme comprises two phases: a pre-registration phase and a registration phase. First, in the pre-registration phase, perspective transformations are applied to a reference image to generate a training set, feature coefficients are extracted from the training set with the SIFT method, and these coefficients are input into the SLFN (a single-hidden-layer feedforward network) for training; the outputs of the trained SLFN are the perspective transformation parameters. Since the SLFN has already been trained, in the registration phase we simply extract feature coefficients from the image to be registered with the same feature extraction method and input them into the trained SLFN to obtain the estimated perspective parameters. That is, the reference picture can be regarded as a "base": the training set is obtained by applying perspective transformations to it, so the perspective transformation parameters of each training sample relative to the reference picture are known. For example, one image in the training set may be translated by 3 pixels along the x-axis and 2 pixels along the y-axis, rotated by 20 degrees, with a horizontal distortion of 0.001 and a vertical distortion of -0.003.
The training set is composed of 200 images, each obtained by transforming the reference image a with perspective parameters drawn from a predefined range. The training image most similar to the image b to be registered then identifies the corresponding perspective transformation parameters.
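A sketch of how such a training set might be generated, pairing SIFT descriptors with the known perspective parameters of each sample; the parameter ranges are assumptions chosen to cover the example values above:

    import cv2
    import numpy as np

    rng = np.random.default_rng(0)

    def random_homography():
        tx, ty = rng.uniform(-5, 5, 2)              # translation in pixels
        theta = np.deg2rad(rng.uniform(-25, 25))    # rotation angle
        g, h = rng.uniform(-0.003, 0.003, 2)        # horizontal/vertical distortion
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s, tx],
                         [s,  c, ty],
                         [g,  h, 1.0]])

    def make_training_set(reference_gray, n=200):
        sift = cv2.SIFT_create()
        size = (reference_gray.shape[1], reference_gray.shape[0])
        samples = []
        for _ in range(n):
            H = random_homography()
            warped = cv2.warpPerspective(reference_gray, H, size)
            _, desc = sift.detectAndCompute(warped, None)
            samples.append((desc, H))   # (feature coefficients, known parameters)
        return samples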
The perspective transformation is one of the most common and most complex transformations between two-dimensional images, since it accounts for all possible motion patterns between them: translation, zoom, rotation, and horizontal and vertical distortion. Because the reference image and the image to be registered each lie in their own pixel coordinate system, we need to convert them into the same pixel coordinate system.
In homogeneous coordinates the transformation can be written as

    [u, v, w]^T = H · [x1, y1, 1]^T,    x2 = u / w,    y2 = v / w,

    H = | h11  h12  h13 |
        | h21  h22  h23 |
        | h31  h32   1  |
where (x1, y1) are coordinates in the reference image and (x2, y2) are the corresponding coordinates, after the perspective transformation, in the image to be registered. H is the perspective transformation matrix; with its last element normalized to 1 it contains 8 free parameters and covers essentially all possible motion patterns between images, such as translation, rotation and scaling.
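A small worked sketch of applying H; the matrix below reuses the translation and distortion values quoted earlier (3 px, 2 px, 0.001, -0.003) but omits the rotation for brevity:

    import numpy as np

    H = np.array([[1.0,    0.0,   3.0],
                  [0.0,    1.0,   2.0],
                  [0.001, -0.003, 1.0]])

    def apply_homography(H, x1, y1):
        u, v, w = H @ np.array([x1, y1, 1.0])
        return u / w, v / w   # divide out the homogeneous coordinate

    x2, y2 = apply_homography(H, 100.0, 50.0)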
In general, feature-based registration methods estimate the transformation matrix between images by extracting distinctive blocks, lines and points in the images as features. The general steps of image registration under this method are: (a) extract features from the images to be registered; (b) match the image features; (c) estimate the transformation matrix between the images from the matched features; (d) align the images using the transformation matrix. A sketch of this pipeline follows.
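Steps (a)-(d) might look like the following sketch, assuming OpenCV's SIFT detector, ratio-test matching and RANSAC homography estimation:

    import cv2
    import numpy as np

    def register(ref_gray, mov_gray):
        sift = cv2.SIFT_create()
        kp1, d1 = sift.detectAndCompute(ref_gray, None)   # (a) extract features
        kp2, d2 = sift.detectAndCompute(mov_gray, None)
        matches = cv2.BFMatcher().knnMatch(d2, d1, k=2)   # (b) match features
        good = [m for m, n in matches if m.distance < 0.75 * n.distance]
        src = np.float32([kp2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # (c) estimate H
        h, w = ref_gray.shape
        return cv2.warpPerspective(mov_gray, H, (w, h))       # (d) align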
Feature detection is the basis of image registration, and the detection method should be chosen according to the scene characteristics of the images to be stitched. Commonly used edge detection methods include the Roberts, Sobel, Prewitt and Canny operators, and the most commonly used corner detection method is the Harris corner detection algorithm, sketched below.
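A minimal Harris corner detection sketch, assuming OpenCV; the block size, Sobel aperture and k = 0.04 are the customary illustrative defaults:

    import cv2
    import numpy as np

    def harris_corners(img_bgr, thresh_ratio=0.01):
        gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
        response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
        # Keep positions whose corner response exceeds a fraction of the maximum
        return np.argwhere(response > thresh_ratio * response.max())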
Image fusion is a technique that integrates the useful information of two registered images into one image and presents it visually. Owing to differences in resolution, viewing angle and illumination, and because stitching is sometimes performed across multispectral images, the overlapped part of a stitched image may show blur, ghosting or noise, and an obvious seam may appear at the boundary; to improve the visual effect and objective quality of the stitched image, the registered images must be fused.
Current fusion algorithms can be roughly divided into fusion algorithms based on image gray level, fusion algorithms based on color space transformation, and fusion algorithms based on a transform domain. Among the gray-level methods: (1) the weighted average method is the simplest image fusion algorithm, in which corresponding pixels of the two images are multiplied by weighting coefficients and then added to obtain the fused image (a feathering sketch of this follows); (2) the fusion algorithm based on a region of interest can be regarded as an adaptive weighted average method: the region of interest of one image is segmented, its weighting coefficient is set to 1 and that of the corresponding region of the other image to 0, so that the region of interest of one image is embedded into the other; (3) the contrast modulation method uses the detail information contained in one image to modulate the gray distribution of the other image to realize fusion. These methods achieve good fusion and alignment of the images.
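The weighted average method of item (1) can be sketched as a linear feather across the overlap; the column-wise overlap layout is an assumption made for illustration:

    import numpy as np

    def feather_blend(img_a, img_b, overlap):
        """img_a and img_b are same-height float32 images whose last/first
        `overlap` columns cover the same scene; weights ramp 1 -> 0."""
        w = np.linspace(1.0, 0.0, overlap)[None, :, None]
        seam = img_a[:, -overlap:] * w + img_b[:, :overlap] * (1.0 - w)
        return np.concatenate(
            [img_a[:, :-overlap], seam, img_b[:, overlap:]], axis=1)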
Pixel-level image fusion is the most basic fusion method. The image obtained after pixel-level fusion retains more detail information, such as edges and texture, which facilitates further analysis, processing and understanding of the image, can expose potential targets, and aids operations that detect and identify potential target pixels.
Preferably, at the part where the two-dimensional image output by the human body model joins the head of the original two-dimensional image, fusion adjustment is carried out according to the detected skin color of the face; at the other exposed limb parts of the output two-dimensional image, fusion adjustment is carried out according to the skin color of the exposed skin of the adjacent parts in the original two-dimensional image. The purpose of this step is to synthesize the same skin color as in the original picture: the hands, the neck and some bare limbs remain bare in the composite, and because garment styles differ, the newly generated mannequin is likely to expose newly bared skin after being dressed in the new clothes model. The skin color of the newly generated human body model must therefore be consistent with the naked skin color in the original picture; otherwise an obvious color difference results and the realism of the whole picture is seriously degraded.

Preferably, where the superimposed image after matting contains a pixel-missing part, the missing part is filled according to the background pixel colors in a certain area around it. This is because the background originally covered by the old clothes has no pixels after the clothes are replaced, so it must be filled with the surrounding colors for the background to look natural; a minimal inpainting sketch follows.
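The background filling can be sketched with off-the-shelf inpainting, assuming OpenCV; the mask construction and the 3-pixel radius are assumptions:

    import cv2

    def fill_missing_background(composed_bgr, missing_mask):
        """missing_mask: uint8 mask, 255 where the replaced garment has left
        background pixels with no value; filled from the surrounding colors."""
        return cv2.inpaint(composed_bgr, missing_mask, 3, cv2.INPAINT_TELEA)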
The fusion method according to the embodiments of the present invention described in connection with FIG. 1 to FIG. 3 may be implemented by an apparatus for processing the fitting and fusion of a human body model. FIG. 4 is a schematic diagram of the hardware architecture 300 of such an apparatus according to an embodiment of the invention.
The invention also discloses a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the method and steps described above.
The electronic equipment comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus; the memory is used for storing a computer program; the processor is used for implementing the method and steps described above when executing the program stored in the memory.
As shown in fig. 4, the apparatus 300 in the present embodiment includes: the device comprises a processor 301, a memory 302, a communication interface 303 and a bus 310, wherein the processor 301, the memory 302 and the communication interface 303 are connected through the bus 310 and complete mutual communication.
In particular, the processor 301 may include a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing an embodiment of the present invention.
Memory 302 may include mass storage for data or instructions. By way of example, and not limitation, memory 302 may include an HDD, a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive or a combination of two or more of these. Memory 302 may include removable or non-removable (or fixed) media, where appropriate. The memory 302 may be internal or external to the processing device 300, where appropriate. In a particular embodiment, the memory 302 is a non-volatile solid-state memory. In a particular embodiment, the memory 302 includes Read Only Memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), electrically rewritable ROM (EAROM), or flash memory or a combination of two or more of these.
The communication interface 303 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiment of the present invention.
The bus 310 comprises hardware, software, or both to couple the components of the processing device 300 to one another. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a Hypertransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an infiniband interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-x) bus, a Serial Advanced Technology Attachment (SATA) bus, a video electronics standards association local (VLB) bus, or other suitable bus or a combination of two or more of these. Bus 310 may include one or more buses, where appropriate. Although specific buses have been described and shown in the embodiments of the invention, any suitable buses or interconnects are contemplated by the invention.
That is, the apparatus 300 shown in fig. 4 may be implemented to include: a processor 301, a memory 302, a communication interface 303, and a bus 310. The processor 301, memory 302 and communication interface 303 are coupled by a bus 310 and communicate with each other. The memory 302 is used to store program code; the processor 301 executes a program corresponding to the executable program code by reading the executable program code stored in the memory 302 for executing the fusion method in any embodiment of the present invention, thereby implementing the method and apparatus described in conjunction with fig. 1 to 3.
The embodiment of the invention also provides a computer storage medium, wherein the computer storage medium is stored with computer program instructions; the computer program instructions, when executed by a processor, implement the fusion method provided by embodiments of the present invention.
It is to be understood that the invention is not limited to the specific arrangements and instrumentalities described above and shown in the drawings. A detailed description of known methods is omitted here for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated; those skilled in the art can make various changes, modifications and additions, or change the order of the steps, after comprehending the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
As described above, only the specific embodiments of the present invention are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present invention, and these modifications or substitutions should be covered within the scope of the present invention.

Claims (10)

1. A method for fusion processing of human body model images, the method comprising:
1) manufacturing a three-dimensional basic mannequin in an initial posture, wherein the initial posture parameters are determined by the initialization parameters of a basic mannequin model;
2) obtaining a three-dimensional clothing model;
3) fitting the three-dimensional garment model to a three-dimensional basic mannequin model of the initial posture;
4) acquiring secondary information of the human body model according to the target human body two-dimensional image by using the neural network model;
5) obtaining posture and body type parameters of a target human body model through neural network model regression prediction according to the secondary information, wherein the three-dimensional human body posture and body type parameters correspond to bones of a three-dimensional basic mannequin and a plurality of basic parameters;
6) inputting the obtained multiple groups of base and skeleton parameters into a basic mannequin model for fitting to obtain a target posture and a target body type;
7) driving the skeleton of the human body model to move from the initial posture to the target posture;
8) obtaining a three-dimensional target human body model which has the same posture as the two-dimensional image of the target human body and finishes the changing of the three-dimensional garment model;
9) fusing the picture head portrait with the target human body model image, and processing the limb parts that are not completely aligned;
10) restoring the skin color of the naked part of the human body;
11) obtaining a 2D picture composed of the target person's head portrait and limbs, the target image background, and the changed clothes worn on the target human body model.
2. The method of claim 1, wherein the fusion process further comprises: (1) performing portrait matting processing on a two-dimensional image of a target human body, but keeping a head image; (2) the three-dimensional human body model which is changed is processed and then output into a two-dimensional image which has the same size with the two-dimensional human body image, does not have any background and removes the head image; (3) superposing the two-dimensional image output by the human body model on the target two-dimensional human body image, and checking whether the head image of the target human body is consistent with the neck of the two-dimensional image output by the human body model; (4) and if the difference is not consistent, splicing the joint parts by adopting a pixel fusion mode.
3. The method according to claim 2, characterized in that the human body model outputs the part of the two-dimensional image connected with the head of the original two-dimensional image, and fusion adjustment is carried out according to the detected skin color of the human face; the human body model outputs other exposed parts of the limbs of the two-dimensional image, and fusion adjustment is carried out according to the skin color of the exposed skin of the adjacent part in the original two-dimensional image.
4. The method according to claim 3, wherein the superimposed image after matting contains a pixel-missing part, and the missing part is filled according to the background pixel colors in a region around it.
5. The method of claim 1, wherein the driving process further comprises: driving the skeleton of the human body model to move to a target posture from the initial posture by adopting a frame interpolation mode; the movement of the skeleton drives the following movement of the human body model mesh; driving the clothing model to move to a target posture along with the human body model frame by frame; the cloth calculation process of the clothing model and the movement process of the human body model grid are synchronously carried out, and the physical simulation calculation of the cloth is carried out after all the skeletons finish the movement of each frame.
6. The method according to claim 5, characterized in that after obtaining the initial state of the bone information and the target pose state parameters, the bone is driven to move from the initial pose to the target pose, and the time series of the bone information from the initial pose to the target pose is formed by frame interpolation of linear interpolation or nearest neighbor interpolation.
7. The method of claim 6, wherein during the generation of the animation sequence, the movement of the mesh of the human body model is performed by frame interpolation, after each frame drives the movement of the skeleton, the vertex-plane information of the human body model in the current state is obtained by calculating the weight parameters of the standard human body model, and the state of the mesh of the current human body model is updated, recorded and saved.
8. The method of claim 1, wherein the step of obtaining parameters of the target human body model further comprises: 1) obtaining a two-dimensional image of the target human body; 2) processing it to obtain a two-dimensional human body contour image of the target human body; 3) substituting the two-dimensional human body contour image into a first neural network subjected to deep learning to carry out regression of the joint points; 4) obtaining a joint point map of the target human body, semantic segmentation maps of the parts of the human body, body key points and body skeleton points; 5) substituting the generated joint point map, semantic segmentation maps, body skeleton points and key point information of the target human body into a second neural network subjected to deep learning to carry out regression of human body posture and body type parameters; 6) obtaining output three-dimensional human body parameters, including three-dimensional human body action posture parameters and three-dimensional human body shape parameters.
9. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-8.
10. An electronic device is characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor and the communication interface are used for realizing mutual communication by the memory through the communication bus; a memory for storing a computer program; a processor for implementing the method steps of any of claims 1 to 8 when executing a program stored in the memory.
CN202011609643.8A 2020-12-28 2020-12-28 Human body model image fusion processing method, device and storage medium Pending CN114693570A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011609643.8A CN114693570A (en) 2020-12-28 2020-12-28 Human body model image fusion processing method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011609643.8A CN114693570A (en) 2020-12-28 2020-12-28 Human body model image fusion processing method, device and storage medium

Publications (1)

Publication Number Publication Date
CN114693570A true CN114693570A (en) 2022-07-01

Family

ID=82131704

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011609643.8A Pending CN114693570A (en) 2020-12-28 2020-12-28 Human body model image fusion processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN114693570A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115458128A (en) * 2022-11-10 2022-12-09 北方健康医疗大数据科技有限公司 Method, device and equipment for generating digital human body image based on key points
CN115458128B (en) * 2022-11-10 2023-03-24 北方健康医疗大数据科技有限公司 Method, device and equipment for generating digital human body image based on key points
CN116152122A (en) * 2023-04-21 2023-05-23 荣耀终端有限公司 Image processing method and electronic device
CN116152122B (en) * 2023-04-21 2023-08-25 荣耀终端有限公司 Image processing method and electronic device
CN116385666A (en) * 2023-06-02 2023-07-04 杭州倚澜科技有限公司 Human body model redirection method and device based on feedback type cyclic neural network
CN116385666B (en) * 2023-06-02 2024-02-27 杭州倚澜科技有限公司 Human body model redirection method and device based on feedback type cyclic neural network

Similar Documents

Publication Publication Date Title
CN109636831B (en) Method for estimating three-dimensional human body posture and hand information
Yang et al. Physics-inspired garment recovery from a single-view image
Hasler et al. Multilinear pose and body shape estimation of dressed subjects from image sets
Yang et al. Detailed garment recovery from a single-view image
US20190043269A1 (en) Methods, systems, and computer readable media for modeling garments using single view images
EP2686834B1 (en) Improved virtual try on simulation service
CN110310285B (en) Accurate burn area calculation method based on three-dimensional human body reconstruction
CN114119908A (en) Clothing model driving method, equipment and storage medium
CN114693570A (en) Human body model image fusion processing method, device and storage medium
EP3335195A2 (en) Methods of generating personalized 3d head models or 3d body models
US20140010449A1 (en) System and method for generating image data for on-line shopping
Wu et al. Full body performance capture under uncontrolled and varying illumination: A shading-based approach
CN114758213A (en) Cloth calculating method, equipment and storage medium for clothing model
CN114119911A (en) Human body model neural network training method, device and storage medium
US11922593B2 (en) Methods of estimating a bare body shape from a concealed scan of the body
US20210375045A1 (en) System and method for reconstructing a 3d human body under clothing
CN114119906A (en) Self-adaptive driving method and device for clothing model and storage medium
CN114119905A (en) Virtual fitting method, system, equipment and storage medium
CN114450719A (en) Human body model reconstruction method, reconstruction system and storage medium
Zheng et al. Image-based clothes changing system
CN114119910A (en) Method, equipment and storage medium for matching clothing model with human body model
Zhang et al. Data-driven flower petal modeling with botany priors
CN114202630A (en) Illumination matching virtual fitting method, device and storage medium
CN114119912A (en) Rapid fitting method and device for human body model and storage medium
Chalás et al. Generating various composite human faces from real 3D facial images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination