CN114202630A - Illumination matching virtual fitting method, device and storage medium - Google Patents

Illumination matching virtual fitting method, device and storage medium

Info

Publication number
CN114202630A
CN114202630A (application CN202010876706.XA)
Authority
CN
China
Prior art keywords
human body
model
dimensional
image
target human
Prior art date
Legal status
Pending
Application number
CN202010876706.XA
Other languages
Chinese (zh)
Inventor
周润楠
杨超杰
张涛
郑天祥
张胜凯
周一凡
周博生
Current Assignee
Beijing Momo Information Technology Co ltd
Original Assignee
Beijing Momo Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Momo Information Technology Co ltd
Priority to CN202010876706.XA
Publication of CN114202630A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0641 Shopping interfaces
    • G06Q30/0643 Graphical representation of items or shoppers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/04 Texture mapping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/50 Lighting effects
    • G06T15/506 Illumination models

Abstract

The invention discloses an illumination-matching virtual fitting method comprising the following steps: acquiring a two-dimensional image of a target human body; obtaining a semantic segmentation map of the target human body image; acquiring a two-dimensional image of the garment; matching prefabricated garments to the illumination intensity and illumination angle of the target human body image; and selecting the corresponding garment texture map to produce a three-dimensional garment model. The method preserves the realism and fidelity of the three-dimensional garment model through a series of techniques: the garment model is produced in advance from two-dimensional garment pictures, so the user takes no part in the behind-the-scenes work, and the system automatically matches the garment model with the corresponding illumination intensity and illumination angle. This suits the simple, fast character and trend of the internet era: uploading one photo is all the virtual dressing work the user needs to complete.

Description

Illumination matching virtual fitting method, device and storage medium
Technical Field
The invention belongs to the field of virtual dressing and fitting. It particularly relates to the human body modeling, garment modeling, and matching of garment model to body model used in virtual dressing, and in particular to a virtual fitting method, device and storage medium that compute and extract the relevant illumination information from a photo of the target human body and match garment illumination to it.
Background
With the development of internet technology, online shopping has become more and more popular. Compared with shopping in a physical store, online shopping offers more product variety, greater convenience and other advantages. However, some problems remain hard to solve when buying goods online, the most important being that the goods cannot be inspected on the spot. Among all categories of merchandise, this problem is most prominent for clothing. Whereas a shopper in a physical store can try garments on and check the effect in real time, online clothing shops cannot show the effect on the individual consumer: they can only provide pictures of models wearing the garment, or sometimes no fitting pictures at all, so consumers cannot intuitively judge in real time how well a garment matches their own figure. The result is a large volume of returns.
In response, operators have attempted to solve this problem by using virtual fitting technology to show consumers a simulated fitting effect. There are of course other real-world settings where virtual fitting and dressing techniques can be used, such as online games, so the technology has developed rapidly.
Virtual fitting is a technical application in which a user can check the dressing effect on a terminal screen in real time without actually putting on the clothes. Existing dressing technology mainly comprises planar fitting and three-dimensional virtual fitting. The former basically collects a picture of the user and a picture of the garment, stretches or compresses the garment to the size of the body, and then cuts and splices it to form a dressed image. Because the image processing is simple and crude, the result looks unreal: the user's actual body shape is completely ignored and the garment is simply pasted onto the user's photo, which cannot satisfy users. The latter usually collects three-dimensional information about the person with a 3D capture device and combines it with garment features, or manually inputs body measurements provided by the user, generates a virtual three-dimensional human mesh by fixed rules, and then combines the mesh with garment maps. Overall, such three-dimensional virtual fitting requires a large amount of data acquisition or three-dimensional computation, carries high hardware cost, and is hard to popularize among ordinary users.
With the development of cloud computing, artificial intelligence, and the processing capability of intelligent terminals, two-dimensional virtual fitting technology has emerged. Such techniques essentially comprise three steps: (1) process the personal body information provided by the user to obtain a target human body model; (2) process the garment information to obtain a garment model; (3) fuse the human body model and the garment model to generate a simulated picture of the person wearing the garment.
Regarding point (1), because of the accumulation of many uncertain factors such as process design, model parameter selection and neural network training method, the quality of the final dressed picture is often inferior to that of traditional three-dimensional virtual fitting. Building the human body model is the foundational step, and the subsequent dressing process must be based on the previously generated body model; once the body model is generated inaccurately, problems such as an excessive body-shape gap between the model and the fitter, lost skin texture, or missing body parts easily arise and degrade the final dressed picture.
In the general field of computer vision, human body modeling has many starting points, generally falling into three major categories: omnidirectional scanning of a real body with a 3D scanning device; three-dimensional reconstruction based on multi-view depth-of-field photography; and methods that combine a given image with a human body model to achieve three-dimensional reconstruction. Omnidirectional 3D scanning of a real body yields the most accurate information, but the equipment is usually expensive, requires close cooperation from the human subject, and the whole processing pipeline places high demands on hardware, so it is generally confined to professional fields. Multi-view three-dimensional reconstruction requires images of the reconstructed body from several overlapping views and a spatial transformation relationship between them; multiple camera rigs shoot many images that are stitched into a 3D model. Although operation is relatively simplified, computational complexity remains high, and in most cases only people present at the scene can provide multi-angle images. Moreover, a model stitched from depth-camera photos taken from several angles carries no body-scale data and cannot support 3D perception. Finally, the single-image method needs only one image: a neural-network-based intelligent generation method for three-dimensional human feature curves learns the weights and thresholds that describe the curves of the neck, chest, waist, hips and other parts, then directly generates three-dimensional body curves matched to the real body shape from section measurements such as girth, width and thickness, yielding a predicted human body model. However, because the input carries little information, the method still consumes a large amount of computation, and the final model effect is not satisfactory.
Regarding point (2), the prior art offers several ways of generating three-dimensional garment models. The traditional approach builds the garment model from two-dimensional pattern pieces that are designed and sewn. This requires garment expertise to design the patterns, which not every virtual-fitting user possesses, and the stitching relationships between pieces must be specified manually, which takes much time. Another, newer modeling method is based on hand drawing: a simple garment model can be generated from lines drawn by the user. But it requires professional drawing skill, has poor reproducibility and repeatability, demands much time for detailed drawing, and is hard to popularize at e-commerce scale. Both approaches suit innovative design of new garments more than three-dimensional modeling of garments already on sale. A third method combines image processing and graphics simulation on top of collected garment pictures to generate a virtual three-dimensional garment model: contour detection and classification recover the garment's outline and size from a picture; machine learning finds the edges and their key points in the contour; the correspondences between key points generate sewing information; and finally, physical sewing simulation in three-dimensional space yields the realistic effect of the garment worn on a body.
In conclusion, given internet technology and its network environment, directly outputting the final dressed image from a single photo of the person is undoubtedly preferable and the most convenient: the user need not travel anywhere, and one photo completes the whole virtual dressing process. The remaining question is whether the resulting photo can be made essentially equivalent to a real 3D simulated dressing; if so, this route will become mainstream. Within it, (1) how to obtain from a photo the body model closest to the person's real state, and (2) how to put a three-dimensional garment model onto that body model, become the two most important and unavoidable problems of virtual dressing.
On the first point, prior-art methods for constructing a human body model fall into several types: (1) regression-based methods reconstruct a voxel representation of the body with a convolutional neural network; the algorithm first estimates the positions of the main joints from the input picture, then, within a voxel grid of specified size placed according to the key points, describes the reconstructed body shape by which unit voxels are occupied; (2) methods that roughly mark simple skeleton key points on the image and then perform an initial match and fit of a human model to those rough key points to obtain the approximate body shape; (3) methods that represent the skeleton by 23 bone nodes, express the whole-body pose by the rotation of each node, and represent the body shape by 6890 vertex positions; given the bone node positions, shape and pose parameters are fitted simultaneously to reconstruct the three-dimensional body. Alternatively, a CNN predicts key points on the image and an SMPL model is fitted to obtain an initial body model; the fitted shape parameters are then used to normalize bounding boxes of the individual body joints, one box per joint, each represented by axial length and radius; finally the initial model and the regressed bounding boxes are combined into the three-dimensional reconstruction. These methods suffer from slow modeling, insufficient accuracy, and strong dependence of the reconstruction on the body and pose database used.
On the second point, the prior art contains virtual fitting solutions of the following kind: acquire a dressed reference human body model and an undressed target human body model; embed skeletons with the same hierarchical structure into the reference model and the target model; skin-bind each model to its skeleton; compute the rotation of the bones in the target skeleton and recursively adjust all of its bones so that the target and reference skeletons keep consistent poses; deform the target model's skin with an LBS skinning algorithm according to the bone rotations; and, on the basis of that skin deformation, transfer the garment model from the reference body to the target body. By first adjusting the target and reference models to the same pose, this method reduces the difficulty of transferring the garment from the reference body to the target body, converting an inefficient non-rigid registration problem into an efficient rigid one, and thereby solves the technical problem of automatically fitting garments to different bodies and poses while keeping the garment size unchanged before and after fitting. However, because this skinning method focuses on a fixed distance between garment and skin, it is fast but has great disadvantages in garment fit and realism, and is only suitable for occasions that need quick, simple garment motion following the skin mesh. In addition, the method only considers how to maximally restore and reproduce the garment model, not what the garment model's original appearance should be.
The prior art also discloses a method for reconstructing the geometric details of single-view human clothing by illumination separation, comprising: capturing human motion data with a single RGB camera to obtain an RGB image, extracting the person's pose from it, and solving the shape, pose and relative spatial position of the person in each frame; generating a two-dimensional garment mesh model from a preset garment template and, by particle simulation, sewing the garment pieces onto the person in the initial pose; transitioning the body pose to the pose of frame 1 of the video, running joint physical simulation of the three-dimensional garment, and simulating the garment frame by frame for all subsequent frames based on the body pose; solving the garment parameters by human body segmentation so that the simulated garment shape agrees with the segmentation map of the image; for each video frame, extracting the intrinsic illumination image and intrinsic albedo image by image illumination separation; obtaining per-vertex normals of the garment mesh from the physically simulated initial garment shape, and obtaining illumination information under a spherical-harmonic illumination assumption; solving the per-vertex garment deformation under preset spherical-harmonic illumination coefficients to obtain the garment's geometric details; and projecting the per-frame per-vertex deformation into a per-vertex local coordinate system and smoothing the projection coefficients over time to obtain the final dynamic garment detail reconstruction.
Although this prior art considers the influence of illumination conditions on the garment model, needs only one RGB camera to capture the body, obtains scene illumination from the intrinsic decomposition of the image so that illumination and garment surface detail can be solved jointly, and can simultaneously model and simulate the person and the clothes in a single RGB video, reconstructing the garment details of the input image well frame by frame, it pays so much attention to high-fidelity restoration of every frame that its computation method and principle are complex and must run continuously through the whole process. Time control over garment-model reconstruction is therefore poor, simulation takes long, and the method is unsuited to virtual dressing applications in internet scenarios.
Therefore, to keep pace with the development of the internet industry, the virtual-fitting niche always pursues three basic targets: minimum input information, minimum computation, and best effect. An optimal balance point must be found among the three: a garment matching method whose input is simple, whose computation does not exceed the capacity of terminal devices, and whose effect approaches that of professional equipment.
Disclosure of Invention
Based on the above problems, the present invention provides a three-dimensional garment model matching method, apparatus and storage medium that overcome the above problems.
The invention provides an illumination-matching virtual fitting method comprising the following steps:
1) acquiring a two-dimensional image of a target human body;
2) obtaining a semantic segmentation map of the target human body image;
3) acquiring a two-dimensional image of the garment;
4) matching prefabricated garments to the illumination intensity and illumination angle of the target human body image;
5) selecting the corresponding garment texture map to produce a three-dimensional garment model.
Further, the method also comprises a matching process between the garment model and the human body model: constructing a three-dimensional standard human body model in combination with a mathematical model, the standard model being in an initial position and pose; fitting the three-dimensional garment model onto the three-dimensional standard human body model at the initial position, i.e. the standard basic mannequin; computing three-dimensional target human body model parameters from the target human body two-dimensional image through a neural network model; inputting several groups of three-dimensional body parameters for pose (POSE) and body shape (SHAPE) into the three-dimensional standard human body model for fitting; and obtaining a dressed target human body model with the same pose and body shape as the target human body.
Further, a contour map of the target human body two-dimensional image is obtained, and the two-dimensional body contour image is fed into a deep-learned neural network for regression to obtain the semantic segmentation map of the target human body image.
Further, the matching process also includes: obtaining the RGB color space values of the target human body two-dimensional image, converting the RGB values into XYZ color space values, and converting those XYZ values into Lab color space values; combining the Lab image with the semantic segmentation map to obtain the L channel values of the target body part; and deriving the illumination intensity level of that part from the L channel values. Texture maps of the garment model under different illumination intensities are collected for the different intensity levels, and the texture map corresponding to the target body's illumination intensity is selected to render the garment model.
The method further comprises: obtaining the RGB values of the target human body image and converting them to gray values; combining the gray image with the semantic segmentation map to obtain the gray image of the target body part; computing the brightness difference Δ of each pixel in the horizontal and vertical directions within the image's region of interest, and from it the overall brightness-difference statistical feature Γ of the gray image; comparing against a pre-built library of image difference statistical features, computing the cosine similarity between the current picture and the library pictures, and taking the angle of the picture with the highest feature similarity as the estimated illumination angle of the current picture. Texture maps of the garment model under different illumination angles are collected, and the texture map corresponding to the target body's illumination angle is selected to render the garment model.
Furthermore, a computer-readable storage medium is provided in which a computer program is stored; when executed by a processor, the program carries out the method and steps described above.
An electronic device is also provided, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus; the memory stores a computer program; and the processor, when executing the program stored in the memory, implements the method and steps described above.
The beneficial effects of the invention are as follows:
1. The lighting condition of the virtual clothes is highly consistent with the target human body image. When a three-dimensional garment model is first made, it must be considered that an actually shot photo of the target body may reflect many different illumination conditions, with clearly different illumination intensities and angles. For example, if the light in the photo is poor and the person's face and skin are dark while the garment model was generated under good light, the dressed photo will likely show a dark face and bare limbs next to a very bright garment, and the dressing effect will look plainly unreal. The invention therefore prefabricates multiple models of each garment under various illumination conditions. First, the time-consuming work of producing the different garment models is completed in advance, so the user feels no processing delay after clicking to start dressing. Second, the garment model matches the body photo extremely naturally: dark together when dark, bright together when bright, with a very natural transition between garment and skin. This advantage shows especially when only a half-body garment or a single item is changed virtually, and the whole dressed photo has no sense of incongruity. For example, one cheongsam may have more than ten illumination variants, combining different intensity levels and angles; that is, one cheongsam exists as more than ten sets in the garment model library, so input pictures under these different illumination conditions can be matched perfectly.
2. The virtual clothes have a high degree of realism. Realism here actually includes two aspects: first, the clothes follow the changing state of the body naturally; second, the fabric texture of the clothes is reproduced faithfully. First, the three-dimensional garment model is put on the three-dimensional standard human body model, the body-shape parameters of the target body are input to obtain the target body model, and the garment model adapts itself as the standard model changes into the target model, so the change of the garment state is very realistic. In addition, cloth simulation is used to approximate a realistic fabric effect (which can of course differ from the true physical effect), with emphasis on faithful fabric-texture simulation, including accurate simulation of fabric prints. The cloth simulation is combined with skinning: parts that barely deform as the garment moves are skinned, while parts that deform during motion are cloth-simulated, balancing fidelity and computation speed. During cloth simulation, when the body model reaches the target pose, several frames of gravity computation are applied to the garment fabric to ensure its fidelity in the target pose.
3. User operation is simple. The invention analyzes a whole-body photo with a deep neural network to obtain accurate three-dimensional body model parameters, so the body can be modeled quickly from a single ordinary photo. Meanwhile, the three-dimensional garment model is produced in advance from two-dimensional garment pictures, so the user takes no part in the behind-the-scenes work: the user only selects the garment style to try on, and the system automatically matches the corresponding garment model. This suits the simple, fast character and trend of the internet era very well. The user needs no preparation; uploading one photo is all the work required. Applied to scenes such as entertainment applets or online shopping, the invention can greatly enhance user experience and stickiness. Obtaining, without a depth-of-field camera or multiple camera rigs, a 3D model that corresponds to the real shape of the body opens wide application scenarios for industries such as clothing and health.
4. Deep neural networks are used extensively. The invention fully exploits the advantages of deep learning networks and can restore human pose and body shape with high accuracy in various complex scenes. Different neural networks serve different purposes; by using network models with different input conditions and training schemes, accurate contour separation of the body against a complex background, semantic segmentation of the body, and determination of key points and joint points are achieved, eliminating the influence of loose clothing and hairstyle and approaching the real body shape as closely as possible.
5. The human body model is accurate and controllable. The most commonly used parameterized model is the Max Planck SMPL model, which contains two sets of 72 parameters describing body pose and body size. However, the SMPL model is mainly trained by deep learning on a large number of body examples; the relationship between body shape and shape bases is a global association that is hard to decouple, so a chosen body part cannot be controlled at will. The human body model of the invention, by contrast, is not obtained by training: its parameters correspond on a mathematical basis, that is, each group of parameters is independent and does not drag the others along, so the model is more interpretable during transformation and better represents the shape change of a specific body part.
The invention preserves the realism and fidelity of the three-dimensional garment model through a series of methods. It fully considers the influence of illumination intensity and illumination angle on final quality when matching the garment model to the body model, prepares several groups of garment models under different illumination conditions in advance, and performs virtual fitting with the garment model that best matches the input picture, keeping processing fast while keeping the garment simulation realistic.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is an overall process flow diagram of one embodiment;
FIG. 2 is a flow diagram of the processing of the illumination intensity parameter acquisition module of one embodiment;
FIG. 3 is a flowchart of the processing of the illumination angle parameter obtaining module of one embodiment;
FIG. 4 is a schematic diagram of the system of the present invention.
Detailed Description
Features and exemplary embodiments of various aspects of the present invention are described in detail below. In order to make the objects, technical solutions and advantages of the present invention more apparent, the invention is further described in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are merely illustrative of the invention and are not to be construed as limiting it. It will be apparent to one skilled in the art that the invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the invention by illustrating examples of it.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between them. Also, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The following describes a method for processing a human body image according to an embodiment of the present invention in detail with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention discloses an illumination-matching virtual fitting method, the method comprising:
1) acquiring a two-dimensional image of a target human body;
2) obtaining a semantic segmentation map of the target human body image;
3) acquiring a two-dimensional image of the garment;
4) matching prefabricated garments to the illumination intensity and illumination angle of the target human body image;
5) selecting the corresponding garment texture map to produce a three-dimensional garment model.
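Viewed end to end, the five steps compose into one pipeline. The sketch below is a minimal illustration only: every helper on the `steps` object is a hypothetical name standing in for a module described later in this section, and `texture_library` is the prefabricated set of garment texture maps keyed by illumination level and angle.

```python
# A minimal sketch of the five claimed steps as one pipeline (all helper
# names are hypothetical; texture_library holds the prefabricated maps).
def illumination_matched_fitting(photo, garment_photo, steps, texture_library):
    body_img = steps.acquire_body_image(photo)                  # step 1
    seg_map = steps.semantic_segmentation(body_img)             # step 2
    garment_img = steps.acquire_garment_image(garment_photo)    # step 3
    level, angle = steps.match_illumination(body_img, seg_map)  # step 4
    texture = texture_library[(level, angle)]                   # prefabricated maps
    return steps.build_garment_model(garment_img, texture)      # step 5
```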
The method for matching the garment model with the mannequin roughly comprises several parts. First, generating the three-dimensional garment model; second, generating a standard human body model and putting the three-dimensional garment model onto it; third, obtaining the parameters of the target-pose human body model; and fourth, changing the body shape and pose of the standard model to agree with the target model while simulating the realistic change of the three-dimensional garment along with the body.
The first part mainly generates the three-dimensional garment model. The prior art offers several different approaches. The traditional method builds the garment model from two-dimensional pattern pieces that are designed and sewn, which requires garment expertise to design the patterns. Another, newer modeling method is based on hand drawing: a simple garment model can be generated from lines drawn by the user. A third method combines image processing and graphics simulation on top of collected garment pictures to generate a virtual three-dimensional garment model: contour detection and classification recover the garment's outline and size from a picture, machine learning finds the edges and their key points in the contour, the key-point correspondences generate sewing information, and physical sewing simulation in three-dimensional space finally yields the realistic effect of the garment worn on a body. There are further methods, such as mapping and mathematical-model simulation, and the invention does not particularly limit the choice among them.
In any case, the three-dimensional garment model must be matched to a standard human body model: the general requirement is that, starting from a garment model fitted to the standard body, the garment is matched to the body model in the target pose by physical cloth simulation, keeping the garment natural and reasonable. Some basic requirements are usually met, including but not limited to the following: a. the garment fits the initial position of the standard mannequin completely, without penetrating it; b. the output is a uniform quad mesh; c. the model's UVs are unwrapped, tiled, compacted and aligned, and the texture map is manually aligned to the UVs with a tool such as Photoshop; d. vertex merging is performed; e. the output model is uniformly decimated, with a reference total of no more than 150,000 faces per garment set; f. the material is tuned in mainstream garment-design software, a 10-frame animation is computed to check that the cloth behaves as expected, and the material parameters are saved; g. the render material is tuned in mainstream design software and one render is previewed to ensure the material's Lambert attributes are reasonable.
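Some of these export rules can be checked mechanically. The small sketch below illustrates such a check for the quad-only and face-budget requirements; the `Mesh` fields and function name are illustrative assumptions, not part of any named tool.

```python
# A small illustrative checker for two of the export requirements above:
# uniform quad output (b) and the 150,000-face reference budget (e).
from dataclasses import dataclass

@dataclass
class Mesh:
    faces: list          # each face is a tuple of vertex indices
    uv_unwrapped: bool   # whether UVs are unwrapped/tiled/compacted/aligned

def meets_export_rules(mesh, max_faces=150_000):
    all_quads = all(len(face) == 4 for face in mesh.faces)
    return all_quads and mesh.uv_unwrapped and len(mesh.faces) <= max_faces
```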
The illumination-matching virtual fitting method comprises the following steps: 1) acquiring a two-dimensional image of a target human body; 2) obtaining a semantic segmentation map of the target human body image; 3) acquiring a two-dimensional image of the garment; 4) matching prefabricated garments to the illumination intensity and illumination angle of the target human body image; and 5) selecting the corresponding garment texture map to produce the three-dimensional garment model.
After the target human body two-dimensional image is obtained, a body contour map can be produced by an algorithm or by a neural network model; the two-dimensional body contour map is then fed into a deep-learned neural network for regression, yielding the semantic segmentation map of the target human body image.
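As a rough illustration of this step, the sketch below regresses per-pixel body-part labels from a contour image with a small PyTorch network; the architecture, channel widths and the 14-class part count are assumptions, not the patent's trained network.

```python
# A minimal sketch (assumed architecture) of regressing a part-level semantic
# segmentation map from the two-dimensional body contour.
import torch
import torch.nn as nn

N_PARTS = 14  # number of body-part classes; illustrative

class ContourToParts(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, N_PARTS, 1),        # per-pixel part logits
        )

    def forward(self, contour):               # contour: (B, 1, H, W)
        return self.net(contour).argmax(1)    # (B, H, W) part labels

seg_net = ContourToParts().eval()             # would be deep-learned in practice
parts = seg_net(torch.zeros(1, 1, 256, 256))  # semantic segmentation map
```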
The illumination-matching step is especially critical and further comprises the following steps: acquiring the RGB color space values of the target human body two-dimensional image, converting the RGB values of the target image into XYZ color space values, and converting those XYZ values into Lab color space values. The Lab image is combined with the semantic segmentation map to obtain the L channel values of the target body part, and the illumination intensity level of that part is obtained from the L channel values.
Brightness refers to how light or dark a picture is, contrast to the difference between its light and dark areas, and saturation to the fullness of its colors. Picture files are typically in RGB format, mainly for display. RGB abbreviates three colors: R for red, G for green and B for blue. Modern color theory holds that all colors are combinations of red, green and blue. In a computer, each color is recorded in one byte, and an RGB picture file uses three bytes to record red, green and blue respectively, so a full-quality picture file is 24-bit. Some picture files also support transparency, recorded in one more byte, making them 32-bit. A byte holds 8 bits, each a binary digit of 0 or 1, so interpreted as a decimal number it ranges from 0 to 255; a color value from 0 to 255 thus represents the shade of that color, darkest at 0 and brightest at 255. When the red, green and blue values are all 0, the pixel is black; when all three are 255, it is white. By varying the three color values, 16,777,216 colors can be combined, including black, white and gray.
Lab is a less common color space than RGB. It was established on the basis of the international standard for color measurement created by the International Commission on Illumination (CIE) in 1931; improved in 1976, it was formally named CIELab. It is a device-independent color system and also one based on physiological characteristics, meaning it describes human visual perception numerically. The L component of the Lab color space represents the lightness of a pixel, with values in [0, 100] from pure black to pure white; a represents the range from red to green, with values in [127, -128]; b represents the range from yellow to blue, with values in [127, -128].
The difference between the two color spaces is mainly as follows. RGB consists of a red channel (R), a green channel (G) and a blue channel (B): brightest red + brightest green + brightest blue = white; darkest red + darkest green + darkest blue = black; and between the lightest and darkest, red, green and blue of the same shade combine to gray. In any RGB channel, white and black represent the shade of that color. Therefore, wherever there is white or off-white, none of the R, G, B channels can be black, since all three channels are needed to make up such colors.
Lab is different: its lightness channel (L) is exclusively responsible for the darkness of the entire image, and by itself is simply a black-and-white version of the image. The a and b channels are only responsible for the amount of color. The a channel runs from magenta (white in the channel) to dark green (black in the channel); the b channel runs from yellow (white in the channel) to blue (black in the channel). 50% neutral gray in the a and b channels means no color, so the closer to gray, the less color; the colors of the a and b channels carry no brightness. This explains why the contours of red garments are usually very clear in the a and b channels, since red is composed of magenta plus yellow.
A picture is made up of vertically and horizontally interleaved dots; one dot is called a pixel. The number of horizontal and vertical dots forms the picture's resolution, and their product is the pixel count, which can also measure resolution. When a picture is cropped, its pixel count and resolution both decrease. Each color of each pixel can vary from 0 to 255, and the higher the value, the brighter that color. Changing the brightness of a picture therefore actually changes every color value of every pixel at once: raising all values brightens the picture, and lowering them darkens it. For each color value, 127 is the boundary: values below 127 are dark, and values above 127 are light. If all values below 127 are decreased and all values above 127 are increased, we see a contrast adjustment: the dark parts of the picture become darker and the bright parts brighter.
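The small NumPy sketch below illustrates the brightness and contrast arithmetic just described; the pivot value 127 follows the text, while the function names and the clipping to the 0-255 byte range are illustrative.

```python
# Brightness shifts every channel value uniformly; contrast pushes values
# below the 127 pivot down and values above it up.
import numpy as np

def adjust_brightness(img, delta):
    return np.clip(img.astype(int) + delta, 0, 255).astype(np.uint8)

def adjust_contrast(img, gain):
    # Scale each value's distance from the 127 pivot: darks get darker and
    # brights get brighter when gain > 1.
    return np.clip(127 + (img.astype(int) - 127) * gain, 0, 255).astype(np.uint8)
```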
The illumination intensity is estimated mainly through the Lab L-channel value; the RGB image cannot do this job directly. The L component of the Lab color space represents the lightness of a pixel, with values in [0, 100] from pure black to pure white. An RGB picture cannot be converted into the Lab channels directly: the RGB color space must first be converted into the XYZ color space, and XYZ then into Lab. The specific process is roughly as follows:
(1) RGB to XYZ conversion
Assuming that r, g and b are the three channels of a pixel, with values in the range [0, 255], the conversion formula is as follows:
The normalized, gamma-corrected components var_R, var_G and var_B (defined below) are multiplied by the matrix M:

[X]       [var_R]        [0.4124  0.3576  0.1805]
[Y] = M * [var_G],   M = [0.2126  0.7152  0.0722]
[Z]       [var_B]        [0.0193  0.1192  0.9505]
which is equivalent to the following equations:
X=var_R*0.4124+var_G*0.3576+var_B*0.1805
Y=var_R*0.2126+var_G*0.7152+var_B*0.0722
Z=var_R*0.0193+var_G*0.1192+var_B*0.9505
Here the gamma function performs nonlinear tone editing on the image in order to improve its contrast; this function is not unique.
R, G and B are the gamma-corrected color components: R = g(r), G = g(g), B = g(b), where r, g and b are the original color components and g is the gamma correction function:
g(x) = 4.5318 * x, when x < 0.018
g(x) = 1.099 * x^0.45 - 0.099, when x >= 0.018
Both the original components rgb and the corrected components RGB lie in [0, 1]. After the calculation, the value ranges of X, Y and Z become [0, 0.9506), [0, 1) and [0, 1.0890) respectively.
(2) XYZ to LAB
L = 116 * f(Y/Yn) - 16
a = 500 * [f(X/Xn) - f(Y/Yn)]
b = 200 * [f(Y/Yn) - f(Z/Zn)]
In the above equations, L, a and b are the values of the three channels of the final Lab color space, and f is a correction function similar to the gamma function:
f(x) = x^(1/3), when x > 0.008856
f(x) = 7.787 * x + 16/116, when x <= 0.008856
X, Y and Z are the linearly normalized values obtained from the RGB-to-XYZ conversion, and Xn, Yn and Zn are typically 95.047, 100.0 and 108.883 by default. After the calculation, L lies in the range [0, 100), and a and b lie approximately in [-169, +169) and [-160, +160).
Through this conversion, the L channel values corresponding to the 2D picture of the body part are obtained. By comparing the L channel values within the region of interest, several illumination intensity levels can be quantized; for example, the L values of the region of interest are aggregated into a single L' value, and the illumination level is quantized according to L'. Since human visual aesthetics generally prefers properly bright photographs, the brighter 80% quantile is generally selected as the value of L'.
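A minimal sketch of this illumination-intensity branch is given below, following the formulas above; the normalized white point, the 80% quantile and the four-level quantization rule at the end are illustrative assumptions, not the patent's exact quantization.

```python
# RGB -> XYZ -> Lab conversion per the formulas above, then quantization of
# the ROI's 80%-quantile lightness L' into a discrete illumination level.
import numpy as np

def gamma(x):
    # Piecewise gamma correction g(x) from the description.
    return np.where(x < 0.018, 4.5318 * x, 1.099 * np.power(x, 0.45) - 0.099)

def f(x):
    # Correction function f(x) used in the XYZ -> Lab step.
    return np.where(x > 0.008856, np.cbrt(x), 7.787 * x + 16.0 / 116.0)

def rgb_to_lab(rgb):
    """rgb: (..., 3) array with values in [0, 255]. Returns L, a, b arrays."""
    rgb = gamma(rgb.astype(np.float64) / 255.0)
    M = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = rgb @ M.T                       # per-pixel [X, Y, Z]
    xn, yn, zn = 0.9506, 1.0, 1.0890      # white point on the normalized scale
    fx, fy, fz = f(xyz[..., 0] / xn), f(xyz[..., 1] / yn), f(xyz[..., 2] / zn)
    L = 116.0 * fy - 16.0
    a = 500.0 * (fx - fy)
    b = 200.0 * (fy - fz)
    return L, a, b

def illumination_level(image, roi_mask, n_levels=4):
    """Quantize the ROI's 80%-quantile lightness L' into a discrete level."""
    L, _, _ = rgb_to_lab(image)
    l_prime = np.percentile(L[roi_mask], 80)   # brighter 80% quantile
    # Map L' in [0, 100] onto n_levels equal bins (assumed quantization rule).
    return min(int(l_prime / (100.0 / n_levels)), n_levels - 1)
```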
When the garment model is made, several groups of garment texture maps are collected under different illumination intensity conditions. Once the illumination intensity level has been obtained in the previous step, the texture maps of the garment model under the different intensities can be retrieved by level, and the texture map corresponding to the target body's illumination intensity is selected to render the garment model.
The illumination-matching step also comprises the following parallel branch: converting the RGB image into a gray-scale image and estimating the illumination angle by gradient calculation.
In theory there are many algorithms for computing illumination direction, but essentially no scheme can solve, with high accuracy, the illumination problems of every real-world scene. This scheme therefore simplifies the real illumination source under suitable conditions: in many scenes the dominant influence is whether the light comes from the left or the right, from above, horizontally, or from below, while differences of a few degrees in a high light are not noticeable. Experiments prove that this scheme can basically distinguish the general illumination direction from the computed illumination angle.
The RGB values of the target human body image are obtained and converted to gray values. Converting the RGB image into a gray-scale image removes the extra influence of the original image's color differences. Since only the illumination direction of the body region matters, only the body region of interest (suitably extended outwards) is extracted, using the semantic segmentation information of the 2D picture, for computing the illumination direction.
The gray image is combined with the semantic segmentation map to obtain the gray image of the target body part. A suitable filter operator is selected, the brightness difference Δ of each pixel in the horizontal and vertical directions is computed within the image's region of interest, and from it the overall brightness-difference statistical feature Γ of the gray image is obtained. By comparison against a pre-built library of image difference statistical features, the cosine similarity between the current picture and the library pictures is computed, and the angle of the picture with the highest feature similarity is taken as the estimated illumination angle of the current picture. Texture maps of the garment model under different illumination angles are collected, and the texture map corresponding to the target body's illumination angle is selected to render the garment model.
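The sketch below illustrates this illumination-angle branch under stated assumptions: plain horizontal/vertical differences stand in for the unspecified filter operator, the quadrant pooling that turns the per-pixel differences Δ into the statistic Γ is an assumed design, and so is the (angle, feature) layout of the pre-built library.

```python
# Grayscale conversion, per-pixel H/V brightness differences (delta), an
# aggregate difference statistic (Gamma), and a cosine-similarity lookup
# against a pre-built feature library of known illumination angles.
import numpy as np

def gray_roi(image, roi_mask):
    # Luma conversion; pixels outside the body ROI are zeroed.
    g = image[..., 0] * 0.299 + image[..., 1] * 0.587 + image[..., 2] * 0.114
    return np.where(roi_mask, g, 0.0)

def brightness_diff_feature(gray):
    """Compute delta (per-pixel H/V differences) and pool them into Gamma."""
    dx = gray[:, 1:] - gray[:, :-1]          # horizontal differences
    dy = gray[1:, :] - gray[:-1, :]          # vertical differences
    # Pool signed differences over image quadrants into one feature vector
    # (the exact pooling used by the patent is unspecified; this is assumed).
    blocks = []
    for d in (dx, dy):
        hh, ww = d.shape
        for qi in (slice(0, hh // 2), slice(hh // 2, None)):
            for qj in (slice(0, ww // 2), slice(ww // 2, None)):
                blocks.append(d[qi, qj].mean())
    return np.asarray(blocks)                # Gamma: the difference statistic

def estimate_angle(image, roi_mask, library):
    """library: list of (angle_degrees, feature_vector) built in advance."""
    gamma_vec = brightness_diff_feature(gray_roi(image, roi_mask))
    def cos_sim(u, v):
        return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
    # Pick the library picture whose feature is most similar to the input's.
    return max(library, key=lambda item: cos_sim(gamma_vec, item[1]))[0]
```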
Unifying these two parameters, illumination intensity and illumination angle, yields a three-dimensional garment model very close to the illumination condition of the target body photo, laying a very good foundation for the overall realism of the garment simulation.
The second part designs and models some basic mannequins in advance according to the human body modeling method, and puts the three-dimensional garment model onto the standard human body model so as to fit the subsequent workflow. The main work is to construct a three-dimensional standard human body model, i.e. the basic mannequin, in combination with a mathematical model. The Max Planck SMPL human body model avoids surface distortion of the body during motion and accurately depicts the shapes of stretching and contracting muscles. Its inputs are the parameters β and θ, where β comprises 10 parameters such as height, weight and head-to-body ratio, and θ comprises 75 parameters for the overall motion pose and the relative angles of 24 joints. The β parameters are ShapeBlendPose parameters that control the change of body shape through 10 incremental templates; the shape change controlled by each parameter can be depicted with a dynamic graph. Studying the continuous animation of parameter changes shows clearly that continuously changing any one body-shape parameter causes local and even global linked changes of the body model; to reflect the movement of human muscle tissue, a linear change of any SMPL parameter causes large-area mesh changes. Figuratively speaking, when the β1 parameter is adjusted, the model may understand the change as a whole-body change: you may only want to adjust the waist proportion, but the model forces the legs, chest and even hands to adjust together. Although this way of working greatly simplifies the process and improves efficiency, it is genuinely inconvenient for a project that pursues modeling quality. Moreover, because the SMPL model is trained on photos and measurement data of Western bodies, its body-change rules basically follow the common change curves of Westerners; applied to modeling Asian bodies, many problems appear, such as the proportions of arms and legs, waist and torso, the neck, leg length and arm length. Our research shows large differences in these aspects, and forcing the SMPL model into service would make the final result fall short of our requirements.
Therefore, a self-made human body model is adopted to improve the effect. Its core is building a blendshape base for the body to achieve accurate, independent control. Preferably, the three-dimensional standard human body model (basic mannequin) consists of 20 shape-base parameters and 170 skeleton parameters. The bases together form the whole body model, and each shape base is controlled and changed by its own parameter, without mutual influence. Accurate control means, on the one hand, more control parameters: instead of continuing with the Max Planck model's ten β control parameters, the adjustable parameters add, beyond general fatness and thinness, the length of the arms, the length of the legs, and the fatness of the waist, hips and chest, while the skeleton parameters are more than doubled, greatly enriching the adjustable range and providing a good basis for designing a refined standard mannequin. Independent control means each base is controlled independently: waist, legs, hands, head and so on; every bone's length can be adjusted separately without physical linkage, so fine adjustment of the body model is realized better. The model is no longer a fixed black box that cannot be adjusted into the form the designer wants. The model embodies its correspondences on mathematical principles; in effect it is designed from both artificial aesthetics and statistical data analysis, so a model generated by its design rules can be considered a correct model conforming to Asian body types. This differs clearly from the big-data-trained SMPL model: the parameter transformations are more interpretable and better represent local body changes, and because the changes rest on mathematical principles, no parameter influences another; arms and legs, for example, stay completely independent. Designing so many different parameters avoids the shortcomings of big-data-trained body models and controls the body model accurately in more dimensions, not limited to a few indices such as height, which greatly improves the modeling effect. Only on the premise of a self-built shape base does setting so many independent control parameters have practical significance; neither alone can satisfy the designer's requirements.
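A minimal sketch of such an independently controlled blendshape base follows; the 20-base and 170-bone counts come from the text, while the vertex count, the base ordering and the array layout are illustrative.

```python
# The final mesh is the template plus a weighted sum of independent shape
# bases, so each parameter moves only its own body part.
import numpy as np

N_VERTS, N_SHAPE_BASES, N_BONE_PARAMS = 10000, 20, 170  # vertex count assumed

template = np.zeros((N_VERTS, 3))                    # standard-mannequin mesh
shape_bases = np.zeros((N_SHAPE_BASES, N_VERTS, 3))  # one independent base each
# e.g. base 0: waist girth, base 1: leg length, base 2: arm length, ...

def apply_shape(weights):
    """weights: (20,) vector; each entry drives exactly one base, so adjusting
    the waist weight cannot drag the legs or chest along with it."""
    return template + np.tensordot(weights, shape_bases, axes=1)

bone_lengths = np.ones(N_BONE_PARAMS)  # per-bone scales, also mutually independent
```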
Fitting the three-dimensional clothing model onto the standard human body model is conventional technology in this field; it is not particularly limited here, so long as the required effect is achieved.
The third part processes the acquired human body image to obtain the parameter information required for generating the human body model. Previously, these skeletal key points were usually selected manually, but that method is inefficient and ill-suited to the fast pace of the internet era; now that neural networks are prevalent, replacing manual key-point selection with deep-learning neural networks is the trend. However, how to use neural networks efficiently is a problem that needs further research. In general, the idea of a two-stage neural network plus data refinement is adopted to construct the parameter acquisition system. As shown in fig. 2, a deep-learning neural network generates these parameters through the following sub-steps (a sketch of the pipeline follows this list):
1) acquiring a two-dimensional image of the target human body;
2) processing it to obtain a two-dimensional contour image of the target human body;
3) substituting the two-dimensional human body contour image into a first deep-learning neural network for regression of the joint points;
4) obtaining a joint point map of the target human body, together with semantic segmentation maps of the body parts, body key points and body bone points;
5) substituting the generated joint point map, semantic segmentation map, body bone points and key point information of the target human body into a second deep-learning neural network for regression of the human body posture and body type parameters;
6) acquiring the output three-dimensional human body parameters, including the three-dimensional human body motion POSE parameters and the three-dimensional human body SHAPE parameters.
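The sketch below wires these sub-steps together under stated assumptions: the two networks and the contour extractor are placeholder callables (their architectures are not specified here), and all names are illustrative only.

```python
import numpy as np

def extract_contour(image: np.ndarray) -> np.ndarray:
    """Stand-in for the CNN-based contour/detection step (sub-step 2)."""
    return (image.mean(axis=-1) > 0).astype(np.float32)

def stage_one(contour: np.ndarray):
    """First network (sub-steps 3-4): joints, segmentation, key/bone points."""
    joints_2d = np.zeros((24, 2))                  # 24 joints, pixel coordinates
    seg = np.zeros(contour.shape, dtype=np.int32)  # per-pixel body-part id
    return joints_2d, seg

def stage_two(joints_2d, seg):
    """Second network (sub-steps 5-6): regress POSE and SHAPE parameters."""
    pose = np.zeros(75)   # POSE: overall motion + 24 joint angles (per the text)
    shape = np.zeros(20)  # SHAPE: later mapped onto the 20 physique bases
    return pose, shape

def estimate_body_parameters(image: np.ndarray):
    joints, seg = stage_one(extract_contour(image))
    return stage_two(joints, seg)
```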
The two-dimensional image of the target human body may be any two-dimensional image containing a human body in any posture and any clothing. The two-dimensional human body contour image is acquired with a target detection algorithm, namely a fast target-region generation network based on a convolutional neural network.
Before the two-dimensional human body image is input into the first neural network model, the method further comprises training the neural network; the training samples comprise standard two-dimensional human body images with the original joint point positions labeled, those positions having been labeled manually with high accuracy. Here, a target image is first acquired and human body detection is performed on it using a target detection algorithm. Human body detection does not mean measuring a real human body with an instrument; in the invention it means that, for any given image — usually a two-dimensional picture containing enough information, i.e. the face, four limbs and body all within the frame — a certain strategy is adopted to search the image to determine whether it contains a human body, and if so, to give parameters such as the position and size of the human body. In this embodiment, before the key points of the human body in the target image are acquired, human body detection is performed on the target image to acquire a bounding frame indicating the position of the human body. Since the image input by a user can be arbitrary, it inevitably contains non-human background such as tables and chairs, trees, cars and buildings; these useless backgrounds are removed by mature algorithms.
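As one hedged realization of this detection step, the sketch below uses an off-the-shelf Faster R-CNN from torchvision to find the person bounding box; the patent only calls for a convolutional target-region generation network, so this particular model, its weights and the score threshold are assumptions.

```python
from typing import Optional, Tuple

import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Pretrained detector (assumption: torchvision's Faster R-CNN on COCO).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_person(image) -> Optional[Tuple[float, float, float, float]]:
    """Return the highest-scoring person box (x1, y1, x2, y2), or None."""
    with torch.no_grad():
        pred = model([to_tensor(image)])[0]           # boxes sorted by score
    for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
        if label.item() == 1 and score.item() > 0.8:  # COCO class 1 = person
            return tuple(box.tolist())                # crop to drop background
    return None
```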
Meanwhile, semantic segmentation, joint point detection, bone detection and edge detection are carried out; collecting this 1D point information and 2D surface information lays a good foundation for later generating the 3D human body model. The first-stage neural network is used to generate the joint point map of the human body; alternatively, the target detection algorithm may be a fast target-region generation network based on a convolutional neural network. The first neural network requires massive training data: photos collected from the internet are labeled manually and then input into the network for training. After deep learning, the network can, given a photo, immediately output a joint point map with essentially the same accuracy as manual labeling, at an efficiency tens or even hundreds of times that of manual labeling.
In the invention, obtaining the positions of the human body joint points in the picture completes only the first step and yields the 1D point information; the 2D surface information is then generated from the 1D point information, which can be done by neural network models and mature prior-art algorithms. The invention redesigns the working process and intervention timing of the neural network models and reasonably sets the various conditions and parameters, making parameter generation more efficient and reducing manual participation. This is very suitable for internet application scenarios: in a virtual dressing program, for example, the user obtains the dressing result essentially instantly, without waiting, which is vital to the program's attractiveness to users.
After the relevant 1D point information and 2D surface information are obtained, these results — the joint point map, semantic segmentation map, body bone points and key point information of the target human body — are substituted as inputs into the second deep-learning neural network for regression of the human body posture and body type parameters. Through the regression calculation of the second neural network, several groups of three-dimensional human body parameters, including the three-dimensional human body motion POSE parameters and the three-dimensional human body SHAPE parameters, are output immediately. Preferably, the loss function of the neural network is designed based on the three-dimensional standard human body model (basic mannequin), the predicted three-dimensional human body model, the standard two-dimensional human body image with the original joint point positions labeled, and the standard two-dimensional human body image containing the predicted joint point positions.
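A minimal sketch of one way such a loss could combine the four named inputs: a 3D term between predicted and standard model vertices and a 2D term between predicted and labeled joint positions. The weights and the mean-squared form are assumptions; the text names only the inputs.

```python
import torch

def fitting_loss(pred_vertices, std_vertices,
                 pred_joints_2d, labeled_joints_2d,
                 w3d: float = 1.0, w2d: float = 1.0) -> torch.Tensor:
    # 3D term: predicted model vs. the standard/basic mannequin model.
    loss_3d = torch.mean((pred_vertices - std_vertices) ** 2)
    # 2D term: predicted joint positions vs. the manually labeled ones.
    loss_2d = torch.mean((pred_joints_2d - labeled_joints_2d) ** 2)
    return w3d * loss_3d + w2d * loss_2d
```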
The fourth part, and the most critical one, fits the human body model parameters to the human body model while ensuring that the state of the clothing after the movement is as realistic as possible. As shown in fig. 3, the process comprises the following sub-steps: mapping the obtained three-dimensional human body POSE and SHAPE parameters onto the several base and skeleton parameters of the three-dimensional standard human body model; and inputting the obtained groups of base and skeleton parameters into the standard three-dimensional human parametric model for fitting. The three-dimensional human body model has a mathematical weight relation between its skeleton points and its mesh, so determining the skeleton points determines the posture of the associated human body model. In this part, the two kinds of parameters generated in the previous part are substituted into the pre-designed human body model to construct the 3D human body model. These two kinds of parameters have names similar to the parameters of the Max Planck SMPL model, but differ substantially in content, because the bases of the two models are different: the self-made three-dimensional standard human body model (basic mannequin) is adopted, each base of which is designed according to the body types and figure proportions of Asians and contains several parts with no counterpart in the SMPL model, whereas the SMPL model uses a standard human body model generated by big-data training. The two have different generation and calculation modes; both are finally embodied as a generated 3D human body model, but with a large difference in substance. After this step, a preliminary 3D human body model is obtained, including the bone positions and the mesh of the model carrying its length information.
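The weight relation between skeleton points and mesh can be illustrated with linear blend skinning; the patent does not name its skinning scheme, so LBS is an assumption here.

```python
import numpy as np

def skin_mesh(rest_verts, weights, bone_transforms):
    """
    Linear blend skinning: move each vertex by the weighted average of
    its bones' transforms, so fixing the skeleton fixes the posed mesh.
    rest_verts:      (V, 3) mesh vertices in the rest pose
    weights:         (V, B) skinning weights, each row sums to 1
    bone_transforms: (B, 4, 4) world transforms of the B bones
    """
    V = rest_verts.shape[0]
    homo = np.hstack([rest_verts, np.ones((V, 1))])             # (V, 4)
    per_bone = np.einsum('bij,vj->vbi', bone_transforms, homo)  # (V, B, 4)
    blended = np.einsum('vb,vbi->vi', weights, per_bone)        # (V, 4)
    return blended[:, :3]
```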
In this part, after the three-dimensional clothing model is fitted onto the standard human body model, the body type and posture of the standard human body model must be changed to match those of the target human body, while the three-dimensional clothing model is simulated to change realistically along with the human body model. Several methods are used to ensure that these objectives are achieved.
First, garment self-adaptation: given a clothing model already adapted to the standard human body model, and a target-posture human body model having the same posture and the same mesh structure as the standard model — the same number of mesh vertices and mesh cells and the same topological connections between vertices — differing only in body type (tall, short, fat, thin), the clothing is adapted to the target body type through self-adaptive garment matching, so that the three-dimensional clothing model changes along with the fatness and thinness of the human body model while remaining natural and reasonable.
In the garment self-adaptation process, a field is generated over the mesh patches of the standard human body model, and a fixed field correspondence is established between each patch of the three-dimensional garment and the corresponding position on the standard human body model; when the body type of the standard human body model changes toward that of the target human body model, the three-dimensional garment model changes uniformly along with it.
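A hedged sketch of such a correspondence: each garment vertex is bound to its nearest body vertex with a fixed offset, so a deformation of the body carries the garment along uniformly. The patent speaks of per-patch fields; nearest-vertex binding is a simplification for illustration.

```python
import numpy as np

def bind_garment(garment_verts, body_verts):
    """Establish a fixed correspondence garment -> body (the 'field')."""
    diffs = garment_verts[:, None, :] - body_verts[None, :, :]
    nearest = np.argmin((diffs ** 2).sum(-1), axis=1)  # (G,) body indices
    offsets = garment_verts - body_verts[nearest]      # fixed local offsets
    return nearest, offsets

def deform_garment(nearest, offsets, deformed_body_verts):
    # The garment follows the body: same correspondence, same offsets.
    return deformed_body_verts[nearest] + offsets
```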
In this part, the human body model also completes the change from the initial position to the target position. Since only one photo is input, the posture of the target human body in the photo usually differs from that of the basic mannequin, so the model must move from the initial position to the target position in order to fit the target posture. To simulate this more realistically, when the several groups of base and skeleton parameters are fitted in the standard three-dimensional human parametric model, the fitting step further comprises the following sub-steps (a sketch of the easing curve follows this list):
1) obtaining the position coordinates of the initial position and the target position, the initial position parameters being taken from the initial parameters of the standard mannequin model and the skeleton information of the target position being obtained from the regression prediction of the neural network model;
2) generating an animation sequence moving from the initial position to the target position;
3) processing the animation sequence by mesh frame interpolation;
4) setting the interpolation speed to be slow near the initial point and the target point and fast during the middle of the movement;
5) holding still for several frames upon reaching the final target position, thereby obtaining the whole animation sequence;
6) driving the bone movement from the initial position to the target position.
Compared with interpolating frames at a constant speed, this approach is closer to the motion laws of the real physical world, and the simulated effect is better.
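The slow-fast-slow interpolation can be sketched with a smoothstep easing curve; the exact curve is not specified in the text, so smoothstep, the frame counts and the 170-parameter skeleton vector are assumptions.

```python
import numpy as np

def ease_in_out(t: np.ndarray) -> np.ndarray:
    # Smoothstep: slow near 0 and 1, fast in the middle.
    return t * t * (3.0 - 2.0 * t)

def pose_sequence(initial, target, n_frames: int = 30, hold: int = 5):
    t = ease_in_out(np.linspace(0.0, 1.0, n_frames))[:, None]
    frames = (1.0 - t) * initial + t * target   # interpolated skeleton params
    # Hold still for a few frames at the final target position.
    return np.vstack([frames, np.repeat(target[None, :], hold, axis=0)])

seq = pose_sequence(np.zeros(170), np.ones(170))  # 170 skeleton parameters
```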
The method of generating a three-dimensional human body model, including generating the garment model, described in connection with figs. 1 to 3 may be implemented by a device for processing human body images. Fig. 4 illustrates a hardware configuration 300 of an apparatus for processing human body images according to an embodiment of the present invention.
The invention also discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above clothing model matching method and its steps.
The electronic equipment comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus; the memory is used for storing a computer program; and the processor is used for implementing the above clothing model matching method and its steps when executing the program stored in the memory.
As shown in fig. 4, the apparatus 300 for implementing virtual fitting in this embodiment includes: the device comprises a processor 301, a memory 302, a communication interface 303 and a bus 310, wherein the processor 301, the memory 302 and the communication interface 303 are connected through the bus 310 and complete mutual communication.
In particular, the processor 301 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing an embodiment of the present invention.
Memory 302 may include mass storage for data or instructions. By way of example, and not limitation, memory 302 may include an HDD, a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive or a combination of two or more of these. Memory 302 may include removable or non-removable (or fixed) media, where appropriate. The memory 302 may be internal or external to the human image processing apparatus 300, where appropriate. In a particular embodiment, the memory 302 is a non-volatile solid-state memory. In a particular embodiment, the memory 302 includes Read Only Memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), electrically rewritable ROM (EAROM), or flash memory or a combination of two or more of these.
The communication interface 303 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiment of the present invention.
The bus 310 includes hardware, software, or both coupling the components of the apparatus 300 for processing human body images to each other. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), another suitable bus, or a combination of two or more of these. Bus 310 may include one or more buses, where appropriate. Although specific buses have been described and shown in the embodiments of the invention, any suitable buses or interconnects are contemplated.
That is, the apparatus 300 for processing a human body image shown in fig. 4 may be implemented to include: a processor 301, a memory 302, a communication interface 303, and a bus 310. The processor 301, memory 302 and communication interface 303 are coupled by a bus 310 and communicate with each other. The memory 302 is used to store program code; the processor 301 executes a program corresponding to the executable program code by reading the executable program code stored in the memory 302, so as to perform the virtual fitting method in any embodiment of the present invention, thereby implementing the method and apparatus for virtual fitting described in conjunction with fig. 1 to 3.
The embodiment of the invention also provides a computer storage medium, wherein the computer storage medium is stored with computer program instructions; the computer program instructions, when executed by a processor, implement the method for processing human body images provided by the embodiments of the present invention.
It is to be understood that the invention is not limited to the specific arrangements and instrumentalities described above and shown in the drawings. A detailed description of known methods is omitted here for brevity. In the above embodiments, several specific steps are described and shown as examples; however, the method processes of the present invention are not limited to those specific steps, and those skilled in the art can make changes, modifications and additions, or change the order of the steps, after comprehending the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
As described above, only the specific embodiments of the present invention are provided, and it can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the module and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It should be understood that the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the present invention, and these modifications or substitutions should be covered within the scope of the present invention.

Claims (10)

1. A method of lighting-matched virtual fitting, the method comprising:
1) acquiring a two-dimensional image of a target human body;
2) obtaining a semantic segmentation graph of a target human body image;
3) acquiring a two-dimensional image of the garment;
4) matching the illumination intensity and the illumination angle of the prefabricated garment and the target human body image;
5) selecting the corresponding clothing texture map to produce a three-dimensional model of the clothing.
2. The method of claim 1, further comprising a matching process between the garment model and the human body model: constructing a three-dimensional standard human body model in combination with a mathematical model, the three-dimensional standard model being in an initial posture; fitting the three-dimensional garment model onto the three-dimensional standard human body model in the initial posture, namely the standard basic mannequin; calculating three-dimensional target human body model parameters from the two-dimensional image of the target human body through a neural network model; inputting the obtained groups of three-dimensional posture and body type parameters into the three-dimensional standard human body model for fitting; and obtaining a target human body model having the same posture and body type as the target human body and wearing the changed clothing.
3. The method according to claim 1, wherein a contour map of the two-dimensional image of the target human body is obtained, and the two-dimensional human body contour image is substituted into a deep-learning neural network model for regression to obtain the semantic segmentation map of the target human body image.
4. The method as claimed in claim 1, wherein RGB color space values of the two-dimensional image of the target human body are obtained, the RGB values are converted into XYZ values of the XYZ color space, the XYZ values are converted into Lab color space values, the Lab image is combined with the semantic segmentation map to obtain the L-channel values corresponding to the target human body parts, and the illumination intensity level corresponding to the target human body parts is obtained from the L-channel values.
5. The method as claimed in claim 4, wherein the texture maps of the garment model under different illumination intensities are collected according to different illumination intensity levels, and the texture map corresponding to the illumination intensity is selected according to the illumination intensity of the target human body for rendering the garment model.
6. The method as claimed in claim 1, wherein RGB values of the target human body image are obtained and converted into gray-scale values, the gray-scale image is combined with the semantic segmentation map to obtain the gray-scale map corresponding to the target human body part, the luminance difference value δ in the horizontal and vertical directions is calculated for each pixel in the region of interest of the image, and the statistical characteristic γ of the luminance differences over the whole gray-scale image is then obtained.
7. The method of claim 6, wherein a pre-established library of image-difference statistical feature pictures is consulted, the cosine similarity between the current picture and the pictures in the feature library is calculated, and the picture with the highest feature similarity is selected, its illumination angle being taken as the estimated illumination angle of the current picture.
8. The method as claimed in claim 7, wherein the texture maps of the garment model under different illumination angles are collected according to different illumination angles, and the texture map corresponding to the illumination angle is selected according to the illumination angle of the target human body for rendering the garment model.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method and steps of any one of claims 1 to 8.
10. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus; the memory is used for storing a computer program; and the processor is used for implementing the method and steps of any one of claims 1-8 when executing the program stored in the memory.
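As an illustration of the illumination estimation in claims 4-6, the following is a hedged sketch using OpenCV: the claims specify RGB → XYZ → Lab, which cv2 performs in a single conversion; the quantization into levels, the histogram feature and the part ids are assumptions for illustration only.

```python
import cv2
import numpy as np

def illumination_level(image_bgr: np.ndarray, seg: np.ndarray,
                       part_id: int, n_levels: int = 5) -> int:
    """Claims 4-5: per-part illumination intensity level from the L channel."""
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)
    l_channel = lab[:, :, 0].astype(np.float32)   # L in [0, 255] for uint8 input
    mean_l = l_channel[seg == part_id].mean()     # mean L over the body part
    return int(np.clip(mean_l / 255.0 * n_levels, 0, n_levels - 1))

def gradient_feature(image_bgr: np.ndarray) -> np.ndarray:
    """Claim 6: horizontal/vertical luminance differences, summarized as a
    statistical feature (a normalized histogram is assumed here) for the
    cosine-similarity matching of claim 7."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    dx = np.diff(gray, axis=1).ravel()
    dy = np.diff(gray, axis=0).ravel()
    hist, _ = np.histogram(np.concatenate([dx, dy]), bins=32, range=(-255, 255))
    return hist / hist.sum()
```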
CN202010876706.XA 2020-08-27 2020-08-27 Illumination matching virtual fitting method, device and storage medium Pending CN114202630A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010876706.XA CN114202630A (en) 2020-08-27 2020-08-27 Illumination matching virtual fitting method, device and storage medium

Publications (1)

Publication Number Publication Date
CN114202630A true CN114202630A (en) 2022-03-18

Family

ID=80644160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010876706.XA Pending CN114202630A (en) 2020-08-27 2020-08-27 Illumination matching virtual fitting method, device and storage medium

Country Status (1)

Country Link
CN (1) CN114202630A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114596412A (en) * 2022-04-28 2022-06-07 杭州华鲤智能科技有限公司 Method for generating virtual fitting 3D image
CN114596412B (en) * 2022-04-28 2022-07-29 杭州华鲤智能科技有限公司 Method for generating virtual fitting 3D image
WO2023207500A1 (en) * 2022-04-29 2023-11-02 北京字跳网络技术有限公司 Image generation method and apparatus
CN115908663A (en) * 2022-12-19 2023-04-04 支付宝(杭州)信息技术有限公司 Clothes rendering method, device, equipment and medium of virtual image
CN115908663B (en) * 2022-12-19 2024-03-12 支付宝(杭州)信息技术有限公司 Virtual image clothing rendering method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination