CN110599593B - Data synthesis method, device, equipment and storage medium

Publication number
CN110599593B
Authority
CN
China
Prior art keywords
target object
target
dimensional model
additional data
dimensional
Legal status
Active
Application number
CN201910867434.4A
Other languages
Chinese (zh)
Other versions
CN110599593A (en)
Inventor
唐宇晨
邱迪
Current Assignee
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN201910867434.4A
Publication of CN110599593A
Application granted
Publication of CN110599593B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T19/00 - Manipulating 3D models or images for computer graphics
    • G06T19/20 - Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Geometry (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)
  • Image Generation (AREA)
  • Image Processing (AREA)

Abstract

The application discloses a data synthesis method, device, equipment and storage medium, and belongs to the technical field of data processing. The method comprises the following steps: acquiring a two-dimensional image of a target object, and performing three-dimensional mapping on the two-dimensional image of the target object to obtain a three-dimensional model of the target object; acquiring a three-dimensional model of additional data; and matching the three-dimensional model of the target object and the three-dimensional model of the additional data through texture coordinates of target pixels, and obtaining an image of the target object into which the additional data is synthesized according to the matching result. Because the three-dimensional model of the target object is obtained by three-dimensionally mapping a two-dimensional image, and the two three-dimensional models are matched by texture coordinates to obtain the synthesized image, three-dimensional data synthesis can be realized with a 2D camera at low cost; in addition, repeated modeling is not needed, so computation time is saved and data synthesis efficiency is higher.

Description

Data synthesis method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of data processing, in particular to a data synthesis method, a data synthesis device, data synthesis equipment and a storage medium.
Background
With the popularization of electronic commerce, shopping for clothes online has become more and more popular because it is convenient, fast, and saves time and labor. In order to let a user intuitively see the effect of trying on clothes, data synthesis technology is widely applied in intelligent fitting applications.
In the related art, in order to realize the effect of intelligent fitting, before data synthesis a 3D camera is required to capture a depth image and a color image of a dressed standard body, an undressed standard body and the fitter respectively, and modeling is performed separately on the three-dimensional point cloud data of each depth image to obtain a three-dimensional model or a color three-dimensional model; the obtained three-dimensional models are then segmented and functionally transformed, and the intelligent fitting model is obtained through data synthesis. The 3D camera shots need to be taken in the same illumination environment; if the illumination environments differ, they need to be equalized by supplementary lighting or by stopping down the aperture of the 3D camera.
In the course of implementing the present application, the inventors found that the related art has at least the following problems:
because data synthesis in the related art requires shooting with a 3D camera, which is expensive and not yet widely available, the cost of use is high; in addition, because modeling must be performed many times, the computation time is long, the fitting model is slow to obtain, the data synthesis efficiency is low, and the user experience is poor.
Disclosure of Invention
The embodiment of the application provides a data synthesis method, a data synthesis device, data synthesis equipment and a storage medium, which can be used for solving the problems in the related art.
According to an aspect of embodiments of the present application, there is provided a data synthesis method, including:
acquiring a two-dimensional image of a target object, and performing three-dimensional mapping on the two-dimensional image of the target object to obtain a three-dimensional model of the target object;
acquiring a three-dimensional model of the additional data;
and matching the three-dimensional model of the target object and the three-dimensional model of the additional data through texture coordinates of target pixels, and obtaining an image of the target object synthesizing the additional data according to a matching result.
Optionally, the obtaining of the three-dimensional model of the additional data includes:
acquiring a two-dimensional image of a standard object with additional data and a two-dimensional image of a standard object without additional data;
respectively carrying out three-dimensional mapping on the two-dimensional image of the standard object of the additional data and the two-dimensional image of the standard object of the non-additional data to obtain a three-dimensional model of the standard object of the additional data and a three-dimensional model of the standard object of the non-additional data;
and acquiring a three-dimensional model of the additional data based on the three-dimensional model of the standard object of the additional data and the three-dimensional model of the standard object of the non-additional data.
Optionally, before matching the three-dimensional model of the target object and the three-dimensional model of the additional data by the texture coordinates of the target pixel, the method further includes:
for any one of the three-dimensional model of the target object and the three-dimensional model of the additional data, segmenting the any three-dimensional model to obtain a plurality of parts;
texture coordinates of the respective pixels included in each portion are determined.
Optionally, the matching the three-dimensional model of the target object and the three-dimensional model of the additional data by texture coordinates of target pixels, and obtaining an image of the target object synthesizing the additional data according to a matching result includes:
matching texture coordinates of pixels of each portion in the target object and the additional data;
and for the first target part successfully matched, synthesizing the data of the first target part of the additional data into the corresponding first target part in the target object.
Optionally, after matching texture coordinates of pixels of the target object and the additional data, the method further includes:
if a second target part which fails to be matched exists, acquiring supplementary data based on texture coordinates of pixels of the second target part in the target object;
synthesizing the supplemental data into a corresponding second target portion of the target object.
Optionally, the obtaining supplementary data based on texture coordinates of pixels of a second target portion in the target object includes:
searching for a target texture coordinate with the highest matching degree with the texture coordinate of the pixel of the second target part in the target object in the additional data;
and taking the data corresponding to the target texture coordinates as the supplementary data.
Optionally, the obtaining supplementary data based on texture coordinates of pixels of a second target portion in the target object includes:
and determining the average value of the texture coordinates of the second target part in the target object in the reference range, and taking the data corresponding to the average value of the texture coordinates in the additional data as supplementary data.
Optionally, after acquiring the two-dimensional image of the target object, the method further includes:
predicting the posture of the target object based on the two-dimensional image of the target object to obtain two-dimensional images of a plurality of postures of the target object;
performing three-dimensional mapping on the two-dimensional images of the multiple postures of the target object to obtain a three-dimensional model of each posture of the target object;
and matching the three-dimensional model of each posture of the target object with the three-dimensional model of the additional data through texture coordinates of target pixels, and obtaining images of the target object with the synthesized additional data in different postures according to matching results.
Optionally, the predicting the pose of the target object based on the two-dimensional image of the target object to obtain two-dimensional images of multiple poses of the target object includes:
decomposing the two-dimensional image of the target object into a plurality of surfaces, and parameterizing each surface by adopting a local two-dimensional coordinate system to obtain the position of a target node on each surface area;
and representing the position of the target node in a heat map, then processing the dense coordinates at a reference speed to estimate a plurality of postures of the target object and obtain two-dimensional images of the plurality of postures of the target object.
There is also provided a data synthesis apparatus, the apparatus comprising:
the first acquisition module is used for acquiring a two-dimensional image of a target object;
the mapping module is used for carrying out three-dimensional mapping on the two-dimensional image of the target object to obtain a three-dimensional model of the target object;
the second acquisition module is used for acquiring a three-dimensional model of the additional data;
and the synthesis module is used for matching the three-dimensional model of the target object and the three-dimensional model of the additional data through texture coordinates of target pixels and obtaining an image of the target object for synthesizing the additional data according to a matching result.
Optionally, the second obtaining module is configured to obtain a two-dimensional image of the standard object with additional data and a two-dimensional image of the standard object without additional data; respectively carrying out three-dimensional mapping on the two-dimensional image of the standard object of the additional data and the two-dimensional image of the standard object of the non-additional data to obtain a three-dimensional model of the standard object of the additional data and a three-dimensional model of the standard object of the non-additional data; and acquiring a three-dimensional model of the additional data based on the three-dimensional model of the standard object of the additional data and the three-dimensional model of the standard object of the non-additional data.
Optionally, the synthesizing module is further configured to segment any one of the three-dimensional model of the target object and the three-dimensional model of the additional data to obtain a plurality of parts; texture coordinates of the respective pixels included in each portion are determined.
Optionally, the synthesizing module is configured to match texture coordinates of pixels of the target object and the additional data of each part; and for the first target part successfully matched, synthesizing the data of the first target part of the additional data into the corresponding first target part in the target object.
Optionally, the synthesizing module is further configured to, if there is a second target portion that fails to be matched, obtain supplementary data based on texture coordinates of pixels of the second target portion in the target object; synthesizing the supplemental data into a corresponding second target portion of the target object.
Optionally, the synthesizing module is configured to search, in the additional data, a target texture coordinate with a highest matching degree with a texture coordinate of a pixel of a second target portion in the target object; and taking the data corresponding to the target texture coordinates as the supplementary data.
Optionally, the synthesizing module is configured to determine a texture coordinate average value of a reference range in which a second target portion in the target object is located, and use data corresponding to the texture coordinate average value in the additional data as supplementary data.
Optionally, the first obtaining module is further configured to predict a posture of the target object based on the two-dimensional image of the target object, so as to obtain two-dimensional images of multiple postures of the target object;
the mapping module is further configured to perform three-dimensional mapping on the two-dimensional images of the multiple postures of the target object to obtain a three-dimensional model of each posture of the target object;
and the synthesis module is also used for matching the three-dimensional model of each posture of the target object with the three-dimensional model of the additional data through texture coordinates of target pixels, and obtaining images of the target object with the synthesized additional data in different postures according to matching results.
Optionally, the first obtaining module is configured to decompose the two-dimensional image of the target object into a plurality of surfaces, and parameterize each surface by using a local two-dimensional coordinate system to obtain a position of a target node on each surface region; and represent the position of the target node in a heat map, then process the dense coordinates at a reference speed to estimate a plurality of postures of the target object and obtain two-dimensional images of the plurality of postures of the target object.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
the three-dimensional model of the target object is obtained by three-dimensionally mapping the two-dimensional image of the target object, and the three-dimensional model of the target object and the three-dimensional model of the additional data are matched in a texture coordinate mode to obtain the image of the target object for synthesizing the additional data, so that three-dimensional data synthesis can be realized through the 2D camera, and the cost is low; in addition, multiple modeling is not needed, so that the calculation time is saved, and the data synthesis efficiency is higher.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a flow chart of a data synthesis method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a model segmentation provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a data synthesis process provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a process for predicting a posture according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a heat map provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a data synthesis process provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a data synthesis apparatus provided in an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a data synthesis device provided in an embodiment of the present application;
fig. 10 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
With the popularization of electronic commerce, shopping for clothes online has become more and more popular because it is convenient, fast, and saves time and labor. In order to let a user intuitively see the effect of trying on clothes, data synthesis technology is widely applied in intelligent fitting applications.
In view of the above, an embodiment of the present application provides a data synthesis method, please refer to fig. 1, which shows a schematic diagram of an implementation environment of the method provided in the embodiment of the present application. The implementation environment may include: a terminal 11 and a server 12.
The terminal 11 is installed with a data synthesis client, for example, an intelligent fitting application client, a shopping application client with fitting scenes, and the like. The method provided by the embodiment of the present application can be applied to any client, if there is a need for data synthesis. Alternatively, the terminal 11 has a camera thereon, by which a two-dimensional image of the target object can be taken.
Alternatively, the terminal 11 shown in fig. 1 may be an electronic device such as a mobile phone, a tablet computer, a personal computer, or the like. The server 12 may be a server of an application installed on the terminal 11, and the server 12 may be one server, a server cluster composed of a plurality of servers, or a cloud computing service center. The terminal 11 establishes a communication connection with the server 12 through a wired or wireless network.
Based on the implementation environment shown in fig. 1, the embodiment of the present application provides a data synthesis method, which can be applied to an intelligent fitting scene. For example, where the target object is a fitter, the additional data is clothes, and the standard object is a fitting model, the data synthesis may synthesize the image of the fitter with the image of the clothes to obtain an image of the fitter wearing the clothes, that is, an image of the target object into which the additional data is synthesized. The method applied to the terminal 11 in the implementation environment shown in fig. 1 is taken as an example. As shown in fig. 2, the method provided by the embodiment of the present application may include the following steps:
step 201, a two-dimensional image of a target object is acquired.
Unlike the related art, which acquires a three-dimensional image of the target object with a 3D camera, the method and device of the present application can acquire a two-dimensional image of the target object with a 2D camera, thereby reducing cost. For example, an intelligent fitting application is installed on the terminal; after the user operates the terminal to open the application, an image acquisition interface can be entered to trigger the camera to acquire a two-dimensional image of the user.
Of course, besides capturing the image on the spot in the scene needing data synthesis, the embodiment of the application also supports using an already-captured two-dimensional image. For example, with the intelligent fitting application installed on the terminal, after the user operates the terminal to open the application, the user can enter the album interface of the terminal and select a previously shot two-dimensional image from the album.
The embodiment of the present application is not limited to the manner of acquiring the two-dimensional image of the target object, and in addition to the above two manners, downloading the two-dimensional image of the target object from the network may be supported. For example, a two-dimensional image of the target object is acquired from the cloud, or a two-dimensional image of the target object transmitted by another terminal is received.
Step 202, performing three-dimensional mapping on the two-dimensional image of the target object to obtain a three-dimensional model of the target object.
The embodiment of the application does not limit the three-dimensional mapping mode. For example, a generic human body model is exported with three-dimensional human body modeling software, and the data information of the model is classified and extracted according to the modeling requirements. The extracted data information is converted into a suitable data format for storage. Feature points are then selected and a global transformation is applied to the generic human body model so that it matches the human body contour in the two-dimensional image of the target object, for example so that the generic model is consistent with that contour or the similarity reaches a similarity threshold. The similarity threshold may be set based on experience and may also be adjusted for different scenarios, which is not limited in the embodiment of the present application.
On the basis of the global transformation, the body model is then transformed locally, for example by applying a radial basis function interpolation technique to the three-dimensional generic body model, and a second interpolation smoothing is performed on the transformed model. A full-view human texture image is synthesized, and the overlapping area of the images is determined using a column-feature-based matching method. Gray levels are then adjusted with a histogram matching method, and seamless stitching of the images is completed with a weighted smoothing algorithm. A second image fusion is then performed with a pyramid method to avoid large texture distortion when obtaining the full-view texture image and to obtain a smoother, more natural human texture image. In addition to the above steps, texture mapping of the human body model can be performed to obtain a realistic three-dimensional model.
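As a non-authoritative illustration of the radial basis function interpolation step mentioned above, the following Python sketch deforms the vertices of a generic body model so that selected feature points land on the corresponding contour points extracted from the two-dimensional image; the function and variable names are hypothetical, and the Gaussian kernel is an assumption rather than the kernel prescribed by the embodiment.

```python
import numpy as np

def rbf_deform(vertices, feature_points, target_points, sigma=0.1):
    """Deform a generic body model so its feature points land on the
    contour points extracted from the 2D image (minimal sketch).

    vertices:       (N, 3) vertices of the generic body model
    feature_points: (K, 3) feature points selected on the model
    target_points:  (K, 3) where those feature points should move to
    """
    # Gaussian RBF kernel between the feature points themselves
    diff = feature_points[:, None, :] - feature_points[None, :, :]
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * sigma ** 2))

    # Solve for weights that reproduce the required displacements
    displacements = target_points - feature_points            # (K, 3)
    weights = np.linalg.solve(K + 1e-6 * np.eye(len(K)), displacements)

    # Apply the interpolated displacement to every model vertex
    diff_v = vertices[:, None, :] - feature_points[None, :, :]
    K_v = np.exp(-np.sum(diff_v ** 2, axis=-1) / (2 * sigma ** 2))  # (N, K)
    return vertices + K_v @ weights
```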
The three-dimensional model of the target object is obtained by three-dimensionally mapping the two-dimensional image of the target object, so that data synthesis can be subsequently performed based on the three-dimensional model, and a three-dimensional effect is realized. For example, for an intelligent fitting scene, a two-dimensional image of a fitting person can be subjected to three-dimensional mapping to obtain a three-dimensional model of the fitting person, so that the three-dimensional model of the fitting person and the three-dimensional model of clothes are subjected to data synthesis through subsequent steps to achieve a three-dimensional fitting effect.
Step 203, a three-dimensional model of the additional data is obtained.
Optionally, obtaining a three-dimensional model of the additional data comprises: acquiring a two-dimensional image of a standard object with additional data and a two-dimensional image of a standard object without additional data; respectively carrying out three-dimensional mapping on the two-dimensional image of the standard object with the additional data and the two-dimensional image of the standard object without the additional data to obtain a three-dimensional model of the standard object with the additional data and a three-dimensional model of the standard object without the additional data; and acquiring the three-dimensional model of the additional data based on the three-dimensional model of the standard object of the additional data and the three-dimensional model of the standard object without the additional data.
The standard object with additional data may be regarded as a model wearing clothes, and the standard object without additional data as the same model not wearing clothes. The two-dimensional image of the standard object with additional data and the two-dimensional image of the standard object without additional data may be acquired in the manner described above for acquiring the two-dimensional image of the target object. For example, both images are captured by a 2D camera; alternatively, both images are downloaded from the server. For example, the server side of the smart fitting application stores two-dimensional images of standard objects with different additional data and two-dimensional images of standard objects without additional data, and the terminal can provide fitting options based on the current smart fitting scene, different fitting options corresponding to different pairs of these images. After receiving a selection instruction for a certain fitting option, the terminal pulls the two-dimensional image of the standard object with the additional data and the two-dimensional image of the standard object without the additional data corresponding to that fitting option from the server.
After the two-dimensional image of the standard object to which data is added and the two-dimensional image of the standard object to which data is not added are acquired, the two-dimensional image of the standard object to which data is added and the two-dimensional image of the standard object to which data is not added can be three-dimensionally mapped respectively in the above-described manner of three-dimensionally mapping the two-dimensional image of the target object, so that a three-dimensional model of the standard object to which data is added and a three-dimensional model of the standard object to which data is not added can be obtained.
Optionally, when the three-dimensional model of the additional data is obtained based on the three-dimensional model of the standard object with the additional data and the three-dimensional model of the standard object without the additional data, the three-dimensional model of the standard object without the additional data may be directly subtracted from the three-dimensional model of the standard object with the additional data to obtain the three-dimensional model of the additional data.
Taking the intelligent fitting scene as an example, where the additional data is clothes, the three-dimensional model of the standard object with additional data is the three-dimensional model of the model wearing the clothes, and the three-dimensional model of the standard object without additional data is the three-dimensional model of the same model not wearing the clothes. Subtracting the three-dimensional model of the model not wearing the clothes from the three-dimensional model of the model wearing the clothes leaves the three-dimensional model of the clothes, which is then used to synthesize the clothes with the image of the target object.
Compared with three-dimensionally modeling the clothes in the data synthesis scene, obtaining the three-dimensional model of the additional data by model subtraction improves the efficiency of obtaining that model, so the fitting request can be responded to quickly.
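The subtraction described above can be pictured as a set difference over sampled surface points of the two standard-object models: points of the dressed model that have no nearby counterpart on the undressed model are kept as the clothing. The following is a minimal sketch under that assumption, not the exact procedure of the embodiment; the names and the distance threshold are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def subtract_models(dressed_points, undressed_points, threshold=0.005):
    """Keep the points of the dressed standard-object model that are not
    explained by the undressed model; what remains approximates the
    clothing (additional data). Minimal sketch, not the patented method."""
    tree = cKDTree(undressed_points)
    # Distance from every dressed-model point to the nearest body point
    distances, _ = tree.query(dressed_points)
    # Points farther than the threshold are attributed to the clothing
    return dressed_points[distances > threshold]
```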
Step 204, matching the three-dimensional model of the target object and the three-dimensional model of the additional data through texture coordinates of the target pixels, and obtaining an image of the target object with the additional data synthesized according to the matching result.
A three-dimensional model has two coordinate systems: the position (X, Y, Z) coordinates of the vertices and the UV coordinates, i.e., texture coordinates. U and V are the coordinates of the picture in the horizontal and vertical directions respectively, and their values generally lie in 0-1; that is, U is the horizontal pixel index divided by the picture width, and V is the vertical pixel index divided by the picture height. Texture mapping can map a picture (or texture) onto one or more faces of a 3D model; the texture may be any picture, and using texture mapping increases the realism of the 3D object. Each fragment (pixel) has a corresponding texture coordinate. If texture coordinates were tied to the changing size of the object surface they would have to be updated continuously, which is difficult, so a texture coordinate space, also called UV space, is defined. The texture coordinate range in each dimension is [0,1], and multiplying a texture coordinate by the width or height of the texture gives the corresponding texel position of the vertex on the texture. For a vertex, the texture coordinate is invariant with respect to its position.
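The relationship between a texture coordinate in [0, 1] and the texel it addresses, as described above, can be written as a one-line lookup; the sketch below uses hypothetical names and simple truncation to the nearest texel.

```python
import numpy as np

def sample_texture(texture, u, v):
    """Return the texel addressed by texture coordinates (u, v) in [0, 1].

    texture: (H, W, 3) image array; u maps to the horizontal axis and
    v to the vertical axis, as described in the text above."""
    h, w = texture.shape[:2]
    x = min(int(u * w), w - 1)   # column index from U and the texture width
    y = min(int(v * h), h - 1)   # row index from V and the texture height
    return texture[y, x]
```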
Based on this principle, the three-dimensional model of the target object and the three-dimensional model of the additional data are matched through the texture coordinates of the target pixels, so that the target object and the additional data are put into correspondence. Optionally, before matching the two three-dimensional models through the texture coordinates of the target pixels, the texture coordinates of each pixel are determined. Ways to determine the texture coordinates include, but are not limited to: for either of the three-dimensional model of the target object and the three-dimensional model of the additional data, segmenting that three-dimensional model to obtain a plurality of parts, and determining the texture coordinates of the pixels included in each part.
As shown in fig. 3, the three-dimensional model of the target object on the left is segmented into N parts, as shown in the segmentation diagram on the right of fig. 3. The target object in the two-dimensional image is aligned into the UV coordinate systems of the N parts and then interpolated in each UV coordinate system. For example, the body of the target object is divided so as to correspond to labeled equidistant points, the part to which each pixel belongs and its position within that part are determined, and a two-dimensional correction is performed, giving the texture coordinates of each pixel included in each part of the segmented three-dimensional model of the target object. The three-dimensional model of the additional data is likewise segmented into N parts, and the texture coordinates of the pixels included in each of its parts are determined in the same manner.
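A hedged sketch of collecting the per-part texture coordinates, assuming the segmentation produces a part-index map together with per-pixel U and V maps (24 parts, following the DensePose convention mentioned later); the data layout and names are assumptions.

```python
import numpy as np

def build_part_uv(part_index_map, u_map, v_map, num_parts=24):
    """Collect, for each segmented part, the pixels that fall inside it and
    their texture coordinates (minimal sketch).

    part_index_map: (H, W) integer map, 0 = background, 1..num_parts = part id
    u_map, v_map:   (H, W) per-pixel texture coordinates in [0, 1]
    """
    parts = {}
    for p in range(1, num_parts + 1):
        ys, xs = np.where(part_index_map == p)
        parts[p] = {
            "pixels": np.stack([xs, ys], axis=1),                    # pixel positions
            "uv": np.stack([u_map[ys, xs], v_map[ys, xs]], axis=1),  # their (u, v)
        }
    return parts
```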
Optionally, after the texture coordinates of each pixel of the three-dimensional model of the target object and of the three-dimensional model of the additional data are obtained, texture coordinate matching of the pixels may be performed, thereby putting the two three-dimensional models into correspondence. The matching may use all the pixels or only the texture coordinates of some of the pixels; the pixels used for matching are the target pixels. That is, the target pixels may be all pixels in the two three-dimensional models, in which case the models are matched by the texture coordinates of every pixel; or the target pixels may be a reference number of pixels among all the pixels, in which case the models are matched by the texture coordinates of that reference number of pixels. The reference number may be set based on experience or on the application scenario, which is not limited in the embodiments of the present application. Compared with matching by all pixels, matching by a reference number of pixels is faster and gives higher data synthesis efficiency.
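To illustrate the trade-off described above, the sketch below matches one part of the target object against the same part of the additional data using either all pixels or only a reference number of sampled pixels. The nearest-neighbour criterion, the tolerance, and the names are assumptions rather than the matching rule prescribed by the embodiment.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_part(target_uv, additional_uv, reference_count=None, tol=0.01):
    """Match texture coordinates of one part of the target object against
    the same part of the additional data (minimal sketch).

    target_uv, additional_uv: (N, 2) and (M, 2) UV coordinates of the part.
    reference_count: if given, only this many target pixels are sampled,
                     which is faster than matching all pixels."""
    if reference_count is not None and reference_count < len(target_uv):
        idx = np.random.choice(len(target_uv), reference_count, replace=False)
        target_uv = target_uv[idx]
    tree = cKDTree(additional_uv)
    distances, nearest = tree.query(target_uv)
    matched = distances <= tol
    # Fraction of matched pixels and, for those, the matched additional-data indices
    return matched.mean(), nearest[matched]
```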
Optionally, matching the three-dimensional model of the target object and the three-dimensional model of the additional data by texture coordinates of the target pixel, and obtaining an image of the target object synthesizing the additional data according to a matching result, including: matching texture coordinates of pixels of each part in the target object and the additional data; and for the first target part successfully matched, synthesizing the data of the first target part of the additional data into the corresponding first target part in the target object.
It should be understood that a first target portion that is successfully matched is a portion of the target object and of the additional data that meets the matching requirement, for example that the texture coordinates are consistent, or that the number of consistent texture coordinates reaches a number threshold. For example, still taking clothes as the additional data, if the texture coordinates of the elbow joint of the target object are consistent with the texture coordinates of the elbow part of the clothes and their number reaches the number threshold, the elbow joint of the target object is considered to be successfully matched with the elbow part of the clothes. The data of the elbow part of the clothes is then synthesized onto the elbow joint of the target object, achieving the effect that this part of the clothes fits over the elbow joint of the target object.
Through the above process, if the texture coordinates of the pixels of the target object and the additional data of each part are successfully matched, the additional data are integrally synthesized into the target object, and the effect that the target object wears clothes is achieved.
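Putting the per-part matching together, the overall compositing loop might look like the following sketch; match_part is the hypothetical helper shown above, and the dictionary layout of the segmented parts is an assumption.

```python
def synthesize(target_parts, additional_parts, match_threshold=0.9):
    """For each segmented part, composite the additional data onto the
    target object when the texture coordinates match well enough."""
    composited, unmatched = {}, []
    for name, target in target_parts.items():
        additional = additional_parts.get(name)
        if additional is None:
            unmatched.append(name)
            continue
        score, _ = match_part(target["uv"], additional["uv"])
        if score >= match_threshold:
            # First target part: overlay the additional data onto it
            composited[name] = additional["data"]
        else:
            # Second target part: handled later with supplementary data
            unmatched.append(name)
    return composited, unmatched
```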
Of course, since the stature of the fitting person varies from person to person and is not always the same as that of the model, the texture coordinates of the fitting person do not always completely match the texture coordinates of the clothes; that is, in the data synthesis process, the texture coordinates of some pixels in the target object and the additional data may fail to match. In this regard, as an optional mode, after matching the texture coordinates of the pixels of each portion in the target object and the additional data, the method provided in the embodiment of the present application further includes: if there is a second target portion that fails to be matched, acquiring supplementary data based on the texture coordinates of the pixels of the second target portion in the target object; and synthesizing the supplementary data into the corresponding second target portion of the target object.
Based on the texture coordinates of the pixels of the second target portion in the target object, supplemental data is obtained, including but not limited to the following two ways:
the first way to obtain supplemental data: searching for a target texture coordinate with the highest matching degree with the texture coordinate of the pixel of the second target part in the target object in the additional data; and taking the data corresponding to the target texture coordinates as supplementary data.
The process of completing data synthesis in this way is shown in fig. 4. After the three-dimensional model of the target object and the three-dimensional model of the additional data are obtained, since every pixel of both models corresponds to a texture coordinate, if a second target portion of the target object is not matched to corresponding additional data, the additional data can be searched for the target texture coordinate with the highest matching degree with the texture coordinates of the pixels of that second target portion. For example, if the elbow joint of the target object is not matched to corresponding clothing, the clothing is searched for the target texture coordinate with the highest matching degree with the texture coordinates of the elbow joint; for instance, the texture coordinate of the arm portion of the clothing may be the target texture coordinate found. The data of the arm portion of the clothing is then synthesized as supplementary data onto the elbow joint of the target object, so that the clothing synthesized on the target object is completely filled in and no part of the target object is left uncovered by the clothing.
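The first supplementation strategy, taking the additional data at the texture coordinate with the highest matching degree, amounts to a nearest-neighbour query in UV space; a minimal sketch with hypothetical names follows.

```python
import numpy as np
from scipy.spatial import cKDTree

def supplement_by_nearest(unmatched_uv, additional_uv, additional_data):
    """For each unmatched target pixel, take the data at the additional-data
    texture coordinate with the highest matching degree (nearest UV).

    additional_data is assumed to be indexable in the same order as
    additional_uv."""
    tree = cKDTree(additional_uv)
    _, nearest = tree.query(unmatched_uv)
    return additional_data[nearest]   # supplementary data per unmatched pixel
```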
Second way of acquiring supplemental data: and determining the average value of the texture coordinates of the second target part in the target object in the reference range, and taking the data corresponding to the average value of the texture coordinates in the additional data as the supplementary data.
In the second way, when there is an unmatched second target portion, matching data is searched for in the additional data using the mean value of the texture coordinates within the reference range of the second target portion in the target object. For example, still taking the elbow joint of the target object that is not matched to corresponding clothing, the second way does not search the clothing for the target texture coordinate with the highest matching degree with the texture coordinate of the elbow joint, but instead determines the average value of the texture coordinates within the reference range of the elbow joint, and then searches the clothing for data corresponding to that average value. For example, if the texture coordinates of the arm portion of the clothing match the average value (exactly, or within a certain error range), the data of the arm portion of the clothing is synthesized as the supplementary data onto the elbow joint of the target object, so that the clothing synthesized on the target object is completely filled in and no part of the target object is left uncovered by the clothing.
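The second strategy averages the texture coordinates inside the reference range and fetches the additional data corresponding to that mean; a minimal sketch under the same hypothetical data layout:

```python
import numpy as np
from scipy.spatial import cKDTree

def supplement_by_mean(unmatched_uv, additional_uv, additional_data,
                       reference_range=0.05):
    """Average the texture coordinates within the reference range of the
    unmatched part, then take the additional data nearest that average."""
    center = unmatched_uv.mean(axis=0)
    in_range = cKDTree(unmatched_uv).query_ball_point(center, reference_range)
    uv_mean = unmatched_uv[in_range].mean(axis=0)
    # Data whose texture coordinate best matches the mean is the supplement
    _, idx = cKDTree(additional_uv).query(uv_mean)
    return additional_data[idx]
```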
According to the method provided by the embodiment of the application, the three-dimensional model of the target object is obtained by three-dimensionally mapping the two-dimensional image of the target object, and the three-dimensional model of the target object and the three-dimensional model of the additional data are matched by texture coordinates to obtain an image of the target object into which the additional data is synthesized, so three-dimensional data synthesis can be realized with a 2D camera at low cost; in addition, repeated modeling is not needed, so computation time is saved and data synthesis efficiency is higher.
Optionally, after acquiring the two-dimensional image of the target object, the method provided in the embodiment of the present application further includes: predicting the posture of the target object based on the two-dimensional image of the target object to obtain two-dimensional images of a plurality of postures of the target object; performing three-dimensional mapping on the two-dimensional images of the multiple postures of the target object to obtain a three-dimensional model of each posture of the target object; and matching the three-dimensional model of each posture of the target object and the three-dimensional model of the additional data through the texture coordinates of the target pixels, and obtaining images of the target object with the additional data synthesized in different postures according to the matching result.
Optionally, predicting the pose of the target object based on the two-dimensional image of the target object to obtain two-dimensional images of multiple poses of the target object includes: decomposing the two-dimensional image of the target object into a plurality of surfaces, and parameterizing each surface with a local two-dimensional coordinate system to obtain the position of a target node on each surface region; representing the position of the target node in a heat map, processing the dense coordinates at a reference speed, estimating a plurality of postures of the target object, and obtaining two-dimensional images of the plurality of postures of the target object.
As shown in fig. 5, the human body structure is decomposed into a plurality of independent surfaces by a DensePose-RCNN built on an FPN (Feature Pyramid Network) feature extraction network and a ResNet-50 (Residual Neural Network) backbone. ROI (Region of Interest) Align is then used to map the ROI from the original image onto the feature map directly with bilinear interpolation and without rounding, so the error is much smaller and the correspondence to the original image after pooling is more accurate. Each surface is then parameterized by a local two-dimensional coordinate system to identify the location of each node on that region. The positions of the target nodes are represented in a heat map (as shown in fig. 6), and the dense coordinates are processed at a speed of multiple frames per second, so accurate positioning and posture estimation of a moving person can be realized. The 2D image coordinates are thus mapped onto the 3D body surface by the convolutional neural network model.
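A hedged sketch of reading the heat maps: given per-part heat maps and dense UV regressions produced by a DensePose-style network (the network itself is not reproduced here), node positions and dense coordinates can be decoded as follows; the tensor shapes and names are assumptions.

```python
import numpy as np

def decode_densepose(part_heatmaps, u_maps, v_maps):
    """Read node positions and dense UV coordinates out of per-part outputs.

    part_heatmaps: (P, H, W) heat map per body part (P surfaces)
    u_maps, v_maps: (P, H, W) regressed U and V coordinates per part
    Returns, for each part, the peak position of the heat map and its (u, v)."""
    nodes = []
    for p in range(part_heatmaps.shape[0]):
        y, x = np.unravel_index(np.argmax(part_heatmaps[p]),
                                part_heatmaps[p].shape)
        nodes.append({"part": p,
                      "position": (x, y),                    # peak of the heat map
                      "uv": (float(u_maps[p, y, x]),         # dense coordinates at
                             float(v_maps[p, y, x]))})       # the peak position
    return nodes
```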
After the two-dimensional images of the target object in the multiple postures are obtained through prediction, for the two-dimensional image of each posture the above data synthesis process is used to synthesize the data of the target object in that posture with the additional data, obtaining images of the target object with the additional data synthesized in different postures. For the intelligent fitting scene, the fitting effect can thus be achieved in every posture of the target object.
As shown in fig. 7, based on the two-dimensional image of the target object (the left picture in fig. 7), the texture of the input picture is mapped into the target picture through UV coordinates on a common surface by a warping module. The core of this module is an STN (Spatial Transformer Network). DensePose divides the 3D model of the human body into 24 parts; the STN aligns the person in the source picture into the UV coordinate systems of the 24 parts according to the DensePose output, and then interpolates in each UV coordinate system. The result is then converted from the UV coordinate system back to an output picture by another STN module. However, because the human body information contained in the source picture generally cannot cover the whole body, and the overlap with the body appearance in the target picture is generally small, an inpainting network can be added to the warping module. The inpainting module mainly infers the appearance of the remaining parts of the human body from the body surface nodes filled in by the STN module. Since the system cannot obtain complete human body surface information, an inpainting method different from other depth-based inpainting methods can be used.
As shown in fig. 7, the person in the input picture on the left is aligned by the STN to the 24 body surface coordinate systems and then serves as input to an inpainting autoencoder. The inpainting autoencoder needs to predict the appearance of the same person from different viewing angles based on this input. Pictures of the same person from multiple angles are then collected as the target output of the inpainting module and used as a supervision signal to train the model. This multi-view supervision method can approximately recover all appearance information of the human body. Using the above method to predict the pose, two-dimensional images with different poses (as shown in the right image in fig. 7) are obtained.
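The STN-based warping described above reduces, at its core, to a differentiable resampling of the source picture with a sampling grid derived from the DensePose output; a minimal PyTorch sketch, in which the grid computation is assumed to be given, is:

```python
import torch
import torch.nn.functional as F

def warp_to_uv(source_image, sampling_grid):
    """Warp the source picture into the per-part UV coordinate systems.

    source_image:  (1, 3, H, W) input picture
    sampling_grid: (1, H', W', 2) grid in [-1, 1] telling, for every output
                   (UV-space) location, where to sample in the source image;
                   in practice it would be derived from the DensePose output.
    """
    return F.grid_sample(source_image, sampling_grid,
                         mode='bilinear', align_corners=False)
```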
An embodiment of the present application provides a data synthesis apparatus, and referring to fig. 8, the apparatus includes:
a first acquiring module 81 for acquiring a two-dimensional image of a target object;
the mapping module 82 is configured to perform three-dimensional mapping on the two-dimensional image of the target object to obtain a three-dimensional model of the target object;
a second obtaining module 83, configured to obtain a three-dimensional model of the additional data;
and a synthesizing module 84, configured to match the three-dimensional model of the target object and the three-dimensional model of the additional data with the texture coordinates of the target pixel, and obtain an image of the target object with the synthesized additional data according to a matching result.
Optionally, the second acquiring module 83 is configured to acquire a two-dimensional image of the standard object with additional data and a two-dimensional image of the standard object without additional data; respectively carrying out three-dimensional mapping on the two-dimensional image of the standard object with the additional data and the two-dimensional image of the standard object without the additional data to obtain a three-dimensional model of the standard object with the additional data and a three-dimensional model of the standard object without the additional data; and acquiring the three-dimensional model of the additional data based on the three-dimensional model of the standard object of the additional data and the three-dimensional model of the standard object without the additional data.
Optionally, the synthesizing module 84 is further configured to segment any three-dimensional model of the target object and any three-dimensional model of the additional data to obtain a plurality of parts; texture coordinates of the respective pixels included in each portion are determined.
Optionally, a synthesis module 84, configured to match texture coordinates of pixels of the target object and the additional data; and for the first target part successfully matched, synthesizing the data of the first target part of the additional data into the corresponding first target part in the target object.
Optionally, the synthesizing module 84 is further configured to, if there is a second target portion with a matching failure, obtain the supplementary data based on texture coordinates of pixels of the second target portion in the target object; the supplemental data is composited into a corresponding second target portion of the target object.
Optionally, the synthesizing module 84 is configured to find, in the additional data, a target texture coordinate with a highest matching degree with the texture coordinate of the pixel of the second target portion in the target object; and taking the data corresponding to the target texture coordinates as supplementary data.
Optionally, the synthesizing module 84 is configured to determine an average value of the texture coordinates of the second target portion in the target object within the reference range, and use data corresponding to the average value of the texture coordinates in the additional data as the supplementary data.
Optionally, the first obtaining module 81 is further configured to predict a pose of the target object based on the two-dimensional image of the target object, and obtain two-dimensional images of multiple poses of the target object;
the mapping module 82 is further configured to perform three-dimensional mapping on the two-dimensional images of the multiple postures of the target object to obtain a three-dimensional model of each posture of the target object;
the synthesis module 84 is further configured to match the three-dimensional model of each pose of the target object with the three-dimensional model of the additional data through the texture coordinates of the target pixel, and obtain an image of the target object with the synthesized additional data in different poses according to the matching result.
Optionally, the first obtaining module 81 is configured to decompose the two-dimensional image of the target object into a plurality of surfaces, and parameterize each surface by using a local two-dimensional coordinate system to obtain a position of a target node on each surface region; and represent the position of the target node in a heat map, process the dense coordinates at a reference speed, estimate a plurality of postures of the target object, and obtain two-dimensional images of the plurality of postures of the target object.
According to the device provided by the embodiment of the application, the three-dimensional model of the target object is obtained by three-dimensionally mapping the two-dimensional image of the target object, and the three-dimensional model of the target object and the three-dimensional model of the additional data are matched by texture coordinates to obtain an image of the target object into which the additional data is synthesized, so three-dimensional data synthesis can be realized with a 2D camera at low cost; in addition, repeated modeling is not needed, so computation time is saved and data synthesis efficiency is higher.
It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Fig. 9 is a schematic structural diagram of a data synthesis device according to an embodiment of the present invention. The device may be a terminal, for example: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. A terminal may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, etc.
Generally, a terminal includes: a processor 901 and a memory 902.
Processor 901 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 901 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 901 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 901 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 902 is used to store at least one instruction for execution by processor 901 to implement the data synthesis methods provided by the method embodiments herein.
In some embodiments, the terminal may further include: a peripheral interface 903 and at least one peripheral. The processor 901, memory 902, and peripheral interface 903 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 903 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 904, a display screen 905, a camera assembly 906, an audio circuit 907, a positioning assembly 908, and a power supply 909.
The peripheral interface 903 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 901 and the memory 902. In some embodiments, the processor 901, memory 902, and peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902 and the peripheral interface 903 may be implemented on a separate chip or circuit board, which is not limited by this embodiment.
The Radio Frequency circuit 904 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 904 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 904 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 904 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 904 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 904 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 905 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 905 is a touch display screen, the display screen 905 also has the ability to capture touch signals on or above its surface. The touch signal may be input to the processor 901 as a control signal for processing. In this case, the display screen 905 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 905, disposed on the front panel of the terminal; in other embodiments, there may be at least two display screens 905, respectively disposed on different surfaces of the terminal or in a folding design; in still other embodiments, the display screen 905 may be a flexible display screen disposed on a curved surface or a folded surface of the terminal. The display screen 905 may even be arranged in a non-rectangular irregular shape, i.e., an irregularly shaped screen. The display screen 905 may be made using an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 906 is used to capture images or video. Optionally, the camera assembly 906 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 906 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 907 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert the sound waves into electrical signals, and input the electrical signals to the processor 901 for processing, or to the radio frequency circuit 904 for voice communication. For stereo sound collection or noise reduction, a plurality of microphones may be disposed at different parts of the terminal. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 901 or the radio frequency circuit 904 into sound waves. The speaker may be a traditional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can not only convert an electrical signal into sound waves audible to humans, but also convert an electrical signal into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 907 may also include a headphone jack.
The positioning component 908 is used to locate the current geographic location of the terminal to implement navigation or LBS (Location Based Service). The positioning component 908 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 909 is used to supply power to the components in the terminal. The power supply 909 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 909 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast-charging technology.
In some embodiments, the terminal also includes one or more sensors 910. The one or more sensors 910 include, but are not limited to: acceleration sensor 911, gyro sensor 912, pressure sensor 913, fingerprint sensor 914, optical sensor 915, and proximity sensor 916.
The acceleration sensor 911 may detect the magnitude of acceleration on the three coordinate axes of a coordinate system established with the terminal. For example, the acceleration sensor 911 may be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 901 may control the display screen 905 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 911. The acceleration sensor 911 may also be used to collect motion data of a game or of the user.
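As an illustrative, non-limiting sketch of the orientation logic just described (the function name, axis convention, and threshold are assumptions, not part of the claimed method), the gravity components reported by the acceleration sensor could be mapped to a landscape/portrait decision as follows:

```python
def choose_orientation(ax, ay, threshold=0.5):
    """Pick a UI orientation from gravity components (in g) along the
    device's x axis (short edge) and y axis (long edge).

    When gravity mostly lies along the long edge the device is upright,
    so a portrait layout is used; when it mostly lies along the short
    edge the device is on its side, so a landscape layout is used.
    """
    if abs(ay) >= abs(ax) + threshold:
        return "portrait"
    if abs(ax) >= abs(ay) + threshold:
        return "landscape"
    return "unchanged"  # ambiguous reading: keep the current orientation

# Example: device held upright, gravity almost entirely on the y axis.
print(choose_orientation(ax=0.1, ay=0.98))  # -> "portrait"
```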
The gyroscope sensor 912 can detect the body direction and the rotation angle of the terminal, and the gyroscope sensor 912 and the acceleration sensor 911 cooperate to acquire the 3D motion of the user on the terminal. The processor 901 can implement the following functions according to the data collected by the gyro sensor 912: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 913 may be disposed on a side frame of the terminal and/or below the display screen 905. When the pressure sensor 913 is disposed on the side frame of the terminal, a grip signal applied by the user to the terminal may be detected, and the processor 901 performs left/right hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 913. When the pressure sensor 913 is disposed below the display screen 905, the processor 901 controls an operable control on the UI according to a pressure operation performed by the user on the display screen 905. The operable control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
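A minimal sketch of the kind of grip-based recognition mentioned above; the sensor layout, the normalized readings, and the margin are assumptions for illustration only, not details taken from the embodiment:

```python
def recognize_holding_hand(left_frame_pressure, right_frame_pressure, margin=0.2):
    """Guess which hand holds the terminal from side-frame pressure readings.

    A right-handed grip typically presses the palm against the left frame
    harder than the fingertips press the right frame, and vice versa.
    Readings are assumed normalized to [0, 1]; `margin` avoids flickering
    when the two sides are nearly equal.
    """
    if left_frame_pressure > right_frame_pressure + margin:
        return "right-hand grip"   # palm on the left frame
    if right_frame_pressure > left_frame_pressure + margin:
        return "left-hand grip"    # palm on the right frame
    return "undetermined"

print(recognize_holding_hand(0.8, 0.3))  # -> "right-hand grip"
```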
The fingerprint sensor 914 is used to collect the user's fingerprint, and the processor 901 identifies the user's identity from the fingerprint collected by the fingerprint sensor 914, or the fingerprint sensor 914 itself identifies the user's identity from the collected fingerprint. Upon recognizing the user's identity as trusted, the processor 901 authorizes the user to perform related sensitive operations, such as unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 914 may be provided on the front, back, or side of the terminal. When a physical key or vendor logo is provided on the terminal, the fingerprint sensor 914 may be integrated with the physical key or vendor logo.
The optical sensor 915 is used to collect ambient light intensity. In one embodiment, the processor 901 may control the display brightness of the display screen 905 based on the ambient light intensity collected by the optical sensor 915. Specifically, when the ambient light intensity is high, the display brightness of the display screen 905 is increased; when the ambient light intensity is low, the display brightness of the display screen 905 is reduced. In another embodiment, the processor 901 can also dynamically adjust the shooting parameters of the camera assembly 906 according to the ambient light intensity collected by the optical sensor 915.
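A toy sketch of such brightness control; the lux breakpoints and the linear mapping below are assumptions chosen for illustration, not values from the embodiment:

```python
def display_brightness(ambient_lux, min_level=0.1, max_level=1.0,
                       dark_lux=10.0, bright_lux=1000.0):
    """Map ambient light intensity (lux) to a display brightness level.

    Below `dark_lux` the screen stays at the minimum level, above
    `bright_lux` it stays at the maximum, and in between the level is
    interpolated linearly so brightness rises with ambient light.
    """
    if ambient_lux <= dark_lux:
        return min_level
    if ambient_lux >= bright_lux:
        return max_level
    ratio = (ambient_lux - dark_lux) / (bright_lux - dark_lux)
    return min_level + ratio * (max_level - min_level)

print(round(display_brightness(500.0), 2))  # mid-range ambient light -> ~0.55
```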
The proximity sensor 916, also called a distance sensor, is typically disposed on the front panel of the terminal. The proximity sensor 916 is used to collect the distance between the user and the front face of the terminal. In one embodiment, when the proximity sensor 916 detects that the distance between the user and the front face of the terminal gradually decreases, the processor 901 controls the touch display screen 905 to switch from the screen-on state to the screen-off state; when the proximity sensor 916 detects that the distance between the user and the front face of the terminal gradually increases, the processor 901 controls the touch display screen 905 to switch from the screen-off state to the screen-on state.
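The screen-state switching described above can be pictured as a small state machine; the distance thresholds and hysteresis below are illustrative assumptions, not figures from the embodiment:

```python
class ProximityScreenController:
    """Switch the display between screen-on and screen-off states
    based on the distance (in cm) reported by a proximity sensor.

    A small hysteresis band keeps the screen from flickering when the
    reading hovers around the threshold (e.g. during a voice call).
    """

    def __init__(self, off_below_cm=3.0, on_above_cm=5.0):
        self.off_below_cm = off_below_cm
        self.on_above_cm = on_above_cm
        self.screen_on = True

    def update(self, distance_cm):
        if self.screen_on and distance_cm < self.off_below_cm:
            self.screen_on = False   # face close: turn the screen off
        elif not self.screen_on and distance_cm > self.on_above_cm:
            self.screen_on = True    # face moved away: turn it back on
        return self.screen_on

ctrl = ProximityScreenController()
print(ctrl.update(2.0))  # False -> screen off as the user approaches
print(ctrl.update(8.0))  # True  -> screen on as the user moves away
```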
Those skilled in the art will appreciate that the structure shown in fig. 9 does not constitute a limitation on the terminal, and the terminal may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
In an exemplary embodiment, a computer device is also provided, see fig. 10, comprising a processor 1001 and a memory 1002, the memory 1002 having at least one instruction stored therein. The at least one instruction is configured to be executed by the processor 1001 to implement any of the data synthesis methods described above.
In an exemplary embodiment, there is also provided a computer-readable storage medium having stored therein at least one instruction which, when executed by a processor of a computer device, implements any of the above-described data synthesis methods.
Optionally, the computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It should be understood that reference to "a plurality" herein means two or more. "And/or" describes the association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the superiority or inferiority of the embodiments.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method of data synthesis, the method comprising:
acquiring a two-dimensional image of a target object, and performing three-dimensional mapping on the two-dimensional image of the target object to obtain a three-dimensional model of the target object;
acquiring a three-dimensional model of the additional data;
matching the three-dimensional model of the target object and the three-dimensional model of the additional data through texture coordinates of target pixels, wherein the target pixels are a reference number of pixels among all pixels in the three-dimensional model of the target object and the three-dimensional model of the additional data;
for a first target part that is successfully matched, synthesizing data of the first target part of the additional data into the corresponding first target part in the target object;
and for a second target part that fails to be matched, acquiring supplementary data based on texture coordinates of pixels of the second target part in the target object, and synthesizing the supplementary data into the corresponding second target part in the target object, wherein the second target part that fails to be matched is a part in the three-dimensional model of the target object that is not matched with corresponding additional data.
2. The method of claim 1, wherein said obtaining a three-dimensional model of additional data comprises:
acquiring a two-dimensional image of a standard object with additional data and a two-dimensional image of a standard object without additional data;
respectively performing three-dimensional mapping on the two-dimensional image of the standard object with additional data and the two-dimensional image of the standard object without additional data to obtain a three-dimensional model of the standard object with additional data and a three-dimensional model of the standard object without additional data;
and acquiring the three-dimensional model of the additional data based on the three-dimensional model of the standard object with additional data and the three-dimensional model of the standard object without additional data.
3. The method of claim 1, wherein before the matching of the three-dimensional model of the target object and the three-dimensional model of the additional data through the texture coordinates of the target pixels, the method further comprises:
for either of the three-dimensional model of the target object and the three-dimensional model of the additional data, segmenting that three-dimensional model to obtain a plurality of parts;
and determining texture coordinates of the pixels included in each part.
4. The method of claim 1, wherein the acquiring of supplementary data based on texture coordinates of pixels of the second target part in the target object comprises:
searching the additional data for a target texture coordinate having the highest matching degree with the texture coordinates of the pixels of the second target part in the target object;
and taking data corresponding to the target texture coordinate as the supplementary data.
5. The method of claim 1, wherein the acquiring of supplementary data based on texture coordinates of pixels of the second target part in the target object comprises:
determining an average value of texture coordinates of the second target part in the target object within a reference range, and taking data in the additional data corresponding to the average value of the texture coordinates as the supplementary data.
6. The method according to any one of claims 1-5, wherein after the acquiring of the two-dimensional image of the target object, the method further comprises:
predicting the posture of the target object based on the two-dimensional image of the target object to obtain two-dimensional images of a plurality of postures of the target object;
performing three-dimensional mapping on the two-dimensional images of the multiple postures of the target object to obtain a three-dimensional model of each posture of the target object;
and matching the three-dimensional model of each posture of the target object with the three-dimensional model of the additional data through texture coordinates of the target pixels, and obtaining images of the target object with the additional data synthesized in different postures according to matching results.
7. The method of claim 6, wherein the predicting of the posture of the target object based on the two-dimensional image of the target object to obtain two-dimensional images of a plurality of postures of the target object comprises:
decomposing the two-dimensional image of the target object into a plurality of surfaces, and parameterizing each surface by adopting a local two-dimensional coordinate system to obtain the position of a target node on each surface area;
and representing the position of the target node in a heat map, and then estimating a plurality of postures of the target object based on dense coordinates processed at a reference speed, to obtain the two-dimensional images of the plurality of postures of the target object.
8. A data synthesis apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring a two-dimensional image of a target object;
the mapping module is used for carrying out three-dimensional mapping on the two-dimensional image of the target object to obtain a three-dimensional model of the target object;
the second acquisition module is used for acquiring a three-dimensional model of the additional data;
a synthesis module, configured to match the three-dimensional model of the target object and the three-dimensional model of the additional data through texture coordinates of target pixels, wherein the target pixels are a reference number of pixels among all pixels in the three-dimensional model of the target object and the three-dimensional model of the additional data; for a first target part that is successfully matched, synthesize data of the first target part of the additional data into the corresponding first target part in the target object; and for a second target part that fails to be matched, acquire supplementary data based on texture coordinates of pixels of the second target part in the target object, and synthesize the supplementary data into the corresponding second target part in the target object, wherein the second target part that fails to be matched is a part in the three-dimensional model of the target object that is not matched with corresponding additional data.
9. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction which, when executed by the processor, implements a data synthesis method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium having stored therein at least one instruction which, when executed, implements a data synthesis method as claimed in any one of claims 1 to 7.
CN201910867434.4A 2019-09-12 2019-09-12 Data synthesis method, device, equipment and storage medium Active CN110599593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910867434.4A CN110599593B (en) 2019-09-12 2019-09-12 Data synthesis method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110599593A CN110599593A (en) 2019-12-20
CN110599593B true CN110599593B (en) 2021-03-23

Family

ID=68859533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910867434.4A Active CN110599593B (en) 2019-09-12 2019-09-12 Data synthesis method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110599593B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102020212113A1 (en) * 2020-03-12 2021-09-16 Friedrich-Alexander-Universität Erlangen-Nürnberg Generate modified medical images and identify abnormal structures
CN111439500B (en) * 2020-04-16 2022-04-29 同济大学浙江学院 Automatic garbage classification method and automatic garbage classification device
CN112150608A (en) * 2020-09-07 2020-12-29 鹏城实验室 Three-dimensional face reconstruction method based on graph convolution neural network
CN112232385A (en) * 2020-09-27 2021-01-15 北京五八信息技术有限公司 Image processing method and device
CN112532874B (en) * 2020-11-23 2022-03-29 北京三快在线科技有限公司 Method and device for generating plane thermodynamic diagram, storage medium and electronic equipment
CN113436348B (en) * 2021-06-25 2023-10-03 北京达佳互联信息技术有限公司 Three-dimensional model processing method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103106604A (en) * 2013-01-23 2013-05-15 东华大学 Three dimensional (3D) virtual fitting method based on somatosensory technology
CN110147767A (en) * 2019-05-22 2019-08-20 深圳市凌云视迅科技有限责任公司 Three-dimension gesture attitude prediction method based on two dimensional image

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100836853B1 (en) * 2007-06-08 2008-06-11 충남대학교산학협력단 Design Method of Fit Clothes Using Pattern Making from 3-dimensional Curved Surface to 2-dimensional Plane
CN102254350A (en) * 2011-07-05 2011-11-23 中国测绘科学研究院 3D (three-dimensional) model matching method
CN102496184B (en) * 2011-12-12 2013-07-31 南京大学 Increment three-dimensional reconstruction method based on bayes and facial model
CN102903111B (en) * 2012-09-27 2015-09-30 哈尔滨工程大学 Large area based on Iamge Segmentation low texture area Stereo Matching Algorithm
KR102266361B1 (en) * 2013-08-04 2021-06-16 아이즈매치 리미티드 Devices, systems and methods of virtualizing a mirror
CN103873751A (en) * 2014-03-28 2014-06-18 陈维龙 Three-dimensional panoramic scanning device and three-dimensional module generating method
CN104008571B (en) * 2014-06-12 2017-01-18 深圳奥比中光科技有限公司 Human body model obtaining method and network virtual fitting system based on depth camera
CN104837066B (en) * 2015-05-15 2019-01-25 腾讯科技(北京)有限公司 Images of items processing method, device and system
CN107067299A (en) * 2017-03-29 2017-08-18 深圳奥比中光科技有限公司 Virtual fit method and system
CN108961415A (en) * 2017-05-24 2018-12-07 北京物语科技有限公司 Three-dimensional fitting method and system based on depth image acquisition equipment
CN107833253B (en) * 2017-09-22 2020-08-04 北京航空航天大学青岛研究院 RGBD three-dimensional reconstruction texture generation-oriented camera attitude optimization method

Similar Documents

Publication Publication Date Title
US20200327694A1 (en) Relocalization method and apparatus in camera pose tracking process and storage medium
CN110599593B (en) Data synthesis method, device, equipment and storage medium
CN110097576B (en) Motion information determination method of image feature point, task execution method and equipment
CN110148178B (en) Camera positioning method, device, terminal and storage medium
CN111464749B (en) Method, device, equipment and storage medium for image synthesis
CN112287852B (en) Face image processing method, face image display method, face image processing device and face image display equipment
CN110807361A (en) Human body recognition method and device, computer equipment and storage medium
CN109947338B (en) Image switching display method and device, electronic equipment and storage medium
CN111680758B (en) Image training sample generation method and device
CN108776822B (en) Target area detection method, device, terminal and storage medium
CN112581358B (en) Training method of image processing model, image processing method and device
CN114170349A (en) Image generation method, image generation device, electronic equipment and storage medium
CN110570460A (en) Target tracking method and device, computer equipment and computer readable storage medium
CN113763228A (en) Image processing method, image processing device, electronic equipment and storage medium
CN112749613A (en) Video data processing method and device, computer equipment and storage medium
CN112308103B (en) Method and device for generating training samples
WO2019000464A1 (en) Image display method and device, storage medium, and terminal
CN111127539B (en) Parallax determination method and device, computer equipment and storage medium
CN111179628B (en) Positioning method and device for automatic driving vehicle, electronic equipment and storage medium
CN111982293B (en) Body temperature measuring method and device, electronic equipment and storage medium
CN110335224B (en) Image processing method, image processing device, computer equipment and storage medium
CN111757146B (en) Method, system and storage medium for video splicing
CN114093020A (en) Motion capture method, motion capture device, electronic device and storage medium
CN109685881B (en) Volume rendering method and device and intelligent equipment
CN113298040A (en) Key point detection method and device, electronic equipment and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant