WO2022205755A1 - Texture generation method, apparatus, device and storage medium - Google Patents

Texture generation method, apparatus, device and storage medium

Info

Publication number
WO2022205755A1
Authority
WO
WIPO (PCT)
Prior art keywords
texture
view
sample
network
generation network
Prior art date
Application number
PCT/CN2021/114973
Other languages
English (en)
French (fr)
Inventor
邓又铭
宋勃宇
刘文韬
钱晨
Original Assignee
深圳市慧鲤科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市慧鲤科技有限公司 filed Critical 深圳市慧鲤科技有限公司
Publication of WO2022205755A1 publication Critical patent/WO2022205755A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/001Texturing; Colouring; Generation of texture or colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects

Definitions

  • the present disclosure relates to image processing technologies, and in particular, to a texture generation method, apparatus, device, and storage medium.
  • the human body texture information needs to be supplemented on the basis of the human body mesh model, so that the reconstructed 3D human body model has a more realistic visual effect.
  • continuous scanning of the human body is required.
  • the method of obtaining human body texture by continuously scanning the human body is more complicated to operate.
  • embodiments of the present disclosure provide at least a texture generation method, apparatus, device, and storage medium.
  • a texture generation method, comprising: acquiring a first view texture of a target object; and inputting the first view texture into a pre-trained texture generation network to obtain a second view texture of the target object predicted and output by the texture generation network, where the first view texture and the second view texture correspond to different view acquisition angles.
  • the target object is a human body
  • the first view texture of the target object is a front view texture of the human body
  • the second view texture is a back view texture of the human body
  • the method further includes: mapping the first view texture and the second view texture onto an initial three-dimensional model of the target object to obtain a target three-dimensional model filled with the texture structure; the initial three-dimensional model is a three-dimensional mesh model representing the geometric shape of the target object.
  • the inputting the first view texture into a pre-trained texture generation network to obtain the second view texture of the target object predicted and output by the texture generation network includes: performing down-sampling processing on the first view texture to extract a first texture feature map; and performing up-sampling processing based on the first texture feature map to output the second view texture.
  • the training process of the texture generation network includes: based on the first view texture sample of the sample object, outputting the second view texture sample of the sample object through the texture generation network to be trained;
  • generating network supervision information according to the first view texture sample and the second view texture sample;
  • the network supervision information includes at least one of the following: texture supervision information, and adversarial supervision information obtained through a generative adversarial network; and based on the network supervision information, adjusting network parameters of the texture generation network.
  • the generating network supervision information according to the first view texture sample and the second view texture sample includes: performing feature extraction on the first view texture sample to obtain a first texture feature; performing feature extraction on the second view texture sample to obtain a second texture feature; and according to the first texture feature, the second texture feature and the texture feature of a second view label, obtaining a first feature loss representing the difference between texture features as the texture supervision information.
  • the generating network supervision information according to the first view texture sample and the second view texture sample includes: inputting the first view texture sample and the second view texture sample into a first discriminator; and obtaining a first discriminant loss as the adversarial supervision information according to the output value of the first discriminator and a first discriminant label.
  • the network supervision information further includes: regression supervision information; the method further includes: obtaining a first regression loss as the regression supervision information based on the second view texture sample and the corresponding second view label.
  • the outputting the second view texture sample of the sample object through the texture generation network to be trained based on the first view texture sample of the sample object includes: receiving an initial image, where the initial image includes the sample object; processing the initial image to obtain the first view texture sample and at least one of the following two object masks: a first view object mask and a second view object mask; and using at least one of the first view object mask and the second view object mask, together with the first view texture sample, as the input of the texture generation network to be trained, to obtain the second view texture sample output by the texture generation network.
  • before the outputting the second view texture sample of the sample object through the texture generation network to be trained based on the first view texture sample of the sample object, the method further includes: based on a third view texture sample of the sample object, outputting a fourth view texture sample of the sample object through an auxiliary texture generation network to be trained, where the resolution of the first view texture sample is higher than that of the third view texture sample; adjusting network parameters of the auxiliary texture generation network according to the fourth view texture sample; and after the auxiliary texture generation network is trained, using at least part of the network parameters of the auxiliary texture generation network as at least part of the network parameters of the texture generation network.
  • the auxiliary texture generation network includes: a first encoding end and a first decoding end; the texture generation network includes: a second encoding end and a second decoding end; the second encoding end has at least one more convolutional layer than the first encoding end, and the second decoding end has at least one more deconvolutional layer than the first decoding end.
  • a texture generating apparatus comprising:
  • a texture acquisition module used to acquire the first view texture of the target object
  • a prediction processing module configured to input the first view texture into a pre-trained texture generation network to obtain a second view texture of the target object predicted and output by the texture generation network, where the first view texture and the second view texture correspond to different view acquisition angles.
  • the target object is a human body
  • the first view texture of the target object is a front view texture of the human body
  • the second view texture is a back view texture of the human body
  • the apparatus further includes: a network training module for training the texture generation network;
  • the network training module includes: a texture output sub-module, configured to output the second view texture sample of the sample object through the texture generation network to be trained, based on the first view texture sample of the sample object;
  • the supervision generation sub-module is used to generate network supervision information according to the first view texture sample and the second view texture sample;
  • the network supervision information includes at least one of the following: texture supervision information, and adversarial supervision information obtained through a generative adversarial network; and a parameter adjustment sub-module for adjusting network parameters of the texture generation network based on the network supervision information.
  • when used to generate the texture supervision information, the supervision generation sub-module is configured to: perform feature extraction on the first view texture sample to obtain a first texture feature; perform feature extraction on the second view texture sample to obtain a second texture feature; and according to the first texture feature, the second texture feature and the texture feature of the second view label, obtain a first feature loss representing the difference between texture features as the texture supervision information.
  • when used to generate the adversarial supervision information, the supervision generation sub-module is configured to: input the first view texture sample and the second view texture sample into a first discriminator; and obtain the first discriminant loss as the adversarial supervision information according to the output value of the first discriminator and the first discriminant label.
  • the texture output sub-module is specifically configured to: receive an initial image, where the initial image includes the sample object; process the initial image to obtain the first view texture sample and at least one of the following two object masks: a first view object mask and a second view object mask; and use at least one of the first view object mask and the second view object mask, together with the first view texture sample, as the input of the texture generation network to obtain the second view texture sample output by the texture generation network.
  • an electronic device which includes: a memory and a processor, where the memory is configured to store computer-readable instructions, and the processor is configured to invoke the computer instructions to implement the method of any embodiment of the present disclosure .
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the method of any embodiment of the present disclosure.
  • a computer program product including a computer program that, when executed by a processor, implements the method of any of the embodiments of the present disclosure.
  • a texture generation network obtained by pre-training is used to predict and output the second view texture of the target object, so that the second view texture can be obtained by acquiring only the first view texture of the target object. The view acquisition requirements for the target object are thus reduced, the acquisition operation is simpler, and the cost of texture generation is reduced; moreover, because the texture generation network has been pre-trained, the second view texture it predicts is more accurate and realistic.
  • FIG. 1 shows a schematic flowchart of a texture generation method provided by at least one embodiment of the present disclosure
  • FIG. 2 shows a schematic diagram of the principle of a texture generation method provided by at least one embodiment of the present disclosure
  • FIG. 3 shows a training flow chart of a texture generation network provided by at least one embodiment of the present disclosure
  • Fig. 4 shows the training principle diagram of a texture generation network provided by at least one embodiment of the present disclosure
  • FIG. 5 shows a training flowchart corresponding to FIG. 4 provided by at least one embodiment of the present disclosure
  • Fig. 6 shows the training principle diagram of a texture generation network provided by at least one embodiment of the present disclosure
  • Fig. 7 shows the training principle diagram of a texture generation network provided by at least one embodiment of the present disclosure
  • FIG. 8 shows a training flow chart of a texture generation network provided by at least one embodiment of the present disclosure
  • FIG. 9 shows a schematic diagram of the training of another texture generation network provided by at least one embodiment of the present disclosure.
  • FIG. 10 shows a structural diagram of a texture generation apparatus provided by at least one embodiment of the present disclosure
  • FIG. 11 shows a structural diagram of another texture generating apparatus provided by at least one embodiment of the present disclosure.
  • 3D human body reconstruction has important applications in many fields, including but not limited to the following application scenarios:
  • the realism of some virtual reality application scenarios can be enhanced through 3D human reconstruction.
  • for example, virtual fitting, virtual cloud meetings, virtual classrooms, and so on.
  • the 3D human body model obtained by 3D human body reconstruction can be imported into the game data to complete the generation of the personalized character.
  • in any application scenario, 3D human body reconstruction needs to keep the user's operating cost as low as possible.
  • for example, 3D human body reconstruction can be performed based on a single RGB image. Since only one image is needed, users do not have to spend much time and energy cooperating with image acquisition, and the user experience is better.
  • when 3D human body reconstruction is performed based on a single RGB image, since the image only includes part of the texture of the human body, the textures of other parts of the human body need to be predicted so that a complete texture map can be applied to the 3D human body model.
  • an embodiment of the present disclosure provides a texture generation method, which aims to predict the texture of other parts according to the texture of a part of the human body, and achieve a better texture prediction effect.
  • the method uses a texture generation network for texture prediction. It can be understood that this method can be applied not only to the texture generation of human body, but also to the texture generation of other objects.
  • the human body is used as an example.
  • Figure 1 illustrates a flow chart of a texture generation method. As shown in Figure 1, the method may include:
  • step 100 a first view texture of the target object is acquired.
  • the target object may be a three-dimensional object.
  • the target object may be a human body
  • the first view texture of the target object may be a frontal image collected from the front of the human body, and the frontal image may be referred to as the frontal view texture of the human body.
  • step 102 input the first view texture into the texture generation network obtained by pre-training, and obtain the second view texture of the target object predicted and output by the texture generation network, where the first view texture and the second view texture correspond to different view acquisition angles.
  • the second view texture output by the texture generation network may be the back view texture of the human body.
  • the first view texture 21 of the human body is input to the texture generation network 22 , and the second view texture 23 of the human body output by the texture generation network 22 can be obtained.
  • the first view texture 21 and the second view texture 23 can correspond to different view acquisition angles
  • the first view texture 21 is a frontal image collected from the front of the human body
  • the second view texture 23 is equivalent to a back image collected from the back of the human body.
  • the texture generation network 22 may be a deep residual convolutional neural network, which may include an encoding end and a decoding end.
  • the encoding end may include multiple convolutional layers, through which the input first view texture is down-sampled to extract a first texture feature map; the decoding end may include multiple deconvolutional layers, through which the first texture feature map is up-sampled to output the second view texture.
  • the texture generation network obtained by pre-training is used to predict and output the second view texture of the target object, so that the second view texture can be obtained by acquiring only the first view texture of the target object.
  • View acquisition requirements for target objects are reduced.
  • the target object is a human body
  • as long as a frontal image of the human body is collected, the back image of the human body can be obtained through the texture generation network. Therefore, compared with the traditional method in the field of 3D human body reconstruction, which requires collecting multiple body views of the user to obtain a complete model texture, the operation of the method in this embodiment is simpler, and the cost of texture generation can be reduced;
  • moreover, because the texture generation network has been pre-trained, the second view texture it predicts is more accurate and realistic.
  • the initial three-dimensional model is a three-dimensional mesh model representing the geometric shape of the target object, such as a mesh representing the geometric structure of the human body. Based on the first view texture (e.g., the human body front view texture) and the second view texture (e.g., the human body back view texture), texture mapping can be performed on the initial three-dimensional model. For parts that are visible in neither view texture, interpolation can be used to fill gaps in the model and complete its texture, so that a 3D human body model filled with a texture structure is obtained. Such a textured 3D human body model makes the reconstructed human body model more realistic.
  • FIG. 3 illustrates a schematic diagram of a training process of a texture generation network in an embodiment.
  • the training process may include the following processes:
  • step 300 based on the first view texture sample of the sample object, the texture generation network to be trained outputs the second view texture sample of the sample object.
  • the first view texture sample of the sample object may be a frontal image of a human body used in the training process
  • the human body is the sample object
  • the frontal image of the human body is the first view texture sample.
  • the second view texture sample of the sample object output by the texture generation network may be a backside image of a human body.
  • the texture generation network used in this step is a texture generation network that has not yet been fully trained. During the training of the texture generation network, multiple iterations may be performed until an end condition of network training is reached, for example, a preset number of iterations is reached, or the error between the predicted output and the label is small enough.
  • the training process in this embodiment may be one of the iterative processes.
  • step 302 network supervision information is generated according to the first view texture sample and the second view texture sample; the network supervision information includes at least one of the following: texture supervision information, and adversarial supervision information obtained through a generative adversarial network.
  • network supervision information for supervised network training may be generated based on the first view texture samples and the second view texture samples obtained in step 300 .
  • the network supervision information may be texture supervision information, adversarial supervision information, or both.
  • when generating the network supervision information, information other than the first view texture sample and the second view texture sample may also be used, and the generated network supervision information may include other types of supervision information besides the texture supervision information and the adversarial supervision information.
  • the texture supervision information may be information used to constrain the consistency of texture features between the texture samples of the first view and the texture samples of the second view.
  • by adjusting the network parameters under this supervision, the texture features of the first view texture samples and the second view texture samples can be made as consistent as possible, so that the second view texture samples better fit the first view texture samples and the output second view texture samples are more realistic.
  • the adversarial supervision information may be information used to constrain the authenticity of the texture samples of the second view.
  • by adjusting the network parameters of the texture generation network under the supervision of the adversarial supervision information, the generated second view texture samples can be made more natural and realistic.
  • generative adversarial learning can be performed based on the first-view texture samples and the second-view texture samples, and the adversarial supervision information can be obtained through a generative adversarial network.
  • for example, the first view texture samples and the second view texture samples can be input into a discriminator, and the discriminant loss is obtained as the adversarial supervision information according to the output value of the discriminator and the discriminant label.
  • step 304 based on the network supervision information, network parameters of the texture generation network are adjusted.
  • the texture generation network can be tuned through a backpropagation algorithm. If the network supervision information includes texture supervision information and adversarial supervision information, the network parameters can be adjusted by combining these two types of supervision information.
  • the network parameters are adjusted by combining at least one of the texture supervision information and the adversarial supervision information, so that the second view texture sample better fits the first view texture sample, and the generated second view texture sample is more realistic and natural.
  • FIG. 4 illustrates a schematic diagram of the training principle of the texture generation network in an embodiment.
  • the input of the texture generation network 42 may be the first view texture sample 41 of the sample object (eg, human body), and the output is The predicted second view texture sample 43 of the sample object.
  • the first view texture sample 41 and the second view texture sample 43 correspond to different view acquisition angles; for example, the first view texture sample 41 is a frontal image of a human body, and the second view texture sample 43 is the human body back texture predicted by the texture generation network 42.
  • the texture generation network 42 may be, for example, a deep residual convolutional neural network, which may include an encoding end and a decoding end, the encoding end may include multiple convolutional layers, and the decoding end may include multiple deconvolutional layers.
  • Fig. 5 is the training flow of this texture generation network corresponding to Fig. 4, can include:
  • step 500 based on the first view texture sample of the sample object, output the predicted second view texture sample of the sample object through the texture generation network to be trained.
  • the texture sample 41 of the first view of the human body is input to the texture generation network 42 to be trained, and the texture sample 43 of the second view of the human body is output after being processed by the network.
  • step 502 a first regression loss is obtained based on the second view texture sample and the second view label; and the first view texture sample and the second view texture sample are used as the input of the first discriminator, according to The output value of the first discriminator and the first discriminant label obtain a first discriminant loss.
  • the second view label can be the real back image of the human body
  • the second view texture sample generated in step 500 is the back image of the human body generated by the texture generation network 42.
  • a loss can be calculated according to the second view texture sample and the second view label; this loss is called the first regression loss.
  • the first regression loss can be, for example, an L1 loss. This first regression loss may be referred to as regression supervision information.
  • the regression supervision information can be combined with at least one of the above texture supervision information and adversarial supervision information to adjust parameters.
  • for example, parameters can be adjusted by combining the texture supervision information and the regression supervision information, by combining the adversarial supervision information and the regression supervision information, or by combining all three of the texture supervision information, the adversarial supervision information and the regression supervision information.
  • other types of supervision information may also be combined in the specific implementation.
  • supervising network training by combining the regression supervision information and the adversarial supervision information is taken as an example here.
  • a first discriminant loss can also be calculated.
  • the first view texture sample 41 and the second view texture sample 43 can be used as the input of the first discriminator, which compares the two; the output value of the first discriminator can be a true/false value, and this output value is compared with the first discriminant label (i.e., the ground-truth true/false value) to obtain the first discriminant loss.
  • This first discriminative loss can be referred to as adversarial supervision information.
  • step 504 the network parameters of the texture generation network are adjusted according to the first regression loss and the first discrimination loss.
  • the network parameters of the texture generation network can be adjusted by combining the two types of Loss obtained above, and after several iterations, the texture generation network that has been trained is finally obtained.
  • supervision by the second view label makes the second view texture samples generated by the texture generation network as close to the real second view texture as possible, and generative adversarial learning based on the first view texture samples and the second view texture samples keeps the texture styles of the second view texture samples and the first view texture samples as consistent as possible, so that the second view texture samples better match the input first view texture samples.
  • the supervised training of the above two aspects makes the texture samples of the second view obtained by the texture generation network more realistic.
  • Figure 6 illustrates the training schematic of another example texture generation network. On the basis of Figure 4, another supervision can be performed during the training process.
  • feature extraction may be performed on the first view texture sample 41 to obtain a first texture feature 61
  • feature extraction may be performed on the second view texture sample 43 to obtain a second texture feature 62
  • by comparing the difference between the first texture feature 61 and the second texture feature 62, and the difference between the second texture feature 62 and the texture feature of the second view label, the first feature loss 63 is obtained.
  • the first feature loss 63 is used to supervise the consistency between the first texture feature and the second texture feature, and the consistency between the second texture feature and the texture feature extracted from the second view label.
  • the above-mentioned first feature loss can be called texture supervision information.
  • the network parameters of the texture generation network may be adjusted according to the above-mentioned first regression loss, first discrimination loss and first feature loss.
  • on the basis of the first regression loss and the first discriminant loss, the first feature loss is added to supervise the training of the texture generation network; that is, network training is supervised by combining the regression supervision information, the adversarial supervision information and the texture supervision information, which makes the texture features of the second view texture samples obtained by the texture generation network closer to the texture features of the input first view texture samples, so that the generated second view texture samples are more realistic and natural.
  • the first view texture sample shown in FIG. 4 is an image in a relatively ideal situation; in practice, the image collected of the sample object (for example, a human body) usually has a background, so the image needs to be segmented to obtain the first view texture sample and the masks.
  • an initial image 71 is acquired.
  • the initial image 71 may be a collected frontal image of the human body.
  • the initial image 71 may include a background and a sample object, ie, a human body. Combined with the process shown in Figure 8, the following processes are included:
  • step 800 the initial image is processed to obtain a first view texture sample, a first view object mask and a second view object mask.
  • the initial image 71 may be segmented to obtain a first view texture sample 72 , a first view object mask 73 and a second view object mask 74 .
  • the second-view object mask 74 specifies the human body region of the second-view texture sample, which helps to generate more accurate second-view texture samples.
  • the second view object mask 74 may be obtained by horizontally flipping the first view object mask 73 .
  • the above two object masks can also be predicted by the network.
  • the interference of the background in the original image 71 is removed by segmentation.
  • step 802 after superimposing the first view texture sample, the first view object mask and the second view object mask in the channel dimension, input the texture generation network to obtain the second view texture sample output by the texture generation network.
  • for example, with reference to FIG. 7, after the first view texture sample 72, the first view object mask 73 and the second view object mask 74 are superimposed in the channel dimension, they are input into the texture generation network 75 to obtain the second view texture sample 76 output by the texture generation network 75.
  • step 804 network parameters of the texture generation network are adjusted according to the second view texture samples.
  • the network parameters of the texture generation network may be adjusted based on the first regression loss, the first discrimination loss and the first feature loss.
  • FIG. 7 takes the generation of two object masks as an example, and the actual implementation is not limited to this.
  • for example, at least one of the first view object mask and the second view object mask may be generated, and the at least one object mask and the first view texture sample are superimposed and used as the network input of the texture generation network.
  • the interference of the background is eliminated through segmentation processing; and the generated second view texture can be made more accurate through the constraints of the segmentation mask.
  • the texture generation network can be trained by the training method shown in FIG. 9 .
  • the training system may include an auxiliary texture generation network 91 and a texture generation network 92 .
  • the auxiliary texture generation network 91 can be trained by using the first initial image 93 of the human body (the human body can be referred to as a sample object), and the texture generation network 92 can be trained by using the second initial image 94 of the human body.
  • the aforementioned auxiliary texture generation network 91 may include a first encoding end and a first decoding end, and the texture generating network 92 may include a second encoding end and a second decoding end.
  • the second encoding end has at least one more convolutional layer than the first encoding end
  • the second decoding end has at least one more deconvolutional layer than the first decoding end.
  • the aforementioned auxiliary texture generation network 91 may be a deep residual convolutional neural network.
  • the auxiliary texture generation network 91 may also be referred to as a low-resolution network, and the texture generation network 92 may be referred to as a high-resolution network.
  • the auxiliary texture generation network 91 can be trained first, and then the texture generation network 92 can be trained in combination with the trained auxiliary texture generation network 91. For example, based on a third view texture sample of the human body (the resolution of the third view texture sample is lower than that of the first view texture sample), a fourth view texture sample of the human body can be output through the auxiliary texture generation network 91, and the network parameters of the auxiliary texture generation network 91 are adjusted according to the fourth view texture sample.
  • the texture generation network 92 can continue to be trained through the texture samples of the first view of the human body.
  • the texture generation network 92 is trained by combining the network supervision information generated by the first view texture samples and the second view texture samples. This method of training the auxiliary texture generation network first and then training the texture generation network can make the network training process more stable and the texture generation network easier to converge.
  • the first initial image 93 is input into the human body segmentation network.
  • the first initial image 93 itself has a background, and the human body needs to be segmented first to remove the interference of the background.
  • the human body segmentation network can be a pre-trained lightweight neural network, and through the processing of the human body segmentation network, the third view texture sample, the third view object mask, and the fourth view object mask can be obtained.
  • for the above third view texture sample, third view object mask and fourth view object mask, reference can be made to FIG. 7: an image that contains a human body against a background is segmented into a human body image without the background, together with the front and back mask regions.
  • the third view texture sample, the third view object mask and the fourth view object mask are superimposed and input to the auxiliary texture generation network 91 to obtain the fourth view texture sample predicted and output by the auxiliary texture generation network 91 .
  • then, three losses can be obtained based on the fourth view texture sample, similarly to FIG. 6: a second regression loss is obtained from the fourth view texture sample (for example, the predicted back image of the human body) and the corresponding fourth view label (which may be a real back image of the human body); the third view texture sample and the fourth view texture sample can also be used as the input of a second discriminator, and a second discriminant loss is obtained according to the output value of the second discriminator and a second discriminant label; and a second feature loss can be obtained according to a third texture feature, a fourth texture feature and the texture feature of the fourth view label, where the third texture feature is obtained by performing feature extraction on the third view texture sample, the fourth texture feature is obtained by performing feature extraction on the fourth view texture sample, and the texture feature of the fourth view label is obtained by performing feature extraction on the fourth view label.
  • the network parameters of the auxiliary texture generation network are adjusted according to the second regression loss, the second discriminant loss and the second feature loss.
  • the above description takes the three losses, namely the second regression loss, the second discriminant loss and the second feature loss, as an example; a subset of these losses may also be combined to adjust the network parameters of the auxiliary texture generation network.
  • the human body segmentation network may also obtain at least one of the third view object mask and the fourth view object mask.
  • after the auxiliary texture generation network 91 is trained, at least part of its network parameters may be used as at least part of the network parameters of the texture generation network 92, and the texture generation network 92 can then continue to be trained.
  • for example, the second encoding end included in the texture generation network 92 adds several convolutional layers on the basis of the first encoding end of the auxiliary texture generation network 91, and its second decoding end adds several deconvolutional layers on the basis of the first decoding end of the auxiliary texture generation network 91.
  • the network parameters of the auxiliary texture generation network 91 after training can be used as the initialization parameters of the texture generation network 92, and then continue to train the texture generation network 92.
  • the texture generation network 92 can be trained with the second initial image 94 .
  • the first initial image 93 may be obtained by reducing the resolution of the second initial image 94.
  • for example, the first initial image 93 and the second initial image 94 are the same image containing a human body, the only difference being that the resolution of the second initial image 94 is higher than that of the first initial image 93.
  • the resolution of the texture samples of the first view is also higher than the resolution of the texture samples of the third view, and the fourth view label may be obtained by reducing the resolution of the second view label.
  • the three losses shown in Figure 6 can be obtained based on the first view texture samples, and the network parameters of the texture generation network can be adjusted accordingly. After many iterations, the trained texture generation network can be obtained.
  • the network training process can be more stable, and the texture generation network is easier to converge.
  • one of the networks can be selected to be used in the application stage of the network.
  • a texture generation network can be used for texture generation
  • an auxiliary texture generation network can be used for texture generation.
  • the texture generation method further includes: generating a three-dimensional model of the target object by using the first view texture and the second view texture.
  • in the process of generating the 3D model of the target object, only one 2D image (that is, one texture view) is needed, and texture views from other perspectives can be obtained through the texture generation network; therefore, the user does not need to spend much time and energy on image acquisition, which saves resources.
  • FIG. 10 shows a schematic structural diagram of a texture generation apparatus provided by at least one embodiment of the present disclosure.
  • the apparatus may include: a texture acquisition module 1001 and a prediction processing module 1002 .
  • the texture acquisition module 1001 is used for acquiring the first view texture of the target object.
  • the prediction processing module 1002 is configured to input the first view texture into a texture generation network obtained by pre-training, and to obtain the second view texture of the target object predicted and output by the texture generation network, where the first view texture and the second view texture correspond to different view acquisition angles.
  • the target object is a human body
  • the first view texture of the target object is a front view texture of the human body
  • the second view texture of the target object is a back view texture of the human body.
  • the apparatus further includes: a network training module 1003 for training the texture generation network.
  • the network training module 1003 may include:
  • the texture output sub-module 1101 is configured to output the second view texture sample of the sample object through the texture generation network to be trained based on the first view texture sample of the sample object.
  • the supervision generation sub-module 1102 is configured to generate network supervision information according to the first view texture sample and the second view texture sample; the network supervision information includes at least one of the following: texture supervision information, and a method of generating confrontation Adversarial supervision information obtained by the network.
  • a parameter adjustment sub-module 1103, configured to adjust network parameters of the texture generation network based on the network supervision information.
  • when used to generate the texture supervision information, the supervision generation sub-module 1102 is configured to: perform feature extraction on the first view texture sample to obtain a first texture feature; perform feature extraction on the second view texture sample to obtain a second texture feature; and according to the first texture feature, the second texture feature and the texture feature of the second view label, obtain a first feature loss representing the difference between texture features as the texture supervision information.
  • when used to generate the adversarial supervision information, the supervision generation sub-module 1102 is configured to: input the first view texture sample and the second view texture sample into a first discriminator; and obtain the first discriminant loss as the adversarial supervision information according to the output value of the first discriminator and the first discriminant label.
  • the texture output sub-module 1101 is specifically configured to: receive an initial image, where the initial image includes the sample object; process the initial image to obtain the first view texture sample and at least one of the following two object masks: a first view object mask and a second view object mask; and use at least one of the first view object mask and the second view object mask, together with the first view texture sample, as the input of the texture generation network to obtain the second view texture sample output by the texture generation network.
  • the foregoing apparatus may be configured to execute any corresponding method described above, which is not repeated here for brevity.
  • An embodiment of the present disclosure also provides an electronic device, the device including a memory and a processor, where the memory is used to store computer-readable instructions, and the processor is used to invoke the computer instructions to implement the method of any embodiment of the present disclosure.
  • An embodiment of the present disclosure also provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, implements the method of any embodiment of the present specification.
  • one or more embodiments of the present disclosure may be provided as a method, system or computer program product.
  • the computer program product includes a computer program that, when executed by a processor, implements the method of any embodiment of the present disclosure. Accordingly, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
  • computer-usable storage media including, but not limited to, disk storage, CD-ROM, optical storage, etc.
  • Embodiments of the subject matter and functional operations described in this disclosure can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this disclosure and their structural equivalents, or in a combination of one or more of these.
  • Embodiments of the subject matter described in this disclosure may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus.
  • alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus.
  • the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of these.
  • the processes and logic flows described in this disclosure can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, eg, an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • Computers suitable for the execution of a computer program include, for example, general and/or special purpose microprocessors, or any other type of central processing unit.
  • the central processing unit will receive instructions and data from read only memory and/or random access memory.
  • the basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic, magneto-optical or optical disks, to receive data from them, transfer data to them, or both.
  • the computer does not have to have such a device.
  • the computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.
  • PDA personal digital assistant
  • GPS global positioning system
  • USB universal serial bus
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • semiconductor memory devices eg, EPROM, EEPROM, and flash memory devices
  • magnetic disks, e.g., internal hard disks or removable disks
  • magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the processor and memory may be supplemented by or incorporated in special purpose logic circuitry.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure provide a texture generation method, apparatus, device, and storage medium. The method may include: acquiring a first view texture of a target object; and inputting the first view texture into a pre-trained texture generation network to obtain a second view texture of the target object predicted and output by the texture generation network, where the first view texture and the second view texture correspond to different view acquisition angles. With the embodiments of the present disclosure, the second view texture can be obtained by acquiring only the first view texture of the target object, so that the view acquisition requirements for the target object are reduced, the acquisition operation is simpler, and the cost of texture generation is reduced; moreover, the generated texture is relatively accurate and realistic.

Description

Texture generation method, apparatus, device and storage medium
Cross-Reference to Related Application
This patent application claims priority to Chinese Patent Application No. 202110352202.2, filed on March 31, 2021 and entitled "Texture generation method, apparatus, device and storage medium", which is incorporated herein by reference.
Technical Field
The present disclosure relates to image processing technologies, and in particular, to a texture generation method, apparatus, device, and storage medium.
Background
In 3D human body reconstruction, to reconstruct a 3D human body model, human body texture information needs to be supplemented on the basis of a human body mesh model, so that the reconstructed 3D human body model has a more realistic visual effect. To obtain a textured 3D human body model, the human body needs to be scanned continuously. However, obtaining the human body texture by continuously scanning the human body is relatively complicated to operate.
Summary
In view of this, embodiments of the present disclosure provide at least a texture generation method, apparatus, device, and storage medium.
In a first aspect, a texture generation method is provided, the method including:
acquiring a first view texture of a target object;
inputting the first view texture into a pre-trained texture generation network to obtain a second view texture of the target object predicted and output by the texture generation network, where the first view texture and the second view texture correspond to different view acquisition angles.
In an example, the target object is a human body, the first view texture of the target object is a front view texture of the human body, and the second view texture is a back view texture of the human body.
In an example, after the second view texture of the target object predicted and output by the texture generation network is obtained, the method further includes: mapping the first view texture and the second view texture onto an initial three-dimensional model of the target object to obtain a target three-dimensional model filled with the texture structure; the initial three-dimensional model is a three-dimensional mesh model representing the geometric shape of the target object.
In an example, the inputting the first view texture into a pre-trained texture generation network to obtain the second view texture of the target object predicted and output by the texture generation network includes: performing down-sampling processing on the first view texture to extract a first texture feature map; and performing up-sampling processing based on the first texture feature map to output the second view texture.
In an example, the training process of the texture generation network includes: based on a first view texture sample of a sample object, outputting a second view texture sample of the sample object through the texture generation network to be trained; generating network supervision information according to the first view texture sample and the second view texture sample, where the network supervision information includes at least one of the following: texture supervision information, and adversarial supervision information obtained through a generative adversarial network; and adjusting network parameters of the texture generation network based on the network supervision information.
In an example, when the network supervision information includes the texture supervision information, the generating network supervision information according to the first view texture sample and the second view texture sample includes: performing feature extraction on the first view texture sample to obtain a first texture feature; performing feature extraction on the second view texture sample to obtain a second texture feature; and according to the first texture feature, the second texture feature and the texture feature of a second view label, obtaining a first feature loss representing the difference between texture features as the texture supervision information.
In an example, when the network supervision information includes the adversarial supervision information, the generating network supervision information according to the first view texture sample and the second view texture sample includes: inputting the first view texture sample and the second view texture sample into a first discriminator; and obtaining a first discriminant loss as the adversarial supervision information according to the output value of the first discriminator and a first discriminant label.
In an example, the network supervision information further includes regression supervision information; the method further includes: obtaining a first regression loss as the regression supervision information based on the second view texture sample and the corresponding second view label.
In an example, the outputting the second view texture sample of the sample object through the texture generation network to be trained based on the first view texture sample of the sample object includes: receiving an initial image, where the initial image includes the sample object; processing the initial image to obtain the first view texture sample and at least one of the following two object masks: a first view object mask and a second view object mask; and using at least one of the first view object mask and the second view object mask, together with the first view texture sample, as the input of the texture generation network to be trained, to obtain the second view texture sample output by the texture generation network.
In an example, before the outputting the second view texture sample of the sample object through the texture generation network to be trained based on the first view texture sample of the sample object, the method further includes: based on a third view texture sample of the sample object, outputting a fourth view texture sample of the sample object through an auxiliary texture generation network to be trained, where the resolution of the first view texture sample is higher than that of the third view texture sample; adjusting network parameters of the auxiliary texture generation network according to the fourth view texture sample; and after the auxiliary texture generation network is trained, using at least part of the network parameters of the auxiliary texture generation network as at least part of the network parameters of the texture generation network.
In an example, the auxiliary texture generation network includes a first encoding end and a first decoding end; the texture generation network includes a second encoding end and a second decoding end; the second encoding end has at least one more convolutional layer than the first encoding end, and the second decoding end has at least one more deconvolutional layer than the first decoding end.
In a second aspect, a texture generation apparatus is provided, the apparatus including:
a texture acquisition module, configured to acquire a first view texture of a target object;
a prediction processing module, configured to input the first view texture into a pre-trained texture generation network to obtain a second view texture of the target object predicted and output by the texture generation network, where the first view texture and the second view texture correspond to different view acquisition angles.
In an example, the target object is a human body, the first view texture of the target object is a front view texture of the human body, and the second view texture is a back view texture of the human body.
In an example, the apparatus further includes a network training module for training the texture generation network; the network training module includes: a texture output sub-module, configured to output the second view texture sample of the sample object through the texture generation network to be trained based on the first view texture sample of the sample object; a supervision generation sub-module, configured to generate network supervision information according to the first view texture sample and the second view texture sample, where the network supervision information includes at least one of the following: texture supervision information, and adversarial supervision information obtained through a generative adversarial network; and a parameter adjustment sub-module, configured to adjust network parameters of the texture generation network based on the network supervision information.
In an example, when used to generate the texture supervision information, the supervision generation sub-module is configured to: perform feature extraction on the first view texture sample to obtain a first texture feature; perform feature extraction on the second view texture sample to obtain a second texture feature; and according to the first texture feature, the second texture feature and the texture feature of the second view label, obtain a first feature loss representing the difference between texture features as the texture supervision information.
In an example, when used to generate the adversarial supervision information, the supervision generation sub-module is configured to: input the first view texture sample and the second view texture sample into a first discriminator; and obtain a first discriminant loss as the adversarial supervision information according to the output value of the first discriminator and the first discriminant label.
In an example, the texture output sub-module is specifically configured to: receive an initial image, where the initial image includes the sample object; process the initial image to obtain the first view texture sample and at least one of the following two object masks: a first view object mask and a second view object mask; and use at least one of the first view object mask and the second view object mask, together with the first view texture sample, as the input of the texture generation network to obtain the second view texture sample output by the texture generation network.
In a third aspect, an electronic device is provided, including a memory and a processor, where the memory is configured to store computer-readable instructions, and the processor is configured to invoke the computer instructions to implement the method of any embodiment of the present disclosure.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the method of any embodiment of the present disclosure is implemented.
In a fifth aspect, a computer program product is provided, including a computer program that, when executed by a processor, implements the method of any embodiment of the present disclosure.
According to the texture generation method, apparatus, device, and storage medium provided by the embodiments of the present disclosure, a pre-trained texture generation network is used to predict and output the second view texture of the target object, so that the second view texture can be obtained by acquiring only the first view texture of the target object. The view acquisition requirements for the target object are thus reduced, the acquisition operation is simpler, and the cost of texture generation is reduced; moreover, because the neural network has been pre-trained, the generated texture is relatively accurate and realistic.
Brief Description of the Drawings
In order to more clearly describe the technical solutions in one or more embodiments of the present disclosure or in the related art, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments recorded in one or more embodiments of the present disclosure, and other drawings may be obtained from these drawings by those of ordinary skill in the art without creative effort.
FIG. 1 is a schematic flowchart of a texture generation method provided by at least one embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the principle of a texture generation method provided by at least one embodiment of the present disclosure;
FIG. 3 is a training flowchart of a texture generation network provided by at least one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the training principle of a texture generation network provided by at least one embodiment of the present disclosure;
FIG. 5 is a training flowchart corresponding to FIG. 4 provided by at least one embodiment of the present disclosure;
FIG. 6 is a schematic diagram of the training principle of a texture generation network provided by at least one embodiment of the present disclosure;
FIG. 7 is a schematic diagram of the training principle of a texture generation network provided by at least one embodiment of the present disclosure;
FIG. 8 is a training flowchart of a texture generation network provided by at least one embodiment of the present disclosure;
FIG. 9 is a schematic diagram of the training principle of another texture generation network provided by at least one embodiment of the present disclosure;
FIG. 10 is a structural diagram of a texture generation apparatus provided by at least one embodiment of the present disclosure;
FIG. 11 is a structural diagram of another texture generation apparatus provided by at least one embodiment of the present disclosure.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions in one or more embodiments of the present disclosure, the technical solutions in one or more embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part rather than all of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on one or more embodiments of the present disclosure without creative effort shall fall within the protection scope of the present disclosure.
3D human body reconstruction has important applications in many fields, including but not limited to the following application scenarios:
For example, 3D human body reconstruction can enhance the realism of some virtual reality application scenarios, such as virtual fitting, virtual cloud meetings and virtual classrooms.
For another example, a 3D human body model obtained by 3D human body reconstruction can be imported into game data to generate a personalized game character.
For yet another example, producing science-fiction films currently requires technologies such as green screens and motion capture; the hardware is expensive and the overall workflow is time-consuming and complicated. Obtaining a virtual 3D human body model through 3D human body reconstruction can simplify the workflow and save resources.
In any application scenario, 3D human body reconstruction needs to keep the user's operating cost as low as possible. For example, 3D human body reconstruction can be performed based on a single RGB image; since only one image is needed, the user does not have to spend much time and energy cooperating with image acquisition, and the user experience is better. When 3D human body reconstruction is performed based on a single RGB image, however, the image only includes part of the texture of the human body, so the textures of other parts of the human body need to be predicted so that a complete texture map can be applied to the 3D human body model.
On this basis, an embodiment of the present disclosure provides a texture generation method, which aims to predict the texture of other parts of the human body from the texture of one part and to achieve a good texture prediction effect. The method uses a texture generation network for texture prediction. It can be understood that the method is applicable not only to texture generation for the human body but also to texture generation for other objects; the human body is used as an example in the following description.
FIG. 1 illustrates a flowchart of a texture generation method. As shown in FIG. 1, the method may include:
In step 100, a first view texture of a target object is acquired.
In the embodiments of the present disclosure, the target object may be a three-dimensional object. For example, the target object may be a human body, and the first view texture of the target object may be a frontal image collected from the front of the human body; this frontal image may be referred to as the front view texture of the human body.
In step 102, the first view texture is input into a pre-trained texture generation network to obtain a second view texture of the target object predicted and output by the texture generation network, where the first view texture and the second view texture correspond to different view acquisition angles.
For example, the second view texture output by the texture generation network may be the back view texture of the human body. With reference to FIG. 2, the first view texture 21 of the human body is input into the texture generation network 22, and the second view texture 23 of the human body output by the texture generation network 22 is obtained. It can be seen that the first view texture 21 and the second view texture 23 correspond to different view acquisition angles: the first view texture 21 is a frontal image collected from the front of the human body, and the second view texture 23 is equivalent to a back image collected from the back of the human body.
This embodiment does not limit the network structure of the texture generation network 22. For example, the texture generation network 22 may be a deep residual convolutional neural network, which may include an encoding end and a decoding end. The encoding end may include multiple convolutional layers, through which the input first view texture is down-sampled to extract a first texture feature map; the decoding end may include multiple deconvolutional layers, through which the first texture feature map is up-sampled to output the second view texture.
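The application does not fix a concrete layer configuration for this encoder-decoder. Purely as an illustration, the following PyTorch sketch shows one possible network of this kind; the channel widths, the number of layers and the omission of residual blocks are assumptions of the sketch, not features recited by this application.

```python
import torch
import torch.nn as nn

class TextureGenerator(nn.Module):
    """Minimal encoder-decoder sketch: convolutional layers down-sample the first
    view texture into a feature map, deconvolutional (transposed conv) layers
    up-sample it back into the predicted second view texture."""

    def __init__(self, in_channels: int = 3, out_channels: int = 3, base: int = 64):
        super().__init__()
        # Encoding end: stride-2 convolutions for down-sampling.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, base, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Decoding end: stride-2 transposed convolutions for up-sampling.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(base * 4, base * 2, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base, out_channels, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, first_view: torch.Tensor) -> torch.Tensor:
        feature_map = self.encoder(first_view)   # first texture feature map
        return self.decoder(feature_map)         # predicted second view texture

# Usage: a (1, 3, 256, 256) front-view tensor yields a (1, 3, 256, 256) back-view prediction.
front = torch.randn(1, 3, 256, 256)
back = TextureGenerator()(front)
```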
In the texture generation method of this embodiment, a pre-trained texture generation network predicts and outputs the second view texture of the target object, so that the second view texture can be obtained by acquiring only the first view texture of the target object, and the view acquisition requirements for the target object are reduced. For example, when the target object is a human body, as long as a frontal image of the human body is collected, the back image of the human body can be obtained through the texture generation network. Therefore, compared with traditional 3D human body reconstruction methods that require collecting multiple views of the user's body to obtain a complete model texture, the operation of this embodiment is simpler and the cost of texture generation can be reduced; moreover, because the texture generation network has been pre-trained, the predicted second view texture is relatively accurate and realistic.
In addition, after the first view texture and the second view texture of the target object are obtained, these two view textures can be mapped onto an initial three-dimensional model of the target object. The initial three-dimensional model is a three-dimensional mesh model representing the geometric shape of the target object, for example, a mesh representing the geometric structure of the human body. Based on the first view texture (e.g., the human body front view texture) and the second view texture (e.g., the human body back view texture), texture mapping can be performed on the initial three-dimensional model, mapping the two view textures onto the mesh of the initial three-dimensional model. For some invisible parts of the human body, i.e., parts that are visible in neither the first view texture nor the second view texture, interpolation can be used to fill the gaps in the model and complete its texture, finally obtaining a 3D human body model filled with a texture structure. Such a textured 3D human body model makes the reconstructed human body model more realistic.
The training process of the above texture generation network is described below:
FIG. 3 illustrates a schematic flowchart of the training process of the texture generation network in an embodiment. As shown in FIG. 3, the training process may include the following steps:
In step 300, based on a first view texture sample of a sample object, a second view texture sample of the sample object is output through the texture generation network to be trained.
In this embodiment, the information used in the network training process may all be referred to as training samples. For example, the first view texture sample of the sample object may be a frontal image of a human body used in the training process; the human body is the sample object, and the frontal image of the human body is the first view texture sample. The second view texture sample of the sample object output by the texture generation network may be a back image of the human body.
The texture generation network used in this step is a texture generation network that has not yet been fully trained. During the training of the texture generation network, multiple iterations may be performed until an end condition of network training is reached, for example, a preset number of iterations is reached, or the error between the predicted output and the label is small enough. The training flow in this embodiment may be one of these iterations.
In step 302, network supervision information is generated according to the first view texture sample and the second view texture sample; the network supervision information includes at least one of the following: texture supervision information, and adversarial supervision information obtained through a generative adversarial network.
In this step, network supervision information used to supervise network training may be generated based on the first view texture sample and the second view texture sample obtained in step 300. For example, the network supervision information may be the texture supervision information, the adversarial supervision information, or both. In addition, when generating the network supervision information, this embodiment may also rely on information other than the first view texture sample and the second view texture sample, and the generated network supervision information may include other types of supervision information besides the texture supervision information and the adversarial supervision information.
The texture supervision information may be information used to constrain the consistency of texture features between the first view texture sample and the second view texture sample. By adjusting the network parameters of the texture generation network under the supervision of the texture supervision information, the texture features of the first view texture sample and the second view texture sample can be made as consistent as possible, so that the second view texture sample better fits the first view texture sample and the output second view texture sample is more realistic.
The adversarial supervision information may be information used to constrain the realism of the second view texture sample. By adjusting the network parameters of the texture generation network under the supervision of the adversarial supervision information, the generated second view texture sample can be made more natural and realistic. In a specific implementation, generative adversarial learning can be performed based on the first view texture sample and the second view texture sample, and the adversarial supervision information can be obtained through a generative adversarial network; for example, the first view texture sample and the second view texture sample can be input into a discriminator, and a discriminant loss is obtained as the adversarial supervision information according to the output value of the discriminator and the discriminant label.
In step 304, network parameters of the texture generation network are adjusted based on the network supervision information.
For example, the parameters of the texture generation network can be adjusted through a backpropagation algorithm. If the network supervision information includes both the texture supervision information and the adversarial supervision information, the network parameters can be adjusted by combining these two types of supervision information.
In the training process of the texture generation network in this embodiment, the network parameters are adjusted by combining at least one of the texture supervision information and the adversarial supervision information, so that the second view texture sample better fits the first view texture sample and the generated second view texture sample is more realistic and natural.
FIG. 4 illustrates a schematic diagram of the training principle of the texture generation network in an embodiment. As shown in FIG. 4, the input of the texture generation network 42 may be a first view texture sample 41 of a sample object (e.g., a human body), and the output is the predicted second view texture sample 43 of the sample object.
The first view texture sample 41 and the second view texture sample 43 correspond to different view acquisition angles; for example, the first view texture sample 41 is a frontal image of a human body, and the second view texture sample 43 is the human body back texture predicted by the texture generation network 42.
The texture generation network 42 may be, for example, a deep residual convolutional neural network, which may include an encoding end and a decoding end; the encoding end may include multiple convolutional layers, and the decoding end may include multiple deconvolutional layers.
FIG. 5 shows the training flow of this texture generation network corresponding to FIG. 4, which may include:
In step 500, based on the first view texture sample of the sample object, the predicted second view texture sample of the sample object is output through the texture generation network to be trained.
With reference to FIG. 4, the first view texture sample 41 of the human body is input into the texture generation network 42 to be trained, and the generated second view texture sample 43 of the human body is output after being processed by the network.
In step 502, a first regression loss is obtained based on the second view texture sample and a second view label; and the first view texture sample and the second view texture sample are used as the input of a first discriminator, and a first discriminant loss is obtained according to the output value of the first discriminator and a first discriminant label.
In this step, the second view label may be the real back image of the human body, while the second view texture sample generated in step 500 is the back image of the human body generated by the texture generation network 42. A loss can be calculated according to the second view texture sample and the second view label; this loss is called the first regression loss and may, for example, be an L1 loss. The first regression loss may be referred to as regression supervision information.
It should be noted that the regression supervision information can be combined with at least one of the above texture supervision information and adversarial supervision information to adjust the parameters. For example, the parameters can be adjusted by combining the texture supervision information and the regression supervision information, by combining the adversarial supervision information and the regression supervision information, or by combining all three of the texture supervision information, the adversarial supervision information and the regression supervision information. It can be understood that other types of supervision information may also be combined in a specific implementation. In this embodiment, supervising network training by combining the regression supervision information and the adversarial supervision information is taken as an example.
In addition to the first regression loss above, a first discriminant loss can also be calculated. As shown in FIG. 4, the first view texture sample 41 and the second view texture sample 43 can be used as the input of the first discriminator, which compares the two; the output value of the first discriminator can be a true/false value, and this output value is compared with the first discriminant label (i.e., the ground-truth true/false value) to obtain the first discriminant loss. The first discriminant loss may be referred to as adversarial supervision information.
In step 504, the network parameters of the texture generation network are adjusted according to the first regression loss and the first discriminant loss. In this step, the two losses obtained above can be combined to adjust the network parameters of the texture generation network; after multiple iterations, the trained texture generation network is finally obtained.
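To make the combination of the two losses concrete, the sketch below shows one possible training iteration in PyTorch. The discriminator architecture, the binary cross-entropy objective and the loss weighting are assumptions of this sketch; the application only states that the first regression loss and the first discriminant loss are combined to adjust the network parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed discriminator: scores a (first view, second view) pair, concatenated on channels.
discriminator = nn.Sequential(
    nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
)

def training_step(generator, g_opt, d_opt, first_view, second_view_label):
    # Generator step: first regression loss (L1) + first discriminant loss.
    fake_second = generator(first_view)
    regression_loss = F.l1_loss(fake_second, second_view_label)
    fake_score = discriminator(torch.cat([first_view, fake_second], dim=1))
    adversarial_loss = F.binary_cross_entropy_with_logits(
        fake_score, torch.ones_like(fake_score))       # try to fool the discriminator
    g_loss = regression_loss + 0.1 * adversarial_loss  # weighting is an assumption
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

    # Discriminator step: real (first view, label) pair vs. generated pair.
    real_score = discriminator(torch.cat([first_view, second_view_label], dim=1))
    fake_score = discriminator(torch.cat([first_view, fake_second.detach()], dim=1))
    d_loss = (F.binary_cross_entropy_with_logits(real_score, torch.ones_like(real_score))
              + F.binary_cross_entropy_with_logits(fake_score, torch.zeros_like(fake_score)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()
    return g_loss.item(), d_loss.item()
```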
In the training method of the texture generation network of the embodiments of the present disclosure, supervision by the second view label enables the second view texture sample generated by the texture generation network to be as close as possible to the real second view texture, and generative adversarial learning based on the first view texture sample and the second view texture sample keeps the texture styles of the two as consistent as possible, so that the second view texture sample matches the input first view texture sample more closely. Supervised training in these two respects makes the second view texture sample obtained by the texture generation network more realistic.
FIG. 6 illustrates the training principle of the texture generation network in another example; on the basis of FIG. 4, another kind of supervision may be applied during training.
As shown in FIG. 6, feature extraction may be performed on the first view texture sample 41 to obtain a first texture feature 61, and feature extraction may be performed on the second view texture sample 43 to obtain a second texture feature 62. A first feature loss 63 is then obtained by comparing the difference between the first texture feature 61 and the second texture feature 62, and the difference between the second texture feature 62 and the texture feature of the second view label. The first feature loss 63 supervises the consistency between the first texture feature and the second texture feature, as well as the consistency between the second texture feature and the texture feature extracted from the second view label. This first feature loss may be referred to as texture supervision information.
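For illustration only, a minimal sketch of such a first feature loss is given below, assuming a frozen, pretrained VGG16 backbone as the feature extractor; the disclosure does not specify the extractor, so the backbone choice and truncation depth are assumptions made for the example.

```python
import torch.nn.functional as F
from torchvision.models import vgg16

# A frozen feature extractor; the backbone and truncation depth are assumptions.
feature_extractor = vgg16(pretrained=True).features[:16].eval()
for p in feature_extractor.parameters():
    p.requires_grad = False

def first_feature_loss(first_view_sample, fake_second_view, second_view_label):
    """Texture supervision information: penalize feature differences between
    (a) the first view and generated second view textures, and (b) the
    generated second view texture and the second view label."""
    f_first = feature_extractor(first_view_sample)   # first texture feature
    f_second = feature_extractor(fake_second_view)   # second texture feature
    f_label = feature_extractor(second_view_label)   # label texture feature
    return F.l1_loss(f_second, f_first) + F.l1_loss(f_second, f_label)
```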
When adjusting the parameters of the texture generation network, the network parameters may be adjusted jointly according to the first regression loss, the first discrimination loss, and the first feature loss described above.
By adding the first feature loss on top of the first regression loss and the first discrimination loss to supervise the training of the texture generation network, that is, by combining the regression supervision information, the adversarial supervision information, and the texture supervision information, the texture features of the second view texture sample obtained by the texture generation network can be made closer to those of the input first view texture sample, so that the generated second view texture sample is more realistic and natural.
In yet another embodiment, the first view texture sample shown in FIG. 4 is an image in a relatively ideal case. In practice, an image acquired of a sample object (e.g., a human body) usually contains a background, so the image needs to be segmented to obtain the first view texture sample and a mask.
Referring to FIG. 7, an initial image 71 is obtained; the initial image 71 may be an acquired full-body frontal image of a human body, and may include a background as well as the sample object, i.e., the human body. In conjunction with the flow shown in FIG. 8, the processing includes the following steps.
In step 800, the initial image is processed to obtain the first view texture sample, a first view object mask, and a second view object mask.
For example, the initial image 71 may be segmented to obtain a first view texture sample 72, a first view object mask 73, and a second view object mask 74. The second view object mask 74 specifies the human body region of the second view texture sample, which helps the second view texture sample to be generated more accurately. In one example, when the initial image 71 is assumed to have been captured under parallel projection, the second view object mask 74 may be obtained by horizontally flipping the first view object mask 73. In other examples, the two object masks may also be predicted by a network. In addition, the segmentation removes the interference of the background in the initial image 71.
In step 802, the first view texture sample, the first view object mask, and the second view object mask are stacked along the channel dimension and then input into the texture generation network to obtain the second view texture sample output by the texture generation network. For example, referring to FIG. 7, the first view texture sample 72, the first view object mask 73, and the second view object mask 74 are stacked along the channel dimension and input into a texture generation network 75 to obtain a second view texture sample 76 output by the texture generation network 75.
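For illustration only, the following sketch shows how this stacked network input could be prepared, assuming tensors of shape (N, C, H, W) and the parallel-projection flip described above; with a 3-channel texture sample and 1-channel masks, the generator would then be built with a 5-channel input.

```python
import torch

def build_generator_input(first_view_sample, first_view_mask):
    """Stack the first view texture sample with the two object masks along the
    channel dimension. Under the assumed parallel projection, the second view
    object mask is the horizontal flip of the first view object mask."""
    second_view_mask = torch.flip(first_view_mask, dims=[-1])  # mirror left-right
    return torch.cat([first_view_sample, first_view_mask, second_view_mask], dim=1)

# Example: a 3-channel texture sample and a 1-channel mask yield a 5-channel input.
x = build_generator_input(torch.rand(1, 3, 256, 256), torch.rand(1, 1, 256, 256))
assert x.shape == (1, 5, 256, 256)
```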
In step 804, the network parameters of the texture generation network are adjusted according to the second view texture sample.
In this step, for example, the network parameters of the texture generation network may be adjusted based on the first regression loss, the first discrimination loss, and the first feature loss. For details, reference may be made to the foregoing embodiments, which are not repeated here.
Further, FIG. 7 takes the generation of two object masks as an example, but actual implementations are not limited thereto; for example, at least one of the first view object mask and the second view object mask may be generated, and the at least one object mask may be stacked with the first view texture sample and used as the network input of the texture generation network.
In the training method of the texture generation network of this embodiment, the segmentation processing removes the interference of the background, and the constraint of the segmentation masks enables the generated second view texture to be more accurate.
In yet another example, the texture generation network may be trained in the manner shown in FIG. 9. Specifically, as shown in FIG. 9, the training system may include an auxiliary texture generation network 91 and a texture generation network 92, where a first initial image 93 of a human body (the human body may be referred to as the sample object) may be used to train the auxiliary texture generation network 91, and a second initial image 94 of the human body may be used to train the texture generation network 92.
The auxiliary texture generation network 91 may include a first encoder and a first decoder, and the texture generation network 92 may include a second encoder and a second decoder, where the second encoder has at least one more convolutional layer than the first encoder, and the second decoder has at least one more deconvolutional layer than the first decoder. In one example, the auxiliary texture generation network 91 may be a deep residual convolutional neural network. The auxiliary texture generation network 91 may also be referred to as a low-resolution network, and the texture generation network 92 as a high-resolution network.
In specific training, the auxiliary texture generation network 91 may be trained first, and the texture generation network 92 may then be trained in combination with the trained auxiliary texture generation network 91. For example, based on a third view texture sample of the human body (whose resolution is lower than that of the first view texture sample), a fourth view texture sample of the human body may first be output by the auxiliary texture generation network 91, and the network parameters of the auxiliary texture generation network 91 may be adjusted in combination with the fourth view texture sample. After the training of the auxiliary texture generation network 91 is completed, at least some of its network parameters may be used as network parameters of the texture generation network 92, and the texture generation network 92 may continue to be trained with the first view texture sample of the human body, for example, with network supervision information generated from the first view texture sample and the second view texture sample. By training the auxiliary texture generation network before the texture generation network in this way, the network training process can be more stable and the texture generation network converges more easily.
In the example of FIG. 9, the first initial image 93 is input into a human body segmentation network. Because the first initial image 93 itself has a background, the human body needs to be segmented out first to remove the interference of the background. The human body segmentation network may be a pre-trained lightweight neural network, and its processing yields a third view texture sample, a third view object mask, and a fourth view object mask; similar to what is shown in FIG. 7, an image containing a human body against a background is segmented into a background-free human body image and the front and back mask regions. The third view texture sample, the third view object mask, and the fourth view object mask are then stacked and input into the auxiliary texture generation network 91 to obtain the fourth view texture sample predicted and output by the auxiliary texture generation network 91.
Then, three losses can be obtained based on the fourth view texture sample, similar to what is shown in FIG. 6: a second regression loss may be obtained from the fourth view texture sample (for example, a predicted back image of the human body) and the corresponding fourth view label (which may be a real back image of the human body); the third view texture sample and the fourth view texture sample may be used as inputs of a second discriminator, and a second discrimination loss may be obtained from the output value of the second discriminator and a second discrimination label; and a second feature loss may be obtained from a third texture feature, a fourth texture feature, and the texture feature of the fourth view label, where the third texture feature is extracted from the third view texture sample, the fourth texture feature is extracted from the fourth view texture sample, and the texture feature of the fourth view label is extracted from the fourth view label. Finally, the network parameters of the auxiliary texture generation network are adjusted according to the second regression loss, the second discrimination loss, and the second feature loss.
Of course, the above takes the second regression loss, the second discrimination loss, and the second feature loss as an example; some of these losses may instead be combined to adjust the network parameters of the auxiliary texture generation network. The human body segmentation network may also obtain at least one of the third view object mask and the fourth view object mask.
After the training of the auxiliary texture generation network 91 is completed, at least some of its network parameters may be used as at least some of the network parameters of the texture generation network 92, and the texture generation network 92 is then trained further. For example, the second encoder of the texture generation network 92 adds several convolutional layers on the basis of the first encoder of the auxiliary texture generation network 91, and the second decoder adds several deconvolutional layers on the basis of the first decoder of the auxiliary texture generation network 91. In this case, the network parameters of the trained auxiliary texture generation network 91 may be used as initialization parameters of the texture generation network 92, and the texture generation network 92 is then trained further.
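For illustration only, a minimal sketch of this parameter transfer is given below, assuming PyTorch modules whose shared layers keep the same parameter names; this naming assumption is made for the example and is not stated in the disclosure.

```python
def init_from_auxiliary(texture_net, auxiliary_net):
    """Copy every auxiliary-network parameter whose name and shape also exist
    in the deeper texture generation network; the additional convolutional and
    deconvolutional layers keep their fresh initialization."""
    aux_state = auxiliary_net.state_dict()
    own_state = texture_net.state_dict()
    transferable = {k: v for k, v in aux_state.items()
                    if k in own_state and own_state[k].shape == v.shape}
    own_state.update(transferable)
    texture_net.load_state_dict(own_state)
    return texture_net
```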
Referring further to FIG. 9, the texture generation network 92 may be trained with the second initial image 94. The first initial image 93 may be obtained by reducing the resolution of the second initial image 94; for example, the first initial image 93 and the second initial image 94 are the same image containing the human body, the only difference being that the resolution of the second initial image 94 is higher than that of the first initial image 93. Correspondingly, the resolution of the first view texture sample is also higher than that of the third view texture sample, and the fourth view label may be obtained by reducing the resolution of the second view label.
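For illustration only, the low-resolution counterparts could be produced by simple down-sampling; bilinear interpolation is an assumed choice here.

```python
import torch.nn.functional as F

def make_low_res(image, scale=0.5):
    """Derive the first initial image (or the fourth view label) from its
    high-resolution counterpart by reducing the resolution."""
    return F.interpolate(image, scale_factor=scale, mode="bilinear",
                         align_corners=False)
```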
Similarly, the three losses shown in FIG. 6 can be obtained based on the first view texture sample, and the network parameters of the texture generation network are adjusted accordingly. After multiple iterations, the trained texture generation network is obtained.
Compared with the conventional approach of training a single network alone, this training approach combining the auxiliary texture generation network and the texture generation network makes the network training process more stable and the texture generation network easier to converge.
Of the auxiliary texture generation network and the texture generation network obtained by the above training, either one may be selected for use in the application stage; for example, texture generation may be performed with the texture generation network, or with the auxiliary texture generation network.
In the embodiments of the present disclosure, the texture generation method further includes: generating a three-dimensional model of the target object using the first view texture and the second view texture. In the process of generating the three-dimensional model of the target object, only one two-dimensional image (i.e., one texture view) is needed, and texture views from other viewing angles can be obtained through the texture generation network, so the user does not need to spend a large amount of time and effort on image acquisition, which saves resources.
FIG. 10 shows a schematic structural diagram of a texture generation apparatus provided by at least one embodiment of the present disclosure. As shown in FIG. 10, the apparatus may include a texture acquisition module 1001 and a prediction processing module 1002.
The texture acquisition module 1001 is configured to acquire a first view texture of a target object.
The prediction processing module 1002 is configured to input the first view texture into a pre-trained texture generation network to obtain a second view texture of the target object predicted and output by the texture generation network, where the first view texture and the second view texture correspond to different view acquisition angles.
In one example, the target object is a human body, the first view texture of the target object is a front view texture of the human body, and the second view texture of the target object is a back view texture of the human body.
In one example, as shown in FIG. 11, the apparatus further includes a network training module 1003 for training the texture generation network. The network training module 1003 may include the following sub-modules.
A texture output sub-module 1101 is configured to output, based on a first view texture sample of a sample object, a second view texture sample of the sample object through a texture generation network to be trained.
A supervision generation sub-module 1102 is configured to generate network supervision information according to the first view texture sample and the second view texture sample, where the network supervision information includes at least one of the following: texture supervision information, and adversarial supervision information obtained through a generative adversarial network.
A parameter adjustment sub-module 1103 is configured to adjust the network parameters of the texture generation network based on the network supervision information.
In one example, when generating the texture supervision information, the supervision generation sub-module 1102 is configured to: perform feature extraction on the first view texture sample to obtain a first texture feature; perform feature extraction on the second view texture sample to obtain a second texture feature; and obtain, according to the first texture feature, the second texture feature, and the texture feature of the second view label, a first feature loss representing the differences between texture features as the texture supervision information.
In one example, when generating the adversarial supervision information, the supervision generation sub-module 1102 is configured to: input the first view texture sample and the second view texture sample into a first discriminator; and obtain a first discrimination loss as the adversarial supervision information according to the output value of the first discriminator and a first discrimination label.
In one example, the texture output sub-module 1101 is specifically configured to: receive an initial image, the initial image including the sample object; process the initial image to obtain the first view texture sample and at least one of the following two object masks: a first view object mask and a second view object mask; and use the at least one of the first view object mask and the second view object mask, together with the first view texture sample, as the input of the texture generation network to obtain the second view texture sample output by the texture generation network.
In some embodiments, the above apparatus may be used to perform any of the corresponding methods described above, which are not repeated here for brevity.
An embodiment of the present disclosure further provides an electronic device, including a memory and a processor, where the memory is configured to store computer-readable instructions, and the processor is configured to invoke the computer instructions to implement the method of any embodiment of this specification.
An embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon, where the program, when executed by a processor, implements the method of any embodiment of this specification.
Those skilled in the art should understand that one or more embodiments of the present disclosure may be provided as a method, a system, or a computer program product. The computer program product includes a computer program which, when executed by a processor, implements the method of any embodiment of the present disclosure. Therefore, one or more embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The expression "and/or" in the embodiments of the present disclosure means having at least one of the two; for example, "A and/or B" covers three cases: A, B, and "A and B".
The embodiments in the present disclosure are described in a progressive manner; for the same or similar parts of the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the data processing device embodiments are described relatively simply because they are substantially similar to the method embodiments, and relevant details can be found in the description of the method embodiments.
Specific embodiments of the present disclosure have been described above. Other embodiments are within the scope of the appended claims. In some cases, the acts or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
Embodiments of the subject matter and functional operations described in the present disclosure may be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in the present disclosure and their structural equivalents, or in a combination of one or more of them. Embodiments of the subject matter described in the present disclosure may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in the present disclosure may be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows may also be performed by special-purpose logic circuitry, such as an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus may also be implemented as special-purpose logic circuitry.
Computers suitable for executing a computer program include, for example, general-purpose and/or special-purpose microprocessors, or any other kind of central processing unit. Generally, a central processing unit receives instructions and data from a read-only memory and/or a random access memory. The essential components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer also includes one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or is operatively coupled to such mass storage devices to receive data from them or transfer data to them, or both. However, a computer need not have such devices. In addition, a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name just a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.
Although the present disclosure contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what may be claimed, but rather as primarily describing features of particular embodiments of a particular disclosure. Certain features described in multiple embodiments within the present disclosure may also be implemented in combination in a single embodiment. Conversely, various features described in a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be removed from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The above descriptions are merely some embodiments of the present disclosure and are not intended to limit one or more embodiments of the present disclosure. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of one or more embodiments of the present disclosure shall be included within the protection scope of one or more embodiments of the present disclosure.

Claims (20)

  1. A texture generation method, comprising:
    acquiring a first view texture of a target object; and
    inputting the first view texture into a pre-trained texture generation network to obtain a second view texture of the target object predicted and output by the texture generation network, wherein the first view texture and the second view texture correspond to different view acquisition angles.
  2. The method according to claim 1, wherein
    the target object is a human body, the first view texture of the target object is a front view texture of the human body, and the second view texture is a back view texture of the human body.
  3. The method according to claim 1 or 2, wherein after obtaining the second view texture of the target object predicted and output by the texture generation network, the method further comprises:
    mapping the first view texture and the second view texture to an initial three-dimensional model of the target object to obtain a target three-dimensional model filled with a texture structure, wherein the initial three-dimensional model is a three-dimensional mesh model representing the geometric shape of the target object.
  4. The method according to any one of claims 1 to 3, wherein inputting the first view texture into the pre-trained texture generation network to obtain the second view texture of the target object predicted and output by the texture generation network comprises:
    performing down-sampling processing on the first view texture to extract a first texture feature map; and
    performing up-sampling processing based on the first texture feature map to output the second view texture.
  5. The method according to any one of claims 1 to 4, wherein a training process of the texture generation network comprises:
    outputting, based on a first view texture sample of a sample object, a second view texture sample of the sample object through a texture generation network to be trained;
    generating network supervision information according to the first view texture sample and the second view texture sample, wherein the network supervision information comprises at least one of the following: texture supervision information, and adversarial supervision information obtained through a generative adversarial network; and
    adjusting network parameters of the texture generation network based on the network supervision information.
  6. The method according to claim 5, wherein, in a case where the network supervision information comprises the texture supervision information, generating the network supervision information according to the first view texture sample and the second view texture sample comprises:
    performing feature extraction on the first view texture sample to obtain a first texture feature;
    performing feature extraction on the second view texture sample to obtain a second texture feature; and
    obtaining, according to the first texture feature, the second texture feature, and a texture feature of a second view label, a first feature loss representing differences between texture features as the texture supervision information.
  7. The method according to claim 5, wherein, in a case where the network supervision information comprises the adversarial supervision information, generating the network supervision information according to the first view texture sample and the second view texture sample comprises:
    inputting the first view texture sample and the second view texture sample into a first discriminator; and
    obtaining a first discrimination loss as the adversarial supervision information according to an output value of the first discriminator and a first discrimination label.
  8. The method according to any one of claims 5 to 7, wherein the network supervision information further comprises regression supervision information, and the method further comprises:
    obtaining a first regression loss as the regression supervision information based on the second view texture sample and a corresponding second view label.
  9. The method according to claim 5, wherein outputting, based on the first view texture sample of the sample object, the second view texture sample of the sample object through the texture generation network to be trained comprises:
    receiving an initial image, the initial image comprising the sample object;
    processing the initial image to obtain the first view texture sample and at least one of the following two object masks: a first view object mask and a second view object mask; and
    using the at least one of the first view object mask and the second view object mask, together with the first view texture sample, as an input of the texture generation network to be trained, to obtain the second view texture sample output by the texture generation network.
  10. The method according to claim 5, wherein before outputting, based on the first view texture sample of the sample object, the second view texture sample of the sample object through the texture generation network to be trained, the method further comprises:
    outputting, based on a third view texture sample of the sample object, a fourth view texture sample of the sample object through an auxiliary texture generation network to be trained, wherein the first view texture sample has a higher resolution than the third view texture sample;
    adjusting network parameters of the auxiliary texture generation network according to the fourth view texture sample; and
    after training of the auxiliary texture generation network is completed, using at least some of the network parameters of the auxiliary texture generation network as at least some of the network parameters of the texture generation network.
  11. The method according to claim 10, wherein
    the auxiliary texture generation network comprises a first encoder and a first decoder; and
    the texture generation network comprises a second encoder and a second decoder, wherein the second encoder has at least one more convolutional layer than the first encoder, and the second decoder has at least one more deconvolutional layer than the first decoder.
  12. A texture generation apparatus, comprising:
    a texture acquisition module, configured to acquire a first view texture of a target object; and
    a prediction processing module, configured to input the first view texture into a pre-trained texture generation network to obtain a second view texture of the target object predicted and output by the texture generation network, wherein the first view texture and the second view texture correspond to different view acquisition angles.
  13. The apparatus according to claim 12, wherein
    the target object is a human body, the first view texture of the target object is a front view texture of the human body, and the second view texture is a back view texture of the human body.
  14. The apparatus according to claim 12 or 13, further comprising a network training module for training the texture generation network, the network training module comprising:
    a texture output sub-module, configured to output, based on a first view texture sample of a sample object, a second view texture sample of the sample object through a texture generation network to be trained;
    a supervision generation sub-module, configured to generate network supervision information according to the first view texture sample and the second view texture sample, wherein the network supervision information comprises at least one of the following: texture supervision information, and adversarial supervision information obtained through a generative adversarial network; and
    a parameter adjustment sub-module, configured to adjust network parameters of the texture generation network based on the network supervision information.
  15. The apparatus according to claim 14, wherein
    when generating the texture supervision information, the supervision generation sub-module is configured to: perform feature extraction on the first view texture sample to obtain a first texture feature; perform feature extraction on the second view texture sample to obtain a second texture feature; and obtain, according to the first texture feature, the second texture feature, and a texture feature of a second view label, a first feature loss representing differences between texture features as the texture supervision information.
  16. The apparatus according to claim 14, wherein
    when generating the adversarial supervision information, the supervision generation sub-module is configured to: input the first view texture sample and the second view texture sample into a first discriminator; and obtain a first discrimination loss as the adversarial supervision information according to an output value of the first discriminator and a first discrimination label.
  17. The apparatus according to claim 14, wherein
    the texture output sub-module is specifically configured to: receive an initial image, the initial image comprising the sample object; process the initial image to obtain the first view texture sample and at least one of the following two object masks: a first view object mask and a second view object mask; and use the at least one of the first view object mask and the second view object mask, together with the first view texture sample, as an input of the texture generation network to obtain the second view texture sample output by the texture generation network.
  18. An electronic device, comprising a memory and a processor, wherein the memory is configured to store computer-readable instructions, and the processor is configured to invoke the computer instructions to implement the method according to any one of claims 1 to 11.
  19. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 11.
  20. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 11.
PCT/CN2021/114973 2021-03-31 2021-08-27 Texture generation method, apparatus, device, and storage medium WO2022205755A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110352202.2A CN112950739A (zh) 2021-03-31 2021-03-31 Texture generation method, apparatus, device, and storage medium
CN202110352202.2 2021-03-31

Publications (1)

Publication Number Publication Date
WO2022205755A1 true WO2022205755A1 (zh) 2022-10-06

Family

ID=76231795

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/114973 WO2022205755A1 (zh) 2021-03-31 2021-08-27 Texture generation method, apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN112950739A (zh)
WO (1) WO2022205755A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115937409A * 2022-10-19 2023-04-07 中国人民解放军军事科学院国防科技创新研究院 Adversarial attack texture generation method for countering visual intelligence

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112950739A (zh) * 2021-03-31 2021-06-11 深圳市慧鲤科技有限公司 Texture generation method, apparatus, device, and storage medium
CN113012282B (zh) * 2021-03-31 2023-05-19 深圳市慧鲤科技有限公司 Three-dimensional human body reconstruction method, apparatus, device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109255831A * 2018-09-21 2019-01-22 南京大学 Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning
CN110223370A * 2019-05-29 2019-09-10 南京大学 Method for generating a complete human body texture map from a single-viewpoint image
CN110378838A * 2019-06-25 2019-10-25 深圳前海达闼云端智能科技有限公司 View-changing image generation method and apparatus, storage medium, and electronic device
CN111445410A * 2020-03-26 2020-07-24 腾讯科技(深圳)有限公司 Texture enhancement method, apparatus, and device based on texture images, and storage medium
US10818043B1 * 2019-04-24 2020-10-27 Adobe Inc. Texture interpolation using neural networks
CN112950739A * 2021-03-31 2021-06-11 深圳市慧鲤科技有限公司 Texture generation method, apparatus, device, and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132197A * 2020-09-15 2020-12-25 腾讯科技(深圳)有限公司 Model training and image processing methods and apparatuses, computer device, and storage medium

Also Published As

Publication number Publication date
CN112950739A (zh) 2021-06-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21934398

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM1205A DATED 22.01.2024)