CN109978759B - Image processing method and device and training method and device of image generation network - Google Patents


Info

Publication number
CN109978759B
Authority
CN
China
Prior art keywords
image
data
network
texture
processing
Prior art date
Legal status
Active
Application number
CN201910238417.4A
Other languages
Chinese (zh)
Other versions
CN109978759A (en)
Inventor
暴天鹏
沈宇军
吴立威
吕健勤
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN201910238417.4A
Publication of CN109978759A
Application granted
Publication of CN109978759B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/06 Topological mapping of higher dimensional structures onto lower dimensional surfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Generation (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The disclosure relates to an image processing method and device and an image generation network training method and device, wherein the image processing method comprises the following steps: acquiring a first random vector and a second random vector; and inputting the first random vector into a 3D shape generation network for image shape generation processing to obtain shape data of the 3D image, and inputting the second random vector into a texture generation network for image texture generation processing to obtain texture data of the 3D image. According to an embodiment of the present disclosure, generation of 3D image data can be achieved by a neural network.

Description

Image processing method and device and training method and device of image generation network
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to an image processing method and apparatus and an image generation network training method and apparatus.
Background
In the related art, image generation techniques in image processing have become capable of generating high-quality 2D images. For the generation of 3D images, however, on the one hand, 3D image data, especially high-quality 3D image data, is difficult to acquire; on the other hand, the training data and the generated data of a neural network are generally of the same type. Given that it is difficult to acquire a sufficient number of 3D images to train a neural network, how to generate 3D images through a neural network is an urgent problem to be solved.
Disclosure of Invention
The present disclosure proposes a technical solution of image processing.
According to an aspect of the present disclosure, there is provided an image processing method including: acquiring a first random vector and a second random vector; and inputting the first random vector into a 3D shape generation network for image shape generation processing to obtain shape data of the 3D image, and inputting the second random vector into a texture generation network for image texture generation processing to obtain texture data of the 3D image.
In some possible implementations, the method further includes: and generating the 3D image according to the shape data of the 3D image and the texture data of the 3D image.
In some possible implementation manners, the inputting the first random vector to a 3D shape generation network for processing to obtain shape data of a 3D image includes: the 3D shape generation network obtains shape data of the 3D image by using prior data and the first random vector, wherein the prior data is obtained by counting pixel point positions in a plurality of existing 3D images.
In some possible implementations, the 3D shape generation network and the texture generation network are obtained by performing joint training using 2D images as training samples.
In some possible implementations, the method further includes: performing projection processing on the 3D image according to the shape data of the 3D image and the texture data of the 3D image to obtain a 2D image; determining a predicted image type of the 2D image through an image discrimination network, wherein the predicted image type is a generated image or a real image; and adjusting network parameters of at least one of the 3D appearance generation network, the texture generation network and the image discrimination network based on the predicted image category of the 2D image.
In some possible implementation manners, the performing projection processing on the 3D image according to the shape data of the 3D image and the texture data of the 3D image to obtain a 2D image includes: and inputting the appearance data of the 3D image, the texture data of the 3D image and the random attitude vector into a projection rendering network for random projection processing to obtain a 2D image.
In some possible implementations, the randomly projecting the 3D image to obtain a 2D image includes: carrying out random projection processing on the 3D image by utilizing a plurality of random attitude vectors to obtain a plurality of 2D images; the determining the predicted image category of the 2D image through the image discrimination network comprises the following steps: determining, by the image discrimination network, a predicted image category for each of the plurality of 2D images.
In some possible implementations, the method further includes: inputting a real 2D sample image into the image discrimination network for processing to obtain a predicted image category of the 2D sample image; and adjusting the network parameters of the image discrimination network based on the predicted image type of the 2D sample image and the labeled image type of the 2D sample image.
In some possible implementations, the method further includes: inputting texture data of the 3D image into a texture discrimination network for processing to obtain a prediction data type of the texture data of the 3D image, wherein the prediction data type is real texture data or generated texture data; inputting real sample texture data into the texture judging network for processing to obtain the predicted data category of the sample texture data; adjusting a network parameter of at least one of the texture discrimination network and the texture generation network based on a predicted data class of texture data of the 3D image and a predicted data class of the sample texture data.
According to another aspect of the present disclosure, there is provided a training method of an image generation network, including: performing image generation processing through the image generation network to obtain image data of a 3D image; inputting the image data of the 3D image into a projection rendering network for projection processing to obtain a 2D image; inputting the 2D image into an image discrimination network for processing to obtain a predicted image type of the 2D image, wherein the predicted image type is a real image or a generated image; and adjusting network parameters of at least one of the image generation network and the image discrimination network based on the predicted image category of the 2D image.
In some possible implementations, the image generation network includes a 3D outline generation network and a texture generation network, the image data of the 3D image includes outline data of the 3D image and texture data of the 3D image; the image generation processing through the image generation network to obtain image data of a 3D image includes: inputting the first random vector into the 3D shape generation network to perform image shape generation processing to obtain shape data of the 3D image; and inputting a second random vector into the texture generation network to perform image texture generation processing to obtain texture data of the 3D image.
In some possible implementations, the inputting the image data of the 3D image to a projection rendering network for projection processing to obtain a 2D image includes: and inputting the image data of the 3D image and the plurality of random attitude vectors into a projection rendering network for random projection processing to obtain a plurality of 2D images.
In some possible implementations, the method further includes: inputting a real 2D sample image into the image discrimination network for processing to obtain a predicted image category of the 2D sample image;
the adjusting the network parameter of at least one of the image generation network and the image discrimination network based on the predicted image category of the 2D image includes: adjusting a network parameter of at least one of the image discrimination network and the 3D image generation network based on a predicted image category of the 2D sample image and a predicted image category of the 2D image.
In some possible implementations, the method further includes: inputting texture data in the image data of the 3D image into a texture discrimination network for processing to obtain a prediction data type of the texture data of the 3D image, wherein the prediction data type is real texture data or generated texture data; inputting real sample texture data into the texture judging network for processing to obtain the predicted data category of the sample texture data; adjusting a network parameter of at least one of the texture discrimination network and a texture generation network of the image generation network based on a predicted data class of texture data of the 3D image and a predicted data class of the sample texture data.
According to another aspect of the present disclosure, there is provided an image processing apparatus including: the vector acquisition module is used for acquiring a first random vector and a second random vector; and the appearance and texture generation module is used for inputting the first random vector into a 3D appearance generation network to perform image appearance generation processing to obtain appearance data of a 3D image, and inputting the second random vector into a texture generation network to perform image texture generation processing to obtain texture data of the 3D image.
In some possible implementations, the apparatus further includes: and the 3D image generation module is used for generating the 3D image according to the shape data of the 3D image and the texture data of the 3D image.
In some possible implementations, the shape and texture generating module includes: and the appearance data acquisition submodule is used for obtaining the appearance data of the 3D image by the 3D appearance generation network by utilizing the prior data and the first random vector, wherein the prior data is obtained by counting the positions of pixel points in a plurality of existing 3D images.
In some possible implementations, the 3D shape generation network and the texture generation network are obtained by performing joint training using 2D images as training samples.
In some possible implementations, the apparatus further includes: the first projection module is used for carrying out projection processing on the 3D image according to the shape data of the 3D image and the texture data of the 3D image to obtain a 2D image; the first class prediction module is used for determining a predicted image class of the 2D image through an image discrimination network, wherein the predicted image class is a generated image or a real image; a first parameter adjusting module, configured to adjust a network parameter of at least one of the 3D outline generating network, the texture generating network, and the image discriminating network based on a predicted image category of the 2D image.
In some possible implementations, the first projection module includes: and the first random projection submodule is used for inputting the appearance data of the 3D image, the texture data of the 3D image and the random attitude vector into a projection rendering network for random projection processing to obtain a 2D image.
In some possible implementations, the stochastic projection sub-module is to: carrying out random projection processing on the 3D image by using a plurality of random attitude vectors to obtain a plurality of 2D images; the first class prediction module is to: determining, by the image discrimination network, a predicted image category for each of the plurality of 2D images.
In some possible implementations, the apparatus further includes: the second type prediction module is used for inputting a real 2D sample image into the image discrimination network for processing to obtain a predicted image type of the 2D sample image; and the second parameter adjusting module is used for adjusting the network parameters of the image judgment network based on the predicted image type of the 2D sample image and the labeled image type of the 2D sample image.
In some possible implementations, the apparatus further includes: a third type prediction module, configured to input texture data of the 3D image to a texture discrimination network for processing, so as to obtain a prediction data type of the texture data of the 3D image, where the prediction data type is real texture data or generated texture data; the fourth type prediction module is used for inputting real sample texture data into the texture discrimination network for processing to obtain the prediction data type of the sample texture data; a third parameter adjusting module, configured to adjust a network parameter of at least one of the texture discrimination network and the texture generation network based on a predicted data type of texture data of the 3D image and a predicted data type of the sample texture data.
According to another aspect of the present disclosure, there is provided a training apparatus of an image generation network, including: the image data acquisition module is used for carrying out image generation processing through the image generation network to obtain image data of the 3D image; the second projection module is used for inputting the image data of the 3D image into a projection rendering network for projection processing to obtain a 2D image; the fifth type prediction module is used for inputting the 2D image into an image discrimination network for processing to obtain a predicted image type of the 2D image, wherein the predicted image type is a real image or a generated image; and the fourth parameter adjusting module is used for adjusting the network parameters of at least one of the image generation network and the image discrimination network based on the predicted image category of the 2D image.
In some possible implementations, the image generation network includes a 3D outline generation network and a texture generation network, the image data of the 3D image includes outline data of the 3D image and texture data of the 3D image; the image data obtaining module includes: the appearance generation submodule is used for inputting a first random vector into the 3D appearance generation network to carry out image appearance generation processing so as to obtain appearance data of a 3D image; and the texture generation submodule is used for inputting a second random vector into the texture generation network to perform image texture generation processing so as to obtain texture data of the 3D image.
In some possible implementations, the second projection module includes: and the second random projection submodule is used for inputting the image data of the 3D image and the plurality of random attitude vectors into a projection rendering network for random projection processing to obtain a plurality of 2D images.
In some possible implementations, the apparatus further includes: the sixth type prediction module is used for inputting a real 2D sample image into the image discrimination network for processing to obtain a predicted image type of the 2D sample image; the fourth parameter adjusting module includes: and the adjusting sub-module is used for adjusting the network parameters of at least one of the image discrimination network and the 3D image generation network based on the predicted image category of the 2D sample image and the predicted image category of the 2D image.
In some possible implementations, the apparatus further includes: a seventh type prediction module, configured to input texture data in the image data of the 3D image to a texture discrimination network for processing, so as to obtain a prediction data type of the texture data of the 3D image, where the prediction data type is real texture data or generated texture data; the eighth type prediction module is used for inputting real sample texture data into the texture judging network for processing to obtain the prediction data type of the sample texture data; a fifth parameter adjusting module, configured to adjust a network parameter of at least one of the texture discrimination network and a texture generation network in the image generation network based on a prediction data type of texture data of the 3D image and a prediction data type of the sample texture data.
According to another aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to: the above method is performed.
According to another aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
According to the image processing method disclosed by the embodiment of the disclosure, the random vector can be subjected to image appearance generation through the 3D appearance generation network to obtain the appearance data of the 3D image, and the random vector is subjected to image texture generation through the texture generation network to obtain the texture data of the 3D image, so that the generation of the appearance and the texture data of the 3D image is realized through the neural network.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flow chart of an image processing method according to an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of an image processing network in an image processing method according to an embodiment of the present disclosure.
FIG. 3 shows a flow diagram of a method of training an image generation network according to an embodiment of the present disclosure.
Fig. 4 illustrates a block diagram of an image processing apparatus according to an embodiment of the present disclosure.
FIG. 5 shows a block diagram of a training apparatus of an image generation network according to an embodiment of the present disclosure.
Fig. 6 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Fig. 7 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of a, B, and C, and may mean including any one or more elements selected from the group consisting of a, B, and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Image generation is one of the important research topics in the field of computer vision. Image generation not only has important research value due to the diversity and complexity of images, but also helps to improve image processing technologies such as image detection, image recognition, and liveness detection. With the development of deep learning and Generative Adversarial Networks (GAN), it has become possible to generate high-quality 2D images. However, for the generation of high-quality 3D images, the difficulty lies in two aspects. On the one hand, a GAN, as a deep learning network, consumes a large amount of training data and generally requires the training data and the finally generated data to be of the same type; that is, if 3D images need to be generated, a large number of 3D images are also required as training data. However, 3D images, especially high-quality 3D images, are difficult to acquire, so training a GAN for generating 3D images with a sufficient number of 3D images is not practical.
Fig. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure. The method is applied to an image processing apparatus; for example, the method may be executed by a terminal device, a server, or another processing device, where the terminal device may be a User Equipment (UE), a mobile device, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the image processing method may be implemented by a processor invoking computer-readable instructions stored in a memory. As shown in fig. 1, the method includes:
s11, acquiring a first random vector and a second random vector;
and S12, inputting the first random vector into a 3D shape generation network to perform image shape generation processing to obtain shape data of the 3D image, and inputting the second random vector into a texture generation network to perform image texture generation processing to obtain texture data of the 3D image.
According to the image processing method disclosed by the embodiment of the disclosure, the random vector can be subjected to image outline generation through the 3D outline generation network to obtain the outline data of the 3D image, and the random vector is subjected to image texture generation through the texture generation network to obtain the texture data of the 3D image, so that the generation of the outline and the texture data of the 3D image is realized through the neural network.
In the embodiment of the present disclosure, the 3D image generation network includes a 3D shape generation network and a texture generation network, and 3D shape generation and texture generation are performed by the 3D shape generation network and the texture generation network, respectively, based on the random vector, to obtain shape data and texture data of the 3D image, and accordingly, the 3D image is represented by the 3D shape data and the texture data.
In some possible implementations, the 3D image may be, for example, a three-dimensional face image. The first random vector z_S may be a vector in a random space, for example a 64-dimensional vector whose elements are each uniformly sampled from the interval [-1, 1]. The 3D shape generation network (generator) G_S maps the first random vector z_S to the shape data S of the 3D image. The shape data S may be a 3N-dimensional vector (x_1, y_1, z_1, …, x_i, y_i, z_i, …, x_N, y_N, z_N), where (x_i, y_i, z_i) are the three-dimensional coordinates of the i-th point of the shape data S, and N may take the value 53215, for example. The 3D shape generation network G_S may be, for example, a deep neural network; the present disclosure does not limit the specific network type of the 3D shape generation network, the dimension of the first random vector, or the dimension of the shape data.
In some possible implementations, inputting the first random vector to a 3D shape generation network in step S12 for processing, to obtain shape data of a 3D image, includes:
the 3D shape generation network obtains shape data of the 3D image by using prior data and the first random vector, wherein the prior data is obtained by counting pixel point positions in a plurality of existing 3D images.
For example, the positions of the pixel points in a plurality of existing 3D images may be counted to obtain the prior data. The prior data may be obtained using a parametric model representing the positions of pixel points in existing 3D images, for example a three-dimensional deformable model (3D Morphable Model, 3DMM). The 3DMM defines a three-dimensional mesh v consisting of a series of triangles, each triangle consisting of three points in three-dimensional space. With the three-dimensional mesh v, the three-dimensional points in the 3DMM are no longer unordered but have a certain connectivity relationship.
In some possible implementations, using the 3DMM, the average shape μ_S of the N points in three-dimensional space and the standard deviation σ_S of the three-dimensional shape along each dimension are determined, and the average shape μ_S and the standard deviation σ_S are taken as the prior data. The shape data of the 3D image is then generated through the 3D shape generation network using the average shape μ_S and the standard deviation σ_S contained in the prior data together with the first random vector, as shown in equation (1):

S = μ_S + σ_S ⊙ G_S(z_S)    (1)

In equation (1), S represents the shape data, G_S denotes the 3D shape generation network, and ⊙ denotes point multiplication (i.e., element-by-element multiplication).
In this way, shape data consistent with an actual three-dimensional face shape can be generated, improving the realism of the generated 3D shape data.
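As a purely illustrative, non-limiting sketch of equation (1), the shape branch could be implemented roughly as follows. The layer sizes, the compute_prior helper, and the use of PyTorch are assumptions for illustration only and are not prescribed by the disclosure.

```python
# Hypothetical sketch of 3D shape generation with a 3DMM-style prior (equation (1)).
# Architecture, dimensions and helper names are illustrative assumptions.
import torch
import torch.nn as nn

N_POINTS = 53215  # example number of 3D points, as in the description above

def compute_prior(meshes: torch.Tensor):
    """meshes: (num_meshes, 3 * N_POINTS) flattened existing 3D shapes."""
    mu_s = meshes.mean(dim=0)      # average shape over existing 3D images
    sigma_s = meshes.std(dim=0)    # per-dimension standard deviation
    return mu_s, sigma_s

class ShapeGenerator(nn.Module):
    """G_S: maps a 64-dimensional random vector z_S to a 3N-dimensional shape."""
    def __init__(self, z_dim: int = 64, out_dim: int = 3 * N_POINTS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 512), nn.ReLU(),
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, out_dim), nn.Tanh(),
        )

    def forward(self, z_s, mu_s, sigma_s):
        # S = mu_S + sigma_S ⊙ G_S(z_S), with ⊙ the element-wise product
        return mu_s + sigma_s * self.net(z_s)

# Usage: z_S sampled uniformly from [-1, 1]
# z_s = torch.rand(1, 64) * 2 - 1
# S = shape_generator(z_s, mu_s, sigma_s)   # (1, 3 * N_POINTS)
```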
Optionally, the a priori data may also be determined in other ways, which is not limited in this disclosure.
In some possible implementations, the second random vector is input to the texture generation network G_T, which outputs the texture data of the 3D image. The second random vector z_T is a vector in a random space, for example a 64-dimensional vector whose elements are each uniformly sampled from the interval [-1, 1]. The texture generation network (generator) G_T maps the second random vector z_T to the texture data T of the 3D image. The texture data T is two-dimensional face image data, for example two-dimensional face image data with a resolution of 224 × 224. The texture generation network G_T may be, for example, a deep neural network; the present disclosure does not limit the specific network type of the texture generation network, the dimension of the second random vector, or the size of the texture data.
In some possible implementations, the texture data T may be a UV Map (UV Map). The UV map can map the 3D object surface to the 2D space according to a preset relation, compared with other texture images, the UV map retains more information of the side faces, and therefore all sides of the human face in the 3D image represented by the UV map are more real.
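As one purely illustrative possibility, the texture generation network G_T could be a convolutional decoder that maps the 64-dimensional vector z_T to a 224 × 224 UV map; the disclosure only fixes the input and output sizes, so the decoder structure below is an assumption.

```python
# Hypothetical sketch of a texture generator G_T producing a 224x224 UV map.
# The decoder architecture is an assumption; only the input/output sizes follow the text.
import torch
import torch.nn as nn

class TextureGenerator(nn.Module):
    def __init__(self, z_dim: int = 64):
        super().__init__()
        self.fc = nn.Linear(z_dim, 512 * 7 * 7)
        channels = [512, 256, 128, 64, 32]
        blocks = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks += [nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
                       nn.BatchNorm2d(c_out), nn.ReLU()]
        blocks += [nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh()]
        self.decoder = nn.Sequential(*blocks)  # 7 -> 14 -> 28 -> 56 -> 112 -> 224

    def forward(self, z_t):
        x = self.fc(z_t).view(-1, 512, 7, 7)
        return self.decoder(x)  # (batch, 3, 224, 224) UV map

# z_t = torch.rand(1, 64) * 2 - 1
# T = TextureGenerator()(z_t)
```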
In some possible implementations, the image processing method according to an embodiment of the present disclosure further includes: and generating the 3D image according to the shape data of the 3D image and the texture data of the 3D image. That is, a 3D image can be constructed from the shape data and texture data of the 3D image, thereby generating a high-quality three-dimensional face image.
In some embodiments, the generated shape data and texture data of the 3D image are used to train the 3D shape generation network and the texture generation network.
In some possible implementations, the 3D shape generation network and the texture generation network are obtained by performing joint training using 2D images as training samples.
For example, 3D images, especially high-quality 3D images, are difficult to obtain, and the training effect is poor when a small-scale set of 3D images is directly used as training data. Therefore, 2D images, which are easy to acquire, may be used as training samples to jointly train the 3D shape generation network and the texture generation network.
In some possible implementations, the 3D image generation network may be trained by GAN, and accordingly, the image processing method according to the embodiment of the present disclosure further includes:
performing projection processing on the 3D image according to the shape data of the 3D image and the texture data of the 3D image to obtain a 2D image;
determining a predicted image type of the 2D image through an image discrimination network, wherein the predicted image type is a generated image or a real image;
and adjusting network parameters of at least one of the image discrimination network, the texture generation network and the 3D appearance generation network based on the predicted image category of the 2D image.
Optionally, a first network loss of the GAN may be determined based on a predicted image class of the 2D image, and a network parameter of at least one of the image discrimination network, the texture generation network, and the 3D shape generation network may be inversely adjusted according to the first network loss.
For example, during training, an image discrimination network (discriminator) D_P can be used to distinguish the generated 2D face image from a real 2D face image, so as to adversarially train the 3D shape generation network G_S and the texture generation network G_T. In this case, the 3D image is projected into the two-dimensional space using the shape data S generated by the 3D shape generation network G_S and the texture data T generated by the texture generation network G_T, obtaining a 2D image. The 2D image is then input to the image discrimination network for processing, and the predicted image category of the 2D image is output, where the predicted image category is either a generated image or a real image. In this way, the performance of the 3D image generation network can be improved.
In some possible implementation manners, the step of performing projection processing on the 3D image according to the shape data of the 3D image and the texture data of the 3D image to obtain a 2D image includes:
and inputting the appearance data of the 3D image, the texture data of the 3D image and the random attitude vector into a projection rendering network for random projection processing to obtain a 2D image.
For example, the random attitude vector z_P may be a randomly generated vector used to characterize the pose of the face. The face pose can be characterized by three Euler angles (pitch, roll, and yaw), or described in other ways.
In some possible implementations, the attitude vector z_P may also be a vector generated from a preset angle (Euler angle), used to represent the face pose in the preset angular direction. The present disclosure does not specifically limit the manner of generating the attitude vector z_P.
In some possible implementations, the projection rendering network may use geometric relationships to project the shape data S and the texture data T according to a given random attitude vector z_P, rendering a 2D image I_P in the corresponding direction. The 2D image I_P is then input to the image discrimination network, which discriminates whether the 2D image I_P is a generated image or a real image. The projection rendering network performs specific geometric operations and has no network parameters to be learned; those skilled in the art can set the structure and parameters of the projection rendering network according to actual conditions, which is not limited by the present disclosure.
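The following minimal sketch illustrates only the geometric part of such a projection: rotating the shape vertices by Euler angles and projecting them orthographically onto the image plane. Rasterizing the UV map onto the projected mesh, which a differentiable renderer would handle in practice, is omitted; the function names and the orthographic model are assumptions.

```python
# Minimal geometric sketch of projecting 3D shape data according to a pose vector.
# The texturing/rasterization step of the projection rendering network is omitted.
import math
import torch

def euler_to_rotation(pitch: float, yaw: float, roll: float) -> torch.Tensor:
    cx, sx = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cz, sz = math.cos(roll), math.sin(roll)
    rx = torch.tensor([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    ry = torch.tensor([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    rz = torch.tensor([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return rz @ ry @ rx

def project_vertices(shape_3d: torch.Tensor, pose) -> torch.Tensor:
    """shape_3d: (N, 3) vertex coordinates; pose: (pitch, yaw, roll) in radians."""
    rotation = euler_to_rotation(*pose)
    rotated = shape_3d @ rotation.T
    return rotated[:, :2]  # orthographic projection onto the image plane
```

Because these operations are pure tensor arithmetic, gradients can flow back through the projected coordinates to the shape generator, which is consistent with the end-to-end training described later.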
In this way, a 2D image projected in any direction can be generated, various possible projection results are provided, and the diversity of data is guaranteed.
In some possible implementations, the randomly projecting the 3D image to obtain the 2D image includes: carrying out random projection processing on the 3D image by using a plurality of random attitude vectors to obtain a plurality of 2D images;
the step of determining the predicted image category of the 2D image through an image discrimination network comprises: determining, by the image discrimination network, a predicted image category for each of the plurality of 2D images.
For example, a plurality of random attitude vectors z_P may be generated, and the projection rendering network projects the shape data S and the texture data T according to each of the plurality of random attitude vectors z_P to obtain a plurality of 2D images; the plurality of 2D images are input to the image discrimination network for processing, and a predicted image category is output for each of the plurality of 2D images.
In some possible implementations, multiple attitude vectors z_P may also be generated at preset angular intervals, thereby obtaining 2D images projected in various directions. The present disclosure does not limit the specific manner of generating the plurality of attitude vectors z_P.
In this way, two-dimensional projection images at different angles can be generated as many as possible, and the categories of the projection images can be determined from different viewing angles, so that the comprehensiveness of the determination can be improved.
In some possible implementations, the image processing method according to an embodiment of the present disclosure further includes:
inputting a real 2D sample image into the image discrimination network for processing to obtain a predicted image category of the 2D sample image;
and adjusting the network parameters of the image discrimination network based on the predicted image type of the 2D sample image and the labeled image type of the 2D sample image.
For example, the real 2D sample image may be used as a training sample, and the real 2D sample image is input to the image discrimination network for processing, so as to obtain a predicted image category (real image) of the 2D sample image. And determining a second network loss of the GAN according to the predicted image type of the 2D sample image and the labeled image type of the 2D sample image, and adjusting at least one network parameter of the GAN, such as the network parameter of the image discrimination network, based on the second network loss.
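One possible instantiation of the discriminator update described above is a standard binary cross-entropy GAN loss over real sample images and generated projections. The exact loss formulas of the disclosure (equations (2) to (5) below) are not reproduced here, so the following is an assumption rather than the patented formulation.

```python
# Hedged sketch of an image-discriminator update using a standard BCE GAN loss.
# d_net is assumed to end in a sigmoid so that it outputs a "real image" probability.
import torch
import torch.nn.functional as F

def image_discriminator_loss(d_net, real_2d, fake_2d):
    pred_real = d_net(real_2d)           # real 2D sample images, label 1
    pred_fake = d_net(fake_2d.detach())  # projected (generated) 2D images, label 0
    return (F.binary_cross_entropy(pred_real, torch.ones_like(pred_real)) +
            F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake)))
```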
In some possible implementations, the image discrimination network D_P distinguishes the generated 2D image from the real 2D image based on image quality. The more realistic the image, the closer the probability output by the image discrimination network D_P is to 1; conversely, the closer the output probability is to 0. At the beginning of the adversarial training, the 3D images generated by the 3D image generation network are of poor quality. During the adversarial training, the 3D image generation network (the texture generation network and the 3D shape generation network) tries to confuse the image discrimination network D_P, while the image discrimination network D_P tries to distinguish the generated images from the real images; the two compete against each other. As training continues, the 3D image generation network becomes able to generate higher-quality 3D image data.
In some possible implementation manners, after a plurality of network parameter adjustments, under the condition that a training end condition is met, a trained image discrimination network, a texture generation network and a 3D shape generation network can be obtained, and the trained texture generation network and the trained 3D shape generation network can be used for generating a 3D image. The present disclosure does not limit the specific manner of adjusting the network parameters and the set training end conditions.
In this way, a texture generation network and a 3D shape generation network with better performance can be obtained.
In some possible implementations, the image processing method according to an embodiment of the present disclosure further includes:
inputting texture data of the 3D image into a texture discrimination network for processing to obtain a prediction data type of the texture data of the 3D image, wherein the prediction data type is real texture data or generated texture data;
inputting real sample texture data into the texture judging network for processing to obtain the predicted data category of the sample texture data;
adjusting a network parameter of at least one of the texture discrimination network and the texture generation network based on a prediction data class of texture data of the 3D image and a prediction data class of the sample texture data.
For example, during the training process, a texture discrimination network (discriminator) D_T can also be introduced to distinguish the generated face texture from a real face texture, so as to adversarially train the texture generation network G_T. In this case, the texture data T generated by the texture generation network G_T may be input to the texture discrimination network for processing, and the predicted data category of the texture data is output, where the predicted data category is either real texture data or generated texture data.
In some possible implementations, real sample texture data may be employed as a training set. For example, a face UV map data set of the related art may be used; real sample texture data may also be acquired from real 3D images by means such as face reconstruction. The present disclosure is not limited in this respect.
In some possible implementations, the sample texture data is input into a texture discrimination network for processing, and a prediction data type (real texture) of the sample texture data can be output. Determining network loss of a texture generation network and a texture discrimination network according to the prediction data type of texture data of the 3D image and the prediction data type of sample texture data; and then the network parameters of at least one of the texture generation network and the texture discrimination network can be reversely adjusted according to the network loss.
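A corresponding sketch for the texture branch is given below; the non-saturating BCE loss and the helper names are again assumptions, not the disclosed loss functions.

```python
# Hedged sketch of the texture-branch adversarial updates (D_T versus G_T).
import torch
import torch.nn.functional as F

def texture_discriminator_step(d_t, opt_d, fake_texture, real_texture):
    opt_d.zero_grad()
    pred_real = d_t(real_texture)            # real sample texture data, label 1
    pred_fake = d_t(fake_texture.detach())   # generated texture data, label 0
    loss_d = (F.binary_cross_entropy(pred_real, torch.ones_like(pred_real)) +
              F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake)))
    loss_d.backward()
    opt_d.step()
    return loss_d

def texture_generator_adv_loss(d_t, fake_texture):
    # G_T is rewarded when D_T believes the generated texture is real
    pred = d_t(fake_texture)
    return F.binary_cross_entropy(pred, torch.ones_like(pred))
```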
In some possible implementations, the texture discrimination network can distinguish the generated face texture from the real face texture. The more realistic the texture, the closer the output probability of the texture discrimination network is to 1; conversely, the closer it is to 0. When the adversarial training starts, the texture data of the generated 3D image is not realistic enough, and the texture discrimination network can easily distinguish the generated texture data from the real sample texture data.
During the adversarial training, the texture generation network attempts to make the generated texture data indistinguishable from the real sample texture data, while the texture discrimination network D_T attempts to distinguish between the generated texture data and the real sample texture data; the two compete against each other. As training continues, the texture generation network can generate texture data of higher quality, until finally the texture discrimination network D_T can no longer distinguish the generated texture data from the real sample texture data.
In some possible implementation manners, after multiple network parameter adjustments, the trained texture discrimination network and the trained texture generation network can be obtained under the condition that the training end condition is met. The present disclosure does not limit the specific manner of adjusting the network parameters and the set training end conditions.
By the method, the texture distinguishing network and the texture generating network can be trained in a confrontation mode, and the texture generating network with better performance is obtained.
With the embodiments of the present disclosure, a reasonable 3D image representation is designed, and the 3D image is projected and rendered onto 2D planes in different directions to obtain 2D images. High-quality 3D images are then generated using the 2D images and the GAN network structure; that is, 2D images are used as training data, and end-to-end training with the GAN structure yields high-quality 3D images.
Fig. 2 shows a schematic diagram of a neural network in an image processing method according to an embodiment of the present disclosure. As shown in Fig. 2, during training, the neural network includes a 3D shape generation network G_S, a texture generation network G_T, a projection rendering network R, an image discrimination network D_P, and a texture discrimination network D_T.
During processing, the first random vector z_S may be input to the 3D shape generation network G_S for processing, which outputs the shape data S of the 3D image; the second random vector z_T is input to the texture generation network G_T for processing, which outputs the texture data T of the 3D image. The shape data S, the texture data T, and the random attitude vector z_P are input to the projection rendering network R for processing, which outputs the projected 2D image I_P. The 2D image I_P and a real 2D sample image x are separately input to the image discrimination network D_P, which outputs the predicted image categories of the 2D image I_P and of the 2D sample image x. The texture data T and the sample texture data x_T of the 2D sample image x are separately input to the texture discrimination network D_T, which outputs the predicted data categories of the texture data T and of the sample texture data x_T.
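A non-authoritative sketch of this forward pass is given below, wiring together G_S, G_T, the projection rendering module R and the two discriminators D_P and D_T; all module objects are the hypothetical ones sketched earlier, and the batch size and vector dimensions are illustrative.

```python
# Illustrative forward pass corresponding to the data flow of Fig. 2 (assumed modules).
import torch

def forward_pass(g_s, g_t, render, d_p, d_t, mu_s, sigma_s, batch: int = 4):
    z_s = torch.rand(batch, 64) * 2 - 1   # first random vector z_S
    z_t = torch.rand(batch, 64) * 2 - 1   # second random vector z_T
    z_p = torch.rand(batch, 3) * 2 - 1    # random attitude (pose) vector z_P

    S = g_s(z_s, mu_s, sigma_s)           # shape data of the 3D image
    T = g_t(z_t)                          # texture data (UV map) of the 3D image
    I_p = render(S, T, z_p)               # projected 2D image I_P

    pred_image = d_p(I_p)                 # generated image vs. real image
    pred_texture = d_t(T)                 # generated texture vs. real texture
    return I_p, pred_image, pred_texture
```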
In some possible implementations, the network losses of the 3D shape generation network, the texture generation network, the image discrimination network, and the texture discrimination network may be expressed by equations (2) to (5). (Equations (2) to (5), together with several of the symbols referenced below, appear only as images in the original publication and are not reproduced here.)
In equations (2) to (5), L_{G_S} represents the network loss of the 3D shape generation network; L_{G_T} represents the network loss of the texture generation network; L_{D_P} represents the network loss of the image discrimination network; and L_{D_T} represents the network loss of the texture discrimination network. Further symbols respectively represent different energy functions and the adaptive parameters at the t-th update of the network parameters. λ denotes the weight of one of the energy terms; its value can be set by those skilled in the art according to practical situations, and the present disclosure is not limited in this respect.
In some possible implementations, in the process of adversarially training the 3D shape generation network, the texture generation network, the image discrimination network, and the texture discrimination network, the network parameters of each network may be respectively adjusted along the back-propagation paths (in the direction of the dotted arrows in Fig. 2), so that the value of the loss function of each network is minimized in turn. When the training conditions are met, the trained networks can be obtained.
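An illustrative alternating-update loop consistent with this description is sketched below. The update order, the shared generator optimizer, the helper generate_and_project, and the BCE loss form are assumptions; the disclosure only states that each network's loss is minimized in turn along its back-propagation path.

```python
# Hypothetical alternating adversarial training step (assumed helpers and loss form).
import torch
import torch.nn.functional as F

def bce_d_loss(d_net, real, fake):
    pred_real, pred_fake = d_net(real), d_net(fake.detach())
    return (F.binary_cross_entropy(pred_real, torch.ones_like(pred_real)) +
            F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake)))

def bce_g_loss(d_net, fake):
    pred = d_net(fake)
    return F.binary_cross_entropy(pred, torch.ones_like(pred))

def optimize(optimizer, loss):
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def train_step(nets, opt_d_p, opt_d_t, opt_g, real_image, real_texture):
    I_p, T = generate_and_project(nets)   # assumed helper: runs G_S, G_T and R

    # 1. update the image and texture discriminators D_P and D_T
    optimize(opt_d_p, bce_d_loss(nets['d_p'], real_image, I_p))
    optimize(opt_d_t, bce_d_loss(nets['d_t'], real_texture, T))

    # 2. update the generators G_S and G_T to fool both discriminators
    I_p, T = generate_and_project(nets)
    optimize(opt_g, bce_g_loss(nets['d_p'], I_p) + bce_g_loss(nets['d_t'], T))
    # opt_g is assumed to cover the parameters of both G_S and G_T
```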
FIG. 3 shows a flow diagram of a method of training an image generation network according to an embodiment of the present disclosure. As shown in fig. 3, according to an embodiment of the present disclosure, there is also provided a training method of an image generation network, including:
step S31, carrying out image generation processing through the image generation network to obtain image data of a 3D image;
step S32, inputting the image data of the 3D image into a projection rendering network for projection processing to obtain a 2D image;
step S33, inputting the 2D image into an image discrimination network for processing to obtain a predicted image type of the 2D image, wherein the predicted image type is a real image or a generated image;
and step S34, adjusting network parameters of at least one of the image generation network and the image discrimination network based on the predicted image category of the 2D image.
According to the embodiments of the present disclosure, image data of a 3D image can be generated through the image generation network, the 3D image data is projected and rendered to obtain a 2D image, the predicted image category of the 2D image is determined through the image discrimination network, and the image generation network and the image discrimination network are adversarially trained according to the predicted image category, so that an image generation network capable of generating realistic images with good results is obtained.
In some possible implementations, the image generation network includes a 3D outline generation network and a texture generation network, the image data of the 3D image includes outline data of the 3D image and texture data of the 3D image,
in step S31, performing image generation processing through the image generation network to obtain image data of a 3D image includes:
inputting the first random vector into the 3D shape generation network to perform image shape generation processing to obtain shape data of the 3D image;
and inputting a second random vector into the texture generation network to perform image texture generation processing, so as to obtain texture data of the 3D image.
In some possible implementations, in step S32, inputting the image data of the 3D image into a projection rendering network for projection processing, so as to obtain a 2D image, including:
and inputting the image data of the 3D image and the plurality of random attitude vectors into a projection rendering network for random projection processing to obtain a plurality of 2D images.
In some possible implementations, the method further includes:
inputting a real 2D sample image into the image discrimination network for processing to obtain a predicted image category of the 2D sample image;
the adjusting the network parameter of at least one of the image generation network and the image discrimination network based on the predicted image category of the 2D image includes:
adjusting a network parameter of at least one of the image discrimination network and the 3D image generation network based on a predicted image category of the 2D sample image and a predicted image category of the 2D image.
In some possible implementations, the method further includes:
inputting texture data in the image data of the 3D image into a texture discrimination network for processing to obtain a prediction data type of the texture data of the 3D image, wherein the prediction data type is real texture data or generated texture data;
inputting real sample texture data into the texture judging network for processing to obtain the predicted data category of the sample texture data;
adjusting a network parameter of at least one of the texture discrimination network and a texture generation network of the image generation network based on a prediction data class of texture data of the 3D image and a prediction data class of the sample texture data.
Various details of the image generation network training process have been described above, and are not repeated here.
According to the image processing method and the training method of the image generation network disclosed by the embodiment of the disclosure, the 3D image with a good effect can be generated, and the face appearance and the face texture of the generated 3D image are vivid enough under different visual angles. According to embodiments of the present disclosure, the type of training data, which is a high quality 2D image that is easy to acquire, and the type of generation data, which is a high quality 3D image that is more difficult to acquire, may be different.
According to the embodiments of the present disclosure, the face shape, the face texture, and the face pose can be controlled separately: one of the three factors can be varied while the other two are kept unchanged. For example, a pose-paired face dataset may be generated by controlling face pose changes while keeping the other two factors constant.
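As a small usage illustration of this factor control, the shape and texture vectors can be held fixed while only the pose vector is swept, yielding pose-paired views of the same synthetic face; g_s, g_t, render, mu_s and sigma_s refer to the hypothetical modules sketched earlier.

```python
# Illustrative pose sweep with fixed shape and texture vectors (assumed modules).
import math
import torch

z_s = torch.rand(1, 64) * 2 - 1    # fixed shape vector (identity)
z_t = torch.rand(1, 64) * 2 - 1    # fixed texture vector
S, T = g_s(z_s, mu_s, sigma_s), g_t(z_t)

paired_views = []
for yaw_deg in range(-60, 61, 15):                     # vary only the yaw angle
    z_p = torch.tensor([[0.0, math.radians(yaw_deg), 0.0]])
    paired_views.append(render(S, T, z_p))             # same face, different poses
```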
According to the embodiment of the disclosure, the projection rendering network is integrated into the deep learning network, the projection rendering network supports back propagation, end-to-end training of the network is not affected, and meanwhile, each step of operation of the projection rendering network has a definite physical meaning and has better interpretability than the traditional neural network structure.
According to the embodiments of the present disclosure, a controlled 2D face data set can be generated, which can improve the performance of technologies such as face detection, face recognition, and liveness detection, reduce the number of 3D images that need to be collected, and reduce cost. The embodiments of the present disclosure may also be used in data acquisition systems and entertainment products that include 3D image generation capabilities.
It will be understood by those skilled in the art that, in the methods of the present invention, the order in which the steps are written does not imply a strict order of execution or any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible inherent logic.
The above-mentioned method embodiments can be combined with each other to form combined embodiments without departing from the principle and logic; due to space limitations, such combinations are not described repeatedly in this disclosure.
In addition, the present disclosure also provides an image processing apparatus, an electronic device, a computer-readable storage medium, and a program, which can be used to implement any one of the image processing methods provided by the present disclosure, and the descriptions and corresponding descriptions of the corresponding technical solutions and the corresponding descriptions in the methods section are omitted for brevity.
Fig. 4 illustrates a block diagram of an image processing apparatus according to an embodiment of the present disclosure, and as illustrated in fig. 4, the image processing apparatus of an embodiment of the present disclosure includes:
a vector obtaining module 41, configured to obtain a first random vector and a second random vector;
and the shape and texture generating module 42 is configured to input the first random vector to a 3D shape generating network for image shape generation processing to obtain shape data of a 3D image, and input the second random vector to a texture generating network for image texture generation processing to obtain texture data of the 3D image.
In some possible implementations, the apparatus further includes: and the 3D image generation module is used for generating the 3D image according to the shape data of the 3D image and the texture data of the 3D image.
In some possible implementations, the shape and texture generating module includes: and the appearance data acquisition submodule is used for obtaining the appearance data of the 3D image by the 3D appearance generation network by utilizing the prior data and the first random vector, wherein the prior data is obtained by counting the positions of pixel points in a plurality of existing 3D images.
In some possible implementations, the 3D shape generation network and the texture generation network are obtained by performing joint training using 2D images as training samples.
In some possible implementations, the apparatus further includes: the first projection module is used for performing projection processing on the 3D image according to the shape data of the 3D image and the texture data of the 3D image to obtain a 2D image; the first class prediction module is used for determining a predicted image class of the 2D image through an image discrimination network, wherein the predicted image class is a generated image or a real image; a first parameter adjusting module, configured to adjust a network parameter of at least one of the 3D outline generating network, the texture generating network, and the image discriminating network based on a predicted image category of the 2D image.
In some possible implementations, the first projection module includes: and the first random projection submodule is used for inputting the appearance data of the 3D image, the texture data of the 3D image and the random attitude vector into a projection rendering network for random projection processing to obtain a 2D image.
In some possible implementations, the first random projection submodule is configured to: perform random projection processing on the 3D image using a plurality of random attitude vectors to obtain a plurality of 2D images; and the first class prediction module is configured to: determine, through the image discrimination network, a predicted image category for each of the plurality of 2D images.
In some possible implementations, the apparatus further includes: a second type prediction module, configured to input a real 2D sample image into the image discrimination network for processing to obtain a predicted image type of the 2D sample image; and a second parameter adjusting module, configured to adjust network parameters of the image discrimination network based on the predicted image type of the 2D sample image and the labeled image type of the 2D sample image.
In some possible implementations, the apparatus further includes: a third type prediction module, configured to input texture data of the 3D image into a texture discrimination network for processing to obtain a prediction data type of the texture data of the 3D image, where the prediction data type is real texture data or generated texture data; a fourth type prediction module, configured to input real sample texture data into the texture discrimination network for processing to obtain the prediction data type of the sample texture data; and a third parameter adjusting module, configured to adjust a network parameter of at least one of the texture discrimination network and the texture generation network based on the prediction data type of the texture data of the 3D image and the prediction data type of the sample texture data.
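A comparable sketch for the texture-level branch is given below; treating textures as flattened maps, the discriminator interface, and the optimizer are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def texture_discrimination_step(texture_g, tex_disc, td_opt,
                                real_textures, latent_dim=128):
    """Illustrative update of a texture discrimination network that separates
    real sample texture data from generated texture data."""
    z2 = torch.randn(real_textures.size(0), latent_dim)
    fake_textures = texture_g(z2)

    real_logits = tex_disc(real_textures)
    fake_logits = tex_disc(fake_textures.detach())
    loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
            + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    td_opt.zero_grad(); loss.backward(); td_opt.step()
    return loss.item()
```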
Fig. 5 shows a block diagram of a training apparatus for an image generation network according to an embodiment of the present disclosure. As shown in Fig. 5, the training apparatus for an image generation network according to an embodiment of the present disclosure includes:
an image data obtaining module 51, configured to perform image generation processing through the image generation network to obtain image data of a 3D image;
the second projection module 52 is configured to input image data of the 3D image to a projection rendering network for projection processing, so as to obtain a 2D image;
a fifth type prediction module 53, configured to input the 2D image into an image discrimination network for processing, so as to obtain a predicted image type of the 2D image, where the predicted image type is a real image or a generated image;
a fourth parameter adjusting module 54, configured to adjust a network parameter of at least one of the image generation network and the image discrimination network based on the predicted image type of the 2D image.
In some possible implementations, the image generation network includes a 3D shape generation network and a texture generation network, and the image data of the 3D image includes shape data of the 3D image and texture data of the 3D image; the image data obtaining module includes: a shape generation submodule, configured to input a first random vector into the 3D shape generation network to perform image shape generation processing to obtain the shape data of the 3D image; and a texture generation submodule, configured to input a second random vector into the texture generation network to perform image texture generation processing to obtain the texture data of the 3D image.
In some possible implementations, the second projection module includes: and the second random projection submodule is used for inputting the image data of the 3D image and the plurality of random attitude vectors into a projection rendering network for random projection processing to obtain a plurality of 2D images.
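To make the idea of random projection processing concrete, the toy function below rotates a colored point cloud by random yaw angles and splats it onto several 2D images; the orthographic splatting, the single-degree-of-freedom pose, and the tensor shapes are assumptions for this sketch and do not reproduce the projection rendering network of the present disclosure, which would in practice need to be differentiable.

```python
import math
import random
import torch

def random_projections(points, colors, num_views=4, image_size=64):
    """Toy stand-in for random projection processing: render a colored point
    cloud from several randomly sampled viewpoints (illustration only)."""
    images = []
    for _ in range(num_views):
        yaw = random.uniform(0.0, 2.0 * math.pi)   # stand-in for a random attitude vector
        c, s = math.cos(yaw), math.sin(yaw)
        rot = torch.tensor([[c, 0.0, s],
                            [0.0, 1.0, 0.0],
                            [-s, 0.0, c]])
        rotated = points @ rot.T                   # (N, 3) rotated point positions
        img = torch.zeros(image_size, image_size, 3)
        # map x, y from [-1, 1] to pixel coordinates and write the point colors
        xy = ((rotated[:, :2].clamp(-1.0, 1.0) + 1.0) / 2.0 * (image_size - 1)).long()
        img[xy[:, 1], xy[:, 0]] = colors
        images.append(img)
    return torch.stack(images)                     # (num_views, H, W, 3)

# hypothetical usage: 1024 points in [-1, 1]^3 with RGB colors in [0, 1]
views = random_projections(torch.rand(1024, 3) * 2 - 1, torch.rand(1024, 3))
```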
In some possible implementations, the apparatus further includes: the sixth type prediction module is used for inputting a real 2D sample image into the image discrimination network for processing to obtain a predicted image type of the 2D sample image; the fourth parameter adjusting module includes: an adjusting sub-module, configured to adjust a network parameter of at least one of the image discrimination network and the image generation network based on the predicted image type of the 2D sample image and the predicted image type of the 2D image.
In some possible implementations, the apparatus further includes: a seventh type prediction module, configured to input texture data in the image data of the 3D image into a texture discrimination network for processing to obtain a prediction data type of the texture data of the 3D image, where the prediction data type is real texture data or generated texture data; an eighth type prediction module, configured to input real sample texture data into the texture discrimination network for processing to obtain the prediction data type of the sample texture data; and a fifth parameter adjusting module, configured to adjust a network parameter of at least one of the texture discrimination network and a texture generation network of the image generation network based on the prediction data type of the texture data of the 3D image and the prediction data type of the sample texture data.
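Finally, to show how the modules of the training apparatus could be wired together, the loop below strings the two step functions sketched above (adversarial_step and texture_discrimination_step) into one joint training procedure; the data loaders, optimizers, and component networks are assumed placeholders, and this is not the training procedure claimed by the present disclosure.

```python
def train_image_generation_network(shape_g, texture_g, render, img_disc, tex_disc,
                                   g_opt, d_opt, td_opt,
                                   image_loader, texture_loader, epochs=10):
    """Illustrative joint training loop combining the image-level and
    texture-level adversarial updates sketched above (assumptions only)."""
    for epoch in range(epochs):
        for real_2d, real_tex in zip(image_loader, texture_loader):
            # 2D level: project generated 3D data and let the image
            # discrimination network judge real vs. generated images
            d_loss, g_loss = adversarial_step(shape_g, texture_g, render,
                                              img_disc, d_opt, g_opt, real_2d)
            # texture level: let the texture discrimination network judge
            # real sample texture data vs. generated texture data
            t_loss = texture_discrimination_step(texture_g, tex_disc, td_opt, real_tex)
        print(f"epoch {epoch}: d={d_loss:.3f} g={g_loss:.3f} tex={t_loss:.3f}")
```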
In some embodiments, the functions of, or the modules included in, the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments; for their specific implementation, reference may be made to the descriptions of the above method embodiments, which, for brevity, are not repeated here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 6 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
Referring to fig. 6, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 7 is a block diagram illustrating an electronic device 900 in accordance with an example embodiment. For example, the electronic device 900 may be provided as a server. Referring to fig. 7, electronic device 900 includes a processing component 922, which further includes one or more processors, and memory resources, represented by memory 932, for storing instructions, such as applications, that are executable by processing component 922. The application programs stored in memory 932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 922 is configured to execute instructions to perform the methods described above.
The electronic device 900 may also include a power component 926 configured to perform power management for the electronic device 900, a wired or wireless network interface 950 configured to connect the electronic device 900 to a network, and an input/output (I/O) interface 958. The electronic device 900 may operate based on an operating system stored in the memory 932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 932, is also provided that includes computer program instructions executable by the processing component 922 of the electronic device 900 to perform the above-described method.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, or semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable Compact Disc Read-Only Memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, an optical pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), can execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles of the embodiments, the practical application, or the improvement over technologies available in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (26)

1. An image processing method, characterized in that the method comprises:
acquiring a first random vector and a second random vector;
inputting the first random vector into a 3D shape generation network to perform image shape generation processing to obtain shape data of a 3D image, and inputting the second random vector into a texture generation network to perform image texture generation processing to obtain texture data of the 3D image;
generating the 3D image according to the shape data of the 3D image and the texture data of the 3D image;
the inputting the first random vector into a 3D shape generation network for processing to obtain shape data of a 3D image includes:
the 3D shape generation network obtains shape data of the 3D image by using prior data and the first random vector, wherein the prior data is obtained by counting pixel point positions in a plurality of existing 3D images.
2. The method of claim 1, wherein the 3D shape generation network and the texture generation network are obtained by joint training using 2D images as training samples.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
performing projection processing on the 3D image according to the shape data of the 3D image and the texture data of the 3D image to obtain a 2D image;
determining a predicted image type of the 2D image through an image discrimination network, wherein the predicted image type is a generated image or a real image;
and adjusting network parameters of at least one of the 3D shape generation network, the texture generation network, and the image discrimination network based on the predicted image type of the 2D image.
4. The method according to claim 3, wherein the performing projection processing on the 3D image according to the shape data of the 3D image and the texture data of the 3D image to obtain a 2D image comprises:
and inputting the shape data of the 3D image, the texture data of the 3D image, and a random attitude vector into a projection rendering network for random projection processing to obtain a 2D image.
5. The method of claim 4, wherein randomly projecting the 3D image to obtain a 2D image comprises: carrying out random projection processing on the 3D image by utilizing a plurality of random attitude vectors to obtain a plurality of 2D images;
the determining the predicted image type of the 2D image through the image discrimination network comprises: determining, through the image discrimination network, a predicted image type for each of the plurality of 2D images.
6. The method of claim 3, further comprising:
inputting a real 2D sample image into the image discrimination network for processing to obtain a predicted image type of the 2D sample image;
and adjusting the network parameters of the image discrimination network based on the predicted image type of the 2D sample image and the labeled image type of the 2D sample image.
7. The method of claim 3, further comprising:
inputting texture data of the 3D image into a texture discrimination network for processing to obtain a prediction data type of the texture data of the 3D image, wherein the prediction data type is real texture data or generated texture data;
inputting real sample texture data into the texture discrimination network for processing to obtain the prediction data type of the sample texture data;
adjusting a network parameter of at least one of the texture discrimination network and the texture generation network based on the prediction data type of the texture data of the 3D image and the prediction data type of the sample texture data.
8. A method for training an image generation network, comprising:
performing image generation processing through the image generation network to obtain image data of a 3D image;
inputting the image data of the 3D image into a projection rendering network for projection processing to obtain a 2D image;
inputting the 2D image into an image discrimination network for processing to obtain a predicted image type of the 2D image, wherein the predicted image type is a real image or a generated image;
and adjusting network parameters of at least one of the image generation network and the image discrimination network based on the predicted image type of the 2D image.
9. The method of claim 8, wherein the image generation network comprises a 3D shape generation network and a texture generation network, and wherein the image data of the 3D image comprises shape data of the 3D image and texture data of the 3D image;
the image generation processing through the image generation network to obtain image data of a 3D image includes:
inputting a first random vector into the 3D shape generation network to perform image shape generation processing to obtain the shape data of the 3D image;
and inputting a second random vector into the texture generation network to perform image texture generation processing to obtain texture data of the 3D image.
10. The method according to claim 8 or 9, wherein the inputting the image data of the 3D image into a projection rendering network for projection processing to obtain a 2D image comprises:
and inputting the image data of the 3D image and the plurality of random attitude vectors into a projection rendering network for random projection processing to obtain a plurality of 2D images.
11. The method of claim 8, further comprising:
inputting a real 2D sample image into the image discrimination network for processing to obtain a predicted image type of the 2D sample image;
the adjusting the network parameter of at least one of the image generation network and the image discrimination network based on the predicted image type of the 2D image includes:
adjusting a network parameter of at least one of the image discrimination network and the image generation network based on the predicted image type of the 2D sample image and the predicted image type of the 2D image.
12. The method of claim 8, further comprising:
inputting texture data in the image data of the 3D image into a texture discrimination network for processing to obtain a prediction data type of the texture data of the 3D image, wherein the prediction data type is real texture data or generated texture data;
inputting real sample texture data into the texture discrimination network for processing to obtain the prediction data type of the sample texture data;
adjusting a network parameter of at least one of the texture discrimination network and a texture generation network of the image generation network based on the prediction data type of the texture data of the 3D image and the prediction data type of the sample texture data.
13. An image processing apparatus, characterized in that the apparatus comprises:
the vector acquisition module is used for acquiring a first random vector and a second random vector;
the shape and texture generation module is used for inputting the first random vector into a 3D shape generation network to perform image shape generation processing to obtain shape data of a 3D image, and inputting the second random vector into a texture generation network to perform image texture generation processing to obtain texture data of the 3D image;
the 3D image generation module is used for generating the 3D image according to the shape data of the 3D image and the texture data of the 3D image;
the shape and texture generation module comprises:
and the shape data acquisition submodule is used for acquiring shape data of the 3D image by the 3D shape generation network by utilizing prior data and the first random vector, wherein the prior data is obtained by counting the positions of pixel points in a plurality of existing 3D images.
14. The apparatus of claim 13, wherein the 3D shape generation network and the texture generation network are obtained by joint training using 2D images as training samples.
15. The apparatus of claim 13 or 14, further comprising:
the first projection module is used for performing projection processing on the 3D image according to the shape data of the 3D image and the texture data of the 3D image to obtain a 2D image;
the first type prediction module is used for determining a predicted image type of the 2D image through an image discrimination network, wherein the predicted image type is a generated image or a real image;
a first parameter adjusting module, configured to adjust a network parameter of at least one of the 3D shape generation network, the texture generation network, and the image discrimination network based on the predicted image type of the 2D image.
16. The apparatus of claim 15, wherein the first projection module comprises:
and the first random projection submodule is used for inputting the shape data of the 3D image, the texture data of the 3D image, and a random attitude vector into a projection rendering network for random projection processing to obtain a 2D image.
17. The apparatus of claim 16, wherein the first random projection submodule is configured to: perform random projection processing on the 3D image using a plurality of random attitude vectors to obtain a plurality of 2D images;
and the first type prediction module is configured to: determine, through the image discrimination network, a predicted image type for each of the plurality of 2D images.
18. The apparatus of claim 15, further comprising:
the second type prediction module is used for inputting a real 2D sample image into the image discrimination network for processing to obtain a predicted image type of the 2D sample image;
and the second parameter adjusting module is used for adjusting the network parameters of the image discrimination network based on the predicted image type of the 2D sample image and the labeled image type of the 2D sample image.
19. The apparatus of claim 15, further comprising:
a third type prediction module, configured to input texture data of the 3D image to a texture discrimination network for processing, so as to obtain a prediction data type of the texture data of the 3D image, where the prediction data type is real texture data or generated texture data;
the fourth type prediction module is used for inputting real sample texture data into the texture discrimination network for processing to obtain the prediction data type of the sample texture data;
a third parameter adjusting module, configured to adjust a network parameter of at least one of the texture discrimination network and the texture generation network based on the prediction data type of the texture data of the 3D image and the prediction data type of the sample texture data.
20. An apparatus for training an image generation network, comprising:
the image data acquisition module is used for carrying out image generation processing through the image generation network to obtain image data of the 3D image;
the second projection module is used for inputting the image data of the 3D image into a projection rendering network for projection processing to obtain a 2D image;
a fifth type prediction module, configured to input the 2D image into an image discrimination network for processing, so as to obtain a predicted image type of the 2D image, where the predicted image type is a real image or a generated image;
and the fourth parameter adjusting module is used for adjusting the network parameters of at least one of the image generation network and the image discrimination network based on the predicted image type of the 2D image.
21. The apparatus of claim 20, wherein the image generation network comprises a 3D shape generation network and a texture generation network, and wherein the image data of the 3D image comprises shape data of the 3D image and texture data of the 3D image;
the image data obtaining module includes:
the shape generation submodule is used for inputting a first random vector into the 3D shape generation network to perform image shape generation processing to obtain shape data of the 3D image;
and the texture generation submodule is used for inputting a second random vector into the texture generation network to perform image texture generation processing so as to obtain texture data of the 3D image.
22. The apparatus of claim 20 or 21, wherein the second projection module comprises:
and the second random projection submodule is used for inputting the image data of the 3D image and the plurality of random attitude vectors into a projection rendering network for random projection processing to obtain a plurality of 2D images.
23. The apparatus of claim 20, further comprising:
the sixth type prediction module is used for inputting a real 2D sample image into the image discrimination network for processing to obtain a predicted image type of the 2D sample image;
the fourth parameter adjustment module comprises:
and the adjusting sub-module is used for adjusting the network parameters of at least one of the image discrimination network and the image generation network based on the predicted image type of the 2D sample image and the predicted image type of the 2D image.
24. The apparatus of claim 20, further comprising:
a seventh type prediction module, configured to input texture data in the image data of the 3D image into a texture discrimination network for processing to obtain a prediction data type of the texture data of the 3D image, where the prediction data type is real texture data or generated texture data;
the eighth type prediction module is used for inputting real sample texture data into the texture discrimination network for processing to obtain the prediction data type of the sample texture data;
a fifth parameter adjusting module, configured to adjust a network parameter of at least one of the texture discrimination network and a texture generation network of the image generation network based on the prediction data type of the texture data of the 3D image and the prediction data type of the sample texture data.
25. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to: performing the method of any one of claims 1 to 12.
26. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 12.
CN201910238417.4A 2019-03-27 2019-03-27 Image processing method and device and training method and device of image generation network Active CN109978759B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910238417.4A CN109978759B (en) 2019-03-27 2019-03-27 Image processing method and device and training method and device of image generation network

Publications (2)

Publication Number Publication Date
CN109978759A (en) 2019-07-05
CN109978759B (en) 2023-01-31

Family

ID=67080965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910238417.4A Active CN109978759B (en) 2019-03-27 2019-03-27 Image processing method and device and training method and device of image generation network

Country Status (1)

Country Link
CN (1) CN109978759B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111524216B (en) * 2020-04-10 2023-06-27 北京百度网讯科技有限公司 Method and device for generating three-dimensional face data
CN112819947A (en) * 2021-02-03 2021-05-18 Oppo广东移动通信有限公司 Three-dimensional face reconstruction method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784678A (en) * 2017-11-08 2018-03-09 北京奇虎科技有限公司 Generation method, device and the terminal of cartoon human face image
CN108510435A (en) * 2018-03-28 2018-09-07 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN109409335A (en) * 2018-11-30 2019-03-01 腾讯科技(深圳)有限公司 Image processing method, device, computer-readable medium and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6975750B2 (en) * 2000-12-01 2005-12-13 Microsoft Corp. System and method for face recognition using synthesized training images


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant