WO2020207177A1 - Image augmentation and neural network training method, apparatus, device, and storage medium - Google Patents

Image augmentation and neural network training method, apparatus, device, and storage medium (图像增广与神经网络训练方法、装置、设备及存储介质)

Info

Publication number
WO2020207177A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
dimensional
dimensional image
target object
training
Prior art date
Application number
PCT/CN2020/078650
Other languages
English (en)
French (fr)
Inventor
刘颖璐
申豪
石海林
梅涛
Original Assignee
北京京东尚科信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东尚科信息技术有限公司 and 北京京东世纪贸易有限公司
Publication of WO2020207177A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50: Information retrieval of still image data
    • G06F 16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583: Retrieval characterised by using metadata automatically derived from the content
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 7/00: Image analysis
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Definitions

  • This application relates to the field of computer vision technology, and in particular to an image augmentation and neural network training method, apparatus, device, and storage medium.
  • Training deep learning face recognition models requires a large amount of accurately annotated data, but the amount of manually annotated data is very limited. In existing face key point databases, accurate key point data for large-pose faces is also relatively scarce.
  • The main purpose of the embodiments of the present application is to provide an image augmentation method and apparatus, a neural network training method and apparatus, a computer device, and a storage medium, which can automatically and quickly obtain augmented images carrying the key point information of a specified target object.
  • In a first aspect, embodiments of the present application provide an image augmentation method, including: acquiring a three-dimensional image carrying set key point annotations of a target object, where the three-dimensional image is obtained by reconstruction from a two-dimensional image of the target object; obtaining the defective two-dimensional image corresponding to the projection of the three-dimensional image after it is rotated by a set angle, where the defective two-dimensional image includes the converted coordinates of the set key points of the target object corresponding to the set angle; performing feature extraction on the defective two-dimensional image based on a trained neural network, and repairing the defective two-dimensional image based on the correspondence between the converted coordinates of the set key points and the pose, to obtain a repaired image corresponding to the defective two-dimensional image; and obtaining an augmented image set of the target object based on the repaired image.
  • In a second aspect, embodiments of the present application provide a neural network training method, including: obtaining an augmented image set of a target object using the image augmentation method provided in any embodiment of the present application; forming a training sample set according to the two-dimensional image of the target object and the augmented image set; and inputting the training sample set into a neural network model for training until the neural network model converges, to obtain the trained neural network model.
  • In a third aspect, embodiments of the present application provide an image augmentation apparatus, including: an acquisition module configured to acquire a three-dimensional image carrying set key point annotations of a target object, where the three-dimensional image is obtained by reconstruction from a two-dimensional image of the target object; a projection module configured to obtain the defective two-dimensional image corresponding to the projection of the three-dimensional image after it is rotated by a set angle, where the defective two-dimensional image includes the converted coordinates of the set key points of the target object corresponding to the set angle; a first processing module configured to perform feature extraction on the defective two-dimensional image based on a trained neural network, and repair the defective two-dimensional image based on the correspondence between the converted coordinates of the set key points and the pose, to obtain a repaired image corresponding to the defective two-dimensional image; and a second processing module configured to obtain an augmented image set of the target object based on the repaired image.
  • In a fourth aspect, embodiments of the present application provide a neural network training apparatus, including: a sample generation module configured to obtain an augmented image set of a target object using the image augmentation method provided in any embodiment of the present application, and form a training sample set according to the two-dimensional image of the target object and the augmented image set; and a training module configured to input the training sample set into a neural network model for training until the neural network model converges, to obtain the trained neural network model.
  • In a fifth aspect, embodiments of the present application provide a computer device, including a processor and a memory configured to store a computer program that can run on the processor, where the processor is configured, when running the computer program, to implement the image augmentation method provided by any embodiment of the present application or the neural network training method provided by any embodiment of the present application.
  • In a sixth aspect, embodiments of the present application provide a computer storage medium in which a computer program is stored; when the computer program is executed by a processor, the image augmentation method provided by any embodiment of the present application or the neural network training method provided by any embodiment of the present application is implemented.
  • In the above embodiments of the present application, a three-dimensional image carrying the set key point annotations of the target object is acquired, where the three-dimensional image is obtained by reconstruction from the two-dimensional image of the target object; the defective two-dimensional image corresponding to the projection of the three-dimensional image after it is rotated by a set angle is obtained, where the defective two-dimensional image includes the converted coordinates of the set key points of the target object corresponding to the set angle. In this way, a large number of defective two-dimensional images can be acquired from a single original two-dimensional image of the target object. Feature extraction is performed on the defective two-dimensional images based on a trained neural network, and the defective two-dimensional images are repaired based on the correspondence between the converted coordinates of the set key points and the pose, to obtain repaired images corresponding to the defective two-dimensional images, so that an augmented image set containing the key point information of the target object can be obtained accurately and efficiently.
  • Obtaining the augmented image set of the target object based on the repaired images effectively solves the problems of scarce training data and inaccurate manual annotation for neural networks, can expand target object data at different angles, and improves the accuracy of manual annotation in large poses, thereby further improving the effect of neural network model training.
  • FIG. 1 is a schematic diagram of the currently known process for obtaining the coordinates of 106 key points;
  • FIG. 2 is a schematic diagram of the currently known process for generating the coordinates of 106 key points based on a three-dimensional model;
  • FIG. 3 is a schematic flowchart of an image augmentation method provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of 106 key points on a human face provided by an embodiment of the application.
  • FIG. 5 is an example sample diagram of a three-dimensional standard model provided by an embodiment of the application.
  • FIG. 6 is a schematic flowchart of a neural network training method provided by an embodiment of this application.
  • FIG. 7 is a schematic structural diagram of an image augmentation device provided by an embodiment of the application.
  • FIG. 8 is a schematic structural diagram of a neural network training device provided by an embodiment of the application.
  • FIG. 9 is a schematic structural diagram of a computer device provided by an embodiment of this application.
  • FIG. 10 is a schematic flowchart of an image augmentation method provided by another embodiment of this application.
  • The target object refers to the object contained in the image to be recognized during neural network training; herein, it refers to a human face.
  • A defective two-dimensional image refers to the two-dimensional image obtained by projection after the three-dimensional image is rotated. Since it is obtained by rotating and then projecting the three-dimensional image, texture holes may appear in the image, which is why it is called a defective two-dimensional image.
  • The three-dimensional standard model refers to a stereoscopic model built from multiple pixels at multiple specific positions of the target object. Taking a human face as the target object, it refers to a model composed of all the pixels representing specific positions of the face; for example, if a three-dimensional standard model has 30,000 pixels, those 30,000 pixels are arranged in order and each pixel can represent a specific position on the face, such as the eyes, mouth, or nose.
  • The loss function, also called the cost function, is the objective function of neural network optimization.
  • A neural network is a complex network system formed by a large number of simple processing units (called neurons) that are widely interconnected. It reflects many basic features of human brain function and is a highly complex nonlinear dynamic learning system.
  • FIG. 1 shows a currently known method for generating face key points: a three-dimensional face model is built, rotated to a certain angle, sampled at 68 key points, and projected, producing two-dimensional images containing the coordinates of the 68 key points at different angles; interpolation is then used to supplement these key points to obtain 106 key points. This scheme loses key point accuracy to a certain extent when the two-dimensional image is projected from the three-dimensional face model, and supplementing the key points to 106 by interpolation causes a second loss of accuracy.
  • FIG. 2 shows another known method for generating face key points, which removes one dimension conversion and the key point accuracy loss it causes: a three-dimensional face model is built, rotated to a certain angle, sampled at 106 key points, and projected, producing two-dimensional images containing the coordinates of the 106 key points at different angles. However, the images generated by this method contain defects, which likewise cause a certain loss of key point accuracy.
  • Referring to FIG. 3, an embodiment of the present application provides an image augmentation method, which includes the following steps:
  • Step 101: Acquire a three-dimensional image carrying set key point annotations of a target object, where the three-dimensional image is obtained by reconstruction from a two-dimensional image of the target object.
  • The three-dimensional image being obtained by reconstruction from the two-dimensional image of the target object means that, for an input two-dimensional image containing the target object, the combination parameters of a three-dimensional standard model (specifically, a shape-expression model and a texture model) are adjusted to obtain the three-dimensional image of the target object with the highest similarity to the input two-dimensional image. Here, the target object refers to a human face and the three-dimensional image can be a face model; if the target object is another kind of object, the corresponding three-dimensional image can likewise be a model of that object.
  • A three-dimensional image carrying set key point annotations of the target object means that the three-dimensional image includes the vertices corresponding to the face key points. Here, the key points can be the 106 face key points. Each of the 106 face key points indicates a specific position on the face, mainly covering the eyebrows, eyes, mouth, nose, and facial contour; compared with the 68-point scheme, they outline the upper and lower edges of the eyebrows, the contour information, and the nose wings more completely, and can therefore describe the contours of the face and its features more fully.
  • Step 102: Obtain the defective two-dimensional image corresponding to the projection of the three-dimensional image after it is rotated by a set angle, where the defective two-dimensional image includes the converted coordinates of the set key points of the target object corresponding to the set angle.
  • Obtaining the defective two-dimensional image corresponding to the projection of the three-dimensional image after it is rotated by a set angle refers to rotating the acquired three-dimensional image in three-dimensional space. For example, the three-dimensional image is placed at the origin of the three-dimensional coordinate system, projected from the front, and rotated about the coordinate axes X, Y, and Z respectively, so as to obtain the defective two-dimensional images corresponding to the projections at different set angles.
  • The converted coordinates of the set key points of the target object corresponding to the set angle refer to establishing the correspondence between the defective two-dimensional images and the corresponding three-dimensional image at different angles. For example, if the coordinates of a key point A at the initial position are A(x1, y1, z1), and rotating by an angle α gives the three-dimensional coordinates of key point A as A'(x2, y2, z2), the converted coordinates corresponding to the angle α are thereby determined, as sketched below.
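  • The following is a minimal sketch of this coordinate conversion, assuming rotation about the Y axis and a simple orthographic projection; the projection type, the key point values, and the function names are illustrative assumptions, not specified by the source:

```python
import numpy as np

def rotate_y(points, alpha):
    """Rotate Nx3 key point coordinates about the Y axis by alpha (radians)."""
    c, s = np.cos(alpha), np.sin(alpha)
    rotation = np.array([[c, 0.0, s],
                         [0.0, 1.0, 0.0],
                         [-s, 0.0, c]])
    return points @ rotation.T

# Key point A(x1, y1, z1) rotated by an angle alpha yields A'(x2, y2, z2);
# dropping the depth axis gives its converted 2D coordinates after projection.
A = np.array([[10.0, 5.0, 2.0]])            # hypothetical key point A
A_rotated = rotate_y(A, np.deg2rad(30.0))   # A' after rotating by alpha = 30 degrees
converted_2d = A_rotated[:, :2]
```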
  • Step 103: Perform feature extraction on the defective two-dimensional image based on the trained neural network, and repair the defective two-dimensional image based on the correspondence between the converted coordinates of the set key points and the pose, to obtain a repaired image corresponding to the defective two-dimensional image.
  • Performing feature extraction on the defective two-dimensional image based on the trained neural network refers to extracting features from the defective two-dimensional images and obtaining the converted coordinates of the set key points corresponding to the pose angle of each defective two-dimensional image.
  • The defective two-dimensional image refers to a two-dimensional image with texture holes caused by projection after rotation of the three-dimensional image; the repaired image corresponding to the defective two-dimensional image refers to the image obtained by repairing the key point coordinates at the texture holes based on the trained neural network.
  • Step 104: Obtain an augmented image set of the target object based on the repaired images.
  • Obtaining the augmented image set of the target object based on the repaired images refers to the image set composed of the repaired images obtained by repairing each of the defective two-dimensional images, that is, the augmented image set.
  • In the above embodiments of the present application, a three-dimensional image carrying the set key point annotations of the target object is acquired, where the three-dimensional image is obtained by reconstruction from the two-dimensional image of the target object; the defective two-dimensional image corresponding to the projection of the three-dimensional image after it is rotated by a set angle is obtained, where the defective two-dimensional image includes the converted coordinates of the set key points of the target object corresponding to the set angle. In this way, a large number of defective two-dimensional images can be acquired from a single original two-dimensional image of the target object. Feature extraction is performed on the defective two-dimensional images based on the trained neural network, and the defective two-dimensional images are repaired based on the correspondence between the converted coordinates of the set key points and the pose, to obtain repaired images corresponding to the defective two-dimensional images, so that an augmented image set containing the key point information of the target object can be obtained accurately and efficiently.
  • Obtaining the augmented image set of the target object based on the repaired images effectively solves the problems of scarce training data and inaccurate manual annotation for neural networks, can expand target object data at different angles, and improves the accuracy of manual annotation in large poses, thereby further improving the effect of neural network model training.
  • In one embodiment, acquiring the three-dimensional image carrying set key point annotations of the target object includes: obtaining a two-dimensional image of the target object, and determining, based on the two-dimensional image and the key point mapping relationship included in a three-dimensional standard model, the three-dimensional image corresponding to the two-dimensional image and the three-dimensional coordinates of the set key points.
  • A two-dimensional image refers to an original picture containing the target object, photographed or drawn, that is used to reconstruct and acquire the three-dimensional image. The target object here generally refers to a human face, but may also be another object.
  • The three-dimensional standard model refers to a model composed of multiple pixels representing specific positions on the face. For example, referring to FIG. 5, if a three-dimensional standard model has 30,000 pixels, those 30,000 pixels are arranged in order and each pixel can represent a specific position on the face, such as the eyes, mouth, or nose. Here, the key points can be the 106 face key points, and the three-dimensional standard model can be a three-dimensional face model constructed with the three-dimensional morphable face model (3DMM).
  • The three-dimensional morphable face model (3DMM) is built on the basis of a three-dimensional face database, constrained by statistics of face shape and face texture, while also taking face pose and illumination into account; it can generate high-precision three-dimensional face models. A 3DMM represents a face as a linear combination of the face data objects in the model database: assuming the deformable face model is composed of m face models, each containing corresponding shape and texture vectors S_i and T_i, a new 3D face model can be expressed as in formulas (1) and (2):

$S_{\text{new}} = \bar{S} + \sum_{i=1}^{m} \alpha_i S_i + \sum_{i=1}^{m} \beta_i e_i$  (1)

$T_{\text{new}} = \bar{T} + \sum_{i=1}^{m} \gamma_i T_i$  (2)

  • Here $\bar{S}$ denotes the average face shape model, S_i and e_i denote the principal component analysis (PCA) parts of shape and expression respectively, and α_i and β_i denote the corresponding shape and expression coefficients; the texture model in formula (2) is analogous, with $\bar{T}$ the average texture and γ_i the texture coefficients. In this way, a new face model can be linearly combined from existing face models; that is, by changing the coefficients, a three-dimensional image can be generated on the basis of the existing standard face model, as illustrated in the sketch below.
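  • As an illustration of this linear combination, the sketch below builds a new shape from formula (1) with randomly chosen PCA bases; the dimensions, coefficient values, and names are hypothetical:

```python
import numpy as np

n_vertices, m = 30000, 50                        # hypothetical model size / basis count
mean_shape = np.zeros((n_vertices, 3))           # average face shape, S-bar in (1)
shape_basis = np.random.randn(m, n_vertices, 3)  # S_i, shape PCA components
expr_basis = np.random.randn(m, n_vertices, 3)   # e_i, expression PCA components

def new_face(alpha, beta):
    """New 3D face shape as the linear combination of formula (1)."""
    return (mean_shape
            + np.tensordot(alpha, shape_basis, axes=1)
            + np.tensordot(beta, expr_basis, axes=1))

# Changing the coefficients generates different 3D faces from the standard model.
shape = new_face(0.1 * np.random.randn(m), 0.1 * np.random.randn(m))
```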
  • Here, the three-dimensional standard model is preset with the coordinates corresponding to the key points and the corresponding indexes. Determining based on the two-dimensional image and the key point mapping relationship included in the three-dimensional standard model means determining, from the key points included in the three-dimensional standard model and the corresponding pixels in the two-dimensional image, the index value of each key point of the three-dimensional image in the three-dimensional standard model, and further determining the corresponding three-dimensional coordinates of the key points contained in the three-dimensional image corresponding to the input two-dimensional image.
  • In the above embodiment, the three-dimensional image corresponding to the two-dimensional image and the three-dimensional coordinates of the set key points are obtained based on the two-dimensional image and the three-dimensional standard model. This ensures that the two-dimensional images obtained by rotating and projecting the three-dimensional image contain the converted coordinates of the set key points of the target object corresponding to the set angles.
  • In one embodiment, obtaining the defective two-dimensional image corresponding to the projection of the three-dimensional image after it is rotated by a set angle includes: determining, based on the converted coordinates of the set key points of the target object corresponding to the set angle, the pose of the target object after the three-dimensional image is rotated by the set angle, which refers to determining the corresponding coordinate offset values based on the rotation angle of the three-dimensional image.
  • For example, if the initial position of a key point A on the three-dimensional image is A(x1, y1, z1), and the three-dimensional coordinates of key point A obtained by rotating by an angle α are A'(x2, y2, z2), the pose corresponding to these converted coordinates is determined as the first pose.
  • The projection matrix refers to the matrix that transforms coordinates in the three-dimensional coordinate system into two-dimensional coordinates. Here, by determining the projection matrix corresponding to the first pose, the corresponding defective two-dimensional image and the converted coordinates of the set key points of the target object corresponding to the set angle can be determined, as in the sketch below.
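  • A minimal sketch of applying such a projection matrix follows, assuming a weak-perspective camera built from a scale, the first two rows of the pose rotation, and a 2D translation; these parameters are illustrative assumptions:

```python
import numpy as np

def project(points_3d, scale, rotation, translation):
    """Map Nx3 coordinates to Nx2 image coordinates using the 2x3 projection
    matrix built from the pose: P = scale * rotation[:2, :]."""
    proj = scale * rotation[:2, :]
    return points_3d @ proj.T + translation

keypoints_3d = np.random.randn(106, 3)   # 106 set key points, hypothetical values
pose_rotation = np.eye(3)                # identity rotation = frontal first pose
keypoints_2d = project(keypoints_3d, 1.0, pose_rotation, np.array([64.0, 64.0]))
```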
  • In one embodiment, before acquiring the three-dimensional image carrying set key point annotations of the target object, the method further includes: acquiring an original two-dimensional image of the target object as the image to be augmented, and processing the image to be augmented to obtain a processed two-dimensional image of the target object, where the processing includes scaling and/or normalization. For example, the image to be augmented is scaled to a fixed size (for example, 128*128) and then normalized (for example, by subtracting the mean or dividing by the variance), which mitigates the influence of external factors on the image, such as illumination, noise, and rotation. A minimal preprocessing sketch follows.
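  • In the sketch below, Pillow and the per-image mean/variance normalization scheme are assumptions for illustration, not prescribed by the source:

```python
import numpy as np
from PIL import Image

def preprocess(path, size=128):
    """Scale the image to be augmented to a fixed size (e.g. 128*128) and
    normalize it by subtracting the mean and dividing by the standard deviation."""
    image = Image.open(path).convert("RGB").resize((size, size))
    pixels = np.asarray(image, dtype=np.float32)
    return (pixels - pixels.mean()) / (pixels.std() + 1e-8)  # epsilon avoids /0
```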
  • In one embodiment, the neural network is a generative adversarial network, and the generative adversarial network includes a generation network and an adversarial network. Performing feature extraction on the defective two-dimensional image based on the trained neural network and repairing the defective two-dimensional image based on the correspondence between the converted coordinates of the set key points and the pose, to obtain a repaired image corresponding to the defective two-dimensional image, includes: inputting the defective two-dimensional image into the trained generative adversarial network, and obtaining the generated two-dimensional repaired image through the generation network based on the correspondence between the converted coordinates of the set key points and the pose.
  • A generative adversarial network is a deep learning model that produces its output through mutual game learning between (at least) two modules in the framework: a generative model and a discriminative model. Here, the generation network corresponds to the generative model in the generative adversarial network, and the adversarial network corresponds to the discriminative model in the generative adversarial network.
  • The generated two-dimensional repaired image and the two-dimensional image are input into the adversarial network, and a discrimination result for the two is determined; that is, the adversarial network determines whether the key points corresponding to the area to be filled in the two-dimensional repaired image and the key points of the two-dimensional image fall within a set range, i.e., are basically the same. If so, the generated two-dimensional repaired image is determined to be the repaired image corresponding to the defective two-dimensional image.
  • In this way, the two-dimensional repaired image corresponding to the defective two-dimensional image is generated by the generation network and judged by the adversarial network, so as to obtain the repaired image corresponding to the defective two-dimensional image, as sketched below.
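  • The repair step can be pictured as follows; the stand-in convolutional modules are placeholders, since the actual generator and discriminator architectures are not specified here:

```python
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
discriminator = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1),
                              nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Sigmoid())

defective = torch.rand(1, 3, 128, 128)  # defective 2D image with texture holes
repaired = generator(defective)         # generation network fills the holes
realism = discriminator(repaired)       # adversarial network judges the result
```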
  • In one embodiment, before acquiring the three-dimensional image carrying set key point annotations of the target object, the method includes: obtaining, by reconstruction from a two-dimensional image containing the target object (for example, a two-dimensional face image), a corresponding three-dimensional training image carrying the set key point annotations of the target object.
  • Obtaining a two-dimensional training image set based on multiple two-dimensional training images respectively projected after the three-dimensional training image is rotated by different set angles refers to rotating and projecting the three-dimensional training image according to the different set angles. For example, the three-dimensional training image is placed at the origin of the three-dimensional coordinate system, projected from its front, and rotated on the three-dimensional coordinates. In this way, the three-dimensional training image reconstructed from one two-dimensional image is rotated by different set angles and projected to obtain a training set of multiple two-dimensional training images, which greatly reduces the data preparation time before training: a two-dimensional image training set can be obtained automatically from a single two-dimensional image.
  • In one embodiment, before acquiring the three-dimensional image carrying set key point annotations of the target object, the method further includes: performing separate, alternating iterations on the generative adversarial network until the set loss function satisfies the convergence condition, to obtain the trained generative adversarial network.
  • The loss function, also called the cost function, is the objective function of neural network optimization. The process of neural network training or optimization is the process of minimizing the loss function: the smaller the loss function value, the closer the corresponding prediction result is to the ground truth. In this application, the loss function can include an adversarial loss function and a reconstruction loss function.
  • Performing separate, alternating iterations on the generative adversarial network until the set loss function satisfies the convergence condition refers to updating the generation network parameters with the two-dimensional training images obtained by re-sampling, re-acquiring the generated training repaired images, and then inputting the two-dimensional training images and the generated training repaired images into the adversarial network, until the set loss function satisfies the convergence condition, to obtain the trained generation network and the trained adversarial network.
  • Specifically, the neural network backward propagation algorithm is used to iteratively update the parameter values of the generation network and the adversarial network: the adversarial network parameters are updated first, and then the generation network parameters are updated with the training patches obtained by re-sampling, until the set loss function satisfies the convergence condition and the trained generative adversarial network is obtained. In this way, the trained generation network and the trained adversarial network are obtained through alternating iterative training, and the trained generative adversarial network is used to repair the texture holes caused by rotation, reducing human error and high labor costs. A minimal sketch of this alternating update follows.
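  • The sketch below uses stand-in modules and random tensors in place of the real training images, and generic GAN losses rather than necessarily the patent's exact formulas:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
D = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(100):                     # in practice: until the loss converges
    defective = torch.rand(4, 3, 128, 128)  # re-sampled defective training images
    complete = torch.rand(4, 3, 128, 128)   # corresponding complete images
    # 1) update the adversarial (discrimination) network first
    fake = G(defective).detach()
    loss_d = bce(D(complete), torch.ones(4, 1)) + bce(D(fake), torch.zeros(4, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # 2) then update the generation network (adversarial + reconstruction terms)
    fake = G(defective)
    loss_g = bce(D(fake), torch.ones(4, 1)) + (fake - complete).abs().mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```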
  • In one embodiment, the step of performing separate, alternating iterations on the generative adversarial network based on the discrimination result until the set loss function satisfies the convergence condition further includes: obtaining the loss function corresponding to the generative adversarial network. The loss function includes two parts, the adversarial loss function and the reconstruction loss function; see formulas (3) and (4) respectively.
  • Here, L_adv is the adversarial loss function, L_rec is the reconstruction loss function, G is the generation network, D is the discrimination network, x is the input sample (that is, the image with texture holes after rotation), and G(x) is the image generated from the input image x, that is, the image after texture repair.
  • The purpose of the adversarial loss is to make the generated image more real and natural. The weight coefficient w is introduced in the reconstruction loss function so that the repaired image, outside the filled area, stays as consistent as possible with the original image, thereby ensuring the correctness of the key point coordinates.
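  • Formulas (3) and (4) themselves are not reproduced in this text. A standard form consistent with the symbol definitions above, given here as an assumption rather than as the patent's exact expressions (with y denoting the ground-truth image and ⊙ element-wise multiplication), is:

$L_{adv} = \mathbb{E}_{y}\left[\log D(y)\right] + \mathbb{E}_{x}\left[\log\left(1 - D(G(x))\right)\right]$  (3)

$L_{rec} = \left\lVert w \odot \left(G(x) - y\right) \right\rVert_{1}$  (4)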
  • An embodiment of the present application further provides a neural network training method, including:
  • Step 201: Obtain an augmented image set of the target object, and form a training sample set according to the two-dimensional image of the target object and the augmented image set; here, the augmented image set of the target object can be obtained using the image augmentation method provided in any embodiment of the present application.
  • Step 202: Input the training sample set into a neural network model for training until the neural network model converges, to obtain the trained neural network model.
  • In this way, the corresponding augmented image set is used as training samples for the neural network, ensuring that more effective training samples can be obtained more quickly to train the neural network and improve the trained model. Since the augmented images include accurate locations of the set key points, a neural network model trained on the augmented image set can be used in application scenarios such as expression recognition, animation synthesis, live streaming, beautification, and special-effects cameras. A minimal training sketch follows.
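  • In the sketch below, the key point regression model, the data tensors, and the convergence test are placeholders, not taken from the source:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 128 * 128, 212))  # 106 (x, y) pairs
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.rand(8, 3, 128, 128)  # two-dimensional image plus augmented image set
targets = torch.rand(8, 212)         # annotated 106-point coordinates

previous = float("inf")
for epoch in range(1000):            # train until the model converges
    loss = nn.functional.mse_loss(model(images), targets)
    opt.zero_grad(); loss.backward(); opt.step()
    if abs(previous - loss.item()) < 1e-6:  # crude convergence criterion
        break
    previous = loss.item()
```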
  • An image augmentation apparatus is also provided; the apparatus includes:
  • the acquisition module 31, configured to acquire a three-dimensional image carrying set key point annotations of a target object, where the three-dimensional image is obtained by reconstruction from a two-dimensional image of the target object;
  • the projection module 32, configured to obtain the defective two-dimensional image corresponding to the projection of the three-dimensional image after it is rotated by a set angle, where the defective two-dimensional image includes the converted coordinates of the set key points of the target object corresponding to the set angle;
  • the first processing module 33, configured to perform feature extraction on the defective two-dimensional image based on the trained neural network, and repair the defective two-dimensional image based on the correspondence between the converted coordinates of the set key points and the pose, to obtain a repaired image corresponding to the defective two-dimensional image;
  • the second processing module 34 is configured to obtain an augmented image set of the target object based on the repaired image.
  • With the above apparatus, a three-dimensional image carrying the set key point annotations of the target object is acquired, where the three-dimensional image is obtained by reconstruction from the two-dimensional image of the target object; the defective two-dimensional image corresponding to the projection of the three-dimensional image after it is rotated by a set angle is obtained, where the defective two-dimensional image includes the converted coordinates of the set key points of the target object corresponding to the set angle. In this way, a large number of defective two-dimensional images can be obtained automatically and quickly from one two-dimensional image. Feature extraction is performed on the defective two-dimensional images based on the trained neural network, the defective two-dimensional images are repaired based on the correspondence between the converted coordinates of the set key points and the pose to obtain the corresponding repaired images, and an augmented image set of the target object is obtained based on the repaired images. This effectively solves the problems of scarce training data and inaccurate manual annotation for neural networks, can expand face data at different angles, and improves the accuracy of manual annotation in large poses, thereby further improving the effect of neural network model training.
  • In one embodiment, the acquisition module 31 is further configured to acquire a two-dimensional image of the target object, and determine, based on the two-dimensional image and the key point mapping relationship included in the three-dimensional standard model, the three-dimensional image corresponding to the two-dimensional image and the three-dimensional coordinates of the set key points.
  • In one embodiment, the projection module 32 is further configured to determine, based on the converted coordinates of the set key points of the target object corresponding to the set angle, the pose of the target object after the three-dimensional image is rotated by the set angle; determine the corresponding projection matrix based on the pose; and determine the defective two-dimensional image corresponding to the pose based on the projection matrix.
  • In one embodiment, the acquisition module 31 is further configured to acquire the original two-dimensional image of the target object as the image to be augmented, and process the image to be augmented to obtain the processed two-dimensional image of the target object, where the processing includes scaling and/or normalization.
  • In one embodiment, the first processing module 33 is further configured to input the defective two-dimensional image into the trained generative adversarial network, obtain the generated two-dimensional repaired image through the generation network based on the correspondence between the converted coordinates of the set key points and the pose, input the generated two-dimensional repaired image and the two-dimensional image into the adversarial network, determine the discrimination result of the generated two-dimensional repaired image and the two-dimensional image, and determine the repaired image corresponding to the defective two-dimensional image based on the discrimination result.
  • In one embodiment, the acquisition module 31 is further configured to obtain a three-dimensional training image based on reconstruction from a two-dimensional image containing the target object, the three-dimensional training image carrying the set key point annotations of the target object, and to obtain a two-dimensional training image set based on multiple two-dimensional training images respectively projected after the three-dimensional training image is rotated by different set angles.
  • In one embodiment, the first processing module 33 is further configured to input the two-dimensional training images into the initial generative adversarial network, obtain the corresponding generated training repaired images through the generation network based on the correspondence between the converted coordinates of the set key points and the pose, input the two-dimensional images and the generated training repaired images into the adversarial network, and determine the discrimination result of the two-dimensional images and the generated training repaired images; and, based on the discrimination result, perform separate alternating iterations on the generative adversarial network until the set loss function satisfies the convergence condition, to obtain the trained generative adversarial network.
  • In one embodiment, the first processing module 33 is further configured to obtain the loss function corresponding to the generative adversarial network according to the combination of the adversarial loss function and the reconstruction loss function.
  • A neural network training apparatus is also provided; the apparatus includes:
  • the sample generation module 41, configured to obtain an augmented image set of a target object using the image augmentation method provided in any embodiment of the present application, and form a training sample set according to the two-dimensional image of the target object and the augmented image set;
  • the training module 42, configured to input the training sample set into a neural network model for training until the neural network model converges, to obtain the trained neural network model.
  • An embodiment of the present application also provides a computer device, including at least one processor 210 and a memory 211 configured to store a computer program that can run on the processor 210. The single processor 210 illustrated in FIG. 9 is not meant to indicate that the number of processors is one, but only to indicate the positional relationship of the processor relative to other components; in practical applications, the number of processors can be one or more. The memory 211 illustrated in FIG. 9 is treated the same way: it only indicates the positional relationship of the memory relative to other components, and in practical applications the number of memories can be one or more.
  • When the processor 210 runs the computer program, the following steps are executed: acquiring a three-dimensional image carrying the set key point annotations of the target object, where the three-dimensional image is obtained by reconstruction from the two-dimensional image of the target object; obtaining the defective two-dimensional image corresponding to the projection of the three-dimensional image after it is rotated by a set angle, where the defective two-dimensional image includes the converted coordinates of the set key points of the target object corresponding to the set angle; performing feature extraction on the defective two-dimensional image based on the trained neural network, and repairing the defective two-dimensional image based on the correspondence between the converted coordinates of the set key points and the pose, to obtain a repaired image corresponding to the defective two-dimensional image; and obtaining an augmented image set of the target object based on the repaired image.
  • processor 210 is further configured to execute the following steps when running the computer program:
  • Obtaining a two-dimensional image of the target object, and determining, based on the two-dimensional image and the key point mapping relationship included in the three-dimensional standard model, the three-dimensional image corresponding to the two-dimensional image and the three-dimensional coordinates of the set key points.
  • The processor 210 is further configured to execute the following steps when running the computer program: determining, based on the converted coordinates of the set key points of the target object corresponding to the set angle, the pose of the target object after the three-dimensional image is rotated by the set angle; determining the corresponding projection matrix based on the pose; and determining the defective two-dimensional image corresponding to the pose based on the projection matrix.
  • The processor 210 is further configured to execute the following steps when running the computer program: acquiring the original two-dimensional image of the target object as the image to be augmented, and processing the image to be augmented to obtain the processed two-dimensional image of the target object, where the processing includes scaling and/or normalization.
  • processor 210 is further configured to execute the following steps when running the computer program:
  • The defective two-dimensional image is input into the trained generative adversarial network, and the generated two-dimensional repaired image is obtained through the generation network based on the correspondence between the converted coordinates of the set key points and the pose; the generated two-dimensional repaired image and the two-dimensional image are input into the adversarial network, the discrimination result of the generated two-dimensional repaired image and the two-dimensional image is determined, and the repaired image corresponding to the defective two-dimensional image is determined based on the discrimination result.
  • The processor 210 is further configured to execute the following steps when running the computer program: obtaining a three-dimensional training image based on reconstruction from a two-dimensional image containing the target object, the three-dimensional training image carrying the set key point annotations of the target object; and obtaining a two-dimensional training image set based on multiple two-dimensional training images respectively projected after the three-dimensional training image is rotated by different set angles.
  • processor 210 is further configured to execute the following steps when running the computer program:
  • The two-dimensional training images are input into the initial generative adversarial network, and the corresponding generated training repaired images are obtained through the generation network based on the correspondence between the converted coordinates of the set key points and the pose; the two-dimensional training images and the generated training repaired images are input into the adversarial network, and the discrimination result of the two-dimensional training images and the generated training repaired images is determined; based on the discrimination result, separate alternating iterations are performed on the generative adversarial network until the set loss function satisfies the convergence condition, and the trained generative adversarial network is obtained.
  • processor 210 is further configured to execute the following steps when running the computer program:
  • The loss function corresponding to the generative adversarial network is obtained according to the combination of the adversarial loss function and the reconstruction loss function.
  • processor 210 is further configured to execute the following steps when running the computer program:
  • the training sample set is input to the neural network model for training until the neural network model converges, and the trained neural network model is obtained.
  • the computer device may further include: at least one network interface 212.
  • the various components in the sending end are coupled together through the bus system 213.
  • the bus system 213 is configured to implement connection and communication between these components.
  • the bus system 213 also includes a power bus, a control bus, and a status signal bus.
  • However, for the sake of clarity, the various buses are all marked as the bus system 213 in FIG. 9.
  • the memory 211 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory.
  • The non-volatile memory can be a read-only memory (ROM, Read Only Memory), a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), an electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), a magnetic random access memory (FRAM, ferromagnetic random access memory), a flash memory (Flash Memory), a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory can be a magnetic disk memory or a magnetic tape memory.
  • the volatile memory may be random access memory (RAM, Random Access Memory), which is used as an external cache.
  • By way of exemplary but non-limiting illustration, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM, Direct Rambus Random Access Memory).
  • the memory 211 described in the embodiment of the present application is intended to include, but is not limited to, these and any other suitable types of memory.
  • the memory 211 in the embodiment of the present application is used to store various types of data to support the operation of the sender.
  • Examples of such data include: any computer programs used to operate on the sending end, such as operating systems and applications.
  • the operating system contains various system programs, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks.
  • the application program can include various application programs for realizing various application services.
  • the program that implements the method of the embodiment of the present application may be included in the application program.
  • This embodiment also provides a computer storage medium, for example, including a memory 211 storing a computer program.
  • the computer program can be executed by the processor 210 in the sending end to complete the steps of the foregoing method.
  • The computer storage medium can be an FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM; it can also be any of various devices including one of, or any combination of, the above memories, such as smart phones, tablet computers, or laptops.
  • When the processor 210 runs the computer program, the following steps are executed: acquiring a three-dimensional image carrying the set key point annotations of the target object, where the three-dimensional image is obtained by reconstruction from the two-dimensional image of the target object; obtaining the defective two-dimensional image corresponding to the projection of the three-dimensional image after it is rotated by a set angle, where the defective two-dimensional image includes the converted coordinates of the set key points of the target object corresponding to the set angle; performing feature extraction on the defective two-dimensional image based on the trained neural network, and repairing the defective two-dimensional image based on the correspondence between the converted coordinates of the set key points and the pose, to obtain a repaired image corresponding to the defective two-dimensional image; and obtaining an augmented image set of the target object based on the repaired image.
  • Obtaining a two-dimensional image of the target object, and determining, based on the two-dimensional image and the key point mapping relationship included in the three-dimensional standard model, the three-dimensional image corresponding to the two-dimensional image and the three-dimensional coordinates of the set key points.
  • The defective two-dimensional image is input into the trained generative adversarial network, and the generated two-dimensional repaired image is obtained through the generation network based on the correspondence between the converted coordinates of the set key points and the pose; the generated two-dimensional repaired image and the two-dimensional image are input into the adversarial network, the discrimination result of the generated two-dimensional repaired image and the two-dimensional image is determined, and the repaired image corresponding to the defective two-dimensional image is determined based on the discrimination result.
  • The two-dimensional training images are input into the initial generative adversarial network, and the corresponding generated training repaired images are obtained through the generation network based on the correspondence between the converted coordinates of the set key points and the pose; the two-dimensional training images and the generated training repaired images are input into the adversarial network, and the discrimination result of the two-dimensional training images and the generated training repaired images is determined; based on the discrimination result, separate alternating iterations are performed on the generative adversarial network until the set loss function satisfies the convergence condition, and the trained generative adversarial network is obtained.
  • The loss function corresponding to the generative adversarial network is obtained according to the combination of the adversarial loss function and the reconstruction loss function.
  • the training sample set is input to the neural network model for training until the neural network model converges, and the trained neural network model is obtained.
  • In another embodiment, referring to FIG. 10, the image augmentation method includes the following steps:
  • Acquiring a two-dimensional image refers to acquiring a two-dimensional image containing a face image.
  • Generating a three-dimensional image refers to determining, based on the two-dimensional image and the key point mapping relationship included in the three-dimensional standard model, the three-dimensional image corresponding to the two-dimensional image and the three-dimensional coordinates of the set key points.
  • Key point sampling refers to acquiring the three-dimensional coordinates of the set key points based on the three-dimensional image. After key point sampling, step S14 and step S16 are executed respectively.
  • Projecting the three-dimensional image refers to obtaining the corresponding two-dimensional image based on the three-dimensional image.
  • Acquiring the 106 key point coordinates of the two-dimensional image refers to determining the 106 key point coordinates of the corresponding two-dimensional image based on the three-dimensional coordinates of the set key points and the corresponding projection matrix; the two-dimensional image here corresponds to the 106 key point coordinates of the original image.
  • Rotating the three-dimensional image refers to rotating the three-dimensional image by set angles to obtain the corresponding different poses.
  • Projecting the three-dimensional image rotated by a set angle refers to projecting the corresponding defective two-dimensional images based on the different poses of the three-dimensional image rotated by the set angles; for example, the three-dimensional image is placed at the origin of the three-dimensional coordinate system, projected from the front, and rotated about the coordinate axes X, Y, and Z respectively, so as to obtain the defective two-dimensional images corresponding to the projections at different set angles.
  • After projecting the three-dimensional image rotated by a set angle, step S18 and step S20 are performed respectively.
  • Repairing the image based on the trained generative adversarial network refers to performing feature extraction on the defective two-dimensional image based on the trained generative adversarial network and repairing the defective two-dimensional image based on the correspondence between the converted coordinates of the set key points and the pose, to obtain the repaired image corresponding to the defective two-dimensional image.
  • Acquiring the augmented image set after rotation and repair refers to obtaining the augmented image set of the target object based on the repaired images, that is, repairing each defective two-dimensional image to obtain a large number of repaired images corresponding to the defective two-dimensional images; the image set composed of these repaired images is the augmented image set.
  • Acquiring the 106 key point coordinates of the rotated augmented images means obtaining the corresponding 106 key point coordinates based on each defective two-dimensional image obtained in step S17, respectively.
  • The above embodiments of the present application at least solve the following problems: on the one hand, they reduce the heavy dependence on annotated data in face key point localization and the data preparation time before training, and collect massive training data for model training; on the other hand, they effectively solve the problems of insufficient training data for large poses and inaccurate manual annotation, can expand face data at different angles, and improve the accuracy of manual annotation in large poses, thereby further improving the effect of model training.
  • Secondly, the defective two-dimensional image includes the converted coordinates of the set key points of the target object corresponding to the set angle; then, feature extraction is performed on the defective two-dimensional image based on the trained neural network, and the defective two-dimensional image is repaired based on the correspondence between the converted coordinates of the set key points and the pose, to obtain a repaired image corresponding to the defective two-dimensional image; finally, an augmented image set of the target object is obtained based on the repaired images.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose an image augmentation and neural network training method, apparatus, device, and storage medium, including: acquiring a three-dimensional image carrying set key point annotations of a target object, where the three-dimensional image is obtained by reconstruction from a two-dimensional image of the target object; obtaining the defective two-dimensional image corresponding to the projection of the three-dimensional image after it is rotated by a set angle, where the defective two-dimensional image includes the converted coordinates of the set key points of the target object corresponding to the set angle; performing feature extraction on the defective two-dimensional image based on a trained neural network, and repairing the defective two-dimensional image based on the correspondence between the converted coordinates of the set key points and the pose, to obtain a repaired image corresponding to the defective two-dimensional image; and obtaining an augmented image set of the target object based on the repaired image.

Description

Image augmentation and neural network training method, apparatus, device, and storage medium

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based on, and claims priority to, Chinese patent application No. 201910282291.0 filed on April 9, 2019, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

This application relates to the field of computer vision technology, and in particular to an image augmentation and neural network training method, apparatus, device, and storage medium.

BACKGROUND

Training deep learning face recognition models requires a large amount of accurately annotated data, but the amount of manually annotated data is very limited. In addition, manually annotating large-pose face images is very difficult: because of self-occlusion and large poses, the positions of invisible key points often have to be guessed during manual annotation, which introduces a degree of subjectivity. For example, when determining the position of the left corner of a person's mouth, different annotators may produce slightly different results, so the accuracy of the annotation is hard to guarantee. Existing face key point databases also contain relatively little accurate key point data for large-pose faces.

SUMMARY

In view of this, the main purpose of the embodiments of the present application is to provide an image augmentation method and apparatus, a neural network training method and apparatus, a computer device, and a storage medium, which can automatically and quickly obtain augmented images carrying the key point information of a specified target object.

To achieve the above purpose, the technical solutions of the embodiments of the present application are implemented as follows:
In a first aspect, embodiments of the present application provide an image augmentation method, including: acquiring a three-dimensional image carrying set key point annotations of a target object, where the three-dimensional image is obtained by reconstruction from a two-dimensional image of the target object; obtaining the defective two-dimensional image corresponding to the projection of the three-dimensional image after it is rotated by a set angle, where the defective two-dimensional image includes the converted coordinates of the set key points of the target object corresponding to the set angle; performing feature extraction on the defective two-dimensional image based on a trained neural network, and repairing the defective two-dimensional image based on the correspondence between the converted coordinates of the set key points and the pose, to obtain a repaired image corresponding to the defective two-dimensional image; and obtaining an augmented image set of the target object based on the repaired image.

In a second aspect, embodiments of the present application provide a neural network training method, including: obtaining an augmented image set of a target object using the image augmentation method provided in any embodiment of the present application; forming a training sample set according to the two-dimensional image of the target object and the augmented image set; and inputting the training sample set into a neural network model for training until the neural network model converges, to obtain the trained neural network model.

In a third aspect, embodiments of the present application provide an image augmentation apparatus, including: an acquisition module configured to acquire a three-dimensional image carrying set key point annotations of a target object, where the three-dimensional image is obtained by reconstruction from a two-dimensional image of the target object; a projection module configured to obtain the defective two-dimensional image corresponding to the projection of the three-dimensional image after it is rotated by a set angle, where the defective two-dimensional image includes the converted coordinates of the set key points of the target object corresponding to the set angle; a first processing module configured to perform feature extraction on the defective two-dimensional image based on a trained neural network, and repair the defective two-dimensional image based on the correspondence between the converted coordinates of the set key points and the pose, to obtain a repaired image corresponding to the defective two-dimensional image; and a second processing module configured to obtain an augmented image set of the target object based on the repaired image.

In a fourth aspect, embodiments of the present application provide a neural network training apparatus, including: a sample generation module configured to obtain an augmented image set of a target object using the image augmentation method provided in any embodiment of the present application, and form a training sample set according to the two-dimensional image of the target object and the augmented image set; and a training module configured to input the training sample set into a neural network model for training until the neural network model converges, to obtain the trained neural network model.

In a fifth aspect, embodiments of the present application provide a computer device, including a processor and a memory configured to store a computer program that can run on the processor;

where the processor is configured, when running the computer program, to implement the image augmentation method provided by any embodiment of the present application or the neural network training method provided by any embodiment of the present application.

In a sixth aspect, embodiments of the present application provide a computer storage medium in which a computer program is stored; when the computer program is executed by a processor, the image augmentation method provided by any embodiment of the present application or the neural network training method provided by any embodiment of the present application is implemented.

In the above embodiments of the present application, a three-dimensional image carrying the set key point annotations of the target object is acquired, where the three-dimensional image is obtained by reconstruction from the two-dimensional image of the target object; the defective two-dimensional image corresponding to the projection of the three-dimensional image after it is rotated by a set angle is obtained, where the defective two-dimensional image includes the converted coordinates of the set key points of the target object corresponding to the set angle. In this way, a large number of defective two-dimensional images can be acquired from a single original two-dimensional image of the target object. Feature extraction is performed on the defective two-dimensional images based on a trained neural network, and the defective two-dimensional images are repaired based on the correspondence between the converted coordinates of the set key points and the pose, to obtain repaired images corresponding to the defective two-dimensional images, so that an augmented image set containing the key point information of the target object can be obtained accurately and efficiently. Obtaining the augmented image set of the target object based on the repaired images effectively solves the problems of scarce training data and inaccurate manual annotation for neural networks, can expand target object data at different angles, and improves the accuracy of manual annotation in large poses, thereby further improving the effect of neural network model training.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic flowchart of the currently known process for obtaining the coordinates of 106 key points;

FIG. 2 is a schematic flowchart of the currently known process for generating the coordinates of 106 key points based on a three-dimensional model;

FIG. 3 is a schematic flowchart of an image augmentation method provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of the 106 face key points provided by an embodiment of the present application;

FIG. 5 is an example of a three-dimensional standard model provided by an embodiment of the present application;

FIG. 6 is a schematic flowchart of a neural network training method provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of an image augmentation apparatus provided by an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a computer device provided by an embodiment of the present application;

FIG. 10 is a schematic flowchart of an image augmentation method provided by another embodiment of the present application.
DETAILED DESCRIPTION

The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present application. The terms used herein in the specification of the present application are only for the purpose of describing specific embodiments and are not intended to limit the present application. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

Before describing the present application in further detail, the nouns and terms involved in the embodiments of the present application are explained; they apply to the following interpretations.

1) Target object: the object contained in the image to be recognized during neural network training; herein, it refers to a human face.

2) Defective two-dimensional image: the two-dimensional image obtained by projection after the three-dimensional image is rotated. Since it is obtained by rotating and then projecting the three-dimensional image, texture holes may appear in the image, which is why it is called a defective two-dimensional image.

3) Three-dimensional standard model: a stereoscopic model built from multiple pixels at multiple specific positions of the target object. Taking a human face as the target object, it refers to a model composed of all the pixels representing specific positions of the face; for example, if a three-dimensional standard model has 30,000 pixels, those 30,000 pixels are arranged in order and each pixel can represent a specific position on the face, such as the eyes, mouth, or nose.

4) Two-dimensional training image: a sample image used for image training.

5) Loss function, also called cost function: the objective function of neural network optimization.

6) Neural network (NN): a complex network system formed by a large number of simple processing units (called neurons) that are widely interconnected; it reflects many basic features of human brain function and is a highly complex nonlinear dynamic learning system.
Referring to FIG. 1, in a currently known method for generating face key points, a three-dimensional face model is built, rotated by a certain angle, sampled at 68 key points, and projected, producing two-dimensional images containing the coordinates of the 68 key points at different angles; interpolation is then used to supplement these key points to obtain 106 key points. This scheme loses key point accuracy to a certain extent when the two-dimensional images are projected from the three-dimensional face model, and supplementing the key points to 106 by interpolation causes a second loss of accuracy.

Therefore, based on the above scheme, another scheme has been proposed that removes one dimension conversion and the key point accuracy loss it causes. Referring to FIG. 2, in this known method for generating face key points, a three-dimensional face model is built, rotated by a certain angle, sampled at 106 key points, and projected, producing two-dimensional images containing the coordinates of the 106 key points at different angles. However, the images generated by this method contain defects, which likewise cause a certain loss of key point accuracy. In view of the problems of the above known schemes, referring to FIG. 3, an embodiment of the present application provides an image augmentation method, which includes the following steps:
Step 101: Acquire a three-dimensional image carrying set-keypoint annotations of a target object, where the three-dimensional image is reconstructed from a two-dimensional image of the target object.
Reconstructing the three-dimensional image from the two-dimensional image of the target object means that, for an input two-dimensional image containing the target object, the combination parameters of a three-dimensional standard model, which may specifically include a shape-and-expression model and a texture model, are adjusted to obtain the three-dimensional image of the target object with the highest similarity to the input two-dimensional image. Here the target object is a human face and the three-dimensional image may be a face model; if the target object is another kind of object, the three-dimensional image may correspondingly be a model of that object.
A three-dimensional image carrying set-keypoint annotations of the target object is a three-dimensional image that includes the vertices corresponding to the facial keypoints. Here, the keypoints may be the 106 facial keypoints. Referring to FIG. 4, each of the 106 facial keypoints represents a specific position on the face, mainly covering the eyebrows, eyes, mouth, nose, and facial contour. Compared with 68 keypoints, they outline the upper and lower edges and contours of the eyebrows and the information at the nose wings more completely, and can therefore describe the contours of the face and its features more completely.
Step 102: Acquire a defective two-dimensional image corresponding to the projection of the three-dimensional image after rotation by a set angle, the defective two-dimensional image including transformed coordinates of the set keypoints of the target object corresponding to the set angle.
Acquiring the defective two-dimensional image corresponding to the projection of the rotated three-dimensional image means rotating the acquired three-dimensional image in three-dimensional space. For example, place the three-dimensional image at the origin of the three-dimensional coordinate system, project it from the front, and rotate it about each of the coordinate axes X, Y, and Z, thereby obtaining the defective two-dimensional images corresponding to the projections at different set angles.
The transformed coordinates of the set keypoints of the target object corresponding to the set angle refer to the correspondence established between the two-dimensional defective images at different angles and the corresponding three-dimensional image. For example, if a keypoint A on the three-dimensional image has coordinates A(x1, y1, z1) in the initial position and its three-dimensional coordinates after rotation by an angle α are A'(x2, y2, z2), the transformed coordinates corresponding to the angle α are thereby determined, as the sketch below illustrates.
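As an illustration only, the coordinate transform described above can be expressed with standard rotation matrices. The following is a minimal sketch; the axis order and the use of degrees are assumptions made for the example, not requirements of this application:

```python
import numpy as np

def rotate_keypoints(points, yaw=0.0, pitch=0.0, roll=0.0):
    """Rotate an Nx3 array of keypoint coordinates about the Y, X and Z
    axes in turn; the axis order and degree convention are assumed for
    this example only."""
    a, b, c = np.radians([yaw, pitch, roll])
    ry = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
    rx = np.array([[1, 0, 0], [0, np.cos(b), -np.sin(b)], [0, np.sin(b), np.cos(b)]])
    rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
    return points @ (rz @ rx @ ry).T

# A keypoint A(x1, y1, z1) maps to A'(x2, y2, z2) for a given angle:
A = np.array([[10.0, 5.0, 2.0]])
A_rot = rotate_keypoints(A, yaw=30.0)
```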
Step 103: Perform feature extraction on the defective two-dimensional image based on a trained neural network, and repair the defective two-dimensional image based on the correspondence between the transformed coordinates of the set keypoints and the pose, to obtain a repaired image corresponding to the defective two-dimensional image.
Performing feature extraction on the defective two-dimensional image based on the trained neural network means extracting features from each defective two-dimensional image and obtaining the transformed coordinates of the set keypoints corresponding to the angle of the pose of that defective two-dimensional image.
The defective two-dimensional image is a two-dimensional image in which the projection of the rotated three-dimensional image has produced texture holes; the repaired image corresponding to the defective two-dimensional image is the image obtained by repairing the keypoint coordinates at the texture holes based on the trained neural network.
Step 104: Obtain an augmented image set of the target object based on the repaired image.
Obtaining the augmented image set of the target object based on the repaired images refers to the image set composed of the repaired images obtained by repairing each defective two-dimensional image, that is, the augmented image set.
In the above embodiment of this application, a three-dimensional image carrying set-keypoint annotations of a target object is acquired, where the three-dimensional image is reconstructed from a two-dimensional image of the target object; a defective two-dimensional image corresponding to the projection of the three-dimensional image after rotation by a set angle is acquired, the defective two-dimensional image including transformed coordinates of the set keypoints of the target object corresponding to the set angle. In this way, a large number of defective two-dimensional images can be obtained from a single original two-dimensional image of the target object. Feature extraction is then performed on the defective two-dimensional image based on a trained neural network, and the defective two-dimensional image is repaired based on the correspondence between the transformed coordinates of the set keypoints and the pose, to obtain a repaired image corresponding to the defective two-dimensional image, so that an augmented image set containing the keypoint information of the target object can be obtained accurately and efficiently. Obtaining the augmented image set of the target object from the repaired images thus effectively solves the problems of scarce training data and inaccurate manual annotation in neural network training; it can expand target object data at different angles and improves annotation accuracy under large poses, thereby further improving the effect of neural network model training.
In one embodiment, acquiring the three-dimensional image carrying the set-keypoint annotations of the target object includes:
acquiring the two-dimensional image of the target object, and determining, based on the two-dimensional image and the keypoint mapping relationship contained in a three-dimensional standard model, the three-dimensional image corresponding to the two-dimensional image and the three-dimensional coordinates of the set keypoints.
The two-dimensional image is the original picture, photographed or drawn, that contains the target object and from which the three-dimensional image is reconstructed. The target object here generally refers to a human face, but may also be another object.
The three-dimensional standard model is a model composed of multiple pixel points representing specific positions on a face. For example, referring to FIG. 5, if a three-dimensional standard model has 30,000 pixel points, these 30,000 points are ordered and each of them represents a specific position on the face, such as the eyes, mouth, or nose. Here, the keypoints may be the 106 facial keypoints, and the three-dimensional standard model may be a three-dimensional face model built with the 3D Morphable Model (3DMM).
The 3D Morphable Model (3DMM) is built on a three-dimensional face database, takes face shape and face texture statistics as constraints, and also accounts for the influence of face pose and illumination, so it can generate a high-precision three-dimensional face model. A 3DMM face is a linear combination of the face data objects in the model database. On top of the 3D face representation above, suppose the deformable 3D face model is composed of m face models, each of which contains two corresponding vectors S_i and T_i. A new 3D face model can then be expressed as in formulas (1) and (2):

$$S_{new} = \bar{S} + \sum_{i=1}^{m}\alpha_i S_i + \sum_{i=1}^{m}\beta_i e_i \qquad (1)$$

$$T_{new} = \bar{T} + \sum_{i=1}^{m}\lambda_i T_i \qquad (2)$$

where $\bar{S}$ denotes the average face shape model, S_i and e_i denote the principal component analysis (PCA) components of shape and expression respectively, and α_i and β_i denote the corresponding shape and expression coefficients; the texture model, with mean $\bar{T}$, components T_i, and coefficients λ_i, is built analogously. In this way, a new face model can be linearly combined from the existing face models; that is, a three-dimensional image can be generated on the basis of the existing standard face model by changing the coefficients.
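As an illustration of the linear combination in formula (1), the following minimal sketch builds a new face shape from stand-in PCA bases; all dimensions and the random initialization are assumptions for the example only:

```python
import numpy as np

def reconstruct_shape(s_mean, shape_basis, exp_basis, alpha, beta):
    """Formula (1) as code: mean shape plus weighted shape and
    expression PCA components; the texture model is analogous."""
    return s_mean + shape_basis @ alpha + exp_basis @ beta

# Illustrative dimensions only; a fitting procedure would adjust alpha
# and beta so the projected model best matches the input 2D image.
n_vertices, m, k = 1000, 80, 29          # assumed basis sizes
s_mean = np.zeros(3 * n_vertices)        # stand-in mean face shape
S = np.random.randn(3 * n_vertices, m)   # stand-in shape components S_i
E = np.random.randn(3 * n_vertices, k)   # stand-in expression components e_i
alpha, beta = np.zeros(m), np.zeros(k)
new_shape = reconstruct_shape(s_mean, S, E, alpha, beta)
```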
Here, the three-dimensional standard model is preset with the coordinates corresponding to the keypoints and the corresponding indexes. Determining the three-dimensional image based on the two-dimensional image and the keypoint mapping relationship contained in the three-dimensional standard model means determining, from the keypoints contained in the standard model and the corresponding pixel points in the two-dimensional image, the index value in the standard model of each keypoint of the three-dimensional image, and further determining the three-dimensional coordinates of the keypoints contained in the three-dimensional image that corresponds, via the three-dimensional standard model, to the input two-dimensional image.
In the above embodiment of this application, the three-dimensional image corresponding to the two-dimensional image and the three-dimensional coordinates of the set keypoints are obtained based on the two-dimensional image and the three-dimensional standard model. This ensures that the two-dimensional images obtained by rotating and projecting the three-dimensional image contain the transformed coordinates of the set keypoints of the target object corresponding to the set angles.
In one embodiment, acquiring the defective two-dimensional image corresponding to the projection of the three-dimensional image after rotation by the set angle includes:
determining, based on the transformed coordinates of the set keypoints of the target object corresponding to the set angle, the pose of the target object after the three-dimensional image is rotated by the set angle, and determining the corresponding projection matrix based on the pose;
determining, based on the projection matrix, the defective two-dimensional image corresponding to the pose.
Determining the pose of the target object after the three-dimensional image is rotated by the set angle based on the transformed coordinates of the set keypoints means determining the corresponding coordinate offsets from the rotation angle of the three-dimensional image. For example, if a keypoint A on the three-dimensional image has coordinates A(x1, y1, z1) in the initial position and its three-dimensional coordinates after rotation by an angle α are A'(x2, y2, z2), the pose corresponding to those transformed coordinates is determined as the first pose.
The projection matrix is the matrix that converts coordinates in the three-dimensional coordinate system into two-dimensional coordinates. Here, the defective two-dimensional image and the corresponding transformed coordinates of the set keypoints of the target object for the set angle can be determined by determining the projection matrix corresponding to the first pose.
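A minimal sketch of such a projection follows; the orthographic (weak-perspective) form of the projection matrix is an assumption chosen for illustration, since no particular camera model is prescribed here:

```python
import numpy as np

def project_points(points_3d, scale=1.0):
    """Map Nx3 rotated coordinates to 2D with a projection matrix; an
    orthographic matrix with a scale factor is assumed here."""
    P = scale * np.array([[1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0]])  # drops the depth axis
    return points_3d @ P.T

# 2D transformed keypoint coordinates for a rotated pose:
pts_2d = project_points(np.array([[8.6, 5.0, 6.7]]))
```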
In the above embodiment, different projection matrices are obtained from the pose of the target object determined after the three-dimensional image is rotated by the set angles, and the defective two-dimensional images corresponding to the projection matrices are then obtained. In this way, augmentation of a single two-dimensional image is achieved and a set of defective two-dimensional images is obtained.
In one embodiment, before acquiring the three-dimensional image carrying the set-keypoint annotations of the target object, the method further includes:
acquiring an original two-dimensional image of the target object as an image to be augmented, and processing the image to be augmented to obtain the processed two-dimensional image of the target object, where the processing includes scaling processing and/or normalization processing.
Here, the acquired original two-dimensional image of the target object serves as the image to be augmented, and scaling is applied to it, for example scaling it to a fixed size (for example 128*128). The image scaled to the fixed size is then normalized, for example by subtracting the mean or dividing by the variance, to obtain the processed two-dimensional image of the target object. This mitigates the influence of the external environment on the image, such as illumination, noise, and rotation.
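The scaling and normalization step can be sketched as follows; the use of OpenCV's resize and per-image standardization (subtracting the mean and dividing by the standard deviation) are assumptions made for illustration:

```python
import numpy as np
import cv2  # OpenCV, assumed available for the resize step

def preprocess(image, size=128):
    """Scale the image to be augmented to a fixed size, then normalize
    it; mean subtraction and division by the standard deviation is one
    common reading of "subtract the mean or divide by the variance"."""
    resized = cv2.resize(image, (size, size)).astype(np.float32)
    return (resized - resized.mean()) / (resized.std() + 1e-8)
```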
In one embodiment, the neural network is a generative adversarial network, the generative adversarial network including a generative network and an adversarial network. Performing feature extraction on the defective two-dimensional image based on the trained neural network and repairing the defective two-dimensional image based on the correspondence between the transformed coordinates of the set keypoints and the pose, to obtain the repaired image corresponding to the defective two-dimensional image, includes:
inputting the defective two-dimensional image into the trained generative adversarial network, and obtaining a generated two-dimensional repaired image through the generative network based on the correspondence between the transformed coordinates of the set keypoints and the pose;
inputting the generated two-dimensional repaired image and the two-dimensional image into the adversarial network, determining a discrimination result for the generated two-dimensional repaired image and the two-dimensional image, and determining, based on the discrimination result, the repaired image corresponding to the defective two-dimensional image.
A generative adversarial network (GAN, Generative Adversarial Networks) is a deep learning model. The model produces its output through the mutual game-playing of (at least) two modules in the framework, a generative model and a discriminative model. Here, the generative network refers to the generative model in the generative adversarial network, and the adversarial network refers to the adversarial (discriminative) model in the generative adversarial network.
Obtaining the generated two-dimensional repaired image through the generative network based on the correspondence between the transformed coordinates of the set keypoints and the pose means that the generative network generates the texture-repaired image, that is, the generated two-dimensional repaired image, based on that correspondence.
Inputting the generated two-dimensional repaired image and the two-dimensional image into the adversarial network and determining the discrimination result means judging, from these two inputs, whether the keypoints corresponding to the filled regions of the repaired image and the keypoints of the two-dimensional image lie within the set range, that is, are essentially consistent; if so, the generated two-dimensional repaired image is determined to be the repaired image corresponding to the defective two-dimensional image.
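A minimal sketch of such an acceptance check follows; the pixel tolerance is an assumed stand-in for the set range, which this application does not quantify:

```python
import numpy as np

def accept_repair(kpts_repaired, kpts_reference, tol=2.0):
    """Accept the generated repair when every repaired keypoint lies
    within `tol` pixels of the reference keypoint; `tol` is an assumed
    placeholder for the set range."""
    diffs = np.asarray(kpts_repaired) - np.asarray(kpts_reference)
    dists = np.linalg.norm(diffs, axis=1)
    return bool(np.all(dists <= tol))
```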
In the above embodiment, the generated two-dimensional repaired image corresponding to the defective two-dimensional image is produced by the generative network and judged by the adversarial network, thereby obtaining the repaired image corresponding to the defective two-dimensional image. In this way, the texture holes caused by the rotation and projection of the three-dimensional image are repaired.
In one embodiment, before acquiring the three-dimensional image carrying the set keypoint annotations of the target object, the method includes:
reconstructing a three-dimensional training image from a two-dimensional image containing the target object, the three-dimensional training image carrying set-keypoint annotation labels of the target object;
acquiring a two-dimensional training image set of multiple two-dimensional training images corresponding to projections of the three-dimensional training image after rotations by different set angles.
Here, the two-dimensional image may be a two-dimensional face image, and the corresponding three-dimensional training image carrying the set-keypoint annotation labels of the target object is obtained based on the two-dimensional image.
Acquiring the two-dimensional training image set of multiple two-dimensional training images corresponding to projections of the three-dimensional training image after rotations by different set angles means rotating the three-dimensional training image by different set angles and projecting it to obtain a set composed of multiple two-dimensional training images. For example, place the three-dimensional training image at the origin of the three-dimensional coordinate system, project it from the front, and rotate it about each of the coordinate axes X, Y, and Z, thereby obtaining the two-dimensional training image set composed of the two-dimensional training images corresponding to the projections at the different set angles.
In the above embodiment, the three-dimensional training image reconstructed from a two-dimensional image is rotated by different set angles and projected to obtain the two-dimensional training image set. This greatly reduces the heavy dependence of facial keypoint localization on annotated data as well as the data preparation time before training; a two-dimensional training image set can be obtained automatically from a single two-dimensional image.
In one embodiment, before acquiring the three-dimensional image carrying the set keypoint annotations of the target object, the method further includes:
inputting the two-dimensional training image into an initial generative adversarial network, and obtaining the corresponding generated training repaired image through the generative network based on the correspondence between the transformed coordinates of the set keypoints and the pose;
inputting the two-dimensional training image and the generated training repaired image into the adversarial network, and determining a discrimination result for the two-dimensional training image and the generated training repaired image;
iterating the generative adversarial network separately and alternately based on the discrimination result until the set loss function satisfies the convergence condition, to obtain the trained generative adversarial network.
Here, the loss function, also called the cost function, is the objective function of neural network optimization; training or optimizing a neural network is the process of minimizing the loss function, and the smaller the loss value, the closer the predicted results are to the true results. In this application, the loss function may include an adversarial loss function and a reconstruction loss function.
The discrimination result for the two-dimensional training image and the generated training repaired image is determined; if it does not satisfy the set condition, the generative adversarial network is iterated separately and alternately until the set loss function satisfies the convergence condition, and the trained generative adversarial network is obtained.
Here, iterating the generative adversarial network separately and alternately until the set loss function satisfies the convergence condition means updating the parameters of the generative network with re-sampled two-dimensional training images, regenerating the training repaired image, and inputting the two-dimensional training image and the regenerated training repaired image into the adversarial network again, until the set loss function satisfies the convergence condition, at which point the trained generative network and the trained adversarial network are obtained. Specifically, the backpropagation algorithm is used to iteratively update the parameter values of the generative network and the adversarial network: the parameters of the adversarial network are updated first, and then the parameters of the generative network are updated with the re-sampled training samples, until the set loss function satisfies the convergence condition and the trained generative adversarial network is obtained. In this way, alternating iterative training yields the trained generative network and the trained adversarial network, and the trained generative adversarial network is used to patch the texture holes caused by rotation, reducing human error and high labor costs. A sketch of this alternating update appears below.
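The alternating update order described above can be sketched as follows in PyTorch; the network definitions, the data loader, the hole mask `w`, and the assumption that the discriminator ends in a sigmoid are all supplied by the caller and are not specified by this application:

```python
import torch

def train_gan(G, D, loader, steps, lr=2e-4, w=None):
    """Alternating GAN updates: the adversarial network D is updated
    first, then the generative network G on a fresh forward pass."""
    opt_g = torch.optim.Adam(G.parameters(), lr=lr)
    opt_d = torch.optim.Adam(D.parameters(), lr=lr)
    bce = torch.nn.BCELoss()
    for _, (x_holes, x_full) in zip(range(steps), loader):
        # 1) update the adversarial (discriminative) network
        d_real = D(x_full)
        d_fake = D(G(x_holes).detach())
        loss_d = bce(d_real, torch.ones_like(d_real)) + \
                 bce(d_fake, torch.zeros_like(d_fake))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        # 2) update the generative network on a re-sampled pass
        fake = G(x_holes)
        d_out = D(fake)
        loss_g = bce(d_out, torch.ones_like(d_out))
        if w is not None:  # weighted L1 reconstruction term, formula (4)
            loss_g = loss_g + (w * (x_full - fake)).abs().mean()
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return G, D
```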
In one embodiment, before iterating the generative adversarial network separately and alternately based on the discrimination result until the set loss function satisfies the convergence condition, the method further includes:
obtaining the loss function corresponding to the generative adversarial network from the combination of an adversarial loss function and a reconstruction loss function.
Here, the loss function comprises two parts, an adversarial loss function and a reconstruction loss function, given in formulas (3) and (4) respectively:

$$L_{adv} = \mathbb{E}_x[\log D(x)] + \mathbb{E}_x[\log(1 - D(G(x)))] \qquad (3)$$

$$L_{rec} = \mathbb{E}_x\big[\lVert w \odot (x - G(x)) \rVert_1\big] \qquad (4)$$

where L_adv is the adversarial loss function, L_rec is the reconstruction loss function, G is the generative network, D is the discriminative network, x is the input sample, namely the rotated image containing texture holes, and G(x) is the image generated from the input image x, namely the texture-patched image. The purpose of the adversarial loss is to make the generated image more realistic and natural. The reconstruction loss introduces a weight coefficient w, whose purpose is to keep the patched image as consistent as possible with the original image everywhere outside the filled regions, thereby guaranteeing the correctness of the keypoint coordinates.
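For illustration, formulas (3) and (4) can be written directly as code; the small epsilon guard, the mean reduction, and the NCHW tensor layout are implementation assumptions:

```python
import torch

def adversarial_loss(d_real, d_fake, eps=1e-8):
    """Formula (3): E_x[log D(x)] + E_x[log(1 - D(G(x)))].
    `eps` is a numerical guard, an implementation choice."""
    return torch.log(d_real + eps).mean() + \
           torch.log(1.0 - d_fake + eps).mean()

def reconstruction_loss(x, g_x, w):
    """Formula (4): E_x[||w ⊙ (x - G(x))||_1] for NCHW batches. The
    weight map `w` emphasizes the regions outside the filled holes so
    they stay close to the original; how `w` is built is not specified
    here."""
    return (w * (x - g_x)).abs().sum(dim=(1, 2, 3)).mean()
```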
In another embodiment, as shown in FIG. 6, a neural network training method is also provided, including:
Step 201: Obtain an augmented image set of a target object, and form a training sample set from the two-dimensional image of the target object and the augmented image set; here, the augmented image set of the target object may be obtained using the image augmentation method provided by any embodiment of this application.
Step 202: Input the training sample set into a neural network model for training until the neural network model converges, to obtain the trained neural network model.
Here, the augmented image set formed from the two-dimensional image of the target object serves as the training samples of the neural network, which ensures that more valid training samples can be obtained more quickly to train the network and improves the classification accuracy of the trained network. Since the augmented images include accurate localization of the set keypoints, the neural network can be trained on this augmented image set to obtain the trained neural network model, which can be used in application scenarios such as expression recognition, animation synthesis, live streaming, beautification, and special-effects cameras.
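A minimal sketch of such a training loop follows; the L2 keypoint loss and the loss-change convergence test are assumptions made for illustration, since this application does not fix a particular model or convergence criterion:

```python
import torch

def train_until_converged(model, loader, lr=1e-3, tol=1e-4, max_epochs=100):
    """Steps 201-202 as a sketch: train a keypoint model on the sample
    set formed from the 2D image and its augmented image set until the
    epoch loss stops changing."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    prev = float("inf")
    for _ in range(max_epochs):
        total = 0.0
        for img, kpts in loader:  # augmented images with 106-point labels
            loss = torch.nn.functional.mse_loss(model(img), kpts)
            opt.zero_grad(); loss.backward(); opt.step()
            total += loss.item()
        if abs(prev - total) < tol:  # treated as convergence
            break
        prev = total
    return model
```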
In another embodiment, as shown in FIG. 7, an image augmentation apparatus is also provided, the apparatus including:
an acquisition module 31, configured to acquire a three-dimensional image carrying set-keypoint annotations of a target object, where the three-dimensional image is reconstructed from a two-dimensional image of the target object;
a projection module 32, configured to acquire a defective two-dimensional image corresponding to the projection of the three-dimensional image after rotation by a set angle, the defective two-dimensional image including transformed coordinates of the set keypoints of the target object corresponding to the set angle;
a first processing module 33, configured to perform feature extraction on the defective two-dimensional image based on a trained neural network and to repair the defective two-dimensional image based on the correspondence between the transformed coordinates of the set keypoints and the pose, to obtain a repaired image corresponding to the defective two-dimensional image;
a second processing module 34, configured to obtain an augmented image set of the target object based on the repaired image.
In the above embodiment of this application, a three-dimensional image carrying set-keypoint annotations of a target object is acquired, where the three-dimensional image is reconstructed from a two-dimensional image of the target object; a defective two-dimensional image corresponding to the projection of the three-dimensional image after rotation by a set angle is acquired, the defective two-dimensional image including transformed coordinates of the set keypoints of the target object corresponding to the set angle. In this way, a large number of defective two-dimensional images can be obtained automatically and quickly from a single two-dimensional image. Feature extraction is performed on the defective two-dimensional image based on a trained neural network, the defective two-dimensional image is repaired based on the correspondence between the transformed coordinates of the set keypoints and the pose to obtain the repaired image corresponding to the defective two-dimensional image, and the augmented image set of the target object is obtained based on the repaired images. Obtaining the augmented image set of the target object from the repaired images thus effectively solves the problems of scarce training data and inaccurate manual annotation in neural networks; it can expand face data at different angles and improves the accuracy of manual annotation under large poses, thereby further improving the effect of neural network model training.
Optionally, the acquisition module 31 is further configured to acquire the two-dimensional image of the target object and determine, based on the two-dimensional image and the keypoint mapping relationship contained in a three-dimensional standard model, the three-dimensional image corresponding to the two-dimensional image and the three-dimensional coordinates of the set keypoints.
Optionally, the projection module 32 is further configured to determine, based on the transformed coordinates of the set keypoints of the target object corresponding to the set angle, the pose of the target object after the three-dimensional image is rotated by the set angle, determine the corresponding projection matrix based on the pose, and determine, based on the projection matrix, the defective two-dimensional image corresponding to the pose.
Optionally, the acquisition module 31 is further configured to acquire an original two-dimensional image of the target object as an image to be augmented and process the image to be augmented to obtain the processed two-dimensional image of the target object, where the processing includes scaling processing and/or normalization processing.
Optionally, the first processing module 33 is further configured to input the defective two-dimensional image into the trained generative adversarial network, obtain a generated two-dimensional repaired image through the generative network based on the correspondence between the transformed coordinates of the set keypoints and the pose, input the generated two-dimensional repaired image and the two-dimensional image into the adversarial network, determine a discrimination result for the generated two-dimensional repaired image and the two-dimensional image, and determine, based on the discrimination result, the repaired image corresponding to the defective two-dimensional image.
Optionally, the acquisition module 31 is further configured to reconstruct a three-dimensional training image from a two-dimensional image containing the target object, the three-dimensional training image carrying set-keypoint annotation labels of the target object, and to acquire a two-dimensional training image set of multiple two-dimensional training images corresponding to projections of the three-dimensional training image after rotations by different set angles.
Optionally, the first processing module 33 is further configured to input the two-dimensional training image into an initial generative adversarial network, obtain the corresponding generated training repaired image through the generative network based on the correspondence between the transformed coordinates of the set keypoints and the pose, input the two-dimensional training image and the generated training repaired image into the adversarial network, determine a discrimination result for the two-dimensional training image and the generated training repaired image, and iterate the generative adversarial network separately and alternately based on the discrimination result until the set loss function satisfies the convergence condition, to obtain the trained generative adversarial network.
Optionally, the first processing module 33 is further configured to obtain the loss function corresponding to the generative adversarial network from the combination of an adversarial loss function and a reconstruction loss function.
In another embodiment, as shown in FIG. 8, a neural network training apparatus is also provided, the apparatus including:
a sample generation module 41, configured to obtain an augmented image set of a target object using the image augmentation method provided by any embodiment of this application, and to form a training sample set from the two-dimensional image of the target object and the augmented image set;
a training module 42, configured to input the training sample set into a neural network model for training until the neural network model converges, to obtain the trained neural network model.
In another embodiment, as shown in FIG. 9, a computer device is also provided, including: at least one processor 210 and a memory 211 configured to store a computer program capable of running on the processor 210. The processor 210 illustrated in FIG. 9 is not meant to indicate that the number of processors is one, but only to indicate the position of the processor relative to the other components; in practical applications, the number of processors may be one or more. Likewise, the memory 211 illustrated in FIG. 9 only indicates the position of the memory relative to the other components; in practical applications, the number of memories may be one or more.
The processor 210 is configured to perform the following steps when running the computer program:
acquiring a three-dimensional image carrying set-keypoint annotations of a target object, where the three-dimensional image is reconstructed from a two-dimensional image of the target object; acquiring a defective two-dimensional image corresponding to the projection of the three-dimensional image after rotation by a set angle, the defective two-dimensional image including transformed coordinates of the set keypoints of the target object corresponding to the set angle; performing feature extraction on the defective two-dimensional image based on a trained neural network, and repairing the defective two-dimensional image based on the correspondence between the transformed coordinates of the set keypoints and the pose, to obtain a repaired image corresponding to the defective two-dimensional image; obtaining an augmented image set of the target object based on the repaired image.
In an optional embodiment, the processor 210 is further configured to perform the following steps when running the computer program:
acquiring the two-dimensional image of the target object, and determining, based on the two-dimensional image and the keypoint mapping relationship contained in a three-dimensional standard model, the three-dimensional image corresponding to the two-dimensional image and the three-dimensional coordinates of the set keypoints.
In an optional embodiment, the processor 210 is further configured to perform the following steps when running the computer program:
determining, based on the transformed coordinates of the set keypoints of the target object corresponding to the set angle, the pose of the target object after the three-dimensional image is rotated by the set angle, and determining the corresponding projection matrix based on the pose; determining, based on the projection matrix, the defective two-dimensional image corresponding to the pose.
In an optional embodiment, the processor 210 is further configured to perform the following steps when running the computer program:
acquiring an original two-dimensional image of the target object as an image to be augmented, and processing the image to be augmented to obtain the processed two-dimensional image of the target object, where the processing includes scaling processing and/or normalization processing.
In an optional embodiment, the processor 210 is further configured to perform the following steps when running the computer program:
inputting the defective two-dimensional image into the trained generative adversarial network, and obtaining a generated two-dimensional repaired image through the generative network based on the correspondence between the transformed coordinates of the set keypoints and the pose; inputting the generated two-dimensional repaired image and the two-dimensional image into the adversarial network, determining a discrimination result for the generated two-dimensional repaired image and the two-dimensional image, and determining, based on the discrimination result, the repaired image corresponding to the defective two-dimensional image.
In an optional embodiment, the processor 210 is further configured to perform the following steps when running the computer program:
reconstructing a three-dimensional training image from a two-dimensional image containing the target object, the three-dimensional training image carrying set-keypoint annotation labels of the target object; acquiring a two-dimensional training image set of multiple two-dimensional training images corresponding to projections of the three-dimensional training image after rotations by different set angles.
In an optional embodiment, the processor 210 is further configured to perform the following steps when running the computer program:
inputting the two-dimensional training image into an initial generative adversarial network, and obtaining the corresponding generated training repaired image through the generative network based on the correspondence between the transformed coordinates of the set keypoints and the pose; inputting the two-dimensional training image and the generated training repaired image into the adversarial network, and determining a discrimination result for the two-dimensional training image and the generated training repaired image; iterating the generative adversarial network separately and alternately based on the discrimination result until the set loss function satisfies the convergence condition, to obtain the trained generative adversarial network.
In an optional embodiment, the processor 210 is further configured to perform the following steps when running the computer program:
obtaining the loss function corresponding to the generative adversarial network from the combination of an adversarial loss function and a reconstruction loss function.
In an optional embodiment, the processor 210 is further configured to perform the following steps when running the computer program:
obtaining an augmented image set of a target object, and forming a training sample set from the two-dimensional image of the target object and the augmented image set;
inputting the training sample set into a neural network model for training until the neural network model converges, to obtain the trained neural network model.
The computer device may further include at least one network interface 212. The components in the sending end are coupled together through a bus system 213. It can be understood that the bus system 213 is configured to implement connection and communication between these components. In addition to a data bus, the bus system 213 also includes a power bus, a control bus, and a status signal bus. For clarity, however, the various buses are all labeled as the bus system 213 in FIG. 9.
The memory 211 may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a ferromagnetic random access memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 211 described in the embodiments of this application is intended to include, but is not limited to, these and any other suitable types of memory.
The memory 211 in the embodiments of this application is used to store various types of data to support the operation of the sending end. Examples of such data include: any computer program for operating on the sending end, such as an operating system and application programs. The operating system contains various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various basic services and processing hardware-based tasks. The application programs may contain various applications for implementing various application services. The programs implementing the methods of the embodiments of this application may be contained in the application programs.
This embodiment also provides a computer storage medium, for example a memory 211 storing a computer program that can be executed by the processor 210 in the sending end to complete the steps of the foregoing methods. The computer storage medium may be an FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disc, or CD-ROM memory, or may be any of various devices including one or any combination of the above memories, such as a smartphone, tablet computer, or notebook computer. A computer program is stored in the computer storage medium, and when the computer program is run by the processor, the following steps are performed:
acquiring a three-dimensional image carrying set-keypoint annotations of a target object, where the three-dimensional image is reconstructed from a two-dimensional image of the target object; acquiring a defective two-dimensional image corresponding to the projection of the three-dimensional image after rotation by a set angle, the defective two-dimensional image including transformed coordinates of the set keypoints of the target object corresponding to the set angle; performing feature extraction on the defective two-dimensional image based on a trained neural network, and repairing the defective two-dimensional image based on the correspondence between the transformed coordinates of the set keypoints and the pose, to obtain a repaired image corresponding to the defective two-dimensional image; obtaining an augmented image set of the target object based on the repaired image.
In an optional embodiment, when the computer program is run by the processor, the following steps are also performed:
acquiring the two-dimensional image of the target object, and determining, based on the two-dimensional image and the keypoint mapping relationship contained in a three-dimensional standard model, the three-dimensional image corresponding to the two-dimensional image and the three-dimensional coordinates of the set keypoints.
In an optional embodiment, when the computer program is run by the processor, the following steps are also performed:
determining, based on the transformed coordinates of the set keypoints of the target object corresponding to the set angle, the pose of the target object after the three-dimensional image is rotated by the set angle, and determining the corresponding projection matrix based on the pose; determining, based on the projection matrix, the defective two-dimensional image corresponding to the pose.
In an optional embodiment, when the computer program is run by the processor, the following steps are also performed:
acquiring an original two-dimensional image of the target object as an image to be augmented, and processing the image to be augmented to obtain the processed two-dimensional image of the target object, where the processing includes scaling processing and/or normalization processing.
In an optional embodiment, when the computer program is run by the processor, the following steps are also performed:
inputting the defective two-dimensional image into the trained generative adversarial network, and obtaining a generated two-dimensional repaired image through the generative network based on the correspondence between the transformed coordinates of the set keypoints and the pose; inputting the generated two-dimensional repaired image and the two-dimensional image into the adversarial network, determining a discrimination result for the generated two-dimensional repaired image and the two-dimensional image, and determining, based on the discrimination result, the repaired image corresponding to the defective two-dimensional image.
In an optional embodiment, when the computer program is run by the processor, the following steps are also performed:
reconstructing a three-dimensional training image from a two-dimensional image containing the target object, the three-dimensional training image carrying set-keypoint annotation labels of the target object; acquiring a two-dimensional training image set of multiple two-dimensional training images corresponding to projections of the three-dimensional training image after rotations by different set angles.
In an optional embodiment, when the computer program is run by the processor, the following steps are also performed:
inputting the two-dimensional training image into the initial generative adversarial network, and obtaining the corresponding generated training repaired image through the generative network based on the correspondence between the transformed coordinates of the set keypoints and the pose; inputting the two-dimensional training image and the generated training repaired image into the adversarial network, and determining a discrimination result for the two-dimensional training image and the generated training repaired image; iterating the generative adversarial network separately and alternately based on the discrimination result until the set loss function satisfies the convergence condition, to obtain the trained generative adversarial network.
In an optional embodiment, when the computer program is run by the processor, the following steps are also performed:
obtaining the loss function corresponding to the generative adversarial network from the combination of an adversarial loss function and a reconstruction loss function.
In an optional embodiment, when the computer program is run by the processor, the following steps are also performed:
obtaining an augmented image set of a target object, and forming a training sample set from the two-dimensional image of the target object and the augmented image set;
inputting the training sample set into a neural network model for training until the neural network model converges, to obtain the trained neural network model.
Referring to FIG. 10, taking 3DMM as the face reconstruction method and a conditional generative adversarial network as the neural network, the image augmentation method of the embodiments of this application is further described in detail with a more thorough example. The image augmentation method includes the following steps:
S11: Acquire a two-dimensional image.
Here, acquiring a two-dimensional image means acquiring a two-dimensional image containing a face image.
S12: Generate a three-dimensional image.
Here, generating a three-dimensional image means determining, based on the two-dimensional image and the keypoint mapping relationship contained in the three-dimensional standard model, the three-dimensional image corresponding to the two-dimensional image and the three-dimensional coordinates of the set keypoints.
S13: Sample the 106 keypoints.
Here, sampling the 106 keypoints means obtaining the three-dimensional coordinates of the set keypoints from the three-dimensional image.
Here, after the 106 keypoints of the three-dimensional image are obtained, steps S14 and S16 are performed respectively.
S14: Project the three-dimensional image.
Here, projecting the three-dimensional image means obtaining the corresponding two-dimensional image from the three-dimensional image.
S15: Obtain the 106-point keypoint coordinates of the two-dimensional image.
Here, obtaining the 106-point keypoint coordinates of the two-dimensional image means determining them from the three-dimensional coordinates of the set keypoints and the corresponding projection matrix; the two-dimensional image here is the original image, so these are the 106-point keypoint coordinates corresponding to the original image.
S16: Rotate the three-dimensional image.
Here, rotating the three-dimensional image means rotating it by the set angles to obtain the corresponding different poses.
S17: Project the three-dimensional image rotated by the set angles.
Here, projecting the rotated three-dimensional image means projecting the different poses of the three-dimensional image rotated by the set angles to obtain the corresponding defective two-dimensional images. For example, place the three-dimensional image at the origin of the three-dimensional coordinate system, project it from the front, and rotate it about each of the coordinate axes X, Y, and Z, thereby obtaining the defective two-dimensional images corresponding to the projections at the different set angles.
Here, after the three-dimensional image rotated by the set angles has been projected, steps S18 and S20 are performed respectively.
S18: Repair the image based on the trained generative adversarial network.
Here, repairing the image based on the trained generative adversarial network means performing feature extraction on the defective two-dimensional image based on the trained generative adversarial network and repairing the defective two-dimensional image based on the correspondence between the transformed coordinates of the set keypoints and the pose, to obtain the repaired image corresponding to the defective two-dimensional image.
S19: Obtain the rotated-and-repaired augmented image set.
Here, obtaining the rotated-and-repaired augmented image set means obtaining the augmented image set of the target object from the repaired images, that is, the image set composed of the repaired images obtained by repairing each defective two-dimensional image.
S20: Obtain the 106-point keypoint coordinates of the rotated augmented images.
Here, obtaining the 106-point keypoint coordinates of the rotated augmented images means obtaining the corresponding 106-point keypoint coordinates for each of the defective two-dimensional images obtained in step S17.
Compared with the prior art, the above embodiments of this application solve at least the following problems: on the one hand, they reduce the heavy dependence of facial keypoint localization on annotated data and the data preparation time before training, allowing massive training data to be collected for model training; on the other hand, they effectively solve the problems of scarce training data and inaccurate manual annotation under large poses, can expand face data at different angles, and improve the accuracy of manual annotation under large poses, thereby further improving the effect of model training.
The above are only preferred embodiments of this application and are not intended to limit the scope of protection of this application.
INDUSTRIAL APPLICABILITY
In the embodiments of this application, first, a three-dimensional image carrying set-keypoint annotations of a target object is acquired, where the three-dimensional image is reconstructed from a two-dimensional image of the target object; second, a defective two-dimensional image corresponding to the projection of the three-dimensional image after rotation by a set angle is acquired, the defective two-dimensional image including transformed coordinates of the set keypoints of the target object corresponding to the set angle; third, feature extraction is performed on the defective two-dimensional image based on a trained neural network, and the defective two-dimensional image is repaired based on the correspondence between the transformed coordinates of the set keypoints and the pose, to obtain a repaired image corresponding to the defective two-dimensional image; finally, an augmented image set of the target object is obtained based on the repaired image.

Claims (13)

  1. An image augmentation method, comprising:
    acquiring a three-dimensional image carrying set-keypoint annotations of a target object, wherein the three-dimensional image is reconstructed from a two-dimensional image of the target object;
    acquiring a defective two-dimensional image corresponding to the projection of the three-dimensional image after rotation by a set angle, the defective two-dimensional image comprising transformed coordinates of the set keypoints of the target object corresponding to the set angle;
    performing feature extraction on the defective two-dimensional image based on a trained neural network, and repairing the defective two-dimensional image based on the correspondence between the transformed coordinates of the set keypoints and the pose, to obtain a repaired image corresponding to the defective two-dimensional image;
    obtaining an augmented image set of the target object based on the repaired image.
  2. The image augmentation method of claim 1, wherein acquiring the three-dimensional image carrying the set-keypoint annotations of the target object comprises:
    acquiring the two-dimensional image of the target object, and determining, based on the two-dimensional image and the keypoint mapping relationship contained in a three-dimensional standard model, the three-dimensional image corresponding to the two-dimensional image and the three-dimensional coordinates of the set keypoints.
  3. The image augmentation method of claim 1, wherein acquiring the defective two-dimensional image corresponding to the projection of the three-dimensional image after rotation by the set angle comprises:
    determining, based on the transformed coordinates of the set keypoints of the target object corresponding to the set angle, the pose of the target object after the three-dimensional image is rotated by the set angle, and determining the corresponding projection matrix based on the pose;
    determining, based on the projection matrix, the defective two-dimensional image corresponding to the pose.
  4. The image augmentation method of claim 1, wherein before acquiring the three-dimensional image carrying the set-keypoint annotations of the target object, the method further comprises:
    acquiring an original two-dimensional image of the target object as an image to be augmented, and processing the image to be augmented to obtain the processed two-dimensional image of the target object, wherein the processing comprises scaling processing and/or normalization processing.
  5. The image augmentation method of claim 1, wherein the neural network is a generative adversarial network comprising a generative network and an adversarial network, and performing feature extraction on the defective two-dimensional image based on the trained neural network and repairing the defective two-dimensional image based on the correspondence between the transformed coordinates of the set keypoints and the pose, to obtain the repaired image corresponding to the defective two-dimensional image, comprises:
    inputting the defective two-dimensional image into the trained generative adversarial network, and obtaining a generated two-dimensional repaired image through the generative network based on the correspondence between the transformed coordinates of the set keypoints and the pose;
    inputting the generated two-dimensional repaired image and the two-dimensional image into the adversarial network, determining a discrimination result for the generated two-dimensional repaired image and the two-dimensional image, and determining, based on the discrimination result, the repaired image corresponding to the defective two-dimensional image.
  6. The image augmentation method of claim 5, wherein before acquiring the three-dimensional image carrying the set keypoint annotations of the target object, the method comprises:
    reconstructing a three-dimensional training image from a two-dimensional image containing the target object, the three-dimensional training image carrying set-keypoint annotation labels of the target object;
    acquiring a two-dimensional training image set of multiple two-dimensional training images corresponding to projections of the three-dimensional training image after rotations by different set angles.
  7. The image augmentation method of claim 6, wherein before acquiring the three-dimensional image carrying the set keypoint annotations of the target object, the method further comprises:
    inputting the two-dimensional training image into an initial generative adversarial network, and obtaining the corresponding generated training repaired image through the generative network based on the correspondence between the transformed coordinates of the set keypoints and the pose;
    inputting the two-dimensional training image and the training repaired image into the adversarial network, and determining a discrimination result for the two-dimensional training image and the training repaired image;
    iterating the generative adversarial network separately and alternately based on the discrimination result until the set loss function satisfies the convergence condition, to obtain the trained generative adversarial network.
  8. The image augmentation method of claim 7, wherein before iterating the generative adversarial network separately and alternately based on the discrimination result until the set loss function satisfies the convergence condition, the method further comprises:
    obtaining the loss function corresponding to the generative adversarial network from the combination of an adversarial loss function and a reconstruction loss function.
  9. A neural network training method, comprising:
    obtaining an augmented image set of a target object using the image augmentation method of any one of claims 1 to 8, and forming a training sample set from the two-dimensional image of the target object and the augmented image set;
    inputting the training sample set into a neural network model for training until the neural network model converges, to obtain the trained neural network model.
  10. An image augmentation apparatus, the apparatus comprising:
    an acquisition module configured to acquire a three-dimensional image carrying set-keypoint annotations of a target object, wherein the three-dimensional image is reconstructed from a two-dimensional image of the target object;
    a projection module configured to acquire a defective two-dimensional image corresponding to the projection of the three-dimensional image after rotation by a set angle, the defective two-dimensional image comprising transformed coordinates of the set keypoints of the target object corresponding to the set angle;
    a first processing module configured to perform feature extraction on the defective two-dimensional image based on a trained neural network and to repair the defective two-dimensional image based on the correspondence between the transformed coordinates of the set keypoints and the pose, to obtain a repaired image corresponding to the defective two-dimensional image;
    a second processing module configured to obtain an augmented image set of the target object based on the repaired image.
  11. A neural network training apparatus, comprising:
    a sample generation module configured to obtain an augmented image set of a target object using the image augmentation method of any one of claims 1 to 8, and to form a training sample set from the two-dimensional image of the target object and the augmented image set;
    a training module configured to input the training sample set into a neural network model for training until the neural network model converges, to obtain the trained neural network model.
  12. A computer device, comprising: a processor and a memory configured to store a computer program capable of running on the processor;
    wherein the processor is configured to, when running the computer program, implement the image augmentation method of any one of claims 1 to 8, or implement the neural network training method of claim 9.
  13. A computer storage medium storing a computer program that, when executed by a processor, implements the image augmentation method of any one of claims 1 to 8, or implements the neural network training method of claim 9.
PCT/CN2020/078650 2019-04-09 2020-03-10 Image augmentation and neural network training method, apparatus, device and storage medium WO2020207177A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910282291.0 2019-04-09
CN201910282291.0A CN111797264A (zh) 2019-04-09 2019-04-09 Image augmentation and neural network training method, apparatus, device and storage medium

Publications (1)

Publication Number Publication Date
WO2020207177A1 true WO2020207177A1 (zh) 2020-10-15

Family

ID=72750852

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/078650 WO2020207177A1 (zh) Image augmentation and neural network training method, apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN111797264A (zh)
WO (1) WO2020207177A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11869257B2 (en) 2021-03-19 2024-01-09 International Business Machines Corporation AR-based labeling tool for 3D object detection model training

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308781A (zh) * 2020-11-23 2021-02-02 中国科学院深圳先进技术研究院 Deep-learning-based single-image three-dimensional super-resolution reconstruction method
CN112509091B (zh) * 2020-12-10 2023-11-14 上海联影医疗科技股份有限公司 Medical image reconstruction method, apparatus, device and medium
CN112508821B (zh) * 2020-12-21 2023-02-24 南阳师范学院 Stereo-vision virtual image hole-filling method based on a directional regression loss function
CN114120414B (zh) * 2021-11-29 2022-11-01 北京百度网讯科技有限公司 Image processing method and apparatus, electronic device and medium
CN114663810B (zh) * 2022-03-21 2023-11-10 中国电信股份有限公司 Multimodal object image augmentation method and apparatus, device and storage medium
CN116959052A (zh) * 2022-04-15 2023-10-27 华为技术有限公司 Facial feature detection method, readable medium and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107767376A (zh) * 2017-11-02 2018-03-06 西安邮电大学 Deep-learning-based X-ray bone age prediction method and system
CN108074244A (zh) * 2017-09-07 2018-05-25 汉鼎宇佑互联网股份有限公司 Safe-city traffic flow statistics method fusing deep learning and background subtraction
CN108520202A (zh) * 2018-03-15 2018-09-11 华南理工大学 Adversarially robust image feature extraction method based on variational spherical projection
CN109448083A (zh) * 2018-09-29 2019-03-08 浙江大学 Method for generating facial animation from a single image

Also Published As

Publication number Publication date
CN111797264A (zh) 2020-10-20

Similar Documents

Publication Publication Date Title
WO2020207177A1 (zh) Image augmentation and neural network training method, apparatus, device and storage medium
WO2020042720A1 (zh) Human body three-dimensional model reconstruction method, apparatus and storage medium
JP7203954B2 (ja) Face pose estimation / three-dimensional face reconstruction method, apparatus, and electronic device
CA2423212C (en) Apparatus and method for generating a three-dimensional representation from a two-dimensional image
JP7040278B2 (ja) Training method and training apparatus of an image processing device for face recognition
US8818131B2 (en) Methods and apparatus for facial feature replacement
KR101007276B1 (ko) Three-dimensional facial recognition
CN113011401B (zh) Face image pose estimation and correction method, system, medium and electronic device
WO2023001095A1 (zh) Facial keypoint interpolation method and apparatus, computer device and storage medium
CN114022542A (zh) 3D database production method based on three-dimensional reconstruction
CN114359453A (zh) Three-dimensional special-effect rendering method and apparatus, storage medium and device
CN114913552A (zh) Dense three-dimensional human correspondence estimation method based on single-view point cloud sequences
Kao et al. Towards 3d face reconstruction in perspective projection: Estimating 6dof face pose from monocular image
Yin et al. [Retracted] Virtual Reconstruction Method of Regional 3D Image Based on Visual Transmission Effect
CN114494576A (zh) Fast and high-precision multi-view three-dimensional face reconstruction method based on implicit functions
CN112967329A (zh) Image data optimization method and apparatus, electronic device and storage medium
CN113379890A (zh) Method for generating a shallow-relief figure model from a single photograph
CN112613457A (zh) Image acquisition mode detection method and apparatus, computer device and storage medium
Xiao et al. Effective Key Region-Guided Face Detail Optimization Algorithm for 3D Face Reconstruction
Peng et al. Geometrical consistency modeling on b-spline parameter domain for 3d face reconstruction from limited number of wild images
WO2023273272A1 (zh) Target pose estimation method and apparatus, computing device, storage medium and computer program
CN116503524B (zh) Virtual avatar generation method, system, apparatus and storage medium
Li et al. Model-based 3d registration optimization based on graphics simulation method
Lanitis et al. Reconstructing 3d faces in cultural heritage applications
WO2024028743A1 (en) Modelling method for making a virtual model of a user's head

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20787740

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.02.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20787740

Country of ref document: EP

Kind code of ref document: A1