CN114972912A - Sample generation and model training method, device, equipment and storage medium - Google Patents

Sample generation and model training method, device, equipment and storage medium

Info

Publication number
CN114972912A
Authority
CN
China
Prior art keywords
image
preset
face
sample image
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210575099.2A
Other languages
Chinese (zh)
Inventor
Wang Haibo
Zhu Feng
Zhao Rui
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shangtang Artificial Intelligence Research Center Shenzhen Co ltd
Original Assignee
Shangtang Artificial Intelligence Research Center Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shangtang Artificial Intelligence Research Center Shenzhen Co ltd filed Critical Shangtang Artificial Intelligence Research Center Shenzhen Co ltd
Priority to CN202210575099.2A priority Critical patent/CN114972912A/en
Publication of CN114972912A publication Critical patent/CN114972912A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/04 - Context-preserving transformations, e.g. by using an importance map
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the present disclosure provide a sample generation method, a model training method, and corresponding apparatus, device, and storage medium. The sample generation method includes: acquiring preset object attribute information; generating a synthetic face image of a virtual object according to the preset object attribute information; and performing data processing on the synthetic face image to obtain a target sample image, where the data processing includes at least one of: style migration processing toward a real image, and data cleaning processing.

Description

Sample generation and model training method, device, equipment and storage medium
Technical Field
The present disclosure relates to, but not limited to, the field of computer vision technologies, and in particular, to a method, an apparatus, a device, and a storage medium for sample generation and model training.
Background
At present, face recognition datasets are usually collected from web images and lack detailed attribute annotations (such as pose and age); in addition, face data covering enough scenes is often difficult to acquire without long-term enterprise-scale accumulation. Generating face data with a generative model can therefore greatly alleviate the shortage of training data for downstream face recognition models. Such a data generation method can produce face data of virtual identities that meets specific requirements, for example large-pose face data such as profile and half-profile images; by decoupling and modeling the face, it can also provide richer annotation information (such as identity and rotation angle), which is important for improving the accuracy of subsequent face recognition tasks. However, in the related art the quality of the generated large-pose face data is poor, so the accuracy of a face recognition model trained on such data is generally not high.
Disclosure of Invention
In view of the above, the embodiments of the present disclosure provide at least a method, an apparatus, a device, and a storage medium for generating a sample and training a model.
The technical scheme of the embodiment of the disclosure is realized as follows:
in one aspect, an embodiment of the present disclosure provides a sample generation method, where the method includes:
acquiring preset object attribute information;
generating a synthetic face image of the virtual object according to the preset object attribute information;
performing data processing on the synthetic face image to obtain a target sample image, where the data processing includes at least one of: style migration processing toward a real image, and data cleaning processing.
In another aspect, an embodiment of the present disclosure provides a model training method, where the method includes:
acquiring a target sample image, where the target sample image is a synthetic face image that has undergone at least one of style migration processing toward a real image and data cleaning processing, and the synthetic face image is a face image of a virtual object generated according to preset object attribute information;
and training an initial face recognition model at least according to the target sample image to obtain a face recognition model.
In another aspect, an embodiment of the present disclosure provides a face recognition method, where the method includes:
acquiring a face image to be recognized; the face image to be recognized is a face image with any posture;
and recognizing the face image to be recognized through a face recognition model to obtain a recognition result.
In yet another aspect, embodiments of the present disclosure provide a sample generation apparatus, the apparatus including:
the information acquisition module is used for acquiring preset object attribute information;
the image generation module is used for generating a synthetic face image of the virtual object according to the preset object attribute information;
the data processing module is used for performing data processing on the synthetic face image to obtain a target sample image, where the data processing includes at least one of: style migration processing toward a real image, and data cleaning processing.
In yet another aspect, an embodiment of the present disclosure provides a model training apparatus, including:
the sample acquisition module is used for acquiring a target sample image, where the target sample image is a synthetic face image that has undergone at least one of style migration processing toward a real image and data cleaning processing, and the synthetic face image is a face image of a virtual object generated according to preset object attribute information;
and the training module is used for training the initial face recognition model at least according to the target sample image to obtain the face recognition model.
In another aspect, an embodiment of the present disclosure provides a face recognition apparatus, where the apparatus includes:
the image acquisition module is used for acquiring a face image to be recognized; the face image to be recognized is a face image with any posture;
and the recognition module is used for recognizing the face image to be recognized through the face recognition model to obtain a recognition result.
In yet another aspect, the present disclosure provides a computer device, including a memory and a processor, where the memory stores a computer program executable on the processor, and the processor implements some or all of the steps of the above method when executing the program.
In yet another aspect, the disclosed embodiments provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements some or all of the steps of the above-described method.
In yet another aspect, the disclosed embodiments provide a computer program comprising computer readable code, which when run in a computer device, a processor in the computer device executes some or all of the steps for implementing the above method.
In yet another aspect, the disclosed embodiments provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program, which when read and executed by a computer, implements some or all of the steps of the above method.
In the embodiments of the present disclosure, preset object attribute information is acquired, a synthetic face image of a virtual object is generated according to the acquired information, and at least one of style migration processing toward a real image and data cleaning processing is then performed on the synthetic face image to obtain a target sample image. By presetting the object attribute information, a synthetic face image with specific object attribute features can be generated; by applying style migration and/or data cleaning to the generated image, a higher-quality synthetic face image is obtained and used as the target sample image for subsequently training related models, which improves the quality of the obtained target sample images.
In the embodiments of the present disclosure, a target sample image is acquired, where the target sample image is a synthetic face image that has undergone at least one of style migration processing toward a real image and data cleaning processing, and the synthetic face image is a face image of a virtual object generated according to preset object attribute information; an initial face recognition model is then trained at least according to the target sample image to obtain a face recognition model. Because the target sample image results from applying style migration and/or data cleaning to a generated synthetic face image, its quality is high, and a face recognition model trained with such high-quality samples achieves correspondingly higher recognition accuracy.
In the embodiments of the present disclosure, a face image to be recognized, in any pose, is acquired and recognized by the face recognition model to obtain a recognition result. Since the obtained face recognition model has high accuracy, recognition accuracy improves when the model is applied to large-pose face images such as profile and half-profile images.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the technical aspects of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic diagram illustrating an exemplary effect of performing a style migration process on an image according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart illustrating an implementation of a sample generation method according to an embodiment of the present disclosure;
fig. 3 is a schematic flow chart illustrating an implementation of a sample generation method according to an embodiment of the present disclosure;
fig. 4 is a schematic flow chart illustrating an implementation of a sample generation method according to an embodiment of the present disclosure;
fig. 5 is a schematic flow chart illustrating an implementation of a sample generation method according to an embodiment of the present disclosure;
fig. 6 is a schematic flow chart illustrating an implementation of a sample generation method according to an embodiment of the present disclosure;
fig. 7 is a schematic flow chart illustrating an implementation of a model training method according to an embodiment of the present disclosure;
fig. 8 is a schematic flow chart illustrating an implementation of a model training method according to an embodiment of the present disclosure;
fig. 9 is a schematic view of an implementation flow of a face recognition method according to an embodiment of the present disclosure;
fig. 10 is a schematic flow chart illustrating an implementation of a sample generation method according to an embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of a sample generation apparatus according to an embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure;
fig. 13 is a schematic structural diagram of a face recognition apparatus according to an embodiment of the present disclosure;
fig. 14 is a hardware entity diagram of a computer device according to an embodiment of the present disclosure.
Detailed Description
To make the purpose, technical solutions and advantages of the present disclosure clearer, the technical solutions of the present disclosure are further elaborated below with reference to the drawings and embodiments. The described embodiments should not be construed as limiting the present disclosure; all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
The terms "first/second/third" merely distinguish similar objects and do not denote a particular ordering. Where permissible, "first/second/third" may be interchanged in a particular order or sequence, so that the embodiments of the disclosure described herein can be practiced in orders other than those shown or described.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used herein is for the purpose of describing the disclosure only and is not intended to be limiting of the disclosure.
Before the embodiments of the present disclosure are described in further detail, the terms and expressions used in the embodiments are explained as follows.
1) Style migration model: a model that converts image data from one style to another; for example, real image data and synthetic image data can be regarded as two different styles of image data. As shown in fig. 1, a style migration model may migrate the oil-painting style of image b onto the real photograph a of a cat, yielding image c, the cat rendered in oil-painting style.
2) Large-pose face image: a face image of a non-frontal face, such as a profile or half-profile image.
In order to better understand the face recognition and image generation method provided by the embodiment of the present disclosure, a related method adopted in the related art is described below.
In the related art, some methods synchronously train a generative adversarial model while training the face recognition model, generating online the frontal face image corresponding to each profile face so that more frontal-profile paired data participates in training. However, such methods only improve recognition for two poses, frontal and profile, and are hard to generalize to multi-pose scenarios, so the recognition scenario is relatively limited. Moreover, because the generative adversarial model is trained synchronously with the recognition model, generated-image quality and recognition accuracy cannot both be optimized, leaving the adversarial model with low accuracy and the synthetic face images with poor quality. Other methods synchronously train a three-dimensional face model and a face recognition model: a face image is generated by the three-dimensional model and then fed into the recognition model for training. Since the three-dimensional face model is not trained independently, image quality and recognition accuracy again cannot both be optimized, so the three-dimensional model's accuracy is low and the synthetic face images are of poor quality; furthermore, because the recognition model only sees synthetic data during training, it is suited only to recognizing synthetic face images and performs poorly on real ones.
The embodiments of the present disclosure provide a sample generation method, a model training method and a face recognition method, which can be executed by a processor of a computer device. The computer device may be any device with sample generation, model training and face recognition capabilities, such as a notebook computer, tablet computer, desktop computer or server; when implemented as a server, it may be an independent physical server, or a server cluster or distributed system composed of multiple physical servers.
Fig. 2 is a schematic flow chart of an implementation of a sample generation method provided in the embodiment of the present disclosure, and as shown in fig. 2, the method includes S101 to S103.
S101, acquiring preset object attribute information.
In some embodiments, the preset object attribute information may be preset parameters used to generate preset object attribute features; for example, the preset object attribute features may be one or more different preset identity features, one or more different preset appearance features, and one or more different preset poses.
In some embodiments, the preset object attribute information may be the preset object attribute features themselves, for example, one or more different preset identity features, one or more different preset appearance features, and one or more different preset poses.
Here, the pose refers to a pose of a human face, for example, a side face, a front face, or a half side face, and the like.
In some embodiments, the appearance information includes texture information, expression information, and noise information. For example, when the object is a person, the texture information may represent color-related information of the person's face, such as skin color, hair color and pupil color; the expression information may characterize the facial expression, e.g., happy, calm or angry; the noise information may represent facial information other than texture and expression, for example the person's hairstyle or whether there is a mustache.
S102, generating a synthetic face image of the virtual object according to the preset object attribute information.
In some embodiments, when the preset object attribute information is a preset parameter, the computer device generates the corresponding preset object attribute features according to the preset parameter, and then generates synthetic face images with one or more different object attribute features for at least one virtual object. For example, face images in three different face poses may be generated for each of 3 different virtual objects, with each virtual object keeping the same expression, noise information and texture information.
In some embodiments, when the preset object attribute information is the preset object attribute feature, the computer device directly generates a synthetic face image of one or more different object attribute features corresponding to the at least one virtual object according to the preset object attribute feature.
Here, the virtual object may be an object that does not actually exist, for example, a virtual character having a virtual identity.
S103, performing data processing on the synthetic face image to obtain a target sample image; the data processing includes at least one of: style migration processing toward a real image, and data cleaning processing.
Here, the initial face recognition model may be Resnet50 or Resnet18, and the like, which is not limited by the embodiment of the present disclosure.
In some embodiments, the target sample image may be a synthetic face image obtained by performing either the style migration processing of the real image or the data cleaning processing on the synthetic face image. In other embodiments, the target sample image may be obtained by performing both; the order of the style migration processing of the real image and the data cleaning processing is not limited by the embodiments of the present disclosure.
Here, the style migration processing of the real image converts the style of the synthetic face image into the style of a real image, and the data cleaning processing screens out poor-quality images from the synthetic face images.
In the embodiments of the present disclosure, preset object attribute information is acquired, a synthetic face image of a virtual object is generated according to the acquired information, and at least one of style migration processing toward a real image and data cleaning processing is then performed on the synthetic face image to obtain a target sample image. By presetting the object attribute information, a synthetic face image with specific object attribute features can be generated; by applying style migration and/or data cleaning to the generated image, a higher-quality synthetic face image is obtained and used as the target sample image for subsequently training related models, which improves the quality of the obtained target sample images.
In some embodiments, as shown in fig. 3, S102 may be implemented by S1021 to S1022, which will be described with reference to fig. 3 as an example.
S1021, generating preset object attribute features according to preset parameters; the preset object attribute features include: at least one preset identity feature characterizing different virtual objects, at least one preset prior pose characterizing different prior poses, and at least one preset appearance feature characterizing different appearance information; the preset object attribute information includes the preset parameters.
In some embodiments, the preset parameters may include a first preset expected value, a first preset standard deviation, a second preset expected value, a second preset standard deviation and preset prior pose parameters. The computer device may generate one or more normally distributed preset identity features characterizing different virtual objects according to the first preset expected value and the first preset standard deviation; generate one or more normally distributed preset appearance features characterizing different appearance information according to the second preset expected value and the second preset standard deviation; and generate one or more different preset prior poses characterizing different prior poses according to the preset prior pose parameters. The first and second preset expected values may be the same or different, as may the first and second preset standard deviations; the embodiments of the present disclosure do not limit this.
In some embodiments, each preset identity feature is a 256-dimensional vector, each preset pose feature is a set of deflection angles, each preset expression feature is a 64-dimensional vector, and each preset texture feature is a 199-dimensional vector.
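As an illustrative sketch only, the preset features above could be drawn from normal distributions as follows (Python); the concrete expected values, standard deviations and variable names are assumptions, since the disclosure leaves them open:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# First/second preset expected values and standard deviations; the numbers
# here are placeholders, not values fixed by the disclosure.
mu_id, sigma_id = 0.0, 1.0
mu_app, sigma_app = 0.0, 1.0

identity = rng.normal(mu_id, sigma_id, size=256)     # 256-dimensional preset identity feature
expression = rng.normal(mu_app, sigma_app, size=64)  # 64-dimensional preset expression feature
texture = rng.normal(mu_app, sigma_app, size=199)    # 199-dimensional preset texture feature
```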
S1022, generating a synthetic face image according to the preset identity feature, the preset prior pose and the preset appearance feature.
Illustratively, given one preset identity feature characterizing virtual object i, three preset prior poses characterizing three different prior face poses, and an appearance feature characterizing one facial expression, the computer device may generate three synthetic face images corresponding to virtual object i; the three images share the same facial expression but differ in face pose, taking the three different prior face poses.
In the embodiments of the present disclosure, preset object attribute features are generated according to preset parameters, where the preset object attribute features include: at least one preset identity feature characterizing different virtual objects, at least one preset prior pose characterizing different prior poses, and at least one preset appearance feature characterizing different appearance information. A face image of the virtual object corresponding to each preset identity feature, under each prior pose and each item of appearance information, is then generated from these features. The pose, identity and appearance information of the generated synthetic face image are therefore controllable and accurate, so the generated image is of high quality; moreover, since the pose, identity and appearance information of the generated image are known, the lack of annotated face-image data is alleviated.
In some embodiments, the preset parameters include: a first prior deflection angle range, a second prior deflection angle range, and a third prior deflection angle range; based on this, as shown in fig. 4, S1021 can be realized through S201 to S202, and the description will be given by taking fig. 4 as an example.
S201, generating at least one set of deflection angles according to a first prior deflection angle range, a second prior deflection angle range and a third prior deflection angle range, where in each set the first angle value belongs to the first prior deflection angle range, the second angle value to the second, and the third angle value to the third; the preset parameters include the three prior deflection angle ranges.
S202, taking the at least one set of deflection angles as at least one different preset prior pose.
In the embodiments of the present disclosure, the first prior deflection angle range may characterize a prior pitch range, the second a prior yaw range, and the third a prior roll range; the face pose of a synthetic face image generated from these prior deflection angle ranges is controllable and accurate.
For example, the first prior deflection angle range may be pitch ∈ [-30, +40], the second yaw ∈ [-60, +70], and the third roll = 0.
here, the computer device may generate one or more sets of deflection angles by the three a priori deflection angle ranges, and the three angle values contained in each set of deflection angles belong to the three a priori deflection angle ranges, respectively.
In the embodiments of the present disclosure, the inventors found through experimental observation that the face pose of synthetic face images generated within these deflection angle ranges is controllable and accurate. Using these ranges as prior deflection angle ranges, one or more sets of deflection angles are generated and used as preset prior poses for generating synthetic face images; this yields synthetic face images with accurate, controllable face poses and improves the quality of the generated images.
In some embodiments, S1021 may be implemented by S301 to S302:
S301, randomly sampling the at least one preset identity feature to obtain a sampled identity feature, randomly sampling the at least one preset prior pose to obtain a sampled pose, and randomly sampling the at least one preset appearance feature to obtain a sampled appearance feature.
In some embodiments, the computer device may randomly sample one or more of the generated preset identity features as the sampled identity features, one or more of the generated preset prior poses as the sampled poses, and one or more of the generated preset appearance features as the sampled appearance features.
For example, before each generation of a synthetic face image, the computer device may sample one identity feature, one pose and one appearance feature from the generated presets, and then generate a synthetic face image from them; sampling stops once the number of generated synthetic face images reaches a preset image count, or the number of virtual objects they correspond to reaches a preset object count.
S302, generating a model through pre-trained data, and generating a synthetic face image of the virtual object corresponding to each sampling identity characteristic under each prior posture and each apparent information according to the sampling identity characteristic, the sampling posture and the sampling apparent characteristic.
In some embodiments, when multiple sampled identity features, poses and appearance features are drawn at a time, the computer device may input one sampled identity feature, one sampled pose and one sampled appearance feature per generation, producing the synthetic face image of the corresponding virtual object under that prior pose and that appearance information.
Here, the data generation model may be a DiscoFaceGAN model, or may be another face generation model, which is not limited in this disclosure.
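The sampling-and-generation loop of S301 to S302 might look like the sketch below; the generator's call signature is an assumption (a DiscoFaceGAN-style model is treated as a black box), and the counts are placeholders:

```python
def generate_synthetic_faces(generator, sample_identity, sample_pose,
                             sample_appearance, num_objects=3, images_per_object=3):
    # Repeat random sampling and generation until the preset number of
    # virtual objects and images per object is reached.
    images = []
    for obj_id in range(num_objects):
        identity = sample_identity()              # one virtual identity
        for _ in range(images_per_object):
            pose = sample_pose()                  # one prior pose
            appearance = sample_appearance()      # expression/texture/noise codes
            images.append((obj_id, generator(identity, pose, appearance)))
    return images
```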
In some embodiments, the data processing includes the style migration processing of the real image; on this basis, as shown in fig. 5, S103 may be implemented through S1031 to S1033, described below with reference to fig. 5.
S1031, performing keypoint detection processing on the synthetic face image to obtain the face keypoints of the synthetic face image.
In this embodiment, for each synthesized face image, the computer device may first perform key point detection processing on the synthesized face image by using a key point detection method to obtain the face key points of the synthesized face image.
In some embodiments, the computer device may perform a keypoint detection process on each of the synthesized face images by using Dlib to obtain 68 keypoints of the synthesized face image, and select 5 keypoints, namely a left eye center, a right eye center, a nose center, a left mouth corner and a right mouth corner, from the 68 keypoints as the face keypoints of the synthesized face image.
In some embodiments, the computer device may perform the key point detection processing on one synthesized face image at a time, or may perform the key point detection processing on multiple synthesized face images at a time, which is not limited in this disclosure.
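A sketch of this step with Dlib; the predictor file path is a placeholder, and mapping the "nose center" to the nose-tip landmark (index 30) is an assumption:

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # placeholder path

def five_face_keypoints(image):
    # Detect 68 landmarks, then keep 5: left-eye center, right-eye center,
    # nose, left mouth corner, right mouth corner.
    faces = detector(image, 1)
    if not faces:
        return None
    pts = np.array([(p.x, p.y) for p in predictor(image, faces[0]).parts()],
                   dtype=np.float32)
    left_eye = pts[36:42].mean(axis=0)   # landmarks 36-41 ring the left eye
    right_eye = pts[42:48].mean(axis=0)  # landmarks 42-47 ring the right eye
    return np.stack([left_eye, right_eye, pts[30], pts[48], pts[54]])
```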
In some embodiments, when the data processing includes only the style migration processing of the real image, or includes both processes with the style migration performed first, the synthetic face image in S1031 is the one generated in S102.
In some embodiments, when the data processing includes both processes and the data cleaning is performed first, the synthetic face image in S1031 is the synthetic face image after the data cleaning processing.
S1032, performing affine transformation processing and size adjustment processing on the synthetic face image according to the face keypoints to obtain a preprocessed sample image.
In the embodiment of the present disclosure, for each synthesized face image, the computer device may perform affine transformation processing and size adjustment processing on the synthesized face image according to the face key points of the synthesized face image, and use the processed image as a pre-processing sample image.
In some embodiments, for each synthesized face image, the computer device may perform affine transformation on the synthesized face image to obtain a transformed sample image, and then perform resizing on the transformed sample image, and use the resized transformed sample image as the preprocessed sample image.
Here, the size adjustment processing may be a scaling operation; for example, the computer device may scale the transformed sample image to 224 × 224.
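S1032 could be realized with OpenCV roughly as below; the 5-point template coordinates are illustrative, not taken from the disclosure:

```python
import cv2
import numpy as np

# Illustrative 5-point template in a 256x256 crop (left eye, right eye,
# nose, left mouth corner, right mouth corner).
TEMPLATE = np.float32([[89, 103], [167, 103], [128, 147], [96, 190], [160, 190]])

def align_and_resize(image, keypoints, out_size=224):
    # Affine transformation processing: map the detected keypoints onto the template.
    M, _ = cv2.estimateAffinePartial2D(np.float32(keypoints), TEMPLATE)
    warped = cv2.warpAffine(image, M, (256, 256))
    # Size adjustment processing: scale the transformed sample image to 224x224.
    return cv2.resize(warped, (out_size, out_size))
```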
S1033, performing the style migration processing of the real image on the preprocessed sample image through a style migration model, to obtain a target sample image having the image style of the real image.
Here, the style migration model may be CycleGAN, neural-style, or the like, and the embodiment of the present disclosure does not limit this.
Here, the style migration model may migrate the image style of the real face image to the synthesized face image, so that the synthesized face image after the style migration processing is closer to the real face image.
In the embodiment of the disclosure, through the style migration processing, the target sample image can be closer to the real face image, so that the quality of the obtained target sample image is improved.
In some embodiments, after S102 and before S1033, the obtained preprocessed sample images may further be used to train an initial style migration model to obtain the style migration model. The training method includes: performing keypoint detection processing, affine transformation processing and size adjustment processing on a first preset real image to obtain a preprocessed real image; and training the initial style migration model according to the preprocessed sample images and the preprocessed real images, where the style migration model migrates the image style of the preprocessed real images onto the preprocessed sample images.
Here, the first preset real image may be a frontal, profile or half-profile face image captured in a real scene, for example one collected in an indoor or outdoor environment. Following the same principle as S1031 to S1032 above, the computer device may apply the same keypoint detection, affine transformation and size adjustment to each first preset real image to obtain preprocessed real images, and then train the initial style migration model with the preprocessed sample images and these preprocessed real images to obtain the style migration model.
In some embodiments, when training the initial style migration model with the preprocessed sample images and the preprocessed real images, the computer device may proceed as follows. One or more preprocessed real images are input to the initial style migration model to generate one or more first images having the image style of the preprocessed sample images, and the model then generates, from each first image, a second image having the image style of the preprocessed real images. Each second image is compared with its corresponding preprocessed real image, and a first loss value is determined from the difference; each first image is compared with any one of the preprocessed sample images, and a second loss value is determined from the difference. Symmetrically, one or more preprocessed sample images are input to generate one or more third images having the image style of the preprocessed real images, from which the model generates fourth images having the image style of the preprocessed sample images; each fourth image is compared with its corresponding preprocessed sample image to determine a third loss value, and each third image is compared with any one of the preprocessed real images to determine a fourth loss value. The four loss values are summed, the loss sum is back-propagated to generate gradients, and the network parameters of the initial style migration model are adjusted; training then continues with the updated parameters under the same principle, until the loss value of some iteration is less than or equal to a preset value, or the difference between the loss values of two adjacent iterations is less than a preset difference, yielding the trained style migration model.
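The four loss terms described above match a standard CycleGAN-style objective. The PyTorch sketch below is one interpretation, with least-squares adversarial terms standing in for the "comparison with any one preprocessed image"; all module names are placeholders:

```python
import torch
import torch.nn.functional as F

def style_migration_losses(G_r2s, G_s2r, D_s, D_r, real, synth):
    # G_r2s maps real style -> sample style, G_s2r the reverse;
    # D_s / D_r are the corresponding discriminators.
    first = G_r2s(real)        # "first image": real rendered in sample style
    second = G_s2r(first)      # "second image": cycled back to real style
    loss1 = F.l1_loss(second, real)                      # first loss: cycle consistency
    pred_s = D_s(first)
    loss2 = F.mse_loss(pred_s, torch.ones_like(pred_s))  # second loss: first image vs. sample style

    third = G_s2r(synth)       # "third image": sample rendered in real style
    fourth = G_r2s(third)      # "fourth image": cycled back to sample style
    loss3 = F.l1_loss(fourth, synth)                     # third loss: cycle consistency
    pred_r = D_r(third)
    loss4 = F.mse_loss(pred_r, torch.ones_like(pred_r))  # fourth loss: third image vs. real style

    return loss1 + loss2 + loss3 + loss4                 # loss sum for backpropagation
```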
In the embodiment of the disclosure, the synthesized face image and the first preset real image are used for training the initial style migration model to obtain the style migration model, so that the style migration performance of the style migration model on the synthesized face image is better, and the image style of the synthesized face image after the style migration processing is closer to the real style.
In some embodiments, the data processing comprises a data cleansing process; based on this, as shown in fig. 6, S103 can be realized by S1034 to S1038, and the description will be given by taking fig. 6 as an example.
S1034, extracting the image features of each synthetic face image; the synthetic face images include multiple synthetic face images corresponding to each of several different virtual objects.
S1035, performing an averaging process on the image features of the plurality of synthesized face images corresponding to each virtual object to obtain an initial average feature corresponding to each virtual object.
Here, each virtual object corresponds to multiple synthetic face images, which differ at least in face pose.
In some embodiments, the computer device may extract image features of each synthetic face image through a feature extraction network, and for a plurality of different synthetic face images belonging to the same virtual object, the computer device may calculate average features of the image features of the different synthetic face images, so as to obtain an initial average feature corresponding to each virtual object.
Illustratively, the computer device may employ a feature extraction network to extract the image feature $F_i^j$ of each synthetic face image $D$, where $i$ denotes the virtual object to which the synthetic face image belongs and $j$ is the image index of $D$ among the multiple different images corresponding to virtual object $i$. The average feature $C_i$ of the image features of the multiple different synthetic face images belonging to virtual object $i$ can then be expressed by the following formula (1):

$$C_i = \frac{1}{n}\sum_{j=1}^{n} F_i^j \tag{1}$$

In formula (1), $n$ is the total number of synthetic face images belonging to virtual object $i$.
Here, for each virtual object, in a case where an average feature of image features of a plurality of synthetic face images corresponding to the virtual object is calculated, the computer device may take the average feature as an initial average feature of the virtual object. For example, in a case where the synthetic face image includes a plurality of different synthetic face images corresponding to 3 different virtual objects, the computer device may obtain initial average features corresponding to the 3 virtual objects.
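Formula (1) in code form, as a sketch; the array shapes are assumptions:

```python
import numpy as np

def initial_average_features(features, object_ids):
    # features: (N, d) image features of all synthetic face images;
    # object_ids: (N,) virtual-object index of each image.
    # Returns the initial average feature C_i of each virtual object i.
    return {i: features[object_ids == i].mean(axis=0)
            for i in np.unique(object_ids)}
```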
In some embodiments, when the data processing includes only the data cleaning processing, or includes both processes with the data cleaning performed first, the synthetic face image in S1034 is the one generated in S102.
In some embodiments, when the data processing includes both processes and the style migration processing of the real image is performed first, the synthetic face image in S1034 is the synthetic face image after the style migration processing.
S1036, performing image screening on the multiple synthetic face images corresponding to the current object according to the image features of each of those images, the initial similarity between each image feature and the initial average feature of the current object, and a preset value range, to obtain the current screening sample images; the current object is any one of the different virtual objects.
Here, for each virtual object, once its initial average feature is obtained, the computer device may calculate the initial similarity between the image feature of each of the object's synthetic face images and the initial average feature, and, according to these similarities and a preset value range, screen at least one synthetic face image out of the object's images as the current screening sample images.
For example, continuing the above example, the computer device may calculate the cosine similarity $\cos\_\mathrm{sim}(F_i^j, C_i)$ between the average feature $C_i$ and each image feature $F_i^j$, and, according to the cosine similarity corresponding to each synthetic face image belonging to virtual object $i$ and the preset value range, screen at least one synthetic face image out of the object's images as the current screening sample images, which can be expressed by the following formula (2):

$$D = \{F_i^x\} = \{F_i^j \mid t_1 \le \cos\_\mathrm{sim}(F_i^j, C_i) \le t_2\} \tag{2}$$

where $D$ denotes the current screening sample image set, $t_1$ is the lower limit of the preset value range, $t_2$ is its upper limit, and $x$ indexes the retained images.
Here, the preset numerical range may be set according to actual needs, and may be (0,0.8), or [0,0.8], for example, and the embodiment of the present disclosure does not limit this.
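S1036 then reduces to a threshold test on cosine similarity; a sketch, with the bounds of the preset value range as example values:

```python
import numpy as np

def screen_by_similarity(features, avg, t1=0.0, t2=0.8):
    # Cosine similarity between each image feature and the object's average
    # feature; keep images whose similarity lies in [t1, t2].
    sims = features @ avg / (np.linalg.norm(features, axis=1)
                             * np.linalg.norm(avg) + 1e-8)
    return (sims >= t1) & (sims <= t2)  # boolean mask of retained images
```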
S1037, when the preset value range does not meet a preset condition, updating the preset value range by a preset interval value and performing image screening on the current screening sample images according to the updated value range; this repeats until the updated value range meets the preset condition, at which point the sub-sample images corresponding to the current object are screened out.
In some embodiments, for each virtual object, after screening out the current screening sample images, the computer device may determine whether the preset value range meets the preset condition. If not, the sum of the lower limit of the preset value range and the preset interval value becomes the new lower limit, yielding an updated value range; image screening is then performed on the current screening sample images according to the updated range. This repeats until an updated value range meets the preset condition, and that range is used to screen out the sub-sample images corresponding to the virtual object.
Here, the preset condition may be that the number of updates of the value range reaches a preset count threshold, or that the lower limit of the value range equals a preset lower-limit threshold. The computer device can therefore check the condition by counting the updates of the preset value range, or by checking whether its lower limit equals the preset lower-limit threshold.
Here, the preset interval value may be set according to actual needs, for example, the preset interval value may be 0.05, and the like, which is not limited in the embodiment of the present disclosure.
Here, the preset number threshold and the preset lower limit threshold may also be set according to actual needs, for example, the preset number threshold may be 3, and the preset lower limit threshold may be 0.3, which is not limited in this disclosure.
Here, when the updated numerical range satisfies the preset condition, the sub-sample image corresponding to each virtual object is screened out, and the number of times of data cleaning can be controlled by the preset condition, so that the quality of the synthesized face image after data cleaning is controlled.
S1038, taking the sub-sample images corresponding to the different virtual objects as the target sample images.
In the embodiment of the present disclosure, after obtaining the sub-sample image corresponding to each virtual object in different virtual objects, the computer device may use all the sub-sample images corresponding to different virtual objects as the target sample image. For example, after obtaining the sub-sample image corresponding to each of the 3 different virtual objects, the computer device may use all the sub-sample images corresponding to the 3 virtual objects as the target sample image.
In the embodiments of the present disclosure, the synthetic face images are cleaned layer by layer, updating the value range multiple times and cleaning with each updated range, which improves the quality of the obtained target sample images.
In some embodiments, S1037 may be implemented through S401 to S403:
S401, when the preset value range does not meet the preset condition, updating the preset value range according to the preset interval value to obtain the current value range.
For example, when the lower limit of the preset value range is 0, the preset interval value is 0.05, and the preset condition is that the lower limit equals 0.3 or the number of updates reaches 3, the computer device determines that the preset value range does not meet the preset condition; it then sums the lower limit 0 and the interval value 0.05 to obtain the new lower limit 0.05, yielding a current value range whose lower limit is 0.05 and whose upper limit is that of the preset range. For instance, if the preset value range is (0, 0.8), the resulting current value range is (0.05, 0.8).
S402, when the preset value range does not meet the preset condition, determining the next average feature of the image features of the current screening sample images.
Here, when the computer device determines that the preset value range does not satisfy the preset condition, the computer device further determines an average feature of the image features of the currently screened sample image, and takes the average feature as a next average feature; the principle of calculating the average feature is the same as that of the above formula (1).
S403, screening out the next screening sample images according to the image features of each image in the current screening sample images, their similarity to the next average feature, and the current value range; when an obtained value range meets the preset condition, screening out the sub-sample images through that value range.
Here, after the next screening sample images are screened out and before the sub-sample images are screened through a value range meeting the preset condition, the computer device may determine the number of updates of the obtained value range, and judge that the obtained value range meets the preset condition when that number reaches the preset count threshold or the range's lower limit reaches the preset lower-limit threshold.
In the embodiments of the present disclosure, the computer device may compute the cosine similarity between the image feature of each image in the current screening sample images and the next average feature obtained above, perform image screening on the current screening sample images according to these similarities and the current value range, and take the selected synthetic face images as the next screening sample images. It then determines the average feature of the next screening sample images (the next average feature) and checks whether the current value range meets the preset condition: if not, the current value range is updated by the preset interval value to obtain the next value range; if the next value range meets the preset condition, the sub-sample images are screened out according to the image features of the next screening sample images, their similarity to the next average feature, and the next value range.
In the embodiments of the present disclosure, gradually narrowing the value range and cleaning the synthetic face images layer by layer with the progressively narrowed range reduces the probability that high-quality synthetic face images are removed by mistake; in summary, the above method improves the quality of the resulting target sample images.
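Putting S1037 (S401 to S403) together for one virtual object, a sketch of the layer-by-layer cleaning loop; the interval value 0.05 and the stopping rule follow the examples above, and the function name is hypothetical:

```python
import numpy as np

def clean_layer_by_layer(features, t1=0.0, t2=0.8, step=0.05, max_updates=3):
    # Screen with the current value range, recompute the average feature of
    # the surviving images, raise the lower limit, and re-screen until the
    # preset condition (here: the number of updates) is met.
    kept = np.ones(len(features), dtype=bool)
    for _ in range(max_updates + 1):  # initial screening plus max_updates rounds
        avg = features[kept].mean(axis=0)                 # (next) average feature
        sims = features @ avg / (np.linalg.norm(features, axis=1)
                                 * np.linalg.norm(avg) + 1e-8)
        kept &= (sims >= t1) & (sims <= t2)
        t1 += step                                        # updated value range
    return kept  # mask of the sub-sample images for this virtual object
```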
In some embodiments, S1036 may be implemented by S501 to S502:
S501, screening out, from the initial similarities, the target similarities that fall within the preset value range.
S502, taking the synthetic face images corresponding to the target similarities as the current screening sample images.
For example, according to the cosine similarity corresponding to the image feature of each synthetic face image belonging to virtual object i and the preset value range, the computer device may take as target similarities all similarity values that are greater than or equal to the lower limit and less than or equal to the upper limit of the preset value range, and take all synthetic face images corresponding to those target similarities as the current screening sample images.
In the embodiment of the disclosure, by the above method, a synthesized face image whose similarity is smaller than the lower limit of the numerical range can be removed as a noise image, and a synthesized face image whose similarity is greater than the upper limit can be removed as a sample image with too little variation, thereby improving the quality of the resulting target sample image.
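To make S303 and S501 to S502 concrete, the following is a minimal sketch of this layer-by-layer cleaning for a single virtual object. The cosine similarity over L2-normalized features follows the text; the concrete numerical range, interval value, count threshold, and lower-limit threshold are illustrative assumptions, as are all function and variable names.

```python
import numpy as np

def clean_images_for_object(features, low=0.3, high=0.9,
                            step=0.05, max_updates=5, low_limit=0.5):
    """Layer-by-layer cleaning of one virtual object's synthesized images.

    features: (N, D) array of image features for one virtual object.
    [low, high] is the numerical range; it is narrowed by `step` each
    round until the count or lower-limit condition is met. All numeric
    values here are illustrative assumptions.
    """
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    keep = np.arange(len(features))      # indices of the current screening sample images
    for update in range(max_updates + 1):
        if len(keep) == 0:
            break
        mean = feats[keep].mean(axis=0)  # (initial / next) average feature
        mean /= np.linalg.norm(mean)
        sims = feats[keep] @ mean        # cosine similarity to the average feature
        # keep images whose similarity falls inside the current numerical range:
        # below `low` -> noise image; above `high` -> too little variation
        keep = keep[(sims >= low) & (sims <= high)]
        # preset condition: update count reached, or lower limit reached
        if update == max_updates or low >= low_limit:
            break
        low += step                      # narrow the range by the interval value
        high -= step
    return keep                          # indices of the sub-sample images

# usage sketch:
# idx = clean_images_for_object(np.random.randn(100, 512).astype(np.float32))
```

Running this once per virtual object and pooling the surviving sub-sample images yields the target sample image set described above.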
The embodiment of the present disclosure further provides a model training method, by which an initial face recognition model can be trained with the obtained target sample image to obtain a face recognition model for face recognition; the model training method is described below with reference to fig. 7.
S601, obtaining a target sample image; the target sample image is a synthesized face image processed by at least one of style migration processing and data cleaning processing of a real image; the synthetic face image is a face image of a virtual object generated according to preset object attribute information.
S602, training the initial face recognition model at least according to the target sample image to obtain a face recognition model.
In some embodiments, the computer device may train the initial face recognition model according to the obtained target sample image to obtain the face recognition model, so that the model training efficiency may be improved.
In some embodiments, the computer device may obtain some preset real images having the same image style as the obtained target sample image, and train the initial face recognition model according to the obtained target sample image and the obtained preset real images to obtain a face recognition model; in this way, the trained face recognition model can achieve higher recognition accuracy on real face images.
According to the present disclosure, a target sample image is obtained, where the target sample image is a synthesized face image processed by at least one of style migration processing and data cleaning processing of a real image, and the synthesized face image is a face image of a virtual object generated according to preset object attribute information; an initial face recognition model is then trained at least according to the target sample image to obtain a face recognition model. In this way, since the target sample image is obtained by performing at least one of the style migration processing and the data cleaning processing on the generated synthesized face image, the quality of the target sample image is high; accordingly, the accuracy of a face recognition model trained with such high-quality target sample images is also high, which improves the recognition accuracy of the trained model.
In some embodiments, the above S602 may be implemented by training the initial face recognition model according to the target sample image and some preset real images having the same image style as the target sample image (hereinafter referred to as second preset real images), so as to obtain the face recognition model; the implementation steps are shown in fig. 8.
And S6021, respectively carrying out preprocessing related to pixel values on the target sample image and the second preset real image, and correspondingly obtaining a synthetic image to be trained and a real image to be trained.
In some embodiments, the second preset real image may be the first preset real image; in some embodiments, the second preset real image may be a real face image having the same image style as the first preset real image.
Here, the computer device may perform preprocessing related to pixel values on each target sample image and each second preset real image, respectively, and obtain a synthetic image to be trained corresponding to each target sample image and obtain a real image to be trained corresponding to each second preset real image.
For example, for each target sample image or each second preset real image, the computer device may divide each pixel value of the image by a preset pixel threshold, so that the value range of the pixels changes from [0, 255] to [0, Y], where Y is a positive integer less than 255; then subtract the preset mean of the corresponding channel from the R, G and B values of each pixel to obtain a difference for each channel; divide each channel's difference by that channel's standard deviation; and take the resulting image as the image to be trained.
For example, the preset pixel threshold may be 255, Y may be 1, the preset mean of the R channel may be 0.485, the mean of the G channel may be 0.456, the mean of the B channel may be 0.406, the standard deviation of the R channel may be 0.229, the standard deviation of the G channel may be 0.224, and the standard deviation of the B channel may be 0.225.
In some embodiments, for each target sample image or each second pre-set real image, the computer device may resize the image to a pre-set size before performing pre-processing on the image related to pixel values. Illustratively, the computer device may crop the size of the image to 224 x 224.
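As a minimal sketch of the pixel-value preprocessing in S6021, using the example values given above (pixel threshold 255, hence Y = 1, and the per-channel means and standard deviations); resizing to the preset size such as 224 x 224 is assumed to have been done beforehand and is omitted here, and the function name is illustrative.

```python
import numpy as np

# example values from the text, treated here as assumptions
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # R, G, B channel means
STD  = np.array([0.229, 0.224, 0.225], dtype=np.float32)  # R, G, B channel stds

def preprocess(image_uint8: np.ndarray) -> np.ndarray:
    """image_uint8: (H, W, 3) RGB image with pixel values in [0, 255].
    Returns the image to be trained, normalized per channel."""
    img = image_uint8.astype(np.float32) / 255.0   # [0, 255] -> [0, Y], Y = 1
    img = (img - MEAN) / STD                       # subtract mean, divide by std
    return img
```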
And S6022, inputting the synthetic image to be trained and the real image to be trained into the initial face recognition model, and respectively obtaining a first classification result of each synthetic image to be trained and a second classification result of each real image to be trained.
In some embodiments, the computer device may input a plurality of synthetic images to be trained and a plurality of real images to be trained into the initial face recognition model for training at a time.
For example, the computer device may input the plurality of synthetic images to be trained and the plurality of real images to be trained into the initial face recognition model at a ratio of 1:1 for training each time. For example, the computer device may input 178 synthetic images to be trained and 178 real images to be trained at a time into the initial face recognition model for training.
In some embodiments, the computer device may also input one synthetic image to be trained or one real image to be trained into the initial face recognition model each time, alternating between the two kinds of input continuously over multiple iterations.
Here, for each synthetic image to be trained or each real image to be trained, the computer device may first obtain a feature vector of the image through the feature extraction layer of the initial face recognition model, scale the feature vector to obtain a scaled feature, pass it through the fully connected layer of the initial face recognition model to obtain output features (logits), and then normalize the output features through the normalization layer of the initial face recognition model to obtain the category of the image and the probability that the image belongs to each category.
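A minimal sketch of this forward pass, written with PyTorch-style modules. The backbone standing in for the feature extraction layer, the feature dimension, the class count, and the scale factor are all assumptions; scaling is implemented here as L2 normalization times a constant, which is one common reading of "scaling the feature vector".

```python
import torch
import torch.nn as nn

class InitialFaceRecognitionModel(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim=512,
                 num_classes=1000, scale=64.0):
        super().__init__()
        self.backbone = backbone                     # feature extraction layer
        self.scale = scale                           # assumed scaling factor
        self.fc = nn.Linear(feat_dim, num_classes)   # fully connected layer

    def forward(self, x):
        feat = self.backbone(x)                                    # feature vector
        feat = nn.functional.normalize(feat, dim=1) * self.scale   # scaled feature
        logits = self.fc(feat)                                     # output features (logits)
        probs = logits.softmax(dim=1)                              # normalization layer
        return logits, probs                  # category = probs.argmax(dim=1)
```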
S6023, determining a loss value according to the first classification result and the first preset real category as well as the second classification result and the second preset real category, and updating the model parameters of the initial face recognition model according to the loss value until the face recognition model is obtained under the condition that the obtained loss value meets the preset loss condition.
In some embodiments, in the case that a plurality of synthetic images to be trained and a plurality of real images to be trained are input to the initial face recognition model together each time, so that a plurality of first classification results and a plurality of second classification results are obtained, the computer device may feed the first classification results, the second classification results, the first preset real categories corresponding to the first classification results, and the second preset real categories corresponding to the second classification results obtained this time into a cross-entropy loss function to obtain the cross-entropy loss for this iteration, back-propagate the cross-entropy loss to generate gradients, and adjust the model parameters of the initial face recognition model. Training then continues with the updated model parameters following the same procedure, until the obtained loss value meets the preset loss condition and the face recognition model is obtained.
In some embodiments, in the case that one synthetic image to be trained or one real image to be trained is input to the initial face recognition model each time, with the two kinds of input alternating continuously over multiple iterations, so that a plurality of first classification results and a plurality of second classification results are obtained, the computer device may likewise feed the obtained first classification results, second classification results, and their corresponding first and second preset real categories into the cross-entropy loss function to obtain the corresponding cross-entropy loss, back-propagate it to generate gradients, and adjust the model parameters of the initial face recognition model; training then continues with the updated model parameters in the same way until the obtained loss value meets the preset loss condition and the face recognition model is obtained.
In some embodiments, the preset loss condition may be that the cross entropy loss is less than or equal to a preset loss value; in some embodiments, the preset loss condition may be that a difference between two adjacent cross entropy losses is smaller than a preset loss difference.
In some embodiments, the computer device may employ an Adaptive Moment Estimation (Adam) optimizer to update the model parameters of the initial face recognition model. By updating the model parameters with the Adam optimizer, the parameter updates are unaffected by rescaling of the gradient, which can reduce the difficulty of tuning the model parameters.
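Combining S6022 and S6023, the following is a minimal sketch of the training loop under the assumptions above: a model with the two-output interface sketched earlier, synthetic and real batches mixed 1:1, a cross-entropy loss over both classification results, an Adam optimizer, and an illustrative threshold standing in for the preset loss condition.

```python
import torch

def train(model, synth_loader, real_loader, epochs=10, loss_eps=0.05):
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    ce = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for (xs, ys), (xr, yr) in zip(synth_loader, real_loader):
            # 1:1 mix of synthetic and real images to be trained
            x = torch.cat([xs, xr])
            y = torch.cat([ys, yr])          # first/second preset real categories
            logits, _ = model(x)
            loss = ce(logits, y)             # cross-entropy over both results
            opt.zero_grad()
            loss.backward()                  # back-propagate to generate gradients
            opt.step()                       # adjust the model parameters
            if loss.item() <= loss_eps:      # preset loss condition (assumed)
                return model
    return model
```

The alternating single-image variant differs only in how the batches are formed; the loss computation and parameter update are the same.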
Fig. 9 is a schematic view of an implementation flow of a face recognition method provided in the embodiment of the present disclosure, and as shown in fig. 9, the method includes S701 to S702.
S701, acquiring a face image to be recognized; the face image to be recognized is a face image with any posture.
Here, the face image of an arbitrary pose may be a front face image, a side face image, a half side face image, or the like.
In some embodiments, the computer device may acquire an image of the target object through its own image acquisition device to obtain a face image to be recognized; in some embodiments, the computer device may also acquire the face image to be recognized from an external device, which is not limited by the embodiments of the present disclosure.
And S702, identifying the face image to be identified through the face identification model to obtain an identification result.
In this embodiment, the computer device may recognize the face image to be recognized by using the face recognition model obtained by the training, so as to obtain recognition results such as the object identity corresponding to the face image.
In the embodiment of the disclosure, a face image to be recognized in any pose is obtained, and the face image to be recognized is recognized through the face recognition model to obtain a recognition result; since the obtained face recognition model has high precision, the recognition accuracy can be improved when it is used to recognize large-pose face images such as side-face images and half-side-face images.
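A minimal sketch of S701 to S702, assuming the trained model and the `preprocess` function sketched above; the identity-name lookup is purely illustrative.

```python
import torch

@torch.no_grad()
def recognize(model, image_uint8, id_names):
    # preprocess as in S6021, then add channel-first and batch dimensions
    x = torch.from_numpy(preprocess(image_uint8)).permute(2, 0, 1).unsqueeze(0)
    _, probs = model(x)                  # per-category probabilities
    cls = int(probs.argmax(dim=1))       # predicted category
    return id_names[cls], float(probs[0, cls])   # object identity, confidence
```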
The following describes an application of the sample generation method provided by the embodiment of the present disclosure in an actual scene.
S1, obtaining a pre-trained face generation model DiscoFaceGAN, initializing the identity input feature, expression input feature, texture input feature, and noise input feature as normal distributions, and initializing each pose input as a set of three-dimensional constants whose three-axis deflection angles satisfy pitch in [-30, +40], yaw in [-60, +70], and roll = 0.
Here, the identity input feature corresponds to the preset identity feature; the expression input feature, the texture input feature, and the noise input feature correspond to the preset apparent feature; and the pose input corresponds to the preset prior pose.
Here, "initializing the identity input feature, expression input feature, texture input feature, and noise input feature as normal distributions, and initializing each posture input as a set of three-dimensional constants belonging to the three-axis deflection angles pitch [ -30, +40], yaw [ -60, +70], and roll ═ 0" corresponds to the content parts of S1021 and S201 to S202 described above.
S2, obtaining at least one identity input feature, at least one pose input, at least one expression input feature, at least one texture input feature, and at least one noise input feature through random sampling.
And S3, inputting one identity input feature, one pose input, one expression input feature, one texture input feature, and one noise input feature into the DiscoFaceGAN model each time, and generating, through the DiscoFaceGAN model, a face image of the virtual character corresponding to that identity under the given face pose, expression, texture, and noise.
S4, when the number of generated face images has not reached the preset image number, or the number of virtual characters corresponding to the generated face images has not reached the preset object number, continuing the random sampling to obtain at least one identity input feature, at least one pose input, at least one expression input feature, at least one texture input feature, and at least one noise input feature, and inputting the sampled input features into the DiscoFaceGAN model to generate the corresponding face images.
And S5, when the number of generated face images reaches the preset image number, or the number of virtual characters corresponding to the generated face images reaches the preset object number, obtaining face images of each of the plurality of virtual characters under different face poses, facial expressions, face textures, and face noises.
For example, as shown in fig. 10, a plurality of face images of the same virtual character under different face poses, facial expressions, face textures, and face noises can be obtained by inputting the same identity input feature together with different pose inputs, expression input features, texture input features, and noise input features into the face generation model each time, and performing image generation multiple times.
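A minimal sketch of S1 to S5: identity, expression, texture, and noise inputs are sampled from normal distributions, poses are sampled uniformly from the stated deflection-angle ranges with roll fixed to 0, and `generate_face` is a hypothetical stand-in for the pre-trained DiscoFaceGAN forward pass (here it returns a placeholder image); the feature dimension and the preset image/object numbers are likewise assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
PITCH, YAW = (-30.0, 40.0), (-60.0, 70.0)   # prior deflection-angle ranges from S1

def sample_pose():
    """One set of three-dimensional pose constants (pitch, yaw, roll)."""
    return np.array([rng.uniform(*PITCH), rng.uniform(*YAW), 0.0])

def sample_feature(dim=128):
    """An input feature initialized as a normal distribution (dim is assumed)."""
    return rng.standard_normal(dim).astype(np.float32)

def generate_face(identity, expression, texture, noise, pose):
    """Hypothetical stand-in for the pre-trained DiscoFaceGAN forward pass."""
    return np.zeros((256, 256, 3), dtype=np.float32)   # placeholder image

num_objects, images_per_object = 10, 20   # preset object/image numbers (assumed)
dataset = []
for _ in range(num_objects):
    identity = sample_feature()           # one identity per virtual character
    for _ in range(images_per_object):    # S3/S4: vary everything but identity
        dataset.append(generate_face(identity,
                                     expression=sample_feature(),
                                     texture=sample_feature(),
                                     noise=sample_feature(),
                                     pose=sample_pose()))
```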
Fig. 11 is a schematic structural diagram of a sample generation apparatus according to an embodiment of the present disclosure, and as shown in fig. 11, the sample generation apparatus 600 includes: an information obtaining module 610, configured to obtain preset object attribute information; an image generating module 620, configured to generate a synthetic face image of a virtual object according to the preset object attribute information; a data processing module 630, configured to perform data processing on the synthesized face image to obtain a target sample image; the data processing comprises: at least one of a style migration process and a data cleansing process of the real image.
In some embodiments, the preset object attribute information includes: preset parameters; the image generating module 620 is further configured to generate a preset object attribute feature according to the preset parameters; the preset object attribute features include: at least one different preset identity characteristic for characterizing different virtual objects, at least one different preset prior pose for characterizing different prior poses, and at least one different preset appearance characteristic for characterizing different appearance information; and generating the synthesized face image according to the preset identity characteristic, the preset prior pose and the preset apparent characteristic.
In some embodiments, the preset parameters include: a first prior deflection angle range, a second prior deflection angle range, and a third prior deflection angle range; the image generating module 620 is further configured to generate at least one group of deflection angles according to the first prior deflection angle range, the second prior deflection angle range, and the third prior deflection angle range; a first angle value of each set of deflection angles belongs to the first prior deflection angle range, a second angle value belongs to the second prior deflection angle range, and a third angle value belongs to the third prior deflection angle range; and taking the at least one group of deflection angles as the at least one different preset prior pose.
In some embodiments, the image generating module 620 is further configured to perform random sampling on the at least one different preset identity feature to obtain a sampled identity feature, perform random sampling on the at least one different preset prior pose to obtain a sampled pose, and perform random sampling on the at least one different preset apparent feature to obtain a sampled apparent feature; and to generate, through a pre-trained data generation model, the synthetic face image of the virtual object corresponding to each sampled identity feature under each prior pose and each piece of apparent information, according to the sampled identity feature, the sampled pose, and the sampled apparent feature.
In some embodiments, in a case that the data processing includes style migration processing of the real image, the data processing module 630 is further configured to perform key point detection processing on the synthetic face image to obtain a face key point of the synthetic face image; performing affine transformation processing and size adjustment processing on the synthesized face image according to the face key points to obtain a preprocessed sample image; and carrying out style migration processing of a real image on the preprocessed sample image through a style migration model to obtain the target sample image with the image style of the real image.
In some embodiments, the data processing module 630 is further configured to, after the generating of the synthetic face image of the virtual object according to the preset object attribute information, and before performing, by using the style migration model, style migration processing on the real image on the preprocessed sample image to obtain the target sample image with the image style of the real image, perform the keypoint detection processing, the affine transformation processing, and the size adjustment processing on the first preset real image to obtain a preprocessed real image; training an initial style migration model according to the preprocessed sample image and the preprocessed real image to obtain the style migration model; and the style migration model is used for migrating the image style of the preprocessed real image to the preprocessed sample image.
In some embodiments, the synthetic face image comprises: a plurality of synthetic face images corresponding to different virtual objects respectively; in the case that the data processing includes the data cleansing processing, the data processing module 630 is further configured to extract an image feature of each of the synthetic face images; carrying out averaging processing on image features of a plurality of synthetic face images corresponding to each virtual object to obtain initial average features corresponding to each virtual object; according to the image characteristics of each synthesized face image in a plurality of synthesized face images corresponding to a current object, the initial similarity between the image characteristics of each synthesized face image and the initial average characteristics corresponding to the current object and a preset numerical range, carrying out image screening on the plurality of synthesized face images corresponding to the current object to obtain a current screening sample image; the current object is any one of the different virtual objects; under the condition that the preset value range does not meet the preset condition, updating the preset value range by adopting a preset interval value, and carrying out image screening on the currently screened sample image according to the updated value range until the sub-sample image corresponding to the current object is screened under the condition that the updated value range meets the preset condition; and taking the sub-sample images corresponding to the different virtual objects as the target sample image.
In some embodiments, the data processing module 630 is further configured to update the preset value range according to the preset interval value to obtain a current value range when the preset value range does not meet a preset condition; determining the next average feature of the image features of the current screening sample image under the condition that the preset numerical range does not meet the preset condition; and screening the next screening sample image according to the image characteristics of each screening sample image in the current screening sample image, the similarity between the next screening sample image and the next average characteristic and the current numerical range until the sub-sample image is screened through the numerical range meeting the preset condition under the condition that the obtained numerical range meets the preset condition.
In some embodiments, the data processing module 630 is further configured to filter out target similarities belonging to the preset value range from the initial similarities; and taking the synthesized face image corresponding to the target similarity as the current screening sample image.
In some embodiments, the data processing module 630 is further configured to, after the next screening sample image is screened according to the image feature of each screened sample image in the current screening sample image, the similarity with the next average feature, and the current numerical range, and before the sub-sample image is screened through the numerical range meeting the preset condition, determine the current number of updates corresponding to the obtained numerical range, and determine that the obtained numerical range meets the preset condition when the current number of updates reaches a preset count threshold or the lower limit of the obtained numerical range reaches a preset lower-limit threshold.
Fig. 12 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure, and as shown in fig. 12, a model training apparatus 700 includes: a sample acquiring module 710 for acquiring a target sample image; the target sample image is a synthesized face image processed by at least one of style migration processing and data cleaning processing of a real image; the synthetic face image is a face image of a virtual object generated according to preset object attribute information; and the training module 720 is configured to train the initial face recognition model at least according to the target sample image, so as to obtain a face recognition model.
In some embodiments, the training module 720 is further configured to obtain a second preset real image with the same image style as the target sample image; and training the initial face recognition model according to the target sample image and the second preset real image to obtain the face recognition model.
In some embodiments, the training module 720 is further configured to perform pixel value-related preprocessing on the target sample image and the second preset real image, respectively, and correspondingly obtain a synthetic image to be trained and a real image to be trained; inputting the synthetic image to be trained and the real image to be trained into the initial face recognition model, and respectively obtaining a first classification result of each synthetic image to be trained and a second classification result of each real image to be trained; determining a loss value according to the first classification result and the first preset real category as well as the second classification result and the second preset real category, and updating the model parameters of the initial face recognition model according to the loss value until the face recognition model is obtained under the condition that the obtained loss value meets a preset loss condition.
Fig. 13 is a schematic diagram of a structure of a face recognition apparatus according to an embodiment of the present disclosure, and as shown in fig. 13, the face recognition apparatus 800 includes: the image acquisition module 810 is used for acquiring a face image to be recognized; the face image to be recognized is a face image with any posture; and the recognition module 820 is configured to recognize the face image to be recognized through the face recognition model to obtain a recognition result.
The above descriptions of the apparatus embodiments are similar to the descriptions of the method embodiments and have similar beneficial effects. In some embodiments, functions of or modules included in the apparatuses provided in the embodiments of the present disclosure may be used to perform the methods described in the above method embodiments; for technical details not disclosed in the apparatus embodiments of the present disclosure, please refer to the description of the method embodiments of the present disclosure.
It should be noted that, in the embodiment of the present disclosure, if the method is implemented in the form of a software functional module and sold or used as a standalone product, the method may also be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the present disclosure are not limited to any specific hardware, software, or firmware, or any combination thereof.
The embodiment of the present disclosure provides a computer device, which includes a memory and a processor, where the memory stores a computer program that can be run on the processor, and the processor implements some or all of the steps of the above method when executing the program.
The disclosed embodiments provide a computer-readable storage medium having stored thereon a computer program that, when executed by a processor, performs some or all of the steps of the above-described method. The computer readable storage medium may be transitory or non-transitory.
The disclosed embodiments provide a computer program comprising computer readable code, where the computer readable code runs in a computer device, a processor in the computer device executes some or all of the steps for implementing the above method.
The disclosed embodiments provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program that, when read and executed by a computer, performs some or all of the steps of the above method. The computer program product may be embodied in hardware, software, or a combination thereof. In some embodiments, the computer program product is embodied in a computer storage medium, and in other embodiments, the computer program product is embodied in a software product, such as a Software Development Kit (SDK).
Here, it should be noted that: the foregoing descriptions of the various embodiments tend to emphasize the differences between them; for what is the same or similar among the embodiments, reference may be made to one another. The above description of the apparatus, storage medium, computer program, and computer program product embodiments is similar to the description of the method embodiments above, with similar advantageous effects. For technical details not disclosed in the embodiments of the disclosed apparatus, storage medium, computer program, and computer program product, reference is made to the description of the method embodiments of the present disclosure.
It should be noted that fig. 14 is a schematic hardware entity diagram of a computer device in an embodiment of the present disclosure, and as shown in fig. 14, the hardware entity of the computer device 900 includes: a processor 901, a communication interface 902, and a memory 903, wherein:
the processor 901 generally controls the overall operation of the computer device 900.
The communication interface 902 may enable the computer device to communicate with other terminals or servers via a network.
The Memory 903 is configured to store instructions and applications executable by the processor 901, and may also buffer data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 901 and modules in the computer apparatus 900, and may be implemented by a FLASH Memory (FLASH) or a Random Access Memory (RAM). Data may be transferred between the processor 901, the communication interface 902, and the memory 903 via the bus 904.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present disclosure, the sequence numbers of the above steps/processes do not mean the execution sequence, and the execution sequence of each step/process should be determined by the function and the inherent logic of the step/process, and should not constitute any limitation to the implementation process of the embodiments of the present disclosure. The above-mentioned serial numbers of the embodiments of the present disclosure are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Read Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the integrated unit of the present disclosure may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present disclosure. And the aforementioned storage medium includes: various media that can store program code, such as removable storage devices, ROMs, magnetic or optical disks, etc.
The above description is only an embodiment of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present disclosure, and all the changes or substitutions should be covered by the scope of the present disclosure.

Claims (18)

1. A method of generating a sample, comprising:
acquiring preset object attribute information;
generating a synthetic face image of the virtual object according to the preset object attribute information;
carrying out data processing on the synthesized face image to obtain a target sample image; the data processing comprises: at least one of a style migration process and a data cleansing process of the real image.
2. The method according to claim 1, wherein the preset object attribute information comprises: preset parameters; and the generating a synthetic face image of a virtual object according to the preset object attribute information comprises:
generating preset object attribute characteristics according to the preset parameters; the preset object attribute features include: at least one different preset identity characteristic for characterizing different virtual objects, at least one different preset prior pose for characterizing different prior poses, and at least one different preset appearance characteristic for characterizing different appearance information;
and generating the synthesized face image according to the preset identity characteristic, the preset prior posture and the preset apparent characteristic.
3. The method of claim 2, wherein the preset parameters comprise: a first prior deflection angle range, a second prior deflection angle range, and a third prior deflection angle range; the generating of the preset object attribute feature according to the preset parameter includes:
generating at least one group of deflection angles according to the first prior deflection angle range, the second prior deflection angle range and the third prior deflection angle range; a first angle value of each set of deflection angles belongs to the first a priori deflection angle range, a second angle value belongs to the second a priori deflection angle range, and a third angle value belongs to the third a priori deflection angle range;
and taking the at least one group of deflection angles as the at least one different preset prior pose.
4. The method according to claim 2 or 3, wherein the generating the synthetic face image according to the preset identity feature, the preset prior pose and the preset apparent feature comprises:
randomly sampling the at least one different preset identity characteristic to obtain a sampling identity characteristic, randomly sampling the at least one different preset prior pose to obtain a sampling pose, and randomly sampling the at least one different preset apparent characteristic to obtain a sampling apparent characteristic;
and generating the synthetic face image of the virtual object corresponding to each sampling identity feature under each prior pose and each piece of apparent information according to the sampling identity feature, the sampling pose and the sampling apparent feature through a pre-trained data generation model.
5. The method according to any one of claims 1 to 4, wherein in a case where the data processing includes a style migration processing of the real image, the data processing the synthetic face image to obtain a target sample image includes:
performing key point detection processing on the synthesized face image to obtain face key points of the synthesized face image;
performing affine transformation processing and size adjustment processing on the synthesized face image according to the face key points to obtain a preprocessed sample image;
and carrying out style migration processing of a real image on the preprocessed sample image through a style migration model to obtain the target sample image with the image style of the real image.
6. The method according to claim 5, wherein after the generating of the synthetic face image of the virtual object according to the preset object attribute information and before performing, by the style migration model, style migration processing of the real image on the preprocessed sample image to obtain the target sample image having an image style of the real image, the method further comprises:
performing the key point detection processing, the affine transformation processing and the size adjustment processing on a first preset real image to obtain a preprocessed real image;
training an initial style migration model according to the preprocessed sample image and the preprocessed real image to obtain the style migration model; and the style migration model is used for migrating the image style of the preprocessed real image to the preprocessed sample image.
7. The method of any one of claims 1 to 6, wherein the synthetic face image comprises: a plurality of synthetic face images respectively corresponding to different virtual objects; and in a case where the data processing comprises the data cleaning processing, the performing data processing on the synthesized face image to obtain a target sample image comprises:
extracting the image characteristics of each synthesized face image;
carrying out averaging processing on image features of a plurality of synthetic face images corresponding to each virtual object to obtain initial average features corresponding to each virtual object;
according to the image characteristics of each synthesized face image in a plurality of synthesized face images corresponding to a current object, the initial similarity between the image characteristics of each synthesized face image and the initial average characteristics corresponding to the current object and a preset numerical range, carrying out image screening on the plurality of synthesized face images corresponding to the current object to obtain a current screening sample image; the current object is any one of the different virtual objects;
under the condition that the preset value range does not meet the preset condition, updating the preset value range by adopting a preset interval value, and carrying out image screening on the currently screened sample image according to the updated value range until the sub-sample image corresponding to the current object is screened under the condition that the updated value range meets the preset condition;
and taking the sub-sample images corresponding to the different virtual objects as the target sample image.
8. The method according to claim 7, wherein the updating the preset value range by using a preset interval value when the preset value range does not satisfy a preset condition, and performing image screening on the currently screened sample image according to the updated value range until the screening of the sub-sample image corresponding to the current object when the updated value range satisfies the preset condition comprises:
under the condition that the preset value range does not meet the preset condition, updating the preset value range according to the preset interval value to obtain the current value range;
determining the next average feature of the image features of the current screening sample image under the condition that the preset numerical range does not meet the preset condition;
and screening the next screening sample image according to the image characteristics of each screening sample image in the current screening sample image, the similarity between the next screening sample image and the next average characteristic and the current numerical range until the sub-sample image is screened through the numerical range meeting the preset condition under the condition that the obtained numerical range meets the preset condition.
9. The method according to claim 7 or 8, wherein the performing image screening on the multiple synthesized face images corresponding to the current object, according to the image feature of each synthesized face image in the multiple synthesized face images corresponding to the current object, the initial similarity between the image feature of each synthesized face image and the initial average feature corresponding to the current object, and a preset value range, to obtain a current screening sample image, comprises:
screening out target similarity belonging to the preset numerical range from the initial similarity;
and taking the synthesized face image corresponding to the target similarity as the current screening sample image.
10. The method according to claim 8, wherein after the next screening sample image is screened according to the image feature of each screened sample image in the current screening sample image, the similarity with the next average feature, and the current value range, and before the sub-sample image is screened by the value range satisfying the preset condition, the method further comprises:
and determining the current updating times corresponding to the obtained numerical range, and determining that the obtained numerical range meets the preset condition under the condition that the current updating times are a preset number threshold or the lower limit value of the obtained numerical range reaches a preset lower limit threshold.
11. A method of model training, comprising:
acquiring a target sample image; the target sample image is a synthesized face image processed by at least one of style migration processing and data cleaning processing of a real image; the synthetic face image is a face image of a virtual object generated according to preset object attribute information;
and training an initial face recognition model at least according to the target sample image to obtain a face recognition model.
12. The method of claim 11, wherein training an initial face recognition model based at least on the target sample image to obtain a face recognition model comprises:
acquiring a second preset real image with the same image style as the target sample image;
and training the initial face recognition model according to the target sample image and the second preset real image to obtain the face recognition model.
13. The method of claim 12, wherein the training the initial face recognition model according to the target sample image and the second preset real image to obtain the face recognition model comprises:
respectively preprocessing the target sample image and the second preset real image related to pixel values to correspondingly obtain a synthetic image to be trained and a real image to be trained;
inputting the synthetic image to be trained and the real image to be trained into the initial face recognition model, and respectively obtaining a first classification result of each synthetic image to be trained and a second classification result of each real image to be trained;
determining a loss value according to the first classification result and the first preset real category as well as the second classification result and the second preset real category, and updating the model parameters of the initial face recognition model according to the loss value until the face recognition model is obtained under the condition that the obtained loss value meets a preset loss condition.
14. A face recognition method, comprising:
acquiring a face image to be recognized; the face image to be recognized is a face image with any posture;
and recognizing the face image to be recognized through a face recognition model trained by the model training method of any one of claims 11 to 13, to obtain a recognition result.
15. A sample generation device, comprising:
the information acquisition module is used for acquiring preset object attribute information;
the image generation module is used for generating a synthetic face image of the virtual object according to the preset object attribute information;
the data processing module is used for carrying out data processing on the synthesized face image to obtain a target sample image; the data processing comprises: at least one of a style migration process and a data cleansing process of the real image.
16. A model training apparatus, comprising:
the sample acquisition module is used for acquiring a target sample image; the target sample image is a synthesized face image processed by at least one of style migration processing and data cleaning processing of a real image; the synthetic face image is a face image of a virtual object generated according to preset object attribute information;
and the training module is used for training the initial face recognition model at least according to the target sample image to obtain the face recognition model.
17. A computer device comprising a memory and a processor, the memory storing a computer program operable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 14 when executing the program.
18. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 14.
CN202210575099.2A 2022-05-24 2022-05-24 Sample generation and model training method, device, equipment and storage medium Pending CN114972912A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210575099.2A CN114972912A (en) 2022-05-24 2022-05-24 Sample generation and model training method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114972912A true CN114972912A (en) 2022-08-30

Family

ID=82955232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210575099.2A Pending CN114972912A (en) 2022-05-24 2022-05-24 Sample generation and model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114972912A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912639A (en) * 2023-09-13 2023-10-20 腾讯科技(深圳)有限公司 Training method and device of image generation model, storage medium and electronic equipment
CN116912639B (en) * 2023-09-13 2024-02-09 腾讯科技(深圳)有限公司 Training method and device of image generation model, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination