CN111046757A - Training method and device for face portrait generation model and related equipment - Google Patents

Training method and device for face portrait generation model and related equipment

Info

Publication number
CN111046757A
Authority
CN
China
Prior art keywords
face
portrait
characteristic information
sample
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911182281.6A
Other languages
Chinese (zh)
Other versions
CN111046757B (en)
Inventor
王楠楠
李志锋
朱明瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Xidian University
Original Assignee
Tencent Technology Shenzhen Co Ltd
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd, Xidian University filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201911182281.6A priority Critical patent/CN111046757B/en
Publication of CN111046757A publication Critical patent/CN111046757A/en
Application granted granted Critical
Publication of CN111046757B publication Critical patent/CN111046757B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of the disclosure provide a training method and apparatus for a face portrait generation model, and related equipment. The method comprises the following steps: acquiring a first training sample set; processing the sample face photo and/or the sample face portrait respectively through a neural network model to obtain first feature information and/or second feature information; processing the sample face photo through the face portrait generation model to obtain a predicted face portrait and its predicted feature information; determining a target loss function according to the first feature information and/or the second feature information, the predicted feature information, the sample face portrait, and the predicted face portrait; and adjusting parameters of the face portrait generation model according to the target loss function. In the scheme provided by the embodiments of the disclosure, the first feature information and the second feature information output by the neural network model serve as supervision knowledge that assists the training of the face portrait generation model, so that the synthesis quality and the recognition rate of the trained face portrait generation model can be improved at the same time.

Description

Training method and device for face portrait generation model and related equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a training method and apparatus for a face portrait generation model, an electronic device, and a computer-readable medium.
Background
Face sketch portrait synthesis has wide application in both the criminal investigation field and the digital entertainment field.
In criminal investigation, public security departments maintain a citizen photo database and, combined with face recognition technology, use it to determine the identity of a criminal suspect. In practice, however, a photograph of the suspect is often difficult to obtain, whereas a sketch portrait of the suspect can be drawn by an artist in cooperation with witnesses and then used for subsequent face retrieval and recognition. Because of the great difference in texture between a sketch portrait and an ordinary face photograph, directly applying traditional face recognition methods rarely yields a satisfactory recognition result. Converting the photos in the citizen photo database into sketch portraits effectively reduces the texture difference between the two modalities and thereby improves the recognition rate.
In digital entertainment, people often like to convert a face photo into a face sketch portrait and use it as the avatar of their social account, and face sketch portrait generation technology can meet this need well.
However, the models adopted in the face sketch portrait synthesis schemes of the related art produce poor synthesis results, and synthesis quality and face recognition rate cannot both be achieved at the same time.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure.
Disclosure of Invention
The embodiments of the present disclosure provide a training method and apparatus for a face portrait generation model, and an electronic device, so as to overcome, at least to a certain extent, the defects of the related art in which the synthesis quality, the recognition rate, and the overall model effect are not good enough.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
The embodiment of the disclosure provides a training method for a face portrait generation model, which includes: acquiring a first training sample set, wherein the first training sample set comprises a sample face photo and a sample face portrait; processing the sample face photo and/or the sample face portrait respectively through a neural network model to obtain first feature information and/or second feature information; processing the sample face photo through the face portrait generation model to obtain a predicted face portrait and its predicted feature information; determining a target loss function according to the first feature information and/or the second feature information, the predicted feature information, the sample face portrait, and the predicted face portrait; and adjusting parameters of the face portrait generation model according to the target loss function to obtain a trained face portrait generation model.
The embodiment of the present disclosure provides a training apparatus for a face portrait generation model, including: a sample set acquisition module configured to acquire a first training sample set, the first training sample set including a sample face photo and a sample face portrait; a feature information determination module configured to process the sample face photo and/or the sample face portrait respectively through a neural network model to obtain first feature information and/or second feature information; a prediction result generation module configured to process the sample face photo through the face portrait generation model to obtain a predicted face portrait and its predicted feature information; a loss function generation module configured to determine a target loss function according to the first feature information and/or the second feature information, the predicted feature information, the sample face portrait, and the predicted face portrait; and a neural network training module configured to adjust parameters of the face portrait generation model according to the target loss function to obtain a trained face portrait generation model.
In an exemplary embodiment of the present disclosure, the feature information determination module may include a first feature determination unit and/or a second feature determination unit. The first feature determination unit is configured to process the sample face photo through the neural network model to obtain the output feature information of k layers of the neural network model as the first feature information, where k is an integer greater than 1. The second feature determination unit is configured to process the sample face portrait through the neural network model to obtain the output feature information of the k layers of the neural network model as the second feature information.
In an exemplary embodiment of the present disclosure, the prediction result generation module may include a predicted portrait generation unit and a predicted feature generation unit. The predicted portrait generation unit is configured to process the sample face photo through the face portrait generation model to obtain a predicted face portrait. The predicted feature generation unit is configured to determine the output feature information of 2k layers of the face portrait generation model as the predicted feature information.
In one exemplary embodiment of the present disclosure, the face portrait generation model includes k+1 sequentially connected convolution layers and k-1 sequentially connected transposed convolution layers. The prediction result generation module may include a convolution processing unit and a transposed convolution processing unit. The convolution processing unit is configured to process the sample face photo through the k+1 sequentially connected convolution layers to obtain k+1 pieces of predicted feature information respectively output by the k+1 sequentially connected convolution layers. The transposed convolution processing unit is configured to process, through the mth of the k-1 sequentially connected transposed convolution layers, the predicted feature information output by the layer preceding the mth transposed convolution layer, splice the processing result with the (k-m)th piece of predicted feature information, and perform convolution processing on the spliced result to obtain the predicted feature information output by the mth transposed convolution layer, where m is an integer greater than 0 and less than k.
In an exemplary embodiment of the present disclosure, the loss function generation module may include a first loss function unit, a second loss function unit, and a target loss function unit. The first loss function unit is configured to determine a first loss function based on the predicted feature information and the first feature information and/or the second feature information. The second loss function unit is configured to determine a second loss function based on the predicted face portrait and the sample face portrait. The target loss function unit is configured to perform a weighted summation of the first loss function and the second loss function to obtain the target loss function.
In an exemplary embodiment of the present disclosure, the training apparatus for the face portrait generation model further includes a convolution layer determination module. The convolution layer determination module is configured to determine the number and size parameters of the convolution kernels of the nth layer and the (2k+1-n)th layer among the 2k layers of the face portrait generation model according to the channel number and size information of the output information of the nth layer among the k layers of the neural network model, where n is an integer greater than 0 and less than or equal to k.
In an exemplary embodiment of the present disclosure, the neural network model is a deep convolutional network. The training device of the face portrait generating model also comprises a second training sample obtaining module and a second neural network training module. The second training sample acquisition module is configured to acquire a second training sample set, and the second training sample set comprises sample images with human faces and labeled human face images thereof. And the second neural network training module is configured to train the deep convolutional network through a second training sample set to obtain the deep convolutional network with the face recognition function.
In an exemplary embodiment of the present disclosure, the training device of the face portrait generation model further includes a face photo acquisition module and a face portrait generation module. The face photo acquisition module is configured to acquire a face photo to be processed. The face portrait generating module is configured to process the face photo to be processed through the trained face portrait generating model to obtain a target face portrait of the face photo to be processed.
In an exemplary embodiment of the present disclosure, the training apparatus for the face portrait generation model further includes a portrait library generation module, a to-be-recognized portrait acquisition module, and a portrait matching module. The portrait library generation module is configured to process each photo to be recognized in the photo library to be recognized through the trained face portrait generation model to obtain a portrait library to be recognized. The to-be-recognized portrait acquisition module is configured to acquire a portrait to be recognized. The portrait matching module is configured to match the portrait to be recognized against the portrait library to be recognized to obtain a target portrait, so as to determine a target photo in the photo library to be recognized according to the target portrait.
An embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for training a face representation generation model as described in the above embodiments.
The embodiment of the present disclosure provides a computer readable medium, on which a computer program is stored, where the computer program is executed by a processor to implement the training method for a face portrait generation model as described in the above embodiment.
In the technical solutions provided by some embodiments of the present disclosure, when the face portrait generation model is trained with the first training sample set, the sample face photo and/or the sample face portrait in the first training sample set are processed by a neural network model, yielding first feature information and/or second feature information that express the depth features of the sample face photo and the sample face portrait. The first feature information and/or the second feature information are then used as supervision knowledge to guide the predicted feature information output by the face portrait generation model, and a target loss function is determined from this guidance together with the sample face portrait and the predicted face portrait, so that the intermediate-state depth features of the face portrait generation model are guided. The parameters of the face portrait generation model are then adjusted according to the target loss function to obtain a trained face portrait generation model, so that the synthesis quality and the recognition rate of the face portrait generation model are improved at the same time.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
In the drawings:
FIG. 1 illustrates a schematic diagram of an exemplary system architecture 100 of a training method or apparatus for a face sketch generation model to which embodiments of the present disclosure may be applied;
FIG. 2 schematically illustrates a flow chart of a training method of a face portrait generation model according to one embodiment of the present disclosure;
FIG. 3 is a flowchart in an exemplary embodiment based on step S220 of FIG. 2;
FIG. 4 is a flowchart in an exemplary embodiment based on step S230 of FIG. 2;
FIG. 5 schematically illustrates a flow chart of a method of training a face representation generation model according to another embodiment of the present disclosure;
FIG. 6 is a flowchart in an exemplary embodiment based on step S240 of FIG. 2;
FIG. 7 schematically illustrates a flow chart of a method of training a face representation generation model according to yet another embodiment of the present disclosure;
FIG. 8 is a flowchart in an exemplary embodiment based on step S230 of FIG. 2;
FIG. 9 schematically illustrates a flow chart of a training method of a face portrait generation model according to yet another embodiment of the present disclosure;
FIG. 10 schematically illustrates a flow chart of a training method of a face portrait generation model according to yet another embodiment of the present disclosure;
FIG. 11 schematically illustrates a flow chart of a training method of a face portrait generation model according to yet another embodiment of the present disclosure;
FIG. 12 is a schematic diagram of the structure of the face representation generation model in FIG. 11;
FIG. 13 is a schematic diagram illustrating test results for different face portrait models;
FIG. 14 schematically illustrates a block diagram of a training apparatus for a face representation generation model according to an embodiment of the present disclosure;
FIG. 15 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
FIG. 1 is a schematic diagram of an exemplary system architecture 100 of a training method or apparatus for a face sketch generation model to which embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, portable computers, desktop computers, wearable devices, virtual reality devices, smart homes, and so forth.
The server 105 may be a server that provides various services. For example, the terminal device 103 (which may also be the terminal device 101 or 102) uploads the first training sample set to the server 105. The server 105 may obtain the first training sample set, which includes a sample face photo and a sample face portrait; process the sample face photo and/or the sample face portrait respectively through a neural network model to obtain first feature information and/or second feature information; process the sample face photo through the face portrait generation model to obtain a predicted face portrait and its predicted feature information; determine a target loss function according to the first feature information and/or the second feature information, the predicted feature information, the sample face portrait, and the predicted face portrait; and adjust parameters of the face portrait generation model according to the target loss function to obtain a trained face portrait generation model. The server then feeds the trained face portrait generation model back to the terminal device 103, and the terminal device 103 can execute face portrait generation tasks based on this model, thereby obtaining face portraits with high synthesis quality and a high recognition rate.
For another example, the server 105 may obtain a face photo to be processed, and process it through the trained face portrait generation model to obtain a target face portrait of the face photo to be processed. The server then feeds the target face portrait back to the terminal device 101 (or terminal device 102 or 103), so that the target face portrait corresponding to the face photo to be processed can be browsed based on the content displayed on the terminal device 101.
For another example, the server 105 may process each photo to be recognized in the photo library to be recognized through the trained face portrait generation model to obtain a portrait library to be recognized; acquire a portrait to be recognized; and match the portrait to be recognized against the portrait library to be recognized to obtain a target portrait, so as to determine a target photo in the photo library to be recognized according to the target portrait. The server then feeds the target photo back to the terminal device 101 (or terminal device 102 or 103), so that the target photo corresponding to the portrait to be recognized can be browsed based on the content displayed on the terminal device 101.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level technologies and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The face portrait synthesis schemes proposed in the related art emphasize only one of the synthesis quality and the recognition rate of the face sketch portrait, and cannot achieve both at the same time. In addition, the related-art solutions cannot cope with application scenarios with insufficient training data; in that case neither an ideal model effect nor good generalization capability can be obtained.
FIG. 2 schematically illustrates a flow chart of a training method of a face portrait generation model according to one embodiment of the present disclosure. The method provided by the embodiment of the present disclosure may be processed by any electronic device with computing processing capability, for example, the server 105 and/or the terminal devices 102 and 103 in the embodiment of fig. 1 described above, and in the following embodiment, the server 105 is taken as an execution subject for example, but the present disclosure is not limited thereto.
As shown in fig. 2, a training method for a face portrait generation model provided by an embodiment of the present disclosure may include the following steps.
In step S210, a first training sample set is obtained, where the first training sample set includes a sample face photograph and a sample face portrait.
In the embodiment of the present disclosure, the sample face photos and the sample face portraits are in one-to-one correspondence. The sample face photo may be, for example, a frontal photo of a real face, and the sample face portrait may be a frontal real sketch of the same face.
In step S220, the sample face picture and/or the sample face portrait are processed by the neural network model, respectively, to obtain first feature information and/or second feature information.
In the embodiment of the present disclosure, the neural network model may be a trained model having the function of extracting face feature information. The neural network model may include multiple layers, and each layer may output one piece of feature information: the shallow layers extract shallow feature information of the sample face photo and/or the sample face portrait, and the deep layers extract deep feature information of the sample face photo and/or the sample face portrait. For example, the sample face photo can be processed through the neural network model to obtain the first feature information output by each layer of the neural network model, where the first feature information output by each layer describes the depth feature information of the sample face photo at that layer; and the sample face portrait can be processed through the neural network model to obtain the second feature information output by each layer of the neural network model, where the second feature information output by each layer describes the depth feature information of the sample face portrait at that layer.
For another example, at least one layer of the neural network model may be selected as a feature-information output layer, and the neural network model processes the sample face photo and/or the sample face portrait respectively to obtain the first feature information and/or the second feature information output by each selected output layer.
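For illustration only, the following PyTorch-style sketch shows one common way to collect the output feature maps of selected layers of a pretrained network by registering forward hooks; the choice of network (a VGG-16 feature extractor), the layer indices, and the helper names are assumptions for the example and are not taken from the original disclosure.

    import torch
    import torchvision.models as models

    # Assumed supervising network: a pretrained VGG-style feature extractor
    # (assumes a torchvision version that accepts the weights string below).
    feature_net = models.vgg16(weights="IMAGENET1K_V1").features.eval()

    # Indices of the k layers chosen as feature-output layers (illustrative choice, k = 3).
    selected_layers = [3, 8, 15]

    def collect_features(net, image, layer_ids):
        """Run image through net and return the outputs of the layers in layer_ids."""
        feats, hooks = {}, []
        for idx in layer_ids:
            # Each hook stores the output of its layer under that layer's index.
            hooks.append(net[idx].register_forward_hook(
                lambda module, inp, out, idx=idx: feats.__setitem__(idx, out)))
        with torch.no_grad():
            net(image)
        for h in hooks:
            h.remove()
        return [feats[idx] for idx in layer_ids]

    # photo and portrait stand in for preprocessed 1x3xHxW sample tensors.
    photo = torch.randn(1, 3, 256, 256)
    portrait = torch.randn(1, 3, 256, 256)
    first_feature_info = collect_features(feature_net, photo, selected_layers)      # cf. step S221
    second_feature_info = collect_features(feature_net, portrait, selected_layers)  # cf. step S222

Here the first feature information and the second feature information are simply the outputs of the k selected layers for the photo and for the portrait, matching the description of steps S221 and S222 below.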
In step S230, the sample face photo is processed by the face portrait generation model to obtain a predicted face portrait and its predicted feature information.
In the embodiment of the present disclosure, the face portrait generation model may adopt a neural network model structure. For example, a U-shaped network may be adopted as the structure of the face portrait generation model. The U-shaped network comprises two parts: a feature extraction part and an up-sampling part. The feature information output by each layer of the feature extraction part is spliced with the output of the corresponding layer of the up-sampling part, and the spliced result serves as the output feature information of that layer of the up-sampling part. Other forms of model structure may also be adopted, and the present disclosure is not specifically limited in this regard.
In the disclosed embodiment, the predicted face portrait may be the final output of the face portrait generation model, and the predicted feature information may be the intermediate feature information output by the layers of the face portrait generation model.
In an exemplary embodiment, the number of pieces of predicted feature information may be the same as the total number of pieces of first feature information and/or second feature information. For example, when the first feature information includes 3 first features and the second feature information includes 3 second features, the predicted feature information includes 3 + 3 = 6 predicted features.
In step S240, a target loss function is determined based on the first feature information and/or the second feature information, the predicted feature information, the sample face representation, and the predicted face representation.
In the embodiment of the present disclosure, the first feature information and/or the second feature information describe the per-layer depth feature information of the sample face photo and/or the sample face portrait. The first feature information and/or the second feature information can be used as supervision knowledge to calculate a first loss function between themselves and the predicted feature information; the sample face portrait is used as supervision knowledge to calculate a second loss function between the sample face portrait and the predicted face portrait; and the target loss function is determined according to the first loss function and the second loss function.
In step S250, parameters of the face portrait generation model are adjusted according to the target loss function to obtain a trained face portrait generation model.
In the embodiment of the present disclosure, steps S220 to S250 may be looped to implement a training process of loop iteration.
According to the training method for the face portrait generation model provided by the embodiment of the present disclosure, when the face portrait generation model is trained with the first training sample set, the sample face photos and/or the sample face portraits in the first training sample set are respectively processed by a neural network model, yielding first feature information and/or second feature information that express the depth features of the sample face photos and the sample face portraits. The first feature information and/or the second feature information are used as supervision knowledge to guide the predicted feature information output by the face portrait generation model, and a target loss function is determined from this guidance together with the sample face portrait and the predicted face portrait, so that the intermediate-state depth features of the face portrait generation model are guided. The parameters of the face portrait generation model are then adjusted according to the target loss function to obtain a trained face portrait generation model, so that the synthesis quality and the recognition rate of the face portrait generation model are improved at the same time.
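To make the flow of steps S210 to S250 concrete, the following PyTorch-style sketch shows what one training iteration could look like. It is an assumption-laden illustration rather than code from the original disclosure: it reuses the hypothetical feature_net, selected_layers, and collect_features from the earlier sketch, assumes a generator module that returns both the predicted portrait and its list of predicted features, and assumes specific loss terms (mean squared error for the feature matching, L1 for the portrait reconstruction) that the patent does not fix here.

    import torch
    import torch.nn.functional as F

    # Assumed to exist from the surrounding sketches: feature_net, selected_layers,
    # collect_features, and a generator module returning (predicted_portrait, predicted_feats).
    optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4)  # assumed optimizer settings

    def train_step(photo, portrait):
        # Step S220: first/second feature information from the supervising network.
        with torch.no_grad():
            photo_feats = collect_features(feature_net, photo, selected_layers)
            portrait_feats = collect_features(feature_net, portrait, selected_layers)
        # Assumed pairing: deep portrait features supervise the early up-sampling layers,
        # so the portrait features are reversed to mirror the generator's layer order.
        gammas = photo_feats + portrait_feats[::-1]

        # Step S230: predicted face portrait and its 2k pieces of predicted feature information.
        predicted_portrait, predicted_feats = generator(photo)

        # Step S240: first loss (feature matching) plus second loss (portrait reconstruction).
        loss1 = sum(F.mse_loss(g, gamma) for g, gamma in zip(predicted_feats, gammas))
        loss2 = F.l1_loss(predicted_portrait, portrait)
        loss = loss1 + loss2  # target loss; weights could also be applied here

        # Step S250: adjust the parameters of the face portrait generation model.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()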
Fig. 3 is a flowchart based on step S220 of fig. 2 in an exemplary embodiment.
As shown in fig. 3, step S220 in the above-mentioned embodiment of fig. 2 may further include the following steps.
In step S221, the sample face picture is processed through the neural network model, so as to obtain output feature information of k layers in the neural network model, where k is an integer greater than 1, as the first feature information.
In embodiments of the present disclosure, the neural network model may include multiple layers. The k-layer in the neural network model may be selected as an output layer of the first feature information. For example, the neural network model may be a 5-layer model, and 3 layers thereof may be selected as output layers of the first feature information.
In step S222, the sample face portrait is processed through the neural network model to obtain the output feature information of the k layers of the neural network model as the second feature information.
In the embodiment of the present disclosure, the generation manner of the second characteristic information may be similar to the generation manner of the first characteristic information in step S221, and is not described herein again.
Fig. 4 is a flowchart in an exemplary embodiment based on step S230 of fig. 2.
As shown in fig. 4, step S230 in the above-mentioned embodiment of fig. 2 may further include the following steps.
In step S2311, the sample face photo is processed by the face portrait generation model to obtain a predicted face portrait.
In step S2312, the output feature information of 2k layers of the face portrait generation model is determined as the predicted feature information.
In the embodiment of the present disclosure, the face portrait generation model may include exactly 2k intermediate layers, or may include more than 2k intermediate layers; the present disclosure is not limited in this respect. When the total number of intermediate layers of the face portrait generation model is greater than 2k, 2k of them can be selected as output layers of the predicted feature information. For example, the 2k layers may be selected randomly or based on empirical data, which is not particularly limited by the present disclosure.
FIG. 5 schematically illustrates a flow chart of a training method of a face portrait generation model according to another embodiment of the present disclosure.
As shown in FIG. 5, the training method for generating a model of a face portrait based on the above embodiment further includes the following steps.
In step S510, the number and size parameters of the convolution kernels of the nth layer and the (2k+1-n)th layer among the 2k layers of the face portrait generation model are determined according to the channel number and size information of the output information of the nth layer among the k layers of the neural network model, where n is an integer greater than 0 and less than or equal to k.
In the embodiment of the disclosure, among the k layers of the neural network model, the first feature information and/or the second feature information output by each layer may have different channel numbers and sizes. The channel number and size of the feature information output by a layer are determined by the number and size parameters of the convolution kernels of that layer; the size parameters of a convolution kernel may include, for example, the kernel size and the stride. This step determines the number and size parameters of the convolution kernels of the corresponding layers among the 2k layers of the face portrait generation model according to the channel number and size information of the output of the nth layer among the k layers, which ensures that each piece of predicted feature information has the same channel number and size as the corresponding first feature information and/or second feature information.
In an exemplary embodiment, when k = 3, the number and size parameters of the convolution kernels of the 1st and 6th layers among the 6 layers of the face portrait generation model may be determined according to the channel number and size information of the output information of the 1st of the 3 layers of the neural network model (n = 1); the number and size parameters of the convolution kernels of the 2nd and 5th layers of the face portrait generation model may be determined according to the channel number and size information of the output information of the 2nd of the 3 layers of the neural network model (n = 2); and the number and size parameters of the convolution kernels of the 3rd and 4th layers of the face portrait generation model may be determined according to the channel number and size information of the output information of the 3rd of the 3 layers of the neural network model (n = 3).
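A purely hypothetical numeric illustration of this matching (the channel counts and spatial sizes below are assumptions chosen for the example, not values from the original disclosure):

    # Assumed outputs of the k = 3 selected layers of the supervising network,
    # given as (channels, height, width).
    supervising_layer_shapes = [(64, 128, 128), (128, 64, 64), (256, 32, 32)]

    # The 2k = 6 feature-output layers of the face portrait generation model are then
    # configured so that layer n and layer 2k+1-n reproduce the n-th shape above:
    generator_layer_shapes = supervising_layer_shapes + supervising_layer_shapes[::-1]
    # -> layers 1..6 output (64,128,128), (128,64,64), (256,32,32),
    #                       (256,32,32), (128,64,64), (64,128,128)

Choosing the convolution kernel counts and strides of the generator layers to produce exactly these shapes is what guarantees that each piece of predicted feature information can be compared directly with its supervising counterpart.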
Fig. 6 is a flowchart in an exemplary embodiment based on step S240 of fig. 2.
As shown in fig. 6, step S240 in the above embodiment of fig. 2 may further include the following steps.
In step S241, a first loss function is determined based on the predicted feature information, the first feature information, and/or the second feature information.
In the disclosed embodiment, the first loss function may be determined by equation (1). [Equation (1) is rendered as an image in the original publication and is not reproduced here.] In equation (1), L1 is the first loss function, x is the sample face photo, γj denotes the jth piece of first feature information and/or second feature information, and Gj(x) denotes the jth piece of predicted feature information.
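Purely as an illustrative reconstruction (the exact published form of equation (1) is not reproduced in this text, so the following is an assumption consistent with the surrounding definitions rather than the original formula), a feature-matching loss of this kind is commonly written in LaTeX as

    L_1 = \sum_{j=1}^{2k} \left\| \gamma_j - G_j(x) \right\|_2^2

i.e. the summed distance between each supervising feature γ_j and the corresponding predicted feature G_j(x); the published equation may use a different norm or normalization.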
In step S242, a second loss function is determined based on the predicted face portrait and the sample face portrait.
In the disclosed embodiment, the second loss function may be determined by equation (2). [Equation (2) is rendered as an image in the original publication and is not reproduced here.] In equation (2), L2 is the second loss function, x is the sample face photo, G(x) denotes the predicted face portrait generated from x, and y denotes the sample face portrait.
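Again as an illustrative reconstruction only (an assumption consistent with the surrounding definitions, not the published formula), a pixel-level reconstruction loss of this kind is often written as

    L_2 = \left\| G(x) - y \right\|_1

penalizing the per-pixel difference between the predicted face portrait G(x) and the sample face portrait y; a squared L2 norm is an equally common choice.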
In step S243, the first loss function and the second loss function are weighted and summed to obtain a target loss function.
In the embodiment of the present disclosure, the weights of the first loss function and the second loss function may be determined empirically, and the first loss function and the second loss function are weighted and summed based on these weights to obtain the target loss function. In a special case, the weights of the first loss function and the second loss function may both be 1; in this case, the target loss function is the ordinary sum of the first loss function and the second loss function.
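Written out (with λ1 and λ2 introduced here only as illustrative weight symbols; the original text does not name them), the target loss is

    L = \lambda_1 L_1 + \lambda_2 L_2

which reduces to the plain sum L = L_1 + L_2 of equation (3) below when both weights are 1.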
FIG. 7 schematically illustrates a flow chart of a training method of a face portrait generation model according to yet another embodiment of the present disclosure.
As shown in fig. 7, the training method for generating a model of a face portrait based on the above embodiment further includes the following steps.
In the embodiment of the disclosure, the neural network model is a visual geometry group network model. The VGG (Visual Geometry Group) network is composed of 5 groups of convolutional layers, 3 fully connected layers, and a softmax output layer; the groups are separated by max-pooling layers, and all hidden-layer activation units use the rectified linear unit (ReLU). The visual geometry group network model demonstrated that increasing the depth of a convolutional neural network and using small convolution kernels greatly benefit the final classification and recognition performance of the network.
In step S710, a second training sample set is obtained, where the second training sample set includes a sample image with a human face and an annotated face image thereof.
In the embodiment of the present disclosure, the annotated face image of each sample image with a face may be, for example, another image with the same face, and may also be, for example, a face identification code. The face identification code is used to distinguish different sample images.
In step S720, the visual geometry group network model is trained through the second training sample set, so as to obtain the visual geometry group network model with the face recognition function.
In the embodiment of the disclosure, the visual geometry group network model with the face recognition function can be used to extract feature information of the face in a face image. Face feature information of different depths can be obtained from the outputs of different layers of the visual geometry group network model.
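An illustrative sketch of steps S710 and S720 (the dataset object, identity count, and training schedule are assumptions for the example, not values specified in the original disclosure):

    import torch
    import torch.nn as nn
    import torchvision.models as models
    from torch.utils.data import DataLoader

    # Assumed second training sample set: face_dataset is a hypothetical Dataset yielding
    # (image, identity_id), where identity_id is the annotation distinguishing different faces.
    loader = DataLoader(face_dataset, batch_size=32, shuffle=True)

    vgg = models.vgg16(weights="IMAGENET1K_V1")
    num_identities = 1000  # assumed number of distinct annotated identities
    vgg.classifier[-1] = nn.Linear(vgg.classifier[-1].in_features, num_identities)

    optimizer = torch.optim.SGD(vgg.parameters(), lr=1e-3, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(10):  # assumed number of epochs
        for images, identity_ids in loader:
            loss = criterion(vgg(images), identity_ids)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # After fine-tuning, the convolutional part vgg.features can serve as the supervising
    # feature extractor (feature_net) used in the earlier sketches.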
Fig. 8 is a flowchart in an exemplary embodiment based on step S230 of fig. 2.
As shown in fig. 8, step S230 in the above-mentioned embodiment of fig. 2 may further include the following steps.
In the embodiment of the disclosure, the face portrait generation model comprises k +1 sequentially connected convolution layers and k-1 sequentially connected transposed convolution layers.
In step S2321, the sample face picture is processed by k +1 sequentially connected convolution layers, so as to obtain k +1 prediction feature information respectively output by the k +1 sequentially connected convolution layers.
In step S2322, the predicted feature information output by the layer preceding the mth transposed convolution layer is processed by the mth of the k-1 sequentially connected transposed convolution layers, the processing result is spliced with the (k-m)th piece of predicted feature information, and the spliced result is subjected to convolution processing to obtain the predicted feature information output by the mth transposed convolution layer, where m is an integer greater than 0 and less than k.
In the embodiment of the present disclosure, when k = 3, the face portrait generation model includes 4 sequentially connected convolution layers and 2 sequentially connected transposed convolution layers. The 1st (m = 1) of the 2 sequentially connected transposed convolution layers processes the predicted feature information output by the layer preceding it, the processing result is spliced with the 2nd piece of predicted feature information, and the spliced result is subjected to convolution processing to obtain the 5th piece of predicted feature information, output by the 1st transposed convolution layer.
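A minimal PyTorch-style sketch of such a generator for k = 3 is given below; the channel widths, kernel sizes, and activation choices are assumptions for illustration (the original disclosure does not fix them), and the module returns both the predicted portrait and the six pieces of predicted feature information.

    import torch
    import torch.nn as nn

    class PortraitGenerator(nn.Module):
        """Sketch for k = 3: four sequentially connected convolution layers followed by
        two transposed convolution layers with splicing (skip connections)."""
        def __init__(self):
            super().__init__()
            # k + 1 = 4 feature-extraction convolution layers (assumed widths).
            self.conv1 = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU())
            self.conv2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
            self.conv3 = nn.Sequential(nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU())
            self.conv4 = nn.Sequential(nn.Conv2d(256, 256, 3, stride=1, padding=1), nn.ReLU())
            # k - 1 = 2 transposed convolution layers; each output is spliced (concatenated)
            # with the (k - m)-th predicted feature and then convolved.
            self.up1 = nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1)
            self.fuse1 = nn.Sequential(nn.Conv2d(128 + 128, 128, 3, padding=1), nn.ReLU())
            self.up2 = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)
            self.fuse2 = nn.Sequential(nn.Conv2d(64 + 64, 64, 3, padding=1), nn.ReLU())
            # Final layer producing the 3-channel predicted face portrait (assumed).
            self.out = nn.Sequential(nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh())

        def forward(self, x):
            g1 = self.conv1(x)                                  # 1st predicted feature
            g2 = self.conv2(g1)                                 # 2nd
            g3 = self.conv3(g2)                                 # 3rd
            g4 = self.conv4(g3)                                 # 4th
            g5 = self.fuse1(torch.cat([self.up1(g4), g2], 1))   # m = 1: splice with the 2nd feature
            g6 = self.fuse2(torch.cat([self.up2(g5), g1], 1))   # m = 2: splice with the 1st feature
            return self.out(g6), [g1, g2, g3, g4, g5, g6]

    # Example usage: a 1x3x256x256 photo yields a 1x3x256x256 predicted portrait
    # plus the six pieces of predicted feature information.
    portrait, feats = PortraitGenerator()(torch.randn(1, 3, 256, 256))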
FIG. 9 schematically illustrates a flow chart of a training method of a face portrait generation model according to yet another embodiment of the present disclosure.
As shown in fig. 9, the training method for generating a face portrait according to this embodiment may include the following steps.
In step S910, a picture of a face to be processed is acquired.
The disclosed embodiments may be applied to digital entertainment scenarios, such as an application scenario that provides a user with a head portrait of a user in a social account that converts a face photograph into a face sketch representation.
In step S920, the face photo to be processed is processed through the trained face portrait generating model, so as to obtain a target face portrait of the face photo to be processed.
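Illustratively (this usage snippet assumes the hypothetical PortraitGenerator sketch above and a standard torchvision preprocessing pipeline; neither the preprocessing nor the file name comes from the original disclosure):

    import torch
    from PIL import Image
    import torchvision.transforms as T

    to_tensor = T.Compose([T.Resize((256, 256)), T.ToTensor()])
    photo = to_tensor(Image.open("face_photo_to_process.jpg")).unsqueeze(0)  # hypothetical file

    trained_generator = PortraitGenerator()  # assumed to already hold the trained parameters
    trained_generator.eval()
    with torch.no_grad():
        target_portrait, _ = trained_generator(photo)  # the target face portrait tensor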
FIG. 10 schematically illustrates a flow chart of a training method of a face portrait generation model according to yet another embodiment of the present disclosure.
As shown in fig. 10, the training method for generating a face portrait according to this embodiment may include the following steps.
In step S1010, each photo to be recognized in the photo library to be recognized is processed by the trained face portrait generation model to obtain a portrait library to be recognized.
The embodiment of the disclosure can be applied to application scenes in the criminal investigation field. For example, the photo library to be identified may be, for example, a citizenship photo library. The portrait database to be identified obtained by the step can be a citizen portrait database.
In step S1020, an image to be recognized is acquired.
In the embodiment of the disclosure, when the method is applied to an application scenario in the criminal investigation field, the portrait to be recognized may be, for example, a portrait of the criminal suspect obtained through the cooperation of an artist and a witness.
In step S1030, matching is performed in the portrait library to be recognized based on the portrait to be recognized, and a target portrait is obtained, so as to determine a target photo in the photo library to be recognized according to the target portrait.
In the embodiment of the disclosure, the similarity between the portrait to be recognized and each portrait in the portrait library to be recognized can be calculated, and the portrait with the maximum similarity to the portrait to be recognized is determined as the target portrait; the photo corresponding to the target portrait in the photo library to be recognized is then the target photo. When the method is applied to an application scenario in the criminal investigation field, the target photo may be the matched photo of the criminal suspect.
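For illustration only (the feature extractor and the similarity measure are assumptions; the original disclosure does not prescribe how the similarity is computed), the matching could be sketched as follows:

    import torch
    import torch.nn.functional as F

    def match_portrait(query_portrait, portrait_library, photo_library, embed):
        """Return the photo whose generated portrait is most similar to the query portrait.

        embed is a hypothetical function mapping a portrait tensor to a feature vector,
        for example the fine-tuned supervising network with its classification head removed.
        """
        q = F.normalize(embed(query_portrait), dim=-1)
        sims = [F.cosine_similarity(q, F.normalize(embed(p), dim=-1), dim=-1)
                for p in portrait_library]
        best = int(torch.argmax(torch.stack(sims)))
        return photo_library[best], float(sims[best])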
FIG. 11 schematically illustrates a flow chart of a training method of a face portrait generation model according to yet another embodiment of the present disclosure.
As shown in fig. 11, the training method for generating a face portrait according to this embodiment may include the following steps.
In step S1110, a first training sample set is obtained, where the first training sample set includes a sample face photograph and a sample face portrait.
In the embodiment of the present disclosure, the photo-portrait image pair in the sample database may be divided to obtain a first training sample set and a first testing sample set. The first set of training samples may comprise at least one sample face photograph-sample face portrait image pair and the first set of test samples may comprise at least one sample face photograph-sample face portrait image pair. The first set of test samples may be used to test a trained face sketch generation model.
In step S1120, the sample face photo and the sample face portrait are processed by the visual geometry group network model, respectively, to obtain first feature information and second feature information.
In the embodiment of the disclosure, k layers of the visual geometry group network model serve as output layers of the feature information, where k is an integer greater than 1. For example, the sample face photo may be processed by the visual geometry group network model to obtain k pieces of first feature information from the k layers; and the sample face portrait may be processed by the visual geometry group network model to obtain k pieces of second feature information from the k layers.
In step S1130, the sample face photo is processed by the face portrait generation model to obtain a predicted face portrait and its predicted feature information.
In the embodiment of the present disclosure, the face portrait generation model may be a U-shaped network, and the predicted feature information may be obtained from 2k layers of the U-shaped network.
FIG. 12 is a schematic diagram of the structure of the face portrait generation model in FIG. 11. As shown in FIG. 12, the face portrait generation model is a U-shaped network. The face portrait generation model 1210 includes 4 convolution layers, 1211 to 1214, and 2 transposed convolution layers, 1215 and 1216. The other convolution layers and transposed convolution layers included in the face portrait generation model do not relate to the specific technical scheme and are not described here again.
As shown in FIG. 12, the convolution layers 1211 to 1214 in the face portrait generation model 1210 output the predicted feature information G1(x) to G4(x), and the transposed convolution layers 1215 and 1216 output the predicted feature information G5(x) and G6(x), respectively. The first feature information includes γ1, γ2 and γ3, and the second feature information includes γ4, γ5 and γ6. It should be noted that three layers (k = 3) of the neural network model 1220 shown in FIG. 12 are used to output the first feature information or the second feature information, but the present disclosure does not particularly limit the total number of layers of the neural network model 1220; the total number of layers of the neural network model may be 3, 4, 5, or the like.
In an exemplary embodiment, the 2k layers (6 layers in FIG. 12) of the face portrait generation model 1210 that output the predicted feature information should be consistent, in channel number and size, with the output information of the k layers of the neural network model 1220 that output the first feature information or the second feature information. That is, the layers 1211, 1212, 1213, 1214, 1215 and 1216 of the face portrait generation model 1210 should be consistent with the channel numbers and sizes of the output information of the layers 1221, 1222, 1223, 1223, 1222 and 1221 of the neural network model 1220, respectively.
In step S1140, a first loss function is determined based on the predicted feature information, the first feature information, and the second feature information.
In the embodiment of the present disclosure, the first loss function may be calculated as shown in formula (1). The first loss function can measure the difference between the intermediate state depth feature of the face portrait generation model and the intermediate state depth feature of the visual geometry group network model.
In step S1150, a second loss function is determined based on the predicted face portrait and the sample face portrait.
In the embodiment of the present disclosure, the second loss function may be calculated as shown in formula (2). The second loss function measures the difference between the generated predicted face portrait and the sample face portrait.
In step S1160, the first loss function and the second loss function are summed to obtain the target loss function.
In the embodiment of the present disclosure, the target loss function may be calculated by the following formula:
L = L1 + L2    (3)
where L is the objective loss function.
In step S1170, parameters of the face portrait creation model are adjusted according to the target loss function to obtain a trained face portrait creation model.
In step S1180, the trained face portrait generation model is tested with the first test sample set to obtain test face portraits.
In step S1190, the synthesis quality and recognition rate of the test face portraits are evaluated.
FIG. 13 is a schematic diagram of test results of different face portrait generation models. As shown in FIG. 13, the first column shows the test face photos, and the second to seventh columns show, respectively, the test results of a deep image feature learning network (DGFL), a fully convolutional network (FCN), an image-to-image translation model (pix2pix), a generative adversarial network based on cycle-consistency loss (CycleGAN), a multi-scale adversarial network, and the face portrait generation model trained according to the present disclosure. As can be seen from FIG. 13, the face portrait generation model obtained by the training method of the present disclosure produces face portraits with more reasonable texture and better synthesis quality. Table 1 schematically shows the recognition rates of the different face portrait generation models.
TABLE 1
[Table 1, which lists the recognition rates of the different face portrait generation models, is rendered as an image in the original publication and is not reproduced here.]
As can be seen from Table 1, the training method for the face portrait generation model of the present disclosure uses the depth features extracted by the neural network model as supervision knowledge for training the face portrait generation model, so that the intermediate depth features of the face portrait generation network are more discriminative, thereby improving the recognition rate of the generated face portraits.
The training method for the face portrait generation model in the embodiment of the present disclosure overcomes the problem that, in the prior art, the lack of guidance from additional information leads to poor model effect and poor generalization capability when training data are insufficient; it can better learn the mapping relationship between face photos and face portraits, and further improves the synthesis quality of the generated face portraits.
According to the training method of the face portrait generation model in the embodiment of the present disclosure, the depth features extracted by the neural network model are used as supervisory knowledge for training the face portrait generation model, so that the intermediate-state depth features of the face portrait generation model have higher discriminability. This solves the problem that existing schemes cannot achieve both high synthesis quality and a high recognition rate for the generated face portrait: the synthesis quality of the face portrait is improved while sufficient identity information is retained, thereby improving the recognition rate.
The following describes embodiments of the apparatus of the present disclosure, which may be used to perform the above-mentioned training method for face portrait generation model of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the training method for generating a face portrait model described above in the present disclosure.
FIG. 14 schematically illustrates a block diagram of a training apparatus for a face sketch generation model according to an embodiment of the present disclosure.
Referring to fig. 14, an apparatus 1400 for training a face portrait generation model according to an embodiment of the present disclosure may include: a training sample acquisition module 1410, a feature information determination module 1420, a prediction result generation module 1430, a loss function generation module 1440, and a neural network training module 1450.
In the training apparatus 1400 of the face sketch generation model, the training sample obtaining module 1410 may be configured to obtain a first training sample set, where the first training sample set includes a sample face photograph and a sample face sketch.
The feature information determination module 1420 may be configured to process the sample face photograph and/or the sample face representation, respectively, via a neural network model to obtain the first feature information and/or the second feature information.
In an exemplary embodiment, the feature information determination module 1420 may include a first feature determination unit and/or a second feature determination unit. The first feature determination unit may be configured to process the sample face photograph through the neural network model to obtain output feature information of k layers in the neural network model as the first feature information, where k is an integer greater than 1. The second feature determination unit may be configured to process the sample face portrait through the neural network model to obtain output feature information of the k layers in the neural network model as the second feature information.
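By way of illustration, both units can share one multi-layer feature extractor built around the supervising network (a visual geometry group style network according to the method embodiments). In the sketch below the backbone, the layer indices, and k = 4 are assumptions, not the patent's actual selection.

```python
import torch
import torch.nn as nn

class MultiLayerFeatures(nn.Module):
    """Shared helper for both units: wraps a sequential supervising backbone
    (e.g. a VGG-style network already trained for face recognition) and returns
    the outputs of k chosen layers as the first or second feature information."""

    def __init__(self, backbone: nn.Sequential, layer_ids=(3, 8, 17, 26)):  # k = 4 assumed
        super().__init__()
        self.backbone = backbone
        self.layer_ids = set(layer_ids)

    @torch.no_grad()   # the supervising features are fixed targets during training
    def forward(self, x):
        feats = []
        for i, layer in enumerate(self.backbone):
            x = layer(x)
            if i in self.layer_ids:
                feats.append(x)
        return feats
```

With such a helper, extractor(sample_photo) yields the first feature information and extractor(sample_portrait) yields the second feature information.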
The prediction result generation module 1430 may be configured to process the sample face picture through the face sketch generation model to obtain a predicted face sketch and predicted feature information thereof.
In an exemplary embodiment, the prediction result generation module 1430 may include a predicted portrait generation unit and a predicted feature generation unit. The predicted portrait generation unit may be configured to process the sample face photograph through the face portrait generation model to obtain a predicted face portrait. The predicted feature generation unit may be configured to determine output feature information of the 2k layers in the face portrait generation model as the predicted feature information.
In an exemplary embodiment, the face portrait generation model includes k+1 sequentially connected convolutional layers and k-1 sequentially connected transposed convolutional layers. The prediction result generation module 1430 may include a convolution processing unit and a transposed convolution processing unit. The convolution processing unit may be configured to process the sample face photograph through the k+1 sequentially connected convolutional layers to obtain the k+1 pieces of prediction feature information respectively output by the k+1 sequentially connected convolutional layers. The transposed convolution processing unit may be configured to process, by the m-th transposed convolutional layer of the k-1 sequentially connected transposed convolutional layers, the prediction feature information output by the layer preceding the m-th transposed convolutional layer, splice the processing result with the (k-m)-th prediction feature information, and perform convolution processing on the spliced result to obtain the m-th prediction feature information output by the m-th transposed convolutional layer, where m is an integer greater than 0 and less than k.
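An illustrative PyTorch sketch of this layer arrangement follows. Here k = 4, the constant channel width, the specific strides and kernel sizes, and the final output convolution are all assumptions; the embodiment only fixes the number of layers and the splice-and-convolve rule.

```python
import torch
import torch.nn as nn

class PortraitGenerator(nn.Module):
    """Minimal sketch of the 2k-layer generator described above: k+1 convolutional
    layers encode the photo, and each of the k-1 transposed convolutional layers
    upsamples the previous output, splices it with the (k-m)-th prediction feature,
    and fuses the spliced result with a convolution."""

    def __init__(self, k=4, C=64):
        super().__init__()
        self.k = k
        # Strides chosen so the m-th decoder output matches the spatial size of
        # the (k-m)-th encoder feature; the embodiment does not specify strides.
        strides = [1] + [2] * (k - 1) + [1]
        in_chs = [3] + [C] * k
        self.enc = nn.ModuleList(
            nn.Sequential(nn.Conv2d(in_chs[i], C, 3, strides[i], 1), nn.ReLU(True))
            for i in range(k + 1))
        self.up = nn.ModuleList(
            nn.Sequential(nn.ConvTranspose2d(C, C, 4, 2, 1), nn.ReLU(True))
            for _ in range(k - 1))
        self.fuse = nn.ModuleList(
            nn.Conv2d(2 * C, C, 3, 1, 1) for _ in range(k - 1))
        self.to_portrait = nn.Conv2d(C, 3, 3, 1, 1)   # final mapping to a portrait (assumed)

    def forward(self, photo):
        feats = []                                # the k+1 encoder prediction features
        x = photo
        for conv in self.enc:
            x = conv(x)
            feats.append(x)
        for m in range(1, self.k):                # m-th transposed convolutional layer
            x = self.up[m - 1](x)
            skip = feats[self.k - m - 1]          # the (k-m)-th prediction feature
            x = self.fuse[m - 1](torch.cat([x, skip], dim=1))
            feats.append(x)                       # m-th decoder prediction feature
        return self.to_portrait(x), feats         # 2k prediction features in total
```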
The loss function generation module 1440 may be configured to determine a target loss function based on the first and/or second feature information, the predicted feature information, the sample face representation, and the predicted face representation.
In an exemplary embodiment, the loss function generation module 1440 may include a first loss function unit, a second loss function unit, and a target loss function unit. Wherein the first loss function unit may be configured to determine the first loss function based on the predicted characteristic information, the first characteristic information, and/or the second characteristic information. The second loss function unit may be configured to determine a second loss function based on the predicted face representation and the sample face representation. The target loss function unit may be configured to perform a weighted summation of the first loss function and the second loss function to obtain the target loss function.
The neural network training module 1450 may be configured to adjust parameters of the face representation generation model according to the target loss function to obtain a trained face representation generation model.
In an exemplary embodiment, the training apparatus 1400 for the face portrait generation model may further include a convolutional layer determination module. The convolutional layer determination module may be configured to determine the number and size parameters of the convolution kernels of the n-th layer and the (2k+1-n)-th layer among the 2k layers of the face portrait generation model according to the channel number and size information of the output information of the n-th layer among the k layers of the neural network model, where n is an integer greater than 0 and less than or equal to k.
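A small sketch of this rule follows (the supervising channel counts are placeholders): the n-th and the (2k+1-n)-th generator layers receive exactly as many kernels as the n-th supervising layer has output channels, so the corresponding depth features can be compared elementwise. This refines the constant channel width used in the earlier generator sketch.

```python
def generator_kernel_counts(sup_channels, k):
    # sup_channels[n-1] is the channel count of the n-th supervising layer's output.
    # The n-th and the (2k+1-n)-th generator layers are given that many kernels.
    counts = {}
    for n in range(1, k + 1):
        counts[n] = sup_channels[n - 1]
        counts[2 * k + 1 - n] = sup_channels[n - 1]
    return counts

# Placeholder supervising channel counts for k = 4:
# generator_kernel_counts([64, 128, 256, 512], 4)
# -> {1: 64, 8: 64, 2: 128, 7: 128, 3: 256, 6: 256, 4: 512, 5: 512}
```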
In an exemplary embodiment, the neural network model is a deep convolutional network. The training apparatus 1400 for generating a face portrait may further include a second training sample obtaining module and a second neural network training module. The second training sample acquisition module may be configured to acquire a second training sample set, where the second training sample set includes a sample image with a human face and an annotated human face image thereof. The second neural network training module can be configured to train the deep convolutional network through a second training sample set to obtain the deep convolutional network with the face recognition function.
In an exemplary embodiment, the training device 1400 for the face portrait generation model may further include a face photo acquisition module and a face portrait generation module. The face photo acquisition module can be configured to acquire a face photo to be processed. The face portrait generating module can be configured to process the face photo to be processed through the trained face portrait generating model to obtain a target face portrait of the face photo to be processed.
In an exemplary embodiment, the training apparatus 1400 for the face portrait generation model may further include a portrait library generation module, a to-be-recognized portrait acquisition module, and a portrait matching module. The portrait library generation module may be configured to process each photo to be recognized in a photo library to be recognized through the trained face portrait generation model to obtain a portrait library to be recognized. The to-be-recognized portrait acquisition module may be configured to acquire a portrait to be recognized. The portrait matching module may be configured to match in the portrait library to be recognized based on the portrait to be recognized to obtain a target portrait, and to determine a target photo in the photo library to be recognized according to the target portrait.
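For illustration, the portrait library workflow could look like the sketch below; the nearest-neighbour matching by Euclidean distance is an assumption, as the embodiment does not fix the matching criterion, and "generator" stands for any callable that maps a photo array to a portrait array.

```python
import numpy as np

def build_portrait_library(photo_library, generator):
    # Each photo to be recognized is converted into a portrait by the trained
    # face portrait generation model.
    return [generator(photo) for photo in photo_library]

def match_portrait(portrait_to_recognize, portrait_library, photo_library):
    # Hypothetical matcher: nearest neighbour by Euclidean distance between
    # flattened portraits.
    distances = [np.linalg.norm(np.ravel(portrait_to_recognize) - np.ravel(p))
                 for p in portrait_library]
    best = int(np.argmin(distances))
    return portrait_library[best], photo_library[best]
```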
When the training apparatus for the face portrait generation model provided by the embodiment of the present disclosure trains the face portrait generation model with the first training sample set, the neural network model processes the sample face photographs and/or the sample face portraits in the first training sample set to obtain first feature information and/or second feature information that express the depth features of the sample face photographs and the sample face portraits. The first feature information and/or the second feature information serve as supervisory knowledge to guide the predicted feature information output by the face portrait generation model, and a target loss function is determined from this guidance result together with the sample face portrait and the predicted face portrait, so that the intermediate-state depth features of the face portrait generation model are guided. The parameters of the face portrait generation model are then adjusted according to the target loss function to obtain the trained face portrait generation model, which improves the synthesis quality and the recognition rate of the face portrait generation model at the same time.
FIG. 15 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure. It should be noted that the computer system 1500 of the electronic device shown in fig. 15 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 15, the computer system 1500 includes a Central Processing Unit (CPU)1501 which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)1502 or a program loaded from a storage section 1508 into a Random Access Memory (RAM) 1503. In the RAM 1503, various programs and data necessary for system operation are also stored. The CPU1501, the ROM 1502, and the RAM 1503 are connected to each other by a bus 1504. An input/output (I/O) interface 1505 is also connected to bus 1504.
The following components are connected to the I/O interface 1505: an input portion 1506 including a keyboard, a mouse, and the like; an output portion 1507 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 1508 including a hard disk and the like; and a communication section 1509 including a network interface card such as a LAN card, a modem, or the like. The communication section 1509 performs communication processing via a network such as the internet. A drive 1510 is also connected to the I/O interface 1505 as needed. A removable medium 1511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1510 as necessary, so that a computer program read out therefrom is installed into the storage section 1508 as necessary.
In particular, the processes described above with reference to the flowcharts may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1509, and/or installed from the removable medium 1511. When the computer program is executed by the Central Processing Unit (CPU) 1501, various functions defined in the system of the present application are executed.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules and/or units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described modules and/or units may also be disposed in a processor. Wherein the names of such modules and/or units do not in some way constitute a limitation on the modules and/or units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments. For example, the electronic device may implement the steps shown in fig. 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

1. A training method for a face portrait generation model is characterized by comprising the following steps:
acquiring a first training sample set, wherein the first training sample set comprises a sample face photo and a sample face portrait;
processing the sample face picture and/or the sample face portrait respectively through a neural network model to obtain first characteristic information and/or second characteristic information;
processing the sample face picture through a face picture generation model to obtain a predicted face picture and predicted characteristic information thereof;
determining a target loss function according to the first characteristic information and/or the second characteristic information, the predicted characteristic information, the sample face portrait and the predicted face portrait;
and adjusting parameters of the face portrait generating model according to the target loss function so as to obtain the trained face portrait generating model.
2. The method of claim 1, wherein the processing the sample face photograph and/or the sample face representation, respectively, via a neural network model to obtain first feature information and/or second feature information comprises:
processing the sample face picture through the neural network model to obtain output characteristic information of k layers in the neural network model as the first characteristic information, wherein k is an integer larger than 1; and/or
processing the sample face portrait through the neural network model to obtain output characteristic information of the k layers in the neural network model as the second characteristic information.
3. The method of claim 2, wherein processing the sample face image via a face image generation model to obtain a predicted face image and predicted feature information thereof comprises:
processing the sample face picture through the face picture generation model to obtain a predicted face picture; and
and determining output characteristic information of the 2k layers in the face portrait generation model as the prediction characteristic information.
4. The method of claim 3, further comprising:
and determining the number and size parameters of the convolution kernels of the n-th layer and the (2k+1-n)-th layer among the 2k layers of the face portrait generation model according to the channel number and size information of the output information of the n-th layer among the k layers of the neural network model, wherein n is an integer which is greater than 0 and less than or equal to k.
5. The method of claim 3, wherein determining an objective loss function based on the first and/or second feature information, the predicted feature information, the sample face representation, and the predicted face representation comprises:
determining a first loss function according to the predicted characteristic information, the first characteristic information and/or the second characteristic information;
determining a second loss function from the predicted face representation and the sample face representation;
and carrying out weighted summation on the first loss function and the second loss function to obtain the target loss function.
6. The method of claim 3, wherein said face representation generation model comprises k +1 sequentially connected convolutional layers and k-1 sequentially connected transposed convolutional layers; processing the sample face picture through a face picture generation model to obtain a predicted face picture and predicted characteristic information thereof, wherein the method comprises the following steps:
processing the sample face photo through the k +1 sequentially connected convolution layers to obtain k +1 prediction characteristic information respectively output by the k +1 sequentially connected convolution layers;
processing, through the m-th transposed convolution layer in the k-1 sequentially connected transposed convolution layers, the prediction characteristic information output by the layer preceding the m-th transposed convolution layer, splicing the processing result with the (k-m)-th prediction characteristic information, and performing convolution processing on the splicing result to obtain the m-th prediction characteristic information output by the m-th transposed convolution layer, wherein m is an integer larger than 0 and smaller than k.
7. The method of claim 1, wherein the neural network model is a visual geometry group network model; wherein the method further comprises:
acquiring a second training sample set, wherein the second training sample set comprises a sample image with a human face and an annotated human face image thereof;
and training the visual geometry group network model through the second training sample set to obtain the visual geometry group network model with a face recognition function.
8. The method of claim 1, further comprising:
acquiring a picture of a face to be processed;
and processing the face photo to be processed through the trained face portrait generation model to obtain a target face portrait of the face photo to be processed.
9. The method of claim 1, further comprising:
processing each photo to be recognized in a photo library to be recognized through the trained face portrait generation model to obtain a portrait library to be recognized;
acquiring an image to be identified;
and matching the to-be-recognized portrait in the to-be-recognized portrait library based on the to-be-recognized portrait to obtain a target portrait so as to determine a target photo in the to-be-recognized photo library according to the target portrait.
10. An apparatus for training a face portrait model, comprising:
a training sample acquisition module configured to acquire a first training sample set, the first training sample set including a sample face photograph and a sample face portrait;
the characteristic information determining module is configured to process the sample face photo and/or the sample face portrait respectively through a neural network model to obtain first characteristic information and/or second characteristic information;
the prediction result generation module is configured to process the sample face picture through a face picture generation model to obtain a predicted face picture and prediction characteristic information thereof;
a loss function generation module configured to determine a target loss function from the first and/or second feature information, the predicted feature information, the sample face representation, and the predicted face representation;
and the neural network training module is configured to adjust parameters of the face portrait generating model according to the target loss function so as to obtain the trained face portrait generating model.
11. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.
12. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-9.
CN201911182281.6A 2019-11-27 2019-11-27 Training method and device for face portrait generation model and related equipment Active CN111046757B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911182281.6A CN111046757B (en) 2019-11-27 2019-11-27 Training method and device for face portrait generation model and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911182281.6A CN111046757B (en) 2019-11-27 2019-11-27 Training method and device for face portrait generation model and related equipment

Publications (2)

Publication Number Publication Date
CN111046757A true CN111046757A (en) 2020-04-21
CN111046757B CN111046757B (en) 2024-03-05

Family

ID=70233901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911182281.6A Active CN111046757B (en) 2019-11-27 2019-11-27 Training method and device for face portrait generation model and related equipment

Country Status (1)

Country Link
CN (1) CN111046757B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860782A (en) * 2020-07-15 2020-10-30 西安交通大学 Triple multi-scale CycleGAN, fundus fluorography generation method, computer device, and storage medium
CN113822790A (en) * 2021-06-03 2021-12-21 腾讯云计算(北京)有限责任公司 Image processing method, device, equipment and computer readable storage medium
CN114639001A (en) * 2022-04-22 2022-06-17 武汉中科通达高新技术股份有限公司 Training method and recognition method of face attribute recognition network and related equipment
CN116662598A (en) * 2023-07-27 2023-08-29 北京全景智联科技有限公司 Character portrait information management method based on vector index and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427939A (en) * 2018-03-30 2018-08-21 百度在线网络技术(北京)有限公司 model generating method and device
CN109145704A (en) * 2018-06-14 2019-01-04 西安电子科技大学 A kind of human face portrait recognition methods based on face character
CN109961407A (en) * 2019-02-12 2019-07-02 北京交通大学 Facial image restorative procedure based on face similitude
WO2019128646A1 (en) * 2017-12-28 2019-07-04 深圳励飞科技有限公司 Face detection method, method and device for training parameters of convolutional neural network, and medium
CN110069992A (en) * 2019-03-18 2019-07-30 西安电子科技大学 A kind of face image synthesis method, apparatus, electronic equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019128646A1 (en) * 2017-12-28 2019-07-04 深圳励飞科技有限公司 Face detection method, method and device for training parameters of convolutional neural network, and medium
CN108427939A (en) * 2018-03-30 2018-08-21 百度在线网络技术(北京)有限公司 model generating method and device
CN109145704A (en) * 2018-06-14 2019-01-04 西安电子科技大学 A kind of human face portrait recognition methods based on face character
CN109961407A (en) * 2019-02-12 2019-07-02 北京交通大学 Facial image restorative procedure based on face similitude
CN110069992A (en) * 2019-03-18 2019-07-30 西安电子科技大学 A kind of face image synthesis method, apparatus, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GAO Yan; ZHU Mingrui: "Face portrait synthesis algorithm based on ridge regression and nearest neighbor search", Electronic Science and Technology, no. 06 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860782A (en) * 2020-07-15 2020-10-30 西安交通大学 Triple multi-scale CycleGAN, fundus fluorography generation method, computer device, and storage medium
CN111860782B (en) * 2020-07-15 2022-04-22 西安交通大学 Triple multi-scale CycleGAN, fundus fluorography generation method, computer device, and storage medium
CN113822790A (en) * 2021-06-03 2021-12-21 腾讯云计算(北京)有限责任公司 Image processing method, device, equipment and computer readable storage medium
CN113822790B (en) * 2021-06-03 2023-04-21 腾讯云计算(北京)有限责任公司 Image processing method, device, equipment and computer readable storage medium
CN114639001A (en) * 2022-04-22 2022-06-17 武汉中科通达高新技术股份有限公司 Training method and recognition method of face attribute recognition network and related equipment
CN116662598A (en) * 2023-07-27 2023-08-29 北京全景智联科技有限公司 Character portrait information management method based on vector index and electronic equipment
CN116662598B (en) * 2023-07-27 2023-09-26 北京全景智联科技有限公司 Character portrait information management method based on vector index and electronic equipment

Also Published As

Publication number Publication date
CN111046757B (en) 2024-03-05

Similar Documents

Publication Publication Date Title
CN111930992B (en) Neural network training method and device and electronic equipment
CN113378784B (en) Training method of video label recommendation model and method for determining video label
US11663823B2 (en) Dual-modality relation networks for audio-visual event localization
CN111476871B (en) Method and device for generating video
CN111046757B (en) Training method and device for face portrait generation model and related equipment
US11727717B2 (en) Data-driven, photorealistic social face-trait encoding, prediction, and manipulation using deep neural networks
CN110119757A (en) Model training method, video category detection method, device, electronic equipment and computer-readable medium
CN109325148A (en) The method and apparatus for generating information
CN110929780A (en) Video classification model construction method, video classification device, video classification equipment and media
CN113191495A (en) Training method and device for hyper-resolution model and face recognition method and device, medium and electronic equipment
CN110852295B (en) Video behavior recognition method based on multitasking supervised learning
CN113177450A (en) Behavior recognition method and device, electronic equipment and storage medium
CN113572981A (en) Video dubbing method and device, electronic equipment and storage medium
JP2023535108A (en) Video tag recommendation model training method, video tag determination method, device, electronic device, storage medium and computer program therefor
CN112668638A (en) Image aesthetic quality evaluation and semantic recognition combined classification method and system
CN116935170A (en) Processing method and device of video processing model, computer equipment and storage medium
CN113515994A (en) Video feature extraction method, device, equipment and storage medium
WO2021147084A1 (en) Systems and methods for emotion recognition in user-generated video(ugv)
CN115223214A (en) Identification method of synthetic mouth-shaped face, model acquisition method, device and equipment
CN117455757A (en) Image processing method, device, equipment and storage medium
CN115082840B (en) Action video classification method and device based on data combination and channel correlation
CN113822324A (en) Image processing method and device based on multitask model and related equipment
CN116932788A (en) Cover image extraction method, device, equipment and computer storage medium
Wu et al. Saliency-guided convolution neural network–transformer fusion network for no-reference image quality assessment
CN118555423B (en) Target video generation method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TG01 Patent term adjustment