WO2022156061A1 - Image model training method and apparatus, electronic device, and storage medium - Google Patents

Image model training method and apparatus, electronic device, and storage medium

Info

Publication number: WO2022156061A1
Authority: WO (WIPO PCT)
Prior art keywords: image, training, module, quality, target
Application number: PCT/CN2021/082604
Other languages: English (en), French (fr)
Inventors: 陈丹, 陆进, 陈斌, 刘玉宇
Original Assignee: 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2022156061A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/40 Spoof detection, e.g. liveness detection
    • G06V 40/45 Detection of the body part being alive
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Definitions

  • The present application relates to the technical field of artificial intelligence, and in particular to an image model training method and apparatus, an electronic device, and a storage medium.
  • Image processing is a technology that uses computers to analyze images in order to achieve desired results; within the field of image processing technology, predicting the quality score of an image is a particularly important research topic.
  • The inventors realized that, as research on neural network models has advanced, predicting an image's quality score through a model has gradually gained wide acceptance. It follows that obtaining a well-performing model through model training is especially important for the accuracy of subsequent image quality score prediction.
  • the embodiments of the present application provide an image model training method, device, electronic device, and storage medium, which are beneficial to improving the accuracy of image quality scoring performed by an image quality scoring model.
  • An embodiment of the present application provides an image model training method, the method including:
  • acquiring an image processing model, where the image processing model includes: a backbone network, a quality module, and one or more processing modules, the processing modules being associated with the image service task corresponding to the quality module;
  • jointly training the quality module and the one or more processing modules based on an image training set; and determining an image quality scoring model according to the quality module obtained by the joint training, where the image quality scoring model is used to determine the quality score of the input image.
  • An embodiment of the present application provides an image model training apparatus, including:
  • an acquisition unit configured to acquire an image processing model, where the image processing model includes: a backbone network, a quality module, and one or more processing modules, the processing modules are associated with an image service task object corresponding to the quality module;
  • a processing unit configured to perform joint training on the quality module and the one or more processing modules based on an image training set
  • the processing unit is further configured to determine an image quality scoring model according to the quality module obtained by the joint training, where the image quality scoring model is used to determine the quality score of the input image.
  • An embodiment of the present application provides an electronic device including a processor, a storage device, and a communication interface, the processor, the storage device, and the communication interface being connected to one another, where the storage device is configured to store a computer program that supports a terminal in executing the above method.
  • The computer program includes program instructions, and the processor is configured to invoke the program instructions to perform the following steps: acquiring an image processing model, the image processing model including: a backbone network, a quality module, and one or more processing modules, the processing modules being associated with the image service task corresponding to the quality module; jointly training the quality module and the one or more processing modules based on an image training set; and determining an image quality scoring model according to the quality module obtained by the joint training.
  • An embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to perform the following method:
  • acquiring an image processing model, where the image processing model includes: a backbone network, a quality module, and one or more processing modules, the processing modules being associated with the image service task corresponding to the quality module;
  • jointly training the quality module and the one or more processing modules based on an image training set; and determining an image quality scoring model according to the quality module obtained by the joint training, where the image quality scoring model is used to determine the quality score of the input image.
  • In the embodiments of the present application, the image quality training of the quality module can be given auxiliary supervision by the processing modules associated with the image service task object, and the final image quality scoring model is obtained based on the jointly trained quality module, which helps improve the accuracy of the image quality scoring performed by the image quality scoring model.
  • FIG. 1 is a schematic structural diagram of an image processing model according to an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a backbone network according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an image model training method according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of another image model training method according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of another image processing model according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an image model training apparatus according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the technical solution of the present application relates to the field of artificial intelligence technology and can promote the construction of smart cities.
  • The data involved in this application, such as images and/or quality scores, may be stored in a database, or may be stored in a blockchain, which is not limited in this application.
  • At present, an image quality evaluation model is usually trained with training images annotated with quality scores, and the trained image quality evaluation model can then directly determine the quality score of an input image.
  • However, such training completely ignores the relationship between image quality evaluation and its corresponding image service task object, so that in practical applications the quality score that the image quality evaluation model determines for an input image has low accuracy.
  • Taking face recognition as the image service task object corresponding to the image quality evaluation model as an example: since many factors affect the quality of a face image, quality is a comprehensive notion that is difficult to characterize from any single aspect.
  • Accordingly, the quality score of a training image cannot be defined precisely, and the accuracy of an image quality evaluation model trained on inaccurately labeled training images is naturally greatly reduced.
  • To solve the above problems and increase the correlation between the image quality score and its corresponding image service task object, the embodiments of the present application propose an image processing model; the image processing model includes a backbone network, a quality module, and one or more processing modules, the processing modules being associated with the image service task object corresponding to the quality module.
  • Referring to FIG. 1, the image processing model may include a backbone network, a quality module, and m (m being an integer greater than 0, e.g., 1, 2, 3, etc.) processing modules.
  • The backbone network is mainly used to extract the image features of the input image to obtain an initial feature map, and to input the initial feature map into the quality module and each processing module.
  • Illustratively, the backbone network may include multiple residual block structures, and the network structure of each residual block may be as shown in FIG. 2.
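  • The patent does not pin down the backbone's exact layers; the following is a minimal, hypothetical PyTorch sketch of one residual block of the kind FIG. 2 depicts (channel widths and stacking depth are assumptions of this illustration, not the patent's specification):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One residual block; the backbone stacks several of these (cf. FIG. 2)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                      # skip connection around the two convolutions
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)  # residual addition
```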
  • Each processing module corresponds to one image service task object.
  • Assuming the image service task object is face recognition, the corresponding processing module may be a face recognition module, used to perform face recognition on an input face image to determine which user's face (i.e., the target user's face) appears in the input face image; the output may be the identification (e.g., ID) of the target user.
  • Illustratively, the network structure of the face recognition module may be: take the ResNext50 network, remove its final global average pooling layer (Average Pooling layer), and append two fully connected layers (fc). The first fully connected layer fc1 outputs a feature map (for ease of distinguishing the various feature maps mentioned in the embodiments of the present application, the feature maps here are collectively referred to as identification feature maps; an identification feature map may be, for example, a 512-dimensional feature), and the second fully connected layer fc2 outputs the scores of the input face image belonging to each category, where one category corresponds to one user and the category with the highest score is the target user to whom the face in the input face image belongs.
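  • As an illustrative sketch only: the layer surgery below follows the text (ResNext50 without its global average pooling layer, plus two fully connected heads), while the input resolution (224×224) and the flattened dimension are assumptions:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class FaceRecognitionModule(nn.Module):
    """ResNext50 trunk (avgpool/fc removed) + fc1 (512-d identification feature) + fc2 (per-user scores)."""
    def __init__(self, num_users: int, feat_dim: int = 512):
        super().__init__()
        resnext = models.resnext50_32x4d(weights=None)
        self.trunk = nn.Sequential(*list(resnext.children())[:-2])  # drop avgpool and fc
        self.fc1 = nn.Linear(2048 * 7 * 7, feat_dim)  # identification feature map, for 224x224 inputs
        self.fc2 = nn.Linear(feat_dim, num_users)     # classification scores; one category per user

    def forward(self, x: torch.Tensor):
        feat = self.trunk(x).flatten(1)
        ident = self.fc1(feat)    # stored in the feature pool for reuse by the quality module
        scores = self.fc2(ident)  # highest score -> target user ID
        return ident, scores
```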
  • Assuming the image service task object is living body recognition, the corresponding processing module may be a living body detection module, used to detect whether the object in the input image is a living body or a non-living body, where a living body can be understood as an object with vital signs and a non-living body is the opposite.
  • For example, a face image obtained by photographing a real person can be classified as a living body; a face image obtained by photographing a photo of a real person can be classified as a non-living body.
  • Illustratively, the network structure of the living body detection module may adopt the MobileNetV3-Small network structure.
  • For ease of distinction, the feature maps output by the penultimate convolutional layer of MobileNetV3-Small are collectively referred to as detection feature maps in the embodiments of this application; a detection feature map may be, for example, a 1024-dimensional feature.
  • The last convolutional layer of MobileNetV3-Small performs binary classification on the detection feature map input from the previous layer, determining the scores of the object in the input image belonging to the two categories, living and non-living; the category with the highest score is the living body category (living or non-living) to which the object belongs.
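  • A minimal sketch, assuming torchvision's MobileNetV3-Small as the convolutional body; the 1024-dimensional detection feature and the binary (living / non-living) classifier follow the text, while the pooling and projection details are assumptions:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class LivenessDetectionModule(nn.Module):
    def __init__(self):
        super().__init__()
        mnet = models.mobilenet_v3_small(weights=None)
        self.features = mnet.features         # convolutional body (last stage outputs 576 channels)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.embed = nn.Linear(576, 1024)      # detection feature map (1024-d)
        self.classifier = nn.Linear(1024, 2)   # scores for the two categories: living vs. non-living

    def forward(self, x: torch.Tensor):
        f = self.pool(self.features(x)).flatten(1)
        det = self.embed(f)                    # kept in the feature pool for reuse by the quality module
        return det, self.classifier(det)
```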
  • The quality module is used to evaluate the quality score of the input image, and its output may be the quality score of the input image.
  • Illustratively, the model structure of the quality module may be: depthwise separable convolutions from the lightweight MobileNet network (each consisting of a depthwise convolution and a pointwise convolution); after five depthwise separable convolution layers, the final feature map is obtained, followed by a global average pooling layer and a fully connected layer.
  • Finally, a Sigmoid function (the Sigmoid function is often used as the activation function of a neural network, mapping variables into the interval between 0 and 1) converts the output into a value between 0 and 1, which is the quality score of a single image.
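  • A sketch under assumed channel widths (the patent fixes only the building blocks: five depthwise separable convolution layers, global average pooling, one fully connected layer, then Sigmoid):

```python
import torch
import torch.nn as nn

def dw_separable(cin: int, cout: int, stride: int = 1) -> nn.Sequential:
    """Depthwise convolution followed by pointwise convolution, as in MobileNet."""
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, stride, 1, groups=cin, bias=False),  # depthwise
        nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
        nn.Conv2d(cin, cout, 1, bias=False),                        # pointwise
        nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
    )

class QualityModule(nn.Module):
    def __init__(self, cin: int = 256):
        super().__init__()
        self.body = nn.Sequential(                                  # five depthwise separable layers
            dw_separable(cin, 256), dw_separable(256, 256),
            dw_separable(256, 512, stride=2), dw_separable(512, 512),
            dw_separable(512, 512),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(512, 1), nn.Sigmoid(),                        # maps to (0, 1): the quality score
        )

    def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
        return self.head(self.body(feat_map)).squeeze(1)            # one quality score per image
```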
  • In one embodiment, after the above image processing model is obtained, the quality module and each processing module in the image processing model can be jointly trained based on an image training set, and an image quality scoring model can be determined according to the quality module obtained by the joint training; the image quality scoring model is used to determine the quality score of the input image.
  • Illustratively, the image quality scoring model may be composed of the modules in the dashed box in FIG. 1.
  • With this training approach, the image quality training of the quality module can be given auxiliary supervision during training by the tasks associated with the image service task object (such as face recognition and living body detection), and the final image quality scoring model is obtained based on the jointly trained quality module.
  • This allows the image quality scoring model, when performing image quality scoring, to increase the correlation between the image quality score and its corresponding image service task object, thereby improving the accuracy of the image quality scoring model.
  • FIG. 1 only schematically represents the model structure of the image processing model, and does not limit the model structure of the image processing model proposed in the embodiments of the present application.
  • In addition to the face recognition module and the living body detection module, the processing modules can be extended according to the specific image service task object.
  • For example, if the image service task object is automated driving, the processing modules may also include a path recognition module, an environment recognition module, and so on.
  • Based on the above model structure, the embodiments of the present application propose an image model training method, which can be executed by an electronic device; the electronic device here can be a server or a terminal, and the terminal may include, but is not limited to: tablet computers, laptop computers, notebook computers, desktop computers, and the like.
  • Referring to FIG. 3, the model training method may include the following steps S301-S303:
  • S301: Acquire an image processing model, where the image processing model includes: a backbone network, a quality module, and one or more processing modules, the processing modules being associated with the image service task object corresponding to the quality module.
  • In one embodiment, the image processing model may be pre-built; illustratively, it may be as shown in FIG. 1.
  • S302: Jointly train the quality module and the one or more processing modules based on the image training set.
  • S303: Determine an image quality scoring model according to the quality module obtained through the joint training, where the image quality scoring model is used to determine the quality score of the input image.
  • The image service task corresponding to the processing modules and the quality module can also be understood as the image service task corresponding to the final image quality scoring model; that is, when the image quality scoring model evaluates the quality score of any single image, the question is which image task this scored image will subsequently serve, such as the training of a face recognition model (i.e., face recognition) or the training of a living body detection model (i.e., living body recognition).
  • Using high-quality images for model training helps improve the accuracy of the trained model in subsequent use.
  • Therefore, a processing module associated with the image service task corresponding to the quality module can be combined to provide auxiliary supervision for the quality module, and an image quality scoring model can be constructed based on the quality module obtained by training.
  • In practical applications, the resulting image quality scores then better conform to the quality evaluation standards of the corresponding image service task (for example, the factors of greatest concern for image quality in face recognition include: saturation, sharpness, illumination, posture, occlusion, strong colors, exaggerated expressions, etc.), which helps improve the scoring accuracy of the image quality scoring model.
  • In the embodiments of the present application, an image processing model can be acquired, the image processing model including: a backbone network, a quality module, and one or more processing modules, the processing modules being associated with the image service task corresponding to the quality module; the quality module and the one or more processing modules are jointly trained based on an image training set; and an image quality scoring model is determined according to the quality module obtained by the joint training.
  • During training, the processing modules associated with the image service task object can provide auxiliary supervision for the image quality training of the quality module, and the final image quality scoring model obtained from the jointly trained quality module helps improve the accuracy of the image quality scoring performed by the image quality scoring model.
  • Referring to FIG. 4, the image model training method may include the following steps S401-S404:
  • S401: Acquire an image processing model, where the image processing model includes: a backbone network, a quality module, and one or more processing modules, the one or more processing modules including: a face recognition module and a living body detection module.
  • S402: Perform the first joint training on the face recognition module and the quality module according to the first training image set.
  • The first joint training of the face recognition module and the quality module according to the first training image set includes: acquiring the target face training images corresponding to a first target user from the first training image set, where the first target user is any one of multiple users. Further, the target face training images are input into the image processing model, the image features of the target face training images are extracted by the backbone network to obtain the first initial feature maps of the target face training images, and the first initial feature maps are input into the face recognition module and the quality module; the identification feature maps are extracted by the face recognition module, face recognition is performed on the target face training images according to the identification feature maps, and the value loss1 of the loss function of the face recognition module is determined according to the face recognition result.
  • Further, the quality module is invoked to determine the quality scores of the target face training images, the identification feature maps are weighted according to the quality scores to obtain a shared feature map, and the value loss2 of the loss function of the quality module is computed based on the shared feature map.
  • The network parameters of the modules other than the living body detection module in the image processing model are then updated in the direction of reducing a first target loss value,
  • where the first target loss value is the sum of loss1 and loss2.
  • The initial image processing model is iteratively trained according to the updated network parameters until loss1 and loss2 reach a convergence state, yielding a first image processing model. A sketch of one such training step follows.
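  • This is a hedged sketch only; `backbone`, `face_mod`, `quality_mod`, and the quality branch's classification head `fc_pool` are hypothetical module objects standing in for the components named above (`face_mod` here consumes the first initial feature maps and returns identification features plus per-user scores):

```python
import torch
import torch.nn.functional as F

def first_joint_training_step(backbone, face_mod, quality_mod, fc_pool,
                              images, user_id, optimizer):
    feat = backbone(images)                        # first initial feature maps
    ident, scores = face_mod(feat)                 # identification features + per-user scores
    targets = torch.full((len(images),), user_id, dtype=torch.long)
    loss1 = F.cross_entropy(scores, targets)       # face recognition branch (e.g., softmax loss)

    q = quality_mod(feat)                          # quality score in (0, 1) per image
    shared = (q.unsqueeze(1) * ident).sum(dim=0, keepdim=True)         # weighted shared feature map
    loss2 = F.cross_entropy(fc_pool(shared), torch.tensor([user_id]))  # quality branch

    total = loss1 + loss2                          # first target loss value
    optimizer.zero_grad()
    total.backward()                               # liveness module's parameters are not in `optimizer`
    optimizer.step()
    return total.item()
```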
  • The first training image set includes face training images corresponding to each of multiple users, and the face training images corresponding to each user include face training images of a first quality category and face training images of a second quality category.
  • In one embodiment, each user corresponds to one identification (e.g., ID), and the face training images corresponding to each user include L1 (L1 being an integer greater than 0) face images of different picture quality, where L1 is predetermined from experimental data.
  • For example, when preparing the first training image set, each ID may be required to contain more than eight pictures (i.e., more than 8 different pictures of the same person) of varying quality (each ID containing blurry, large-angle, and other poor-quality images as well as normal high-quality images).
  • Poor-quality images, such as blurry or large-angle ones, can all be classified as images of the first quality category; normal high-quality images are classified as images of the second quality category.
  • In one embodiment, during the first joint training of the face recognition module and the quality module according to the first training image set, each training iteration may input L1 face images of the same ID (i.e., the face training images corresponding to one user), as sketched below.
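  • A hypothetical sampler matching this arrangement (the names and the helper are assumptions made for illustration):

```python
import random

def sample_training_batch(images_by_id: dict, l1: int = 8):
    """Draw the L1 face images of a single ID, mixing poor- and high-quality pictures."""
    uid = random.choice(list(images_by_id))
    imgs = images_by_id[uid]
    assert len(imgs) > 8, "each ID must contain more than eight pictures"
    return uid, random.sample(imgs, l1)
```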
  • Illustratively, assume the image processing model is as shown in FIG. 5. The network structure of the face recognition module is: take the ResNext50 network, remove its final global average pooling layer (Average Pooling layer), and append two fully connected layers (fc); the first fully connected layer fc1 outputs the identification feature map (e.g., a 512-dimensional feature), and the second fully connected layer fc2 outputs the scores of the input face image belonging to each category.
  • The model structure of the quality module is: depthwise separable convolutions from the lightweight MobileNet network (each consisting of a depthwise convolution and a pointwise convolution); after five depthwise separable convolution layers, the final feature map is obtained, followed by a global average pooling layer and a fully connected layer; finally, a Sigmoid function (often used as a neural-network activation function that maps variables into the interval between 0 and 1) converts the output to a value between 0 and 1, i.e., the quality score of a single image. During the first joint training, only the face recognition module and the quality module are trained; the network parameters of the living body detection module are not updated.
  • Taking one training iteration as an example, the specific training process is as follows:
  • In one training iteration, all target face training images corresponding to the first target user are input into the image processing model; after feature extraction by the backbone network, the first initial feature maps of all target face training images are obtained as the input of the face recognition module and the quality module.
  • After passing through the face recognition module, the 512-dimensional feature maps of all target face training images (i.e., the above identification feature maps) are obtained and stored in a feature pool.
  • The 512-dimensional feature maps stored in the feature pool serve two purposes. One is to pass directly through the second fully connected layer fc2 of the face recognition module, where the first loss function corresponding to the face recognition module (e.g., softmax loss) is used for supervised training of the face recognition module (this step is the usual training process of a face recognition module, and may specifically be: performing face recognition on the target face training images according to the identification feature maps, and determining the value loss1 of the loss function of the face recognition module from the face recognition result via the first loss function). The other is that they are retained for reuse by the quality module.
  • All target face training images corresponding to the first target user input in one training iteration pass through the quality module to obtain the quality score of each target face training image, and the corresponding weighting coefficients are determined from these quality scores.
  • Since the value of the quality score lies between 0 and 1, the quality score can be used directly as the weighting coefficient.
  • Alternatively, the weighting coefficients may be reassigned on the principle that a higher quality score corresponds to a higher weighting coefficient; this is not specifically limited.
  • After the weighting coefficient of each target face training image is determined, the identification feature map of each target face training image can be weighted according to its weighting coefficient, thereby obtaining a shared feature map.
  • Illustratively, assume all target face training images corresponding to the first target user include: image 1, image 2, image 3, image 4, image 5, image 6, image 7, and image 8, and that the correspondence among the target face training images, weighting coefficients, and identification feature maps is as shown in Table 1 (denoting the identification feature map of image i as Pi); then the shared feature map is: (0.05*P1 + 0.1*P2 + 0.1*P3 + 0.5*P4 + 0.1*P5 + 0.05*P6 + 0.05*P7 + 0.05*P8).
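  • A worked rendering of that weighted sum (tensor contents are placeholders; the weights match Table 1):

```python
import torch

weights = torch.tensor([0.05, 0.10, 0.10, 0.50, 0.10, 0.05, 0.05, 0.05])  # quality-derived coefficients
P = torch.randn(8, 512)                     # identification feature maps P1..P8 (512-d each)
shared = (weights.unsqueeze(1) * P).sum(0)  # 0.05*P1 + 0.1*P2 + ... + 0.05*P8
```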
  • After the shared feature map is obtained, it can be input into the fully connected layer in the quality module, which performs classification on it to determine the score of the shared feature map belonging to each ID (or, equivalently, each user); the ID with the highest score is the final classification result.
  • The value loss2 of the current loss function of the quality module can then be computed via the second loss function corresponding to the quality module, from the difference between the classification result and the ID of the first target user. The total loss of the first joint training stage (i.e., the first target loss value) equals the sum of the losses of the two branches, the face recognition module and the quality module (i.e., loss1 + loss2).
  • In the first joint training stage, the gradient descent method can be used to optimize the total loss in the direction of reducing the first target loss value, updating the network parameters of the modules other than the living body detection module in the image processing model.
  • By analogy, in the same training manner, the face training images corresponding to other users are input to continue the first joint training of the face recognition module and the quality module and to optimize the total loss, until the losses of both branches (i.e., loss1 and loss2) reach a convergence state, at which point the first joint training is suspended.
  • S403: Perform the second joint training on the living body detection module and the quality module according to the second training image set.
  • After the first image processing model is obtained, the second joint training can be performed. The second training image set includes living body training images corresponding to each of multiple users, and the living body training images corresponding to each user include living body training images of a first living body category and living body training images of a second living body category.
  • Performing the second joint training on the living body detection module and the quality module according to the second training image set includes: acquiring the target living body training images corresponding to a second target user from the second training image set, where the second target user is any one of multiple users.
  • The target living body training images are input into the first image processing model, their image features are extracted through the backbone network of the first image processing model to obtain the second initial feature maps, and the second initial feature maps are input into the living body detection module and the quality module; the detection feature maps are extracted through the living body detection module, living body detection is performed on the target living body training images according to the detection feature maps, and the value loss3 of the loss function of the living body detection module is determined according to the living body detection result. The quality module in the first image processing model is invoked to determine the quality scores of the target living body training images, and the detection feature maps are weighted according to the quality scores to obtain the living body shared feature map corresponding to the target living body training images.
  • The value loss4 of the loss function of the quality module is computed based on the living body shared feature map, and the network parameters of the modules other than the backbone network and the face recognition module in the first image processing model are updated in the direction of reducing a second target loss value,
  • where the second target loss value is the sum of loss3 and loss4.
  • The first image processing model is iteratively trained according to the updated network parameters until loss3 and loss4 reach a convergence state, yielding a second image processing model.
  • The above first living body category and second living body category may refer to living body and non-living body, respectively.
  • The third loss function corresponding to the living body detection module may adopt the commonly used Softmax Loss or ArcFace loss, both of which are used here for binary classification.
  • The second loss function corresponding to the quality module may adopt the commonly used Softmax Loss or ArcFace loss, or the triplet loss function (Triplet Loss).
  • As one feasible approach, when the second loss function corresponding to the quality module is Softmax Loss or ArcFace loss, then, when preparing the training data (i.e., the second training image set), a living body data set (containing living body images and non-living body images) can be collected from video streams, with each user required to correspond to at least 4 living body images or 4 non-living body images.
  • During the second joint training, each training iteration may input the living body training images corresponding to the same user (e.g., 4 living body pictures plus 4 non-living body pictures of the second target user, denoted ID1).
  • As another feasible approach, when the second loss function corresponding to the quality module is Triplet loss, then, when preparing the training data (i.e., the second training image set), each user may still be required to correspond to at least 4 living body or 4 non-living body images.
  • During the second joint training, assuming each user corresponds to 4 living body and 4 non-living body images and taking one training iteration as an example, the images input in one iteration may include: two living body images and two non-living body images corresponding to the second target user as the Anchor images; the remaining 2 living body images and 2 non-living body images of the second target user as the Positive images; and four images of another user, denoted ID2 (two living body and two non-living body), as the Negative images.
  • Each user corresponds to one identification (e.g., ID).
  • The second joint training process is similar to the first joint training process described above.
  • When the second loss function corresponding to the quality module is Softmax Loss or ArcFace loss, only the living body detection module and the quality module are trained during the second joint training; the network parameters of the face recognition module and the backbone network are not updated.
  • Taking one training iteration as an example, the specific training process is as follows:
  • In one training iteration, all target living body training images corresponding to the second target user are input into the first image processing model (i.e., the image processing model after completion of the first joint training); after feature extraction by the backbone network, the second initial feature maps of all target living body training images are obtained as the input of the living body detection module and the quality module.
  • After passing through the living body detection module, the 1024-dimensional feature maps of all target living body training images (i.e., the above detection feature maps) are obtained and stored in the feature pool corresponding to the living body detection module.
  • The 1024-dimensional feature maps stored in this feature pool serve two purposes.
  • One is to pass directly through the living body detection module, where the third loss function corresponding to the living body detection module (e.g., softmax loss or ArcFace loss) is used for supervised training of the living body detection module (this step is the usual training process of a living body detection module, specifically: determining the value loss3 of the loss function of the living body detection module from the living body detection result via the third loss function). The other is that they are retained for reuse by the quality module.
  • All target living body training images corresponding to the second target user input in one training iteration pass through the quality module to obtain the quality score of each target living body training image, and the corresponding weighting coefficients are determined from these quality scores.
  • Since the value of the quality score lies between 0 and 1, the quality score can be used directly as the weighting coefficient.
  • Alternatively, the weighting coefficients may be reassigned on the principle that a higher quality score corresponds to a higher weighting coefficient; this is not specifically limited.
  • The detection feature map of each target living body training image can then be weighted according to its weighting coefficient, thereby obtaining a living body shared feature map.
  • After the living body shared feature map is obtained, it can be input into the fully connected layer in the quality module, which classifies it to determine the score of the living body shared feature map belonging to each ID (or, equivalently, each user); the ID with the highest score is the final classification result.
  • The value loss4 of the current loss function of the quality module can then be computed via the second loss function corresponding to the quality module, from the difference between the classification result and the ID of the second target user. The total loss of the second joint training stage (i.e., the second target loss value) equals the sum of the losses of the two branches, the living body detection module and the quality module (i.e., loss3 + loss4).
  • In the second joint training stage, the gradient descent method can be used to optimize the total loss in the direction of reducing the second target loss value, updating the network parameters of the modules other than the backbone network and the face recognition module in the image processing model.
  • By analogy, the living body training images corresponding to other users are input to continue the second joint training of the living body detection module and the quality module and to optimize the total loss, until the losses of both branches (i.e., loss3 and loss4) reach a convergence state, at which point the second joint training is suspended.
  • When the second loss function corresponding to the quality module is Triplet loss, taking the third loss function corresponding to the living body detection module being ArcFace loss as an example: the features in the feature pool corresponding to the living body detection module serve two purposes. One is to be used directly with ArcFace loss for supervised living body detection, classifying living and non-living bodies; the other is to use the quality scores as weighting coefficients to weight the Anchor, Positive, and Negative features in the feature pool, obtaining three weighted feature maps from which the Triplet loss is calculated. The total loss of this stage equals the sum of the two parts of the loss.
  • During this stage, the parameters of the Backbone part and of the recognition branch are frozen, and the gradient descent method is used for training to optimize the total loss, until neither part of the loss decreases further and a convergence state is reached, at which point training stops.
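  • A hedged sketch of this Triplet-loss variant: quality-weighted Anchor, Positive, and Negative features from the feature pool are fed to a standard triplet margin loss (tensor contents and the margin value are illustrative assumptions):

```python
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=0.3)

def weighted(feats: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
    """Weight each detection feature by its quality score and pool into one vector."""
    return (scores.unsqueeze(1) * feats).sum(0, keepdim=True)

anchor   = weighted(torch.randn(4, 1024), torch.rand(4))  # 2 living + 2 non-living images of ID1
positive = weighted(torch.randn(4, 1024), torch.rand(4))  # the remaining 4 images of ID1
negative = weighted(torch.randn(4, 1024), torch.rand(4))  # 4 images of another user (ID2)
loss4 = triplet(anchor, positive, negative)               # quality-branch loss in this variant
```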
  • The order of the first joint training and the second joint training is not limited: the first joint training may be performed before the second joint training, as described above.
  • Alternatively, the living body detection module and the quality module may be jointly trained first, and the face recognition module and the quality module jointly trained afterwards.
  • In one embodiment, a new image may also be input into the second image processing model, with the backbone network, the face recognition module, and the living body detection module in the second image processing model frozen, and the quality module in the second image processing model adjusted according to the new image to obtain an adjusted second image processing model.
  • In one embodiment, the second image processing model may be fine-tuned according to a fixed learning rate and the new image, and an image quality scoring model may subsequently be generated based on the fine-tuned model.
  • During fine-tuning, the backbone network, the face recognition module, and the living body detection module are frozen, and only the quality module is fine-tuned.
  • So-called freezing means that the corresponding network parameters are not updated during the training phase.
  • The specific fine-tuning process is: a new image is input into the second image processing model; the face recognition module in the second image processing model extracts the 512-dimensional feature map, the quality module weights the 512-dimensional feature map according to the quality score of the new image, and the value of the loss function is calculated from the weighted feature map and recorded as loss5. The learning rate may be fixed to an initial learning rate, denoted base_lr; a sketch follows.
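  • This is a hedged sketch of that fine-tuning step; the module attributes (`model.backbone`, `model.face_mod`, `model.live_mod`, `model.quality`, `model.fc_pool`) and the SGD optimizer are assumptions of this illustration:

```python
import torch
import torch.nn.functional as F

def fine_tune_quality(model, new_images, label, base_lr=1e-3):
    # Freeze backbone, face recognition and liveness branches: parameters not updated.
    for m in (model.backbone, model.face_mod, model.live_mod):
        for p in m.parameters():
            p.requires_grad = False
    opt = torch.optim.SGD(model.quality.parameters(), lr=base_lr)  # fixed learning rate base_lr

    feat = model.backbone(new_images)
    ident, _ = model.face_mod(feat)                # 512-d identification feature maps
    q = model.quality(feat)                        # quality scores of the new images
    shared = (q.unsqueeze(1) * ident).sum(0, keepdim=True)
    loss5 = F.cross_entropy(model.fc_pool(shared), label)  # `label`: 1-element long tensor

    opt.zero_grad()
    loss5.backward()                               # gradients flow only into the quality module
    opt.step()
    return loss5.item()
```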
  • Correspondingly, determining the quality module and the backbone module in the second image processing model as the target quality module and the target backbone network obtained after the first joint training and the second joint training includes: determining the quality module and the backbone module in the adjusted second image processing model, respectively, as the target quality module and the target backbone network obtained after the first joint training and the second joint training.
  • S404: Determine an image quality scoring model according to the quality module obtained after the first joint training and the second joint training, where the image quality scoring model is used to determine the quality score of the input image.
  • In specific implementation, determining the image quality scoring model according to the quality module obtained through the joint training includes: determining the quality module and the backbone module in the second image processing model, respectively, as the target quality module and the target backbone network obtained after the first joint training and the second joint training, and constructing an image quality scoring model based on the target quality module and the target backbone network, as sketched below.
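  • A minimal composition sketch using the hypothetical classes above: only the target backbone network and the target quality module are kept; the recognition and liveness branches are discarded:

```python
import torch.nn as nn

class ImageQualityScoringModel(nn.Module):
    def __init__(self, target_backbone: nn.Module, target_quality_module: nn.Module):
        super().__init__()
        self.backbone = target_backbone
        self.quality = target_quality_module

    def forward(self, image):
        return self.quality(self.backbone(image))  # quality score of the input image, in (0, 1)

# Usage sketch: scorer = ImageQualityScoringModel(backbone, quality_mod); score = scorer(img_batch)
```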
  • Embodiments of the present application further provide a computer storage medium (or computer-readable storage medium), where program instructions are stored in the computer storage medium; when executed, the program instructions are used to implement the corresponding methods described in the foregoing embodiments.
  • The storage medium involved in this application, such as a computer-readable storage medium, may be non-volatile or volatile.
  • FIG. 6 is a schematic structural diagram of an image model training apparatus according to an embodiment of the present application.
  • the apparatus includes the following structure.
  • an acquisition unit 60 configured to acquire an image processing model, where the image processing model includes: a backbone network, a quality module, and one or more processing modules, the processing modules are associated with an image service task object corresponding to the quality module;
  • a processing unit 61 configured to jointly train the quality module and the one or more processing modules based on an image training set
  • the processing unit 61 is further configured to determine an image quality scoring model according to the quality module obtained by the joint training, where the image quality scoring model is used to determine the quality score of the input image.
  • In one embodiment, the one or more processing modules include: a face recognition module and a living body detection module; the joint training includes a first joint training and a second joint training; and the image training set includes a first training image set and a second training image set. The processing unit 61 is specifically configured to:
  • perform the first joint training on the face recognition module and the quality module according to the first training image set, and perform the second joint training on the living body detection module and the quality module according to the second training image set.
  • In one embodiment, the first training image set includes face training images corresponding to each of multiple users, the face training images corresponding to each user including face training images of the first quality category and face training images of the second quality category, and the processing unit 61 is further specifically configured to:
  • acquire the target face training images corresponding to the first target user from the first training image set, where the first target user is any one of multiple users;
  • input the target face training images into the image processing model, extract their image features through the backbone network to obtain the first initial feature maps, and input the first initial feature maps into the face recognition module and the quality module; extract the identification feature maps through the face recognition module, perform face recognition on the target face training images according to the identification feature maps,
  • and determine the value loss1 of the loss function of the face recognition module according to the face recognition result;
  • invoke the quality module to determine the quality scores of the target face training images, and weight the identification feature maps according to the quality scores to obtain a shared feature map;
  • compute the value loss2 of the loss function of the quality module based on the shared feature map, update the network parameters of the modules other than the living body detection module in the image processing model in the direction of reducing the first target loss value, and iteratively train the initial image processing model according to the updated network parameters until loss1 and loss2 reach a convergence state, obtaining a first image processing model.
  • In one embodiment, the second training image set includes living body training images corresponding to each of multiple users, the living body training images corresponding to each user including living body training images of the first living body category and
  • living body training images of the second living body category, and the processing unit 61 is further specifically configured to:
  • acquire the target living body training images corresponding to the second target user from the second training image set, where the second target user is any one of multiple users;
  • invoke the quality module in the first image processing model to determine the quality scores of the target living body training images, and weight the detection feature maps according to the quality scores to obtain the living body shared feature map corresponding to the target living body training images;
  • and iteratively train the first image processing model according to the updated network parameters until loss3 and loss4 reach a convergence state, obtaining a second image processing model.
  • In one embodiment, the processing unit 61 is further specifically configured to: determine the quality module and the backbone module in the second image processing model, respectively, as the target quality module and the target backbone network obtained after the first joint training and the second joint training, and construct an image quality scoring model based on the target quality module and the target backbone network.
  • the method further includes:
  • Input a new image to the second image processing model, and freeze the backbone network, the face recognition module and the living body detection module in the second image processing model;
  • the quality module in the second image processing model is adjusted according to the new image to obtain an adjusted second image processing model.
  • the processing unit 61 is further specifically configured to:
  • the quality module and the backbone module in the adjusted second image processing model are respectively determined as the target quality module and the target backbone network obtained after the first joint training and the second joint training.
  • FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • The electronic device in an embodiment of the present application includes structures such as a power supply module, and includes a processor 701, a storage device 702, and a communication interface 703. Data can be exchanged among the processor 701, the storage device 702, and the communication interface 703, with the processor 701 implementing the corresponding image model training functions.
  • The storage device 702 may include a volatile memory, such as a random-access memory (RAM); the storage device 702 may also include a non-volatile memory, such as a flash memory or a solid-state drive (SSD); and the storage device 702 may also include a combination of the above types of memory.
  • The processor 701 may be a central processing unit (CPU). In one embodiment, the processor 701 may also be a graphics processing unit (GPU), or a combination of a CPU and a GPU. The electronic device may include multiple CPUs and GPUs to perform the corresponding image model training as required. In one embodiment, the storage device 702 is used to store program instructions, and the processor 701 may invoke the program instructions to implement the various methods mentioned above in the embodiments of the present application.
  • In a first possible implementation, the processor 701 of the electronic device invokes the program instructions stored in the storage device 702 to: acquire an image processing model, where the image processing model includes: a backbone network, a quality module, and one or more processing modules, the processing modules being associated with the image service task corresponding to the quality module; jointly train the quality module and the one or more processing modules based on an image training set; and determine, according to the jointly trained quality module, an image quality scoring model that is used to determine the quality score of the input image.
  • In one embodiment, the one or more processing modules include: a face recognition module and a living body detection module; the joint training includes a first joint training and a second joint training; and the image training set includes a first training image set and a second training image set. The processor 701 is specifically configured to:
  • perform the first joint training on the face recognition module and the quality module according to the first training image set, and perform the second joint training on the living body detection module and the quality module according to the second training image set.
  • In one embodiment, the first training image set includes face training images corresponding to each of multiple users, the face training images corresponding to each user including face training images of the first quality category and face training images of the second quality category, and the processor 701 is further specifically configured to:
  • acquire the target face training images corresponding to the first target user from the first training image set, where the first target user is any one of multiple users;
  • input the target face training images into the image processing model, extract their image features through the backbone network to obtain the first initial feature maps, and input the first initial feature maps into the face recognition module and the quality module; extract the identification feature maps through the face recognition module, perform face recognition on the target face training images according to the identification feature maps,
  • and determine the value loss1 of the loss function of the face recognition module according to the face recognition result;
  • invoke the quality module to determine the quality scores of the target face training images, and weight the identification feature maps according to the quality scores to obtain a shared feature map;
  • compute the value loss2 of the loss function of the quality module based on the shared feature map, update the network parameters of the modules other than the living body detection module in the image processing model in the direction of reducing the first target loss value, and iteratively train the initial image processing model according to the updated network parameters until loss1 and loss2 reach a convergence state, obtaining a first image processing model.
  • In one embodiment, the second training image set includes living body training images corresponding to each of multiple users, the living body training images corresponding to each user including living body training images of the first living body category and living body training images of the second living body category, and the processor 701 is further specifically configured to:
  • acquire the target living body training images corresponding to the second target user from the second training image set, where the second target user is any one of multiple users;
  • invoke the quality module in the first image processing model to determine the quality scores of the target living body training images, and weight the detection feature maps according to the quality scores to obtain the living body shared feature map corresponding to the target living body training images;
  • and iteratively train the first image processing model according to the updated network parameters until loss3 and loss4 reach a convergence state, obtaining a second image processing model.
  • In one embodiment, the processor 701 is further specifically configured to: determine the quality module and the backbone module in the second image processing model, respectively, as the target quality module and the target backbone network obtained after the first joint training and the second joint training, and construct an image quality scoring model based on the target quality module and the target backbone network.
  • the method further includes:
  • Input a new image to the second image processing model, and freeze the backbone network, the face recognition module and the living body detection module in the second image processing model;
  • the quality module in the second image processing model is adjusted according to the new image to obtain an adjusted second image processing model.
  • In one embodiment, the processor 701 is further specifically configured to: determine the quality module and the backbone module in the adjusted second image processing model, respectively, as the target quality module and the target backbone network obtained after the first joint training and the second joint training.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM) or the like.
  • Blockchain, essentially a decentralized database, is a chain of data blocks produced in association through cryptographic methods; each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present application disclose an image model training method and apparatus, an electronic device, and a storage medium, applied to the field of artificial intelligence technology. The method includes: acquiring an image processing model; jointly training, based on an image training set, a quality module and one or more processing modules in the image processing model, the processing modules being associated with the image service task corresponding to the quality module; and determining an image quality scoring model according to the quality module obtained by the joint training. During training, the processing modules associated with the image service task object can provide auxiliary supervision for the image quality training of the quality module, and the final image quality scoring model is obtained based on the jointly trained quality module, which helps improve the accuracy of the image quality scoring performed by the image quality scoring model. The present application relates to blockchain technology; for example, the image quality scoring model may be stored in a blockchain for use in scenarios such as image quality scoring.

Description

Image model training method and apparatus, electronic device, and storage medium
This application claims priority to Chinese patent application No. 202110087937.7, filed with the Chinese Patent Office on January 22, 2021 and entitled "一种图像模型训练方法、装置及电子设备、存储介质" (Image model training method and apparatus, electronic device, and storage medium), the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to an image model training method and apparatus, an electronic device, and a storage medium.
Background
Image processing is a technology that uses computers to analyze images in order to achieve desired results; within the field of image processing technology, predicting the quality score of an image is a particularly important research topic. The inventors realized that, as research on neural network models has advanced, predicting an image's quality score through a model has gradually gained wide acceptance. It follows that obtaining a well-performing model through model training is especially important for the accuracy of subsequent image quality score prediction.
Summary
Embodiments of the present application provide an image model training method and apparatus, an electronic device, and a storage medium, which help improve the accuracy of the image quality scoring performed by an image quality scoring model.
In one aspect, an embodiment of the present application provides an image model training method, the method including:
acquiring an image processing model, where the image processing model includes: a backbone network, a quality module, and one or more processing modules, the processing modules being associated with the image service task corresponding to the quality module;
jointly training the quality module and the one or more processing modules based on an image training set; and
determining an image quality scoring model according to the quality module obtained by the joint training, where the image quality scoring model is used to determine the quality score of an input image.
In another aspect, an embodiment of the present application provides an image model training apparatus, including:
an acquisition unit configured to acquire an image processing model, where the image processing model includes: a backbone network, a quality module, and one or more processing modules, the processing modules being associated with the image service task object corresponding to the quality module;
a processing unit configured to jointly train the quality module and the one or more processing modules based on an image training set;
the processing unit being further configured to determine an image quality scoring model according to the quality module obtained by the joint training, where the image quality scoring model is used to determine the quality score of an input image.
In yet another aspect, an embodiment of the present application provides an electronic device including a processor, a storage device, and a communication interface, the processor, the storage device, and the communication interface being connected to one another, where the storage device is configured to store a computer program that supports a terminal in executing the above method, the computer program includes program instructions, and the processor is configured to invoke the program instructions to perform the following steps: acquiring an image processing model, the image processing model including: a backbone network, a quality module, and one or more processing modules, the processing modules being associated with the image service task corresponding to the quality module; jointly training the quality module and the one or more processing modules based on an image training set; and determining an image quality scoring model according to the quality module obtained by the joint training.
In still another aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to perform the following method:
acquiring an image processing model, where the image processing model includes: a backbone network, a quality module, and one or more processing modules, the processing modules being associated with the image service task corresponding to the quality module;
jointly training the quality module and the one or more processing modules based on an image training set; and
determining an image quality scoring model according to the quality module obtained by the joint training, where the image quality scoring model is used to determine the quality score of an input image.
In the embodiments of the present application, during training the processing modules associated with the image service task object can provide auxiliary supervision for the image quality training of the quality module, and the final image quality scoring model is obtained based on the jointly trained quality module, which helps improve the accuracy of the image quality scoring performed by the image quality scoring model.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below are merely some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of an image processing model according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a backbone network according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of an image model training method according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of another image model training method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of another image processing model according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an image model training apparatus according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solution of the present application relates to the field of artificial intelligence technology and can promote the construction of smart cities. Optionally, the data involved in this application, such as images and/or quality scores, may be stored in a database or in a blockchain, which is not limited by this application.
At present, an image quality evaluation model is usually trained with training images annotated with quality scores, and the trained image quality evaluation model can then directly determine the quality score of an input image. However, such training completely ignores the relationship between image quality evaluation and its corresponding image service task object, so that in practical applications the quality score that the image quality evaluation model determines for an input image has low accuracy. Taking face recognition as the image service task object corresponding to the image quality evaluation model as an example: since many factors affect the quality of a face image, quality is a comprehensive notion that is difficult to characterize from any single aspect; accordingly, the quality score of a training image cannot be defined precisely, and the accuracy of an image quality evaluation model trained on inaccurately labeled training images is naturally greatly reduced.
To solve the above problems and increase the correlation between the image quality score and its corresponding image service task object, thereby improving the accuracy of the model's image quality scoring, the embodiments of the present application propose an image processing model that includes a backbone network, a quality module, and one or more processing modules, the processing modules being associated with the image service task object corresponding to the quality module.
Referring to FIG. 1: the image processing model may include a backbone network, a quality module, and m (m being an integer greater than 0, e.g., 1, 2, 3, etc.) processing modules. The backbone network is mainly used to extract image features of the input image to obtain an initial feature map, and to feed the initial feature map into the quality module and each processing module. Illustratively, the backbone network may include multiple residual block structures, and the network structure of each residual block may be as shown in FIG. 2.
Each processing module corresponds to one image service task object. Assuming the image service task object is face recognition, the corresponding processing module may be a face recognition module, used to perform face recognition on an input face image to determine which user's face (i.e., the target user's face) appears in the input face image; its output may be the identification (e.g., ID) of the target user. Illustratively, the network structure of the face recognition module may be: take the ResNext50 network, remove its final global average pooling layer (Average Pooling layer), and append two fully connected layers (fc). The first fully connected layer fc1 outputs a feature map (for ease of distinguishing the various feature maps mentioned in the embodiments of the present application, the feature maps here are collectively referred to as identification feature maps, which may be, for example, 512-dimensional features); the second fully connected layer fc2 outputs the scores of the input face image belonging to each category, where one category corresponds to one user and the category with the highest score is the target user to whom the face in the input face image belongs.
Assuming the image service task object is living body recognition, the corresponding processing module may be a living body detection module, used to detect whether the object in the input image is a living body or a non-living body, where a living body can be understood as an object with vital signs and a non-living body is the opposite. For example, a face image obtained by photographing a real person can be classified as a living body, while a face image obtained by photographing a photo of a real person can be classified as a non-living body. Illustratively, the network structure of the living body detection module may adopt the MobileNetV3-Small network structure. For ease of distinction, the feature maps output by the penultimate convolutional layer of MobileNetV3-Small are collectively referred to as detection feature maps in the embodiments of the present application, and may be, for example, 1024-dimensional features; the last convolutional layer of MobileNetV3-Small performs binary classification on the detection feature map input from the previous layer, determining the scores of the object in the input image belonging to the two categories, living and non-living, with the highest-scoring category being the living body category (living or non-living) of the object.
The quality module is used to evaluate the quality score of the input image, and its output may be the quality score of the input image. Illustratively, the model structure of the quality module may be: depthwise separable convolutions from the lightweight MobileNet network (each composed of a depthwise convolution and a pointwise convolution); after five depthwise separable convolution layers, the final feature map is obtained, followed by a global average pooling layer and a fully connected layer; finally, a Sigmoid function (often used as a neural-network activation function that maps variables into the interval between 0 and 1) converts the output into a value between 0 and 1, which is the quality score of a single image.
In one embodiment, after the above image processing model is obtained, the quality module and each processing module in the image processing model can be jointly trained based on an image training set, and an image quality scoring model can be determined according to the quality module obtained by the joint training, where the image quality scoring model is used to determine the quality score of an input image. Illustratively, the image quality scoring model may be composed of the modules in the dashed box in FIG. 1. With this training approach, the tasks associated with the image service task object (e.g., face recognition and living body detection) can provide auxiliary supervision for the image quality training of the quality module during training, and the final image quality scoring model obtained from the jointly trained quality module increases the correlation between the image quality score and its corresponding image service task object when performing image quality scoring, thereby improving the accuracy of the image quality scoring model.
It should be noted that FIG. 1 only schematically represents the model structure of the image processing model and does not limit the model structure proposed in the embodiments of the present application. In addition, beyond the face recognition module and the living body detection module, the processing modules can be extended according to the specific image service task object; for example, if the image service task object is automated driving, the processing modules may also include a path recognition module, an environment recognition module, and so on.
Based on the above model structure of the image processing model, an embodiment of the present application proposes an image model training method, which may be executed by an electronic device; the electronic device here may be a server or a terminal, and the terminal may include, but is not limited to: tablet computers, laptop computers, notebook computers, desktop computers, and the like. Referring to FIG. 3, the model training method may include the following steps S301-S303:
S301: Acquire an image processing model, where the image processing model includes: a backbone network, a quality module, and one or more processing modules, the processing modules being associated with the image service task object corresponding to the quality module. In one embodiment, the image processing model may be pre-built; illustratively, it may be as shown in FIG. 1.
S302: Jointly train the quality module and the one or more processing modules based on an image training set.
S303: Determine an image quality scoring model according to the quality module obtained by the joint training, where the image quality scoring model is used to determine the quality score of an input image.
Here, the image service task corresponding to the processing modules and the quality module can also be understood as the image service task corresponding to the final image quality scoring model; that is, when the image quality scoring model evaluates the quality score of any single image, the question is which image task this scored image will subsequently serve, such as the training of a face recognition model (i.e., face recognition) or the training of a living body detection model (i.e., living body recognition). Using high-quality images for model training helps improve the accuracy of the trained model in subsequent use. Therefore, the embodiments of the present application can combine the processing modules associated with the image service task corresponding to the quality module to provide auxiliary supervision for the quality module, and construct an image quality scoring model based on the quality module obtained by training. In practical applications, the resulting image quality scores then better conform to the quality evaluation standards of the corresponding image service task (for example, the factors of greatest concern for image quality in face recognition include: saturation, sharpness, illumination, posture, occlusion, strong colors, exaggerated expressions, etc.), which helps improve the scoring accuracy of the image quality scoring model.
In the embodiments of the present application, an image processing model can be acquired, the image processing model including: a backbone network, a quality module, and one or more processing modules, the processing modules being associated with the image service task corresponding to the quality module; the quality module and the one or more processing modules are jointly trained based on an image training set; and an image quality scoring model is determined according to the quality module obtained by the joint training. During training, the processing modules associated with the image service task object can provide auxiliary supervision for the image quality training of the quality module, and the final image quality scoring model obtained from the jointly trained quality module helps improve the accuracy of the image quality scoring performed by the image quality scoring model.
请参见图4,本申请实施例提出了另一种图像模型训练方法,请参见图4所示,该图像模型训练方法可包括以下步骤S401-S404:
S401、获取图像处理模型,该图像处理模型包括:骨干网络、质量模块、以及一个以上的处理模块,该一个以上的处理模块包括:人脸识别模块和活体检测模块。
S402:依照第一训练图像集对人脸识别模块和质量模块进行第一联合训练。
具体实现中,依照第一训练图像集对人脸识别模块和质量模块进行第一联合训练,包括:从第一训练图像集中获取第一目标用户对应的目标人脸训练图像;其中,第一目标用户为多个用户中的任一用户。进一步地,将目标人脸训练图像输入图像处理模型,通过骨干网络提取目标人脸训练图像的图像特征,得到目标人脸训练图像的第一初始特征图,将第一初始特征图输入人脸识别模块和质量模块,通过人脸识别模块提取识别特征图,并根据识别特征图对目标人脸训练图像进行人脸识别,依照人脸识别结果确定人脸识别模块的损失函数的值loss1。进一步地,调用质量模块确定目标人脸训练图像的质量得分,并根据质量得分对识别特征图进行加权处理,得到共享特征图,基于共享特征图计算质量模块的损失函数的值loss2,按照减小第一目标损失值的方向,更新图像处理模型中除活体检测模块以外模块的网络参数,该第一目标损失值为loss1和所述loss2之和。进一步地,根据更新后的网络参数对初始图像处理模型进行迭代训练,直至loss1和loss2达到收敛状态,得到第一图像处理模型。
其中,第一训练图像集包括多个用户各自对应的人脸训练图像,每个用户对应的人脸训练图像包括第一类质量的人脸训练图像和第二类质量的人脸训练图像。在一个实施例中,一个用户对一个标识(例如ID),每个用户对应的人脸训练图像包括不同图片质量的L1(该L1为大于0的整数)张人脸图像,该L1是根据实验数据预先确定。例如,在准备第一训练图像集时,可以要求每个ID包含图片数要大于八张图片(即同一个人的不同图片多于8张),且图片质量不同(每个ID中包含模糊、大角度等质量欠佳图及正常优质图片)。其中,模糊、大角度等质量欠佳的图像均可以归类为第一类质量的图像;正常优质的图像归类为第二类质量的图像。
In one embodiment, during the first joint training of the face recognition module and the quality module with the first training image set, each training iteration may take as input the L1 face images of a single ID (i.e., one user's face training images). Illustratively, suppose the image processing model is as shown in FIG. 5; the face recognition module's network structure is a ResNeXt50 with its final global average pooling layer removed and two fully connected layers (fc) appended, where fc1 outputs the recognition feature map (e.g., a 512-dimensional feature) and fc2 outputs the input face image's score for each class; and the quality module adopts MobileNet's depthwise separable convolutions (depthwise plus pointwise), with five depthwise separable convolutional layers followed by a global average pooling layer and one fully connected layer, a sigmoid finally converting the result into a value between 0 and 1 as the single image's quality score. During the first joint training, only the face recognition module and the quality module are trained; the liveness detection module's network parameters are not updated. Taking one iteration as an example, the specific training process is as follows:
In one iteration, all target face training images corresponding to the first target user are input into the image processing model; after feature extraction by the backbone network, the first initial feature maps of all target face training images are obtained and used as input to the face recognition module and the quality module. The face recognition module produces the 512-dimensional feature maps (i.e., the recognition feature maps) of all target face training images, which are stored in a feature pool. The stored 512-dimensional feature maps serve two purposes. First, they pass directly through the face recognition module's second fully connected layer fc2, where the module's first loss function (e.g., softmax loss) supervises the face recognition module's training (this is the usual face recognition training step; specifically, face recognition is performed on the target face training images according to the recognition feature maps, and the value loss1 of the face recognition module's loss function is determined from the recognition results via the first loss function). Second, they are kept for reuse by the quality module.
All target face training images of the first target user input in one iteration pass through the quality module, which yields each target face training image's quality score, and each image's weighting coefficient is determined from its quality score. When determining the weighting coefficients, since the quality scores lie in the range 0-1, a quality score may be used directly as the weighting coefficient; alternatively, the weighting coefficients may be reassigned on the principle that a higher quality score corresponds to a higher coefficient. No specific limitation is imposed here.
Further, once the weighting coefficient of each target face training image is determined, the recognition feature maps of the target face training images may be weighted by their respective coefficients to obtain a single shared feature map. Illustratively, suppose the first target user's target face training images include image 1, image 2, image 3, image 4, image 5, image 6, image 7, and image 8, and the correspondence between the target face training images, weighting coefficients, and recognition feature maps is as shown in Table 1; then the shared feature map, computed as sketched in the code following the table, is:
(0.05*P1 + 0.1*P2 + 0.1*P3 + 0.5*P4 + 0.1*P5 + 0.05*P6 + 0.05*P7 + 0.05*P8).
Table 1
Face training image  Weighting coefficient  Recognition feature map
Image 1  0.05  Recognition feature map 1 (denoted P1)
Image 2  0.1   Recognition feature map 2 (denoted P2)
Image 3  0.1   Recognition feature map 3 (denoted P3)
Image 4  0.5   Recognition feature map 4 (denoted P4)
Image 5  0.1   Recognition feature map 5 (denoted P5)
Image 6  0.05  Recognition feature map 6 (denoted P6)
Image 7  0.05  Recognition feature map 7 (denoted P7)
Image 8  0.05  Recognition feature map 8 (denoted P8)
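A minimal sketch of the Table 1 computation, assuming the quality scores are used directly as weighting coefficients; the weights below mirror the table, and the feature rows are random stand-ins for P1-P8:

```python
import torch

weights = torch.tensor([0.05, 0.1, 0.1, 0.5, 0.1, 0.05, 0.05, 0.05])
features = torch.randn(8, 512)   # rows stand in for P1..P8

# Quality-weighted sum over the identity's eight recognition feature maps.
shared_feature = (weights.unsqueeze(1) * features).sum(dim=0)   # 512-d
```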
Further, after the shared feature map is obtained, it may be fed into the fully connected layer of the quality module, which classifies it and determines its score for each ID (which can be understood as each user); the highest-scoring ID is the final classification result. The value loss2 of the quality module's current loss function can then be computed, via the quality module's second loss function, from the difference between this classification result and the first target user's ID. The total loss of the first joint training stage (i.e., the first target loss value) equals the sum of the losses of the face recognition and quality branches (i.e., loss1 + loss2). In the first joint training stage, gradient descent may be used to optimize the total loss in the direction that reduces the first target loss value, updating the network parameters of all modules in the image processing model except the liveness detection module. By analogy, face training images corresponding to other users are input in the same training manner to continue the first joint training and optimize the total loss, until both branch losses (loss1 and loss2) reach convergence, at which point the first joint training is paused.
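Putting the first-stage pieces together, here is a hedged sketch of one first-joint-training iteration; the attribute names (face_recognition, quality, quality_classifier) are assumptions, and the optimizer is assumed to have been constructed over every parameter except those of the liveness detection module:

```python
import torch
import torch.nn.functional as F

def first_joint_training_step(model, images, identity_label, optimizer):
    """images: all L1 face images of one ID; identity_label: shape-(1,) tensor."""
    feat_map = model.backbone(images)                      # first initial feature maps
    rec_feat, logits = model.face_recognition(feat_map)    # 512-d features + ID scores
    loss1 = F.cross_entropy(logits, identity_label.repeat(len(images)))

    scores = model.quality(feat_map)                       # (N, 1) quality scores
    shared = (scores * rec_feat).sum(dim=0, keepdim=True)  # quality-weighted fusion
    loss2 = F.cross_entropy(model.quality_classifier(shared), identity_label)

    total = loss1 + loss2            # first target loss value
    optimizer.zero_grad()
    total.backward()                 # liveness parameters receive no updates
    optimizer.step()
    return loss1.item(), loss2.item()
```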
S403: Perform second joint training on the liveness detection module and the quality module with a second training image set.
In one embodiment, after the first image processing model is obtained: the second training image set includes liveness training images corresponding to each of multiple users, each user's liveness training images including liveness training images of a first liveness class and of a second liveness class. Performing the second joint training on the liveness detection module and the quality module with the second training image set includes: obtaining, from the second training image set, target liveness training images corresponding to a second target user, the second target user being any one of the multiple users; inputting the target liveness training images into the first image processing model, extracting their image features through the backbone network of the first image processing model to obtain the second initial feature maps of the target liveness training images, and feeding the second initial feature maps to the liveness detection module and the quality module; extracting detection feature maps through the liveness detection module, performing liveness detection on the target liveness training images according to the detection feature maps, and determining the value loss3 of the liveness detection module's loss function from the liveness detection results; and invoking the quality module of the first image processing model to determine the quality scores of the target liveness training images, and weighting the detection feature maps by those quality scores to obtain the liveness shared feature map corresponding to the target liveness training images. Further, the value loss4 of the quality module's loss function is computed from the liveness shared feature map; the network parameters of all modules in the first image processing model except the backbone network and the face recognition module are updated in the direction that reduces a second target loss value, the second target loss value being the sum of loss3 and loss4; and the first image processing model is iteratively trained with the updated network parameters until loss3 and loss4 reach convergence, yielding a second image processing model. The first liveness class and the second liveness class may refer to live and non-live, respectively.
The third loss function, corresponding to the liveness detection module, may be the commonly used softmax loss or ArcFace loss, both of which serve binary classification here. The second loss function, corresponding to the quality module, may be the commonly used softmax loss or ArcFace loss, or alternatively the triplet loss.
As one feasible approach, when the quality module's second loss function is softmax loss or ArcFace loss, the training data (i.e., the second training image set) may be prepared by collecting a liveness dataset from video streams (containing live images and non-live images), requiring at least 4 live or 4 non-live images per user. During the second joint training, each iteration may input the liveness training images of a single user (e.g., the 4 live images plus 4 non-live images of the second target user, denoted ID1). Each user corresponds to one identifier (e.g., an ID).
As another feasible approach, when the quality module's second loss function is the triplet loss, the training data (i.e., the second training image set) may still require at least 4 live or 4 non-live images per user. During the second joint training, assuming each user has 4 live and 4 non-live images, and taking one iteration as an example, the images input in one iteration may include: two live and two non-live images of the second target user as the anchor images, the second target user's remaining 2 live and 2 non-live images as the positive images, and four images of another user (denoted ID2; two live and two non-live) as the negative images. Each user corresponds to one identifier (e.g., an ID).
The second joint training process is similar to the first joint training process described above. When the quality module's second loss function is softmax loss or ArcFace loss, only the liveness detection module and the quality module are trained during the second joint training; the network parameters of the face recognition module and the backbone network are not updated. Taking one iteration as an example, the specific training process is as follows:
In one iteration, all target liveness training images corresponding to the second target user are input into the first image processing model (i.e., the image processing model obtained after the first joint training); after feature extraction by the backbone network, the second initial feature maps of all target liveness training images are obtained and used as input to the liveness detection module and the quality module. The liveness detection module produces the 1024-dimensional feature maps (i.e., the detection feature maps) of all target liveness training images, which are stored in the liveness detection module's feature pool. The stored 1024-dimensional feature maps serve two purposes. First, they pass directly through the liveness detection module, whose third loss function (e.g., softmax loss or ArcFace loss) supervises the liveness detection module's training (this is the usual liveness detection training step; specifically, the value loss3 of the liveness detection module's loss function is determined from the liveness detection results via the third loss function). Second, they are kept for reuse by the quality module.
All target liveness training images of the second target user input in one iteration pass through the quality module, which yields each target liveness training image's quality score, and each image's weighting coefficient is determined from its quality score. When determining the weighting coefficients, since the quality scores lie in the range 0-1, a quality score may be used directly as the weighting coefficient; alternatively, the weighting coefficients may be reassigned on the principle that a higher quality score corresponds to a higher coefficient. No specific limitation is imposed here.
Further, once the weighting coefficient of each target liveness training image is determined, the detection feature maps of the target liveness training images may be weighted by their respective coefficients to obtain a single liveness shared feature map.
Further, after the liveness shared feature map is obtained, it may be fed into the fully connected layer of the quality module, which classifies it and determines its score for each ID (which can be understood as each user); the highest-scoring ID is the final classification result. The value loss4 of the quality module's current loss function can then be computed, via the quality module's second loss function, from the difference between this classification result and the second target user's ID. The total loss of the second joint training stage (i.e., the second target loss value) equals the sum of the losses of the liveness detection and quality branches (i.e., loss3 + loss4). In the second joint training stage, gradient descent may be used to optimize the total loss in the direction that reduces the second target loss value, updating the network parameters of all modules except the backbone network and the face recognition module. By analogy, liveness training images corresponding to other users are input in the same training manner to continue the second joint training and optimize the total loss, until both branch losses (loss3 and loss4) reach convergence, at which point the second joint training is paused.
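The freezing described here can be expressed as in the following stand-alone sketch (the four modules are stand-ins, not the real networks): the backbone and recognition-branch parameters are excluded from gradient updates, and the optimizer sees only the liveness and quality parameters.

```python
import torch
from torch import nn

backbone, face_rec = nn.Conv2d(3, 64, 3), nn.Linear(64, 512)   # stand-ins
liveness, quality = nn.Linear(64, 2), nn.Linear(64, 1)         # stand-ins

# Freeze: these parameters are not updated during the second joint training.
for module in (backbone, face_rec):
    for p in module.parameters():
        p.requires_grad_(False)

trainable = [p for m in (liveness, quality) for p in m.parameters()]
optimizer = torch.optim.SGD(trainable, lr=1e-3)   # learning rate assumed
```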
Alternatively, in another embodiment, take as an example the quality module's second loss function being the triplet loss and the liveness module's third loss function being ArcFace loss. Specifically, the features in the liveness module's feature pool serve two purposes: first, they are used directly for liveness-detection supervision, with ArcFace loss performing the live/non-live binary classification; second, the quality scores are used as weighting coefficients to weight the anchor, positive, and negative features in the feature pool, yielding three weighted feature maps on which the triplet loss is then computed. The total loss of this stage equals the sum of the two losses. During this second joint training, the parameters of the backbone and of the recognition branch are frozen, and gradient descent is used to optimize the total loss until both losses stop decreasing and reach convergence, at which point training stops.
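For this triplet variant, a minimal sketch of the loss computation on quality-weighted group features follows; all tensors are random stand-ins, the group size of 4 follows the example above, and the margin value is an assumption:

```python
import torch
import torch.nn.functional as F

def weighted_group_feature(feats, scores):
    """feats: (n, 1024) detection features; scores: (n, 1) quality scores."""
    return (scores * feats).sum(dim=0)

anchor   = weighted_group_feature(torch.randn(4, 1024), torch.rand(4, 1))
positive = weighted_group_feature(torch.randn(4, 1024), torch.rand(4, 1))
negative = weighted_group_feature(torch.randn(4, 1024), torch.rand(4, 1))

loss = F.triplet_margin_loss(anchor.unsqueeze(0), positive.unsqueeze(0),
                             negative.unsqueeze(0), margin=0.2)
```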
It should be understood that the order of the first joint training and the second joint training is not limited: the first joint training may precede the second as described above, or the liveness detection module and the quality module may be jointly trained first, followed by the face recognition module and the quality module.
In one embodiment, after the first joint training and the second joint training yield the second image processing model, a new image may further be input into the second image processing model; the backbone network, face recognition module, and liveness detection module of the second image processing model are frozen, and the quality module of the second image processing model is adjusted according to the new image, to obtain an adjusted second image processing model.
In a specific implementation, the second image processing model may be fine-tuned with a fixed learning rate on new images, and the image quality scoring model may subsequently be generated from the fine-tuned model. Specifically, during the fine-tuning stage the backbone network, face recognition module, and liveness module are frozen, and only the quality module is fine-tuned. Here, freezing means that the corresponding network parameters are not updated during training. The specific fine-tuning procedure is: a new image is input into the second image processing model; the model's face recognition module extracts a 512-dimensional feature map, and the quality module weights that 512-dimensional feature map by the new image's quality score and computes a loss value from the weighted feature map, denoted loss5. Meanwhile, the quality module weights the 1024-dimensional feature map passed from the liveness detection module by the new image's quality score and computes a loss value from the weighted feature map, denoted loss6. The total loss = loss5 + loss6. A small initial learning rate (base_lr) is set for fine-tuning only, e.g., base_lr = 1e-5, to prevent network oscillation. At each step, the quality module's network parameters in the second image processing model are updated at base_lr in the direction that reduces the total loss; by analogy, other images are input to iteratively train the second image processing model until the total loss essentially stops changing, at which point training stops and the model fine-tuning is complete.
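A hedged skeleton of this fine-tuning stage is sketched below: only a stand-in quality module is optimized, at base_lr = 1e-5, and loss5/loss6 are placeholder terms standing in for the losses actually computed on the recognition and detection branches.

```python
import torch
from torch import nn

quality = nn.Sequential(nn.Linear(512, 1), nn.Sigmoid())    # stand-in module
optimizer = torch.optim.SGD(quality.parameters(), lr=1e-5)  # base_lr, fixed

for step in range(1000):                 # stop when the total loss plateaus
    feat = torch.randn(1, 512)           # placeholder for a new image's feature
    score = quality(feat)
    loss5 = (1.0 - score).mean()         # stand-in for the recognition-branch loss
    loss6 = score.pow(2).mean()          # stand-in for the liveness-branch loss
    total = loss5 + loss6
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
```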
In one embodiment, after the quality module of the second image processing model is adjusted according to the new image to obtain the adjusted second image processing model, determining the quality module and backbone module of the second image processing model as the target quality module and target backbone network obtained after the first joint training and the second joint training includes: determining the quality module and backbone module of the adjusted second image processing model as the target quality module and target backbone network obtained after the first joint training and the second joint training, respectively.
S404: Determine, from the quality module obtained after the first joint training and the second joint training, an image quality scoring model, the image quality scoring model being used to determine the quality score of an input image.
In one embodiment, determining the image quality scoring model from the jointly trained quality module includes: determining the quality module and backbone module of the second image processing model as the target quality module and target backbone network obtained after the first joint training and the second joint training, respectively, and constructing the image quality scoring model based on the target quality module and the target backbone network.
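As a hedged illustration, the final scoring model is just the target backbone followed by the target quality module (the dashed box of FIG. 1); the stand-in modules in the usage lines below would be replaced by the jointly trained ones.

```python
import torch
from torch import nn

class ImageQualityScoringModel(nn.Module):
    """Dashed-box model of FIG. 1: target backbone + target quality module."""
    def __init__(self, target_backbone, target_quality_module):
        super().__init__()
        self.backbone = target_backbone
        self.quality = target_quality_module

    @torch.no_grad()
    def forward(self, image):
        return self.quality(self.backbone(image))   # quality score in (0, 1)

scorer = ImageQualityScoringModel(
    nn.Conv2d(3, 64, 3),
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                  nn.Linear(64, 1), nn.Sigmoid()))
score = scorer(torch.randn(1, 3, 112, 112))
```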
An embodiment of the present application further provides a computer storage medium (or computer-readable storage medium) storing program instructions which, when executed, are used to implement the corresponding methods described in the above embodiments.
Optionally, the storage medium involved in this application, such as the computer-readable storage medium, may be non-volatile or volatile.
Referring again to FIG. 6, which is a schematic structural diagram of an image model training apparatus according to an embodiment of the present application.
In one implementation of the apparatus of this embodiment, the apparatus includes the following structure.
An obtaining unit 60, configured to obtain an image processing model, the image processing model including a backbone network, a quality module, and one or more processing modules, where each processing module is associated with an image service task corresponding to the quality module;
A processing unit 61, configured to jointly train the quality module and the one or more processing modules on an image training set;
The processing unit 61 is further configured to determine an image quality scoring model from the jointly trained quality module, the image quality scoring model being used to determine the quality score of an input image.
In one embodiment, the one or more processing modules include a face recognition module and a liveness detection module, the joint training includes first joint training and second joint training, and the image training set includes a first training image set and a second training image set; the processing unit 61 is specifically configured to:
perform the first joint training on the face recognition module and the quality module with the first training image set;
perform the second joint training on the liveness detection module and the quality module with the second training image set.
In one embodiment, the first training image set includes face training images corresponding to each of multiple users, and each user's face training images include face training images of a first quality class and of a second quality class; the processing unit 61 is further specifically configured to:
obtain, from the first training image set, target face training images corresponding to a first target user, the first target user being any one of the multiple users;
input the target face training images into the image processing model, and extract their image features through the backbone network to obtain first initial feature maps of the target face training images;
input the first initial feature maps into the face recognition module and the quality module, extract recognition feature maps through the face recognition module, perform face recognition on the target face training images according to the recognition feature maps, and determine the value loss1 of the face recognition module's loss function from the face recognition results;
invoke the quality module to determine the quality scores of the target face training images, and weight the recognition feature maps by the quality scores to obtain a shared feature map;
compute the value loss2 of the quality module's loss function based on the shared feature map;
update, in the direction that reduces a first target loss value, the network parameters of all modules in the image processing model except the liveness detection module, the first target loss value being the sum of loss1 and loss2;
iteratively train the image processing model with the updated network parameters until loss1 and loss2 reach convergence, to obtain a first image processing model.
In one embodiment, after the first image processing model is obtained, the second training image set includes liveness training images corresponding to each of multiple users, and each user's liveness training images include liveness training images of a first liveness class and of a second liveness class; the processing unit 61 is further specifically configured to:
obtain, from the second training image set, target liveness training images corresponding to a second target user, the second target user being any one of the multiple users;
input the target liveness training images into the first image processing model, and extract their image features through the backbone network of the first image processing model to obtain second initial feature maps of the target liveness training images;
input the second initial feature maps into the liveness detection module and the quality module, extract detection feature maps through the liveness detection module, perform liveness detection on the target liveness training images according to the detection feature maps, and determine the value loss3 of the liveness detection module's loss function from the liveness detection results;
invoke the quality module of the first image processing model to determine the quality scores of the target liveness training images, and weight the detection feature maps by the quality scores to obtain a liveness shared feature map corresponding to the target liveness training images;
compute the value loss4 of the quality module's loss function based on the liveness shared feature map;
update, in the direction that reduces a second target loss value, the network parameters of all modules in the first image processing model except the backbone network and the face recognition module, the second target loss value being the sum of loss3 and loss4;
iteratively train the first image processing model with the updated network parameters until loss3 and loss4 reach convergence, to obtain a second image processing model.
In one embodiment, the processing unit 61 is further specifically configured to:
determine the quality module and backbone module of the second image processing model as the target quality module and target backbone network obtained after the first joint training and the second joint training, respectively;
construct the image quality scoring model based on the target quality module and the target backbone network.
In one embodiment, after the second image processing model is obtained, the processing unit 61 is further configured to:
input a new image into the second image processing model, and freeze the backbone network, face recognition module, and liveness detection module of the second image processing model;
adjust the quality module of the second image processing model according to the new image, to obtain an adjusted second image processing model.
In one embodiment, after the quality module of the second image processing model is adjusted according to the new image to obtain the adjusted second image processing model, the processing unit 61 is further specifically configured to: determine the quality module and backbone module of the adjusted second image processing model as the target quality module and target backbone network obtained after the first joint training and the second joint training, respectively.
Referring again to FIG. 7, which is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device of this embodiment includes structures such as a power supply module, and includes a processor 701, a storage apparatus 702, and a communication interface 703. Data can be exchanged among the processor 701, the storage apparatus 702, and the communication interface 703, with the processor 701 implementing the corresponding image model training functions.
The storage apparatus 702 may include volatile memory, such as random-access memory (RAM); it may also include non-volatile memory, such as flash memory or a solid-state drive (SSD); and it may also include a combination of the above kinds of memory.
The processor 701 may be a central processing unit (CPU). In one embodiment, the processor 701 may also be a graphics processing unit (GPU), or a combination of a CPU and a GPU. The electronic device may include multiple CPUs and GPUs as needed to perform the corresponding image model training. In one embodiment, the storage apparatus 702 is used to store program instructions, and the processor 701 may invoke those program instructions to implement the various methods referred to above in the embodiments of this application.
In a first possible implementation, the processor 701 of the electronic device invokes the program instructions stored in the storage apparatus 702 to: obtain an image processing model, the image processing model including a backbone network, a quality module, and one or more processing modules, where each processing module is associated with an image service task corresponding to the quality module; jointly train the quality module and the one or more processing modules on an image training set; and determine an image quality scoring model from the jointly trained quality module, the image quality scoring model being used to determine the quality score of an input image.
In one embodiment, the one or more processing modules include a face recognition module and a liveness detection module, the joint training includes first joint training and second joint training, and the image training set includes a first training image set and a second training image set; the processor 701 is specifically configured to:
perform the first joint training on the face recognition module and the quality module with the first training image set;
perform the second joint training on the liveness detection module and the quality module with the second training image set.
In one embodiment, the first training image set includes face training images corresponding to each of multiple users, and each user's face training images include face training images of a first quality class and of a second quality class; the processor 701 is further specifically configured to:
obtain, from the first training image set, target face training images corresponding to a first target user, the first target user being any one of the multiple users;
input the target face training images into the image processing model, and extract their image features through the backbone network to obtain first initial feature maps of the target face training images;
input the first initial feature maps into the face recognition module and the quality module, extract recognition feature maps through the face recognition module, perform face recognition on the target face training images according to the recognition feature maps, and determine the value loss1 of the face recognition module's loss function from the face recognition results;
invoke the quality module to determine the quality scores of the target face training images, and weight the recognition feature maps by the quality scores to obtain a shared feature map;
compute the value loss2 of the quality module's loss function based on the shared feature map;
update, in the direction that reduces a first target loss value, the network parameters of all modules in the image processing model except the liveness detection module, the first target loss value being the sum of loss1 and loss2;
iteratively train the image processing model with the updated network parameters until loss1 and loss2 reach convergence, to obtain a first image processing model.
In one embodiment, after the first image processing model is obtained, the second training image set includes liveness training images corresponding to each of multiple users, and each user's liveness training images include liveness training images of a first liveness class and of a second liveness class; the processor 701 is further specifically configured to:
obtain, from the second training image set, target liveness training images corresponding to a second target user, the second target user being any one of the multiple users;
input the target liveness training images into the first image processing model, and extract their image features through the backbone network of the first image processing model to obtain second initial feature maps of the target liveness training images;
input the second initial feature maps into the liveness detection module and the quality module, extract detection feature maps through the liveness detection module, perform liveness detection on the target liveness training images according to the detection feature maps, and determine the value loss3 of the liveness detection module's loss function from the liveness detection results;
invoke the quality module of the first image processing model to determine the quality scores of the target liveness training images, and weight the detection feature maps by the quality scores to obtain a liveness shared feature map corresponding to the target liveness training images;
compute the value loss4 of the quality module's loss function based on the liveness shared feature map;
update, in the direction that reduces a second target loss value, the network parameters of all modules in the first image processing model except the backbone network and the face recognition module, the second target loss value being the sum of loss3 and loss4;
iteratively train the first image processing model with the updated network parameters until loss3 and loss4 reach convergence, to obtain a second image processing model.
In one embodiment, the processor 701 is further specifically configured to:
determine the quality module and backbone module of the second image processing model as the target quality module and target backbone network obtained after the first joint training and the second joint training, respectively;
construct the image quality scoring model based on the target quality module and the target backbone network.
In one embodiment, after the second image processing model is obtained, the processor 701 is further configured to:
input a new image into the second image processing model, and freeze the backbone network, face recognition module, and liveness detection module of the second image processing model;
adjust the quality module of the second image processing model according to the new image, to obtain an adjusted second image processing model.
In one embodiment, after the quality module of the second image processing model is adjusted according to the new image to obtain the adjusted second image processing model, the processor 701 is further specifically configured to: determine the quality module and backbone module of the adjusted second image processing model as the target quality module and target backbone network obtained after the first joint training and the second joint training, respectively.
For details, reference may be made to the descriptions of the related content in the embodiments corresponding to the foregoing figures.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be completed by a computer program instructing related hardware; the program may be stored in a computer-readable storage medium, and when executed may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association by cryptographic methods, each data block containing a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
What is disclosed above is merely some embodiments of the present application and certainly cannot be used to limit the scope of rights of the present application; a person of ordinary skill in the art can understand all or part of the processes for implementing the above embodiments, and equivalent changes made according to the claims of the present application still fall within the scope covered by the invention.

Claims (20)

  1. An image model training method, comprising:
    obtaining an image processing model, the image processing model comprising: a backbone network, a quality module, and one or more processing modules, wherein the processing module is associated with an image service task corresponding to the quality module;
    jointly training the quality module and the one or more processing modules on an image training set; and
    determining an image quality scoring model from the jointly trained quality module, the image quality scoring model being used to determine a quality score of an input image.
  2. The method according to claim 1, wherein the one or more processing modules comprise a face recognition module and a liveness detection module, the joint training comprises first joint training and second joint training, and the image training set comprises a first training image set and a second training image set; and the jointly training the quality module and the one or more processing modules on the image training set comprises:
    performing the first joint training on the face recognition module and the quality module with the first training image set; and
    performing the second joint training on the liveness detection module and the quality module with the second training image set.
  3. The method according to claim 2, wherein the first training image set comprises face training images corresponding to each of multiple users, each user's face training images comprising face training images of a first quality class and face training images of a second quality class; and the performing the first joint training on the face recognition module and the quality module with the first training image set comprises:
    obtaining, from the first training image set, target face training images corresponding to a first target user, wherein the first target user is any one of the multiple users;
    inputting the target face training images into the image processing model, and extracting image features of the target face training images through the backbone network to obtain first initial feature maps of the target face training images;
    inputting the first initial feature maps into the face recognition module and the quality module, extracting recognition feature maps through the face recognition module, performing face recognition on the target face training images according to the recognition feature maps, and determining a value loss1 of a loss function of the face recognition module according to face recognition results;
    invoking the quality module to determine quality scores of the target face training images, and weighting the recognition feature maps according to the quality scores to obtain a shared feature map;
    computing a value loss2 of a loss function of the quality module based on the shared feature map;
    updating, in a direction that reduces a first target loss value, network parameters of the modules in the image processing model other than the liveness detection module, the first target loss value being the sum of the loss1 and the loss2; and
    iteratively training the image processing model with the updated network parameters until the loss1 and the loss2 reach convergence, to obtain a first image processing model.
  4. The method according to claim 3, wherein after the first image processing model is obtained, the second training image set comprises liveness training images corresponding to each of multiple users, each user's liveness training images comprising liveness training images of a first liveness class and liveness training images of a second liveness class; and the performing the second joint training on the liveness detection module and the quality module with the second training image set comprises:
    obtaining, from the second training image set, target liveness training images corresponding to a second target user, wherein the second target user is any one of the multiple users;
    inputting the target liveness training images into the first image processing model, and extracting image features of the target liveness training images through the backbone network of the first image processing model to obtain second initial feature maps of the target liveness training images;
    inputting the second initial feature maps into the liveness detection module and the quality module, extracting detection feature maps through the liveness detection module, performing liveness detection on the target liveness training images according to the detection feature maps, and determining a value loss3 of a loss function of the liveness detection module according to liveness detection results;
    invoking the quality module of the first image processing model to determine quality scores of the target liveness training images, and weighting the detection feature maps according to the quality scores to obtain a liveness shared feature map corresponding to the target liveness training images;
    computing a value loss4 of the loss function of the quality module based on the liveness shared feature map;
    updating, in a direction that reduces a second target loss value, network parameters of the modules in the first image processing model other than the backbone network and the face recognition module, the second target loss value being the sum of the loss3 and the loss4; and
    iteratively training the first image processing model with the updated network parameters until the loss3 and the loss4 reach convergence, to obtain a second image processing model.
  5. The method according to claim 4, wherein the determining the image quality scoring model from the jointly trained quality module comprises:
    determining the quality module and the backbone module in the second image processing model as a target quality module and a target backbone network obtained after the first joint training and the second joint training, respectively; and
    constructing the image quality scoring model based on the target quality module and the target backbone network.
  6. The method according to claim 4, wherein after the second image processing model is obtained, the method further comprises:
    inputting a new image into the second image processing model, and freezing the backbone network, the face recognition module, and the liveness detection module in the second image processing model; and
    adjusting the quality module in the second image processing model according to the new image, to obtain an adjusted second image processing model.
  7. The method according to claim 6, wherein after the adjusting the quality module in the second image processing model according to the new image to obtain the adjusted second image processing model, the determining the quality module and the backbone module in the second image processing model as the target quality module and the target backbone network obtained after the first joint training and the second joint training comprises:
    determining the quality module and the backbone module in the adjusted second image processing model as the target quality module and the target backbone network obtained after the first joint training and the second joint training, respectively.
  8. An image model training apparatus, comprising:
    an obtaining unit, configured to obtain an image processing model, the image processing model comprising: a backbone network, a quality module, and one or more processing modules, wherein the processing module is associated with an image service task corresponding to the quality module; and
    a processing unit, configured to jointly train the quality module and the one or more processing modules on an image training set;
    wherein the processing unit is further configured to determine an image quality scoring model from the jointly trained quality module, the image quality scoring model being used to determine a quality score of an input image.
  9. An electronic device, comprising a processor, a storage apparatus, and a communication interface, the processor, the storage apparatus, and the communication interface being connected to one another, wherein the storage apparatus is configured to store computer program instructions, and the processor is configured to execute the program instructions to implement the following method:
    obtaining an image processing model, the image processing model comprising: a backbone network, a quality module, and one or more processing modules, wherein the processing module is associated with an image service task corresponding to the quality module;
    jointly training the quality module and the one or more processing modules on an image training set; and
    determining an image quality scoring model from the jointly trained quality module, the image quality scoring model being used to determine a quality score of an input image.
  10. The electronic device according to claim 9, wherein the one or more processing modules comprise a face recognition module and a liveness detection module, the joint training comprises first joint training and second joint training, and the image training set comprises a first training image set and a second training image set; and executing the jointly training the quality module and the one or more processing modules on the image training set comprises:
    performing the first joint training on the face recognition module and the quality module with the first training image set; and
    performing the second joint training on the liveness detection module and the quality module with the second training image set.
  11. The electronic device according to claim 10, wherein the first training image set comprises face training images corresponding to each of multiple users, each user's face training images comprising face training images of a first quality class and face training images of a second quality class; and executing the performing the first joint training on the face recognition module and the quality module with the first training image set comprises:
    obtaining, from the first training image set, target face training images corresponding to a first target user, wherein the first target user is any one of the multiple users;
    inputting the target face training images into the image processing model, and extracting image features of the target face training images through the backbone network to obtain first initial feature maps of the target face training images;
    inputting the first initial feature maps into the face recognition module and the quality module, extracting recognition feature maps through the face recognition module, performing face recognition on the target face training images according to the recognition feature maps, and determining a value loss1 of a loss function of the face recognition module according to face recognition results;
    invoking the quality module to determine quality scores of the target face training images, and weighting the recognition feature maps according to the quality scores to obtain a shared feature map;
    computing a value loss2 of a loss function of the quality module based on the shared feature map;
    updating, in a direction that reduces a first target loss value, network parameters of the modules in the image processing model other than the liveness detection module, the first target loss value being the sum of the loss1 and the loss2; and
    iteratively training the image processing model with the updated network parameters until the loss1 and the loss2 reach convergence, to obtain a first image processing model.
  12. The electronic device according to claim 11, wherein after the first image processing model is obtained, the second training image set comprises liveness training images corresponding to each of multiple users, each user's liveness training images comprising liveness training images of a first liveness class and liveness training images of a second liveness class; and executing the performing the second joint training on the liveness detection module and the quality module with the second training image set comprises:
    obtaining, from the second training image set, target liveness training images corresponding to a second target user, wherein the second target user is any one of the multiple users;
    inputting the target liveness training images into the first image processing model, and extracting image features of the target liveness training images through the backbone network of the first image processing model to obtain second initial feature maps of the target liveness training images;
    inputting the second initial feature maps into the liveness detection module and the quality module, extracting detection feature maps through the liveness detection module, performing liveness detection on the target liveness training images according to the detection feature maps, and determining a value loss3 of a loss function of the liveness detection module according to liveness detection results;
    invoking the quality module of the first image processing model to determine quality scores of the target liveness training images, and weighting the detection feature maps according to the quality scores to obtain a liveness shared feature map corresponding to the target liveness training images;
    computing a value loss4 of the loss function of the quality module based on the liveness shared feature map;
    updating, in a direction that reduces a second target loss value, network parameters of the modules in the first image processing model other than the backbone network and the face recognition module, the second target loss value being the sum of the loss3 and the loss4; and
    iteratively training the first image processing model with the updated network parameters until the loss3 and the loss4 reach convergence, to obtain a second image processing model.
  13. The electronic device according to claim 12, wherein executing the determining the image quality scoring model from the jointly trained quality module comprises:
    determining the quality module and the backbone module in the second image processing model as a target quality module and a target backbone network obtained after the first joint training and the second joint training, respectively; and
    constructing the image quality scoring model based on the target quality module and the target backbone network.
  14. The electronic device according to claim 12, wherein after the second image processing model is obtained, the processor is further configured to execute:
    inputting a new image into the second image processing model, and freezing the backbone network, the face recognition module, and the liveness detection module in the second image processing model; and
    adjusting the quality module in the second image processing model according to the new image, to obtain an adjusted second image processing model.
  15. A computer-readable storage medium, wherein the computer-readable storage medium stores computer program instructions which, when executed by a processor, are used to execute the following method:
    obtaining an image processing model, the image processing model comprising: a backbone network, a quality module, and one or more processing modules, wherein the processing module is associated with an image service task corresponding to the quality module;
    jointly training the quality module and the one or more processing modules on an image training set; and
    determining an image quality scoring model from the jointly trained quality module, the image quality scoring model being used to determine a quality score of an input image.
  16. The computer-readable storage medium according to claim 15, wherein the one or more processing modules comprise a face recognition module and a liveness detection module, the joint training comprises first joint training and second joint training, and the image training set comprises a first training image set and a second training image set; and executing the jointly training the quality module and the one or more processing modules on the image training set comprises:
    performing the first joint training on the face recognition module and the quality module with the first training image set; and
    performing the second joint training on the liveness detection module and the quality module with the second training image set.
  17. The computer-readable storage medium according to claim 16, wherein the first training image set comprises face training images corresponding to each of multiple users, each user's face training images comprising face training images of a first quality class and face training images of a second quality class; and executing the performing the first joint training on the face recognition module and the quality module with the first training image set comprises:
    obtaining, from the first training image set, target face training images corresponding to a first target user, wherein the first target user is any one of the multiple users;
    inputting the target face training images into the image processing model, and extracting image features of the target face training images through the backbone network to obtain first initial feature maps of the target face training images;
    inputting the first initial feature maps into the face recognition module and the quality module, extracting recognition feature maps through the face recognition module, performing face recognition on the target face training images according to the recognition feature maps, and determining a value loss1 of a loss function of the face recognition module according to face recognition results;
    invoking the quality module to determine quality scores of the target face training images, and weighting the recognition feature maps according to the quality scores to obtain a shared feature map;
    computing a value loss2 of a loss function of the quality module based on the shared feature map;
    updating, in a direction that reduces a first target loss value, network parameters of the modules in the image processing model other than the liveness detection module, the first target loss value being the sum of the loss1 and the loss2; and
    iteratively training the image processing model with the updated network parameters until the loss1 and the loss2 reach convergence, to obtain a first image processing model.
  18. The computer-readable storage medium according to claim 17, wherein after the first image processing model is obtained, the second training image set comprises liveness training images corresponding to each of multiple users, each user's liveness training images comprising liveness training images of a first liveness class and liveness training images of a second liveness class; and executing the performing the second joint training on the liveness detection module and the quality module with the second training image set comprises:
    obtaining, from the second training image set, target liveness training images corresponding to a second target user, wherein the second target user is any one of the multiple users;
    inputting the target liveness training images into the first image processing model, and extracting image features of the target liveness training images through the backbone network of the first image processing model to obtain second initial feature maps of the target liveness training images;
    inputting the second initial feature maps into the liveness detection module and the quality module, extracting detection feature maps through the liveness detection module, performing liveness detection on the target liveness training images according to the detection feature maps, and determining a value loss3 of a loss function of the liveness detection module according to liveness detection results;
    invoking the quality module of the first image processing model to determine quality scores of the target liveness training images, and weighting the detection feature maps according to the quality scores to obtain a liveness shared feature map corresponding to the target liveness training images;
    computing a value loss4 of the loss function of the quality module based on the liveness shared feature map;
    updating, in a direction that reduces a second target loss value, network parameters of the modules in the first image processing model other than the backbone network and the face recognition module, the second target loss value being the sum of the loss3 and the loss4; and
    iteratively training the first image processing model with the updated network parameters until the loss3 and the loss4 reach convergence, to obtain a second image processing model.
  19. The computer-readable storage medium according to claim 18, wherein executing the determining the image quality scoring model from the jointly trained quality module comprises:
    determining the quality module and the backbone module in the second image processing model as a target quality module and a target backbone network obtained after the first joint training and the second joint training, respectively; and
    constructing the image quality scoring model based on the target quality module and the target backbone network.
  20. The computer-readable storage medium according to claim 18, wherein after the second image processing model is obtained, the computer program instructions, when executed by the processor, are further used to execute:
    inputting a new image into the second image processing model, and freezing the backbone network, the face recognition module, and the liveness detection module in the second image processing model; and
    adjusting the quality module in the second image processing model according to the new image, to obtain an adjusted second image processing model.
PCT/CN2021/082604 2021-01-22 2021-03-24 一种图像模型训练方法、装置及电子设备、存储介质 WO2022156061A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110087937.7 2021-01-22
CN202110087937.7A CN112861659B (zh) 2021-01-22 2021-01-22 一种图像模型训练方法、装置及电子设备、存储介质

Publications (1)

Publication Number Publication Date
WO2022156061A1 true WO2022156061A1 (zh) 2022-07-28

Family

ID=76007955

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/082604 WO2022156061A1 (zh) 2021-01-22 2021-03-24 一种图像模型训练方法、装置及电子设备、存储介质

Country Status (2)

Country Link
CN (1) CN112861659B (zh)
WO (1) WO2022156061A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071348A (zh) * 2023-03-02 2023-05-05 深圳市捷牛智能装备有限公司 基于视觉检测的工件表面检测方法及相关装置

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269149A (zh) * 2021-06-24 2021-08-17 中国平安人寿保险股份有限公司 活体人脸图像的检测方法、装置、计算机设备及存储介质
CN116416656A (zh) * 2021-12-29 2023-07-11 荣耀终端有限公司 基于屏下图像的图像处理方法、装置及存储介质
CN114863224B (zh) * 2022-07-05 2022-10-11 深圳比特微电子科技有限公司 训练方法、图像质量检测方法、装置和介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170270653A1 (en) * 2016-03-15 2017-09-21 International Business Machines Corporation Retinal image quality assessment, error identification and automatic quality correction
CN108230291A (zh) * 2017-03-30 2018-06-29 北京市商汤科技开发有限公司 物体识别系统训练方法、物体识别方法、装置和电子设备
CN109614866A (zh) * 2018-11-08 2019-04-12 中科天网(广东)科技有限公司 基于级联深度卷积神经网络的人脸检测方法
CN111160434A (zh) * 2019-12-19 2020-05-15 中国平安人寿保险股份有限公司 目标检测模型的训练方法、装置及计算机可读存储介质
CN111340195A (zh) * 2020-03-09 2020-06-26 创新奇智(上海)科技有限公司 网络模型的训练方法及装置、图像处理方法及存储介质
CN111931929A (zh) * 2020-07-29 2020-11-13 深圳地平线机器人科技有限公司 一种多任务模型的训练方法、装置及存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633209B (zh) * 2017-08-17 2018-12-18 平安科技(深圳)有限公司 电子装置、动态视频人脸识别的方法及存储介质
CN107679525B (zh) * 2017-11-01 2022-11-29 腾讯科技(深圳)有限公司 图像分类方法、装置及计算机可读存储介质
CN109934115B (zh) * 2019-02-18 2021-11-02 苏州市科远软件技术开发有限公司 人脸识别模型的构建方法、人脸识别方法及电子设备

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170270653A1 (en) * 2016-03-15 2017-09-21 International Business Machines Corporation Retinal image quality assessment, error identification and automatic quality correction
CN108230291A (zh) * 2017-03-30 2018-06-29 北京市商汤科技开发有限公司 物体识别系统训练方法、物体识别方法、装置和电子设备
CN109614866A (zh) * 2018-11-08 2019-04-12 中科天网(广东)科技有限公司 基于级联深度卷积神经网络的人脸检测方法
CN111160434A (zh) * 2019-12-19 2020-05-15 中国平安人寿保险股份有限公司 目标检测模型的训练方法、装置及计算机可读存储介质
CN111340195A (zh) * 2020-03-09 2020-06-26 创新奇智(上海)科技有限公司 网络模型的训练方法及装置、图像处理方法及存储介质
CN111931929A (zh) * 2020-07-29 2020-11-13 深圳地平线机器人科技有限公司 一种多任务模型的训练方法、装置及存储介质

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116071348A (zh) * 2023-03-02 2023-05-05 深圳市捷牛智能装备有限公司 基于视觉检测的工件表面检测方法及相关装置

Also Published As

Publication number Publication date
CN112861659B (zh) 2023-07-14
CN112861659A (zh) 2021-05-28

Similar Documents

Publication Publication Date Title
WO2022156061A1 (zh) 一种图像模型训练方法、装置及电子设备、存储介质
CN107766850B (zh) 基于结合人脸属性信息的人脸识别方法
Li et al. Contour knowledge transfer for salient object detection
WO2019228317A1 (zh) 人脸识别方法、装置及计算机可读介质
WO2021159742A1 (zh) 图像分割方法、装置和存储介质
CN110689025B (zh) 图像识别方法、装置、系统及内窥镜图像识别方法、装置
WO2020019738A1 (zh) 磁共振血管壁成像的斑块处理方法、装置和计算设备
US20220270348A1 (en) Face recognition method and apparatus, computer device, and storage medium
WO2021051987A1 (zh) 神经网络模型训练的方法和装置
CN111444826B (zh) 视频检测方法、装置、存储介质及计算机设备
WO2023109714A1 (zh) 用于蛋白质表征学习的多模态信息融合方法、系统、终端及存储介质
CN111292262B (zh) 图像处理方法、装置、电子设备以及存储介质
WO2022052782A1 (zh) 图像的处理方法及相关设备
WO2024060395A1 (zh) 一种基于深度学习的高精度点云补全方法及装置
CN114926892A (zh) 一种基于深度学习的眼底图像匹配方法、系统和可读介质
JP2023526899A (ja) 画像修復モデルを生成するための方法、デバイス、媒体及びプログラム製品
TWI728369B (zh) 人工智慧雲端膚質與皮膚病灶辨識方法及其系統
CN116778527A (zh) 人体模型构建方法、装置、设备及存储介质
CN113239866B (zh) 一种时空特征融合与样本注意增强的人脸识别方法及系统
CN111429414B (zh) 基于人工智能的病灶影像样本确定方法和相关装置
CN111414817B (zh) 面部识别系统及面部识别方法
WO2024027146A1 (zh) 阵列式人脸美丽预测方法、设备及存储介质
CN112164447A (zh) 图像处理方法、装置、设备及存储介质
TWM586599U (zh) 人工智慧雲端膚質與皮膚病灶辨識系統
CN116704401A (zh) 操作类考试的评分校验方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21920450

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21920450

Country of ref document: EP

Kind code of ref document: A1