WO2023137905A1 - Image processing method and apparatus, and electronic device and storage medium - Google Patents


Info

Publication number
WO2023137905A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
key point
image
uncertainty
target face
Prior art date
Application number
PCT/CN2022/090297
Other languages
French (fr)
Chinese (zh)
Inventor
胡显
易军
邓巍
Original Assignee
小米科技(武汉)有限公司
北京小米移动软件有限公司
北京小米松果电子有限公司
Priority date
Filing date
Publication date
Application filed by 小米科技(武汉)有限公司, 北京小米移动软件有限公司, 北京小米松果电子有限公司
Publication of WO2023137905A1 publication Critical patent/WO2023137905A1/en


Classifications

    • G06T 7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06N 3/04 — Neural network architecture, e.g. interconnection topology
    • G06N 3/0464 — Convolutional networks [CNN, ConvNet]
    • G06T 7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • G06V 20/40 — Scenes; scene-specific elements in video content
    • G06V 40/16 — Human faces, e.g. facial parts, sketches or expressions
    • G06T 2207/10016 — Video; image sequence
    • G06T 2207/20081 — Training; learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/30201 — Face

Definitions

  • the present disclosure relates to the technical field of computer vision, and in particular to an image processing method, device, electronic equipment and storage medium.
  • Face key point detection refers to locating the feature key points of the face from the face image, such as the key points of the facial contour and the key points of the facial features. Due to the influence of factors such as pose, occlusion or light, face key point detection is a challenging task.
  • the embodiments of the present disclosure provide an image processing method, device, electronic equipment and storage medium.
  • an embodiment of the present disclosure provides an image processing method, including:
  • the face image to be tested includes the target face;
  • a detection result of the target face is determined according to the key point information and the first uncertainty.
  • the acquisition of the face image to be tested includes:
  • the image to be processed includes at least one human face
  • the face image to be tested corresponding to each face is obtained by cropping according to the face area information.
  • the performing image detection on the face image to be tested, and determining the key point information of at least one key point of the target face and the first uncertainty of the target face include:
  • the first uncertainty of the target face is determined according to the second uncertainty of each key point of the target face.
  • the preset key points include at least one of the following types:
  • key points of the face contour, key points of the eyes, key points of the eyebrows, key points of the nose, key points of the mouth, and key points of the ears.
  • the determining the detection result of the target face according to the key point information and the first uncertainty includes:
  • the key points are marked on the face image to be tested according to the key point information of each key point.
  • the determining the detection result of the target face according to the key point information and the first uncertainty includes:
  • the target face tracking model is determined from a plurality of preset face tracking models
  • the target face is detected and tracked by using the target face tracking model to obtain the detection result of the target face.
  • the determining the detection result of the target face according to the key point information and the first uncertainty includes:
  • the performing image detection on the face image to be tested, and determining the key point information of at least one key point of the target face and the first uncertainty of the target face include:
  • the method of the embodiment of the present disclosure also includes a training process for training the feature extraction network and the key point detection network, the training process includes:
  • each sample data in the sample data set includes a face sample image, and a key point label of each key point of the target face in the face sample image;
  • the face sample image is input into the feature extraction network to be trained, and the feature map of the face sample image output by the feature extraction network is obtained;
  • an image processing device including:
  • An acquisition module configured to acquire a face image to be tested, the face image to be tested includes a target face;
  • the image detection module is configured to perform image detection on the face image to be tested, and determine key point information of at least one key point of the target face and a first uncertainty of the target face; wherein, the first uncertainty is obtained according to the second uncertainty of all key points of the target face;
  • the result determination module is configured to determine the detection result of the target face according to the key point information and the first uncertainty.
  • the acquisition module is configured to:
  • the image to be processed includes at least one human face
  • the face image to be tested corresponding to each face is obtained by cropping according to the face area information.
  • the image detection module is configured to:
  • the first uncertainty of the target face is determined according to the second uncertainty of each key point of the target face.
  • the preset key points include at least one of the following types:
  • key points of the face contour, key points of the eyes, key points of the eyebrows, key points of the nose, key points of the mouth, and key points of the ears.
  • the result determination module is configured to:
  • the key points are marked on the face image to be tested according to the key point information of each key point.
  • the result determination module is configured to:
  • the target face tracking model is determined from a plurality of preset face tracking models
  • the target face is detected and tracked by using the target face tracking model to obtain the detection result of the target face.
  • the result determination module is configured to:
  • the image detection module includes:
  • a feature extraction module configured to input the face image to be tested into a pre-trained feature extraction network to obtain a feature map output by the feature extraction network;
  • the key point detection module is configured to input the feature map into a pre-trained key point detection network, and obtain the key point information and the first uncertainty of each key point of the target face output by the key point detection network.
  • the device described in the embodiments of the present disclosure further includes a training module configured to:
  • each sample data in the sample data set includes a face sample image, and a key point label of each key point of the target face in the face sample image;
  • the face sample image is input into the feature extraction network to be trained, and the feature map of the face sample image output by the feature extraction network is obtained;
  • an electronic device including:
  • the memory stores computer instructions that can be read by the processor, and when the computer instructions are read, the processor executes the method according to any implementation manner of the first aspect.
  • the embodiments of the present disclosure provide a storage medium for storing computer-readable instructions, and the computer-readable instructions are used to cause a computer to execute the method according to any embodiment of the first aspect.
  • the image processing method includes acquiring a face image to be tested, performing image detection on the face image to be tested, determining key point information of at least one key point of a target face and a first uncertainty of the target face, and determining a detection result of the target face according to the key point information and the first uncertainty.
  • the first uncertainty of the target face is used to assist the detection of key points of the face to improve the effect and accuracy of face detection, and at the same time, it is applicable to various task scenarios, and the first uncertainty representing the comprehensive error of the target face is determined based on the second uncertainty of all key points of the target face, so as to improve the network effect and training efficiency.
  • FIG. 1 is a flowchart of an image processing method according to some embodiments of the present disclosure.
  • FIG. 2 is a flowchart of an image processing method according to some embodiments of the present disclosure.
  • FIG. 3 is a flowchart of an image processing method according to some embodiments of the present disclosure.
  • Fig. 4 is a schematic diagram of facial key points according to some implementations of the present disclosure.
  • Fig. 5 is a schematic structural diagram of an image detection network according to some embodiments of the present disclosure.
  • FIG. 6 is a flowchart of an image processing method according to some embodiments of the present disclosure.
  • FIG. 7 is a flowchart of an image processing method according to some embodiments of the present disclosure.
  • FIG. 8 is a flowchart of an image processing method according to some embodiments of the present disclosure.
  • FIG. 9 is a flowchart of an image processing method according to some embodiments of the present disclosure.
  • FIG. 10 is a structural block diagram of an image processing device according to some embodiments of the present disclosure.
  • FIG. 11 is a structural block diagram of an image processing device according to some embodiments of the present disclosure.
  • Fig. 12 is a structural block diagram of an electronic device according to some embodiments of the present disclosure.
  • Face key point detection is a necessary means for face recognition tasks. Face key point detection refers to locating key points of facial features from the face image, such as key points of facial contour and key points of facial features. Key points of facial contour can include key points of chin, jaw, and cheeks.
  • DNN: Deep Neural Network
  • embodiments of the present disclosure provide an image processing method, device, electronic equipment, and storage medium, aiming at improving the accuracy of facial key point positioning and optimizing the structure and effect of an image detection network.
  • an embodiment of the present disclosure provides an image processing method, which can be applied to an electronic device.
  • the electronic device may be any type of device suitable for implementation, such as a mobile terminal, a vehicle terminal, a wearable device, an access control system, a video surveillance system, a cloud platform, and a server, etc., and the present disclosure does not limit this.
  • the image processing method of the present disclosure example includes:
  • the face image to be tested refers to an image in which a face object is expected to be detected, so that the face image to be tested may include one or more face objects, and the face object is the target face.
  • the face image to be tested may be a single frame image collected by the image collection device of the electronic device, or may be a frame image in a video stream collected by the image collection device of the electronic device.
  • the electronic device is a smart phone.
  • the smart phone includes a camera, and an image including a human face can be captured through the camera, and the image can be used as the human face image to be tested in the present disclosure.
  • the electronic device takes a video surveillance system as an example.
  • the video surveillance system includes a surveillance camera, which can capture a video stream including a human face in the target scene area through the surveillance camera.
  • the frame images in the video stream can be used as the face image to be tested in the present disclosure.
  • the face image to be tested can be any image that is expected to detect a face object from the image, it can be an image acquired in real time, or it can be a face image uploaded or downloaded through the network, which will not be repeated in this disclosure.
  • the face image acquired by the electronic device often has many interference factors, for example, the face image includes multiple face objects, and for example, the face image includes a large non-face area.
  • the face image can be cropped in advance, and the cropped image including only one face object can be used as the face image to be tested.
  • S120 Perform image detection on the face image to be tested, and determine key point information of at least one key point of the target face and a first uncertainty of the target face.
  • face key point detection needs to detect multiple face key points from the face image to be tested, and these face key points may respectively belong to different face key point types.
  • Key point types can include, for example, facial contour key points, eye key points, eyebrow key points, nose key points, mouth key points, ear key points, etc.
  • Each key point type can include multiple key points; for example, the eyebrow key points can include 5 points per eyebrow, 10 key points in total.
  • the key point information may include key point coordinates corresponding to each key point.
  • image detection may be performed on the face image to be tested based on image detection technology, so that all key points of the target face and the position coordinates of each key point in the image may be obtained from the face image to be tested.
  • the first uncertainty of the target face needs to be determined during key point detection, and the first uncertainty represents the comprehensive error of all key points of the target face, that is, the first uncertainty of the target face is obtained according to the second uncertainty of all key points.
  • The higher the first uncertainty, the greater the error of the key point detection of the target face; conversely, the lower the first uncertainty, the smaller the error of the key point detection of the target face.
  • the pre-trained key point detection network can be used to predict the position coordinates of each key point of the target face, so as to obtain the key point information of each key point.
  • the key point detection network can also predict the uncertainty of each key point to obtain the second uncertainty corresponding to each key point.
  • the key point detection network can predict the position coordinates (x, y) and the second uncertainty p of the key point A based on the eyebrow feature of the target face, and the second uncertainty p represents the error of the position coordinates (x, y) of the key point.
  • each key point corresponds to key point information and a second uncertainty.
  • the first uncertainty of the target face is calculated according to the second uncertainties of all key points, and the first uncertainty is used as the comprehensive uncertainty corresponding to the target face.
  • the root mean square of the second uncertainties of all key points may be used as the first uncertainty corresponding to the target face.
  • the mean value of the second uncertainty of all key points may be used as the first uncertainty corresponding to the target face.
  • the first uncertainty can also be obtained by fusing the second uncertainties of all key points in other ways, as long as the first uncertainty can represent the comprehensive error of the key points, which is not limited in the present disclosure.
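The fusion described above can be sketched as follows. This is a minimal illustration of the two example rules given in the text (root mean square and mean); the function name and the sample values are assumptions for illustration only, and the disclosure allows any fusion rule that represents the comprehensive error.

```python
import math

def fuse_uncertainties(second_uncertainties, method="rms"):
    """Fuse the per-key-point (second) uncertainties into a single
    face-level (first) uncertainty. Only the RMS and mean rules from
    the text are shown; other fusion rules are equally permissible."""
    n = len(second_uncertainties)
    if method == "rms":
        # Root mean square of the second uncertainties of all key points.
        return math.sqrt(sum(u * u for u in second_uncertainties) / n)
    if method == "mean":
        # Mean value of the second uncertainties of all key points.
        return sum(second_uncertainties) / n
    raise ValueError("unsupported fusion method: " + method)

# Hypothetical per-key-point uncertainties predicted by the network.
per_point = [0.1, 0.2, 0.2, 0.3]
first_rms = fuse_uncertainties(per_point, "rms")
first_mean = fuse_uncertainties(per_point, "mean")
```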
  • the corresponding post-processing logic can be set according to different downstream task scenarios, so as to obtain the detection result for the target face based on the key point information and the first uncertainty.
  • the face image uploaded by the user must meet certain requirements, such as not covering the eyebrows, not tilting the head too far, and so on. Therefore, through the above process of the present disclosure, image processing can be performed on the face photo uploaded by the user to obtain the key point information of each key point of the face image and the first uncertainty for the target face.
  • when the first uncertainty is greater than the preset threshold, it means that the key point detection deviation of the face image uploaded by the user is large and facial features may be occluded, so a detection result of "failed" can be output to the user, indicating that a certain facial feature is occluded.
  • the electronic device corresponds to different working conditions under different lighting conditions. For example, in an extremely dark light scene, the exposure of the face image collected by the electronic device is very low, so the first uncertainty obtained by key point detection is relatively large. On the contrary, for example, in a bright scene, the exposure of the face image collected by the electronic device is normal, so the first uncertainty obtained by the key point detection is relatively low. Based on this, by setting an appropriate threshold, the first uncertainty can be used to determine the current lighting environment of the device, so that the corresponding tracking algorithm model can be used to realize face tracking.
  • the first uncertainty representing the comprehensive error of the target face is determined based on the second uncertainty of all key points of the target face, so that for the key point detection network, it is not necessary to perform regression optimization on the uncertainty of each key point during the training process, but to optimize the comprehensive uncertainty of the face, the network is easy to converge, the effect is better, and the training efficiency is greatly improved.
  • the first uncertainty of the target face is used to assist the detection of key points of the face, so as to improve the effect and accuracy of face detection. And based on the second uncertainty of all the key points of the target face, the first uncertainty representing the comprehensive error of the target face is determined to improve the network effect and training efficiency.
  • the disclosed method does not limit the application scenarios, and can be applied to downstream tasks in various scenarios, such as face image quality detection, face tracking, key point positioning, etc., and has higher robustness.
  • the process of obtaining the face image to be tested includes:
  • S210 Acquire an image to be processed, where the image to be processed includes at least one human face.
  • S220 Perform image detection on the image to be processed, and determine face area information of each face on the image to be processed.
  • the image to be processed may be an original image collected by an image collection device of the electronic device, or an uploaded image uploaded to the electronic device by a user. It can be understood that the image to be processed may include one human face, or may include multiple human faces.
  • image detection may be performed on the image to be processed based on the image detection technology to obtain face area information of each face on the image to be processed.
  • the image to be processed can be detected through the CenterFace network, so as to obtain the face detection frame of each face area on the image to be processed, and the face detection frame is also the face area information.
  • the image to be processed can be cropped according to the face detection frame, so as to obtain a face image including each face area, which is the face image to be tested.
  • the center point of each face detection frame can be used as the origin, and the coordinates of the origin are kept unchanged to uniformly expand the entire face detection frame at a preset ratio, and the face image is cut out along the expanded face detection frame.
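The expansion-then-crop step above can be sketched as below. The box format `(x1, y1, x2, y2)` and the ratio value are assumptions for illustration; the text only requires that the frame be uniformly expanded about its center at a preset ratio before cropping.

```python
def expand_box(box, ratio):
    """Uniformly expand a face detection frame about its center point,
    keeping the center coordinates unchanged, as described above.
    `box` is (x1, y1, x2, y2); `ratio` scales the width and height."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) / 2.0 * ratio
    half_h = (y2 - y1) / 2.0 * ratio
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

# Example: a 100x100 detection frame expanded by a preset ratio of 1.2.
expanded = expand_box((100, 100, 200, 200), 1.2)
# The face image to be tested would then be cut out along `expanded`
# (clamped to the image bounds).
```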
  • the face image of each face can be cut out through the process of the embodiment shown in FIG. 2 , and these face images can be used as the face image to be tested in the present disclosure.
  • key point detection can be performed on the target face in the face image to be tested based on the image detection technology, which will be described below with reference to FIG. 3 .
  • the process of performing image detection on the face image to be tested includes:
  • S310 Perform key point detection on the face image to be tested, and determine key point information and a second uncertainty of each key point of the target face based on the preset key point type of the face.
  • S320 Determine the first uncertainty of the target face according to the second uncertainty of each key point of the target face.
  • Keypoint types include, for example, eye keypoints, eyebrow keypoints, nose keypoints, mouth keypoints, facial contour keypoints, etc., wherein each keypoint type may include multiple keypoints.
  • the preset face key point types can include the following table 1:
  • face key point types are not limited to the examples in Table 1 above, and may also include any other key point types suitable for implementation, such as ear key points, apple muscle key points, etc., which are not limited in the present disclosure.
  • the above-mentioned key point type detection may be performed on the to-be-tested face image based on image detection, so that the key point information of each key point of the target face and the second uncertainty of each key point may be determined.
  • the first uncertainty corresponding to the target face can be obtained based on the second uncertainty calculation of each key point.
  • the root mean square of the second uncertainties of all key points may be used as the first uncertainty corresponding to the target face.
  • the key point detection of the target face in the face image to be tested can be realized based on a pre-trained image detection network.
  • Fig. 5 shows the image detection network structure in some embodiments of the present disclosure, which will be described below in conjunction with Fig. 5 .
  • the image detection network of the example of the present disclosure includes a feature extraction network 510 and a key point detection network 520 .
  • the feature extraction network 510 is the backbone network of the image detection network, and is mainly used for feature extraction of the face image to be tested, thereby obtaining a feature map including semantic features and texture features of the face to be tested. That is, the input of the feature extraction network 510 is the face image to be tested, and the output is the feature map of the face image to be tested.
  • the feature extraction network 510 can adopt a learnable network based on a convolutional neural network (CNN, Convolutional Neural Network) architecture.
  • the feature extraction network 510 can adopt a relatively lightweight MobileNet neural network.
  • the key point detection network 520 is used to predict and output key point information and the first uncertainty according to the feature map output by the feature extraction network 510 .
  • the network structure of the key point detection network 520 includes two branches, that is, the output layer is divided into two fully connected layers.
  • One of the branches is key point information prediction, which is used to perform regression prediction on the position coordinates of each key point of the target face, and obtain the key point information of each key point.
  • the other branch is uncertainty prediction, which is used to predict and output the first uncertainty of the target face according to the uncertainty of each key point.
  • the pooling layer of the key point detection network 520 adopts a 7*7 pooling layer, and each fully connected layer adopts a 256*1-dimensional fully connected layer.
  • before using the image detection network shown in FIG. 5 to process the face image to be tested, the method also includes a process of normalizing the face image to be tested.
  • the purpose of the normalization process is to normalize the pixel values of the face image to be tested, so as to obtain an input image that meets the network design requirements and reduce the amount of calculation.
  • the face image to be tested may first be scaled to a preset size, such as 112 pixels × 112 pixels, by bilinear interpolation, and then the image is pixel-normalized, expressed as:
  • I Norm represents the normalized image pixel value
  • I represents the pixel value of the original image
  • the normalized image is used as the input image of the image detection network.
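The pixel normalization step can be sketched as follows. The patent's exact formula is not reproduced in this text, so the constants below (subtracting 127.5 and dividing by 128, mapping 8-bit pixels to roughly [-1, 1]) are an assumed, conventional choice rather than the disclosed formula.

```python
def normalize_pixels(pixels, mean=127.5, scale=128.0):
    """Pixel normalization sketch: I_Norm = (I - mean) / scale.
    The specific constants are an assumption; the source text omits
    the formula it refers to."""
    return [(p - mean) / scale for p in pixels]

# Example: a few 8-bit pixel values mapped to roughly [-1, 1].
normalized = normalize_pixels([0, 127.5, 255])
```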
  • the image processing method of the present disclosure determines the detection result of the target face includes:
  • the reliability score of the target face can be calculated based on the first uncertainty.
  • the first uncertainty represents the comprehensive error of key point detection and positioning of the target face, which reflects the reliability of the detected key point information, based on which the reliability score of the target face can be determined.
  • the first uncertainty output by the image detection network is a value between 0 and 1, so the reliability score of the determined target face can be expressed as:
  • represents the reliability score of the target face
  • represents the first uncertainty of the target face
  • a first preset threshold may be set in advance based on prior knowledge or scene requirements, and the first preset threshold represents a critical value for passing or failing the key point detection result of the target face.
  • when the reliability score is greater than the first preset threshold, it means that the detection result of the target face is a reliable result, that is, the detection is passed and the first preset condition is met.
  • when the reliability score is not greater than the first preset threshold, it indicates that the detection result of the target face is unreliable, that is, the detection fails and the first preset condition is not met.
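The pass/fail logic above can be sketched as follows. Since the text states the first uncertainty lies in [0, 1] but does not reproduce the score formula, `1 - uncertainty` is an assumed placeholder consistent with "lower uncertainty, higher reliability"; the threshold value is likewise hypothetical.

```python
def reliability_score(first_uncertainty):
    """Map the first uncertainty (a value in [0, 1]) to a reliability
    score. `1 - uncertainty` is an assumption standing in for the
    formula omitted from this text."""
    return 1.0 - first_uncertainty

def detection_passes(first_uncertainty, first_threshold):
    """The detection passes when the reliability score is greater than
    the first preset threshold, per the logic described above."""
    return reliability_score(first_uncertainty) > first_threshold

# Hypothetical threshold of 0.8: an uncertainty of 0.1 passes.
result = detection_passes(0.1, 0.8)
```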
  • each key point can be marked on the original face image to be tested according to the key point information of each key point, so that the user can view the position of each key point on the image, realizing visual output of the face key points.
  • the complexity of the current scene may be determined based on the first uncertainty to implement switching of the face tracking model, which will be described below with reference to the implementation in FIG. 7 .
  • determining the detection result of the target face includes:
  • S132-2 Determine a target face tracking model from a plurality of preset face tracking models according to the reliability score and the pre-established correspondence between the reliability score and the face tracking model.
  • the reliability score of the target face may be determined based on the aforementioned process of the implementation manner in FIG. 6 , which will not be repeated in this disclosure.
  • under different lighting conditions, the first uncertainty obtained by key point detection will also be different.
  • the exposure of the face image collected by the electronic device is very low, so the first uncertainty obtained by the key point detection is relatively large, and correspondingly, the reliability score of the target face is also lower.
  • the exposure of the face image collected by the electronic device is normal, so the first uncertainty obtained by the key point detection is lower, and correspondingly, the reliability score of the target face is higher.
  • the correspondence between the reliability score and the face tracking model can be established in advance based on prior knowledge or a limited number of experiments.
  • the pre-established correspondence can be shown in Table 2 below:
  • the face tracking model corresponding to the reliability score can be determined as the target face tracking model according to the correspondence in Table 2 above, and then the target face tracking model can be used to detect and track the target face. For example, if the reliability score of the target face in the image to be detected is 0.8, then based on the correspondence in Table 2 above, it can be determined that the current scene is a normal scene and the corresponding target face tracking model is "model 1", so that the target face is tracked and detected using model 1 to obtain the face detection result.
  • the current lighting scene can be judged based on the reliability score, so as to select the corresponding face tracking model for face tracking detection, and improve the effect of the detection system.
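The score-driven model switching can be sketched as a range lookup. Table 2 itself is not reproduced in this excerpt, so the score ranges, scene names, and model names below are hypothetical stand-ins, chosen only to be consistent with the "score 0.8 → normal scene → model 1" example above:

```python
# Hypothetical stand-in for Table 2: (low, high, scene, tracking model).
CORRESPONDENCE = [
    (0.7, 1.0, "normal scene", "model 1"),
    (0.4, 0.7, "low-light scene", "model 2"),
    (0.0, 0.4, "very dark scene", "model 3"),
]

def select_tracking_model(reliability_score: float) -> str:
    """Pick the target face tracking model for the current scene."""
    for low, high, _scene, model in CORRESPONDENCE:
        if low <= reliability_score <= high:
            return model
    raise ValueError("reliability score must lie in [0, 1]")
```

A score of 0.8 then selects "model 1", matching the worked example in the text.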
  • the quality inspection of the stored photos can be implemented according to the disclosed method.
  • face recognition scenarios such as identity verification
  • users are often required to upload a face photo that meets the requirements in advance, so as to be used as a template photo for subsequent identity verification.
  • the disclosed method can be used to detect the photos uploaded by users to determine whether the uploaded photos are qualified. The following will describe the embodiment in conjunction with FIG. 8 .
  • determining the detection result of the target face includes:
  • S133-1 Determine the reliability score of the target face according to the first uncertainty of the target face.
  • the face image can be used as the face image to be tested as described in the foregoing embodiments of the present disclosure.
  • the key point detection of the face image to be tested can obtain the key point information and the first uncertainty of the target face.
  • the reliability score of the target face can be determined based on the aforementioned process of the implementation manner in FIG. 6 , which will not be repeated in this disclosure.
  • a second preset threshold may be set in advance based on prior knowledge or a limited number of trials, and the second preset threshold represents a critical value for whether the detection of the target face passes.
  • when the reliability score is greater than the second preset threshold, the detection of the target face passes, the second preset condition is met, and the photo can be stored.
  • when the reliability score is not greater than the second preset threshold, the detection of the target face fails, the second preset condition is not met, and the photo cannot be stored.
  • in this case, the key points that do not meet the requirements can also be determined according to the key point information, so as to output prompt information to the user, such as "the eyebrows are blocked".
  • the method of the embodiments of the present disclosure can be applied to various face recognition scenarios, and can distinguish image quality or current environmental conditions based on the first uncertainty, which has strong practicability and robustness, and improves the effect of face recognition tasks.
  • the process of performing network training on the image detection network includes:
  • the sample data set includes a large amount of sample data.
  • for example, the sample data set may include 5000 pieces of sample data.
  • each sample data includes a face sample image and a key point label for each key point of the target face in the pre-labeled face sample image.
  • the key point label represents the ground truth of each key point of the target face in the face sample image
  • the key point label can be obtained by manual labeling.
  • the N key point coordinates of the target face in the face sample image can be marked by manual labeling to obtain the key point label corresponding to each face sample image.
  • the massive data in the sample data set can also be preprocessed in advance.
  • the preprocessing process can refer to the aforementioned embodiment in FIG.
  • the network structure of the image detection network may refer to the implementation manner shown in FIG. 5 above.
  • every n pieces of sample data can be used as one batch of training samples; typically, n can be 256.
  • the following takes one piece of sample data as an example to illustrate the training process.
  • before the face sample image is input into the image detection network, it can be normalized in advance; the normalization process can refer to the aforementioned formula (1), which will not be repeated here.
  • the face sample image included in the sample data is input into the feature extraction network 510 to be trained, so that the feature extraction network 510 outputs a feature map corresponding to the face sample image.
  • the feature map output by the feature extraction network 510 is used as the input of the key point detection network 520, and, through the pooling layer and the fully connected layer of the key point detection network 520, the key point information P of the target face and the first uncertainty σ of the target face are respectively output.
  • the key point information of the target face output by the key point detection network 520 can be expressed as: P = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, where:
  • P represents the key point information
  • N represents the number of key points
  • (x_i, y_i) represents the position coordinates of the i-th key point.
  • the key point information can include the position coordinates of the key points predicted by the image detection network, and the key point labels represent the real coordinates of the key points, so that the difference between the two can be calculated based on the pre-built loss function, that is, the loss.
  • the image detection network is not optimized solely on the difference between the key point information and the key point labels; the first uncertainty is incorporated into the same optimization, so no additional label needs to be set for the first uncertainty and the network converges more easily.
  • the image processing network uses a multi-objective constraint loss function, which can be expressed as: L = L_p + L_a, where:
  • L represents the loss between the key point information and the key point labels
  • L_p represents the key point error loss function
  • L_a represents the uncertainty error loss function, L_a = f(σ_p, σ)
  • σ_p represents the root mean square error of all key points of the target face
  • σ represents the first uncertainty of the predicted output
  • f represents the L1 loss function
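Numerically, the loss above can be sketched as follows. The exact form of the key point error term L_p is not given in this excerpt, so taking it as the plain key point RMSE is an assumption; the uncertainty term follows the description: an L1 loss f between the predicted first uncertainty σ and the key point RMSE σ_p, which is why no extra label is needed for the uncertainty:

```python
import math

def multi_objective_loss(pred, label, sigma):
    """Sketch of L = L_p + L_a for one face.
    pred, label: lists of (x, y) key point coordinates; sigma: predicted
    first uncertainty. L_p below (the key point RMSE itself) is assumed;
    L_a = |sigma - sigma_p| is the stated L1 loss between the predicted
    uncertainty and the actual key point error."""
    sq = [(px - lx) ** 2 + (py - ly) ** 2
          for (px, py), (lx, ly) in zip(pred, label)]
    sigma_p = math.sqrt(sum(sq) / len(sq))  # RMSE over all key points
    L_p = sigma_p                           # assumed key point error term
    L_a = abs(sigma - sigma_p)              # uncertainty error term
    return L_p + L_a
```

Minimizing L_a drives the predicted σ toward the network's actual key point error on each sample, so the uncertainty head is supervised by the key point labels alone.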
  • the network parameters of the feature extraction network and/or the key point detection network can be optimized and adjusted by backpropagating the difference.
  • the above process is repeated using the sample data in the sample data set, and the image detection network is iteratively optimized until the convergence condition is met, completing the network training.
  • the first uncertainty of the target face is integrated to optimize the training of the network to improve the effect of the image processing network.
  • the constructed loss function has a simple structure, and the optimization of the first uncertainty can be realized without setting additional labels for the first uncertainty, and the network is easier to converge.
  • the first uncertainty is the comprehensive uncertainty of the target face.
  • the embodiments of the present disclosure provide an image processing device, which can be applied to electronic equipment.
  • the electronic device may be any type of device suitable for implementation, such as a mobile terminal, a vehicle terminal, a wearable device, an access control system, a video surveillance system, a cloud platform, and a server, etc., and the present disclosure does not limit this.
  • the image processing device of the present disclosure includes:
  • the obtaining module 10 is configured to obtain a human face image to be tested, the human face image to be tested includes a target human face;
  • the image detection module 20 is configured to perform image detection on the face image to be tested, and determine key point information of at least one key point of the target face and a first uncertainty of the target face; wherein, the first uncertainty is obtained according to the second uncertainty of all key points of the target face;
  • the result determination module 30 is configured to determine the detection result of the target face according to the key point information and the first uncertainty.
  • the first uncertainty of the target face is used to assist the detection of key points of the face, so as to improve the effect and accuracy of face detection. And based on the second uncertainty of all the key points of the target face, the first uncertainty representing the comprehensive error of the target face is determined to improve the network effect and training efficiency.
  • the disclosed method does not limit the application scenarios, and can be applied to downstream tasks in various scenarios, such as face image quality detection, face tracking, key point positioning, etc., and has higher robustness.
  • the acquisition module 10 is configured to:
  • the image to be processed includes at least one human face
  • the human face image to be tested corresponding to the human face is obtained by cropping according to the human face area information.
  • the image detection module 20 is configured to:
  • the first uncertainty of the target face is determined according to the second uncertainty of each key point of the target face.
  • the preset key points include at least one of the following types:
  • Key points of face contour, key points of eyes, key points of eyebrows, key points of nose, key points of mouth, key points of ears.
  • the result determination module 30 is configured to:
  • the key points are output on the human face image to be tested according to the key point information of each key point.
  • the result determination module 30 is configured to:
  • the target face tracking model is determined from a plurality of preset face tracking models
  • the target face is detected and tracked by using the target face tracking model to obtain the detection result of the target face.
  • the result determination module 30 is configured to:
  • the image detection module 20 includes:
  • the feature extraction module 40 is configured to input the face image to be tested into a pre-trained feature extraction network, and obtain the feature map output by the feature extraction network;
  • the key point detection module 50 is configured to input the feature map into a pre-trained key point detection network, and obtain the key point information and the first uncertainty of each key point of the target face output by the key point detection network.
  • the device described in the embodiments of the present disclosure further includes a training module 60, the training module is configured to:
  • each sample data in the sample data set includes a face sample image, and a key point label of each key point of the target face in the face sample image;
  • the face sample image is input into the feature extraction network to be trained, to obtain the feature map of the face sample image output by the feature extraction network;
  • the network is optimized and trained by fusing the first uncertainty of the target face to improve the effect of the image processing network.
  • the constructed loss function has a simple structure, and the optimization of the first uncertainty can be realized without setting additional labels for the first uncertainty, and the network is easier to converge.
  • the first uncertainty is the comprehensive uncertainty of the target face.
  • an electronic device including:
  • the memory stores computer instructions that can be read by the processor, and when the computer instructions are read, the processor executes the method according to any implementation manner of the first aspect.
  • the embodiments of the present disclosure provide a storage medium for storing computer-readable instructions, and the computer-readable instructions are used to cause a computer to execute the method according to any embodiment of the first aspect.
  • FIG. 12 shows a schematic structural diagram of an electronic device 600 suitable for implementing the method of the present disclosure.
  • the electronic device shown in FIG. 12 can realize the corresponding functions of the above-mentioned processor and storage medium.
  • the electronic device 600 includes a processor 601 that can perform various appropriate actions and processes according to programs stored in the memory 602 or loaded from the storage part 608 into the memory 602 .
  • the memory 602 also stores various programs and data necessary for the operation of the electronic device 600.
  • the processor 601 and the memory 602 are connected to each other through a bus 604 .
  • An input/output (I/O) interface 605 is also connected to the bus 604 .
  • the following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, etc.; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker; a storage section 608 including a hard disk, etc.; and a communication section 609 including a network interface card such as a LAN card, a modem, etc.
  • the communication section 609 performs communication processing via a network such as the Internet.
  • a drive 610 is also connected to the I/O interface 605 as needed.
  • a removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608.
  • the above method process can be implemented as a computer software program.
  • embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method described above.
  • the computer program may be downloaded and installed from a network via the communication portion 609 and/or installed from a removable medium 611 .
  • each block in the flowchart or block diagram may represent a module, program segment, or a portion of code that includes one or more executable instructions for implementing specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

An image processing method. The image processing method comprises: acquiring a facial image to be subjected to detection, wherein said facial image comprises a target face; performing image detection on said facial image, and determining key point information of at least one key point of the target face and a first uncertainty of the target face; and determining a detection result of the target face according to the key point information and the first uncertainty.

Description

Image processing method, device, electronic device and storage medium
Cross-Reference to Related Applications
This application is based on a Chinese patent application with a filing date of January 21, 2022 and application number 202210074181.7, and claims the priority of that Chinese patent application, the entire content of which is hereby incorporated by reference.
Technical Field
The present disclosure relates to the technical field of computer vision, and in particular to an image processing method, device, electronic device and storage medium.
Background
At present, face recognition based on deep neural networks (DNN, Deep Neural Network) is one of the most important applications in the field of computer vision (CV, Computer Vision). Face key point detection refers to locating the characteristic key points of a face from a face image, such as facial contour key points and facial feature key points. Due to the influence of factors such as pose, occlusion or lighting, face key point detection is a challenging task.
Summary
In order to improve the detection effect of face key points, the embodiments of the present disclosure provide an image processing method, device, electronic device and storage medium.
In a first aspect, an embodiment of the present disclosure provides an image processing method, including:
acquiring a face image to be tested, the face image to be tested including a target face;
performing image detection on the face image to be tested, and determining key point information of at least one key point of the target face and a first uncertainty of the target face, wherein the first uncertainty is obtained according to second uncertainties of all key points of the target face;
determining a detection result of the target face according to the key point information and the first uncertainty.
In some embodiments, acquiring the face image to be tested includes:
acquiring an image to be processed, the image to be processed including at least one face;
performing image detection on the image to be processed, and determining face area information of each face in the image to be processed;
for any face, cropping according to the face area information to obtain the face image to be tested corresponding to the face.
In some embodiments, performing image detection on the face image to be tested and determining the key point information of at least one key point of the target face and the first uncertainty of the target face includes:
performing key point detection on the face image to be tested, and determining the key point information and a second uncertainty of each key point of the target face based on preset face key point types;
determining the first uncertainty of the target face according to the second uncertainty of each key point of the target face.
In some embodiments, the preset key points include at least one of the following types:
face contour key points, eye key points, eyebrow key points, nose key points, mouth key points, ear key points.
In some embodiments, determining the detection result of the target face according to the key point information and the first uncertainty includes:
determining a reliability score of the target face according to the first uncertainty of the target face;
in response to the reliability score satisfying a first preset condition, outputting the key points on the face image to be tested according to the key point information of each key point.
In some embodiments, determining the detection result of the target face according to the key point information and the first uncertainty includes:
determining a reliability score of the target face according to the first uncertainty of the target face;
determining a target face tracking model from a plurality of preset face tracking models according to the reliability score and a pre-established correspondence between reliability scores and face tracking models;
detecting and tracking the target face by using the target face tracking model to obtain the detection result of the target face.
In some embodiments, determining the detection result of the target face according to the key point information and the first uncertainty includes:
determining a reliability score of the target face according to the first uncertainty of the target face;
in response to the reliability score satisfying a second preset condition, determining that the target face detection of the face image to be tested passes.
In some embodiments, performing image detection on the face image to be tested and determining the key point information of at least one key point of the target face and the first uncertainty of the target face includes:
inputting the face image to be tested into a pre-trained feature extraction network to obtain a feature map output by the feature extraction network;
inputting the feature map into a pre-trained key point detection network to obtain the key point information of each key point of the target face and the first uncertainty output by the key point detection network.
In some embodiments, the method of the embodiments of the present disclosure further includes a training process for training the feature extraction network and the key point detection network, the training process including:
acquiring a sample data set, each sample data in the sample data set including a face sample image and a key point label of each key point of the target face in the face sample image;
for any sample data, inputting the face sample image into a feature extraction network to be trained to obtain a feature map of the face sample image output by the feature extraction network;
inputting the feature map of the face sample image into a key point detection network to be trained to obtain the key point information of each key point of the target face and the first uncertainty of the target face;
determining a difference between the key point information and the key point labels based on the key point information, the key point labels and the first uncertainty;
adjusting network parameters of the feature extraction network and/or the key point detection network according to the difference until a convergence condition is met, to obtain the trained feature extraction network and/or key point detection network.
In a second aspect, an embodiment of the present disclosure provides an image processing device, including:
an acquisition module configured to acquire a face image to be tested, the face image to be tested including a target face;
an image detection module configured to perform image detection on the face image to be tested, and determine key point information of at least one key point of the target face and a first uncertainty of the target face, wherein the first uncertainty is obtained according to second uncertainties of all key points of the target face;
a result determination module configured to determine a detection result of the target face according to the key point information and the first uncertainty.
In some embodiments, the acquisition module is configured to:
acquire an image to be processed, the image to be processed including at least one face;
perform image detection on the image to be processed, and determine face area information of each face in the image to be processed;
for any face, crop according to the face area information to obtain the face image to be tested corresponding to the face.
In some embodiments, the image detection module is configured to:
perform key point detection on the face image to be tested, and determine the key point information and a second uncertainty of each key point of the target face based on preset face key point types;
determine the first uncertainty of the target face according to the second uncertainty of each key point of the target face.
In some embodiments, the preset key points include at least one of the following types:
face contour key points, eye key points, eyebrow key points, nose key points, mouth key points, ear key points.
In some embodiments, the result determination module is configured to:
determine a reliability score of the target face according to the first uncertainty of the target face;
in response to the reliability score satisfying a first preset condition, output the key points on the face image to be tested according to the key point information of each key point.
In some embodiments, the result determination module is configured to:
determine a reliability score of the target face according to the first uncertainty of the target face;
determine a target face tracking model from a plurality of preset face tracking models according to the reliability score and a pre-established correspondence between reliability scores and face tracking models;
detect and track the target face by using the target face tracking model to obtain the detection result of the target face.
In some embodiments, the result determination module is configured to:
determine a reliability score of the target face according to the first uncertainty of the target face;
in response to the reliability score satisfying a second preset condition, determine that the target face detection of the face image to be tested passes.
In some embodiments, the image detection module includes:
a feature extraction module configured to input the face image to be tested into a pre-trained feature extraction network to obtain a feature map output by the feature extraction network;
a key point detection module configured to input the feature map into a pre-trained key point detection network to obtain the key point information of each key point of the target face and the first uncertainty output by the key point detection network.
In some embodiments, the device of the embodiments of the present disclosure further includes a training module, the training module being configured to:
acquire a sample data set, each sample data in the sample data set including a face sample image and a key point label of each key point of the target face in the face sample image;
for any sample data, input the face sample image into a feature extraction network to be trained to obtain a feature map of the face sample image output by the feature extraction network;
input the feature map of the face sample image into a key point detection network to be trained to obtain the key point information of each key point of the target face and the first uncertainty of the target face;
determine a difference between the key point information and the key point labels based on the key point information, the key point labels and the first uncertainty;
adjust network parameters of the feature extraction network and/or the key point detection network according to the difference until a convergence condition is met, to obtain the trained feature extraction network and/or key point detection network.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
a processor; and
a memory storing computer instructions readable by the processor, wherein when the computer instructions are read, the processor executes the method according to any implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a storage medium for storing computer-readable instructions, the computer-readable instructions being used to cause a computer to execute the method according to any implementation of the first aspect.
The image processing method of the embodiments of the present disclosure includes acquiring a face image to be tested, performing image detection on the face image to be tested, determining key point information of at least one key point of a target face and a first uncertainty of the target face, and determining a detection result of the target face according to the key point information and the first uncertainty. In the embodiments of the present disclosure, the first uncertainty of the target face assists the detection of face key points, improving the effect and accuracy of face detection; the method is applicable to a variety of task scenarios, and the first uncertainty, representing the comprehensive error of the target face, is determined based on the second uncertainties of all key points of the target face, improving the network effect and training efficiency.
附图说明Description of drawings
为了更清楚地说明本公开具体实施方式或现有技术中的技术方案,下面将对具体实施方式或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图是本公开的一些实施方式,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the specific embodiments of the present disclosure or the technical solutions in the prior art, the following will briefly introduce the accompanying drawings that need to be used in the description of the specific embodiments or prior art. Obviously, the accompanying drawings in the following description are some embodiments of the present disclosure. For those of ordinary skill in the art, other drawings can also be obtained based on these drawings without creative work.
图1是根据本公开一些实施方式的图像处理方法的流程图。FIG. 1 is a flowchart of an image processing method according to some embodiments of the present disclosure.
图2是根据本公开一些实施方式的图像处理方法的流程图。FIG. 2 is a flowchart of an image processing method according to some embodiments of the present disclosure.
图3是根据本公开一些实施方式的图像处理方法的流程图。FIG. 3 is a flowchart of an image processing method according to some embodiments of the present disclosure.
图4是根据本公开一些实施方式中人脸关键点的示意图。Fig. 4 is a schematic diagram of facial key points according to some implementations of the present disclosure.
图5是根据本公开一些实施方式中图像检测网络的结构示意图。Fig. 5 is a schematic structural diagram of an image detection network according to some embodiments of the present disclosure.
图6是根据本公开一些实施方式的图像处理方法的流程图。FIG. 6 is a flowchart of an image processing method according to some embodiments of the present disclosure.
图7是根据本公开一些实施方式的图像处理方法的流程图。FIG. 7 is a flowchart of an image processing method according to some embodiments of the present disclosure.
图8是根据本公开一些实施方式的图像处理方法的流程图。FIG. 8 is a flowchart of an image processing method according to some embodiments of the present disclosure.
图9是根据本公开一些实施方式的图像处理方法的流程图。FIG. 9 is a flowchart of an image processing method according to some embodiments of the present disclosure.
图10是根据本公开一些实施方式的图像处理装置的结构框图。FIG. 10 is a structural block diagram of an image processing device according to some embodiments of the present disclosure.
图11是根据本公开一些实施方式的图像处理装置的结构框图。FIG. 11 is a structural block diagram of an image processing device according to some embodiments of the present disclosure.
图12是根据本公开一些实施方式中电子设备的结构框图。Fig. 12 is a structural block diagram of an electronic device according to some embodiments of the present disclosure.
具体实施方式Detailed ways
下面将结合附图对本公开的技术方案进行清楚、完整地描述,显然,所描述的实施方式是本公开一部分实施方式,而不是全部的实施方式。基于本公开中的实施方式,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施方式,都属于本公开保护的范围。此外,下面所描述的本公开不同实施方式中所涉及的技术特征只要彼此之间未构成冲突就可以相互结合。The technical solutions of the present disclosure will be clearly and completely described below in conjunction with the accompanying drawings. Apparently, the described implementations are part of the implementations of the present disclosure, but not all of them. Based on the implementation manners in the present disclosure, all other implementation manners obtained by persons of ordinary skill in the art without creative efforts shall fall within the protection scope of the present disclosure. In addition, the technical features involved in different embodiments of the present disclosure described below may be combined with each other as long as they do not constitute a conflict with each other.
人脸关键点检测是人脸识别任务的必要手段,人脸关键点检测是指从人脸图像中定位出人脸面部的特征关键点,例如脸部轮廓关键点、五官关键点等,脸部轮廓关键点可以包括下巴关键点、下颌关键点、脸颊关键点等,五官关键点可以包括眼睛关键点、眉毛关键点、鼻子关键点、嘴部关键点、耳朵关键点等。Face key point detection is a necessary step in face recognition tasks. It refers to locating the characteristic key points of a face from a face image, such as facial contour key points and facial feature key points. Facial contour key points may include chin, jaw, and cheek key points; facial feature key points may include eye, eyebrow, nose, mouth, and ear key points.
目前,基于深度神经网络(DNN,Deep Neural Network)的人脸关键点定位是最为高效且常用的检测方式。相关技术中,为提高DNN对关键点预测定位的准确性,会针对每个关键点设计不确定度参数,利用DNN预测每个人脸关键点的不确定度,基于不确定度对关键点坐标进行回归预测,从而DNN输出精度相对较高的人脸关键点。At present, face key point location based on deep neural network (DNN, Deep Neural Network) is the most efficient and commonly used detection method. In related technologies, in order to improve the accuracy of DNN in predicting and positioning key points, uncertainty parameters are designed for each key point, and DNN is used to predict the uncertainty of each key point of a face, and the coordinates of key points are regressed and predicted based on the uncertainty, so that DNN outputs face key points with relatively high accuracy.
但是,在这种方案中,由于DNN检测的每个人脸关键点都需要回归一个不确定度,对于检测精度要求较高的网络,人脸关键点的数量可能达到几百上千个,导致DNN网络结构复杂度和计算量十分庞大,成本较高。并且,在DNN训练过程中,对于人脸关键点的不确定度无法设计明确的优化目标,导致网络难以收敛,实际使用效果较差。However, in this scheme, since each face key point detected by DNN needs to return an uncertainty, for a network that requires high detection accuracy, the number of face key points may reach hundreds or thousands, resulting in a very large complexity of DNN network structure and calculation, and high cost. Moreover, in the DNN training process, it is impossible to design a clear optimization goal for the uncertainty of key points of the face, which makes it difficult for the network to converge, and the actual use effect is poor.
基于上述缺陷,本公开实施方式提供了一种图像处理方法、装置、电子设备以及存储介质,旨在提高人脸关键点定位精度,并且优化图像检测网络的结构和效果。Based on the above defects, embodiments of the present disclosure provide an image processing method, device, electronic equipment, and storage medium, aiming at improving the accuracy of facial key point positioning and optimizing the structure and effect of an image detection network.
第一方面,本公开实施方式提供了一种图像处理方法,该方法可应用于电子设备。本公开实施方式中,电子设备可以是任何适于实施的设备类型,例如移动终端、车载终端、可穿戴设备、门禁系统、视频监控系统、云平台及服务器等,本公开对此不作限制。In a first aspect, an embodiment of the present disclosure provides an image processing method, which can be applied to an electronic device. In the embodiments of the present disclosure, the electronic device may be any type of device suitable for implementation, such as a mobile terminal, a vehicle terminal, a wearable device, an access control system, a video surveillance system, a cloud platform, and a server, etc., and the present disclosure does not limit this.
如图1所示,在一些实施方式中,本公开示例的图像处理方法,包括:As shown in Figure 1, in some implementations, the image processing method of the present disclosure example includes:
S110、获取待测人脸图像,待测人脸图像中包括目标人脸。S110. Acquire a face image to be tested, where the face image to be tested includes a target face.
具体而言,待测人脸图像是指期望于由图像中检测出人脸对象的图像,从而待测人脸图像中可包括一个或多个人脸对象,该人脸对象即为所述的目标人脸。Specifically, the face image to be tested refers to an image in which a face object is expected to be detected, so that the face image to be tested may include one or more face objects, and the face object is the target face.
在本公开实施方式中,待测人脸图像可以是由电子设备的图像采集装置采集到的单帧图像,也可以是由电子设备的图像采集装置采集的视频流中的帧图像。In the embodiments of the present disclosure, the face image to be tested may be a single frame image collected by the image collection device of the electronic device, or may be a frame image in a video stream collected by the image collection device of the electronic device.
例如一个示例中,电子设备以智能手机为例,智能手机包括摄像头,通过摄像头可以拍摄到包括人脸的图像,该图像即可作为本公开所述的待测人脸图像。For example, in one example, the electronic device is a smart phone. The smart phone includes a camera, and an image including a human face can be captured through the camera, and the image can be used as the human face image to be tested in the present disclosure.
例如另一个示例中,电子设备以视频监控系统为例,视频监控系统包括监控摄像头,通过监控摄像头可以采集到目标场景区域中包括人脸的视频流,视频流中的帧图像即可作为本公开所述的待测人脸图像。For example, in another example, the electronic device takes a video surveillance system as an example. The video surveillance system includes a surveillance camera, which can capture a video stream including a human face in the target scene area through the surveillance camera. The frame images in the video stream can be used as the face image to be tested in the present disclosure.
总而言之,待测人脸图像可以是任何期望于从图像中检测得到人脸对象的图像,可以是实时采集获取的图像,也可以是通过网络上传或者下载的人脸图像,本公开对此不再赘述。In a word, the face image to be tested can be any image that is expected to detect a face object from the image, it can be an image acquired in real time, or it can be a face image uploaded or downloaded through the network, which will not be repeated in this disclosure.
在一些实施方式中,考虑到电子设备获取的人脸图像中往往具备较多的干扰因素,例如人脸图像中包括多张人脸对象,又例如人脸图像中包括较大面积的非人脸区域。为提高后续关键点检测精度,可以预先对人脸图像进行裁切处理,将裁切后仅包括一个人脸对象的图像作为待测人脸图像。本公开下述实施方式中进行说明,在此暂不详述。In some implementations, it is considered that the face image acquired by the electronic device often has many interference factors, for example, the face image includes multiple face objects, and for example, the face image includes a large non-face area. In order to improve the accuracy of subsequent key point detection, the face image can be cropped in advance, and the cropped image including only one face object can be used as the face image to be tested. The present disclosure will be described in the following embodiments, and will not be described in detail here.
S120、对待测人脸图像进行图像检测,确定目标人脸的至少一个关键点的关键点信息以及目标人脸的第一不确定度。S120. Perform image detection on the face image to be tested, and determine key point information of at least one key point of the target face and a first uncertainty of the target face.
具体而言,人脸关键点检测需要由待测人脸图像中检测得到多个人脸关键点,这些人脸关键点可以分别属于不同的人脸关键点类型。关键点类型可以包括例如脸部轮廓关键点、眼睛关键点、眉毛关键点、鼻子关键点、嘴部关键点、耳朵关键点等,每一个关键点类型可以包括多个关键点,例如眉毛关键点可以包括5*2共计10个关键点。Specifically, face key point detection needs to detect multiple face key points from the face image to be tested, and these face key points may belong to different face key point types. Key point types may include, for example, facial contour key points, eye key points, eyebrow key points, nose key points, mouth key points, and ear key points. Each key point type may include multiple key points; for example, the eyebrow key points may include 5*2 = 10 key points in total.
本公开实施方式中,关键点信息可以包括每个关键点对应的关键点坐标,例如,可以基于图像检测技术对待测人脸图像进行图像检测,从而由待测人脸图像中检测得到目标人脸的所有关键点,以及每个关键点在图像中的位置坐标。In the embodiments of the present disclosure, the key point information may include key point coordinates corresponding to each key point. For example, image detection may be performed on the face image to be tested based on image detection technology, so that all key points of the target face and the position coordinates of each key point in the image may be obtained from the face image to be tested.
同时,在本公开实施方式中,关键点检测时还需要确定目标人脸的第一不确定度,第一不确定度表示目标人脸的所有关键点的综合误差,也即,目标人脸的第一不确定度根据所有关键点的第二不确定度得到。第一不确定度越高,表示对目标人脸的关键点检测的误差也越大,反之,第一不确定度越低,表示目标人脸的关键点检测的误差越小。At the same time, in the embodiments of the present disclosure, the first uncertainty of the target face needs to be determined during key point detection, and the first uncertainty represents the comprehensive error of all key points of the target face, that is, the first uncertainty of the target face is obtained according to the second uncertainty of all key points. The higher the first uncertainty, the greater the error of the key point detection of the target face, and on the contrary, the lower the first uncertainty, the smaller the error of the key point detection of the target face.
在一些实施方式中,可以利用预先训练的关键点检测网络,对目标人脸的每个关键点的位置坐标进行预测,从而得到每个关键点的关键点信息。同时,关键点检测网络还可以对每个关键点的不确定度进行预测,得到每个关键点对应的第二不确定度。In some embodiments, the pre-trained key point detection network can be used to predict the position coordinates of each key point of the target face, so as to obtain the key point information of each key point. At the same time, the key point detection network can also predict the uncertainty of each key point to obtain the second uncertainty corresponding to each key point.
以目标人脸的眉毛关键点中的某一个关键点A为例,关键点检测网络可以基于目标人脸的眉毛特征预测得到该关键点A的位置坐标(x,y)和第二不确定度p,第二不确定度p表示该关键点的位置坐标(x,y)的误差。第二不确定度p越大,表示位置坐标(x,y)的误差也越大,反之,第二不确定度p越小,表示位置坐标(x,y)的误差也越小。Taking a certain key point A in the eyebrow key points of the target face as an example, the key point detection network can predict the position coordinates (x, y) and the second uncertainty p of the key point A based on the eyebrow feature of the target face, and the second uncertainty p represents the error of the position coordinates (x, y) of the key point. The larger the second uncertainty p, the larger the error of the position coordinates (x, y), on the contrary, the smaller the second uncertainty p, the smaller the error of the position coordinates (x, y).
可以理解,上述仅以其中一个关键点为例进行说明,对于目标人脸的所有关键点,每个关键点均对应有关键点信息和第二不确定度。在本公开实施方式中,并非直接基于每个关键点的第二不确定度确定目标人脸的检测结果,而是根据所有关键点的第二不确定度计算得到目标人脸的第一不确定度,将第一不确定度作为目标人脸所对应的综合不确定度。It can be understood that only one of the key points is used as an example for illustration, and for all key points of the target face, each key point corresponds to key point information and a second uncertainty. In the embodiment of the present disclosure, instead of directly determining the detection result of the target face based on the second uncertainty of each key point, the first uncertainty of the target face is calculated according to the second uncertainties of all key points, and the first uncertainty is used as the comprehensive uncertainty corresponding to the target face.
在一些实施方式中,可以将所有关键点的第二不确定度的均方根作为目标人脸对应的第一不确定度。在另一些实施方式中,可以将所有关键点的第二不确定度的均值作为目标人脸对应的第一不确定度。当然,可以理解,还可以采用其他方式融合所有关键点的第二不确定度得到第一不确定度,只要保证第一不确定度可以代表关键点的综合误差即可,本公开对此不作限制。In some embodiments, the root mean square of the second uncertainties of all key points may be used as the first uncertainty corresponding to the target face. In other embodiments, the mean of the second uncertainties of all key points may be used as the first uncertainty corresponding to the target face. Of course, it can be understood that the first uncertainty can also be obtained by fusing the second uncertainties of all key points in other ways, as long as the first uncertainty can represent the comprehensive error of the key points, which is not limited in the present disclosure.
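As a concrete illustration, the two fusion options above (root mean square and mean) can be sketched as follows; the function name `face_uncertainty` and the `method` switch are assumptions for illustration only, not part of the disclosed design:

```python
import numpy as np

def face_uncertainty(keypoint_uncertainties, method="rms"):
    """Fuse the per-keypoint (second) uncertainties into one face-level
    (first) uncertainty, as described above."""
    p = np.asarray(keypoint_uncertainties, dtype=float)
    if method == "rms":        # root mean square of all second uncertainties
        return float(np.sqrt(np.mean(p ** 2)))
    elif method == "mean":     # plain average of all second uncertainties
        return float(np.mean(p))
    raise ValueError(f"unknown fusion method: {method}")
```

Any monotone fusion that reflects the overall keypoint error would serve the same purpose, which is all the disclosure requires.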
S130、根据关键点信息和第一不确定度,确定目标人脸的检测结果。S130. Determine the detection result of the target face according to the key point information and the first uncertainty.
具体而言,在确定目标人脸的每个关键点的关键点信息,以及目标人脸所对应的第一不确定度之后,即可根据不同的下游任务场景,设置对应的后处理逻辑,从而基于关键点信息和第一不确定度得到针对目标人脸的检测结果。Specifically, after determining the key point information of each key point of the target face and the first uncertainty corresponding to the target face, the corresponding post-processing logic can be set according to different downstream task scenarios, so as to obtain the detection result for the target face based on the key point information and the first uncertainty.
一个示例中,以人脸照片入库为例,用户上传的人脸图像必须符合一定的要求,例如不得遮挡眉毛、不得偏头角度过大等。从而通过本公开上述过程,可以对用户上传的人脸照片进行图像处理,得到人脸图像的每个关键点的关键点信息,以及针对目标人脸的第一不确定度。当第一不确定度大于预设阈值时,说明用户上传的人脸图像关键点检测的偏差较大,可能存在五官遮挡等问题,从而可向用户输出对应的检测结果为不通过,且某个五官存在遮挡。In one example, taking face photo enrollment as an example, the face image uploaded by the user must meet certain requirements, for example the eyebrows must not be occluded and the head must not be tilted at too large an angle. Through the above process of the present disclosure, image processing can be performed on the face photo uploaded by the user to obtain the key point information of each key point of the face image and the first uncertainty of the target face. When the first uncertainty is greater than the preset threshold, it indicates that the key point detection of the uploaded face image has a large deviation and that problems such as occluded facial features may exist, so the detection result output to the user may be a failure, together with an indication that a certain facial feature is occluded.
另一个示例中,以人脸追踪场景为例,不同的光照情况下电子设备对应的工况不同,例如在极暗光场景下,电子设备采集的人脸图像曝光度很低,从而关键点检测得到的第一不确定度较大。反之,例如在明亮场景下,电子设备采集的人脸图像曝光度正常,从而关键点检测得到的第一不确定度较低。基于此,可以通过设置合适的阈值,利用第一不确定度确定设备当前所处的光照环境,从而采用对应的追踪算法模型实现人脸追踪。In another example, taking the face tracking scene as an example, the electronic device corresponds to different working conditions under different lighting conditions. For example, in an extremely dark light scene, the exposure of the face image collected by the electronic device is very low, so the first uncertainty obtained by key point detection is relatively large. On the contrary, for example, in a bright scene, the exposure of the face image collected by the electronic device is normal, so the first uncertainty obtained by the key point detection is relatively low. Based on this, by setting an appropriate threshold, the first uncertainty can be used to determine the current lighting environment of the device, so that the corresponding tracking algorithm model can be used to realize face tracking.
当然,可以理解,本公开示例的场景并不局限于上述示例,本公开下文中对此进行具体说明,在此暂不详述。Of course, it can be understood that the scenarios of the examples in the present disclosure are not limited to the above examples, which will be described in detail below in the present disclosure, and will not be described in detail here.
值得说明的是,本公开实施方式中,基于目标人脸的所有关键点的第二不确定度确定代表目标人脸综合误差的第一不确定度,从而对于关键点检测网络,在训练过程中无需对每个关键点的不确定度进行回归优化,而是对人脸的综合不确定度进行优化,网络容易收敛,效果更好,并且大大提高训练效率。It is worth noting that in the embodiment of the present disclosure, the first uncertainty representing the comprehensive error of the target face is determined based on the second uncertainty of all key points of the target face, so that for the key point detection network, it is not necessary to perform regression optimization on the uncertainty of each key point during the training process, but to optimize the comprehensive uncertainty of the face, the network is easy to converge, the effect is better, and the training efficiency is greatly improved.
通过上述可知,本公开实施方式中,通过目标人脸的第一不确定度辅助对人脸关键点的检测,提高人脸检测效果和精度。并且基于目标人脸的所有关键点的第二不确定度确定代表目标人脸综合误差的第一不确定度,提高网络效果和训练效率。同时,本公开方法对于应用场景不作限制,可以适用于多种场景的下游任务,例如人脸图像质量检测、人脸追踪、关键点定位等,鲁棒性更高。It can be known from the above that in the embodiments of the present disclosure, the first uncertainty of the target face is used to assist the detection of key points of the face, so as to improve the effect and accuracy of face detection. And based on the second uncertainty of all the key points of the target face, the first uncertainty representing the comprehensive error of the target face is determined to improve the network effect and training efficiency. At the same time, the disclosed method does not limit the application scenarios, and can be applied to downstream tasks in various scenarios, such as face image quality detection, face tracking, key point positioning, etc., and has higher robustness.
如图2所示,在一些实施方式中,本公开示例的图像处理方法中,获取待测人脸图像的过程,包括:As shown in Figure 2, in some implementations, in the image processing method of the present disclosure example, the process of obtaining the face image to be tested includes:
S210、获取待处理图像,待处理图像中包括至少一个人脸。S210. Acquire an image to be processed, where the image to be processed includes at least one human face.
S220、对待处理图像进行图像检测,确定待处理图像上的每个人脸的人脸区域信息。S220. Perform image detection on the image to be processed, and determine face area information of each face on the image to be processed.
S230、对于任意一个人脸,根据人脸区域信息裁切得到人脸对应的待测人脸图像。S230. For any one face, crop according to the face area information to obtain a face image to be tested corresponding to the face.
具体而言,待处理图像可以是通过电子设备的图像采集装置采集到的原始图像,或者用户上传至电子设备的上传图像。可以理解,待处理图像中可能包括一个人脸,也可能包括多个人脸。Specifically, the image to be processed may be an original image collected by an image collection device of the electronic device, or an uploaded image uploaded to the electronic device by a user. It can be understood that the image to be processed may include one human face, or may include multiple human faces.
本公开实施方式中,可基于图像检测技术,对待处理图像进行图像检测,得到待处理图像上每个人脸的人脸区域信息。例如一个示例中,可以通过例如CenterFace网络对待处理图像进行图像检测,从而得到待处理图像上每个人脸区域的人脸检测框,人脸检测框也即人脸区域信息。In the embodiments of the present disclosure, image detection may be performed on the image to be processed based on the image detection technology to obtain face area information of each face on the image to be processed. For example, in one example, the image to be processed can be detected through the CenterFace network, so as to obtain the face detection frame of each face area on the image to be processed, and the face detection frame is also the face area information.
在得到每个人脸的人脸检测框之后,即可根据人脸检测框对待处理图像进行裁切处理,从而得到包括每个人脸区域的人脸图像,该人脸图像即为待测人脸图像。After the face detection frame of each face is obtained, the image to be processed can be cropped according to the face detection frame, so as to obtain a face image including each face area, which is the face image to be tested.
在一个示例中,可以每个人脸检测框的中心点为原点,保持原点坐标不变以预设比例对人脸检测框整体进行均匀外扩,沿外扩后的人脸检测框裁切出人脸图像。In an example, the center point of each face detection frame can be used as the origin, and the coordinates of the origin are kept unchanged to uniformly expand the entire face detection frame at a preset ratio, and the face image is cut out along the expanded face detection frame.
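A minimal sketch of this crop step, assuming a `(x1, y1, x2, y2)` box format and a hypothetical expansion ratio (the actual preset ratio is not specified in the disclosure):

```python
import numpy as np

def expand_and_crop(image, box, scale=1.25):
    """Expand a face detection box uniformly about its center point by a
    preset ratio, keeping the center fixed, then crop the face image.
    `image` is an H x W (x C) array; `box` is (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0      # center point kept fixed
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    h, w = image.shape[:2]
    # clamp the expanded box to the image bounds before cropping
    nx1, ny1 = max(0, int(cx - half_w)), max(0, int(cy - half_h))
    nx2, ny2 = min(w, int(cx + half_w)), min(h, int(cy + half_h))
    return image[ny1:ny2, nx1:nx2]
```

The clamping matters in practice: faces near the image border would otherwise yield out-of-range slices.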
可以理解,在待处理图像上包括多个人脸时,通过图2实施方式过程,可以裁切出每个人脸的人脸图像,这些人脸图像均可以作为本公开所述的待测人脸图像。It can be understood that when the image to be processed includes multiple faces, the face image of each face can be cut out through the process of the embodiment shown in FIG. 2 , and these face images can be used as the face image to be tested in the present disclosure.
在一些实施方式中,在得到待测人脸图像之后,即可基于图像检测技术对待测人脸图像中的目标人脸进行关键点检测,下面结合图3进行说明。In some embodiments, after obtaining the face image to be tested, key point detection can be performed on the target face in the face image to be tested based on the image detection technology, which will be described below with reference to FIG. 3 .
如图3所示,在一些实施方式中,本公开示例的图像处理方法,对待测人脸图像进行图像检测的过程包括:As shown in FIG. 3 , in some implementations, in the image processing method of the disclosed example, the process of performing image detection on the face image to be tested includes:
S310、对待测人脸图像进行关键点检测,基于预先设置的人脸关键点类型,确定目标人脸的每个关键点的关键点信息和第二不确定度。S310. Perform key point detection on the face image to be tested, and determine key point information and a second uncertainty of each key point of the target face based on the preset key point type of the face.
S320、根据目标人脸的各个关键点的第二不确定度,确定目标人脸的第一不确定度。S320. Determine the first uncertainty of the target face according to the second uncertainty of each key point of the target face.
具体而言,在对目标人脸进行关键点检测时,需要从待测人脸图像中检测到属于目标人脸的一种或者多种的关键点类型所包括的关键点。关键点类型包括例如眼睛关键点、眉毛关键点、鼻子关键点、嘴部关键点、脸部轮廓关键点等,其中每个关键点类型可包括多个关键点。Specifically, when performing key point detection on a target face, key points included in one or more key point types belonging to the target face need to be detected from the face image to be tested. Keypoint types include, for example, eye keypoints, eyebrow keypoints, nose keypoints, mouth keypoints, facial contour keypoints, etc., wherein each keypoint type may include multiple keypoints.
例如图4所示,预先设置的人脸关键点类型可以包括如下表一所示:For example, as shown in Figure 4, the preset face key point types can include the following table 1:
表一 Table 1

人脸关键点类型 Face key point type          关键点编号 Key point numbers
脸部轮廓关键点 Facial contour key points     0~32
眉毛关键点 Eyebrow key points               33~42
鼻子关键点 Nose key points                  43~51
眼睛关键点 Eye key points                   52~63
嘴部关键点 Mouth key points                 64~83
当然可以理解,人脸关键点类型并不局限于上述表一示例,还可以包括其他任何适于实施的关键点类型,例如耳朵关键点、苹果肌关键点等等,本公开对此不作限制。Of course, it can be understood that the face key point types are not limited to the examples in Table 1 above, and may also include any other key point types suitable for implementation, such as ear key points, apple muscle key points, etc., which are not limited in the present disclosure.
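For illustration, the mapping in Table 1 can be held in a small lookup structure; the English type names and the helper `keypoint_type` are assumptions for this sketch, not part of the disclosure:

```python
# Keypoint index ranges from Table 1 (84 keypoints in total, numbered 0~83).
FACE_KEYPOINT_TYPES = {
    "facial_contour": range(0, 33),   # 脸部轮廓关键点 0~32
    "eyebrows":       range(33, 43),  # 眉毛关键点 33~42
    "nose":           range(43, 52),  # 鼻子关键点 43~51
    "eyes":           range(52, 64),  # 眼睛关键点 52~63
    "mouth":          range(64, 84),  # 嘴部关键点 64~83
}

def keypoint_type(index):
    """Return the type name of a keypoint index, per Table 1."""
    for name, idx_range in FACE_KEYPOINT_TYPES.items():
        if index in idx_range:
            return name
    raise ValueError(f"index {index} outside 0~83")
```

A scheme with extra types (ears, etc.) would simply add more entries with their own index ranges.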
在本公开实施方式中,可以基于图像检测对待测人脸图像进行上述关键点类型的检测,从而可以确定目标人脸的每个关键点的关键点信息以及每个关键点的第二不确定度。In the embodiments of the present disclosure, the above-mentioned key point type detection may be performed on the to-be-tested face image based on image detection, so that the key point information of each key point of the target face and the second uncertainty of each key point may be determined.
基于前述可知,对于任意一个关键点,其关键点信息包括该关键点在图像坐标系中的位置坐标,而第二不确定度表示该关键点定位结果的不确定程度。从而,对于目标人脸的所有关键点,可基于每个关键点的第二不确定度计算得到目标人脸所对应的第一不确定度。例如一个示例中,可以将所有关键点的第二不确定度的均方根作为目标人脸所对应的第一不确定度。Based on the foregoing, for any key point, its key point information includes the position coordinates of the key point in the image coordinate system, and the second uncertainty represents the degree of uncertainty of the key point positioning result. Therefore, for all key points of the target face, the first uncertainty corresponding to the target face can be calculated based on the second uncertainty of each key point. For example, in one example, the root mean square of the second uncertainties of all key points may be used as the first uncertainty corresponding to the target face.
在一些实施方式中,可以基于预先训练的图像检测网络实现对待测人脸图像中目标人脸的关键点检测。图5示出了本公开一些实施方式中的图像检测网络结构,下面结合图5进行说明。In some implementations, the key point detection of the target face in the face image to be tested can be realized based on a pre-trained image detection network. Fig. 5 shows the image detection network structure in some embodiments of the present disclosure, which will be described below in conjunction with Fig. 5 .
如图5所示,在一些实施方式中,本公开示例的图像检测网络包括特征提取网络510和关键点检测网络520。As shown in FIG. 5 , in some implementations, the image detection network of the example of the present disclosure includes a feature extraction network 510 and a key point detection network 520 .
特征提取网络510为图像检测网络的骨干网络(Backbone Network),其主要用于对待测人脸图像进行特征提取,从而得到包括待测人脸语义特征和纹理特征的特征图(feature map)。也即,特征提取网络510的输入为待测人脸图像,输出为待测人脸图像的特征图。The feature extraction network 510 is the backbone network (Backbone Network) of the image detection network, which is mainly used for feature extraction of the face image to be tested, thereby obtaining a feature map (feature map) including semantic features and texture features of the face to be tested. That is, the input of the feature extraction network 510 is the human face image to be tested, and the output is the feature map of the human face image to be tested.
在一些示例实施方式中,特征提取网络510可以采用基于卷积神经网络(CNN,Convolutional Neural Network)架构的可学习网络,例如在一个示例中,为便于在移动终端中部署,特征提取网络510可以采用较为轻量级的MobileNet神经网络。In some exemplary implementations, the feature extraction network 510 can adopt a learnable network based on a convolutional neural network (CNN, Convolutional Neural Network) architecture. For example, in one example, in order to facilitate deployment in mobile terminals, the feature extraction network 510 can adopt a relatively lightweight MobileNet neural network.
关键点检测网络520用于根据特征提取网络510输出的特征图,预测输出关键点信息以及第一不确定度。例如图5示例中,关键点检测网络520的网络结构包括两个分支,也即输出层分为两个全连接层。其中一个分支为关键点信息预测,用于对目标人脸的每个关键点的位置坐标进行回归预测,得到每个关键点的关键点信息。其中另一个分支为不确定度预测,用于根据每个关键点的不确定度预测输出目标人脸的第一不确定度。The key point detection network 520 is used to predict and output key point information and the first uncertainty according to the feature map output by the feature extraction network 510 . For example, in the example in FIG. 5 , the network structure of the key point detection network 520 includes two branches, that is, the output layer is divided into two fully connected layers. One of the branches is key point information prediction, which is used to perform regression prediction on the position coordinates of each key point of the target face, and obtain the key point information of each key point. The other branch is uncertainty prediction, which is used to predict the first uncertainty of the output target face according to the uncertainty of each key point.
在一个示例中,关键点检测网络520的池化层采用7*7的池化层,每个全连接层采用 256*1维的全连接层。In an example, the pooling layer of the key point detection network 520 adopts a 7*7 pooling layer, and each fully connected layer adopts a 256*1-dimensional fully connected layer.
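The two-branch output structure of Fig. 5 can be sketched roughly as follows with NumPy; the dimensions, the global-average-pooling stand-in for the 7*7 pooling layer, and the sigmoid on the uncertainty branch are illustrative assumptions rather than the actual network design:

```python
import numpy as np

class KeypointHead:
    """Minimal sketch of the two-branch output in Fig. 5: a pooled feature
    vector feeds two fully connected branches, one regressing 2K keypoint
    coordinates and one predicting a single face-level uncertainty."""
    def __init__(self, feat_dim=256, num_keypoints=84,
                 rng=np.random.default_rng(0)):
        # toy weight matrices standing in for the two fully connected layers
        self.W_kp = rng.standard_normal((feat_dim, num_keypoints * 2)) * 0.01
        self.W_unc = rng.standard_normal((feat_dim, 1)) * 0.01

    def forward(self, feature_map):
        # global average pooling over the spatial dims (stands in for the 7*7 pool)
        f = feature_map.mean(axis=(1, 2))                       # (N, feat_dim)
        keypoints = f @ self.W_kp                               # (N, 2K) coordinates
        # a sigmoid keeps the face-level first uncertainty in (0, 1)
        uncertainty = 1.0 / (1.0 + np.exp(-(f @ self.W_unc)))   # (N, 1)
        return keypoints, uncertainty
```

In a real implementation the two branches would be trained layers of a deep-learning framework; the sketch only shows how one shared feature vector yields both outputs.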
在一些实施方式中,利用图5所示的图像检测网络对待测人脸图像进行处理之前,还包括对待测人脸图像进行归一化处理的过程,归一化处理的目的是将待测人脸图像的像素值进行归一化,从而得到符合网络设计要求的输入图像,减小计算量。In some embodiments, before using the image detection network shown in FIG. 5 to process the face image to be tested, it also includes a process of normalizing the face image to be tested. The purpose of the normalization process is to normalize the pixel values of the face image to be tested, so as to obtain an input image that meets the network design requirements and reduce the amount of calculation.
在一个示例中,在待测人脸图像输入图像检测网络之前,可首先通过例如双线性插值将待测人脸图像缩放至预设尺寸,例如112像素*112像素,并且对图像进行像素归一化,表示为:In one example, before the face image to be tested is input into the image detection network, the face image to be tested may first be scaled to a preset size, such as 112 pixels*112 pixels, by bilinear interpolation, for example, and the image is pixel-normalized, expressed as:
I_Norm = (I - 127.5) / 127.5           式(1) Equation (1)
式(1)中,I_Norm表示归一化处理后的图像像素值,I表示原图像的像素值,将归一化处理后的图像作为图像检测网络的输入图像。In Equation (1), I_Norm represents the normalized image pixel value, and I represents the pixel value of the original image; the normalized image is used as the input image of the image detection network.
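A direct sketch of the normalization in Equation (1), assuming uint8 input pixels; the resize to the preset input size (e.g. 112*112 by bilinear interpolation) is taken to happen beforehand:

```python
import numpy as np

def normalize_face(image):
    """Pixel normalization of Equation (1): I_Norm = (I - 127.5) / 127.5,
    mapping uint8 pixel values in [0, 255] to floats in [-1, 1]."""
    return (image.astype(np.float32) - 127.5) / 127.5
```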
本公开实施方式,在得到图像检测网络预测输出的目标人脸的关键点信息和目标人脸的第一不确定度之后,可以根据下游任务的具体需求,得到不同的针对目标人脸的检测结果,下面分别进行说明。In the embodiments of the present disclosure, after obtaining the key point information of the target face predicted and output by the image detection network and the first uncertainty of the target face, different detection results for the target face can be obtained according to the specific requirements of downstream tasks, which will be described separately below.
例如一些场景中,期望由场景图像中检测出人脸,并且在场景图像中显示出人脸关键点的可视化效果。如图6所示,在该场景中,本公开的图像处理方法,确定目标人脸的检测结果包括:For example, in some scenes, it is expected to detect the human face from the scene image, and display the visualization effect of the key points of the human face in the scene image. As shown in FIG. 6, in this scene, the image processing method of the present disclosure determines the detection result of the target face includes:
S131-1、根据目标人脸的第一不确定度,确定目标人脸的可靠性分值。S131-1. Determine the reliability score of the target face according to the first uncertainty of the target face.
S132-2、响应于可靠性分值满足第一预设条件,根据各个关键点的关键点信息,在待测人脸图像上输出关键点。S132-2. In response to the reliability score satisfying the first preset condition, output the key points on the face image to be tested according to the key point information of each key point.
具体而言,在得到图像检测网络输出的目标人脸的关键点信息和目标人脸的第一不确定度之后,可以基于第一不确定度计算出目标人脸的可靠性分值。Specifically, after obtaining the key point information of the target face output by the image detection network and the first uncertainty of the target face, the reliability score of the target face can be calculated based on the first uncertainty.
可以理解,第一不确定度表示对目标人脸进行关键点检测定位的综合误差,其反应的是检测出的关键点信息的可靠程度,基于此可以确定目标人脸的可靠性分值。It can be understood that the first uncertainty represents the comprehensive error of key point detection and positioning of the target face, which reflects the reliability of the detected key point information, based on which the reliability score of the target face can be determined.
在一个示例中,图像检测网络输出的第一不确定度为位于0~1之间的数值,从而确定的目标人脸的可靠性分值,即可表示为:In an example, the first uncertainty output by the image detection network is a value between 0 and 1, so the reliability score of the determined target face can be expressed as:
θ=1-α              式(2)θ=1-α Equation (2)
式(2)中,θ表示目标人脸的可靠性分值,α表示目标人脸的第一不确定度。In formula (2), θ represents the reliability score of the target face, and α represents the first uncertainty of the target face.
在本公开实施方式中,可以预先基于先验知识或者场景需求设置第一预设阈值,第一预设阈值表示目标人脸的关键点检测结果通过与否的临界值。当可靠性分值大于该第一预设阈值时,表示目标人脸的检测结果为可靠结果,也即检测通过,满足第一预设条件。反之,当可靠性分值不大于该第一预设阈值时,表示目标人脸的检测结果不可靠,也即检测不通过,不满足第一预设条件。In the embodiments of the present disclosure, a first preset threshold may be set in advance based on prior knowledge or scene requirements, and the first preset threshold represents a critical value for passing or failing the key point detection result of the target face. When the reliability score is greater than the first preset threshold, it means that the detection result of the target face is a reliable result, that is, the detection is passed, and the first preset condition is met. Conversely, when the reliability score is not greater than the first preset threshold, it indicates that the detection result of the target face is unreliable, that is, the detection fails, and the first preset condition is not met.
在确定可靠性分值满足第一预设条件的情况下,即可根据每个关键点的关键点信息,在原始的待测人脸图像上标注出各个关键点,从而用户可以观看到每个关键点在图像上的位置,实现人脸关键点的可视化输出。When it is determined that the reliability score satisfies the first preset condition, each key point can be marked on the original face image to be tested according to the key point information of each key point, so that the user can watch the position of each key point on the image, and realize the visual output of the key point of the face.
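A minimal sketch of this pass/fail check, combining Equation (2) with the first preset threshold; the threshold value 0.7 is an assumed example, since the disclosure leaves it to prior knowledge or scene requirements:

```python
def keypoint_detection_passes(first_uncertainty, threshold=0.7):
    """Equation (2): reliability score theta = 1 - alpha, where alpha is the
    face-level first uncertainty in [0, 1]. Returns the score and whether
    it satisfies the first preset condition (score > threshold)."""
    score = 1.0 - first_uncertainty
    return score, score > threshold
```

Only when the check passes would the keypoints be drawn on the original image for visualization.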
例如一些场景中,在对人脸进行实时跟踪时,往往需要针对不同的工况采用不同的跟踪模型。举例来说,对于例如极暗光、逆光、模糊等极端场景,需要采用适用于极端场景的人脸跟踪模型;而对于例如光照良好的普通场景,则采用适用于普通场景的人脸跟踪模型即可。从而,本公开一些实施方式中,可以基于第一不确定度确定当前所处的场景复杂度,实现对人脸跟踪模型的切换,下面结合图7实施方式进行说明。For example, in some scenarios, it is often necessary to use different tracking models for different working conditions when tracking human faces in real time. For example, for extreme scenes such as extremely dark light, backlight, blur, etc., it is necessary to use a face tracking model suitable for extreme scenes; and for ordinary scenes such as good lighting, it is sufficient to use a face tracking model suitable for ordinary scenes. Therefore, in some implementations of the present disclosure, the complexity of the current scene may be determined based on the first uncertainty to implement switching of the face tracking model, which will be described below with reference to the implementation in FIG. 7 .
如图7所示,在一些实施方式中,本公开的图像处理方法,确定目标人脸的检测结果包括:As shown in FIG. 7, in some implementations, in the image processing method of the present disclosure, determining the detection result of the target face includes:
S132-1、根据目标人脸的第一不确定度,确定目标人脸的可靠性分值。S132-1. Determine the reliability score of the target face according to the first uncertainty of the target face.
S132-2、根据可靠性分值,以及预先建立的可靠性分值与人脸跟踪模型的对应关系,由预先设置的多个人脸跟踪模型中确定目标人脸跟踪模型。S132-2. Determine a target face tracking model from a plurality of preset face tracking models according to the reliability score and the pre-established correspondence between the reliability score and the face tracking model.
S132-3、利用目标人脸跟踪模型对目标人脸进行检测跟踪,得到目标人脸的检测结果。S132-3. Use the target face tracking model to detect and track the target face, and obtain a detection result of the target face.
具体而言,在本示例中,可以基于前述图6实施方式的过程,确定目标人脸的可靠性分值,本公开对此不再赘述。Specifically, in this example, the reliability score of the target face may be determined based on the aforementioned process of the implementation manner in FIG. 6 , which will not be repeated in this disclosure.
可以理解的是,对于不同光照场景的待检测人脸图像,关键点检测得到的第一不确定度也应当不同。例如在极暗光场景下,电子设备采集的人脸图像曝光度很低,从而关键点检测得到的第一不确定度较大,相应的,目标人脸的可靠性分值也就越低。反之,例如在明亮场景下,电子设备采集的人脸图像曝光度正常,从而关键点检测得到的第一不确定度较低,相应的,目标人脸的可靠性分值也就越高。It can be understood that for face images to be detected in different lighting scenes, the first uncertainties obtained by key point detection should also be different. For example, in an extremely dark scene, the exposure of the face image collected by the electronic device is very low, so the first uncertainty obtained by the key point detection is relatively large, and correspondingly, the reliability score of the target face is also lower. Conversely, for example, in a bright scene, the exposure of the face image collected by the electronic device is normal, so the first uncertainty obtained by the key point detection is lower, and correspondingly, the reliability score of the target face is higher.
据此可以基于先验知识或者有限次试验,预先建立可靠性分值与人脸跟踪模型的对应关系。在一个示例中,预先建立的对应关系可如下表二所示:Accordingly, the correspondence between the reliability score and the face tracking model can be established in advance based on prior knowledge or a limited number of experiments. In an example, the pre-established correspondence can be shown in Table 2 below:
表二 Table II

可靠性分值 reliability score ｜ 人脸跟踪模型 face tracking model ｜ 光线场景 light scene
[0.6, 1] ｜ 模型1 model 1 ｜ 普通场景 normal scene
[0, 0.6) ｜ 模型2 model 2 ｜ 暗光场景 dark scene
从而在确定目标人脸的可靠性分值之后，即可根据上表二中的对应关系，确定与可靠性分值对应的人脸跟踪模型为目标人脸跟踪模型，然后即可利用该目标人脸跟踪模型对目标人脸进行检测跟踪。例如一个示例中，待检测图像中目标人脸的可靠性分值为0.8，则可基于上述表二对应关系，确定当前场景为普通场景，对应的目标人脸跟踪模型为“模型1”，从而利用模型1对目标人脸进行跟踪检测，得到人脸检测结果。Therefore, after the reliability score of the target face is determined, the face tracking model corresponding to the reliability score can be determined as the target face tracking model according to the correspondence in Table 2 above, and then the target face tracking model can be used to detect and track the target face. For example, if the reliability score of the target face in the image to be detected is 0.8, it can be determined based on the correspondence in Table 2 that the current scene is a normal scene and the corresponding target face tracking model is "model 1", so model 1 is used to track and detect the target face to obtain the face detection result.
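The lookup between score intervals and tracking models can be sketched as follows; the interval boundary 0.6 and the model names are illustrative example values only, and a higher score indicates a brighter, more ordinary scene:

```python
# Illustrative sketch: map a reliability score to a face tracking model.
# Interval boundaries and model names are example values only.
def select_tracking_model(theta):
    """theta: reliability score in [0, 1] -> (model name, scene label)."""
    if theta >= 0.6:
        return "model_1", "normal scene"  # good lighting, low uncertainty
    return "model_2", "dark scene"        # extreme/low-light, high uncertainty
```

A face scoring 0.8 is thus routed to the model for ordinary scenes, matching the worked example in the text.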
通过上述可知,在本示例实施方式中,可以基于可靠性分值判断当前光线场景,从而选择对应的人脸跟踪模型进行人脸跟踪检测,提高检测系统的效果。From the above, it can be seen that in this exemplary embodiment, the current lighting scene can be judged based on the reliability score, so as to select the corresponding face tracking model for face tracking detection, and improve the effect of the detection system.
例如一些场景中,可根据本公开方法实现对入库照片的质量检测。举例来说,对于身份验证等人脸识别场景,往往需要用户预先上传符合要求的人脸照片,从而作为后续身份验证时调取使用的模板照片。在此情况下,可通过本公开方法对用户上传照片进行检测,确定上传照片是否合格。下面结合图8实施方式进行说明。For example, in some scenarios, the quality inspection of the stored photos can be implemented according to the disclosed method. For example, for face recognition scenarios such as identity verification, users are often required to upload a face photo that meets the requirements in advance, so as to be used as a template photo for subsequent identity verification. In this case, the disclosed method can be used to detect the photos uploaded by users to determine whether the uploaded photos are qualified. The following will describe the embodiment in conjunction with FIG. 8 .
如图8所示,在一些实施方式中,本公开的图像处理方法,确定目标人脸的检测结果包括:As shown in FIG. 8, in some implementations, in the image processing method of the present disclosure, determining the detection result of the target face includes:
S133-1、根据目标人脸的第一不确定度,确定目标人脸的可靠性分值。S133-1. Determine the reliability score of the target face according to the first uncertainty of the target face.
S133-2、响应于可靠性分值满足第二预设条件,确定待测人脸图像的目标人脸检测通过。S133-2. In response to the reliability score satisfying the second preset condition, determine that the target face detection of the face image to be tested has passed.
具体而言,在用户上传人脸图像或者通过电子设备采集用户人脸图像之后,该人脸图像即可作为本公开前述实施方式所述的待测人脸图像,基于前述实施方式方法对待测人脸图像进行关键点检测,可以得到目标人脸的关键点信息以及第一不确定度。Specifically, after the user uploads the face image or collects the user's face image through an electronic device, the face image can be used as the face image to be tested as described in the foregoing embodiments of the present disclosure. Based on the methods of the foregoing embodiments, the key point detection of the face image to be tested can obtain the key point information and the first uncertainty of the target face.
在本示例中,可以基于前述图6实施方式的过程,确定目标人脸的可靠性分值,本公开对此不再赘述。In this example, the reliability score of the target face can be determined based on the aforementioned process of the implementation manner in FIG. 6 , which will not be repeated in this disclosure.
可以理解,对于人脸识别所需的入库照片,往往需要满足一定的要求,例如面部无遮挡、人脸倾斜角度不能过大等等,这些干扰因素会导致人脸关键点缺失或偏移,从而关键点检测得到的第一不确定度较大。It can be understood that for the photos required for face recognition, it is often necessary to meet certain requirements, such as the face is unobstructed, the tilt angle of the face cannot be too large, etc. These interference factors will cause the key points of the face to be missing or shifted, so the first uncertainty of the key point detection is relatively large.
据此可以基于先验知识或者有限次试验,预先设置第二预设阈值,第二预设阈值表示目标人脸是否检测通过的临界值。当可靠性分值大于该第二预设阈值时,表示目标人脸的检测通过,满足第二预设条件,可以入库。反之,当可靠性分值不大于该第二预设阈值时,表示目标人脸的检测结果不通过,不满足第二预设条件,无法入库。Accordingly, a second preset threshold may be preset based on prior knowledge or a limited number of trials, and the second preset threshold represents a critical value for whether the target face is detected or not. When the reliability score is greater than the second preset threshold, it means that the detection of the target face is passed, meets the second preset condition, and can be stored. Conversely, when the reliability score is not greater than the second preset threshold, it means that the detection result of the target face does not pass, does not meet the second preset condition, and cannot be stored.
在一些实施方式中,在确定目标人脸检测不通过的情况下,还可以根据关键点信息确定不符合要求的关键点,从而向用户输出提示信息,例如“眉毛存在遮挡”等。In some implementations, when it is determined that the target face detection fails, the key points that do not meet the requirements can also be determined according to the key point information, so as to output prompt information to the user, such as "eyebrows are blocked" and so on.
通过上述可知,本公开实施方式的方法,可以应用于各种人脸识别场景,可以基于第一不确定度区分图像质量或者当前环境条件,实用性和鲁棒性强,提高人脸识别任务的效果。From the above, it can be seen that the method of the embodiments of the present disclosure can be applied to various face recognition scenarios, and can distinguish image quality or current environmental conditions based on the first uncertainty, which has strong practicability and robustness, and improves the effect of face recognition tasks.
值得说明的是,本公开实施方式中,对于例如图5所示的图像检测网络,在训练过程中无需对每个关键点的不确定度进行回归优化,而是对人脸的综合不确定度进行优化,网络容易收敛,效果更好,并且大大提高训练效率。下面结合图9实施方式对训练过程进行具体说明。It is worth noting that, in the embodiment of the present disclosure, for the image detection network shown in Figure 5, for example, in the training process, it is not necessary to perform regression optimization on the uncertainty of each key point, but to optimize the comprehensive uncertainty of the face, the network is easy to converge, the effect is better, and the training efficiency is greatly improved. The training process will be described in detail below with reference to the embodiment shown in FIG. 9 .
如图9所示,在一些实施方式中,本公开示例的图像处理方法,对图像检测网络进行网络训练的过程包括:As shown in FIG. 9, in some implementations, in the image processing method of the disclosed example, the process of performing network training on the image detection network includes:
S910、获取样本数据集。S910. Acquire a sample data set.
具体而言,样本数据集包括海量的样本数据,例如一个示例中,样本数据集包括5000张样本数据。对于每一个样本数据,其包括人脸样本图像,以及预先标注的人脸样本图像中目标人脸的每个关键点的关键点标签。Specifically, the sample data set includes a large amount of sample data. For example, in one example, the sample data set includes 5000 pieces of sample data. For each sample data, it includes a face sample image and a key point label of each key point of the target face in the pre-labeled face sample image.
可以理解，关键点标签表示人脸样本图像中目标人脸的各个关键点的真实值（Ground truth），关键点标签可以通过人工标注的方式得到。例如一个示例中，可以通过人工标注的方式对人脸样本图像中的目标人脸的N个关键点坐标进行标记，得到每个人脸样本图像对应的关键点标签。It can be understood that the key point label represents the ground truth of each key point of the target face in the face sample image, and the key point label can be obtained by manual labeling. For example, in an example, the N key point coordinates of the target face in the face sample image can be marked by manual labeling to obtain the key point label corresponding to each face sample image.
在一些实施方式中，还可以预先对样本数据集中海量数据进行预处理，预处理的过程可参照前述图2实施方式，也即由人脸样本图像中裁切出人脸区域作为图像检测网络的输入图像。In some implementations, the massive data in the sample data set can also be preprocessed in advance. The preprocessing process can refer to the aforementioned embodiment in FIG. 2, that is, the face region is cropped from the face sample image as the input image of the image detection network.
S920、对于任意一个样本数据,将人脸样本图像输入待训练的特征提取网络,得到特征提取网络输出的人脸样本图像的特征图。S920. For any sample data, input the face sample image into the feature extraction network to be trained, and obtain the feature map of the face sample image output by the feature extraction network.
本公开实施方式中,图像检测网络的网络结构可参照前述图5实施方式所示。在利用样本数据集对图像检测网络进行网络训练时,可将每n个样本数据作为一个批次(Batch)的训练样本,通常n可以取256。下面以一个样本数据为例,对训练过程进行说明。In the implementation manner of the present disclosure, the network structure of the image detection network may refer to the implementation manner shown in FIG. 5 above. When using the sample data set to perform network training on the image detection network, each n sample data can be used as a batch (Batch) of training samples, usually n can be 256. The following takes a sample data as an example to illustrate the training process.
在一些实施方式中,在将人脸样本图像输入图像检测网络之前,可以预先对人脸样本图像进行归一化处理,归一化处理的过程可参照前述式(1),对此不再赘述。In some implementations, before inputting the face sample image into the image detection network, the face sample image can be normalized in advance, and the normalization process can refer to the aforementioned formula (1), which will not be repeated here.
将样本数据包括的人脸样本图像输入待训练的特征提取网络510中,从而特征提取网络510输出得到人脸样本图像所对应的特征图。The human face sample image included in the sample data is input into the feature extraction network 510 to be trained, so that the feature extraction network 510 outputs a feature map corresponding to the human face sample image.
S930、将人脸样本图像的特征图输入待训练的关键点检测网络,得到目标人脸的每个关键点的关键点信息,以及目标人脸的第一不确定度。S930. Input the feature map of the face sample image into the key point detection network to be trained, and obtain the key point information of each key point of the target face and the first uncertainty of the target face.
具体而言,特征提取网络510输出的特征图作为关键点检测网络520的输入,经过关键点检测网络520池化层和全连接层,分别输出目标人脸的关键点信息P以及目标人脸的第一不确定度α。Specifically, the feature map output by the feature extraction network 510 is used as the input of the key point detection network 520, and through the pooling layer and the fully connected layer of the key point detection network 520, the key point information P of the target face and the first uncertainty α of the target face are respectively output.
在一个示例中,关键点检测网络520输出的目标人脸的关键点信息表示为:In an example, the key point information of the target face output by the key point detection network 520 is expressed as:
P = {(x_1, y_1), (x_2, y_2), …, (x_i, y_i)}, i = 1, 2, …, N            式(3) Equation (3)
式(3)中，P表示关键点信息，N表示关键点数量，(x_i, y_i)表示第i个关键点的位置坐标。In formula (3), P represents key point information, N represents the number of key points, and (x_i, y_i) represents the position coordinates of the i-th key point.
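If the detection head regresses the 2N coordinates of Equation (3) together with the single face-level uncertainty as one flat vector (an assumed output layout, not one specified by the disclosure), splitting it back into P and α might look like:

```python
# Hypothetical sketch: split a flat head output [x1, y1, ..., xN, yN, alpha]
# into the key point set P of Equation (3) and the first uncertainty alpha.
def parse_head_output(raw, num_keypoints):
    assert len(raw) == 2 * num_keypoints + 1, "expected 2N coordinates plus alpha"
    coords = raw[:-1]
    P = [(coords[2 * i], coords[2 * i + 1]) for i in range(num_keypoints)]
    alpha = raw[-1]  # comprehensive uncertainty of the whole face
    return P, alpha
```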
S940、基于关键点信息、关键点标签以及第一不确定度,确定关键点信息与关键点标签之间的差异。S940. Determine a difference between the key point information and the key point label based on the key point information, the key point label, and the first uncertainty.
具体而言,关键点信息可以包括图像检测网络预测输出的关键点的位置坐标,而关键点标签表示关键点的真实坐标,从而可基于预先构建的损失函数计算出两者的差异,也即损失。Specifically, the key point information can include the position coordinates of the key points predicted by the image detection network, and the key point labels represent the real coordinates of the key points, so that the difference between the two can be calculated based on the pre-built loss function, that is, the loss.
值得说明的是,本公开实施方式方法中,并非仅根据关键点信息与关键点标签之间的差异对图像检测网络进行优化训练,而是融合第一不确定度同时进行优化训练,从而无需为第一不确定度设置额外的标签,网络更容易收敛。It is worth noting that, in the method of the embodiment of the present disclosure, the image detection network is not optimized for training based on the difference between key point information and key point labels, but the first uncertainty is integrated for optimal training at the same time, so that there is no need to set additional labels for the first uncertainty, and the network is easier to converge.
在一些实施方式中,图像处理网络采用多目标约束损失函数,表示如下:In some implementations, the image processing network uses a multi-objective constraint loss function, expressed as follows:
L = L_p + λ·L_α            式(4) Equation (4)
在式(4)中，L表示关键点信息与关键点标签之间的损失，L_p表示关键点误差损失函数，L_α表示不确定度误差损失函数，两者表示如下：In formula (4), L represents the loss between the key point information and the key point labels, L_p represents the key point error loss function, and L_α represents the uncertainty error loss function, which are expressed as follows:
σ_p = √( (1/N) · Σ_{i=1}^{N} [ (x_i − x̂_i)² + (y_i − ŷ_i)² ] )            式(5) Equation (5)
L_p = f(σ_p)            式(6) Equation (6)
L_α = f(σ_p − α)            式(7) Equation (7)
f(x) = |x|            式(8) Equation (8)
在式(5)~(8)中，σ_p表示目标人脸所有关键点的均方根误差，α表示预测输出的第一不确定度，f表示L1损失函数，x_i和x̂_i分别表示第i个关键点x坐标的标签值和预测值，y_i和ŷ_i分别表示第i个关键点y坐标的标签值和预测值。In formulas (5)-(8), σ_p represents the root mean square error over all key points of the target face, α represents the first uncertainty of the predicted output, f represents the L1 loss function, x_i and x̂_i respectively represent the label value and the predicted value of the x coordinate of the i-th key point, and y_i and ŷ_i respectively represent the label value and the predicted value of the y coordinate of the i-th key point.
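The multi-objective loss of Equations (4)-(8) can be sketched numerically as follows; λ is a weighting hyperparameter whose value, like the exact per-point averaging inside the RMSE, is an assumption here rather than a value fixed by the disclosure:

```python
import math

# Illustrative sketch of Equations (4)-(8): L = L_p + lambda * L_alpha, with
# sigma_p the RMSE over all key points and f the L1 (absolute value) function.
def multi_objective_loss(pred, label, alpha, lambda_=1.0):
    """pred/label: lists of (x, y) key points; alpha: predicted first uncertainty."""
    n = len(pred)
    sigma_p = math.sqrt(  # Equation (5): RMSE over all key points of the face
        sum((px - lx) ** 2 + (py - ly) ** 2
            for (px, py), (lx, ly) in zip(pred, label)) / n
    )
    f = abs                         # Equation (8): L1 loss
    L_p = f(sigma_p)                # Equation (6): key point error term
    L_alpha = f(sigma_p - alpha)    # Equation (7): uncertainty error term
    return L_p + lambda_ * L_alpha  # Equation (4)
```

Note that L_α pulls the predicted α toward σ_p itself, so no separate label is needed for the uncertainty, which is the property the surrounding text highlights.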
S950、根据差异调整特征提取网络和/或关键点检测网络的网络参数,直至满足收敛条件,得到训练后的特征提取网络和/或关键点检测网络。S950. Adjust the network parameters of the feature extraction network and/or the key point detection network according to the difference until the convergence condition is satisfied, and obtain the trained feature extraction network and/or the key point detection network.
具体而言,在确定预测值与标签值之间的差异之后,即可根据该差异反向传播对特征提取网络和/或关键点检测网络的网络参数进行优化调整。利用样本数据集中的样本数据反复重复上述过程,对图像检测网络进行迭代优化,直至满足收敛条件,网络训练完成。Specifically, after determining the difference between the predicted value and the label value, the network parameters of the feature extraction network and/or the key point detection network can be optimized and adjusted according to the difference backpropagation. The above process is repeated repeatedly using the sample data in the sample data set, and the image detection network is iteratively optimized until the convergence condition is met, and the network training is completed.
值得说明的是,本公开实施方式中,通过构建例如式(4)所示的损失函数,融合目标人脸的第一不确定度对网络进行优化训练,提高图像处理网络的效果。并且,构建的损失函数结构简单,无需对第一不确定度额外设置标签,即可实现对第一不确定度的优化,网络更容易收敛。而且第一不确定度为目标人脸的综合不确定度,在训练过程中无需对每一个关键点单独进行回归优化,简化计算量,提高网络训练效率。It is worth noting that, in the embodiments of the present disclosure, by constructing the loss function shown in formula (4), for example, the first uncertainty of the target face is integrated to optimize the training of the network to improve the effect of the image processing network. Moreover, the constructed loss function has a simple structure, and the optimization of the first uncertainty can be realized without setting additional labels for the first uncertainty, and the network is easier to converge. Moreover, the first uncertainty is the comprehensive uncertainty of the target face. During the training process, there is no need to perform regression optimization on each key point separately, which simplifies the amount of calculation and improves the efficiency of network training.
第二方面,本公开实施方式提供了一种图像处理装置,该装置可应用于电子设备。本公开实施方式中,电子设备可以是任何适于实施的设备类型,例如移动终端、车载终端、可穿戴设备、门禁系统、视频监控系统、云平台及服务器等,本公开对此不作限制。In a second aspect, the embodiments of the present disclosure provide an image processing device, which can be applied to electronic equipment. In the embodiments of the present disclosure, the electronic device may be any type of device suitable for implementation, such as a mobile terminal, a vehicle terminal, a wearable device, an access control system, a video surveillance system, a cloud platform, and a server, etc., and the present disclosure does not limit this.
如图10所示,在一些实施方式中,本公开示例的图像处理装置,包括:As shown in FIG. 10 , in some implementations, the image processing device of the present disclosure includes:
获取模块10,被配置为获取待测人脸图像,所述待测人脸图像中包括目标人脸;The obtaining module 10 is configured to obtain a human face image to be tested, the human face image to be tested includes a target human face;
图像检测模块20,被配置为对所述待测人脸图像进行图像检测,确定所述目标人脸的至少一个关键点的关键点信息以及所述目标人脸的第一不确定度;其中,所述第一不确定度根据所述目标人脸的所有关键点的第二不确定度得到;The image detection module 20 is configured to perform image detection on the face image to be tested, and determine key point information of at least one key point of the target face and a first uncertainty of the target face; wherein, the first uncertainty is obtained according to the second uncertainty of all key points of the target face;
结果确定模块30,被配置为根据所述关键点信息和所述第一不确定度,确定所述目标人脸的检测结果。The result determination module 30 is configured to determine the detection result of the target face according to the key point information and the first uncertainty.
通过上述可知,本公开实施方式中,通过目标人脸的第一不确定度辅助对人脸关键点的检测,提高人脸检测效果和精度。并且基于目标人脸的所有关键点的第二不确定度确定代表目标人脸综合误差的第一不确定度,提高网络效果和训练效率。同时,本公开方法对于应用场景不作限制,可以适用于多种场景的下游任务,例如人脸图像质量检测、人脸追踪、关键点定位等,鲁棒性更高。It can be known from the above that in the embodiments of the present disclosure, the first uncertainty of the target face is used to assist the detection of key points of the face, so as to improve the effect and accuracy of face detection. And based on the second uncertainty of all the key points of the target face, the first uncertainty representing the comprehensive error of the target face is determined to improve the network effect and training efficiency. At the same time, the disclosed method does not limit the application scenarios, and can be applied to downstream tasks in various scenarios, such as face image quality detection, face tracking, key point positioning, etc., and has higher robustness.
在一些实施方式中,所述获取模块10被配置为:In some implementations, the acquisition module 10 is configured to:
获取待处理图像,所述待处理图像中包括至少一个人脸;Acquiring an image to be processed, the image to be processed includes at least one human face;
对所述待处理图像进行图像检测,确定所述待处理图像上的每个所述人脸的人脸区域信息;performing image detection on the image to be processed, and determining face area information of each face on the image to be processed;
对于任意一个人脸,根据所述人脸区域信息裁切得到所述人脸对应的所述待测人脸图像。For any human face, the human face image to be tested corresponding to the human face is obtained by cropping according to the human face area information.
在一些实施方式中,所述图像检测模块20被配置为:In some embodiments, the image detection module 20 is configured to:
对所述待测人脸图像进行关键点检测,基于预先设置的人脸关键点类型,确定所述目标人脸的每个关键点的所述关键点信息和第二不确定度;Carry out key point detection on the face image to be tested, and determine the key point information and the second uncertainty of each key point of the target face based on the preset face key point type;
根据所述目标人脸的各个关键点的第二不确定度,确定所述目标人脸的第一不确定度。The first uncertainty of the target face is determined according to the second uncertainty of each key point of the target face.
在一些实施方式中，所述预设关键点类型包括以下至少之一：In some implementations, the preset key point types include at least one of the following:
脸部轮廓关键点,眼睛关键点,眉毛关键点,鼻子关键点,嘴部关键点,耳朵关键点。Key points of face contour, key points of eyes, key points of eyebrows, key points of nose, key points of mouth, key points of ears.
在一些实施方式中,所述结果确定模块30被配置为:In some embodiments, the result determination module 30 is configured to:
根据所述目标人脸的所述第一不确定度,确定所述目标人脸的可靠性分值;determining the reliability score of the target face according to the first uncertainty of the target face;
响应于所述可靠性分值满足第一预设条件,根据各个关键点的所述关键点信息,在所述待测人脸图像上输出所述关键点。In response to the reliability score satisfying a first preset condition, the key points are output on the human face image to be tested according to the key point information of each key point.
在一些实施方式中,所述结果确定模块30被配置为:In some embodiments, the result determination module 30 is configured to:
根据所述目标人脸的所述第一不确定度,确定所述目标人脸的可靠性分值;determining the reliability score of the target face according to the first uncertainty of the target face;
根据所述可靠性分值,以及预先建立的可靠性分值与人脸跟踪模型的对应关系,由预先设置的多个人脸跟踪模型中确定目标人脸跟踪模型;According to the reliability score and the corresponding relationship between the reliability score and the face tracking model established in advance, the target face tracking model is determined from a plurality of preset face tracking models;
利用所述目标人脸跟踪模型对所述目标人脸进行检测跟踪,得到所述目标人脸的所述检测结果。The target face is detected and tracked by using the target face tracking model to obtain the detection result of the target face.
在一些实施方式中,所述结果确定模块30被配置为:In some embodiments, the result determination module 30 is configured to:
根据所述目标人脸的所述第一不确定度,确定所述目标人脸的可靠性分值;determining the reliability score of the target face according to the first uncertainty of the target face;
响应于所述可靠性分值满足第二预设条件,确定所述待测人脸图像的所述目标人脸检测通过。In response to the reliability score satisfying a second preset condition, it is determined that the target face detection of the face image to be tested passes.
如图11所示,在一些实施方式中,所述图像检测模块20包括:As shown in Figure 11, in some implementations, the image detection module 20 includes:
特征提取模块40,被配置为将所述待测人脸图像输入预先训练的特征提取网络,得到所述特征提取网络输出的特征图;The feature extraction module 40 is configured to input the pre-trained feature extraction network of the human face image to be tested, and obtain the feature map output by the feature extraction network;
关键点检测模块50,被配置为将所述特征图输入预先训练的关键点检测网络,得到所述关键点检测网络输出的所述目标人脸的各个关键点的所述关键点信息以及所述第一不确定度。The key point detection module 50 is configured to input the feature map into a pre-trained key point detection network, and obtain the key point information and the first uncertainty of each key point of the target face output by the key point detection network.
在一些实施方式中,本公开实施方式所述的装置,还包括训练模块60,所述训练模块被配置为:In some embodiments, the device described in the embodiments of the present disclosure further includes a training module 60, the training module is configured to:
获取样本数据集,所述样本数据集中的每个样本数据包括人脸样本图像,以及所述人脸样本图像中目标人脸的每个关键点的关键点标签;Obtain a sample data set, each sample data in the sample data set includes a face sample image, and a key point label of each key point of the target face in the face sample image;
对于任意一个样本数据，将所述人脸样本图像输入待训练的特征提取网络，得到所述特征提取网络输出的所述人脸样本图像的特征图；For any sample data, the face sample image is input into the feature extraction network to be trained to obtain the feature map of the face sample image output by the feature extraction network;
将所述人脸样本图像的特征图输入待训练的关键点检测网络,得到所述目标人脸的每个关键点的关键点信息,以及所述目标人脸的第一不确定度;Input the feature map of the human face sample image into the key point detection network to be trained, obtain the key point information of each key point of the target human face, and the first uncertainty of the target human face;
基于所述关键点信息、关键点标签以及所述第一不确定度,确定所述关键点信息与所述关键点标签之间的差异;determining a difference between the key point information and the key point label based on the key point information, the key point label, and the first uncertainty;
根据所述差异调整所述特征提取网络和/或所述关键点检测网络的网络参数,直至满足收敛条件,得到训练后的所述特征提取网络和/或所述关键点检测网络。Adjust the network parameters of the feature extraction network and/or the key point detection network according to the difference until a convergence condition is satisfied, and obtain the trained feature extraction network and/or the key point detection network.
通过上述可知,本公开实施方式中,通过融合目标人脸的第一不确定度对网络进行优化训练,提高图像处理网络的效果。并且,构建的损失函数结构简单,无需对第一不确定度额外设置标签,即可实现对第一不确定度的优化,网络更容易收敛。而且第一不确定度为目标人脸的综合不确定度,在训练过程中无需对每一个关键点单独进行回归优化,简化计算量,提高网络训练效率。It can be known from the above that in the embodiments of the present disclosure, the network is optimized and trained by fusing the first uncertainty of the target face to improve the effect of the image processing network. Moreover, the constructed loss function has a simple structure, and the optimization of the first uncertainty can be realized without setting additional labels for the first uncertainty, and the network is easier to converge. Moreover, the first uncertainty is the comprehensive uncertainty of the target face. During the training process, there is no need to perform regression optimization on each key point separately, which simplifies the amount of calculation and improves the efficiency of network training.
第三方面,本公开实施方式提供了一种电子设备,包括:In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
处理器;以及processor; and
存储器,存储有能够被所述处理器读取的计算机指令,当所述计算机指令被读取时,所述处理器执行根据第一方面任一实施方式所述的方法。The memory stores computer instructions that can be read by the processor, and when the computer instructions are read, the processor executes the method according to any implementation manner of the first aspect.
第四方面,本公开实施方式提供了一种存储介质,用于存储计算机可读指令,所述计算机可读指令用于使计算机执行根据第一方面任一实施方式所述的方法。In a fourth aspect, the embodiments of the present disclosure provide a storage medium for storing computer-readable instructions, and the computer-readable instructions are used to cause a computer to execute the method according to any embodiment of the first aspect.
具体而言,图12示出了适于用来实现本公开方法的电子设备600的结构示意图,通过图12所示电子设备,可实现上述处理器及存储介质相应功能。Specifically, FIG. 12 shows a schematic structural diagram of an electronic device 600 suitable for implementing the method of the present disclosure. The electronic device shown in FIG. 12 can realize the corresponding functions of the above-mentioned processor and storage medium.
如图12所示,电子设备600包括处理器601,其可以根据存储在存储器602中的程序或者从存储部分608加载到存储器602中的程序而执行各种适当的动作和处理。在存储器602中,还存储有电子设备600操作所需的各种程序和数据。处理器601和存储器602通过总线604彼此相连。输入/输出(I/O)接口605也连接至总线604。As shown in FIG. 12 , the electronic device 600 includes a processor 601 that can perform various appropriate actions and processes according to programs stored in the memory 602 or loaded from the storage part 608 into the memory 602 . In the memory 602, various programs and data necessary for the operation of the electronic device 600 are also stored. The processor 601 and the memory 602 are connected to each other through a bus 604 . An input/output (I/O) interface 605 is also connected to the bus 604 .
以下部件连接至I/O接口605:包括键盘、鼠标等的输入部分606;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分607;包括硬盘等的存储部分608;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分609。通信部分609经由诸如因特网的网络执行通信处理。驱动器610也根据需要连接至I/O接口605。可拆卸介质611,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器610上,以便于从其上读出的计算机程序根据需要被安装入存储部分608。The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, etc.; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker; a storage section 608 including a hard disk, etc.; and a communication section 609 including a network interface card such as a LAN card, a modem, etc. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc. is mounted on the drive 610 as necessary so that a computer program read therefrom is installed into the storage section 608 as necessary.
特别地,根据本公开的实施方式,上文方法过程可以被实现为计算机软件程序。例如,本公开的实施方式包括一种计算机程序产品,其包括有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行上述方法的程序代码。在这样的实施方式中,该计算机程序可以通过通信部分609从网络上被下载和安装,和/或从可拆卸介质611被安装。In particular, according to the embodiments of the present disclosure, the above method process can be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method described above. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 609 and/or installed from a removable medium 611 .
附图中的流程图和框图，图示了按照本公开各种实施方式的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上，流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分，模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意，在有些作为替换的实现中，方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如，两个接连地表示的方框实际上可以基本并行地执行，它们有时也可以按相反的顺序执行，这依所涉及的功能而定。也要注意的是，框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合，可以用执行规定的功能或操作的专用的基于硬件的系统来实现，或者可以用专用硬件与计算机指令的组合来实现。The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagram may represent a module, program segment, or a portion of code that includes one or more executable instructions for implementing specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block in the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or operations, or by combinations of special purpose hardware and computer instructions.
显然,上述实施方式仅仅是为清楚地说明所作的举例,而并非对实施方式的限定。对于所属领域的普通技术人员来说,在上述说明的基础上还可以做出其它不同形式的变化或变动。这里无需也无法对所有的实施方式予以穷举。而由此所引伸出的显而易见的变化或变动仍处于本公开创造的保护范围之中。Apparently, the above-mentioned implementation manners are only examples for clear description, rather than limiting the implementation manners. For those of ordinary skill in the art, other changes or changes in different forms can be made on the basis of the above description. It is not necessary and impossible to exhaustively list all the implementation manners here. And the obvious changes or changes derived therefrom are still within the scope of protection of the present disclosure.

Claims (12)

  1. An image processing method, comprising:
    acquiring a face image to be tested, the face image to be tested including a target face;
    performing image detection on the face image to be tested, and determining key point information of at least one key point of the target face and a first uncertainty of the target face, wherein the first uncertainty is obtained according to second uncertainties of all key points of the target face; and
    determining a detection result of the target face according to the key point information and the first uncertainty.
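Claim 1 derives a face-level ("first") uncertainty from the per-key-point ("second") uncertainties, but does not fix the aggregation function. A minimal sketch, assuming a simple mean as the aggregation:

```python
import numpy as np

def face_uncertainty(keypoint_uncertainties):
    """Aggregate per-key-point (second) uncertainties into one
    face-level (first) uncertainty. The mean is an assumption for
    illustration; the claim only requires that the result be
    obtained from the second uncertainties of all key points."""
    u = np.asarray(keypoint_uncertainties, dtype=float)
    return float(u.mean())
```

Any monotone aggregation (maximum, weighted mean over key point types) would satisfy the claim language equally well.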
  2. The method according to claim 1, wherein said acquiring a face image to be tested comprises:
    acquiring an image to be processed, the image to be processed including at least one face;
    performing image detection on the image to be processed, and determining face region information of each face in the image to be processed; and
    for any one of the faces, cropping the image to be processed according to the face region information to obtain the face image to be tested corresponding to the face.
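The per-face cropping step in claim 2 amounts to slicing each detected region out of the full image. A sketch, assuming the face region information is an `(x, y, w, h)` box and the image is an H×W×C array (both formats are assumptions, not stated in the claim):

```python
import numpy as np

def crop_faces(image, face_boxes):
    """Produce one face image to be tested per detected face region.
    Box format (x, y, w, h) in pixel coordinates is assumed."""
    crops = []
    for x, y, w, h in face_boxes:
        # Slice rows by y, columns by x; copy so crops are independent.
        crops.append(image[y:y + h, x:x + w].copy())
    return crops
```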
  3. The method according to claim 1, wherein said performing image detection on the face image to be tested, and determining key point information of at least one key point of the target face and a first uncertainty of the target face comprises:
    performing key point detection on the face image to be tested, and determining the key point information and a second uncertainty of each key point of the target face based on preset face key point types; and
    determining the first uncertainty of the target face according to the second uncertainties of the key points of the target face.
  4. The method according to claim 3, wherein the face key point types include at least one of the following:
    face contour key points, eye key points, eyebrow key points, nose key points, mouth key points, and ear key points.
  5. The method according to any one of claims 1 to 4, wherein said determining a detection result of the target face according to the key point information and the first uncertainty comprises:
    determining a reliability score of the target face according to the first uncertainty of the target face; and
    in response to the reliability score satisfying a first preset condition, outputting the key points on the face image to be tested according to the key point information of each key point.
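Claims 5 to 7 map the first uncertainty to a reliability score and gate further processing on a preset condition. Neither the score mapping nor the condition is fixed by the claims; one plausible sketch (the reciprocal mapping and the threshold value are both assumptions):

```python
def reliability_score(first_uncertainty):
    """Map uncertainty to a score in (0, 1]: lower uncertainty
    gives a higher score. The exact mapping is an assumption."""
    return 1.0 / (1.0 + first_uncertainty)

def keypoints_to_output(score, keypoints, threshold=0.5):
    """Return the key points only when the score satisfies the
    (assumed) first preset condition score >= threshold;
    otherwise suppress the unreliable detection."""
    return keypoints if score >= threshold else None
```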
  6. The method according to any one of claims 1 to 4, wherein said determining a detection result of the target face according to the key point information and the first uncertainty comprises:
    determining a reliability score of the target face according to the first uncertainty of the target face;
    determining a target face tracking model from a plurality of preset face tracking models according to the reliability score and a pre-established correspondence between reliability scores and face tracking models; and
    detecting and tracking the target face by using the target face tracking model to obtain the detection result of the target face.
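The model selection in claim 6 is a lookup from score ranges to trackers. A sketch with a hypothetical banded correspondence table (the band boundaries and model names are illustrative assumptions):

```python
def select_tracking_model(score, bands):
    """Pick a face tracking model from a pre-established
    correspondence given as (minimum score, model) pairs.
    Highest band whose minimum the score reaches wins."""
    bands = sorted(bands, key=lambda b: b[0], reverse=True)
    for min_score, model in bands:
        if score >= min_score:
            return model
    return bands[-1][1]  # fall back to the least demanding model
```

For example, a high-reliability face might be handed to a heavier, more precise tracker while a low-reliability one gets a lightweight or re-detection path.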
  7. The method according to any one of claims 1 to 4, wherein said determining a detection result of the target face according to the key point information and the first uncertainty comprises:
    determining a reliability score of the target face according to the first uncertainty of the target face; and
    in response to the reliability score satisfying a second preset condition, determining that the target face of the face image to be tested passes detection.
  8. The method according to any one of claims 1 to 7, wherein said performing image detection on the face image to be tested, and determining key point information of at least one key point of the target face and a first uncertainty of the target face comprises:
    inputting the face image to be tested into a pre-trained feature extraction network to obtain a feature map output by the feature extraction network; and
    inputting the feature map into a pre-trained key point detection network to obtain the key point information of each key point of the target face and the first uncertainty output by the key point detection network.
  9. The method according to claim 8, further comprising a training process for training the feature extraction network and the key point detection network, the training process comprising:
    acquiring a sample data set, each sample data item in the sample data set including a face sample image and a key point label of each key point of a target face in the face sample image;
    for any sample data item, inputting the face sample image into the feature extraction network to be trained to obtain a feature map of the face sample image output by the feature extraction network;
    inputting the feature map of the face sample image into the key point detection network to be trained to obtain the key point information of each key point of the target face and the first uncertainty of the target face;
    determining a difference between the key point information and the key point labels based on the key point information, the key point labels, and the first uncertainty; and
    adjusting network parameters of the feature extraction network and/or the key point detection network according to the difference until a convergence condition is satisfied, to obtain the trained feature extraction network and/or key point detection network.
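In the training process of claim 9, the difference between predictions and labels is computed based on the key point information, the labels, and the first uncertainty. One common realization of such an uncertainty-dependent difference (an assumption here, not stated in the claim) is a Gaussian negative-log-likelihood term, where a predicted log-variance down-weights the residual of unreliable samples:

```python
import numpy as np

def uncertainty_weighted_loss(pred, label, log_var):
    """Gaussian NLL-style loss: exp(-log_var) scales the squared
    residual, while the + log_var term penalizes the network for
    claiming high uncertainty everywhere. With log_var = 0 this
    reduces to ordinary mean squared error."""
    pred = np.asarray(pred, dtype=float)
    label = np.asarray(label, dtype=float)
    return float(np.mean(np.exp(-log_var) * (pred - label) ** 2 + log_var))
```

Gradients of such a loss would then drive the parameter adjustment of the feature extraction and key point detection networks until convergence.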
  10. An image processing apparatus, comprising:
    an acquisition module configured to acquire a face image to be tested, the face image to be tested including a target face;
    an image detection module configured to perform image detection on the face image to be tested, and determine key point information of at least one key point of the target face and a first uncertainty of the target face, wherein the first uncertainty is obtained according to second uncertainties of all key points of the target face; and
    a result determination module configured to determine a detection result of the target face according to the key point information and the first uncertainty.
  11. An electronic device, comprising:
    a processor; and
    a memory storing computer instructions readable by the processor, wherein when the computer instructions are read, the processor executes the method according to any one of claims 1 to 9.
  12. A storage medium storing computer-readable instructions for causing a computer to execute the method according to any one of claims 1 to 9.
PCT/CN2022/090297 2022-01-21 2022-04-29 Image processing method and apparatus, and electronic device and storage medium WO2023137905A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210074181.7A CN116543426A (en) 2022-01-21 2022-01-21 Image processing method, device, electronic equipment and storage medium
CN202210074181.7 2022-01-21

Publications (1)

Publication Number Publication Date
WO2023137905A1 true WO2023137905A1 (en) 2023-07-27

Family

ID=87347704

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090297 WO2023137905A1 (en) 2022-01-21 2022-04-29 Image processing method and apparatus, and electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN116543426A (en)
WO (1) WO2023137905A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472494A (en) * 2019-06-21 2019-11-19 深圳壹账通智能科技有限公司 Face feature extracts model training method, facial feature extraction method, device, equipment and storage medium
CN111488774A (en) * 2019-01-29 2020-08-04 北京搜狗科技发展有限公司 Image processing method and device for image processing
CN112200176A (en) * 2020-12-10 2021-01-08 长沙小钴科技有限公司 Method and system for detecting quality of face image and computer equipment
CN112581480A (en) * 2020-12-22 2021-03-30 深圳市雄帝科技股份有限公司 Automatic image matting method, system and readable storage medium thereof


Also Published As

Publication number Publication date
CN116543426A (en) 2023-08-04


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22921336

Country of ref document: EP

Kind code of ref document: A1