CN113569594A - Method and device for labeling key points of a human face


Info

Publication number: CN113569594A
Authority: CN (China)
Prior art keywords: face, image, position information, target
Legal status: Pending
Application number: CN202010350817.7A
Other languages: Chinese (zh)
Inventors: 顾阳, 王晋玮, 李源, 杨德尧, 左钟融, 张册
Current Assignee: Momenta Suzhou Technology Co Ltd
Original Assignee: Momenta Suzhou Technology Co Ltd
Application filed by Momenta Suzhou Technology Co Ltd
Priority to CN202010350817.7A
Publication of CN113569594A

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the invention disclose a method and a device for labeling key points of a human face. The method includes: acquiring face images captured by a plurality of image acquisition devices in the same acquisition period; detecting, from each face image, image position information corresponding to each face key point of a target face; and determining the labeling position information of each face key point of the target face in each face image based on the detected image position information and a preset key point labeling rule. The preset key point labeling rule includes: a labeling rule based on image position information corresponding to face key points in a target face image, and/or a labeling rule based on image position information, obtained by traversing each face image, corresponding to face key points meeting a specified position condition, where the target face image is: a face image, among the face images, that meets a specified screening rule. The method improves the accuracy of the position recognition result of face key points.

Description

Method and device for labeling key points of a human face
Technical Field
The invention relates to the technical field of image recognition, in particular to a method and a device for labeling key points of a human face.
Background
Face recognition technology is widely applied in fields such as security, identity verification, and person tracking. Related face recognition techniques generally first detect the image position information of the face key points of each facial part from a captured image containing a face, and then perform subsequent tasks based on that image position information, for example verifying a person's identity or detecting fatigue-driving behavior.
As can be seen from the above process, accurately identifying the image position information of the face key points of each facial part in the image is important. In related face recognition techniques, when recognizing the key points of the eye region, the image position information of the eye corner points can be determined based on the eyelid line. In some scenarios this process is prone to position recognition errors. For example, when a person forcefully closes the eyes, wrinkles tend to appear around the eyes, and the eyebrow tail is then easily identified as an eye corner point, causing position recognition errors for the face key points.
Disclosure of Invention
The invention provides a method and a device for labeling key points of a human face, aiming to improve the accuracy of the position recognition result of face key points. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a method for labeling key points of a human face, where the method includes:
obtaining face images acquired by a plurality of image acquisition devices in the same acquisition period, wherein the plurality of image acquisition devices photograph a target face from different angles;
detecting image position information corresponding to each face key point in the target face from each face image;
determining labeling position information of each face key point of the target face in each face image based on the image position information corresponding to the face key points of the target face in each face image and a preset key point labeling rule, wherein the preset key point labeling rule comprises: a labeling rule based on image position information corresponding to face key points in a target face image, and/or a labeling rule based on image position information, obtained by traversing each face image, corresponding to face key points meeting a specified position condition, wherein the target face image is: a face image, among the face images, that meets a specified screening rule.
Optionally, the step of detecting image position information corresponding to each face key point in the target face from each face image includes:
determining a corresponding thermodynamic diagram of each face image based on the face images;
aiming at each face image, determining image position information corresponding to each face key point in the target face and semantic information corresponding to the image position information from the thermodynamic diagram corresponding to the face image by using a preset clustering algorithm;
the step of determining labeling position information of each face key point of the target face in each face image based on image position information corresponding to the face key point in the target face in each face image and a preset key point labeling rule comprises the following steps:
for each face image, determining image position information corresponding to face key points corresponding to target semantic information from the face key points in the face image based on semantic information corresponding to each face key point in the target face in the face image, wherein the target semantic information is semantic information corresponding to image position information corresponding to at least two face key points;
aiming at each target semantic information corresponding to each face image, sequentially taking the image position information corresponding to the face key point corresponding to the target semantic information as the target image position information corresponding to the face key point corresponding to the target semantic information; determining a reprojection error corresponding to the face image based on target image position information corresponding to a face key point corresponding to each target semantic information in the face image, image position information corresponding to face key points corresponding to other semantic information, and a preset three-dimensional face model, wherein the other semantic information is as follows: semantic information other than the target semantic information;
and determining the labeling position information of each face characteristic point in each face image based on each reprojection error corresponding to the face image and the image position information corresponding to a group of face key points corresponding to each reprojection error corresponding to the face image.
Optionally, the preset key point annotation rule includes: a rule for labeling based on image position information corresponding to the face key points which meet the specified position conditions and are obtained by traversing in each face image;
the step of determining, for each face image, based on the reprojection errors corresponding to the face image and the image location information corresponding to a group of face key points corresponding to the reprojection errors corresponding to the face image, the annotation location information of each face feature point in the face image includes:
and determining image position information corresponding to a group of human face key points corresponding to the reprojection error with the minimum numerical value in the reprojection errors corresponding to each human face image as the labeling position information of each human face characteristic point in the human face image.
Optionally, the preset key point annotation rule includes: the method comprises the steps of carrying out annotation rules based on image position information corresponding to face key points in a target face image and carrying out annotation rules based on image position information corresponding to the face key points which are obtained by traversing in each face image and meet specified position conditions;
the step of determining, for each face image, based on the reprojection errors corresponding to the face image and the image location information corresponding to a group of face key points corresponding to the reprojection errors corresponding to the face image, the annotation location information of each face feature point in the face image includes:
aiming at each face image, determining image position information corresponding to a group of face key points corresponding to the re-projection error with the minimum numerical value in the re-projection errors corresponding to the face image as the middle position information of each face characteristic point in the face image;
displaying each face image containing the middle position information of each face key point so that a user can select at least two frames of target face images with accurate face key point detection results from the displayed face images containing the middle position information of each face key point;
determining at least two frames of target face images selected by a user based on the operation of the user;
determining three-dimensional position information corresponding to each face key point based on the middle position information of each face key point in each target face image and the equipment information of the image acquisition equipment corresponding to each target face image;
and determining the labeling position information of each face key point in each face image based on the three-dimensional position information corresponding to each face key point, the equipment information of the image acquisition equipment corresponding to each face image and a preset projection formula.
Optionally, the target semantic information includes a left-eye corner point and/or a right-eye corner point.
Optionally, the preset key point annotation rule includes: a rule for labeling based on image position information corresponding to a face key point in a target face image;
the step of determining labeling position information of each face key point of the target face in each face image based on image position information corresponding to the face key point in the target face in each face image and a preset key point labeling rule comprises the following steps:
displaying each face image containing image position information corresponding to each face key point in the target face, so that a user can select at least two frames of target face images with accurate face key point detection results from the displayed face images containing the image position information corresponding to each face key point in the target face;
determining at least two frames of target face images selected by a user based on the operation of the user;
determining three-dimensional position information corresponding to each face key point based on image position information corresponding to each face key point in each target face image and equipment information of image acquisition equipment corresponding to each target face image;
and determining the labeling position information of each face key point in each face image based on the three-dimensional position information corresponding to each face key point, the equipment information of the image acquisition equipment corresponding to each face image and a preset projection formula.
In a second aspect, an embodiment of the present invention provides a device for labeling key points of a human face, where the device includes:
an obtaining module, configured to obtain face images acquired by a plurality of image acquisition devices in the same acquisition period, wherein the plurality of image acquisition devices photograph the target face from different angles;
the detection module is configured to detect image position information corresponding to each face key point in the target face from each face image;
a determining module, configured to determine labeling position information of each face key point of the target face in each face image based on the image position information corresponding to the face key points of the target face in each face image and a preset key point labeling rule, where the preset key point labeling rule includes: a labeling rule based on image position information corresponding to face key points in a target face image, and/or a labeling rule based on image position information, obtained by traversing each face image, corresponding to face key points meeting a specified position condition, where the target face image is: a face image, among the face images, that meets a specified screening rule.
Optionally, the detection module includes:
the first determining unit is configured to determine a corresponding thermodynamic diagram based on each face image;
the second determining unit is configured to determine image position information corresponding to each face key point in the target face and semantic information corresponding to the image position information from the thermodynamic diagram corresponding to each face image by using a preset clustering algorithm for each face image;
the determining module includes:
a third determining unit, configured to determine, for each face image, image location information corresponding to face key points corresponding to target semantic information from the face key points in the face image based on semantic information corresponding to each face key point in the target face in the face image, where the target semantic information is semantic information corresponding to image location information corresponding to at least two face key points;
a fourth determining unit, configured to, for each target semantic information corresponding to each face image, sequentially use image position information corresponding to a face key point corresponding to the target semantic information as target image position information corresponding to the face key point corresponding to the target semantic information; determining a reprojection error corresponding to the face image based on target image position information corresponding to a face key point corresponding to each target semantic information in the face image, image position information corresponding to face key points corresponding to other semantic information, and a preset three-dimensional face model, wherein the other semantic information is as follows: semantic information other than the target semantic information;
and the fifth determining unit is configured to determine, for each face image, the annotation position information of each face feature point in the face image based on each reprojection error corresponding to the face image and the image position information corresponding to a group of face key points corresponding to each reprojection error corresponding to the face image.
Optionally, the preset key point annotation rule includes: a rule for labeling based on image position information corresponding to the face key points which meet the specified position conditions and are obtained by traversing in each face image;
the fifth determining unit is specifically configured to determine, for each face image, image location information corresponding to a group of face key points corresponding to a smallest-valued reprojection error among the reprojection errors corresponding to the face image, as labeled location information of each face feature point in the face image.
Optionally, the preset key point annotation rule includes: the method comprises the steps of carrying out annotation rules based on image position information corresponding to face key points in a target face image and carrying out annotation rules based on image position information corresponding to the face key points which are obtained by traversing in each face image and meet specified position conditions;
the fifth determining unit is specifically configured to determine, for each face image, image position information corresponding to a group of face key points corresponding to a re-projection error with a smallest numerical value among re-projection errors corresponding to the face image, as middle position information of each face feature point in the face image;
displaying each face image containing the middle position information of each face key point so that a user can select at least two frames of target face images with accurate face key point detection results from the displayed face images containing the middle position information of each face key point;
determining at least two frames of target face images selected by a user based on the operation of the user;
determining three-dimensional position information corresponding to each face key point based on the middle position information of each face key point in each target face image and the equipment information of the image acquisition equipment corresponding to each target face image;
and determining the labeling position information of each face key point in each face image based on the three-dimensional position information corresponding to each face key point, the equipment information of the image acquisition equipment corresponding to each face image and a preset projection formula.
Optionally, the target semantic information includes a left-eye corner point and/or a right-eye corner point.
Optionally, the preset key point annotation rule includes: a rule for labeling based on image position information corresponding to a face key point in a target face image;
the determining module is specifically configured to display each face image containing image position information corresponding to each face key point in the target face, so that a user can select at least two frames of target face images with accurate face key point detection results from the displayed face images containing the image position information corresponding to each face key point in the target face;
determining at least two frames of target face images selected by a user based on the operation of the user;
determining three-dimensional position information corresponding to each face key point based on image position information corresponding to each face key point in each target face image and equipment information of image acquisition equipment corresponding to each target face image;
and determining the labeling position information of each face key point in each face image based on the three-dimensional position information corresponding to each face key point, the equipment information of the image acquisition equipment corresponding to each face image and a preset projection formula.
As can be seen from the above, the method and device for labeling key points of a human face provided in the embodiments of the present invention acquire face images captured by a plurality of image acquisition devices in the same acquisition period, the plurality of image acquisition devices photographing a target face from different angles; detect, from each face image, image position information corresponding to each face key point of the target face; and determine the labeling position information of each face key point of the target face in each face image based on the detected image position information and a preset key point labeling rule, where the preset key point labeling rule includes: a labeling rule based on image position information corresponding to face key points in a target face image, and/or a labeling rule based on image position information, obtained by traversing each face image, corresponding to face key points meeting a specified position condition, the target face image being a face image, among the face images, that meets a specified screening rule.
By applying the embodiments of the invention, under the rule that labels based on image position information corresponding to face key points in a target face image, a target face image meeting the specified screening rule, i.e. one with accurate position detection results, is determined from the face images, and the image position information corresponding to the face key points in that target face image is used to label the position information of the face key points in every face image; and/or, under the rule that labels based on image position information, obtained by traversing each face image, corresponding to face key points meeting the specified position condition, labeling uses the image position information corresponding to the accurately detected face key points in each face image. Either way, labeled position information of accurately positioned face feature points is obtained, improving the accuracy of the position recognition result of face key points. Of course, practicing any one product or method of the invention does not necessarily require achieving all of the advantages described above at the same time.
The innovation points of the embodiment of the invention comprise:
1. Under the rule that labels based on image position information corresponding to face key points in a target face image, a target face image meeting the specified screening rule, i.e. one with accurate position detection results, is determined from the face images, and the image position information corresponding to the face key points in that target face image is used to label the position information of the face key points in each face image; and/or, under the rule that labels based on image position information, obtained by traversing each face image, corresponding to face key points meeting the specified position condition, the image position information corresponding to the accurately detected face key points in each face image is used for labeling. This yields labeled position information of accurately positioned face feature points and improves the accuracy of the position recognition result of face key points.
2. First, a thermodynamic diagram corresponding to each face image is determined, and the image position information and semantic information of each face key point contained in the face image are determined from it. Then, for target semantic information corresponding to at least two pieces of image position information, the candidate image positions are enumerated and used in turn as the target image position information of the corresponding face key point, and a reprojection error for the face image is computed for each case in combination with a preset three-dimensional face model. Since the relationships between the three-dimensional positions of the face feature points in the preset three-dimensional face model conform to real face characteristics, a smaller reprojection error indicates more accurate image position information for the detected face feature points. On one hand, the image position information of the group of face key points corresponding to the smallest reprojection error can be determined directly as the labeling position information of the face feature points in the face image. On the other hand, this can be combined with a person's manual inspection of the detected key point positions across the multiple frames, so that the face key points with the most accurate image position information are used for subsequent position labeling, further improving the accuracy of the labeling position information of the face key points in each face image.
3. Considering that the position detection of the face feature points is more accurate in some of the multi-frame face images and less accurate in others, the target face images with more accurate position detection are identified manually, and the labeling position information of each face key point in each face image is determined based on the image position information corresponding to the face feature points in those target face images, improving the accuracy of the position recognition result of face key points.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is to be understood that the drawings in the following description are merely examples of some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of a method for labeling key points of a human face according to an embodiment of the present invention;
fig. 2 is another schematic flow chart of a method for labeling key points of a human face according to an embodiment of the present invention;
FIG. 3 is an exemplary diagram of a thermodynamic diagram corresponding to a face image;
fig. 4 is another schematic flow chart of a method for labeling key points of a human face according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a device for labeling key points of a human face according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the embodiments and drawings of the present invention are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
The invention provides a method and a device for labeling key points of a human face, which are used for improving the accuracy of a position identification result of the key points of the human face. The following provides a detailed description of embodiments of the invention.
Fig. 1 is a schematic flow chart of a method for labeling key points of a human face according to an embodiment of the present invention. The method may comprise the steps of:
s101: and acquiring the face images acquired by a plurality of image acquisition devices in the same acquisition period.
The plurality of image acquisition devices shoot the target face from different angles.
The method for labeling face key points provided by the embodiments of the invention can be applied to any type of electronic device with computing capability; the electronic device can be a server or a terminal. The electronic device can be connected to a plurality of image acquisition devices to obtain the images they capture. In one implementation, the plurality of image acquisition devices may be arranged inside a vehicle cabin and capture the face of a person in the vehicle from different angles. Alternatively, the plurality of image acquisition devices may be located in any indoor or outdoor scene. The plurality of image acquisition devices can photograph the person's face from all directions. In one case, the image capture regions of every two adjacently positioned image acquisition devices may overlap.
In one case, the target face may be in a state of forcefully opening and closing the eyes; accordingly, the electronic device can obtain the face images captured by the plurality of image acquisition devices while the target face is in that state.
S102: and detecting image position information corresponding to each face key point in the target face from each face image.
In this step, the electronic device may detect each face image by using a preset face key point detection algorithm, and detect image position information corresponding to each face key point in a target face in the face image.
In one implementation, the preset face key point detection algorithm may include, but is not limited to: a deep learning-based key point detection model, the ASM (Active Shape Model) algorithm, and the CPR (Cascaded Pose Regression) algorithm. The deep learning-based key point detection model may be a neural network model trained on sample images annotated with each face key point; its training process follows the neural network training procedures of the related art and is not repeated here. The embodiments of the invention do not limit the specific type of the preset face key point detection algorithm; any algorithm that can detect image position information corresponding to each face key point of a target face in a face image can be applied.
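As a minimal, non-authoritative sketch of this step, the snippet below wraps such a detector behind one function. The `model` object, its `predict` method, and its `keypoint_names` attribute are hypothetical stand-ins for whichever preset detection algorithm is chosen; the patent does not define this interface.

```python
import numpy as np

def detect_keypoints(model, face_image: np.ndarray) -> dict:
    """Run a preset face key point detector on a single face image.

    `model` is a hypothetical stand-in for any detector named above
    (deep-learning model, ASM, CPR). It is assumed to return an (N, 2)
    array of (x, y) image positions, one row per face key point.
    """
    positions = model.predict(face_image)   # assumed: (N, 2) pixel coordinates
    labels = model.keypoint_names           # assumed: N semantic labels
    # Pair each detected position with its semantic information,
    # e.g. "left_outer_eye_corner" -> (x, y).
    return {name: tuple(xy) for name, xy in zip(labels, positions)}
```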
The electronic equipment can obtain the image position information corresponding to each face key point in the target face in the face image and can also obtain the semantic information corresponding to each face key point.
In another implementation manner, the electronic device may first process each face image by using a HeatMap algorithm to obtain a thermodynamic diagram corresponding to each face image, and then determine image position information corresponding to each face key point in the target face and semantic information corresponding to the image position information from the thermodynamic diagram corresponding to each face image by using a preset clustering algorithm.
The semantic information corresponding to a face key point is: information describing the attribute of that face key point in the target face image. For example, semantic information includes, but is not limited to: the left-face outer eye corner point, the left-face inner eye corner point, the right-face outer eye corner point, the right-face inner eye corner point, and the like.
In the thermodynamic diagram corresponding to each face image, the pixel value of each pixel point represents the brightness of the pixel point: the larger the pixel value, the greater the brightness. The pixel value can also represent the likelihood that the pixel point is a target point, i.e. a key point: the larger the pixel value, the more likely the pixel point is the target point.
In one case, each face image may correspond to a plurality of thermodynamic diagrams, where each thermodynamic diagram corresponds to one piece of semantic information describing the attribute of the corresponding face key point in the target face image. For example, the thermodynamic diagrams corresponding to a face image may include one for the left-face outer eye corner point, one for the left-face inner eye corner point, one for the right-face outer eye corner point, one for the right-face inner eye corner point, and so on. Correspondingly, in the thermodynamic diagram corresponding to the left-face outer eye corner point, if the pixel value at pixel point (x0, y0) is the largest, the left outer eye corner point is most likely located at (x0, y0).
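The patent does not name the preset clustering algorithm. The sketch below is one assumed realization over a single per-semantic heat map: above-threshold pixels are grouped greedily, strongest first, and each group yields one candidate position. The threshold and minimum peak separation are illustrative parameters. A forcefully closed eye can produce two peaks in the outer-eye-corner map (the true corner and the brow tail), which is exactly the multi-candidate case handled in the later steps.

```python
import numpy as np

def heatmap_peaks(heatmap, threshold=0.5, min_separation=5):
    """Greedily cluster above-threshold heat-map pixels into candidate
    key point positions, returning a list of (x, y) peaks."""
    ys, xs = np.where(heatmap > threshold)
    order = np.argsort(-heatmap[ys, xs])     # strongest responses first
    peaks = []
    for i in order:
        x, y = int(xs[i]), int(ys[i])
        # Start a new cluster only if this pixel is far from every
        # already-accepted peak; otherwise it belongs to an existing one.
        if all((x - px) ** 2 + (y - py) ** 2 >= min_separation ** 2
               for px, py in peaks):
            peaks.append((x, y))
    return peaks
```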
S103: and determining the labeling position information of each face key point of the target face in each face image based on the image position information corresponding to the face key point in the target face in each face image and a preset key point labeling rule.
Wherein, presetting the key point marking rule comprises: the method comprises the following steps of labeling rules based on image position information corresponding to face key points in a target face image and/or labeling rules based on image position information corresponding to the face key points which are obtained by traversing in each face image and meet specified position conditions, wherein the target face image is as follows: and the face images meet the specified screening rules in the face images.
In this step, a preset key point labeling rule is stored in advance, locally on the electronic device or in a connected storage device. After the electronic device obtains the image position information corresponding to each face key point of the target face in each face image, it can, based on the preset key point labeling rule and that image position information, determine from the face images the face image meeting the specified screening rule, i.e. the face image with accurate position detection results, as the target face image, and determine the labeling position information of each face key point in each face image based on the image position information corresponding to each face key point in the target face image; and/or it can traverse each face image for the image position information corresponding to the face key points meeting the specified position condition, i.e. the image position information of the face key points with accurate position detection results, and determine from it the labeling position information of each face key point in each face image. In this way, labeling position information of accurately positioned face key points in the face images is obtained.
Subsequently, in an implementation manner, after determining the annotation position information of each face key point in each face image, the electronic device may perform a subsequent process based on the annotation position information of each face key point in each face image, for example: and performing a face recognition process, or performing a fatigue driving behavior detection process, or performing a personnel identity verification process, or sending the labeled position information of each face key point in each face image to other electronic equipment, so that the other electronic equipment executes the corresponding preset process.
By applying the embodiments of the invention, under the rule that labels based on image position information corresponding to face key points in a target face image, a target face image meeting the specified screening rule, i.e. one with accurate position detection results, is determined from the face images, and the image position information corresponding to the face key points in that target face image is used to label the position information of the face key points in every face image; and/or, under the rule that labels based on image position information, obtained by traversing each face image, corresponding to face key points meeting the specified position condition, labeling uses the image position information corresponding to the accurately detected face key points in each face image. Either way, labeled position information of accurately positioned face feature points is obtained, improving the accuracy of the position recognition result of face key points.
In another embodiment of the present invention, as shown in fig. 2, the method may include the steps of:
s201: and acquiring the face images acquired by a plurality of image acquisition devices in the same acquisition period.
The plurality of image acquisition devices shoot the target face from different angles.
S202: based on each face image, its corresponding thermodynamic diagram is determined.
S203: and aiming at each face image, determining image position information corresponding to each face key point in the target face and semantic information corresponding to the image position information from the thermodynamic diagram corresponding to the face image by using a preset clustering algorithm.
S204: and aiming at each face image, determining image position information corresponding to each face key point corresponding to the target semantic information from the face key points in the face image based on the semantic information corresponding to each face key point in the target face in the face image.
The target semantic information is semantic information corresponding to image position information corresponding to at least two face key points;
s205: aiming at each target semantic information corresponding to each face image, sequentially taking the image position information corresponding to the face key point corresponding to the target semantic information as the target image position information corresponding to the face key point corresponding to the target semantic information; and determining a reprojection error corresponding to the face image based on the target image position information corresponding to the face key point corresponding to each target semantic information in the face image, the image position information corresponding to the face key points corresponding to other semantic information and a preset three-dimensional face model.
Wherein, the other semantic information is: semantic information other than the target semantic information;
s206: and determining the labeling position information of each face characteristic point in each face image based on each reprojection error corresponding to the face image and the image position information corresponding to a group of face key points corresponding to each reprojection error corresponding to the face image.
In the embodiments of the present invention, it is considered that directly using the preset face key point detection algorithm to detect the face key points in a face image may, in some cases, yield inaccurate position information for the detected face feature points; for example, when the target face is in a state of forcefully opening and closing the eyes, errors are likely in the detected image position information of the eye corner points. To ensure the accuracy of the detected positions, the electronic device may first process each face image with a HeatMap algorithm to obtain the thermodynamic diagram corresponding to each face image, then use the preset clustering algorithm to determine, from each thermodynamic diagram, the image position information corresponding to each face key point of the target face and the semantic information corresponding to that image position information, and finally determine the labeling position information of each face key point in each face image based on the image position information and semantic information so determined.
In one aspect, the process of determining the labeled position information of each face key point in each face image based on the image position information corresponding to each face key point in the target face determined from the thermodynamic diagram and the semantic information corresponding to the image position information may be: for each face image, determining semantic information corresponding to at least two image position information from the semantic information as target semantic information based on semantic information corresponding to each face key point in the target face in the face image; and then, determining image position information corresponding to the face key points corresponding to the target semantic information from the face key points in the face image.
For each target semantic information corresponding to each face image, the image position information corresponding to the face key point of that target semantic information is used in turn as the target image position information of that face key point; and the reprojection error corresponding to the face image is determined based on the target image position information corresponding to the face key point of each target semantic information in the face image, the image position information corresponding to the face key points of the other semantic information, and a preset three-dimensional face model. The preset three-dimensional face model includes the three-dimensional position information of the spatial points corresponding to the face key points of each target semantic information and of the other semantic information, and can be determined based on a 3D morphable model (3DMM).
Specifically, the process of determining the reprojection error corresponding to the face image may be: determine, from the preset three-dimensional face model, the three-dimensional position information of the spatial point corresponding to each target semantic information and of the spatial points corresponding to the other semantic information, as the model three-dimensional position information; based on a preset position conversion relation, convert the spatial points identified by the model three-dimensional position information from the spatial rectangular coordinate system of the preset three-dimensional face model into the device coordinate system of the image acquisition device corresponding to the face image; project, using the preset projection formula of that image acquisition device, the spatial point corresponding to each target semantic information and the spatial points corresponding to the other semantic information into the face image, obtaining the projection position information of the projection point of the spatial point corresponding to the face key point of each target semantic information and of each other semantic information; then compute the distance between the projection position information and the target image position information for each target semantic information, and the distance between the projection position information and the image position information for each other semantic information, and take the sum of the computed distances as the reprojection error corresponding to the face image.
The image acquisition device corresponding to a face image is: the image acquisition device that captured that face image; the preset projection formula is determined by that image acquisition device. The preset position conversion relation is: the conversion relation between the spatial rectangular coordinate system in which the preset three-dimensional face model is located and the device coordinate system of the image acquisition device corresponding to the face image.
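A minimal sketch of this reprojection-error computation, assuming the preset position conversion relation is given as a rotation `R` and translation `t` and the preset projection formula as a pinhole intrinsic matrix `K`; these representations, and all names, are illustrative rather than fixed by the patent:

```python
import numpy as np

def reprojection_error(model_pts_3d, image_pts_2d, R, t, K):
    """Sum of pixel distances between projected model points and the
    detected (or candidate) image positions of the same key points.

    model_pts_3d: (N, 3) spatial points from the preset 3D face model.
    image_pts_2d: (N, 2) image position information in this face image.
    R, t: model coordinate system -> device coordinate system (assumed form).
    K:    3x3 intrinsics standing in for the preset projection formula.
    """
    cam = model_pts_3d @ R.T + t            # into the device coordinate system
    proj = cam @ K.T                        # apply the projection formula
    proj = proj[:, :2] / proj[:, 2:3]       # perspective divide -> pixel coords
    return float(np.linalg.norm(proj - image_pts_2d, axis=1).sum())
```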
In another case, the target semantic information may be set in advance according to actual conditions. In one implementation, the target face is in a state of forcefully opening and closing the eyes. In that state many wrinkles tend to appear around the eyes, and these wrinkles can cause the detected position of an eye corner point to be wrong; for example, the eyebrow tail point above the eye is detected as the outer eye corner point. To ensure the accuracy of the position information of the outer eye corner points among the determined face key points, the target semantic information can include the left outer eye corner point and/or the right outer eye corner point. As another example, the eyebrow head point above the eye may be detected as the inner eye corner point; to ensure the accuracy of the position information of the inner eye corner points, the target semantic information can include the left inner eye corner point and/or the right inner eye corner point.
Correspondingly, the process of determining the labeling position information of each face key point in each face image, based on the image position information of each face key point of the target face determined from the thermodynamic diagram and its corresponding semantic information, may be: directly determine, based on the semantic information corresponding to each face key point of the target face in the face image, the image position information corresponding to the face key points of the target semantic information from the face key points in the face image, for example determining from the face image the face key points whose semantic information characterizes them as eye corner points, together with their corresponding image position information.
Further, aiming at each target semantic information corresponding to each face image, sequentially taking the image position information corresponding to the face key point corresponding to the target semantic information as the target image position information corresponding to the face key point corresponding to the target semantic information; and determining a reprojection error corresponding to the face image based on the target image position information corresponding to the face key point corresponding to each target semantic information in the face image, the image position information corresponding to the face key points corresponding to other semantic information and a preset three-dimensional face model.
For example, when the target face is in a state of forcefully opening and closing the eyes, two pieces of image position information corresponding to the target semantic information "left outer eye corner point" and/or two pieces corresponding to "right outer eye corner point" may appear in the thermodynamic diagram corresponding to the face image, as shown in fig. 3. Correspondingly, in order to determine the accurately positioned image position information of the left outer eye corner point from the 2 candidates, the electronic device can take the 2 pieces of image position information corresponding to the left outer eye corner point in turn as its target image position information; and/or, in order to determine the accurately positioned image position information of the right outer eye corner point from its 2 candidates, take the 2 pieces of image position information corresponding to the right outer eye corner point in turn as its target image position information; and determine the reprojection error corresponding to the face image based on the target image position information of the left outer eye corner point, the target image position information of the right outer eye corner point, the image position information corresponding to the face key points of the other semantic information, and the preset three-dimensional face model.
It can be understood that the number of reprojection errors corresponding to a face image is related to the number of target semantic information in the face image and the number of pieces of image position information corresponding to each target semantic information. For example, if the number of target semantic information in the face image is 2 and the number of pieces of image position information corresponding to each target semantic information is 3, the number of reprojection errors corresponding to the face image is 2 × 3 = 6.
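A sketch of this traversal, consistent with the counting rule above (each candidate of each target semantic is tried in turn). The patent does not spell out how the remaining target semantics are held while one is varied; here each is varied independently against a base assignment, and `error_fn` is assumed to be a reprojection-error routine such as the earlier sketch.

```python
def traverse_candidates(candidates, base_assignment, error_fn):
    """Try each candidate position of each target semantic in turn and
    return the (error, semantic, position) triple with the smallest
    reprojection error.

    candidates:      {semantic: [xy, ...]} for semantics with >= 2 detections
    base_assignment: {semantic: xy} current position of every key point
    """
    errors = []
    for name, cand_list in candidates.items():
        for xy in cand_list:
            trial = dict(base_assignment)
            trial[name] = xy                      # substitute one candidate
            errors.append((error_fn(trial), name, xy))
    # 2 target semantics x 3 candidates each -> 2 * 3 = 6 evaluated errors.
    return min(errors, key=lambda e: e[0])
```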
Subsequently, after the electronic device determines the reprojection error corresponding to each face image, for each face image, based on each reprojection error corresponding to the face image and the image position information corresponding to a group of face key points corresponding to each reprojection error corresponding to the face image, the annotation position information of each face feature point in the face image is determined.
In another embodiment of the present invention, considering that the relationship between the three-dimensional position information corresponding to each human face feature point in the preset three-dimensional human face model conforms to the actual human face feature, the smaller the reprojection error corresponding to the human face image is, the higher the accuracy of the image position information representing the human face feature point corresponding to each semantic information detected from the human face image is. Correspondingly, the preset key point labeling rule may include: a rule for labeling based on image position information corresponding to the face key points which meet the specified position conditions and are obtained by traversing in each face image;
the S206 may include the following steps 011:
011: determining the image position information corresponding to the group of face key points corresponding to the smallest-valued reprojection error among the reprojection errors corresponding to the face image, as the labeling position information of each face feature point in the face image.
In the embodiments of the present invention, the face key points meeting the specified position condition may refer to: among all face key points in the face image, the group of face key points whose image position information yields the smallest of the multiple reprojection errors computed for the face image.
Considering that the relationships between the three-dimensional position information of the face feature points in the preset three-dimensional face model conform to real face characteristics, the smaller the determined reprojection error corresponding to a face image, the more accurate the image position information of the face feature points detected from the image for each semantic information; at the same time, the detection accuracy of the image position information of the face feature points differs between face images. In order to determine labeling position information of the face key points with higher accuracy in each face image, in another embodiment of the present invention, the preset key point labeling rule includes: a labeling rule based on image position information corresponding to face key points in a target face image, and a labeling rule based on image position information, obtained by traversing each face image, corresponding to face key points meeting the specified position condition;
the step S206 may include the following steps 021-025:
021: for each face image, determining the image position information corresponding to the group of face key points associated with the smallest reprojection error among the reprojection errors corresponding to the face image as the intermediate position information of the face key points in that face image.
022: displaying each face image together with the intermediate position information of its face key points, so that a user can select, from the displayed face images, at least two frames of target face images in which the face key point detection results are accurate.
023: determining the at least two frames of target face images selected by the user based on the user's operation.
024: determining the three-dimensional position information corresponding to each face key point based on the intermediate position information of the face key point in each target face image and the device information of the image acquisition device corresponding to each target face image (a triangulation sketch follows these steps).
025: determining the labeling position information of each face key point in each face image based on the three-dimensional position information corresponding to the face key point, the device information of the image acquisition device corresponding to each face image, and a preset projection formula.
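The embodiment does not spell out how step 024 recovers three-dimensional positions from the intermediate positions; one standard choice is linear (DLT) triangulation across the selected target face images. A sketch under that assumption, where each device's 3x4 projection matrix combines its intrinsic parameters and pose:

```python
import numpy as np

def triangulate_keypoint(image_points, projection_matrices):
    """Step 024 sketch: triangulate one face key point from its
    intermediate 2D position in each selected target face image.

    image_points:        list of (u, v) pixel positions, one per target image
    projection_matrices: list of 3x4 arrays P = K @ [R | t], one per device
    Returns the three-dimensional position as a length-3 array.
    """
    rows = []
    for (u, v), P in zip(image_points, projection_matrices):
        # Each view contributes two linear constraints on the 3D point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Homogeneous least squares: the solution is the right singular
    # vector of A associated with its smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```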
In the embodiment of the invention, for each face image, the electronic device selects the smallest of the reprojection errors corresponding to the face image and takes the image position information of the associated group of face key points as the intermediate position information of the face key points in that image. It then displays each face image with this intermediate position information so that the user can select at least two frames of target face images in which the detection results are accurate, and it determines those target face images from the user's selection operation.
For each piece of semantic information, the electronic device determines the three-dimensional position information of the corresponding face key point from the key point's intermediate position information in each target face image and the device information of the corresponding image acquisition devices. Once the three-dimensional position information of every face key point has been determined, the spatial point it represents is projected into each face image using the device information of the corresponding image acquisition device and the preset projection formula, which yields the labeling position information of each face key point in each face image.
The device information of the image acquisition device corresponding to a face image may include the device pose information and the device intrinsic parameter information of the device at the time it acquired the face image. The intrinsic parameters include, but are not limited to: the physical length of a pixel along the horizontal and vertical axes of the imaging plane, the focal length, the position of the principal point (the intersection of the device's optical axis with the image plane), and the scale factor. The device pose information is the position and orientation of the device when the face image was acquired.
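Under the usual pinhole model, the "preset projection formula" of step 025 is a rigid transform into the device frame followed by multiplication with the intrinsic matrix; a minimal sketch under that assumption, with all names illustrative:

```python
import numpy as np

def project_point(X_world, R, t, K):
    """Step 025 sketch: project a triangulated face key point into one
    face image using the device pose and intrinsic parameters.

    X_world: length-3 array, 3D position of the key point
    R, t:    device pose, world-to-camera rotation (3x3) and translation (3,)
    K:       3x3 intrinsic matrix built from the focal length, pixel
             sizes and principal point described above
    """
    X_cam = R @ X_world + t   # into the device coordinate frame
    uvw = K @ X_cam           # apply the pinhole projection
    return uvw[:2] / uvw[2]   # labeling position (u, v) in pixels
```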
In the embodiment of the invention, the face key points whose image position information is more accurate are determined both by screening the image position information detected in each face image and by having a person visually inspect the key point positions detected across the multiple frames; carrying out the subsequent position labeling on this basis further improves the accuracy of the labeled position information of the face key points in each face image.
Since, among the multi-frame face images, position detection of the face key points is more accurate in some frames and less accurate in others, the frames in which the detection is more accurate can be identified manually as target face images, and the labeling position information of the face key points in every face image can then be determined from the image position information in those target face images, improving the accuracy of the position recognition results. Correspondingly, in another embodiment of the present invention, the preset key point labeling rule includes: a rule for labeling based on the image position information corresponding to the face key points in the target face image; as shown in fig. 4, the method may include the following steps:
S401: acquiring the face images acquired by a plurality of image acquisition devices in the same acquisition period.
The plurality of image acquisition devices shoot the target face from different angles.
S402: detecting, from each face image, the image position information corresponding to each face key point in the target face.
S403: displaying each face image together with the image position information corresponding to each face key point in the target face, so that the user can select, from the displayed face images, at least two frames of target face images in which the face key point detection results are accurate.
S404: determining the at least two frames of target face images selected by the user based on the user's operation.
S405: determining the three-dimensional position information corresponding to each face key point based on the image position information corresponding to the face key point in each target face image and the device information of the image acquisition device corresponding to each target face image.
S406: determining the labeling position information of each face key point in each face image based on the three-dimensional position information corresponding to the face key point, the device information of the image acquisition device corresponding to each face image, and the preset projection formula.
In the embodiment of the invention, after detecting the image position information corresponding to each face key point in the target face from each face image, the electronic device directly displays each face image with that image position information; the user then selects, from the displayed images, at least two frames of target face images in which the detection results are accurate and performs the selection operation. Based on this operation, the electronic device determines the at least two frames of target face images, namely the face images the user judges to contain the more accurate image position information for the face key points.
Subsequently, for each piece of semantic information, the electronic device determines the three-dimensional position information of the corresponding face key point from the key point's image position information in each target face image and the device information of the corresponding image acquisition devices. Once the three-dimensional position information of every face key point has been determined, the spatial point it represents is projected into each face image using the device information of the corresponding image acquisition device and the preset projection formula, which yields the labeling position information of each face key point in each face image. An end-to-end sketch of S405 and S406 follows.
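The sketch below condenses S405 and S406 for a single face key point into one self-contained routine (triangulate from the user-selected target images, then reproject into every face image); it repeats the triangulation and projection steps sketched earlier, and all inputs and names are placeholders:

```python
import numpy as np

def label_keypoint_in_all_images(points_2d, target_Ps, all_K, all_R, all_t):
    """S405-S406 sketch for one face key point.

    points_2d: (u, v) detections of the key point in each target face image
    target_Ps: 3x4 projection matrices of the target images' devices
    all_K/all_R/all_t: intrinsics and poses of every device, target or not
    Returns the 3D position and its labeling position in every face image.
    """
    # S405: linear triangulation from the selected target images.
    rows = []
    for (u, v), P in zip(points_2d, target_Ps):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1][:3] / vt[-1][3]
    # S406: reproject the spatial point into every face image,
    # not only the user-selected target ones.
    labels = []
    for K, R, t in zip(all_K, all_R, all_t):
        uvw = K @ (R @ X + t)
        labels.append(uvw[:2] / uvw[2])
    return X, labels
```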
Corresponding to the above method embodiment, an embodiment of the present invention provides a device for labeling key points of a human face, and as shown in fig. 5, the device includes:
an obtaining module 510 configured to obtain face images acquired by a plurality of image acquisition devices in the same acquisition period, wherein the plurality of image acquisition devices shoot a target face from different angles;
a detection module 520 configured to detect image position information corresponding to each face key point in the target face from each face image;
a determining module 530, configured to determine the labeling position information of each face key point of the target face in each face image based on the image position information corresponding to the face key points in the target face in each face image and a preset key point labeling rule, where the preset key point labeling rule includes: a rule for labeling based on the image position information corresponding to the face key points in a target face image, and/or a rule for labeling based on the image position information, obtained by traversing each face image, corresponding to the face key points that satisfy a specified position condition, the target face image being a face image, among the face images, that meets a specified screening rule.
By applying the embodiment of the invention, under the rule of labeling based on the image position information corresponding to the face key points in the target face image, a target face image that meets the specified screening rule, i.e. whose position detection results are accurate, is determined from the face images, and the position information of the face key points in every face image is then labeled using the image position information from that target face image; and/or, under the rule of labeling based on the image position information, obtained by traversing each face image, corresponding to the face key points that satisfy the specified position condition, the labeling uses the image position information of the face key points whose detection results in each face image are accurate. Either way, labeled position information with accurate positions is obtained, improving the accuracy of the position recognition results of the face key points.
In another embodiment of the present invention, the detecting module 520 includes:
a first determining unit (not shown in the figures), configured to determine a corresponding heatmap based on each face image;
a second determining unit (not shown in the figures), configured to determine, for each face image, the image position information corresponding to each face key point in the target face, and the semantic information corresponding to that image position information, from the heatmap corresponding to the face image by using a preset clustering algorithm;
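The "preset clustering algorithm" is left open by the embodiment; below is a minimal sketch that thresholds a heatmap, groups the responses with DBSCAN, and takes each cluster's peak as a key point position. scikit-learn, the threshold, and the DBSCAN parameters are assumptions, and the assignment of semantic information to clusters is omitted:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def keypoints_from_heatmap(heatmap, threshold=0.5):
    """Second-determining-unit sketch: extract key point image
    positions from one face image's heatmap by clustering.

    heatmap: (H, W) array of detector responses, assumed in [0, 1]
    Returns a list of (x, y) positions, one per response cluster.
    """
    ys, xs = np.nonzero(heatmap > threshold)
    if len(xs) == 0:
        return []
    pts = np.stack([xs, ys], axis=1)
    labels = DBSCAN(eps=3, min_samples=2).fit_predict(pts)
    keypoints = []
    for lbl in set(labels) - {-1}:  # -1 marks DBSCAN noise
        members = pts[labels == lbl]
        scores = heatmap[members[:, 1], members[:, 0]]
        keypoints.append(tuple(members[np.argmax(scores)]))  # cluster peak
    return keypoints
```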
the determining module 530 includes:
a third determining unit (not shown in the figures), configured to determine, for each face image, the image position information corresponding to the face key points associated with target semantic information from among the face key points in the face image, based on the semantic information corresponding to each face key point of the target face in that image, where the target semantic information is semantic information to which the image position information of at least two face key points corresponds;
a fourth determining unit (not shown in the figures), configured to, for each piece of target semantic information corresponding to each face image, take in turn each candidate image position for the face key point of that target semantic information as the target image position information, and to determine a reprojection error corresponding to the face image based on the target image position information, the image position information of the face key points corresponding to the other semantic information (i.e. the semantic information other than the target semantic information), and the preset three-dimensional face model (a sketch of this computation follows this list);
a fifth determining unit (not shown in the figures), configured to determine, for each face image, the labeling position information of the face key points in the face image based on the reprojection errors corresponding to the face image and the image position information of the group of face key points associated with each such reprojection error.
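The reprojection error used by the fourth determining unit could be realized by fitting the preset three-dimensional face model to a candidate set of 2D key points and measuring how far the model points reproject from the detections. The sketch below stands in OpenCV's PnP solver for the unspecified fitting step; that choice, and all names, are assumptions. The candidate yielding the smallest such error is the one the fifth determining unit retains.

```python
import cv2
import numpy as np

def reprojection_error(points_2d, model_points_3d, K):
    """Fourth-determining-unit sketch: one candidate's reprojection error.

    points_2d:       (N, 2) image positions, one per semantic label, with
                     the current target-semantic candidate substituted in
    model_points_3d: (N, 3) corresponding points of the preset 3D face model
    K:               3x3 intrinsic matrix of the device
    """
    K = np.asarray(K, dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(model_points_3d.astype(np.float32),
                                  points_2d.astype(np.float32), K, None)
    if not ok:
        return np.inf
    projected, _ = cv2.projectPoints(model_points_3d.astype(np.float32),
                                     rvec, tvec, K, None)
    residuals = projected.reshape(-1, 2) - points_2d
    return float(np.linalg.norm(residuals, axis=1).mean())
```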
In another embodiment of the present invention, the preset key point labeling rule includes: a rule for labeling based on the image position information, obtained by traversing each face image, corresponding to the face key points that satisfy the specified position condition;
the fifth determining unit is specifically configured to determine, for each face image, the image position information corresponding to the group of face key points associated with the smallest reprojection error among the reprojection errors corresponding to the face image as the labeling position information of the face key points in that face image.
In another embodiment of the present invention, the preset key point labeling rule includes: a rule for labeling based on the image position information corresponding to the face key points in the target face image, together with a rule for labeling based on the image position information, obtained by traversing each face image, corresponding to the face key points that satisfy the specified position condition;
the fifth determining unit is specifically configured to determine, for each face image, the image position information corresponding to the group of face key points associated with the smallest reprojection error among the reprojection errors corresponding to the face image as the intermediate position information of the face key points in that face image;
display each face image together with the intermediate position information of its face key points, so that a user can select, from the displayed face images, at least two frames of target face images in which the face key point detection results are accurate;
determine the at least two frames of target face images selected by the user based on the user's operation;
determine the three-dimensional position information corresponding to each face key point based on the intermediate position information of the face key point in each target face image and the device information of the image acquisition device corresponding to each target face image;
and determine the labeling position information of each face key point in each face image based on the three-dimensional position information corresponding to the face key point, the device information of the image acquisition device corresponding to each face image, and the preset projection formula.
In another embodiment of the present invention, the target semantic information includes a left eye corner point and/or a right eye corner point.
In another embodiment of the present invention, the preset key point labeling rule includes: a rule for labeling based on the image position information corresponding to the face key points in the target face image;
the determining module 530 is specifically configured to display each face image together with the image position information corresponding to each face key point in the target face, so that the user can select, from the displayed face images, at least two frames of target face images in which the face key point detection results are accurate;
determine the at least two frames of target face images selected by the user based on the user's operation;
determine the three-dimensional position information corresponding to each face key point based on the image position information corresponding to the face key point in each target face image and the device information of the image acquisition device corresponding to each target face image;
and determine the labeling position information of each face key point in each face image based on the three-dimensional position information corresponding to the face key point, the device information of the image acquisition device corresponding to each face image, and the preset projection formula.
The device embodiments correspond to the method embodiments and achieve the same technical effects; for details, refer to the description of the method embodiments, which is not repeated here. Those of ordinary skill in the art will understand that the figures are merely schematic representations of one embodiment, and that the blocks or flows shown in them are not necessarily required to practice the present invention.
Those of ordinary skill in the art will also understand that the modules of the devices in the embodiments may be distributed across the devices as described, or may, with corresponding changes, be located in one or more devices different from those of the embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for labeling key points of a human face is characterized by comprising the following steps:
acquiring face images acquired by a plurality of image acquisition devices in the same acquisition period, wherein the plurality of image acquisition devices shoot a target face from different angles;
detecting, from each face image, image position information corresponding to each face key point in the target face;
determining labeling position information of each face key point of the target face in each face image based on the image position information corresponding to the face key points in the target face in each face image and a preset key point labeling rule, wherein the preset key point labeling rule comprises: a rule for labeling based on the image position information corresponding to the face key points in a target face image, and/or a rule for labeling based on the image position information, obtained by traversing each face image, corresponding to the face key points that satisfy a specified position condition, the target face image being a face image, among the face images, that meets a specified screening rule.
2. The method according to claim 1, wherein the step of detecting image position information corresponding to each face key point in the target face from each face image comprises:
determining a heatmap corresponding to each face image based on the face image;
for each face image, determining the image position information corresponding to each face key point in the target face, and the semantic information corresponding to that image position information, from the heatmap corresponding to the face image by using a preset clustering algorithm;
the step of determining labeling position information of each face key point of the target face in each face image based on image position information corresponding to the face key point in the target face in each face image and a preset key point labeling rule comprises the following steps:
for each face image, determining the image position information corresponding to the face key points associated with target semantic information from among the face key points in the face image, based on the semantic information corresponding to each face key point of the target face in that image, wherein the target semantic information is semantic information to which the image position information of at least two face key points corresponds;
for each piece of target semantic information corresponding to each face image, taking in turn each candidate image position for the face key point of that target semantic information as the target image position information, and determining a reprojection error corresponding to the face image based on the target image position information, the image position information of the face key points corresponding to the other semantic information (i.e. the semantic information other than the target semantic information), and a preset three-dimensional face model;
and determining the labeling position information of the face key points in each face image based on the reprojection errors corresponding to the face image and the image position information of the group of face key points associated with each such reprojection error.
3. The method of claim 2, wherein the preset key point labeling rule comprises: a rule for labeling based on the image position information, obtained by traversing each face image, corresponding to the face key points that satisfy the specified position condition;
the step of determining, for each face image, the labeling position information of the face key points in the face image based on the reprojection errors corresponding to the face image and the image position information of the associated groups of face key points comprises:
for each face image, determining the image position information corresponding to the group of face key points associated with the smallest reprojection error among the reprojection errors corresponding to the face image as the labeling position information of the face key points in that face image.
4. The method of claim 2, wherein the preset key point labeling rule comprises: a rule for labeling based on the image position information corresponding to the face key points in the target face image, together with a rule for labeling based on the image position information, obtained by traversing each face image, corresponding to the face key points that satisfy the specified position condition;
the step of determining, for each face image, the labeling position information of the face key points in the face image based on the reprojection errors corresponding to the face image and the image position information of the associated groups of face key points comprises:
for each face image, determining the image position information corresponding to the group of face key points associated with the smallest reprojection error among the reprojection errors corresponding to the face image as the intermediate position information of the face key points in that face image;
displaying each face image together with the intermediate position information of its face key points, so that a user can select, from the displayed face images, at least two frames of target face images in which the face key point detection results are accurate;
determining the at least two frames of target face images selected by the user based on the user's operation;
determining the three-dimensional position information corresponding to each face key point based on the intermediate position information of the face key point in each target face image and the device information of the image acquisition device corresponding to each target face image;
and determining the labeling position information of each face key point in each face image based on the three-dimensional position information corresponding to the face key point, the device information of the image acquisition device corresponding to each face image, and the preset projection formula.
5. The method according to any one of claims 2 to 4, wherein the target semantic information includes a left eye corner point and/or a right eye corner point.
6. The method of claim 1, wherein the preset key point labeling rule comprises: a rule for labeling based on the image position information corresponding to the face key points in the target face image;
the step of determining the labeling position information of each face key point of the target face in each face image based on the image position information corresponding to the face key points in the target face in each face image and the preset key point labeling rule comprises:
displaying each face image together with the image position information corresponding to each face key point in the target face, so that a user can select, from the displayed face images, at least two frames of target face images in which the face key point detection results are accurate;
determining the at least two frames of target face images selected by the user based on the user's operation;
determining the three-dimensional position information corresponding to each face key point based on the image position information corresponding to the face key point in each target face image and the device information of the image acquisition device corresponding to each target face image;
and determining the labeling position information of each face key point in each face image based on the three-dimensional position information corresponding to the face key point, the device information of the image acquisition device corresponding to each face image, and a preset projection formula.
7. A labeling device for key points of a human face is characterized by comprising:
an obtaining module, configured to obtain face images acquired by a plurality of image acquisition devices in the same acquisition period, wherein the plurality of image acquisition devices shoot a target face from different angles;
a detection module, configured to detect, from each face image, image position information corresponding to each face key point in the target face;
a determining module, configured to determine labeling position information of each face key point of the target face in each face image based on the image position information corresponding to the face key points in the target face in each face image and a preset key point labeling rule, wherein the preset key point labeling rule comprises: a rule for labeling based on the image position information corresponding to the face key points in a target face image, and/or a rule for labeling based on the image position information, obtained by traversing each face image, corresponding to the face key points that satisfy a specified position condition, the target face image being a face image, among the face images, that meets a specified screening rule.
8. The apparatus of claim 7, wherein the detection module comprises:
the first determining unit is configured to determine a corresponding heatmap based on each face image;
the second determining unit is configured to determine, for each face image, the image position information corresponding to each face key point in the target face, and the semantic information corresponding to that image position information, from the heatmap corresponding to the face image by using a preset clustering algorithm;
the determining module includes:
a third determining unit, configured to determine, for each face image, the image position information corresponding to the face key points associated with target semantic information from among the face key points in the face image, based on the semantic information corresponding to each face key point of the target face in that image, wherein the target semantic information is semantic information to which the image position information of at least two face key points corresponds;
a fourth determining unit, configured to, for each piece of target semantic information corresponding to each face image, take in turn each candidate image position for the face key point of that target semantic information as the target image position information, and to determine a reprojection error corresponding to the face image based on the target image position information, the image position information of the face key points corresponding to the other semantic information (i.e. the semantic information other than the target semantic information), and a preset three-dimensional face model;
and a fifth determining unit, configured to determine, for each face image, the labeling position information of the face key points in the face image based on the reprojection errors corresponding to the face image and the image position information of the group of face key points associated with each such reprojection error.
9. The apparatus of claim 8, wherein the preset key point labeling rule comprises: a rule for labeling based on the image position information, obtained by traversing each face image, corresponding to the face key points that satisfy the specified position condition;
the fifth determining unit is specifically configured to determine, for each face image, the image position information corresponding to the group of face key points associated with the smallest reprojection error among the reprojection errors corresponding to the face image as the labeling position information of the face key points in that face image.
10. The apparatus of claim 8, wherein the preset key point labeling rule comprises: a rule for labeling based on the image position information corresponding to the face key points in the target face image, together with a rule for labeling based on the image position information, obtained by traversing each face image, corresponding to the face key points that satisfy the specified position condition;
the fifth determining unit is specifically configured to determine, for each face image, the image position information corresponding to the group of face key points associated with the smallest reprojection error among the reprojection errors corresponding to the face image as the intermediate position information of the face key points in that face image;
display each face image together with the intermediate position information of its face key points, so that a user can select, from the displayed face images, at least two frames of target face images in which the face key point detection results are accurate;
determine the at least two frames of target face images selected by the user based on the user's operation;
determine the three-dimensional position information corresponding to each face key point based on the intermediate position information of the face key point in each target face image and the device information of the image acquisition device corresponding to each target face image;
and determine the labeling position information of each face key point in each face image based on the three-dimensional position information corresponding to the face key point, the device information of the image acquisition device corresponding to each face image, and the preset projection formula.
CN202010350817.7A 2020-04-28 2020-04-28 Method and device for labeling key points of human face Pending CN113569594A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010350817.7A CN113569594A (en) 2020-04-28 2020-04-28 Method and device for labeling key points of human face

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010350817.7A CN113569594A (en) 2020-04-28 2020-04-28 Method and device for labeling key points of human face

Publications (1)

Publication Number Publication Date
CN113569594A true CN113569594A (en) 2021-10-29

Family

ID=78158139

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010350817.7A Pending CN113569594A (en) 2020-04-28 2020-04-28 Method and device for labeling key points of human face

Country Status (1)

Country Link
CN (1) CN113569594A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114677734A (en) * 2022-03-25 2022-06-28 马上消费金融股份有限公司 Key point labeling method and device
CN114677734B (en) * 2022-03-25 2024-02-02 马上消费金融股份有限公司 Key point marking method and device
CN117173765A (en) * 2023-09-06 2023-12-05 广东工业大学 Large-scale mask face data set labeling method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211126

Address after: 215100 floor 23, Tiancheng Times Business Plaza, No. 58, qinglonggang Road, high speed rail new town, Xiangcheng District, Suzhou, Jiangsu Province

Applicant after: MOMENTA (SUZHOU) TECHNOLOGY Co.,Ltd.

Address before: Room 601-a32, Tiancheng information building, No. 88, South Tiancheng Road, high speed rail new town, Xiangcheng District, Suzhou City, Jiangsu Province

Applicant before: MOMENTA (SUZHOU) TECHNOLOGY Co.,Ltd.