CN110826421B - Method and device for filtering faces with difficult poses - Google Patents

Method and device for filtering faces with difficult poses

Info

Publication number
CN110826421B
CN110826421B
Authority
CN
China
Prior art keywords
face
distance
key
key point
key points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910991099.9A
Other languages
Chinese (zh)
Other versions
CN110826421A (en)
Inventor
邓卉
田泽康
危明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ysten Technology Co ltd
Original Assignee
Ysten Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ysten Technology Co ltd filed Critical Ysten Technology Co ltd
Priority to CN201910991099.9A
Publication of CN110826421A
Application granted
Publication of CN110826421B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a method for filtering difficult-pose faces, which comprises the following steps: face detection, obtaining rectangular face positions; performing face key point detection on the rectangular face position with a first face key point positioning technique and a second face key point positioning technique to obtain a first key point position set and a second key point position set; obtaining a similarity transformation matrix that maps the first key point position set onto an average-face key point position set; transforming the first and second key point position sets onto the average face according to the similarity transformation matrix to obtain first and second transformed key point position sets; and calculating the pairwise distances among the first and second transformed key point position sets and the average-face key point position set. When all of these distances are smaller than their thresholds, the image containing the face is a candidate easy-pose face; otherwise it is a difficult-pose face. By screening out difficult-pose faces, the accuracy of face recognition is ensured. A corresponding apparatus, device and medium are also provided.

Description

Method and device for filtering faces with difficult poses
Technical Field
The application belongs to the technical field of image processing, and particularly relates to a method and a device for filtering difficult-pose faces, a computer readable medium and an electronic device.
Background
The human face is an important biometric feature: it has a complex structure and highly variable details, and it carries a large amount of information. Face recognition and analysis are now widely applied, and technologies such as automatic face recognition, facial expression analysis and three-dimensional face reconstruction are developing vigorously.
However, when a face presents a difficult pose such as an extreme side view, the accuracy of face recognition and analysis decreases. In particular, on platforms with strict real-time requirements and weak computing power (such as mobile terminals), very complex networks cannot be used for face recognition and analysis, so recognition accuracy drops sharply when difficult face poses occur.
A difficult-pose face sample (an example is shown in fig. 2) is a face sample in which the face is rotated at such a large angle that very little facial information is presented to the camera, making face recognition and analysis very difficult.
The current method of filtering difficult-pose faces is to locate facial key points, compute the face pose from those key points, and judge whether the pose is acceptable. However, for an extreme side face or an excessively large pose angle, the facial key point localization itself is erroneous, so the estimated pose is also wrong.
Disclosure of Invention
To remedy these defects of the prior art, the application provides a method that cross-verifies with two techniques to recognize difficult-pose faces and eliminate them, guaranteeing the accuracy of key point localization and thereby the accuracy of face recognition and analysis.
Specifically, the application provides a method for filtering difficult-pose faces, comprising the following steps:
S110, acquiring an image containing a human face;
S120, performing face detection on the image containing the face to obtain a rectangular face position;
S130, performing face key point detection on the rectangular face position by using a first face key point positioning technique to obtain a first key point position set;
S140, performing face key point detection on the rectangular face position by using a second face key point positioning technique to obtain a second key point position set;
S150, acquiring an average face and a corresponding average-face key point position set, and obtaining, from the average-face key point position set and the first key point position set, a similarity transformation matrix that maps the first key point position set onto the average-face key point position set;
S160, transforming the first key point position set and the second key point position set onto the average face according to the similarity transformation matrix to obtain a first transformed key point position set and a second transformed key point position set;
S170, calculating the pairwise distances among the first transformed key point position set, the second transformed key point position set and the average-face key point position set: a first distance between the first and second transformed key point position sets, a second distance between the first transformed key point position set and the average-face key point position set, and a third distance between the second transformed key point position set and the average-face key point position set; when the first, second and third distances are all smaller than their respective thresholds, judging that the image containing the face is a candidate easy-pose face, and otherwise a difficult-pose face. A skeleton of this flow is sketched below.
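To make the flow of S110 to S170 concrete, here is a minimal sketch in Python. The callables detect_faces, localizer_a, localizer_b, estimate_similarity, apply_similarity and distance_fn are illustrative assumptions supplied by the caller, not part of the claimed method; the patent does not tie the method to any particular detector or localizer, and sketches of the similarity-transform helpers and the distance function appear later in the description.

```python
def screen_faces(image, detect_faces, localizer_a, localizer_b,
                 estimate_similarity, apply_similarity, distance_fn,
                 avg_shape, th1, th2, th3):
    """Sketch of S110-S170; returns (rect, is_candidate_easy_pose) pairs.

    All callables and thresholds are assumptions supplied by the caller;
    avg_shape is the (n, 2) array of average-face key point coordinates.
    """
    results = []
    for rect in detect_faces(image):                     # S120: face rectangles
        shape1 = localizer_a(image, rect)                # S130: first key point set
        shape2 = localizer_b(image, rect)                # S140: second key point set
        trans = estimate_similarity(shape1, avg_shape)   # S150: similarity matrix
        t1 = apply_similarity(trans, shape1)             # S160: transformed sets
        t2 = apply_similarity(trans, shape2)
        dist12 = distance_fn(t1, t2)                     # S170: pairwise distances
        dist1a = distance_fn(t1, avg_shape)
        dist2a = distance_fn(t2, avg_shape)
        easy = dist12 < th1 and dist1a < th2 and dist2a < th3
        results.append((rect, easy))
    return results
```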
Further, the method also comprises the following step:
and S180, when the image containing the face is a candidate easy-pose face, judging whether the corresponding first and second key point position sets, projected onto the two-dimensional plane, fall within the face region of the image containing the face; when they do not, judging that the candidate easy-pose face is a difficult-pose face.
Further, the step of calculating the pairwise distances among the first transformed key point position set, the second transformed key point position set and the average-face key point position set comprises:
S171, finding the pair of points with the minimum Euclidean distance between the first set of key points and the second set of key points, and obtaining the distance vector and the distance of this minimum pair;
S172, removing the minimum pair from the first set of key points and the second set of key points;
S173, translating the first set of key points and the second set of key points, after removal of the minimum pair, by the distance vector to obtain a transformed first set of key points and a transformed second set of key points;
and S174, taking the transformed first and second sets of key points as the first and second sets of key points of step S171, and repeating steps S171 to S173 until all key points in the first and second sets have been removed; the distances of the minimum pairs obtained in each iteration are summed to give the distance between the first set of key points and the second set of key points.
Further, the distance in step S170 includes one or more of the Hamming distance, the Euclidean distance, and the Mahalanobis distance.
Further, step S180 specifically comprises determining whether the nose points in the first key point position set and the second key point position set, projected onto the two-dimensional plane, fall within the face region of the image containing the face, and if not, determining that the candidate easy-pose face is a difficult-pose face.
Further, the step of determining whether the nose point projected onto the two-dimensional plane is within the face region of the image comprises:
judging whether the nose tip lies within the rectangle formed by the left eye center, right eye center, left mouth corner and right mouth corner; if so, the nose tip projected onto the two-dimensional plane is within the face region of the image containing the face, and otherwise it is not.
Further, the first face key point positioning technique and the second face key point positioning technique are each based on one or more of a convolutional neural network (CNN) algorithm, the supervised descent method (SDM), the active shape model (ASM) algorithm, and cascaded regression, and the two techniques are different from each other.
In another aspect of the present application, there is provided a device for filtering difficult-pose faces, comprising:
the receiving module is used for acquiring an image containing a human face;
the face rectangle detection module is used for carrying out face detection on the image containing the face to obtain a face rectangle position;
the first key point extraction module is used for carrying out face key point detection on the rectangular face position by using a first face key point positioning technology to obtain a first key point position set;
the second key point extraction module is used for performing face key point detection on the rectangular face position by using a second face key point positioning technique to obtain a second key point position set;
the similarity transformation matrix calculation module is used for acquiring an average face and a corresponding average-face key point position set, and obtaining, from the average-face key point position set and the first key point position set, a similarity transformation matrix that maps the first key point position set onto the average-face key point position set;
the key point transformation module is used for transforming the first key point position set and the second key point position set onto the average face according to the similarity transformation matrix to obtain a first transformed key point position set and a second transformed key point position set;
the screening module is used for calculating the pairwise distances among the first transformed key point position set, the second transformed key point position set and the average-face key point position set, obtaining a first distance between the first and second transformed key point position sets, a second distance between the first transformed key point position set and the average-face key point position set, and a third distance between the second transformed key point position set and the average-face key point position set, and for judging that the image containing the face is a candidate easy-pose face when the first, second and third distances are all smaller than their respective thresholds, and otherwise a difficult-pose face.
In a third aspect of the present application, there is provided an electronic apparatus comprising:
one or more processors;
a storage device having one or more programs stored thereon,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods of any of the above.
In a fourth aspect of the application, a computer readable medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements any of the methods described above.
The method for filtering difficult-pose face samples provided by the embodiments of the application performs cross verification with two positioning methods and has the following beneficial effects:
1. Difficult-pose faces are screened out, reducing the error rate of face recognition and analysis.
2. The method is applicable to various mobile terminals, such as smartphones and tablet computers.
Drawings
The features and advantages of the present application will be more clearly understood by reference to the accompanying drawings, which are illustrative and should not be construed as limiting the application in any way, in which:
FIG. 1 is a schematic diagram of the system architecture in which the method and device for filtering difficult-pose faces operate in some examples of the present application;
FIG. 2 is a schematic illustration of a difficult-pose face in some examples of the application;
FIG. 3 is a flow chart of a method for filtering difficult-pose faces in some embodiments of the application;
FIG. 4 is a flowchart of a method for filtering difficult-pose faces according to other embodiments of the present application;
FIG. 5 is a schematic flow chart of the distance calculation process in a method for filtering difficult-pose faces according to other embodiments of the present application;
FIG. 6 is a schematic diagram of a device for filtering difficult-pose faces based on the method illustrated in the above figures, according to some embodiments of the present application;
FIG. 7 is a schematic diagram of a computer system for running the method or device for filtering difficult-pose faces according to some embodiments of the present application;
FIG. 8 is a schematic diagram of the face rectangle extraction result of a method for filtering difficult-pose faces according to some embodiments of the present application;
FIG. 9 is a schematic diagram of the key point extraction result of a method for filtering difficult-pose faces according to some embodiments of the present application.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, however, the present application may be practiced in other ways than those described herein, and therefore the scope of the present application is not limited to the specific embodiments disclosed below.
FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the method or device for filtering difficult-pose faces of the present application may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or transmit data (e.g., video) or the like. Various communication client applications, such as video playing software, video processing class applications, web browser applications, shopping class applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices having a display screen and supporting data transmission, including but not limited to smartphones, tablet computers, laptop and desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above, implemented either as multiple software programs or software modules (e.g., for providing distributed services) or as a single software program or module. The present application is not particularly limited herein.
The server 105 may be a server providing various services, such as a background server providing support for videos displayed on the terminal devices 101, 102, 103. The background server may analyze and process the received data, such as an image processing request, and feed back a processing result (for example, a video clip or other data obtained by dividing a video) to an electronic device (for example, a terminal device) communicatively connected to the background server.
It should be noted that, the method for filtering the difficult-pose face provided by the embodiment of the present application may be executed by the server 105, and accordingly, the device for filtering the difficult-pose face may be disposed in the server 105. In addition, the method for filtering the difficult-pose face provided by the embodiment of the application can also be executed by the terminal equipment 101, 102 and 103, and correspondingly, the device for filtering the difficult-pose face can also be arranged in the terminal equipment 101, 102 and 103.
The server may be hardware or software. When the server is hardware, the server may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules (e.g., software or software modules for providing distributed services), or as a single software or software module. The present application is not particularly limited herein.
It should be understood that the numbers of terminal devices, networks and servers in fig. 1 are merely illustrative; there may be any number of each, as required by the implementation. When the electronic device on which the method for filtering difficult-pose faces runs does not require data transmission with other electronic devices, the system architecture may include only that electronic device (e.g., terminal devices 101, 102, 103 or server 105).
Fig. 3 shows the general flow of an algorithm for filtering difficult-pose faces according to an embodiment of the present application, which comprises the following steps:
1. an image containing a face is read.
2. Face detection is performed on the image using a face detection technique, and the detected rectangular face positions are output; an example result is shown in fig. 8. Each face frame is processed independently in the subsequent steps to judge whether it is a difficult-pose face. If no face is detected, no further processing is performed.
3. Facial key point detection is performed on each detected face using the first face key point positioning technique, obtaining the key point positions shape, where shape is the set of coordinates of a series of predefined facial key points in the face image.
4. Facial key point detection is performed on the detected face using the second face key point positioning technique, obtaining the positions of a plurality of key points on the face.
The key point detection results are shown in fig. 9, which marks the individual key points.
5. Given the known average face and its key point set avgShape, the similarity transformation matrix Trans that maps the key points obtained by the first face key point positioning technique onto the key points of the average face is calculated.
6. According to the similarity transformation matrix Trans from the face located by the first positioning method to the average face, the facial key points obtained by the first and second face key point positioning techniques are transformed onto the average face, yielding shape1 and shape2 respectively.
7. The pairwise distances among shape1, shape2 and the average-face key point set avgShape are calculated. Denote the distance between shape1 and shape2 as dist12, the distance between shape1 and avgShape as dist1a, and the distance between shape2 and avgShape as dist2a. If all three distances are small (specifically, dist12 is less than a threshold th1, dist1a is less than a threshold th2, and dist2a is less than a threshold th3), the face is a candidate easy-pose face; otherwise it is a difficult-pose face.
The distance between the two sets of facial key points is calculated using a custom shape edit distance. The specific steps are as follows, with a runnable sketch after them:
We define the two sets of facial key points as shape1 = {pt_1, pt_2, ..., pt_n} and shape2 = {pt'_1, pt'_2, ..., pt'_n}, where pt_i is the pixel coordinate of a predefined facial key point.
1. Find the pair of points (pt_i, pt'_j) with the minimum Euclidean distance between the first key point set shape1 and the second key point set shape2, and compute the displacement vector (dx, dy) that moves pt_i onto pt'_j, together with their Euclidean distance dist_i.
2. Remove pt_i and pt'_j from shape1 and shape2 so that they do not participate in the subsequent distance calculation.
3. Move each point in the first key point set shape1 by (dx, dy) to obtain the moved key point set shape1'.
4. Return to step 1, compute the Euclidean distance of the closest pair for the two new key point sets shape1' and shape2', and complete steps 2 and 3 to form two new key point sets.
Iterate in this way until shape1' and shape2' in step 4 are empty, i.e. the distances of all point pairs of the two key point sets have been calculated. The closest-pair distances dist_i obtained in each iteration are summed to give the shape edit distance between the two sets of facial key points.
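The procedure above maps directly to code. Below is a minimal sketch of this shape edit distance, assuming both key point sets arrive as equal-length (n, 2) NumPy arrays; the function name and array layout are illustrative assumptions, and the sketch follows this description section, in which only the first set is translated in each iteration.

```python
import numpy as np

def shape_edit_distance(shape1, shape2):
    """Sum of closest-pair distances with translation compensation.

    shape1, shape2: (n, 2) arrays of key point pixel coordinates.
    """
    s1 = np.asarray(shape1, dtype=float).copy()
    s2 = np.asarray(shape2, dtype=float).copy()
    total = 0.0
    while len(s1) and len(s2):
        # Step 1: find the pair (i, j) with minimum Euclidean distance.
        diffs = s1[:, None, :] - s2[None, :, :]      # (len(s1), len(s2), 2)
        dists = np.linalg.norm(diffs, axis=2)
        i, j = np.unravel_index(np.argmin(dists), dists.shape)
        total += dists[i, j]
        # Displacement vector (dx, dy) that moves s1[i] onto s2[j].
        dx, dy = s2[j] - s1[i]
        # Step 2: remove the minimum pair from both sets.
        s1 = np.delete(s1, i, axis=0)
        s2 = np.delete(s2, j, axis=0)
        # Step 3: translate the remaining points of the first set.
        s1 += np.array([dx, dy])
    return total
```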
8. For each candidate easy-pose face from step 7, it is determined whether the nose point, projected onto the two-dimensional image plane, lies within the face plane. Let flag = 1 indicate that the projected nose point is within the face plane and flag = 0 that it is not. For the nose points computed from the key points obtained by the first and second facial key point positioning techniques, denote the results flag1 and flag2 respectively. The candidate face is determined to be a difficult-pose face when flag1 = 0 and flag2 = 0.
The method of judging whether the projected nose point lies within the face plane is as follows: first obtain the located facial key points, then judge whether the nose tip lies within the rectangle formed by the four points left eye center, right eye center, left mouth corner and right mouth corner. If the nose tip is inside the rectangle, it is within the face plane; otherwise it is not. A sketch of this test follows.
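As a sketch, this test can be implemented as a point-in-convex-polygon check: the four landmarks generally form a convex quadrilateral rather than an exact rectangle, so a cross-product sign test is used here. The landmark indices are placeholders that depend on the key point scheme in use and are not prescribed by the patent.

```python
import numpy as np

def nose_in_face_plane(keypoints, nose_idx, left_eye_idx, right_eye_idx,
                       left_mouth_idx, right_mouth_idx):
    """Return 1 if the projected nose tip lies inside the quadrilateral
    (left eye, right eye, right mouth corner, left mouth corner), else 0.

    keypoints: (n, 2) array; the index arguments select the five landmarks
    and are assumptions depending on the key point layout in use.
    """
    nose = keypoints[nose_idx]
    quad = np.array([keypoints[left_eye_idx], keypoints[right_eye_idx],
                     keypoints[right_mouth_idx], keypoints[left_mouth_idx]])
    # The point is inside a convex polygon iff it lies on the same side
    # of every ordered edge (all cross products share a sign).
    signs = []
    for k in range(4):
        a, b = quad[k], quad[(k + 1) % 4]
        cross = (b[0] - a[0]) * (nose[1] - a[1]) - (b[1] - a[1]) * (nose[0] - a[0])
        signs.append(cross)
    inside = all(c >= 0 for c in signs) or all(c <= 0 for c in signs)
    return 1 if inside else 0
```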
Referring to fig. 4, another embodiment of an algorithm for filtering difficult-pose faces according to the present application is as follows:
S110, acquiring an image containing a human face;
S120, performing face detection on the image containing the face to obtain a rectangular face position;
S130, performing face key point detection on the rectangular face position by using a first face key point positioning technique to obtain a first key point position set;
S140, performing face key point detection on the rectangular face position by using a second face key point positioning technique to obtain a second key point position set;
S150, acquiring an average face and a corresponding average-face key point position set, and obtaining, from the average-face key point position set and the first key point position set, a similarity transformation matrix that maps the first key point position set onto the average-face key point position set;
S160, transforming the first key point position set and the second key point position set onto the average face according to the similarity transformation matrix to obtain a first transformed key point position set and a second transformed key point position set;
S170, calculating the pairwise distances among the first transformed key point position set, the second transformed key point position set and the average-face key point position set: a first distance between the first and second transformed key point position sets, a second distance between the first transformed key point position set and the average-face key point position set, and a third distance between the second transformed key point position set and the average-face key point position set; when the first, second and third distances are all smaller than their respective thresholds, judging that the image containing the face is a candidate easy-pose face, and otherwise a difficult-pose face. A sketch of the similarity transform used in steps S150 and S160 follows.
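For steps S150 and S160, one common way to obtain such a similarity transformation matrix is OpenCV's cv2.estimateAffinePartial2D, which estimates a 4-degree-of-freedom transform (rotation, uniform scale, translation) between two point sets. The patent does not prescribe a particular routine; the following is a minimal sketch under that assumption.

```python
import cv2
import numpy as np

def estimate_similarity(src_shape, dst_shape):
    """S150: similarity transform mapping src_shape onto dst_shape.

    Both inputs are (n, 2) arrays of key point coordinates. Returns a
    2x3 matrix [sR | t] (rotation + uniform scale + translation).
    """
    src = np.asarray(src_shape, dtype=np.float32).reshape(-1, 1, 2)
    dst = np.asarray(dst_shape, dtype=np.float32).reshape(-1, 1, 2)
    trans, _ = cv2.estimateAffinePartial2D(src, dst)
    return trans

def apply_similarity(trans, shape):
    """S160: apply the 2x3 similarity matrix to an (n, 2) key point set."""
    pts = np.asarray(shape, dtype=np.float32)
    ones = np.ones((len(pts), 1), dtype=np.float32)
    return np.hstack([pts, ones]) @ trans.T   # (n, 2) result

# Usage: Trans is estimated from the first key point set only and then
# applied to both sets, matching steps S150 and S160 above:
#   trans = estimate_similarity(shape1, avg_shape)
#   shape1_t = apply_similarity(trans, shape1)
#   shape2_t = apply_similarity(trans, shape2)
```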
Further, the method also comprises the following step:
and S180, when the image containing the face is a candidate easy-pose face, judging whether the corresponding first and second key point position sets, projected onto the two-dimensional plane, fall within the face region of the image containing the face; when they do not, judging that the candidate easy-pose face is a difficult-pose face. This step screens difficult-pose faces more finely.
Further, as shown in fig. 5, the step of calculating the pairwise distances among the first transformed key point position set, the second transformed key point position set and the average-face key point position set comprises:
S171, finding the pair of points with the minimum Euclidean distance between the first set of key points and the second set of key points, and obtaining the distance vector and the distance of this minimum pair;
S172, removing the minimum pair from the first set of key points and the second set of key points;
S173, translating the first set of key points and the second set of key points, after removal of the minimum pair, by the distance vector to obtain a transformed first set of key points and a transformed second set of key points;
and S174, taking the transformed first and second sets of key points as the first and second sets of key points of step S171, and repeating steps S171 to S173 until all key points in the first and second sets have been removed; the distances of the minimum pairs obtained in each iteration are summed to give the distance between the first set of key points and the second set of key points.
Steps S171 to S174 constitute the calculation of the shape edit distance in the embodiments of the present application; the application characterizes the distance between key point sets by this shape edit distance.
Further, the distance in step S170 may also include one or more of the Hamming distance, the Euclidean distance, and the Mahalanobis distance.
Further, step S180 specifically comprises determining whether the nose points in the first key point position set and the second key point position set, projected onto the two-dimensional plane, fall within the face region of the image containing the face, and if not, determining that the candidate easy-pose face is a difficult-pose face.
Further, the step of determining whether the nose point projected onto the two-dimensional plane is within the face region of the image comprises:
judging whether the nose tip lies within the rectangle formed by the left eye center, right eye center, left mouth corner and right mouth corner; if so, the nose tip projected onto the two-dimensional plane is within the face region of the image containing the face, and otherwise it is not.
Further, the first face key point positioning technique and the second face key point positioning technique are each based on one or more of a convolutional neural network (CNN) algorithm, the supervised descent method (SDM), the active shape model (ASM) algorithm, and cascaded regression, and the two techniques are different from each other.
The method for filtering difficult-pose faces acquires key points with two positioning techniques, maps both key point sets onto the average face for distance calculation, judges from the pairwise distances whether the face is a difficult-pose face, and further applies projection verification to the candidate easy-pose faces to refine the screening. Through this cross verification, difficult-pose faces are identified and the accuracy of face recognition is enhanced. A sketch of the combined decision follows.
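Combining the distance screen with the projection check, the final cross-verification decision reduces to a few lines. This sketch assumes the distances, flags and thresholds are computed as in the earlier sketches; the function name is illustrative.

```python
def is_difficult_pose(dist12, dist1a, dist2a, th1, th2, th3, flag1, flag2):
    """Cross-verification decision: distance screen (S170) plus nose
    projection check (S180). flag1/flag2 are 1 if the projected nose tip
    lies in the face plane for the respective key point set, else 0."""
    if not (dist12 < th1 and dist1a < th2 and dist2a < th3):
        return True                  # rejected by the distance screen
    if flag1 == 0 and flag2 == 0:
        return True                  # rejected by the projection check
    return False                     # easy-pose face
```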
Based on the above method for filtering difficult-pose faces, some other embodiments of the present application, shown in fig. 6, provide a device 100 for filtering difficult-pose faces, comprising:
a receiving module 110, configured to acquire an image including a face;
the face rectangle detection module 120 is configured to perform face detection on the image containing the face, and obtain a face rectangle position;
the first key point extraction module 130 is configured to perform facial key point detection on the rectangular face position by using a first face key point positioning technology, so as to obtain a first key point position set;
a second key point extraction module 140, configured to perform face key point detection on the rectangular face position by using a second face key point positioning technique to obtain a second key point position set;
a similarity transformation matrix calculation module 150, configured to obtain an average face and a corresponding average-face key point position set, and obtain, from the average-face key point position set and the first key point position set, a similarity transformation matrix that maps the first key point position set onto the average-face key point position set;
a key point transformation module 160, configured to transform the first key point position set and the second key point position set onto the average face according to the similarity transformation matrix, obtaining a first transformed key point position set and a second transformed key point position set;
a screening module 170, configured to calculate the pairwise distances among the first transformed key point position set, the second transformed key point position set and the average-face key point position set, obtaining a first distance between the first and second transformed key point position sets, a second distance between the first transformed key point position set and the average-face key point position set, and a third distance between the second transformed key point position set and the average-face key point position set, and to judge that the image containing the face is a candidate easy-pose face when the first, second and third distances are all smaller than their respective thresholds, and otherwise a difficult-pose face.
The specific operation of each module is described in detail in the corresponding steps of the method for filtering difficult-pose faces and will not be repeated here.
Referring now to FIG. 7, there is illustrated a schematic diagram of a computer system 800 suitable for use in implementing the control device of an embodiment of the present application. The control device shown in fig. 7 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present application.
As shown in fig. 7, the computer system 800 includes a Central Processing Unit (CPU) 801, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, etc.; an output portion 807 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read out from it can be installed into the storage section 808 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable media 811. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 801.
The computer readable medium according to the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object oriented programming languages such as Python, Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, for example, described as: a processor includes an acquisition unit, a segmentation unit, a determination unit, and a selection unit. The names of these units do not limit the unit itself in some cases, and the acquisition unit may also be described as "a unit that acquires a drawing image to be processed", for example.
As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiments, or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire an image containing a human face; perform face detection on the image containing the face to obtain a rectangular face position; perform face key point detection on the rectangular face position by using a first face key point positioning technique to obtain a first key point position set; perform face key point detection on the rectangular face position by using a second face key point positioning technique to obtain a second key point position set; acquire an average face and a corresponding average-face key point position set, and obtain, from the average-face key point position set and the first key point position set, a similarity transformation matrix that maps the first key point position set onto the average-face key point position set; transform the first key point position set and the second key point position set onto the average face according to the similarity transformation matrix to obtain a first transformed key point position set and a second transformed key point position set; and calculate the pairwise distances among the first transformed key point position set, the second transformed key point position set and the average-face key point position set, obtaining a first distance between the first and second transformed key point position sets, a second distance between the first transformed key point position set and the average-face key point position set, and a third distance between the second transformed key point position set and the average-face key point position set, and judge that the image containing the face is a candidate easy-pose face when the first, second and third distances are all smaller than their respective thresholds, and otherwise a difficult-pose face.
The above description is only illustrative of the preferred embodiments of the present application and of the principles of the technology employed. Persons skilled in the art will appreciate that the scope of the application is not limited to the specific combinations of the technical features described above, but also covers other technical solutions formed by any combination of those features or their equivalents without departing from the inventive concept, for example, solutions in which the above features are replaced with technical features of similar function disclosed in (but not limited to) the present application.

Claims (4)

1. A method for filtering difficult-pose faces, characterized by comprising the following steps:
1. reading an image containing a human face;
2. performing face detection on the image by using a face detection technique, and outputting the detected rectangular face positions; each face frame independently undergoes the subsequent operations to judge whether it is a difficult-pose face; if no face is detected, no subsequent processing is performed;
3. performing facial key point detection on each detected face by using a first face key point positioning technique to obtain the key point positions shape, wherein shape is the set of coordinates of a series of predefined facial key points in the face image;
4. performing facial key point detection on the detected face by using a second face key point positioning technique to obtain the positions of a plurality of key points on the face, and marking each key point;
5. given the known average face and its face key point set avgShape, calculating the similarity transformation matrix Trans that maps the key points obtained by the first face key point positioning technique onto the key points of the average face;
6. according to the similarity transformation matrix Trans, obtained with the first positioning method, from the located face to the average face, transforming the facial key points obtained by the first and second face key point positioning techniques onto the average face to obtain shape1 and shape2 respectively;
7. calculating the pairwise distances among shape1 and shape2 from steps five and six and the average-face key point set avgShape; denoting the distance between shape1 and shape2 as dist12, the distance between shape1 and avgShape as dist1a, and the distance between shape2 and avgShape as dist2a; if dist12 is smaller than a threshold th1, dist1a is smaller than a threshold th2, and dist2a is smaller than a threshold th3, the face is a candidate easy-pose face, and otherwise a difficult-pose face;
8. for each candidate easy-pose face from step seven, calculating whether the nose point, projected onto the two-dimensional plane, lies within the face plane; letting flag = 1 indicate that the projected nose point is within the face plane and flag = 0 that it is not; denoting by flag1 and flag2 the results for the nose points calculated from the key points obtained by the first and second facial key point positioning techniques; the candidate face is determined to be a difficult-pose face when flag1 = 0 and flag2 = 0;
the distance between the two sets of facial key points is calculated using a custom shape edit distance; the specific steps are as follows:
the two sets of facial key points are defined as shape1 = {pt_1, pt_2, ..., pt_n} and shape2 = {pt'_1, pt'_2, ..., pt'_n}, wherein pt_i is the pixel coordinate of a predefined facial key point;
(1) finding the pair of points (pt_i, pt'_j) with the minimum Euclidean distance between the first key point set shape1 and the second key point set shape2, and calculating the displacement vector (dx, dy) that moves pt_i onto pt'_j, together with their Euclidean distance dist_i;
(2) removing pt_i and pt'_j from shape1 and shape2 so that they do not participate in the subsequent distance calculation;
(3) moving each point in the first key point set shape1 by (dx, dy) to obtain the moved key point set shape1', and moving each point in the second key point set shape2 by (dx, dy) to obtain the moved key point set shape2';
(4) returning to step (1), calculating the Euclidean distance of the closest pair for the two new key point sets shape1' and shape2', and completing steps (2) and (3) to form two new key point sets;
iterating until shape1' and shape2' in step (4) are empty, i.e. the distance calculation for all point pairs of the two key point sets is complete; the closest-pair distances dist_i obtained in each iteration are summed to give the shape edit distance between the two sets of facial key points;
the method for calculating whether the nose point, after projection onto the two-dimensional plane, lies within the face plane is as follows: first obtaining the located facial key points, and then judging whether the nose tip lies within the rectangle formed by the four points left eye center, right eye center, left mouth corner and right mouth corner; if so, the nose tip is within the face plane, and otherwise it is not.
2. A device for filtering difficult-pose faces, characterized by comprising:
the receiving module is used for acquiring an image containing a human face;
the face rectangle detection module is used for carrying out face detection on the image by using a face detection technology and outputting the detected face rectangle position;
the first key point extraction module is used for detecting the key points of the detected face by using a first face key point positioning technology to obtain the positions shape of a plurality of key points on the face;
the second key point extraction module is used for detecting the key points of the detected face by using a second face key point positioning technology to obtain the positions of a plurality of key points on the face;
the similarity transformation matrix calculation module is used for calculating the similarity transformation matrix Trans that maps the key points obtained by the first face key point positioning technique onto the key points of the average face;
the key point transformation module is used for transforming, according to the similarity transformation matrix Trans from the located face to the average face obtained with the first positioning method, the facial key points obtained by the first and second face key point positioning techniques onto the average face, obtaining shape1 and shape2 respectively;
the screening module is used for calculating the pairwise distances among shape1, shape2 and the average-face key point set avgShape; denoting the distance between shape1 and shape2 as dist12, the distance between shape1 and avgShape as dist1a, and the distance between shape2 and avgShape as dist2a; if dist12 is smaller than a threshold th1, dist1a is smaller than a threshold th2, and dist2a is smaller than a threshold th3, judging the face to be a candidate easy-pose face, and otherwise a difficult-pose face; for each candidate easy-pose face, calculating whether the nose point, projected onto the two-dimensional plane, lies within the face plane; letting flag = 1 indicate that the projected nose point is within the face plane and flag = 0 that it is not; denoting by flag1 and flag2 the results for the nose points calculated from the key points obtained by the first and second facial key point positioning techniques; the candidate face is determined to be a difficult-pose face when flag1 = 0 and flag2 = 0;
the specific steps for calculating the distance between the two sets of facial key points using the custom shape edit distance are as follows:
the two sets of facial key points are defined as shape1 = {pt_1, pt_2, ..., pt_n} and shape2 = {pt'_1, pt'_2, ..., pt'_n}, wherein pt_i is the pixel coordinate of a predefined facial key point;
(1) finding the pair of points (pt_i, pt'_j) with the minimum Euclidean distance between the first key point set shape1 and the second key point set shape2, and calculating the displacement vector (dx, dy) that moves pt_i onto pt'_j, together with their Euclidean distance dist_i;
(2) removing pt_i and pt'_j from shape1 and shape2 so that they do not participate in the subsequent distance calculation;
(3) moving each point in the first key point set shape1 by (dx, dy) to obtain the moved key point set shape1', and moving each point in the second key point set shape2 by (dx, dy) to obtain the moved key point set shape2';
(4) returning to step (1), calculating the Euclidean distance of the closest pair for the two new key point sets shape1' and shape2', and completing steps (2) and (3) to form two new key point sets;
iterating until shape1' and shape2' in step (4) are empty, i.e. the distance calculation for all point pairs of the two key point sets is complete; the closest-pair distances dist_i obtained in each iteration are summed to give the shape edit distance between the two sets of facial key points;
the method for calculating whether the nose point, after projection onto the two-dimensional plane, lies within the face plane is as follows: first obtaining the located facial key points, and then judging whether the nose tip lies within the rectangle formed by the four points left eye center, right eye center, left mouth corner and right mouth corner; if so, the nose tip is within the face plane, and otherwise it is not.
3. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as recited in claim 1.
4. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method as claimed in claim 1.
CN201910991099.9A 2019-10-18 2019-10-18 Method and device for filtering faces with difficult poses Active CN110826421B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910991099.9A CN110826421B (en) 2019-10-18 2019-10-18 Method and device for filtering faces with difficult poses

Publications (2)

Publication Number Publication Date
CN110826421A CN110826421A (en) 2020-02-21
CN110826421B (en) 2023-09-05

Family

ID=69549464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910991099.9A Active CN110826421B (en) 2019-10-18 2019-10-18 Method and device for filtering faces with difficult gestures

Country Status (1)

Country Link
CN (1) CN110826421B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295476A (en) * 2015-05-29 2017-01-04 腾讯科技(深圳)有限公司 Face key point localization method and device
CN107766851A (en) * 2017-12-06 2018-03-06 北京搜狐新媒体信息技术有限公司 A kind of face key independent positioning method and positioner
CN108764048A (en) * 2018-04-28 2018-11-06 中国科学院自动化研究所 Face critical point detection method and device
CN109214343A (en) * 2018-09-14 2019-01-15 北京字节跳动网络技术有限公司 Method and apparatus for generating face critical point detection model
CN110309815A (en) * 2019-07-11 2019-10-08 广州华多网络科技有限公司 A kind of processing method and system of facial recognition data

Also Published As

Publication number Publication date
CN110826421A (en) 2020-02-21

Similar Documents

Publication Publication Date Title
CN109948507B (en) Method and device for detecting table
CN108509915B (en) Method and device for generating face recognition model
CN108898186B (en) Method and device for extracting image
CN108038880B (en) Method and apparatus for processing image
CN109214343B (en) Method and device for generating face key point detection model
CN109614934B (en) Online teaching quality assessment parameter generation method and device
CN108229419B (en) Method and apparatus for clustering images
US11436863B2 (en) Method and apparatus for outputting data
CN109711508B (en) Image processing method and device
WO2020062493A1 (en) Image processing method and apparatus
CN109034069B (en) Method and apparatus for generating information
CN109583389B (en) Drawing recognition method and device
CN108491823B (en) Method and device for generating human eye recognition model
WO2020029466A1 (en) Image processing method and apparatus
CN113436100B (en) Method, apparatus, device, medium, and article for repairing video
CN108509994B (en) Method and device for clustering character images
CN109214501B (en) Method and apparatus for identifying information
CN108491812B (en) Method and device for generating face recognition model
CN108388889B (en) Method and device for analyzing face image
CN111414879A (en) Face shielding degree identification method and device, electronic equipment and readable storage medium
CN108133197B (en) Method and apparatus for generating information
CN108229375B (en) Method and device for detecting face image
CN108229494B (en) Network training method, processing method, device, storage medium and electronic equipment
WO2020034981A1 (en) Method for generating encoded information and method for recognizing encoded information
CN109345460B (en) Method and apparatus for rectifying image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant