WO2023231400A1 - Method and apparatus for predicting facial angle, and device and readable storage medium

Info

Publication number: WO2023231400A1
Application number: PCT/CN2022/142276
Authority: WO
Inventors: 何金辉; 肖嵘; 王孝宇
Original assignee: 青岛云天励飞科技有限公司; 深圳云天励飞技术股份有限公司
Priority date: 2022-05-31
Filing date: 2022-12-27
Publication date: 2023-12-07
Also published as: CN117197853A

Abstract

The present application is applicable to the technical field of image recognition. Provided are a method and apparatus for predicting a facial angle, and a device and a readable storage medium. The method comprises: acquiring a facial area of a facial image to be subjected to detection; determining facial features corresponding to the facial area; according to the facial features, determining a plurality of angle probabilities for each angle type among a plurality of angle types, wherein the plurality of angle types comprise a yaw angle, a pitch angle, and a roll angle; and according to the plurality of angle probabilities for each angle type, determining a predicted angle of each angle type of a face in said facial image relative to a photographing position. Thus, in the present application, angle probabilities respectively corresponding to a plurality of angle intervals are acquired, and a predicted angle is calculated by means of angle probabilities respectively corresponding to the plurality of angle intervals, so that the accuracy of the predicted angle is ensured. The present application is applicable to a plurality of scenarios where accurate facial angles need to be determined.

Description

Face angle prediction method, device, equipment and readable storage medium

Technical field

This application claims priority to the Chinese patent application filed with the China Patent Office on May 31, 2022, with application number 202210607682.7 and the invention title "Facial Angle Prediction Method, Device, Equipment and Readable Storage Medium", the entire content of which is incorporated herein by reference.

The present application belongs to the field of image recognition technology, and in particular relates to a face angle detection method, device, equipment and readable storage medium.

Background art

With the development of artificial intelligence, face recognition is increasingly used in various industries, and the image quality of the face has a significant impact on the accuracy of image recognition. In particular, the angle of the face is an important factor affecting the accuracy of image recognition.

In related technologies, the facial key point features of a human face in motion are captured, the facial image is first reconstructed into a three-dimensional image, the three-dimensional image is then mapped into a two-dimensional image, and face pose prediction is performed based on the facial movement features in the two-dimensional image. When the face is determined to be a large-angle face, the angle can be corrected before recognition, or the image can be excluded from recognition, to improve the accuracy of image recognition.

However, since the related technology can only predict the posture of the face at the action level through the action characteristics of the face when it is moving, it is difficult to accurately predict the face angle, and cannot be applied to scenes that require prediction of face angles with high accuracy.

Technical solutions

This application provides a face angle prediction method, device, equipment and readable storage medium, which can avoid the situation in which only the approximate posture of a face can be predicted and the face angle is difficult to predict accurately, and can be adapted to scenes that require high accuracy in face angle prediction.

In the first aspect, this application provides a face angle prediction method, including:

Obtain the face area of the face image to be tested;

Determine the facial features corresponding to the facial area;

According to the facial features, multiple angle probabilities of multiple angle types are determined, where the multiple angle probabilities of each angle type respectively correspond to multiple angle intervals of that angle type, and the multiple angle types include a yaw angle, a pitch angle and a roll angle;

According to the multiple angle probabilities of each angle type, the predicted angle of each angle type of the face in the face image to be measured relative to the shooting position is determined.

This application determines multiple angle probabilities of multiple angle types from the facial features corresponding to the face area, and then determines, based on the multiple angle probabilities of each angle type, the predicted angle of each angle type of the face in the face image to be measured relative to the shooting position. Since the obtained multiple angle probabilities correspond to multiple angle intervals, the predicted angle is calculated from the angle probabilities of those angle intervals. This ensures the accuracy of the predicted angle, avoids the situation where it is difficult to accurately predict the face angle, and allows the method to be adapted to scenes that require high accuracy in face angle prediction.

In a second aspect, this application provides a face angle prediction device, which is used to perform the method in the above-mentioned first aspect or any possible implementation of the first aspect. Specifically, the device may include:

The acquisition module is used to obtain the face area of the face image to be tested;

The first determination module is used to determine the facial features corresponding to the facial area;

The second determination module is used to determine multiple angle probabilities of multiple angle types according to the facial features, where the multiple angle probabilities of each angle type respectively correspond to multiple angle intervals of that angle type, and the multiple angle types include a yaw angle, a pitch angle and a roll angle;

The third determination module is configured to determine the predicted angle of each angle type of the human face in the face image to be measured relative to the shooting position based on multiple angle probabilities of each angle type.

In a third aspect, the present application provides an electronic device, which includes a memory and a processor. The memory is used to store instructions; the processor executes the instructions stored in the memory, so that the device performs the face angle prediction method in the first aspect or any possible implementation of the first aspect.

In a fourth aspect, a computer-readable storage medium is provided. Instructions are stored in the computer-readable storage medium. When the instructions are run on a computer, they cause the computer to execute the face angle prediction method of the first aspect or any possible implementation of the first aspect.

A fifth aspect provides a computer program product containing instructions that, when run on a device, cause the device to execute the face angle prediction method of the first aspect or any possible implementation of the first aspect.

It can be understood that the beneficial effects of the above-mentioned second aspect to the fifth aspect can be referred to the relevant description in the above-mentioned first aspect, and will not be described again here.

Description of the drawings

In order to explain the technical solutions in this application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of this application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without exerting creative efforts.

Figure 1 is a schematic flowchart of a face angle prediction method provided by an embodiment of the present application;

Figure 2 is a schematic flow chart of a face angle prediction method provided by an embodiment of the present application;

Figure 3 is a schematic flow chart of a face angle prediction method provided by an embodiment of the present application;

Figure 4 is a schematic flowchart of a face angle prediction method provided by an embodiment of the present application;

Figure 5 is a schematic structural diagram of a face angle prediction device provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Embodiments of the invention

In the following description, specific details, such as specific system structures and technologies, are provided for purposes of explanation and not limitation, in order to provide a thorough understanding of the present application. However, it will be apparent to those skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It will be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

It will also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted as "when" or "once" or "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrase "if determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, to mean "once determined" or "in response to a determination" or "once [the described condition or event] is detected" or "in response to detection of [the described condition or event]".

In addition, in the description of this application and the appended claims, the terms "first", "second", "third", etc. are only used to distinguish the description, and cannot be understood as indicating or implying relative importance.

Reference in this specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Therefore, the phrases "in one embodiment", "in some embodiments", "in other embodiments", "in still other embodiments", etc. appearing in different places in this specification do not necessarily all refer to the same embodiment, but rather mean "one or more but not all embodiments" unless specifically stated otherwise. The terms "including", "includes", "having", and variations thereof all mean "including but not limited to", unless otherwise specifically emphasized.

This application provides a face angle prediction method, device, equipment and readable storage medium. The method can be implemented through recognition equipment and applied in access control recognition, missing person search, case investigation, intelligent security and other scenarios.

Among them, the face angle includes three angle types of the face relative to the shooting position, and the three angle types are pitch angle, yaw angle and roll angle respectively.

Among them, the recognition device refers to the device used by users to predict face angles. Identification devices can be access control devices, smartphones, desktop computers, laptops, tablets, wearable devices, handheld devices, vehicle-mounted devices, servers, etc. The embodiments of this application do not place any restrictions on the specific types of identification devices.

The identification device may include display hardware, or may have an external display.

Regarding the above scenario, the following is an example to illustrate the application of the face angle prediction method:

1. When the face angle prediction method is used in an access control recognition scenario, the recognition device can predict the face angle based on the face in the image, and determine whether the next action can be taken based on the predicted face angle. For example, when the angle of the face is too large, a prompt message such as "The angle of the face is too large and cannot be recognized" is displayed on the display screen of the access control device.

2. When the face angle prediction method is used in a missing person search scenario, the face image is input into the recognition device, which predicts the face angle corresponding to the face in the face image. The predicted face angle determines whether the next step can be taken, for example, determining whether the recognition device can recognize the face, determining whether the face image matches an image in the missing person image library, and then determining whether the person is a missing person.

Alternatively, the recognition device can be connected through communication with the surveillance camera, and the recognition device can predict the face angle corresponding to the face in the image by acquiring the image captured by the surveillance camera.

Based on the above scenario description, the face angle prediction method provided by the embodiments of the present application will be described in detail below, taking the recognition device as an example and combining the drawings and application scenarios.

Please refer to FIG. 1 , which shows a schematic flowchart of a face angle prediction method provided by an embodiment of the present application.

As shown in Figure 1, the face angle prediction method provided by this application can include:

S101. Obtain the face area of the face image to be measured.

The face image to be tested can be directly given by the user, or it can be extracted from video data collected by image collection equipment such as surveillance cameras and video cameras.

The face area refers to the area containing faces in the face image to be measured.

In some embodiments, the face area is obtained by performing face detection on the face image to be tested, obtaining the first detection window, and intercepting the image in the first detection window.

Optionally, the recognition device can perform an external expansion process on the first detection window to obtain an expanded second detection window, and intercept an area corresponding to the size of the second detection window as the face area.

It can be understood that the detection window refers to the bounding box that frames the face in the face image to be tested and from which the face can be extracted.
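The expansion and cropping described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the window format `(left, top, right, bottom)`, the function names `expand_window` and `crop`, the expansion coefficient of 1.5, and the image stored as a list of pixel rows are all assumptions made for the example.

```python
def expand_window(box, scale, image_width, image_height):
    """Scale a (left, top, right, bottom) window about its center and clamp it to the image."""
    left, top, right, bottom = box
    center_x = (left + right) / 2.0
    center_y = (top + bottom) / 2.0
    half_w = (right - left) * scale / 2.0
    half_h = (bottom - top) * scale / 2.0
    # Clamp so the expanded window never leaves the image.
    new_left = max(0, int(center_x - half_w))
    new_top = max(0, int(center_y - half_h))
    new_right = min(image_width, int(center_x + half_w))
    new_bottom = min(image_height, int(center_y + half_h))
    return new_left, new_top, new_right, new_bottom


def crop(image, box):
    """Cut the window out of an image stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


# Hypothetical 100 x 80 image whose "pixels" are just their coordinates.
image = [[(x, y) for x in range(100)] for y in range(80)]
first_window = (40, 20, 60, 40)      # assumed output of a face detector
second_window = expand_window(first_window, 1.5, 100, 80)
face_area = crop(image, second_window)
print(second_window, len(face_area), len(face_area[0]))
```

Clamping to the image bounds is a design choice of this sketch; padding the image instead would keep the expanded window at its full size.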

A face detection algorithm can be used to detect the face in the face image to be tested.

The face detection algorithm can be stored in the storage device.

The storage device can communicate with the recognition device, so that the recognition device can retrieve the face detection algorithm from the storage device. This application does not limit the storage method and specific type of storage devices.

In some embodiments, the YOLO (you only look once) algorithm is used for face detection. The YOLO algorithm is an object recognition and positioning algorithm based on deep neural networks. Its biggest feature is its fast running speed.

In a specific embodiment, it is assumed that the identification device is an access control device, and the access control device includes a camera. When a person's face is close to the camera of the access control device, the camera captures a face image. The access control device uses a face detection algorithm to detect the face image and obtain the face area of the face image to be measured.

In another specific embodiment, it is assumed that the recognition device is a mobile phone, and the mobile phone has a recognition applet. The mobile phone communicates with the surveillance camera through the recognition applet to obtain images captured by the surveillance camera. The recognition applet can use the face detection algorithm to detect faces in images captured by surveillance cameras and obtain the face area of the face image to be detected.

S102. Determine the facial features corresponding to the facial area.

Based on S101, the recognition device can obtain the face area. Therefore, the recognition device can perform feature extraction on the face area and obtain the facial features corresponding to the face area.

In some embodiments, the recognition device outputs the facial features by inputting the facial region into the backbone network of the facial angle recognition model.

The backbone network is used to extract facial features from facial images.

Among them, the backbone network is stored in advance in a storage device that communicates with the identification device.

In a specific embodiment, it is assumed that the identification device is an access control device. When a person's face is close to the camera of the access control device, the camera captures the face image. The access control device performs face detection on the face image through the face detection algorithm. After obtaining the face area of the face image, it calls the backbone network to perform feature extraction on the face area and obtain the facial features.

In another specific embodiment, it is assumed that the identification device is a mobile phone, and the mobile phone has an identification applet. The recognition applet performs face detection on the image through the face detection algorithm. After obtaining the face area corresponding to the image, it calls the backbone network to extract features of the face area to obtain the face features.

S103. Determine multiple angle probabilities of multiple angle types according to the facial features.

Among them, various angle types include yaw angle, pitch angle and roll angle.

Multiple angle probabilities for each angle type respectively correspond to multiple angle intervals for each angle type.

Multiple angle intervals refer to multiple angle intervals obtained by dividing the angle ranges of yaw angle, pitch angle and roll angle according to preset rules.

In some embodiments, the preset rule is to divide the angle range into intervals of 5 degrees each.

For example, the angle range of yaw angle and pitch angle is [-90, 90]. The angle range of yaw angle and pitch angle is divided into intervals every 5 degrees, resulting in 36 angle intervals respectively.

The 36 angle intervals of the yaw angle are [-90, -85), [-85, -80)... (80, 85], (85, 90].

The 36 angle intervals of the pitch angle are [-90, -85), [-85, -80)... (80, 85], (85, 90].

For example, the angle range of the roll angle is [-180, 180]. The angle range of the roll angle is divided into intervals every 5 degrees, resulting in 72 angle intervals.

The 72 angle intervals of the roll angle are [-180, -175), [-175, -170), [-170, -165)... (165, 170], (170, 175], (175, 180].
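The 5-degree division above can be reproduced with a few lines of code. This is a minimal sketch: the function name `angle_intervals` is hypothetical, and each interval is represented only by its two endpoints, so the open or closed nature of the endpoints is not modeled.

```python
def angle_intervals(lower, upper, step=5):
    """Split the range [lower, upper] into consecutive intervals of `step` degrees."""
    count = (upper - lower) // step
    return [(lower + i * step, lower + (i + 1) * step) for i in range(count)]


yaw_intervals = angle_intervals(-90, 90)      # the pitch angle uses the same range
roll_intervals = angle_intervals(-180, 180)

print(len(yaw_intervals), yaw_intervals[0], yaw_intervals[-1])
print(len(roll_intervals), roll_intervals[0], roll_intervals[-1])
```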

In some embodiments, the recognition device inputs the facial features into the fully connected classification network of the facial angle recognition model and outputs multiple angle probabilities for each angle type.

Among them, the fully connected classification network is used to predict the angle probabilities corresponding to the facial features in multiple angle intervals of each angle type.

The backbone network and fully connected classification network serve as face angle recognition models and are pre-stored in a storage device that communicates with the recognition device.

Specifically, the fully connected classification network is connected to the output end of the backbone network. After the backbone network of the recognition device extracts the facial features of the face image to be tested, the facial features are input to the fully connected classification network for angle probability prediction.

The fully connected classification network includes three fully connected layers, namely the first fully connected layer, the second fully connected layer and the third fully connected layer. The first fully connected layer and the second fully connected layer are each connected to 36 nodes, and the third fully connected layer is connected to 72 nodes.

It can be understood that the 36 nodes of the first fully connected layer and of the second fully connected layer are used to predict the angle probabilities of the yaw angle and the pitch angle, respectively, in their 36 angle intervals, and the 72 nodes of the third fully connected layer are used to predict the angle probabilities of the roll angle in its 72 angle intervals.

Therefore, there are 36 angle probabilities for each of the yaw angle and the pitch angle, and 72 angle probabilities for the roll angle.
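The shape of this classification stage can be sketched in plain Python. Several things here are assumptions made only for illustration: the three fully connected layers are modeled as three parallel heads that read the same feature vector, the feature length of 128 is hypothetical, the weights are random stand-ins for trained parameters, and a softmax is used to turn node outputs into probabilities (the text does not name the normalization).

```python
import math
import random


def softmax(scores):
    """Convert raw node outputs into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_layer(in_dim, out_dim, rng):
    """Random weights and zero biases standing in for trained parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    biases = [0.0] * out_dim
    return weights, biases


def fully_connected(features, layer):
    """One fully connected layer: a weighted sum per output node plus a bias."""
    weights, biases = layer
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]


rng = random.Random(0)
FEATURE_DIM = 128  # hypothetical length of the facial feature vector
heads = {
    "yaw": make_layer(FEATURE_DIM, 36, rng),
    "pitch": make_layer(FEATURE_DIM, 36, rng),
    "roll": make_layer(FEATURE_DIM, 72, rng),
}

# Stand-in for the facial features produced by the backbone network.
features = [rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]
angle_probabilities = {name: softmax(fully_connected(features, layer))
                       for name, layer in heads.items()}
print({name: len(probs) for name, probs in angle_probabilities.items()})
```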

In a specific embodiment, it is assumed that the identification device is an access control device. After acquiring the facial features, the access control device predicts the angle probabilities of the yaw angle and the pitch angle in its 36 angle intervals, and predicts the angle probability of the roll angle in its 72 angle intervals based on the facial features.

In another specific embodiment, it is assumed that the identification device is a mobile phone, and the mobile phone has an identification applet. After obtaining the facial features, the recognition applet predicts the angle probabilities of the yaw angle and the pitch angle in its 36 angle intervals, and predicts the angle probability of the roll angle in its 72 angle intervals based on the facial features.

S104. Based on the multiple angle probabilities of each angle type, determine the predicted angle of each angle type of the face in the face image to be measured relative to the shooting position.

In some embodiments, for each angle type, the angle of the face in the face image to be measured relative to the shooting position is determined based on the multiple angle probabilities, the number of angle intervals, and the intermediate angle of each angle interval.

The intermediate angle of each angle interval is the midpoint of that 5-degree interval. For example, if a certain angle interval is [0, 5), then the corresponding intermediate angle is 2.5 degrees.

The above calculation formula for determining the angle of the face in the face image to be measured relative to the shooting position is:

Predicted angle of each angle type = $\sum_{i=1}^{n} \theta_i \, p_i$

Here, $n$ represents the number of angle intervals, $\theta_i$ represents the intermediate angle of the i-th angle interval, and $p_i$ represents the angle probability of the i-th angle interval.
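A minimal sketch of this calculation follows, assuming the predicted angle is the probability-weighted sum of the intermediate angles over all n angle intervals (the reading suggested by the three quantities named above). The function names and the example probabilities are hypothetical.

```python
def interval_midpoints(lower, upper, step=5):
    """Intermediate angle (midpoint) of each `step`-degree interval in [lower, upper]."""
    count = (upper - lower) // step
    return [lower + step * (i + 0.5) for i in range(count)]


def predicted_angle(probabilities, midpoints):
    """Sum of intermediate angle times angle probability over all intervals."""
    if len(probabilities) != len(midpoints):
        raise ValueError("one probability is required per angle interval")
    return sum(p * m for p, m in zip(probabilities, midpoints))


yaw_midpoints = interval_midpoints(-90, 90)   # 36 intermediate angles

# Illustrative probabilities: half the mass in [0, 5), half in the next interval.
yaw_probabilities = [0.0] * 36
yaw_probabilities[18] = 0.5   # intermediate angle 2.5 degrees
yaw_probabilities[19] = 0.5   # intermediate angle 7.5 degrees

yaw = predicted_angle(yaw_probabilities, yaw_midpoints)
print(yaw)
```

Because the result is a weighted average rather than the midpoint of the most likely interval, the prediction can fall between interval midpoints, which gives a finer resolution than the 5-degree interval width.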

In a specific embodiment, it is assumed that the identification device is an access control device that includes a display screen and a camera. After predicting, based on the facial features, the angle probabilities of the yaw angle and the pitch angle in their respective 36 angle intervals and the angle probabilities of the roll angle in its 72 angle intervals, the access control device calculates the value corresponding to each angle interval from these angle probabilities, and calculates the final predicted angle of the face from the values corresponding to the intervals. The access control device can determine whether the next action can be taken based on the predicted angle of the face. For example, when the face angle is too large, a prompt message such as "The face angle is too large to be recognized" is displayed on the display screen of the access control device.

In another specific embodiment, it is assumed that the identification device is a mobile phone, and the mobile phone has an identification applet. The recognition applet determines the facial features corresponding to the face area and, based on the facial features, predicts the angle probabilities of the yaw angle and the pitch angle in their respective 36 angle intervals and the angle probabilities of the roll angle in its 72 angle intervals. Finally, it calculates the value corresponding to each angle interval from these angle probabilities, and calculates the final predicted angle of the face based on the value corresponding to each interval. The recognition applet can use the predicted angle of the face to determine whether the next action can be taken. For example, when the angle of the face is too large, the face image to be measured is corrected.

The face angle prediction method provided by this application obtains facial features based on the face area of the face image to be measured, then determines multiple angle probabilities for each of the multiple angle types based on the facial features, and finally, based on the multiple angle probabilities of each angle type, determines the predicted angle of each angle type of the face in the face image to be measured relative to the shooting position. Therefore, for each angle type, since the obtained multiple angle probabilities are angle probabilities corresponding to multiple angle intervals, the predicted angle is calculated based on the angle probabilities corresponding to multiple angle intervals, ensuring the accuracy of the predicted angle.

Based on the description of the embodiment of S103 shown in Figure 1 above, the recognition device can obtain the maximum value among the multiple angle probabilities of the roll angle. However, when the maximum value corresponds to a preset mapping angle interval, the predicted angle of the roll angle calculated directly is inaccurate, and the angle probabilities can be processed in a variety of ways to ensure a more accurate predicted angle.

Among them, the range of multiple angle intervals of the roll angle is [-180, 180].

The mapping angle interval includes multiple angle intervals corresponding to [-180, -90) or multiple angle intervals corresponding to (90, 180].

Next, with reference to Figure 2, the specific implementation process of the face angle prediction method of this application is introduced in detail.

Please refer to Figure 2. Figure 2 shows a schematic flow chart of a face angle prediction method provided by an embodiment of the present application.

As shown in Figure 2, the face angle prediction method provided by this application may include:

S201. For the roll angle, determine the maximum probability angle interval.

The maximum probability angle interval is the angle interval corresponding to the maximum value among multiple angle probabilities.

For example, the maximum value among multiple angle probabilities for roll angle corresponds to the angle interval [-175, -170).

S202. When the maximum probability angle interval is within a preset mapping angle interval, linearly map multiple angle probabilities of the roll angle to obtain multiple mapped angle probabilities.

Since the angle range of the roll angle is [-180, 180], which is a full circle, the roll angle is periodic. When the maximum probability angle interval lies within the edge angle intervals corresponding to [-180, -90) or (90, 180], the roll angle of the face is considered to be large. When the roll angle of the face is large, the predicted angle of the roll angle calculated by the recognition device according to the method shown in S103 in Figure 1 is inaccurate.

Therefore, when the recognition device determines that the maximum probability angle interval corresponding to the roll angle is in the angle interval of [-90, 90], it can calculate the predicted angle of the roll angle according to the method shown in S103 in Figure 1.

However, when the recognition device determines that the maximum probability angle interval corresponding to the roll angle is within the angle interval corresponding to [-180, -90) or the angle interval corresponding to (90, 180], the recognition device needs to perform linear mapping on the multiple angle probabilities of the roll angle to obtain the mapped multiple angle probabilities.

The recognition device linearly maps the multiple angle probabilities so that, when the maximum probability angle interval corresponding to the roll angle is in the angle interval of [-180, -90) or (90, 180], an accurate angle value of the roll angle can be obtained.

In some embodiments, linear mapping refers to:

Replace the angle probability corresponding to the angle interval greater than or equal to 0 degrees and less than or equal to 180 degrees with the mapped angle probability corresponding to the angle interval greater than or equal to -180 degrees and less than 0 degrees;

And, replace the angle probabilities corresponding to the angle intervals of less than or equal to 0 degrees and greater than or equal to -180 degrees with the mapped angle probabilities corresponding to the angle intervals of less than or equal to 180 degrees and greater than 0 degrees.

For example, when the maximum value among the multiple angle probabilities of the roll angle corresponds to the angle interval [-175, -170), if the predicted angle of the roll angle is calculated directly according to the method shown in S103 in Figure 1, the predicted angle value obtained may be -103.5 degrees. Obviously, -103.5 degrees is not in the angle interval [-175, -170), which is unreasonable.
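The effect can be reproduced with a small numeric experiment. The sketch below assumes the predicted angle is the probability-weighted sum of the intermediate angles, and it uses made-up probabilities: 0.6 in [-175, -170) and 0.4 in (170, 175], two intervals that are only a few degrees apart on the circle but sit at opposite ends of the [-180, 180] range.

```python
def interval_midpoints(lower, upper, step=5):
    """Intermediate angle (midpoint) of each `step`-degree interval in [lower, upper]."""
    count = (upper - lower) // step
    return [lower + step * (i + 0.5) for i in range(count)]


def predicted_angle(probabilities, midpoints):
    """Probability-weighted sum of the intermediate angles (assumed formula)."""
    return sum(p * m for p, m in zip(probabilities, midpoints))


roll_midpoints = interval_midpoints(-180, 180)   # 72 intermediate angles

roll_probabilities = [0.0] * 72
roll_probabilities[1] = 0.6    # interval [-175, -170), intermediate angle -172.5
roll_probabilities[70] = 0.4   # interval (170, 175], intermediate angle 172.5

direct = predicted_angle(roll_probabilities, roll_midpoints)
print(direct)   # roughly -34.5, far from both intervals that hold the probability
```

The two contributions largely cancel, so the directly calculated value lands near the middle of the range even though the face is rolled by almost 180 degrees.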

Therefore, the recognition device needs to first map the multiple angle probabilities of the roll angle, for example, map the angle probability corresponding to the angle interval [-175, -170) to the angle interval (-10, -5], that is, replace the angle probability corresponding to the angle interval [-175, -170) with the angle probability corresponding to the angle interval (-10, -5].

S203. Determine the mapping angle according to the mapped multiple angle probabilities, the number of angle intervals of the roll angle, and the intermediate angle of each angle interval.

For example, according to the angle probability corresponding to the angle interval (-10, -5], the calculated predicted angle value (mapping angle) can be -5 degrees. Obviously -5 degrees is in the angle interval (-10, -5]. This is reasonable.

S204: Perform inverse linear mapping on the mapping angle to obtain the predicted angle of the roll angle.

Inverse linear mapping refers to mapping the predicted angle value (mapping angle) calculated based on the angle probability corresponding to the angle interval (-10, -5] back to the angle interval [-175, -170).

For example, when the angle value obtained based on S203 is -5 degrees, inversely mapping -5 degrees gives an angle value of -175 degrees, which is in the angle interval [-175, -170), which is reasonable.

In this application, for the roll angle, when the recognition device determines that the maximum value among the multiple angle probabilities corresponds to the preset mapping angle interval, it linearly maps the multiple angle probabilities to obtain the mapped multiple angle probabilities, determines the mapping angle based on the mapped multiple angle probabilities, the number of angle intervals of the roll angle and the intermediate angle of each angle interval, and then inversely maps the mapping angle to obtain the predicted angle. Because the predicted angle is derived from the mapping angle rather than calculated directly from the unmapped angle probabilities, a more accurate predicted angle can be obtained.
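The flow of S201 to S204 can be sketched as follows. This is only one possible reading and rests on explicit assumptions: the linear mapping is implemented as a swap of the negative and positive halves of the probability list (a 180-degree circular shift), which follows the wording of the definition of linear mapping but places mapped intervals differently from the worked example in the text; the inverse mapping shifts the mapping angle back by 180 degrees; and the mapping angle is the probability-weighted sum of the intermediate angles. All function names are hypothetical.

```python
def interval_midpoints(lower, upper, step=5):
    """Intermediate angle (midpoint) of each `step`-degree interval in [lower, upper]."""
    count = (upper - lower) // step
    return [lower + step * (i + 0.5) for i in range(count)]


def expected_angle(probabilities, midpoints):
    """Probability-weighted sum of the intermediate angles (assumed formula)."""
    return sum(p * m for p, m in zip(probabilities, midpoints))


def predict_roll(probabilities, step=5):
    midpoints = interval_midpoints(-180, 180, step)
    n = len(midpoints)
    if len(probabilities) != n:
        raise ValueError("one probability is required per roll angle interval")

    # S201: find the maximum probability angle interval.
    peak = max(range(n), key=lambda i: probabilities[i])
    if abs(midpoints[peak]) < 90:
        # Peak lies within [-90, 90]: calculate the predicted angle directly.
        return expected_angle(probabilities, midpoints)

    # S202 (assumed form): swap the two halves, i.e. shift every interval by 180 degrees.
    half = n // 2
    mapped = [probabilities[(j + half) % n] for j in range(n)]

    # S203: mapping angle from the mapped probabilities.
    mapping_angle = expected_angle(mapped, midpoints)

    # S204 (assumed form): undo the 180-degree shift.
    return mapping_angle - 180 if mapping_angle > 0 else mapping_angle + 180


roll_probabilities = [0.0] * 72
roll_probabilities[1] = 0.6    # interval [-175, -170)
roll_probabilities[70] = 0.4   # interval (170, 175]
roll = predict_roll(roll_probabilities)
print(roll)   # close to -178.5, next to the intervals that hold the probability
```

With the same illustrative probabilities, the direct weighted sum would give about -34.5 degrees, so the shift moves the probability mass away from the wrap-around point before averaging.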

Based on the description of S101 in the embodiment shown in Figure 1 above, when the recognition device determines the face area, it can expand the first detection window to obtain the second detection window, intercept the area corresponding to the size of the second detection window, and determine the area corresponding to the size of the second detection window as the face area.

Next, with reference to Figure 3, the specific implementation process of the face angle prediction method of this application is introduced in detail.

Based on the description of S101 in Figure 1, the first detection window can be expanded to obtain more face information corresponding to the face image to be measured, ensuring that the final face angle accuracy is higher.

Please refer to FIG. 3 , which shows a schematic flowchart of a face angle prediction method provided by an embodiment of the present application.

As shown in Figure 3, the face angle prediction method provided by this application may include:

S301. Take the window that is centered on the center of the first detection window and whose side length is the long side of the first detection window as the third detection window.

In some embodiments, face detection is performed on the face image to be tested, and the first detection window obtained is a rectangle.

It can be understood that when the first detection window is a rectangle, the recognition device can obtain the third detection window by taking the length (long side) of the first detection window as the side length and the center of the first detection window as the center.

In other embodiments, face detection is performed on the face image to be tested, and the first detection window obtained is a square.

It can be understood that when the first detection window is a square, step S302 can be directly performed on the first detection window according to the expansion coefficient.

When face detection is performed on the face image to be tested, the first detection window obtained is a rectangle or a square. Which of the two it is usually depends on factors such as the distance from the camera, the facial expression or movement, and the angle of the face.

In a specific embodiment, it is assumed that the first detection window is a rectangle with a length of 60 pixels and a width of 40 pixels. Then, taking the center of the first detection window as the center and the length of the first detection window as the side length of the window, the third detection window obtained is a square with a side length of 60 pixels.

S302. According to the preset expansion coefficient, perform an expansion process on each side length of the third detection window to obtain a fourth detection window.

In some embodiments, the preset expansion coefficient is 0.1. Of course, the expansion coefficient can also be other values, such as 0.15, which can be set according to the actual situation, and will not be described in detail here.

In a specific embodiment, it is assumed that the side length of the third detection window is 60 pixels, and the expansion coefficient is 0.1. Then, the third detection window is expanded, and the fourth detection window obtained is a square with a side length of 66 pixels.

S303. Remove the part of the fourth detection window that extends beyond the corresponding side of the face image to be tested to obtain a fifth detection window.

It can be understood that after the recognition device expands the third detection window, the resulting fourth detection window may extend beyond the original face image to be tested, that is, a side of the fourth detection window exceeds the corresponding side length of the face image to be tested.

When the recognition device determines that the fourth detection window exceeds the corresponding side length of the face image to be tested, it removes the excess part and obtains the fifth detection window.

In a specific embodiment, it is assumed that the face image to be tested has a length of 90 pixels and a width of 60 pixels, and the side length of the fourth detection window is 66 pixels. After the part exceeding the face image to be tested is removed, the resulting fifth detection window has a length of 66 pixels and a width of 60 pixels.

S304. Take the window that is centered on the center of the fifth detection window and whose side length is the shorter side of the fifth detection window as the second detection window.

In a specific embodiment, it is assumed that the length of the fifth detection window is 66 pixels and the width is 60 pixels. Then, taking the center of the fifth detection window as the center and the width of the fifth detection window as the side length, the obtained second detection window is a square with a side length of 60 pixels.

In this application, the recognition device takes the window centered on the center of the first detection window, with the long side of the first detection window as its side length, as the third detection window. According to the preset expansion coefficient, each side length of the third detection window is expanded to obtain the fourth detection window. The part of the fourth detection window that exceeds the corresponding side length of the face image to be tested is removed to obtain the fifth detection window. The window centered on the center of the fifth detection window, with the short side of the fifth detection window as its side length, is taken as the second detection window. Because the recognition device obtains the second detection window by expanding the first detection window, the face area selected from the face image to be tested is larger and includes more face information. The facial features extracted from this face area are more accurate, and predicting the face angle from more accurate facial features yields a more accurate predicted angle.
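Steps S301 to S304 can be sketched as follows. This is a minimal illustration, assuming windows are given as corner coordinates (x0, y0, x1, y1) and that the expansion multiplies the side length by (1 + coefficient), which matches the 60 to 66 pixel example above. The function name, the coordinate convention, and the window position used in the usage note are assumptions; the source gives only window sizes.

```python
def expand_window(box, image_size, coeff=0.1):
    """Return the second detection window for a first detection window.

    box: (x0, y0, x1, y1) of the first detection window (hypothetical layout).
    image_size: (extent along x, extent along y) of the face image to be tested.
    """
    x0, y0, x1, y1 = box
    img_x, img_y = image_size
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2

    # S301: square centered on the first window, side = its long side.
    side = max(x1 - x0, y1 - y0)

    # S302: expand every side length by the preset coefficient.
    half = side * (1 + coeff) / 2

    # S303: remove the parts that extend beyond the image.
    fx0, fy0 = max(cx - half, 0), max(cy - half, 0)
    fx1, fy1 = min(cx + half, img_x), min(cy + half, img_y)

    # S304: square centered on the fifth window, side = its short side.
    cx5, cy5 = (fx0 + fx1) / 2, (fy0 + fy1) / 2
    half2 = min(fx1 - fx0, fy1 - fy0) / 2
    return (cx5 - half2, cy5 - half2, cx5 + half2, cy5 + half2)
```

For a 60 by 40 pixel first window centered in a 90 by 60 pixel image, for example `expand_window((15, 10, 75, 50), (90, 60))`, the intermediate windows have the sizes given in the specific embodiments (60, 66, then 66 by 60), and the result is a 60 pixel square.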

Based on the above description of the embodiment shown in Figure 1, this application also provides a generation process of a face angle recognition model including a backbone network and a fully connected classification network.

Next, with reference to Figure 4, the specific implementation process of generating the face angle recognition model of this application is introduced in detail.

Based on the description of S102 in Figure 1, when the recognition device obtains the facial features corresponding to the facial area, it obtains them through the backbone network in the facial angle recognition model.

Based on the description of S103 in Figure 1, when the recognition device determines the multiple angle probabilities of each angle type according to the facial features, it obtains them through the fully connected classification network in the face angle recognition model.

Among them, the generation process of the face angle recognition model can be completed by a model generation device, or it can be generated by other feasible devices, which will not be described again here.

Please refer to FIG. 4 , which shows a schematic flowchart of generating a face angle recognition model according to an embodiment of the present application.

As shown in Figure 4, the process of generating the face angle recognition model includes:

S401. Obtain a sample face image set.

The sample face image set includes multiple frames of sample face images and the real angle corresponding to each angle type of the face in each frame of the sample face image relative to the shooting position.

Optionally, the sample face image set at least includes a set of sample face images and real angles corresponding to each angle type of the face in the sample face image relative to the shooting position.

The sample face image set can be selected from an existing image data set (for example, the public data set 300W-LP), or it can be a face image captured by a camera in advance.

Among them, when capturing face images through a camera, it is necessary to use a camera with higher precision to capture the sample face from multiple angles in order to obtain sample face images from any angle.

The device that captures the face images can be a standalone camera, a smartphone camera, a laptop camera, or a tablet camera.

Among them, the real angle corresponding to the sample face image can be obtained by using relevant sensors or by manual annotation.

S402: Perform data enhancement processing on each frame of the sample face image to obtain an enhanced sample face image.

Data enhancement processing may include one or a combination of random shearing, adding random noise, and color perturbation.

For example, random shearing, random noise addition, and color perturbation are applied to each frame of sample face image to obtain an enhanced sample face image.
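A minimal sketch of such an enhancement step is shown below, using NumPy. The random shearing step is interpreted here as random cropping, which is an assumption, as are the function name, the parameter values, the Gaussian noise model, and the per-channel scaling used for color perturbation. None of these specifics come from the source.

```python
import numpy as np


def augment(image, rng, crop_frac=0.9, noise_std=5.0, color_jitter=0.1):
    """Apply random crop, additive noise and color perturbation to an HxWx3 uint8 image."""
    h, w, _ = image.shape
    ch, cw = int(h * crop_frac), int(w * crop_frac)

    # Random crop: choose the top-left corner uniformly among valid positions.
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    out = image[top:top + ch, left:left + cw].astype(np.float64)

    # Random noise: zero-mean Gaussian noise per pixel (assumed noise model).
    out = out + rng.normal(0.0, noise_std, size=out.shape)

    # Color perturbation: independent gain per color channel (assumed form).
    out = out * rng.uniform(1 - color_jitter, 1 + color_jitter, size=(1, 1, 3))

    return np.clip(out, 0, 255).astype(np.uint8)
```

A call such as `augment(img, np.random.default_rng(0))` returns a slightly smaller image. These three operations leave the annotated real angles unchanged, whereas a geometric transform that rotates the face would require adjusting the roll label.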

S403. Input the enhanced sample face image into the original angle recognition model, and output multiple angle probabilities for each angle type.

The original face angle recognition model includes the original backbone network and the original fully connected classification network.

The output end of the original backbone network is connected to three fully connected layers of the original fully connected classification network: the first original fully connected layer, the second original fully connected layer, and the third original fully connected layer.

The first original fully connected layer and the second original fully connected layer each have 36 nodes, and the third original fully connected layer has 72 nodes.

The 36 nodes of the first original fully connected layer and of the second original fully connected layer are used to predict the angle probabilities of the yaw angle and the pitch angle, respectively, in their 36 angle intervals, and the 72 nodes of the third original fully connected layer are used to predict the angle probabilities of the roll angle in its 72 angle intervals.

Taking the yaw angle as an example, after the model generation device inputs the sample face image set into the original backbone network, the backbone network outputs facial features, and the first original fully connected layer obtains, based on the facial features, the angle probabilities corresponding to the 36 angle intervals of the yaw angle.
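The three-headed classification structure can be sketched with NumPy as follows. This is only an illustration of the shapes involved: the feature dimension, the random initialization, and the use of softmax to turn node outputs into probabilities are assumptions, and the backbone network itself is not modeled.

```python
import numpy as np

FEATURE_DIM = 128  # hypothetical size of the backbone's feature vector
HEAD_SIZES = {"yaw": 36, "pitch": 36, "roll": 72}  # node counts from the text


def softmax(z):
    """Numerically stable softmax (assumed normalization)."""
    e = np.exp(z - z.max())
    return e / e.sum()


def init_heads(rng):
    """One (weights, bias) pair per fully connected layer."""
    return {
        name: (rng.normal(0.0, 0.01, size=(n, FEATURE_DIM)), np.zeros(n))
        for name, n in HEAD_SIZES.items()
    }


def classify(features, heads):
    """Map one feature vector to angle probabilities for each angle type."""
    return {name: softmax(w @ features + b) for name, (w, b) in heads.items()}
```

Each head consumes the same feature vector, so the yaw, pitch and roll distributions are predicted independently from shared facial features.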

S404. Adjust the model parameters of the original angle recognition model according to the multiple angle probabilities of each angle type and the real angle corresponding to each angle type.

In some embodiments, the model generation device first calculates a loss function based on multiple angle probabilities of each angle type and the real angle corresponding to each angle type, and then adjusts the model parameters of the original angle recognition model through the loss function.

The above loss function is a cross-entropy loss function. Of course, other types of loss functions can also be used, which will not be described in detail here.
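As an illustration of how a cross-entropy loss could be formed from the angle probabilities and the real angles, the sketch below converts each real angle into the index of its angle interval and sums the per-type losses. The interval ranges (yaw and pitch over [-90, 90), roll over [-180, 180), all 5 degrees wide), the unweighted sum across angle types, and all names are assumptions rather than details taken from the source.

```python
import math

# Assumed layout: (lower bound in degrees, number of intervals) per angle type.
RANGES = {"yaw": (-90.0, 36), "pitch": (-90.0, 36), "roll": (-180.0, 72)}


def interval_index(angle, low, width, n_bins):
    """Index of the angle interval that contains the real angle."""
    idx = int((angle - low) // width)
    return min(max(idx, 0), n_bins - 1)  # clamp angles on the outer boundary


def cross_entropy(probs, target_index, eps=1e-12):
    """Cross-entropy against a one-hot target."""
    return -math.log(probs[target_index] + eps)


def total_loss(pred_probs, real_angles, width=5.0):
    """Sum of the per-type cross-entropy losses (assumed combination)."""
    loss = 0.0
    for name, (low, n_bins) in RANGES.items():
        target = interval_index(real_angles[name], low, width, n_bins)
        loss += cross_entropy(pred_probs[name], target)
    return loss
```

In practice the gradient of such a loss with respect to the model parameters would be obtained through a deep learning framework's automatic differentiation rather than by hand.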

S405. Determine the adjusted original angle recognition model as the face angle recognition model.

In some embodiments, the model generation device trains the original angle recognition model according to the loss function through an error backpropagation algorithm to obtain a trained face angle recognition model, and determines the trained model as the face angle recognition model.

In this application, when the model generation device generates the face angle recognition model, it first obtains a sample face image set and performs data enhancement processing on each frame of sample face image to obtain enhanced sample face images. It then inputs the enhanced sample face images into the original angle recognition model, which outputs multiple angle probabilities for each angle type, adjusts the model parameters of the original angle recognition model based on the multiple angle probabilities of each angle type and the real angle corresponding to each angle type, and determines the adjusted original angle recognition model as the face angle recognition model. By dividing the angle ranges of the three angle types according to preset rules, and adjusting the original angle recognition model using the multiple angle probabilities and the real angles corresponding to the three angle types, a face angle recognition model that predicts face angles more accurately can be obtained.

Corresponding to the face angle prediction method described in the embodiment shown in Figure 1, this application also provides a face angle prediction device.

Next, the face angle prediction device provided by an embodiment of the present application will be described in detail with reference to FIG. 5 .

Please refer to FIG. 5 , which shows a schematic block diagram of a face angle prediction device provided by an embodiment of the present application.

As shown in FIG. 5 , a face angle prediction device provided by an embodiment of the present application includes an acquisition module 501 , a first determination module 502 , a second determination module 503 and a third determination module 504 .

The acquisition module 501 is used to acquire the face area of the face image to be tested;

The first determination module 502 is used to determine the facial features corresponding to the facial area;

The second determination module 503 is used to determine multiple angle probabilities of multiple angle types based on the facial features, where the multiple angle probabilities of each angle type respectively correspond to multiple angle intervals of that angle type, and the multiple angle types include a yaw angle, a pitch angle, and a roll angle;

The third determination module 504 is configured to determine the predicted angle of each angle type of the human face in the face image to be measured relative to the shooting position based on multiple angle probabilities of each angle type.

In some embodiments, the third determination module 504 is specifically used to:

For each angle type, the angle of the face in the face image to be measured relative to the shooting position is determined based on multiple angle probabilities, the number of angle intervals, and the intermediate angle of each angle interval.

In some embodiments, the third determination module 504 is specifically used to:

For the roll angle, a maximum probability angle interval is determined, and the maximum probability angle interval is an angle interval corresponding to the maximum value among multiple angle probabilities;

When the maximum probability angle interval is within a preset mapping angle interval, perform linear mapping on multiple angle probabilities of the roll angle to obtain multiple mapped angle probabilities;

Determine the mapping angle according to the mapped multiple angle probabilities, the number of angle intervals of the roll angle, and the intermediate angle of each angle interval;

Perform inverse linear mapping on the mapping angle to obtain the predicted angle of the roll angle.

Replace the angle probabilities corresponding to the angle intervals less than or equal to 0 degrees and greater than or equal to -180 degrees with the mapped angle probabilities corresponding to the angle intervals less than or equal to 180 degrees and greater than 0 degrees.

In some embodiments, the acquisition module 501 is specifically used to:

The method of obtaining the face area of the face image to be tested includes:

Obtain the face image to be tested;

Perform face detection on the face image to be tested to obtain a first detection window, where the first detection window includes at least part of the face image to be tested;

Perform external expansion processing on the first detection window to obtain a second detection window;

In the face image to be tested, intercept an area corresponding to the size of the second detection window;

In some embodiments, the acquisition module 501 is specifically used for:

Take the window that is centered on the center of the first detection window and whose side length is the long side of the first detection window as the third detection window;

According to the preset expansion coefficient, expand each side length of the third detection window to obtain a fourth detection window;

Remove the part of the fourth detection window that exceeds the corresponding side length of the face image to be tested to obtain a fifth detection window;

Take the window that is centered on the center of the fifth detection window and whose side length is the shorter side of the fifth detection window as the second detection window.

In some embodiments, the first determination module 502 is specifically used to:

Input the face area into the backbone network of the face angle recognition model and output the face features. The backbone network is used to extract the face features in the face image;

The facial features are input into the fully connected classification network of the face angle recognition model, and multiple angle probabilities of each angle type are output. The fully connected classification network is used to predict, from the facial features, the angle probabilities corresponding to the multiple angle intervals of each angle type.

In some embodiments, the model generation device is used for:

Obtain a sample face image set, which includes multiple frames of sample face images and the real angle corresponding to each angle type of the face in each frame of sample face image relative to the shooting position;

Perform data enhancement processing on each frame of sample face image to obtain an enhanced sample face image;

Input the enhanced sample face image into the original angle recognition model and output multiple angle probabilities for each angle type. The original face angle recognition model includes the original backbone network and the original fully connected classification network;

Adjust the model parameters of the original angle recognition model according to the multiple angle probabilities of each angle type and the real angle corresponding to each angle type;

The adjusted original angle recognition model is determined as the face angle recognition model.

It should be understood that the device 500 of the present application can be implemented through an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The face angle prediction method shown in Figure 1 can also be implemented through software; in that case, the device 500 and its modules can also be software modules.

Figure 6 is a schematic structural diagram of an electronic device provided by this application. As shown in Figure 6, the device 600 includes a processor 601, a memory 602, a communication interface 603 and a bus 604. Among them, the processor 601, the memory 602, and the communication interface 603 communicate through the bus 604. Communication can also be achieved through other means such as wireless transmission. The memory 602 is used to store instructions, and the processor 601 is used to execute the instructions stored in the memory 602. The memory 602 stores program code 6021, and the processor 601 can call the program code 6021 stored in the memory 602 to execute the face angle prediction method shown in Figure 2.

It should be understood that in this application, the processor 601 may be a CPU, or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor can be a microprocessor or any conventional processor, etc.

The memory 602 may include read-only memory and random access memory and provides instructions and data to the processor 601. The memory 602 may also include non-volatile random access memory. The memory 602 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which serves as an external cache. By way of illustration, but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).

In addition to a data bus, the bus 604 may also include a power bus, a control bus, a status signal bus, etc. However, for clarity of illustration, the various buses are labeled bus 604 in FIG. 6 .

It should be understood that the electronic device 600 according to the present application may correspond to the device 500 in the present application, and may correspond to the device in the method shown in FIG. 1 of the present application. When the device 600 corresponds to the device in the method shown in FIG. 2, the above and other operations and/or functions of each module in the device 600 are respectively intended to implement the operating steps of the method performed by the device in Figure 2. For the sake of brevity, they will not be described again here.

This application also provides a computer-readable storage medium that stores a computer program. When the computer program is executed by a processor, the steps in each of the above method embodiments can be implemented.

The present application provides a computer program product. When the computer program product is run on an electronic device, the electronic device is enabled to implement the steps in each of the above method embodiments.

It should be understood that the sequence number of each step in the above embodiment does not mean the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of this application.

It should be noted that the information interaction, execution process, etc. between the above-mentioned devices/units are based on the same concept as the method embodiments of the present application. For details of their specific functions and technical effects, please refer to the method embodiments section; no further details are given here.

Those skilled in the art can clearly understand that, for convenience and simplicity of description, only the division of the above functional units and modules is used as an example. In actual applications, the above functions can be allocated to different functional units and modules as needed, that is, the internal structure of the above device is divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other and are not used to limit the scope of protection of the present application. For the specific working processes of the units and modules in the above system, please refer to the corresponding processes in the foregoing method embodiments, which will not be described again here.

In the above embodiments, each embodiment is described with its own emphasis. For parts that are not detailed or documented in a certain embodiment, please refer to the relevant descriptions of other embodiments.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented with electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.

In the embodiments provided in this application, it should be understood that the disclosed devices/network devices and methods can be implemented in other ways. For example, the device/network equipment embodiments described above are only illustrative. For example, the division of the above modules or units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.

The units described above as separate components may or may not be physically separated. The components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this application.

The above-described embodiments are only used to illustrate the technical solutions of the present application, but not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or equivalently replace some of the technical features; such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions in the embodiments of this application, and should be included within the protection scope of this application.

Claims

A face angle prediction method, which is characterized by including:

Obtain the face area of the face image to be tested;

Determine the facial features corresponding to the facial area;

According to the facial features, multiple angle probabilities of multiple angle types are determined, where the multiple angle probabilities of each angle type respectively correspond to multiple angle intervals of that angle type, and the multiple angle types include a yaw angle, a pitch angle, and a roll angle;

According to the multiple angle probabilities of each angle type, the predicted angle of each angle type of the face in the face image to be measured relative to the shooting position is determined.
The method of claim 1, wherein determining, based on the multiple angle probabilities of each angle type, the predicted angle of each angle type of the face in the face image to be measured relative to the shooting position includes:

For each angle type, the angle of the face in the face image to be measured relative to the shooting position is determined based on multiple angle probabilities, the number of angle intervals, and the intermediate angle of each angle interval.
The method of claim 2, wherein, for the roll angle, determining, based on the multiple angle probabilities of each angle type, the predicted angle of each angle type of the face in the face image to be measured relative to the shooting position includes:

Determine the maximum probability angle interval, which is the angle interval corresponding to the maximum value among the multiple angle probabilities;

When the maximum probability angle interval is within a preset mapping angle interval, perform linear mapping on multiple angle probabilities of the roll angle to obtain multiple mapped angle probabilities;

Determine the mapping angle according to the mapped multiple angle probabilities, the number of angle intervals of the roll angle, and the intermediate angle of each angle interval;

Perform inverse linear mapping on the mapping angle to obtain the predicted angle of the roll angle.
The method according to claim 3, wherein the range of the multiple angle intervals of the roll angle is greater than or equal to -180 degrees and less than or equal to 180 degrees, the mapping angle interval is the angle interval greater than or equal to -180 degrees and less than or equal to -90 degrees, or greater than or equal to 90 degrees and less than or equal to 180 degrees, and performing linear mapping on the multiple angle probabilities of the roll angle to obtain the mapped multiple angle probabilities includes:

Replace the angle probability corresponding to the angle interval greater than or equal to 0 degrees and less than or equal to 180 degrees with the mapped angle probability corresponding to the angle interval greater than or equal to -180 degrees and less than 0 degrees;

Replace the angle probabilities corresponding to the angle intervals less than or equal to 0 degrees and greater than or equal to -180 degrees with the mapped angle probabilities corresponding to the angle intervals less than or equal to 180 degrees and greater than 0 degrees.
The method according to any one of claims 1 to 4, characterized in that obtaining the face area of the face image to be measured includes:

Obtain the face image to be tested;

Perform face detection on the face image to be tested to obtain a first detection window, where the first detection window includes at least part of the face image to be tested;

Perform external expansion processing on the first detection window to obtain a second detection window;

In the face image to be tested, intercept an area corresponding to the size of the second detection window;

An area corresponding to the size of the second detection window is determined as the face area.
The method according to claim 5, characterized in that, performing an external expansion process on the first detection window to obtain a second detection window includes:

Take the window that is centered on the center of the first detection window and whose side length is the long side of the first detection window as the third detection window;

According to the preset expansion coefficient, expand each side length of the third detection window to obtain a fourth detection window;

Remove the part of the fourth detection window that exceeds the corresponding side length of the face image to be tested to obtain a fifth detection window;

Take the window that is centered on the center of the fifth detection window and whose side length is the shorter side of the fifth detection window as the second detection window.
The method according to any one of claims 1-4, characterized in that,

Determining the facial features corresponding to the facial area includes:

Input the face area into the backbone network of the face angle recognition model and output the facial features, wherein the backbone network is used to extract the facial features in the face image;

Determining multiple angle probabilities for multiple angle types based on the facial features includes:

Input the facial features into the fully connected classification network of the face angle recognition model and output multiple angle probabilities of each angle type, wherein the fully connected classification network is used to predict, from the facial features, the angle probabilities corresponding to the multiple angle intervals of each angle type.
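As a rough illustration of the classification step above, the sketch below applies one fully connected layer per angle type to a feature vector and normalizes the outputs into probabilities. The use of softmax, the separate head per angle type, and the names `classify_angles` and `heads` are assumptions for illustration; the claim only states that the network outputs angle probabilities for the angle intervals of each angle type.

```python
import math

ANGLE_TYPES = ("yaw", "pitch", "roll")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_angles(features, heads):
    """Map a feature vector to per-interval probabilities for each angle type.

    heads: dict mapping angle type -> (weights, biases), where weights has one
    row per angle interval and one column per feature (hypothetical layout).
    """
    probabilities = {}
    for angle_type in ANGLE_TYPES:
        weights, biases = heads[angle_type]
        logits = [sum(w * f for w, f in zip(row, features)) + b
                  for row, b in zip(weights, biases)]
        probabilities[angle_type] = softmax(logits)
    return probabilities
```

Each returned list has one entry per angle interval and sums to 1, so it can be passed directly to whatever rule turns interval probabilities into a predicted angle.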
The method of claim 7, wherein the process of generating the face angle recognition model includes:

Obtain a sample face image set, which includes multiple frames of sample face images and, for each frame, a true angle corresponding to each angle type of the face in that frame relative to the shooting position;

Perform data enhancement processing on each frame of sample face image to obtain an enhanced sample face image;

Input the enhanced sample face image into the original angle recognition model and output multiple angle probabilities for each angle type, wherein the original angle recognition model includes an original backbone network and an original fully connected classification network;

Adjust the model parameters of the original angle recognition model according to the multiple angle probabilities of each angle type and the true angle corresponding to each angle type;

The adjusted original angle recognition model is determined as the face angle recognition model.
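One way the predicted probabilities and the true angles could be compared during training is sketched below. The claim does not name a loss function, so the cross-entropy loss, the 3-degree interval width, the [-180, 180) range, and the function names are all assumptions made for illustration.

```python
import math


def angle_to_bin(angle, lo=-180.0, hi=180.0, bin_width=3.0):
    """Index of the angle interval containing `angle` (assumed uniform intervals)."""
    num_bins = int((hi - lo) / bin_width)
    index = int((angle - lo) // bin_width)
    # Clamp so that angle == hi falls into the last interval.
    return min(max(index, 0), num_bins - 1)


def cross_entropy(probs, true_bin, eps=1e-12):
    """Negative log-probability assigned to the true interval."""
    return -math.log(probs[true_bin] + eps)


def total_loss(probs_by_type, true_angles, **bin_kwargs):
    """Sum the per-angle-type losses (yaw, pitch, roll) for one sample."""
    return sum(
        cross_entropy(probs_by_type[t], angle_to_bin(true_angles[t], **bin_kwargs))
        for t in true_angles
    )
```

The model parameters would then be adjusted to reduce this loss, for example by gradient descent in a deep learning framework.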
A face angle prediction device, characterized by including:

The acquisition module is used to obtain the face area of the face image to be tested;

The first determination module is used to determine the facial features corresponding to the facial area;

The second determination module is used to determine multiple angle probabilities of multiple angle types according to the facial features, wherein the multiple angle probabilities of each angle type respectively correspond to multiple angle intervals of that angle type, and the multiple angle types include a yaw angle, a pitch angle, and a roll angle;

The third determination module is configured to determine the predicted angle of each angle type of the face in the face image to be tested relative to the shooting position based on the multiple angle probabilities of each angle type.
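The computation performed by the third determination module might look like the sketch below, which takes the probability-weighted average of the interval centers. This expectation rule is an assumption chosen for illustration, as are the uniform intervals over [-180, 180) and the name `predicted_angle`; the claim states only that the predicted angle is determined from the multiple angle probabilities.

```python
def predicted_angle(probs, lo=-180.0, hi=180.0):
    """Probability-weighted mean of the interval centers (assumed rule)."""
    width = (hi - lo) / len(probs)
    centers = [lo + (i + 0.5) * width for i in range(len(probs))]
    return sum(p * c for p, c in zip(probs, centers))


# Example with four 90-degree intervals centered at -135, -45, 45 and 135 degrees.
yaw = predicted_angle([0.0, 0.0, 1.0, 0.0])
```

Because the result blends neighbouring intervals, it can fall between interval centers rather than being limited to one value per interval.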
An electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the computer program, it implements the method described in any one of claims 1 to 7.