Facial expression recognition method and device and storage medium (CN112560685A)

Info

Publication number
CN112560685A
Authority
CN
China
Prior art keywords
target
face
facial expression
expression recognition
preset
Legal status
Pending
Application number
CN202011488918.7A
Other languages
Chinese (zh)
Inventor
李剑
苟巍
沈海峰
Current Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202011488918.7A priority Critical patent/CN112560685A/en
Publication of CN112560685A publication Critical patent/CN112560685A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G06V40/165: Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G06V40/168: Feature extraction; Face representation
    • G06V40/174: Facial expression recognition


Abstract

The method includes: acquiring a target face image of a target to be detected; extracting the face key points of the image; determining the position relationships among the key points; and inputting the face key points and the position relationships into a facial expression recognition model, so that the model recognizes the facial expression of the target to be detected. Recognizing facial expressions from face key points rather than from whole face pictures avoids the information redundancy of face pictures and improves recognition accuracy; data input is also faster, shortening the recognition time of the facial expression recognition model. In addition, because the position relationships are input into the preset facial expression recognition model as well, the model receives richer information about the target to be detected, which further improves the accuracy of the subsequent facial expression recognition result.

Description

Facial expression recognition method and device and storage medium
Technical Field
The present application relates to image processing technologies, and in particular, to a method and an apparatus for recognizing facial expressions, and a storage medium.
Background
With continued economic and technological development, people travel in increasingly diverse ways, for example by commercial vehicles such as taxis and online car-hailing vehicles. However, while these travel modes bring convenience, they also create new problems. In the case of online car-hailing, for example, a driver may drive the vehicle in an abnormal state, such as anger, so that passengers face a potential safety hazard when using such services for daily travel.
To address this problem, in the related art a terminal device typically captures a facial image of the driver and performs facial expression recognition on the captured image. For example, taking the driver's mobile phone as the terminal device: while the vehicle is being driven, the phone captures the driver's facial image and then recognizes the driver's expression from it, thereby determining whether the driver is driving the vehicle in an abnormal state.
However, when facial expression recognition is performed directly on the facial image, the image contains considerable redundant information. As a result, recognition accuracy is relatively low, potential safety hazards cannot be discovered in time, and problems arising during driving cannot be correctly intervened in.
Disclosure of Invention
To solve the above problems in the prior art, the present application provides a facial expression recognition method, apparatus, and storage medium.
In a first aspect, an embodiment of the present application provides a facial expression recognition method, where the method includes:
acquiring a target face image of a target to be detected;
extracting a target face key point of the target face image;
determining the position relation between the key points of the target face;
inputting the target face key points and the position relationship into a preset face expression recognition model, wherein the preset face expression recognition model is obtained by training reference face key points, the position relationship among the reference face key points and reference face expressions corresponding to the reference face key points;
and obtaining the target facial expression of the target to be detected according to the output of the preset facial expression recognition model.
In a possible implementation manner, the extracting target face key points of the target face image includes:
inputting the target face image into a preset face key point extraction model, wherein the preset face key point extraction model is obtained by training a reference face image and a reference face key point corresponding to the reference face image;
and obtaining the target face key points according to the output of the preset face key point extraction model.
In one possible implementation manner, the target face key points include feature points of an eyebrow region, an eye region, a nose region, a mouth region and a face contour of the target to be detected;
the inputting the target face key points and the position relationship into a preset facial expression recognition model includes:
inputting the feature points of the eyebrow area, the eye area, the nose area, the mouth area and the face contour of the target to be detected and the position relationship among the feature points into the preset facial expression recognition model, wherein the preset facial expression recognition model is obtained by training the reference facial key points, the position relationship among the reference facial key points and the reference facial expression, and the reference facial key points comprise the feature points of the eyebrow area, the eye area, the nose area, the mouth area and the face contour of the reference target.
In a possible implementation manner, before the inputting the feature points of the eyebrow area, the eye area, the nose area, the mouth area, and the face contour of the target to be detected, and the position relationship between the feature points into the preset facial expression recognition model, the method further includes:
calculating the Euclidean distance between each feature point and the other feature points among the feature points of the eyebrow area, the eye area, the nose area, the mouth area and the face contour of the target to be detected;
and determining the position relation between each feature point and the other feature points according to the Euclidean distance.
In a possible implementation manner, the acquiring a target face image of a target to be detected includes:
performing image preprocessing on at least one face image of the target to be detected, wherein the image preprocessing includes one or more of selecting a frontal face image, adjusting the image to a preset size, and image normalization;
and obtaining the target face image according to the image preprocessing result.
In a possible implementation manner, before the inputting the target face key point and the position relationship into a preset facial expression recognition model, the method further includes:
and carrying out normalization processing on the key points of the target face.
In one possible implementation manner, the preset facial expression recognition model includes: a fully connected layer and a softmax classifier;
the fully connected layer is used for performing feature fusion on the target face key points and the position relationship to generate fused target face features;
and the softmax classifier is connected with the fully connected layer and is used for outputting the target facial expression according to the fused target face features.
In a possible implementation manner, the preset facial expression recognition model further includes: a plurality of convolutional layers;
the plurality of convolutional layers are used for performing convolution operations on the target face key points and the position relationship so as to output target face features of the target to be detected;
and the fully connected layer is connected with the plurality of convolutional layers and is used for acquiring the target face features and performing feature fusion on them to generate the fused target face features.
In a possible implementation manner, before the image preprocessing is performed on the at least one face image of the target to be detected, the method further includes:
and receiving the at least one face image sent by a preset image acquisition device.
In a possible implementation manner, before the image preprocessing is performed on the at least one face image of the target to be detected, the method further includes:
and receiving the at least one face image sent by the terminal equipment of the target to be detected.
In a second aspect, an embodiment of the present application provides a facial expression recognition apparatus, where the apparatus includes:
the image acquisition module is used for acquiring a target face image of a target to be detected;
the key point extraction module is used for extracting target face key points of the target face image;
the relationship determination module is used for determining the position relationship among the key points of the target face;
a key point input module, configured to input the target face key points and the position relationships into a preset facial expression recognition model, where the preset facial expression recognition model is obtained through training reference face key points, position relationships between the reference face key points, and reference facial expressions corresponding to the reference face key points;
and the expression obtaining module is used for obtaining the target facial expression of the target to be detected according to the output of the preset facial expression recognition model.
In a possible implementation manner, the keypoint extraction module is specifically configured to:
inputting the target face image into a preset face key point extraction model, wherein the preset face key point extraction model is obtained by training a reference face image and a reference face key point corresponding to the reference face image;
and obtaining the target face key points according to the output of the preset face key point extraction model.
In one possible implementation manner, the target face key points include feature points of an eyebrow region, an eye region, a nose region, a mouth region and a face contour of the target to be detected;
the key point input module is specifically configured to:
inputting the feature points of the eyebrow area, the eye area, the nose area, the mouth area and the face contour of the target to be detected and the position relationship among the feature points into the preset facial expression recognition model, wherein the preset facial expression recognition model is obtained by training the reference facial key points, the position relationship among the reference facial key points and the reference facial expression, and the reference facial key points comprise the feature points of the eyebrow area, the eye area, the nose area, the mouth area and the face contour of the reference target.
In a possible implementation manner, the key point input module is specifically configured to:
calculating the Euclidean distance between each feature point and the other feature points among the feature points of the eyebrow area, the eye area, the nose area, the mouth area and the face contour of the target to be detected;
and determining the position relation between each feature point and the other feature points according to the Euclidean distance.
In a possible implementation manner, the image obtaining module is specifically configured to:
performing image preprocessing on at least one face image of the target to be detected, wherein the image preprocessing includes one or more of selecting a frontal face image, adjusting the image to a preset size, and image normalization;
and obtaining the target face image according to the image preprocessing result.
In a possible implementation manner, the keypoint input module is further configured to:
and carrying out normalization processing on the key points of the target face.
In one possible implementation manner, the preset facial expression recognition model includes: a fully connected layer and a softmax classifier;
the fully connected layer is used for performing feature fusion on the target face key points and the position relationship to generate fused target face features;
and the softmax classifier is connected with the fully connected layer and is used for outputting the target facial expression according to the fused target face features.
In a possible implementation manner, the preset facial expression recognition model further includes: a plurality of convolutional layers;
the plurality of convolutional layers are used for performing convolution operations on the target face key points and the position relationship so as to output target face features of the target to be detected;
and the fully connected layer is connected with the plurality of convolutional layers and is used for acquiring the target face features and performing feature fusion on them to generate the fused target face features.
In a possible implementation manner, the image obtaining module is further configured to:
and receiving the at least one face image sent by a preset image acquisition device.
In a possible implementation manner, the image obtaining module is further configured to:
and receiving the at least one face image sent by the terminal equipment of the target to be detected.
In a third aspect, an embodiment of the present application provides a facial expression recognition apparatus, including:
a processor;
a memory; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor, the computer program comprising instructions for performing the method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and the computer program causes a server to execute the method according to the first aspect.
In a fifth aspect, the present application provides a computer program product comprising computer instructions that, when executed by a processor, perform the method of the first aspect.
According to the facial expression recognition method, apparatus, and storage medium provided herein, a target face image of the target to be detected is acquired, the face key points of the image are extracted, the position relationships among them are determined, and the key points together with the position relationships are input into a facial expression recognition model, which recognizes the facial expression of the target to be detected. Because the embodiments of the present application recognize facial expressions from face key points, the information redundancy of face pictures is avoided and recognition accuracy is improved. Moreover, during recognition only the target face key points and the position relationships need to be input into the model; compared with inputting a whole face image, data input is faster and the model's recognition time is shortened. Finally, since the position relationships are input into the preset facial expression recognition model along with the feature points, the model receives richer information about the target to be detected, which further improves the accuracy of the subsequent facial expression recognition result.
Drawings
To more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of a facial expression recognition system according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a facial expression recognition method according to an embodiment of the present application;
fig. 3 is a schematic diagram of a target face key point according to an embodiment of the present application;
fig. 4 is a schematic diagram of a preset facial expression recognition model according to an embodiment of the present application;
fig. 5 is a schematic diagram of 7 classes of facial expressions provided by an embodiment of the present application;
fig. 6 is a schematic diagram of another preset facial expression recognition model according to an embodiment of the present application;
fig. 7 is a schematic diagram of a process of performing facial expression recognition by using a preset facial expression recognition model according to an embodiment of the present application;
fig. 8 is a schematic diagram of a training process of a preset face key point extraction model according to an embodiment of the present application;
fig. 9 is a schematic flowchart of another facial expression recognition method according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a facial expression recognition apparatus according to an embodiment of the present application;
fig. 11 shows a schematic diagram of a possible structure of the facial expression recognition device of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," and "fourth," if any, in the description and claims of this application and the above-described figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
When users travel by taxi, online car-hailing, and similar commercial vehicles, some problems arise. In the case of online car-hailing, for example, a driver may drive the vehicle in an abnormal state, such as anger, so passengers face a potential safety hazard in their daily travel. To address this, in the related art a terminal device typically captures a facial image of the driver and performs facial expression recognition on the captured image. For example, taking the driver's mobile phone as the terminal device: while the vehicle is being driven, the phone captures the driver's facial image and recognizes the driver's expression from it, thereby determining whether the driver is driving the vehicle in an abnormal state.
However, the images collected by the terminal device may contain substantial redundant information; for example, the device may also capture seats, decorations, and other objects in the vehicle. This redundant information interferes with image-based facial expression recognition, so the error rate of recognition on the terminal device is relatively high. As a result, it cannot be reliably determined whether the driver is driving in an abnormal state, and potential safety hazards cannot be discovered in time.
Therefore, the embodiments of the present application provide a facial expression recognition method that extracts the face key points from a facial image of the target to be detected, for example the facial image of a driver in a vehicle, and uses those key points to recognize the driver's facial expression and determine whether the driver is driving the vehicle in an abnormal state, for example an angry state. Recognizing facial expressions from face key points avoids the information redundancy of face pictures and improves recognition accuracy.
In the embodiments of the present application, online car-hailing is taken as an example, and the target to be detected may be the driver of a car-hailing vehicle. The driver's face image can be collected by a preset image acquisition device mounted on the vehicle, such as a camera. Taking a camera as the preset image acquisition device, information such as the number, position, and type of cameras may be specified uniformly by the car-hailing management server, which manages each vehicle, for example by auditing the qualifications of the vehicle and its driver and monitoring order taking. The server can check whether the in-vehicle camera installation meets the specification, for example from installation images uploaded by the driver. If it does, the server can establish a connection with the camera. After determining that the vehicle has accepted an order, the server can start the camera to collect the driver's face image, and stop collection once the order is completed.
Alternatively, the face image can be collected by a terminal device of the target to be detected, for example the driver's mobile phone. After auditing the qualifications of the vehicle and driver, the car-hailing management server may, if the audit passes, establish a connection with the driver's phone and monitor order taking through it. After determining that an order has been accepted, the server can send a camera-start prompt to the phone; the driver starts the phone camera according to the prompt and the phone collects the driver's face image. After starting the camera, the driver can feed camera-start information back to the server so that it knows the phone's state in time.
The face image can also be collected by other equipment, such as a dashcam (driving recorder) or cameras along the driving route. For example, the car-hailing management server may establish a connection with the dashcam in the vehicle, and after an order is accepted the dashcam collects the driver's face image. For vehicles with a fixed driving route, the server can also obtain the driver's face image from cameras along the route.
After the driver's face image has been acquired, the embodiments of the present application extract the face key points of the image so as to recognize the driver's facial expression and determine whether the driver is driving the vehicle in an abnormal state, for example an angry state.
The face key point extraction, facial expression recognition, and so on can be executed by the car-hailing management server. Suppose the face image is collected by the preset image acquisition device mounted on the vehicle; after collection, the device sends the image to the server. The server extracts the face key points of the image and uses them to recognize the driver's facial expression, thereby determining whether the driver is driving in an abnormal state. If the server recognizes the driver's expression as angry, it can send the recognition result to the relevant supervisor so that the supervisor can intervene in the driving in time, for example by stopping dispatching orders to the driver.
Optionally, fig. 1 is a schematic diagram of a facial expression recognition system architecture provided in an embodiment of the present application. In fig. 1, the target to be detected is the driver of a car-hailing vehicle. The architecture includes a car-hailing management server 11 and a preset image acquisition device 12.
It should be understood that the illustrated structure does not specifically limit the facial expression recognition architecture. In other possible embodiments, the architecture may include more or fewer components than shown, combine some components, split some components, or arrange components differently, as determined by the practical application scenario; no limitation is made here. The components shown in fig. 1 may be implemented in hardware, software, or a combination of software and hardware.
In a specific implementation, the car-hailing management server 11 may first determine whether to start the preset image acquisition device 12 in the vehicle. For example, after determining that the vehicle has accepted an order, the server 11 may start the device 12 to collect the face image of the driver, stopping collection once the order is completed. Alternatively, the server 11 may decide whether to start the device 12 according to the driver's order-taking evaluation. For example, if the evaluation is poor, say more than two complaints were received in the last month, the server 11 starts the device 12 and collects the driver's face image until the order is completed.
After the preset image acquisition device 12 has started and collected the driver's face image, it can send the image to the server 11. On receiving the image, the server 11 can extract its face key points and use them to recognize the driver's facial expression, thereby determining whether the driver is driving the vehicle in an abnormal state, for example an angry state. Recognizing facial expressions from face key points avoids the information redundancy of face pictures and improves recognition accuracy.
In addition, when there are many car-hailing vehicles, the management server receives data at a large scale and requires substantial computing resources, which may lengthen the detection period. To address this, in the embodiments of the present application the face key point extraction, facial expression recognition, and so on may instead be executed by the terminal device of the target to be detected, for example the driver's mobile phone. The device 12 may send the collected face image to the driver's phone; the phone extracts the face key points of the image and uses them to recognize the driver's expression, determining whether the driver is driving in an abnormal state. The server 11 may configure the phone accordingly, for example: when the phone recognizes the driver's expression as angry, it immediately reports the recognition result to the server 11. The server 11 can then send the result to the relevant supervisor so that the supervisor can intervene in the driving in time.
Here, performing facial expression recognition with the terminal device and the car-hailing management server in combination makes full use of the terminal device's computing capacity, reduces the server's computing load, and increases the server's processing speed.
If the terminal device, for example the driver's phone, cannot perform the face key point extraction, facial expression recognition, and so on, it may send a processing request to the server 11. On receiving the request, the server 11 may send an information acquisition request to the device 12, which accordingly sends the collected image to the server 11. The server 11 then performs the face key point extraction, facial expression recognition, and so on based on the received information.
The technical solutions of the present application are described below with several embodiments as examples, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 2 is a schematic flow diagram of a facial expression recognition method provided in an embodiment of the present application. The method may be executed by any device capable of performing it, and that device may be implemented by software and/or hardware. As shown in fig. 2, on the basis of the system architecture shown in fig. 1, the method includes the following steps:
s201: and acquiring a target face image of the target to be detected.
The target to be detected is determined by the actual situation. For example, when the purpose is to determine whether a driver is driving the vehicle in an abnormal state while the vehicle is in motion, the target to be detected is the driver in the vehicle.
In the embodiments of the present application, the car-hailing management server described above is taken as the execution subject by way of example. The target face image may be a face image of the target to be detected received directly from the preset image acquisition device, or one received from the terminal device of the target to be detected, for example sent by the driver's mobile phone.
Here, to improve the accuracy of subsequent facial expression recognition, the car-hailing management server may perform image preprocessing on the received face images. For example, the preset image acquisition device (or, likewise, the terminal device of the target to be detected) may send at least one face image of the target to the server. After receiving the images, the server may preprocess them, for example by selecting a frontal face image, adjusting the image to a preset size, and normalizing it, and take the processed image as the target face image. The preset size may be determined according to the actual situation and is not specifically limited in the embodiments of the present application.
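As an illustration of this preprocessing, the Python sketch below selects a frontal face image from a batch, resizes it to a preset size, and normalizes the pixel values. It is a minimal sketch, not the patent's implementation: the OpenCV Haar-cascade frontal-face test, the preprocess function name, and the 112 x 112 preset size are assumptions.

import cv2
import numpy as np

PRESET_SIZE = (112, 112)  # hypothetical preset size; the patent leaves this open

def preprocess(images):
    """Return one preprocessed target face image, or None if no frontal face."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            continue  # no frontal face detected; try the next image
        x, y, w, h = faces[0]
        face = img[y:y + h, x:x + w]              # crop the detected face
        face = cv2.resize(face, PRESET_SIZE)      # adjust to the preset size
        return face.astype(np.float32) / 255.0    # image normalization
    return None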
S202: and extracting the target face key points of the target face image.
For example, the car-hailing management server may input the target face image into a preset face key point extraction model and obtain the target face key points according to the model's output. The preset face key point extraction model is obtained by training with reference face images and the reference face key points corresponding to those images.
Here, the reference face image may be a face image of a reference target, and the reference face key point may be a real face key point corresponding to the reference face image. The reference target may be determined according to actual conditions, such as one or more drivers in the vehicle.
In this embodiment, the target face key points may include feature points of an eyebrow region, an eye region, a nose region, a mouth region, and a face contour of the target to be detected. As shown in fig. 3, the above target face key points may include 98 feature points (e.g., feature points numbered 0-97 in fig. 3) of an eyebrow region, an eye region, a nose region, a mouth region, and a face contour of the target to be detected.
S203: and determining the position relation among the key points of the target face.
Illustratively, the car-hailing management server calculates the Euclidean distance between each of the target face key points and the other key points, and then determines the position relationship between each key point and the others according to those Euclidean distances.
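The position relationships can be computed, for example, as pairwise Euclidean distances. The NumPy sketch below is an illustration rather than the patent's code; note that the embodiment described later computes the distances within each facial region separately rather than across all 98 points at once.

import numpy as np

def pairwise_distances(pts):
    """pts: (N, 2) array of keypoint coordinates; returns the N*(N-1)/2 unique
    Euclidean distances, counting each unordered pair of points once."""
    diff = pts[:, None, :] - pts[None, :, :]   # (N, N, 2) coordinate differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # (N, N) distance matrix
    iu = np.triu_indices(len(pts), k=1)        # strict upper triangle
    return dist[iu]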
S204: and inputting the target face key points and the position relation into a preset face expression recognition model, wherein the preset face expression recognition model is obtained by training reference face key points, the position relation among the reference face key points and reference face expressions corresponding to the reference face key points.
The reference facial expression corresponding to the reference facial key point can be a real facial expression corresponding to the reference facial key point.
Here, the preset facial expression recognition model may include a fully connected layer and a softmax classifier, as shown by way of example in fig. 4. The fully connected layer performs feature fusion on the target face key points and the position relationships to generate fused target face features. The softmax classifier is connected with the fully connected layer and outputs the target facial expression according to the fused target face features.
Illustratively, suppose the target face key points include the 98 feature points above, expanded in the form "abscissa, ordinate", i.e. "x1, y1, x2, y2, ..., x98, y98". The car-hailing management server further calculates the Euclidean distance between feature points and determines the position relationships from those distances. Suppose facial expressions fall into 7 classes in total (the 6 basic expressions happy, sad, surprised, angry, disgusted, and fearful, plus 1 neutral state, as shown in fig. 5). The fully connected layer then performs feature fusion on the target face key points and the position relationships to generate fused target face features for the 7 classes, and a connected 7-class softmax classifier outputs the target facial expression according to the fused features.
In an embodiment of the application, to reduce the number of parameters the fully connected layer must process, the preset facial expression recognition model may further include a plurality of convolutional layers. For example, as shown in fig. 6, the convolutional layers perform convolution operations on the target face key points and the position relationships to output the target face features of the target to be detected. The fully connected layer is connected with the convolutional layers, acquires the target face features, and performs feature fusion on them to generate the fused target face features. Assuming facial expressions fall into the 7 classes above, the convolutional layers may use a kernel size of 1 × 1 (7 kernels in total, matching the number of facial expression classes).
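For concreteness, the following PyTorch sketch shows models with the shapes described above. It is an illustration under stated assumptions, not the patent's network: the 1256-dimensional input (196 keypoint coordinates plus 1060 region-wise distances, as worked out later in the description) and the 7 expression classes come from the text, while the activation and the pooling that keeps the fully connected layer small are assumptions.

import torch
import torch.nn as nn

class FCExpressionNet(nn.Module):
    """Fully connected variant: a 1256 x 7 fully connected layer performs
    feature fusion, and a 7-class softmax classifier outputs the expression."""
    def __init__(self, in_dim=1256, num_classes=7):
        super().__init__()
        self.fc = nn.Linear(in_dim, num_classes)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):              # x: (batch, 1256) keypoints + distances
        return self.softmax(self.fc(x))

class ConvExpressionNet(nn.Module):
    """Variant with size-1 convolutions (7 kernels, one per expression class)
    in front of the fully connected fusion layer."""
    def __init__(self, in_dim=1256, num_classes=7):
        super().__init__()
        self.conv = nn.Conv1d(1, num_classes, kernel_size=1)  # 7 kernels of size 1
        self.fc = nn.Linear(num_classes, num_classes)         # small fusion layer
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):              # x: (batch, 1256)
        feats = torch.relu(self.conv(x.unsqueeze(1)))  # (batch, 7, 1256)
        # Pooling is an assumption made so that the convolutions actually
        # reduce the parameters handled by the fully connected layer.
        pooled = feats.mean(dim=-1)                    # (batch, 7)
        return self.softmax(self.fc(pooled))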
In addition, before inputting the target face key points and the position relationships into the preset facial expression recognition model, the car-hailing management server may also preprocess the target face key points, for example by normalization, so that the subsequent computation converges and proceeds normally.
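A minimal example of such keypoint normalization follows; min-max scaling into [0, 1] is one common choice, as the patent does not specify the scheme.

import numpy as np

def normalize_keypoints(kps):
    """Min-max normalize an (N, 2) array of (x, y) coordinates into [0, 1]."""
    mins, maxs = kps.min(axis=0), kps.max(axis=0)
    return (kps - mins) / (maxs - mins + 1e-8)  # epsilon avoids division by zero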
S205: and obtaining the target facial expression of the target to be detected according to the output of the preset facial expression recognition model.
For example, as shown in fig. 7, the car-hailing management server may perform facial expression recognition with the preset facial expression recognition model as follows: first obtain the target face image of the target to be detected, extract the target face key points of the image, and determine the position relationships among them; then input the key points and position relationships into the preset facial expression recognition model to obtain the target facial expression of the target to be detected.
After obtaining the target facial expression, the car-hailing management server may determine from it whether the target to be detected, such as the driver, is driving the vehicle in an abnormal state, such as anger. For example, if the server recognizes the driver's expression as angry, it determines that the driver is driving in an abnormal state. The server can send the recognition result to the relevant supervisor so that problems arising during driving can be correctly intervened in, for example by stopping dispatching orders to the driver.
In the embodiments of the present application, the car-hailing management server first acquires the target face image of the target to be detected, then extracts the face key points of the image and inputs them into the facial expression recognition model, which recognizes the facial expression of the target. Recognizing facial expressions from face key points avoids the information redundancy of face pictures and improves recognition accuracy. Moreover, during recognition only the target face key points need to be input into the model; compared with inputting a whole face image, data input is faster and the model's recognition time is shortened. In addition, since the position relationships are input into the preset model along with the feature points, the model receives richer information about the target to be detected, which further improves the accuracy of the subsequent facial expression recognition result.
Here, before inputting the target face image into the preset face key point extraction model, the car-hailing management server needs to train that model, so that the target face image can subsequently be fed into the trained model and the target face key points obtained from its output. During training, the server inputs a reference face image into the model and determines the output accuracy by comparing the face key points output by the model with the reference face key points corresponding to the image. If the output accuracy is below a preset accuracy threshold, the server adjusts the model to improve the accuracy, takes the adjusted model as the new preset face key point extraction model, and repeats the step of inputting the reference face image.
Taking online car-hailing as an example, the reference face image may be collected by a preset image acquisition device mounted on the vehicle, such as a camera; by a terminal device of the driver, for example the driver's mobile phone; or by other equipment such as a dashcam or cameras along the driving route. The specific process is the same as that for obtaining the target face image described above and is not repeated here.
The reference face key points corresponding to the reference face image may be obtained by the car-hailing management server using the PropagationNet algorithm.
After inputting the reference face image into the preset face key point extraction model, the car-hailing management server obtains the face key points output by the model and compares them with the reference face key points corresponding to the image, for example by computing their similarity. The server determines the output accuracy from that similarity. If the output accuracy is below the preset accuracy threshold, the model's key point extraction is still poor: the server adjusts the model, takes the adjusted model as the new preset face key point extraction model, and repeats the steps above until the output accuracy is greater than or equal to the threshold, at which point training stops. The preset accuracy threshold may be set according to the actual situation, for example 90% or 95%, and is not specifically limited in the embodiments of the present application. As shown in fig. 8, the training process is: the server first obtains training samples, which include the reference face images; it then trains the preset face key point extraction model on those samples as described above, stopping when the output accuracy reaches the preset accuracy threshold and yielding the trained model.
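The training loop described above can be sketched as follows. This is a hedged illustration: the loss, optimizer, and the definition of "output accuracy" as the fraction of predicted points lying within a tolerance of the reference points are assumptions; the patent fixes only the train-until-the-preset-accuracy-threshold control flow.

import torch
import torch.nn as nn

def train_keypoint_model(model, loader, threshold=0.95, tol=0.05, lr=1e-3):
    """Train until the output accuracy reaches the preset accuracy threshold."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    accuracy = 0.0
    while accuracy < threshold:
        hits, total = 0, 0
        for ref_images, ref_kps in loader:  # reference images and their keypoints
            pred = model(ref_images)        # e.g. (batch, 98, 2) predicted points
            loss = loss_fn(pred, ref_kps)
            opt.zero_grad()
            loss.backward()
            opt.step()
            # Tolerance-based accuracy: an assumed stand-in for the patent's
            # unspecified similarity comparison.
            hits += ((pred - ref_kps).norm(dim=-1) < tol).sum().item()
            total += ref_kps.shape[0] * ref_kps.shape[1]
        accuracy = hits / max(total, 1)     # output accuracy over this pass
    return model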
In addition, before inputting the target face key points into the preset facial expression recognition model, the car-hailing management server also needs to train that model, so that the target face key points and the position relationships among them can subsequently be fed into the trained model and the facial expression of the target to be detected obtained from its output. During training, the model takes as input reference face key points and the position relationships among them, and outputs a facial expression, for example the driver's. The server inputs the reference face key points and their position relationships into the model and checks whether the output facial expression matches the reference facial expression corresponding to the reference face key points; if not, it adjusts the model until the output is the reference facial expression. The reference facial expression may be the real facial expression, for example the driver's.
In the embodiments of the present application, the target face key points include the feature points of the eyebrow area, eye area, nose area, mouth area, and face contour of the target to be detected. When the car-hailing management server inputs the target face key points into the preset facial expression recognition model, it also takes into account the position relationships among them and inputs those relationships into the model as well. Fig. 9 is a schematic flowchart of another facial expression recognition method provided in an embodiment of the present application. As shown in fig. 9, the method includes:
s901: and acquiring a target face image of the target to be detected.
S902: and extracting the target face key points of the target face image.
Steps S901 to S902 are the same as steps S201 to S202 and are not described here again.
S903: and determining the position relation among the eyebrow area, the eye area, the nose area, the mouth area and the feature points of the face contour of the target to be detected.
For example, as shown in fig. 3, the target face key points may include 98 feature points of an eyebrow region, an eye region, a nose region, a mouth region, and a face contour of the target to be detected.
The car-hailing management server may calculate the Euclidean distance between each feature point of the eyebrow area, eye area, nose area, mouth area, and face contour of the target to be detected and the other feature points, and determine the position relationships according to those Euclidean distances.
For example, in fig. 3 the face contour has 33 points. The car-hailing management server calculates the Euclidean distance between each of these 33 points and the other 32 points, which yields C(33, 2) = 528 distances. Adding the Euclidean distances computed in the same way within the eyebrow, eye, nose, and mouth regions gives 1060 distances in total, from which the position relationships among the feature points of the eyebrow area, eye area, nose area, mouth area, and face contour of the target to be detected are determined.
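These counts can be checked directly. In the sketch below the per-region point counts (33 contour, 18 eyebrow, 18 eye, 9 nose, and 20 mouth points, summing to 98) are an assumption chosen to be consistent with the totals given in the description.

from math import comb

regions = {"contour": 33, "eyebrows": 18, "eyes": 18, "nose": 9, "mouth": 20}
counts = {name: comb(n, 2) for name, n in regions.items()}
print(counts["contour"])              # C(33, 2) = 528 distances within the contour
print(sum(counts.values()))           # 528 + 153 + 153 + 36 + 190 = 1060 in total
print(2 * 98 + sum(counts.values()))  # 196 coordinates + 1060 distances = 1256

The final line matches the 1256 × 7 fully connected layer dimension given below.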
S904: and inputting feature points of the eyebrow area, the eye area, the nose area, the mouth area and the face contour of the target to be detected and the position relationship into a preset facial expression recognition model, wherein the preset facial expression recognition model is obtained by referring to key points of a human face, the position relationship among the key points of the human face and the reference facial expression, and the key points of the human face comprise the feature points of the eyebrow area, the eye area, the nose area, the mouth area and the face contour of the reference target.
Here, the car-hailing management server inputs not only the feature points but also the position relationships into the preset facial expression recognition model, so that the model receives richer information about the target to be detected, further improving the accuracy of the subsequent facial expression recognition result.
The position relationship between the reference face key points may be a real position relationship between the reference face key points.
Here, the preset facial expression recognition model is again taken to include a fully connected layer and a softmax classifier. Suppose the target face key points comprise the 98 feature points above, with 1060 Euclidean distances in total among the feature points of the eyebrow, eye, nose, and mouth regions and the face contour, and that the 98 points are expanded in the form "abscissa, ordinate", i.e. "x1, y1, x2, y2, ..., x98, y98" (196 coordinates). The model input is then 196 + 1060 = 1256 values, and with the 7 facial expression classes of fig. 5 the dimension of the fully connected layer may be 1256 × 7. The layer performs feature fusion on the feature points and the position relationships to generate fused target face features for the 7 classes, and a connected 7-class softmax classifier outputs the target facial expression according to the fused features.
S905: and obtaining the target facial expression of the target to be detected according to the output of the preset facial expression recognition model.
The implementation of step S905 is the same as that of step S204, and is not described herein again.
According to this embodiment of the application, both the feature points and the position relationship are input into the preset facial expression recognition model, so that the information about the target to be detected provided to the model is richer, which further improves the accuracy of the subsequent facial expression recognition result. In addition, performing facial expression recognition based on face key points avoids the information redundancy of raw face pictures and improves the recognition accuracy. Moreover, during facial expression recognition only the target face key points need to be input into the model; compared with inputting a whole face image, the input data is smaller and faster to feed in, which shortens the recognition time of the facial expression recognition model.
Here, before inputting the feature points of the eyebrow region, the eye region, the nose region, the mouth region, and the face contour of the target to be detected, together with the position relationship, into the preset facial expression recognition model, the network car booking management server needs to train the model, so that the feature points and the position relationship can subsequently be input into the trained model and the target facial expression of the target to be detected output by the model can be obtained. During training, the network car booking management server may input the reference face key points and the position relationship among the reference face key points into the preset facial expression recognition model, and then determine whether the facial expression output by the model is the same as the reference facial expression. If not, the preset facial expression recognition model is adjusted until the facial expression it outputs is the reference facial expression.
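For ease of understanding, a minimal training sketch is given below. Cross-entropy loss and the Adam optimizer stand in for the unspecified "adjustment" of the model, and the ExpressionClassifier from the earlier sketch is reused; all of these choices are assumptions, not details prescribed by the application.

    import torch
    import torch.nn as nn

    def train(model: nn.Module,
              features: torch.Tensor,  # (N, 1256) reference key points + distances
              labels: torch.Tensor,    # (N,) reference expression indices in [0, 7)
              epochs: int = 50) -> None:
        """Adjust the model until its outputs match the reference expressions."""
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()  # operates on raw logits
        for _ in range(epochs):
            logits = model.fc(features)  # logits before the softmax
            loss = loss_fn(logits, labels)
            optimizer.zero_grad()
            loss.backward()              # "adjust the preset model"
            optimizer.step()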
Corresponding to the facial expression recognition method in the foregoing embodiment, fig. 10 is a schematic structural diagram of a facial expression recognition apparatus provided in an embodiment of the present application. For convenience of explanation, only the portions related to the embodiments of the present application are shown. The facial expression recognition apparatus 100 includes: an image acquisition module 1001, a key point extraction module 1002, a relationship determination module 1003, a key point input module 1004, and an expression obtaining module 1005. The facial expression recognition apparatus may be the network car booking management server itself, or a chip or integrated circuit that implements the functions of the network car booking management server. It should be noted here that the division into the image acquisition module, the key point extraction module, the relationship determination module, the key point input module, and the expression obtaining module is merely a division of logical functions; physically, these modules may be integrated or kept independent.
The image acquisition module 1001 is configured to acquire a target face image of a target to be detected.
A key point extraction module 1002, configured to extract the target face key points of the target face image.
A relationship determination module 1003, configured to determine the position relationship between the target face key points.
A key point input module 1004, configured to input the target face key points and the position relationship into a preset facial expression recognition model, where the preset facial expression recognition model is obtained through training with reference face key points, the position relationships between the reference face key points, and the reference facial expressions corresponding to the reference face key points.
An expression obtaining module 1005, configured to obtain a target facial expression of the target to be detected according to the output of the preset facial expression recognition model.
In one possible design, the keypoint extraction module 1002 is specifically configured to:
inputting the target face image into a preset face key point extraction model, wherein the preset face key point extraction model is obtained by training a reference face image and a reference face key point corresponding to the reference face image;
and obtaining the target face key points according to the output of the preset face key point extraction model.
In a possible implementation manner, the target face key points include feature points of an eyebrow region, an eye region, a nose region, a mouth region and a face contour of the target to be detected.
The keypoint input module 1004 is specifically configured to:
inputting the feature points of the eyebrow area, the eye area, the nose area, the mouth area and the face contour of the target to be detected and the position relationship among the feature points into the preset facial expression recognition model, wherein the preset facial expression recognition model is obtained by training the reference facial key points, the position relationship among the reference facial key points and the reference facial expression, and the reference facial key points comprise the feature points of the eyebrow area, the eye area, the nose area, the mouth area and the face contour of the reference target.
In a possible implementation manner, the keypoint input module 1004 is specifically configured to:
calculating Euclidean distance between each characteristic point and other characteristic points in the characteristic points of the eyebrow area, the eye area, the nose area, the mouth area and the face contour of the target to be detected;
and determining the position relation between each feature point and the other feature points according to the Euclidean distance.
In a possible implementation manner, the image acquisition module 1001 is specifically configured to:
performing image preprocessing on at least one face image of the target to be detected, wherein the image preprocessing includes one or more of screening out a frontal face image, resizing the image to a preset size, and image normalization;
and obtaining the target face image according to the image preprocessing result.
In a possible implementation manner, the keypoint input module 1004 is further configured to:
and carrying out normalization processing on the key points of the target face.
In one possible implementation manner, the preset facial expression recognition model includes: a full connectivity layer and a softmax classifier.
And the full connection layer is used for carrying out feature fusion on the target face key points and the position relation so as to generate fused target face features.
And the softmax classifier is connected with the full connection layer and is used for outputting the target facial expression according to the fused target facial features.
In a possible implementation manner, the preset facial expression recognition model further includes: a plurality of convolutional layers.
And the plurality of convolution layers are used for performing convolution operation on the target face key points and the position relation so as to output the target face characteristics of the target to be detected.
And the full connection layer is connected with the plurality of convolution layers and is used for acquiring the target human face features and performing feature fusion on the target human face features to generate the fused target human face features.
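For ease of understanding, a minimal sketch of this convolutional variant is given below, treating the 1256 input values as a one-dimensional signal passed through a plurality of convolution layers before the full connection (fusion) layer; the layer count, channel widths, kernel sizes, and pooling are all assumptions.

    import torch
    import torch.nn as nn

    class ConvExpressionClassifier(nn.Module):
        """A plurality of convolution layers feeding the full connection layer."""

        def __init__(self, num_classes: int = 7):
            super().__init__()
            self.convs = nn.Sequential(
                nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
                nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(64),  # compress to a fixed-length feature
            )
            self.fc = nn.Linear(32 * 64, num_classes)  # fusion into 7 classes

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, 1256) key points + position relationship
            features = self.convs(x.unsqueeze(1))  # (batch, 32, 64) face features
            logits = self.fc(features.flatten(1))  # feature fusion
            return torch.softmax(logits, dim=-1)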
In a possible implementation manner, the image acquisition module 1001 is further configured to:
and receiving the at least one face image sent by a preset image acquisition device.
In a possible implementation manner, the image acquisition module 1001 is further configured to:
and receiving the at least one face image sent by the terminal equipment of the target to be detected.
The apparatus provided in the embodiment of the present application may be configured to implement the technical solution of the method embodiment, and the implementation principle and the technical effect are similar, which are not described herein again in the embodiment of the present application.
Optionally, fig. 11 schematically provides a possible basic hardware architecture of the facial expression recognition apparatus according to the present application.
Referring to fig. 11, a facial expression recognition device 1100 includes at least one processor 1101 and a communication interface 1103. Further optionally, a memory 1102 and a bus 1104 may also be included.
The facial expression recognition device 1100 may be the aforementioned network car booking management server, which is not specifically limited in this application. In the facial expression recognition apparatus 1100, the number of processors 1101 may be one or more; fig. 11 illustrates only one of the processors 1101. Optionally, the processor 1101 may be a central processing unit (CPU), a graphics processing unit (GPU), or a digital signal processor (DSP). If the facial expression recognition apparatus 1100 has a plurality of processors 1101, the types of the plurality of processors 1101 may be different or the same. Optionally, the plurality of processors 1101 of the facial expression recognition apparatus 1100 may also be integrated into a multi-core processor.
The memory 1102 stores computer instructions and data, for example, the computer instructions and data required to implement the facial expression recognition method provided herein, such as instructions for implementing the steps of that method. The memory 1102 may be any one or any combination of the following storage media: nonvolatile memory (e.g., read-only memory (ROM), solid state drive (SSD), hard disk drive (HDD), optical disc) and volatile memory.
The communication interface 1103 may provide information input/output for the at least one processor, and may further include any one or any combination of the following devices with a network access function: a network interface (e.g., an Ethernet interface), a wireless network card, and the like.
Optionally, the communication interface 1103 may also be used for the facial expression recognition device 1100 to perform data communication with other computing devices or terminals.
Further alternatively, fig. 11 shows bus 1104 as a thick line. The bus 1104 may connect the processor 1101 with the memory 1102 and the communication interface 1103. Thus, via bus 1104, processor 1101 can access memory 1102 and can also interact with other computing devices or terminals using communication interface 1103.
In the present application, the facial expression recognition device 1100 executes the computer instructions in the memory 1102, so that the facial expression recognition device 1100 implements the facial expression recognition method provided in the present application, or the facial expression recognition device 1100 deploys the facial expression recognition apparatus.
From the viewpoint of logical function division, illustratively, as shown in fig. 11, the memory 1102 may include an image acquisition module 1001, a key point extraction module 1002, a relationship determination module 1003, a key point input module 1004, and an expression obtaining module 1005. This division only means that the instructions stored in the memory can, when executed, implement the functions of the image acquisition module, the key point extraction module, the relationship determination module, the key point input module, and the expression obtaining module, respectively; it does not limit the physical structure.
In addition, the facial expression recognition device may be implemented by software as shown in fig. 11, or may be implemented by hardware as a hardware module or a circuit unit.
The present application provides a computer-readable storage medium storing computer instructions that instruct a computing device to execute the above facial expression recognition method provided in the present application.
The present application provides a computer program product comprising computer instructions that, when executed by a processor, perform the above facial expression recognition method provided by the present application.
The present application provides a chip comprising at least one processor and a communication interface providing information input and/or output for the at least one processor. Further, the chip may also include at least one memory for storing computer instructions. The at least one processor is used for calling and running the computer instructions to execute the facial expression recognition method provided by the application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

Claims (13)

1. A facial expression recognition method is characterized by comprising the following steps:
acquiring a target face image of a target to be detected;
extracting a target face key point of the target face image;
determining the position relation between the key points of the target face;
inputting the target face key points and the position relationship into a preset face expression recognition model, wherein the preset face expression recognition model is obtained by training reference face key points, the position relationship among the reference face key points and reference face expressions corresponding to the reference face key points;
and obtaining the target facial expression of the target to be detected according to the output of the preset facial expression recognition model.
2. The method according to claim 1, wherein the extracting the target face key points of the target face image comprises:
inputting the target face image into a preset face key point extraction model, wherein the preset face key point extraction model is obtained by training a reference face image and a reference face key point corresponding to the reference face image;
and obtaining the target face key points according to the output of the preset face key point extraction model.
3. The method according to claim 1 or 2, wherein the target face key points comprise feature points of an eyebrow region, an eye region, a nose region, a mouth region and a face contour of the target to be detected;
inputting the target face key points and the position relationship into a preset facial expression recognition model, wherein the method comprises the following steps:
inputting the feature points of the eyebrow area, the eye area, the nose area, the mouth area and the face contour of the target to be detected and the position relationship among the feature points into the preset facial expression recognition model, wherein the preset facial expression recognition model is obtained by training the reference facial key points, the position relationship among the reference facial key points and the reference facial expression, and the reference facial key points comprise the feature points of the eyebrow area, the eye area, the nose area, the mouth area and the face contour of the reference target.
4. The method according to claim 3, wherein before inputting the feature points of the eyebrow area, the eye area, the nose area, the mouth area and the face contour of the object to be detected, and the position relationship between the feature points into the preset facial expression recognition model, the method further comprises:
calculating Euclidean distance between each characteristic point and other characteristic points in the characteristic points of the eyebrow area, the eye area, the nose area, the mouth area and the face contour of the target to be detected;
and determining the position relation between each feature point and the other feature points according to the Euclidean distance.
5. The method according to claim 1 or 2, wherein the acquiring of the target face image of the target to be detected comprises:
performing image preprocessing on at least one face image of the target to be detected, wherein the image preprocessing comprises one or more of screening out a frontal face image, resizing the image to a preset size and image normalization;
and obtaining the target face image according to the image preprocessing result.
6. The method according to claim 1 or 2, wherein before the inputting the target face key point and the position relationship into a preset facial expression recognition model, the method further comprises:
and carrying out normalization processing on the key points of the target face.
7. The method according to claim 1 or 2, wherein the preset facial expression recognition model comprises: a full connectivity layer and softmax classifier;
the full connection layer is used for carrying out feature fusion on the target face key points and the position relation to generate fused target face features;
and the softmax classifier is connected with the full connection layer and is used for outputting the target facial expression according to the fused target facial features.
8. The method of claim 7, wherein the preset facial expression recognition model further comprises: a plurality of convolutional layers;
the plurality of convolution layers are used for performing convolution operation on the target face key points and the position relation so as to output target face characteristics of the target to be detected;
and the full connection layer is connected with the plurality of convolution layers and is used for acquiring the target human face features and performing feature fusion on the target human face features to generate the fused target human face features.
9. The method according to claim 5, wherein before the image preprocessing of the at least one face image of the object to be detected, the method further comprises:
and receiving the at least one face image sent by a preset image acquisition device.
10. The method according to claim 5, wherein before the image preprocessing of the at least one face image of the object to be detected, the method further comprises:
and receiving the at least one face image sent by the terminal equipment of the target to be detected.
11. A facial expression recognition apparatus, comprising:
the image acquisition module is used for acquiring a target face image of a target to be detected;
the key point extraction module is used for extracting target face key points of the target face image;
the relationship determination module is used for determining the position relationship among the key points of the target face;
a key point input module, configured to input the target face key points and the position relationships into a preset facial expression recognition model, where the preset facial expression recognition model is obtained through training reference face key points, position relationships between the reference face key points, and reference facial expressions corresponding to the reference face key points;
and the expression obtaining module is used for obtaining the target facial expression of the target to be detected according to the output of the preset facial expression recognition model.
12. A facial expression recognition apparatus, comprising:
a processor;
a memory; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1-10.
13. A computer-readable storage medium, characterized in that it stores a computer program that causes a server to execute the method of any one of claims 1-10.
CN202011488918.7A 2020-12-16 2020-12-16 Facial expression recognition method and device and storage medium Pending CN112560685A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011488918.7A CN112560685A (en) 2020-12-16 2020-12-16 Facial expression recognition method and device and storage medium


Publications (1)

Publication Number Publication Date
CN112560685A true CN112560685A (en) 2021-03-26

Family

ID=75064246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011488918.7A Pending CN112560685A (en) 2020-12-16 2020-12-16 Facial expression recognition method and device and storage medium

Country Status (1)

Country Link
CN (1) CN112560685A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657188A (en) * 2021-07-26 2021-11-16 浙江大华技术股份有限公司 Face age identification method, system, electronic device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination