CN110633665A - Recognition method, device and storage medium - Google Patents

Recognition method, device and storage medium

Info

Publication number: CN110633665A (granted as CN110633665B)
Application number: CN201910839265.3A
Authority: CN (China)
Prior art keywords: channel, face image, face, image, obtaining
Other languages: Chinese (zh)
Inventor: 郑雷
Original assignee / current assignee: Zhuo Erzhi Lian Wuhan Research Institute Co Ltd
Application filed by Zhuo Erzhi Lian Wuhan Research Institute Co Ltd
Priority to CN201910839265.3A
Publication of CN110633665A; application granted; publication of CN110633665B
Legal status: Granted; Active (the legal status listed is an assumption, not a legal conclusion)

Classifications

    • G06N3/045 — Physics; Computing; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
    • G06N3/048 — Physics; Computing; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Activation functions
    • G06N3/08 — Physics; Computing; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Learning methods
    • G06V40/168 — Physics; Image or video recognition or understanding; Recognition of biometric, human-related or animal-related patterns in image or video data; Human or animal bodies; Human faces, e.g. facial parts, sketches or expressions; Feature extraction; Face representation


Abstract

The embodiment of the application discloses an identification method, device and storage medium, wherein the method comprises the following steps: obtaining at least two face images; for each face image in the at least two face images, obtaining at least one channel image of the face image; obtaining a channel attention map and a spatial attention map of the face image based on the at least one channel image, wherein the channel attention map is at least characterized by an image of a face target region, and the spatial attention map is at least characterized by the position of the target region in the face image; obtaining a feature map of the face image based on the spatial attention map of the face image; and identifying the fatigue state of the face based on the feature maps of the face images.

Description

Recognition method, device and storage medium
Technical Field
The present application relates to image processing technologies, and in particular, to an identification method, device, and storage medium.
Background
In the related art, to address the safety risk of driving a vehicle in a fatigued state, the fatigue state of a person can be identified from a face image. Such identification schemes mostly follow three steps: collecting a face image, locating specific regions of the face image such as the eyes, and judging the fatigue state. To ensure identification accuracy, a plurality of specific regions are usually selected, and the fatigue state is judged by combining image regions such as the eyes, mouth and eyebrows. In practical applications, however, the facial features of different individuals such as drivers differ, and their fatigue states manifest differently, so the accuracy of the fatigue-state identification methods in the related art still needs to be improved.
Disclosure of Invention
In order to solve the existing technical problem, an embodiment of the present application provides an identification method, which can at least improve the accuracy of detecting a fatigue state.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides an identification method, which comprises the following steps:
obtaining at least two face images;
for each of the at least two face images,
obtaining at least one channel image of the face image;
obtaining a channel attention map and a spatial attention map of the face image based on the at least one channel image, wherein the channel attention map is at least characterized by an image of a face target region, and the spatial attention map is at least characterized by the position of the target region in the face image;
obtaining a feature map of the face image based on the spatial attention map of the face image;
and identifying the fatigue state of the face based on the feature maps of the face images.
In the foregoing solution, the identifying a fatigue state of a human face based on a feature map of each human face image includes:
based on the feature map of each face image, obtaining context information between adjacent images, wherein the context information is characterized by the change of a face target area on the image;
according to the context information, obtaining a first probability value that the face is in a fatigue state and/or a second probability value that the face is in a non-fatigue state;
and identifying the fatigue state of the face according to the first probability value and/or the second probability value.
In the foregoing solution, the obtaining a channel attention map and a spatial attention map of the face image based on the at least one channel image includes:
obtaining a channel attention diagram of the face image based on at least one channel image of the face image;
obtaining a spatial attention diagram of the face image based on a channel attention diagram of the face image;
and obtaining a feature map of the face image based on the spatial attention map of the face image.
In the foregoing aspect, the method includes:
obtaining three channel images of the face image;
performing first compression on the channel images to obtain two first target images, wherein the first compression is the compression of the images in a space dimension;
obtaining a weight parameter of each channel image based on at least two first target images;
and obtaining a channel attention diagram of the face image based on each channel image of the face image and the weight parameter of each channel image.
In the foregoing solution, the obtaining a spatial attention map of the face image based on the channel attention map of the face image includes:
performing second compression on the channel attention diagram to obtain two second target images, wherein the second compression is the compression of the images in the channel dimension;
obtaining weight parameters of at least two second target images based on the second target images;
and obtaining a spatial attention diagram of the face image according to the second target image and the weight parameter.
In the foregoing solution, the channel image is a channel image of the face image in each of at least two of N convolutional layers, where N is a positive integer greater than or equal to 2;
correspondingly, the obtaining a channel attention diagram and a spatial attention diagram of the face image based on the at least one channel image, and obtaining a feature map of the face image based on the spatial attention diagram of the face image, includes:
obtaining a channel attention diagram and a space attention diagram of the face image at a corresponding layer based on the channel image of the corresponding layer;
and obtaining the feature map of the face image at the corresponding layer based on the spatial attention map of the face image at the corresponding layer.
In the foregoing aspect, the method further includes:
and obtaining the feature map of each face image based on the feature maps of the face image at each of the at least two layers.
In the foregoing solution, the identifying the fatigue state of the face according to the first probability value and/or the second probability value includes:
determining the fatigue state under the condition that the first probability value is larger than the second probability value;
and determining the non-fatigue state when the first probability value is smaller than the second probability value.
An embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program is configured to implement the steps of the foregoing method when executed by a processor.
An embodiment of the present application provides an image processing apparatus, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to perform the steps of the foregoing method.
In the identification method, device and storage medium of the embodiments of the application, the method comprises: obtaining at least two face images; for each face image in the at least two face images, obtaining at least one channel image of the face image; obtaining a channel attention map and a spatial attention map of the face image based on the at least one channel image, wherein the channel attention map is at least characterized by an image of a face target region, and the spatial attention map is at least characterized by the position of the target region in the face image; obtaining a feature map of the face image based on the spatial attention map of the face image; and identifying the fatigue state of the face based on the feature maps of the face images.
By combining the channel information and the spatial characteristics, the method and device obtain the feature map of the face image from both angles, so that the feature map is more accurate and the identification accuracy of the fatigue state is at least improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a schematic flow chart illustrating an implementation of a first embodiment of an identification method provided in the present application;
fig. 2 is a schematic flow chart illustrating an implementation of a second embodiment of the identification method provided in the present application;
fig. 3 is a schematic flow chart of an implementation of a third embodiment of the identification method provided in the present application;
fig. 4 is a schematic flow chart of an implementation of a fourth embodiment of the identification method provided in the present application;
FIG. 5 is a schematic diagram illustrating the operation of an embodiment of a CBAM attention module provided herein;
FIG. 6 is a schematic diagram of a hardware configuration of an embodiment of the identification device provided in the present application;
fig. 7 is a schematic structural diagram of an identification device according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. In the present application, the embodiments and features of the embodiments may be arbitrarily combined with each other without conflict. The steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.
A first embodiment of the identification method provided by the present application is applied to an identification device, as shown in fig. 1, the method includes:
Step (S) 101: obtaining at least two face images;
In this step, two or more face images are collected.
S102: for each face image in the at least two face images, obtaining at least one channel image of the face image;
S103: obtaining a channel attention map and a spatial attention map of the face image based on the at least one channel image, wherein the channel attention map is at least characterized by an image of a face target region, and the spatial attention map is at least characterized by the position of the target region in the face image;
S104: obtaining a feature map of the face image based on the spatial attention map of the face image;
S105: identifying the fatigue state of the face based on the feature maps of the face images.
The entity executing steps 101-105 is an identification device. Steps 102-104 are the process of operating on each face image in the at least two face images obtained in step 101 to obtain a feature map of each face image; step 105 identifies the fatigue state based on the feature maps of the face images obtained in step 101.
As will be understood by those skilled in the art, the face image may be divided into channel images according to image attributes such as image color, and different channel images (channel information) generally represent different feature information of the face image. In the embodiment of the application, the feature map of the face image is obtained based on the channel images of the face image, which helps ensure the accuracy of the obtained feature map. In addition, specific regions such as the eyes and mouth are taken as the target regions of the face, and the positions of these specific regions in the face image are regarded as the spatial characteristics; by combining the channel information and the spatial characteristics, the embodiment of the application obtains the feature map of the face image from both angles, so that the feature map is more accurate and the identification accuracy of the fatigue state can be improved.
For example, the facial image may be divided into an R channel image, a G channel image and a B channel image according to RGB (red, green and blue) color attributes, and different channel images represent features of the facial image in different colors of red, green and blue.
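As an illustration only (not part of the patent text), the following minimal Python/NumPy sketch splits an image into its R, G and B channel images; the file name is hypothetical.

```python
# Minimal sketch: splitting a face image into R, G and B channel images.
# "face.jpg" is a hypothetical file name used only for illustration.
import numpy as np
from PIL import Image

face = np.asarray(Image.open("face.jpg").convert("RGB"))  # shape (H, W, 3)

r_channel = face[:, :, 0]  # red channel image, shape (H, W)
g_channel = face[:, :, 1]  # green channel image, shape (H, W)
b_channel = face[:, :, 2]  # blue channel image, shape (H, W)

print(r_channel.shape, g_channel.shape, b_channel.shape)
```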
A second embodiment of the identification method provided by the present application is applied to an identification device, as shown in fig. 2, the method includes:
S201: obtaining at least two face images;
In this step, two or more face images are collected.
S202: for each face image in the at least two face images, obtaining at least one channel image of the face image;
S203: obtaining a channel attention map and a spatial attention map of the face image based on the at least one channel image, wherein the channel attention map is at least characterized by an image of a face target region, and the spatial attention map is at least characterized by the position of the target region in the face image;
S204: obtaining a feature map of the face image based on the spatial attention map of the face image;
S205: obtaining context information between adjacent images based on the feature map of each face image, wherein the context information is characterized by the change of a face target area on the image;
S206: according to the context information, obtaining a first probability value that the face is in a fatigue state and/or a second probability value that the face is in a non-fatigue state;
S207: identifying the fatigue state of the face according to the first probability value and/or the second probability value.
The entity for executing steps 201-207 is an identification device.
In the embodiment of the application, the feature map of the face image is obtained from the two angles of the channel information and the spatial characteristics of the face image, so that the obtained feature map is more accurate and the identification accuracy of the fatigue state can be improved. In addition, the fatigue state is identified not only from the characteristics of a single face image but also from the correlation between adjacent images, i.e. the context information between adjacent images such as the change of the face target region; this further ensures the accuracy of fatigue-state identification and enables high-precision recognition of the driver's state.
As an implementation manner, step 207, i.e. identifying the fatigue state of the face according to the first probability value and/or the second probability value, may be: determining the fatigue state when the first probability value is larger than the second probability value; and determining the non-fatigue state when the first probability value is smaller than the second probability value. Determining whether the face is in a fatigue state based on the magnitude relation between the first probability value and the second probability value ensures identification accuracy and is easy to implement in engineering.
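Purely as an illustrative sketch (the function name and the tie-breaking behaviour when the two probabilities are equal are assumptions, not taken from the text), the decision rule can be written as:

```python
def classify_fatigue(first_probability: float, second_probability: float) -> str:
    """Return "fatigue" when the fatigue probability is larger, else "non-fatigue".

    The equal-probability case is not specified in the text; it falls to
    "non-fatigue" here purely as an illustrative choice.
    """
    return "fatigue" if first_probability > second_probability else "non-fatigue"

print(classify_fatigue(0.83, 0.17))  # -> fatigue
```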
In the first embodiment and/or the second embodiment of the foregoing identification method, as shown in fig. 3, the obtaining a channel attention map and a spatial attention map of the face image based on the at least one channel image includes:
S301: obtaining a channel attention diagram of the face image based on at least one channel image of the face image;
S302: obtaining a spatial attention diagram of the face image based on the channel attention diagram of the face image;
S303: obtaining a feature map of the face image based on the spatial attention map of the face image.
The entity for executing steps 301-303 is an identification device. In the above steps, the channel attention diagram of the face image is calculated, and then the spatial attention diagram is calculated according to the channel attention diagram, and the feature map of the face image is obtained. The fatigue state is identified from the two angles of the channel information and the spatial characteristics of the face image, and the identification accuracy and precision of the fatigue state can be further improved.
As a specific implementation manner, based on the foregoing steps 301 to 303, the method further includes:
obtaining three channel images of the face image; performing first compression on the three channel images to obtain two first target images, wherein the first compression is the compression of the images in the spatial dimension; obtaining a weight parameter of each channel image based on at least two first target images; and obtaining a channel attention diagram of the face image based on each channel image of the face image and the weight parameter of each channel image. Here, dividing the face image into channel images, obtaining a weight parameter of each channel image based on the channel images after spatial compression, and assigning a weight to a channel image corresponds to highlighting a useful channel image (a channel image including a target area) and fading a useless channel image.
As a specific implementation manner, based on the foregoing steps 301 to 303, the obtaining a spatial attention map of the face image based on the channel attention map of the face image may be:
performing second compression on the channel attention diagram to obtain two second target images, wherein the second compression is the compression of the images in the channel dimension; obtaining weight parameters of at least two second target images based on the second target images; and obtaining a spatial attention diagram of the face image according to the second target image and the weight parameter. Here, obtaining the weight parameter of the (second) target image based on the (second) target image subjected to the channel dimension compression, and obtaining the spatial attention map of the face image, by the foregoing scheme, is equivalent to highlighting the position of a specific region such as eyes and mouth in the useful channel image in the face image, and fading the position of an image region such as a nose, an ear, or the like which is not very useful for fatigue state detection.
In the foregoing solution, the channel image is a channel image of each of at least two layers of N convolutional layers of the face image, where N is a positive integer greater than or equal to 2;
correspondingly, the obtaining a channel attention diagram and a spatial attention diagram of the face image based on the at least one channel image, and obtaining a feature map of the face image based on the spatial attention diagram of the face image, includes:
obtaining a channel attention diagram and a spatial attention diagram of the face image at a corresponding layer based on the channel image of the corresponding layer; obtaining a feature map of the face image at the corresponding layer based on the spatial attention map of the face image at the corresponding layer; and obtaining the feature map of each face image based on the feature maps of the face image at each of the at least two layers, so as to identify the fatigue state of the face based on the feature map of each face image.
It should be known to those skilled in the art that convolutional layers may be used to obtain a feature map of a face image, and that different convolutional layers obtain different feature information of the face image; for example, a low convolutional layer is used to obtain low-level information such as the contour position and edge position of the face, and a high convolutional layer is used to obtain high-level information such as contour details and edge details of the face. Extracting the feature map with convolutional layers also helps ensure robustness.
The present application will be described in further detail with reference to fig. 4 and 5 and the specific embodiments.
In the embodiment of the application, in order to ensure the accuracy and robustness of identifying the fatigue state, a convolutional neural network based on the Convolutional Block Attention Module (CBAM) and a bidirectional Gated Recurrent Unit (GRU) model are adopted to detect the fatigue state of the driver.
It should be appreciated by those skilled in the art that a CBAM-based convolutional neural network may be regarded as a convolutional neural network with a CBAM attention module embedded in it, comprising at least the CBAM attention module and the convolutional neural network. The convolutional neural network comprises at least a plurality of convolutional layers, fully-connected layers and an output layer. The attention mechanism provided by the CBAM attention module can focus more attention on the regions of concern; in a face image, the regions worth paying attention to are those helpful for identifying the fatigue state, such as the eyes and mouth. Further, the CBAM attention module in the embodiments of the present application includes a channel attention module and a spatial attention module. The channel attention module is used to distinguish which of the input channel images are worth attention and which need their attention faded. The spatial attention module is used to determine, within the channel images of interest, the positions of the regions such as the eyes and mouth that are helpful for fatigue-state detection. In the embodiment of the application, the CBAM-based convolutional neural network first uses the CBAM attention module to determine the channel images worth paying attention to, then determines the positions of the regions helpful for fatigue-state detection, such as the eyes and mouth, in those channel images, and inputs the result to the convolutional layer for feature extraction of the face image. This is equivalent to having the CBAM attention module extract the useful information in the face image before it enters the convolutional layer, so that the convolutional layer extracts features of the face image from this useful information, which ensures the accuracy of the feature map and in turn improves the accuracy of fatigue-state recognition based on the feature map.
As shown in fig. 4, the main process of detecting the fatigue state of the driver by the CBAM-based convolutional neural network and the bidirectional Gated Recurrent Unit (GRU) model according to the embodiment of the present application is as follows:
step 401: shooting a face image of a driver through an image acquisition device to obtain a face video;
In this step, the image capturing device may be any device capable of capturing a face image, such as an infrared camera, a fisheye camera, a plane camera, a depth camera, and the like. In the embodiment of the application, the face of the driver is captured with an infrared camera. In practical applications, an infrared camera with a focal length of 3.5 mm (millimeters) and a viewing angle of 60 degrees can be used to capture the face image of the driver, considering that in practical use the infrared camera can photograph the face both in daytime and in the dark. To meet the requirement of night shooting, the infrared camera in the embodiment of the application is additionally fitted with an 850 nm (nanometer) infrared light source for supplementary lighting, and an 850 nm narrow-band filter is used to reduce the interference of light of other wavelengths. This design of the infrared camera ensures the clarity and accuracy of the captured face images, provides clear and accurate face images for the subsequent steps, and improves robustness.
Step 402: acquiring a plurality of face images from a face video to obtain a face image sequence;
In this step, a segment of a certain duration is read at random from the face video, and face images are acquired from that segment at a certain acquisition frequency; for example, acquiring at 12 frames/second from a 30 s segment yields 30 s × 12 frames/second = 360 face images. It can be understood that in the embodiment of the present application the acquisition is sequential, and the 360 acquired face images can be combined into a face image sequence according to their time information.
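As a rough illustration of this sampling step (my own sketch using OpenCV; the video path, the assumption that the clip already shows the cropped face, and the rounding of the frame step are not from the text), the 30 s × 12 frames/second acquisition could look like:

```python
import cv2

def sample_face_sequence(video_path: str, duration_s: int = 30, rate_hz: int = 12):
    """Sample `duration_s * rate_hz` frames (360 here) from a face video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or rate_hz      # fall back if FPS is unknown
    step = max(int(round(fps / rate_hz)), 1)        # keep every `step`-th frame
    wanted = duration_s * rate_hz                   # 30 s x 12 frames/s = 360 images
    frames, idx = [], 0
    while len(frames) < wanted:
        ok, frame = cap.read()
        if not ok:                                  # clip shorter than expected
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames                                   # time-ordered face image sequence

sequence = sample_face_sequence("driver_face.mp4")  # hypothetical file name
print(len(sequence))
```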
Step 403: inputting each face image in the face image sequence into a convolutional neural network based on CBAM to obtain a feature vector of the face image sequence;
the convolutional neural network in the embodiment of the application adopts a deep convolutional neural network, which comprises 10 layers, wherein the convolutional layers are 7 layers, the full-connection layer is 2 layers, and the output layer is 1 layer. Those skilled in the art will appreciate that each of the 7 convolutional layers are connected in sequence, and the 7 th convolutional layer, the 2 fully-connected layers, and the output layer are connected in sequence. Among them, it can be understood that the role of convolutional layers in a deep convolutional neural network is to find a feature map. For example, the low convolution layer is used to obtain low-layer information such as a contour position and an edge position of a face, and the high convolution layer is used to obtain high-layer information such as a contour detail and an edge detail of the face. The size and number of convolution kernels used in convolution layers in embodiments of the present application may be any reasonable number. For example, 30, 60, 120, 240, 480, 960 convolution kernels are used in sequence for 7 convolution layers, and the sizes of the convolution kernels are 16 × 16, 12 × 12, 9 × 9, 7 × 7, 5 × 5, 3 × 3, 2 × 2, respectively. In consideration of the computational complexity, in the embodiment of the present application, a maximum pooling layer needs to be accessed after the 1 st, 2 nd, 3 th and 5 th convolutional layers to reduce the dimensionality of data, so as to avoid an excessive amount of computation.
In the embodiment of the application, a CBAM attention module is added before the convolution of convolutional layers 1, 2, 4 and 7. The CBAM attention module includes a channel attention module and a spatial attention module. The processing of the CBAM attention module before the convolution of the 1st convolutional layer is taken as an example: any face image in the face image sequence is split according to its R, G, B components to obtain three channel images, which can be represented by a matrix F of dimension H × W × C. Here H and W are parameters of the spatial dimension, H being the length of the channel image and W its height; C is a parameter of the channel dimension whose value is the number of channel images, here C = 3. The working principle of the CBAM attention module is shown in fig. 5. In the CBAM attention module, the three channel images are first processed by the channel attention module, which distinguishes among them the channel images of interest and the channel images whose attention needs to be faded. In particular, the processing is as follows.
The three channel images are compressed in the spatial dimension; specifically, global average pooling and global max pooling over the spatial dimensions are performed to obtain two 1 × 1 × C images (the first target images). The two spatially compressed first target images are then fed into a two-layer fully-connected neural network. As will be understood by those skilled in the art, the two first target images serve as inputs to the two-layer fully-connected neural network: each input is connected to each neuron (node) of the first layer, the output of each neuron of the first layer serves as the input to the neurons of the second layer, each connection carries a weight parameter, and the logistic (Sigmoid) function is used as the activation function. Feeding the two spatially compressed first target images into the two-layer fully-connected neural network is equivalent to assigning weight parameters to the three channel images. Through the automatic learning of the two-layer fully-connected neural network, a larger weight parameter can be assigned to the channel image worth paying attention to among the three channel images, and a smaller weight parameter can be assigned to the channel image whose attention needs to be faded. The channel image of interest is generally the one containing more useful information, for example more of the specific regions such as the eyes and mouth, while the channel image whose attention is to be faded usually contains less useful information. In the fully-connected neural network, the assignment of weight parameters to the channel images complies with the constraint that the weight parameters assigned to the channel images sum to 1. Assuming the weight parameters learned automatically by the two-layer fully-connected neural network form a matrix Mc, the matrix F is multiplied by these weight parameters to obtain the channel attention map F1. Multiplying the matrix F by the Mc matrix is equivalent to multiplying each of the three channel images by its respective weight parameter; a channel image containing more useful information has a larger weight, so it is highlighted in the channel attention map, while channel images containing less useful information are faded.
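A minimal PyTorch sketch of this channel attention computation (my own, following the pooling / two-layer fully-connected network / Sigmoid description above; the hidden-layer reduction ratio is an assumption, and the sum-to-one constraint mentioned above is not enforced here):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Spatial avg/max pooling -> shared two-layer FC network -> Sigmoid weights Mc."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.mlp = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:   # f: (B, C, H, W)
        avg = f.mean(dim=(2, 3))                           # 1 x 1 x C (average-pooled)
        mx = f.amax(dim=(2, 3))                            # 1 x 1 x C (max-pooled)
        mc = torch.sigmoid(self.mlp(avg) + self.mlp(mx))   # per-channel weights Mc
        return f * mc[:, :, None, None]                    # channel attention map F1
```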
After the processing by the channel attention module, processing by the spatial attention module is required. The channel attention map F1 is compressed in the channel dimension: specifically, an average pooling operation and a max pooling operation are applied to F1 over the channel dimension to obtain two H × W × 1 images (the second target images), and the two images are concatenated. The concatenated image is processed by a convolutional layer with a 5 × 5 convolution kernel and the Sigmoid activation function to obtain a matrix of weight coefficients Ms. The weight coefficients Ms are multiplied by the matrix F1 to obtain the spatial attention map F2. The spatial attention map F2 amounts to determining, from the spatial characteristics, the importance of the coordinate points in the channel images: the weight parameter of a coordinate point of interest in a channel image of interest, such as a coordinate point in a specific region like the eyes or mouth, is usually larger, while the weight parameter of a coordinate point of no interest, such as a coordinate point of the face background, is usually smaller. It can be understood that, whereas the channel attention module highlights the channel images containing more useful information with larger weight parameters in F1, the spatial attention module determines, within those channel images, the positions of the specific regions such as the eyes and mouth that are helpful for fatigue-state detection. After the above processing, the spatial attention map F2 is input to the 1st convolutional layer and convolved to obtain the feature map of the face image at that layer.
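A matching PyTorch sketch of the spatial attention computation (again my own illustration of the description above; the "same" padding that keeps the H × W size is an assumption, and the combined-use lines reuse the ChannelAttention sketch given earlier):

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Channel-wise avg/max pooling -> concatenation -> 5x5 conv + Sigmoid -> Ms."""
    def __init__(self, kernel_size: int = 5):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, f1: torch.Tensor) -> torch.Tensor:   # f1: (B, C, H, W)
        avg = f1.mean(dim=1, keepdim=True)                  # H x W x 1 (average-pooled)
        mx = f1.amax(dim=1, keepdim=True)                   # H x W x 1 (max-pooled)
        ms = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # weight coefficients Ms
        return f1 * ms                                      # spatial attention map F2

# Combined use, mirroring a CBAM module placed before a convolutional layer
# (ChannelAttention is the sketch shown above):
f = torch.randn(1, 3, 224, 224)
f2 = SpatialAttention()(ChannelAttention(3)(f))             # F2 is then fed to the conv layer
print(f2.shape)
```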
The above description takes as an example the CBAM attention module added before the convolution of the 1st convolutional layer; the processing of the other convolutional layers with a CBAM attention module added, i.e. the 2nd, 4th and 7th convolutional layers, can refer to the foregoing process for the 1st convolutional layer and is not repeated here. It is worth noting that for the CBAM attention module added before the 1st convolutional layer, its input F consists of the three channel components of the acquired (original) face image, whereas the CBAM attention modules added before the 2nd, 4th and 7th convolutional layers take as input the channel components of the feature map of the face image, because a convolutional layer already precedes them and has extracted the feature map of the corresponding layer.
Each face image in the face image sequence is processed sequentially by the 7 convolutional layers to obtain its feature map; the feature map of each face image is then input to the fully-connected layers and output through the output layer to obtain the feature vector of each face image. Assuming the fully-connected layers employ 1024 neurons, the output matrix Fa of the output layer has dimensions 360 × 1024: each of the 360 collected face images is processed by the deep convolutional neural network with CBAM attention modules embedded before some of its convolutional layers, yielding a 1024-dimensional feature vector for each face image.
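Tying the sketches together (purely illustrative; the preprocessing of the 360 face images into a tensor is not shown and the batching strategy is an assumption), the 360 × 1024 matrix Fa could be obtained as:

```python
import torch

backbone = FatigueBackbone()                       # the CNN sketch from above (CBAM omitted)
backbone.eval()

sequence_tensor = torch.randn(360, 3, 224, 224)    # placeholder for the preprocessed sequence
with torch.no_grad():
    # In practice the 360 images would be processed in smaller batches.
    fa = torch.cat([backbone(chunk) for chunk in sequence_tensor.split(36)])
print(fa.shape)                                    # torch.Size([360, 1024]) -> matrix Fa
```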
In this scheme, the CBAM-based deep convolutional neural network uses the CBAM attention module to screen out the channel images worth paying attention to from the plurality of channel images, then determines in those channel images the positions of the regions helpful for fatigue-state detection, such as the eyes and mouth, and inputs the result of the CBAM attention module to the convolutional layer for feature extraction of the face image.
Step 404: inputting the feature vectors of the face image sequence into the GRU model to obtain a fatigue-state recognition result for the collected face image sequence.
As will be appreciated by those skilled in the art, the GRU model is a variant of the Long Short-Term Memory (LSTM) network and can model correlations between events over time. In the embodiment of the application, the face images in the face image sequence are ordered by time and input into the GRU model one by one, so that context information between two adjacent face images can be obtained. For example, a blink is detected in two adjacent images when the eye region changes from the 1st to the 2nd face image, such as from open in the 1st image to closed in the 2nd image; similarly, an opening and closing of the mouth is detected when the mouth region is open in the 1st image and closed in the 2nd image. The GRU model in the embodiment of the application is trained in advance on how the face regions helpful for fatigue-state detection, such as the eyes and mouth, change in a fatigue state and how they change in a non-fatigue state. Based on the context information and this pre-trained knowledge, the GRU model uses the normalized exponential (softmax) function to calculate, from the 360 collected face images, the probability that the face is in a fatigue state and the probability that it is in a non-fatigue state. Suppose the calculated probability of the fatigue state is the first probability value and the probability of the non-fatigue state is the second probability value; the two are compared and the larger one is taken as the result. Specifically, when the first probability value is larger than the second probability value, the face is judged to be in a fatigue state based on the 360 collected face images; when the first probability value is smaller than the second probability value, the face is determined to be in a non-fatigue state based on the 360 collected face images, thereby realizing the identification of the fatigue state based on the 360 collected face images.
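As a final illustrative sketch (my own; the hidden size, the use of the last time step as the clip summary, and the single-layer bidirectional GRU are assumptions not specified in the text), the sequence classifier could look like:

```python
import torch
import torch.nn as nn

class FatigueGRUClassifier(nn.Module):
    """Bidirectional GRU over the per-frame 1024-dim feature vectors, softmax output."""
    def __init__(self, feature_dim: int = 1024, hidden: int = 256):
        super().__init__()
        self.gru = nn.GRU(feature_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 2)          # two classes: fatigue / non-fatigue

    def forward(self, fa: torch.Tensor) -> torch.Tensor:   # fa: (B, T, 1024), T = 360 here
        out, _ = self.gru(fa)
        logits = self.fc(out[:, -1])                 # last time step summarises the clip
        return torch.softmax(logits, dim=-1)         # (B, 2) probabilities

probs = FatigueGRUClassifier()(torch.randn(1, 360, 1024))
first_probability, second_probability = probs[0].tolist()   # fatigue / non-fatigue
print("fatigue" if first_probability > second_probability else "non-fatigue")
```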
The foregoing solution takes as an example a deep convolutional neural network of 10 layers, of which 7 are convolutional layers, with the CBAM attention module added before the convolution of layers 1, 2, 4 and 7. It can be understood that the deep convolutional neural network may have any reasonable number of layers, such as 30 or 45, and the number of convolutional layers may likewise be any reasonable value, such as 21. The CBAM attention module can be added before any reasonable convolutional layer; this is not particularly limited and can be set flexibly according to actual conditions. The GRU model in the foregoing scheme may be a unidirectional GRU model or a bidirectional GRU model, which is not described in detail here.
In this scheme, the fatigue state of the driver is detected using the CBAM-based convolutional neural network and the GRU model: before at least some of the convolutional layers in the convolutional neural network, the useful information in the face image is extracted by the CBAM attention module, and the convolutional layer then extracts the features of the face image, which ensures the accuracy of the feature map and in turn improves the accuracy of fatigue-state recognition based on the feature map. In addition, a convolutional neural network with the CBAM attention module can better and automatically learn more image features and focus on specific regions of the image, such as the driver's eyes and mouth. The CBAM attention module is a lightweight module that consumes few resources, so an accurate recognition result can be obtained without excessive computing resources. Furthermore, using the GRU model to judge the fatigue state allows the relationships among the different face images of the sequence to be better mined, and identifying the fatigue state based on the context information between different face images improves the recognition precision. Using a convolutional neural network for identification also improves robustness.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is configured to, when executed by a processor, perform at least the steps of the method shown in any one of fig. 1 to 5. The computer readable storage medium may be specifically a memory. The memory may be the memory 62 as shown in fig. 6.
The embodiment of the application also provides the identification equipment. Fig. 6 is a schematic diagram of a hardware structure of an identification device according to an embodiment of the present application, and as shown in fig. 6, the identification device includes: a communication component 63 for data transmission, at least one processor 61 and a memory 62 for storing computer programs capable of running on the processor 61. The various components in the terminal are coupled together by a bus system 64. It will be appreciated that the bus system 64 is used to enable communications among the components. The bus system 64 includes a power bus, a control bus, and a status signal bus in addition to the data bus. For clarity of illustration, however, the various buses are labeled as bus system 64 in fig. 6.
Wherein the processor 61 executes the computer program to perform at least the steps of the method of any of fig. 1 to 5.
It will be appreciated that the memory 62 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a ferroelectric random access memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory can be Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 62 described in the embodiments herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The method disclosed in the above embodiments of the present application may be applied to the processor 61, or implemented by the processor 61. The processor 61 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 61. The processor 61 described above may be a general purpose processor, a DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The processor 61 may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in the memory 62, and the processor 61 reads the information in the memory 62 and performs the steps of the aforementioned method in conjunction with its hardware.
In an exemplary embodiment, the recognition Device may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), FPGAs, general purpose processors, controllers, MCUs, microprocessors (microprocessors), or other electronic components for performing the aforementioned recognition method.
An embodiment of the present application provides an identification device, as shown in fig. 7, including: a first obtaining unit 701, a second obtaining unit 702, a third obtaining unit 703, a fourth obtaining unit 704, and a recognition unit 705; wherein,
a first obtaining unit 701, configured to obtain at least two face images;
a second obtaining unit 702, configured to obtain, for each of the at least two face images, at least one channel image of the face image;
a third obtaining unit 703, configured to obtain, based on the at least one channel image, a channel attention map and a spatial attention map of the face image, where the channel attention map is at least characterized by an image of a target region of a face, and the spatial attention map is at least characterized by a position of the target region in the face image;
a fourth obtaining unit 704, configured to obtain a feature map of the face image based on the spatial attention map of the face image;
the identifying unit 705 is configured to identify a fatigue state of the face based on the feature map of each face image.
In an alternative embodiment, the identifying unit 705 is configured to obtain context information between adjacent images based on a feature map of each face image, where the context information is characterized by a change of a face target area on the image; according to the context information, obtaining a first probability value that the face is in a fatigue state and/or a second probability value that the face is in a non-fatigue state; and identifying the fatigue state of the face according to the first probability value and/or the second probability value.
In an optional embodiment, the third obtaining unit 703 is configured to obtain a channel attention map of the face image based on at least one channel image of the face image; obtaining a spatial attention diagram of the face image based on a channel attention diagram of the face image; and obtaining a feature map of the face image based on the spatial attention map of the face image. Further, the third obtaining unit 703 is configured to obtain three channel images of the face image; performing first compression on the channel images to obtain two first target images, wherein the first compression is the compression of the images in a space dimension; obtaining a weight parameter of each channel image based on at least two first target images; obtaining a channel attention diagram of the face image based on each channel image of the face image and the weight parameter of each channel image; performing second compression on the channel attention diagram to obtain two second target images, wherein the second compression is the compression of the images in the channel dimension; obtaining weight parameters of at least two second target images based on the second target images; and obtaining a spatial attention diagram of the face image according to the second target image and the weight parameter.
In an optional embodiment, the channel image is a channel image of each of at least two of N convolutional layers of the face image, where N is a positive integer greater than or equal to 2;
correspondingly, the third obtaining unit 703 is configured to obtain a channel attention diagram and a spatial attention diagram of the face image at a corresponding layer based on a channel image of the corresponding layer;
the fourth obtaining unit 704 is configured to obtain a feature map of the facial image at the corresponding layer based on the spatial attention map of the facial image at the corresponding layer. And obtaining the characteristic diagram of each face image based on the characteristic diagram of each face image in each layer of the at least two layers.
In an alternative embodiment, the identifying unit 705 is configured to determine the fatigue state if the first probability value is greater than the second probability value; and determining the non-fatigue state when the first probability value is smaller than the second probability value.
The identification device provided by the above embodiment and the identification method embodiment belong to the same concept, and the specific implementation process thereof is described in the method embodiment, which is not repeated here. The first obtaining unit 701, the second obtaining unit 702, the third obtaining unit 703, the fourth obtaining unit 704 and the identifying unit 705 can be implemented by a Digital Signal Processor (DSP), a Central Processing Unit (CPU), a Field Programmable Gate Array (FPGA), a Micro Controller Unit (MCU), etc.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated units described above in the present application may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof contributing to the prior art may be embodied in the form of a software product stored in a storage medium, and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto; any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An identification method, characterized in that the method comprises:
obtaining at least two face images;
for each of the at least two face images,
obtaining at least one channel image of the face image;
obtaining a channel attention map and a spatial attention map of the face image based on the at least one channel image, wherein the channel attention map characterizes at least an image of a face target region, and the spatial attention map characterizes at least a position of the target region in the face image;
obtaining a feature map of the face image based on the spatial attention map of the face image;
and identifying the fatigue state of the face based on the feature map of each face image.
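For orientation only, the following is a minimal PyTorch sketch of how the per-image flow recited in claim 1 (channel images, a channel attention map, a spatial attention map, and a resulting feature map) could be wired together. It is not the patented implementation; the backbone, layer sizes, and all names such as FrameFeatureExtractor are illustrative assumptions, and the attention steps are deliberately simplified relative to claims 4 and 5.

```python
# Illustrative sketch only (not the claimed implementation): per-image flow of
# channel images -> channel attention map -> spatial attention map -> feature map.
import torch
import torch.nn as nn

class FrameFeatureExtractor(nn.Module):
    """Hypothetical per-frame pipeline: conv backbone plus channel/spatial attention."""
    def __init__(self, channels: int = 16):
        super().__init__()
        self.backbone = nn.Conv2d(3, channels, kernel_size=3, padding=1)  # channel images
        self.channel_fc = nn.Linear(channels, channels)                   # per-channel weights
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)     # spatial weights

    def forward(self, face: torch.Tensor) -> torch.Tensor:
        x = self.backbone(face)                                  # (B, C, H, W) channel images
        w = torch.sigmoid(self.channel_fc(x.mean(dim=(2, 3))))   # weight per channel image
        channel_att = x * w[:, :, None, None]                    # channel attention map
        pooled = torch.cat([channel_att.mean(dim=1, keepdim=True),
                            channel_att.max(dim=1, keepdim=True).values], dim=1)
        s = torch.sigmoid(self.spatial_conv(pooled))              # spatial weights
        return channel_att * s                                    # feature map of the face image

if __name__ == "__main__":
    frames = torch.randn(4, 3, 64, 64)          # four face images (assumed size)
    feats = FrameFeatureExtractor()(frames)     # one feature map per face image
    print(feats.shape)                          # torch.Size([4, 16, 64, 64])
```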
2. The method according to claim 1, wherein the identifying the fatigue state of the face based on the feature map of each face image comprises:
obtaining, based on the feature map of each face image, context information between adjacent images, wherein the context information characterizes a change of the face target region across the images;
according to the context information, obtaining a first probability value that the face is in a fatigue state and/or a second probability value that the face is in a non-fatigue state;
and identifying the fatigue state of the face according to the first probability value and/or the second probability value.
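As a hedged illustration of claim 2, the sketch below models context between adjacent face images with a GRU over per-image feature vectors and outputs a first probability value (fatigue) and a second probability value (non-fatigue). The claim does not specify the recurrent structure, so the GRU, its sizes, and the name FatigueClassifier are assumptions.

```python
# Illustrative sketch (assumed architecture): temporal context between adjacent
# face images, producing fatigue / non-fatigue probability values.
import torch
import torch.nn as nn

class FatigueClassifier(nn.Module):
    def __init__(self, feat_dim: int = 16, hidden: int = 32):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)  # context across adjacent images
        self.head = nn.Linear(hidden, 2)                        # [fatigue, non-fatigue]

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (B, T, feat_dim), one vector per face image in temporal order
        _, h = self.rnn(frame_feats)
        return torch.softmax(self.head(h[-1]), dim=-1)          # first and second probability values

if __name__ == "__main__":
    feats = torch.randn(1, 8, 16)               # 8 consecutive face images (assumed)
    probs = FatigueClassifier()(feats)
    first, second = probs[0, 0].item(), probs[0, 1].item()
    print("fatigue" if first > second else "non-fatigue")
```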
3. The method of claim 1, wherein obtaining the channel attention map and the spatial attention map of the face image based on the at least one channel image comprises:
obtaining a channel attention map of the face image based on at least one channel image of the face image;
obtaining a spatial attention map of the face image based on the channel attention map of the face image;
and obtaining a feature map of the face image based on the spatial attention map of the face image.
4. The method of claim 3, wherein the method comprises:
obtaining three channel images of the face image;
performing first compression on the channel images to obtain two first target images, wherein the first compression is a compression of the images in the spatial dimension;
obtaining a weight parameter of each channel image based on the at least two first target images;
and obtaining a channel attention map of the face image based on each channel image of the face image and the weight parameter of each channel image.
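One plausible reading of claim 4 is a CBAM-style channel attention, in which the two "first target images" are the spatially average-pooled and max-pooled descriptors of the channel images and a shared MLP produces the per-channel weight parameters. The sketch below follows that assumption; the MLP, its reduction ratio, and the module name are illustrative choices, not the claimed construction.

```python
# Minimal sketch of a possible channel-attention step under a CBAM-style reading.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(                        # shared MLP over both descriptors
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) channel images of one face image
        avg = x.mean(dim=(2, 3))                         # first target image 1: spatial average
        mx = x.amax(dim=(2, 3))                          # first target image 2: spatial maximum
        w = torch.sigmoid(self.mlp(avg) + self.mlp(mx))  # weight parameter of each channel image
        return x * w[:, :, None, None]                   # channel attention map

if __name__ == "__main__":
    x = torch.randn(1, 3, 64, 64)                        # three channel images (e.g. RGB)
    print(ChannelAttention(3, reduction=1)(x).shape)     # torch.Size([1, 3, 64, 64])
```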
5. The method of claim 3 or 4, wherein the obtaining a spatial attention map of the face image based on the channel attention map of the face image comprises:
performing second compression on the channel attention map to obtain two second target images, wherein the second compression is a compression of the images in the channel dimension;
obtaining weight parameters of the at least two second target images based on the second target images;
and obtaining a spatial attention map of the face image according to the second target images and the weight parameters.
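Similarly, claim 5 can be read as CBAM-style spatial attention: the two "second target images" would be the channel-wise mean and maximum of the channel attention map, and a convolution followed by a sigmoid would yield the weight parameters. The sketch below is written under that assumption; the 7x7 kernel and the module name are arbitrary choices.

```python
# Minimal sketch of a possible spatial-attention step under a CBAM-style reading.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, channel_att: torch.Tensor) -> torch.Tensor:
        # channel_att: (B, C, H, W) channel attention map of the face image
        avg = channel_att.mean(dim=1, keepdim=True)               # second target image 1
        mx = channel_att.max(dim=1, keepdim=True).values          # second target image 2
        weights = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # weight parameters
        return channel_att * weights                              # spatial attention map

if __name__ == "__main__":
    att = torch.randn(1, 16, 64, 64)
    print(SpatialAttention()(att).shape)                          # torch.Size([1, 16, 64, 64])
```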
6. The method according to any one of claims 1 to 4, wherein the channel image is a channel image of the face image at each of at least two layers among N convolutional layers, where N is a positive integer greater than or equal to 2;
correspondingly, the obtaining a channel attention map and a spatial attention map of the face image based on the at least one channel image, and obtaining a feature map of the face image based on the spatial attention map of the face image, comprises:
obtaining a channel attention map and a spatial attention map of the face image at the corresponding layer based on the channel image at the corresponding layer;
and obtaining the feature map of the face image at the corresponding layer based on the spatial attention map of the face image at the corresponding layer.
7. The method of claim 6, further comprising:
and obtaining the feature map of each face image based on the feature maps of the face image at each of the at least two layers.
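To illustrate claims 6 and 7, the sketch below applies a (simplified) attention step to the channel images at two convolutional layers and fuses the per-layer feature maps into one feature per face image by pooling and concatenation. The fusion by concatenation, the layer sizes, and all names are assumptions rather than the claimed construction.

```python
# Illustrative sketch (assumed fusion): per-layer attention, then pooling and
# concatenation of the feature maps from two convolutional layers.
import torch
import torch.nn as nn

class MultiLayerFeatures(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Conv2d(3, 8, 3, padding=1)    # first of N convolutional layers
        self.layer2 = nn.Conv2d(8, 16, 3, padding=1)   # second convolutional layer
        self.pool = nn.AdaptiveAvgPool2d(1)

    def attend(self, x: torch.Tensor) -> torch.Tensor:
        # stand-in for the per-layer channel + spatial attention of claims 4 and 5
        w = torch.sigmoid(x.mean(dim=(2, 3)))
        return x * w[:, :, None, None]

    def forward(self, face: torch.Tensor) -> torch.Tensor:
        f1 = self.attend(self.layer1(face))            # feature map at the first layer
        f2 = self.attend(self.layer2(f1))              # feature map at the second layer
        v1 = self.pool(f1).flatten(1)                  # (B, 8)
        v2 = self.pool(f2).flatten(1)                  # (B, 16)
        return torch.cat([v1, v2], dim=1)              # fused per-image feature (claim 7)

if __name__ == "__main__":
    print(MultiLayerFeatures()(torch.randn(2, 3, 64, 64)).shape)  # torch.Size([2, 24])
```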
8. The method of claim 2, wherein the identifying the fatigue state of the face according to the first probability value and/or the second probability value comprises:
determining the fatigue state when the first probability value is greater than the second probability value;
and determining the non-fatigue state when the first probability value is less than the second probability value.
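Claim 8 reduces to a comparison of the two probability values; a trivial sketch follows, with the function name and the handling of the unspecified tie case being assumptions.

```python
# Minimal sketch of the decision rule in claim 8 (illustrative names; ties are
# treated as non-fatigue, which the claim leaves unspecified).
def decide_fatigue(first_prob: float, second_prob: float) -> str:
    # first_prob: probability of the fatigue state; second_prob: non-fatigue state
    return "fatigue" if first_prob > second_prob else "non-fatigue"

print(decide_fatigue(0.73, 0.27))  # fatigue
```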
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
10. An image processing apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 8 are implemented when the processor executes the program.
CN201910839265.3A 2019-09-05 2019-09-05 Identification method, device and storage medium Active CN110633665B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910839265.3A CN110633665B (en) 2019-09-05 2019-09-05 Identification method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910839265.3A CN110633665B (en) 2019-09-05 2019-09-05 Identification method, device and storage medium

Publications (2)

Publication Number Publication Date
CN110633665A true CN110633665A (en) 2019-12-31
CN110633665B CN110633665B (en) 2023-01-10

Family

ID=68970936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910839265.3A Active CN110633665B (en) 2019-09-05 2019-09-05 Identification method, device and storage medium

Country Status (1)

Country Link
CN (1) CN110633665B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597884A (en) * 2020-04-03 2020-08-28 平安科技(深圳)有限公司 Facial action unit identification method and device, electronic equipment and storage medium
CN111783774A (en) * 2020-06-22 2020-10-16 联想(北京)有限公司 Image processing method, apparatus and storage medium
CN111855501A (en) * 2020-07-30 2020-10-30 华北电力大学(保定) Automatic water spraying composite insulator hydrophobicity detection system and method based on unmanned aerial vehicle
CN112507783A (en) * 2020-10-29 2021-03-16 上海交通大学 Mask face detection, identification, tracking and temperature measurement method based on attention mechanism
CN112541409A (en) * 2020-11-30 2021-03-23 北京建筑大学 Attention-integrated residual network expression recognition method
CN113222167A (en) * 2020-02-06 2021-08-06 浙江大学 Image processing method and device
CN115116040A (en) * 2022-01-14 2022-09-27 长城汽车股份有限公司 Method and device for identifying behavior of driver and passenger, vehicle and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117810A (en) * 2018-08-24 2019-01-01 深圳市国脉畅行科技股份有限公司 Fatigue driving behavioral value method, apparatus, computer equipment and storage medium
WO2019095155A1 (en) * 2017-11-15 2019-05-23 华为技术有限公司 Fatigue notification method and terminal
CN109886241A (en) * 2019-03-05 2019-06-14 天津工业大学 Driver fatigue detection based on shot and long term memory network
CN110084794A (en) * 2019-04-22 2019-08-02 华南理工大学 A kind of cutaneum carcinoma image identification method based on attention convolutional neural networks
CN110084121A (en) * 2019-03-27 2019-08-02 南京邮电大学 Implementation method based on the human face expression migration for composing normalized circulation production confrontation network
CN110163080A (en) * 2019-04-02 2019-08-23 腾讯科技(深圳)有限公司 Face critical point detection method and device, storage medium and electronic equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019095155A1 (en) * 2017-11-15 2019-05-23 华为技术有限公司 Fatigue notification method and terminal
CN109117810A (en) * 2018-08-24 2019-01-01 深圳市国脉畅行科技股份有限公司 Fatigue driving behavioral value method, apparatus, computer equipment and storage medium
CN109886241A (en) * 2019-03-05 2019-06-14 天津工业大学 Driver fatigue detection based on shot and long term memory network
CN110084121A (en) * 2019-03-27 2019-08-02 南京邮电大学 Implementation method based on the human face expression migration for composing normalized circulation production confrontation network
CN110163080A (en) * 2019-04-02 2019-08-23 腾讯科技(深圳)有限公司 Face critical point detection method and device, storage medium and electronic equipment
CN110084794A (en) * 2019-04-22 2019-08-02 华南理工大学 A kind of cutaneum carcinoma image identification method based on attention convolutional neural networks

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222167A (en) * 2020-02-06 2021-08-06 浙江大学 Image processing method and device
CN111597884A (en) * 2020-04-03 2020-08-28 平安科技(深圳)有限公司 Facial action unit identification method and device, electronic equipment and storage medium
CN111783774A (en) * 2020-06-22 2020-10-16 联想(北京)有限公司 Image processing method, apparatus and storage medium
CN111855501A (en) * 2020-07-30 2020-10-30 华北电力大学(保定) Automatic water spraying composite insulator hydrophobicity detection system and method based on unmanned aerial vehicle
CN111855501B (en) * 2020-07-30 2024-02-20 华北电力大学(保定) Automatic water spraying composite insulator hydrophobicity detection system and method based on unmanned aerial vehicle
CN112507783A (en) * 2020-10-29 2021-03-16 上海交通大学 Mask face detection, identification, tracking and temperature measurement method based on attention mechanism
CN112541409A (en) * 2020-11-30 2021-03-23 北京建筑大学 Attention-integrated residual network expression recognition method
CN115116040A (en) * 2022-01-14 2022-09-27 长城汽车股份有限公司 Method and device for identifying behavior of driver and passenger, vehicle and storage medium

Also Published As

Publication number Publication date
CN110633665B (en) 2023-01-10

Similar Documents

Publication Publication Date Title
CN110633665B (en) Identification method, device and storage medium
CN114119378A (en) Image fusion method, and training method and device of image fusion model
CN110222718B (en) Image processing method and device
CN111797983A (en) Neural network construction method and device
US10614736B2 (en) Foreground and background detection method
US20220277558A1 (en) Cascaded Neural Network-Based Attention Detection Method, Computer Device, And Computer-Readable Storage Medium
CN111368672A (en) Construction method and device for genetic disease facial recognition model
CN103617432A (en) Method and device for recognizing scenes
CN112862828B (en) Semantic segmentation method, model training method and device
CN111861925A (en) Image rain removing method based on attention mechanism and gate control circulation unit
CN111428664B (en) Computer vision real-time multi-person gesture estimation method based on deep learning technology
CN110781770B (en) Living body detection method, device and equipment based on face recognition
CN111488855A (en) Fatigue driving detection method, device, computer equipment and storage medium
CN112668366A (en) Image recognition method, image recognition device, computer-readable storage medium and chip
CN111401196A (en) Method, computer device and computer readable storage medium for self-adaptive face clustering in limited space
CN111339831A (en) Lighting lamp control method and system
CN113422982B (en) Data processing method, device, equipment and storage medium
CN112464930A (en) Target detection network construction method, target detection method, device and storage medium
CN110705564B (en) Image recognition method and device
WO2022001364A1 (en) Method for extracting data features, and related apparatus
CN111800568B (en) Light supplement method and device
WO2021189321A1 (en) Image processing method and device
CN113515992A (en) Object recognition method, device and storage medium
US20190251695A1 (en) Foreground and background detection method
CN118279206A (en) Image processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant