CN112597823A

CN112597823A - Attention recognition method and device, electronic equipment and storage medium

Info

Publication number: CN112597823A
Application number: CN202011438525.5A
Authority: CN
Inventors: 陈海波; 罗志鹏; 车丽轩
Original assignee: Shenyan Technology Beijing Co ltd
Current assignee: Shenyan Technology Beijing Co ltd
Priority date: 2020-12-07
Filing date: 2020-12-07
Publication date: 2021-04-02

Abstract

The application relates to the technical field of computer vision, and provides an attention recognition method, an attention recognition device, electronic equipment and a storage medium, wherein the attention recognition method comprises the following steps: determining a face image of a driver to be recognized; inputting the face image into an attention classification fusion model to obtain a face direction angle output by the attention classification fusion model; the attention classification fusion model is obtained by fusing a plurality of attention classification models of different types. The method, the device, the electronic equipment and the storage medium can fully identify the face characteristics of the driver, have strong generalization capability and improve the accuracy of the attention identification of the driver.

Description

Attention recognition method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of computer vision technologies, and in particular, to an attention recognition method and apparatus, an electronic device, and a storage medium.

Background

Fatigue driving refers to the phenomenon of decline of driving performance due to the psychological and physiological functions of the driver after continuous driving. Driving fatigue is generally considered to be technical fatigue involving both mental and physical strength. When a driver drives a vehicle, physiological or psychological dysfunction often occurs due to various reasons, attention is reduced, and driving safety is seriously affected.

In the prior art, the fatigue state of a driver is generally analyzed by collecting electroencephalograms of the forehead of the driver, and then the attention state of the driver is obtained. The accuracy of recognizing the attention state of the driver by the method is low.

Disclosure of Invention

The application provides an attention recognition method, an attention recognition device, electronic equipment and a storage medium, which can fully recognize the human face characteristics of a driver, have strong generalization capability and improve the accuracy of the attention recognition of the driver.

The application provides an attention recognition method, comprising the following steps:

determining a face image of a driver to be recognized;

inputting the face image into an attention classification fusion model to obtain a face direction angle output by the attention classification fusion model;

the attention classification fusion model is obtained by fusing a plurality of attention classification models of different types.

According to the attention recognition method provided by the application, the determination method of the attention classification fusion model comprises the following steps:

respectively selecting an attention classification model with the highest performance score from a plurality of attention classification models of each type as a basic model of each type;

and fusing the basic models of each type to obtain the attention classification fusion model.

According to an attention recognition method provided by the present application, the step of respectively selecting an attention classification model with the highest performance score from a plurality of attention classification models of each type as a base model of each type includes:

determining a plurality of face training sets and corresponding face verification sets thereof;

training any type of initial model based on each face training set to obtain an attention classification model corresponding to each face training set;

performing performance scoring on the attention classification model corresponding to each face training set based on the face verification set corresponding to each face training set;

and taking the attention classification model with the highest performance score as a basic model corresponding to any type.

According to the attention recognition method provided by the application, the determining of the plurality of face training sets and the face verification sets corresponding to the face training sets comprises the following steps:

determining a sample face data set;

and splitting the sample face data set into a plurality of subsets, and determining a plurality of face training sets and corresponding face verification sets by adopting a cross verification method.

According to the attention recognition method provided by the application, the fusing of each type of basic model to obtain the attention classification fusion model comprises the following steps:

determining a weight corresponding to each basic model based on the performance score of each type of basic model;

and fusing the plurality of basic models based on the weight corresponding to each basic model to obtain the attention classification fusion model.

According to the attention recognition method provided by the application, the step of determining the face image of the driver to be recognized comprises the following steps:

acquiring image data containing the face of the driver;

inputting the image data into a face detection model to obtain a face image output by the face detection model;

the Face detection model is established based on a Retina Face model, and a feature extraction layer of the Retina Face model is increment V4.

According to an attention recognition method provided by the application, the plurality of different types of attention classification models comprise at least one of DenseNet121, EfficientNet-B0, EfficientNet-B3 and EfficientNet-B7.

The present application also provides an attention recognition device, comprising:

a determination unit for determining a face image of a driver to be recognized;

the recognition unit is used for inputting the face image into an attention classification fusion model to obtain a face direction angle output by the attention classification fusion model;

The present application further provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the attention recognition method as described in any of the above when executing the program.

The present application also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of attention recognition as described in any of the above.

According to the attention recognition method, the attention recognition device, the electronic equipment and the storage medium, the input face image of the driver to be recognized is recognized through the attention classification fusion model, and the face direction angle of the driver is obtained. The attention classification fusion model is obtained by fusing a plurality of attention classification models of different types, so that the face characteristics of the driver can be fully recognized, the generalization capability is strong, and the accuracy of the driver attention recognition is improved.

Drawings

In order to more clearly illustrate the technical solutions in the present application or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

Fig. 1 is a schematic flow chart of an attention recognition method provided in the present application;

FIG. 2 is a schematic flow chart of a method for determining an attention classification fusion model provided in the present application;

FIG. 3 is a schematic flow chart of a method for determining a base model provided herein;

FIG. 4 is a schematic flow chart of a cross-validation method provided herein;

FIG. 5 is a schematic flow chart of a model fusion method provided herein;

fig. 6 is a schematic flow chart of a face image determination method provided in the present application;

FIG. 7 is a block diagram of a basic module of a feature extraction layer provided herein;

FIG. 8 is a schematic flow chart of a training method of the attention classification fusion model provided in the present application;

FIG. 9 is a schematic structural diagram of an attention recognition device provided herein;

fig. 10 is a schematic structural diagram of a fusion model determination unit provided in the present application;

FIG. 11 is a schematic structural diagram of a basic model determination subunit provided in the present application;

FIG. 12 is a block diagram of a training validation determination module provided herein;

FIG. 13 is a schematic diagram of the structure of a base model fusion subunit provided herein;

fig. 14 is a schematic structural diagram of a determination unit provided in the present application;

fig. 15 is a schematic structural diagram of an electronic device provided in the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Fig. 1 is a schematic flow chart of an attention recognition method provided in the present application, and as shown in fig. 1, the method includes:

step 110, determining a face image of the driver to be recognized.

Specifically, the image of the face of the driver can be acquired in real time by a camera mounted inside the vehicle. A plurality of cameras can be installed at different positions in the vehicle, and each camera acquires images of the face of a driver in different postures. For example, the installation position of the camera can be the positions of a rearview mirror, a center console, an instrument panel, a windshield and the like. The number of cameras may be set to 9.

Step 120, inputting the face image into an attention classification fusion model to obtain a face direction angle output by the attention classification fusion model;

Specifically, the human face image of the driver is input into the attention classification fusion model, and the attention classification fusion model is used for identifying the human face posture in the human face image to obtain the human face direction angle. The face direction angle is used to characterize the direction of attention of the driver. The attention direction of the driver can be judged according to the face direction angle, and the attention focusing time of the driver can be judged according to the time of the face direction angle.

A plurality of attention classification models of different types can be obtained through pre-training, and the models are fused to obtain an attention classification fusion model.

The attention classification model of a certain type can be obtained through pre-training, and specifically, the attention classification model can be obtained through the following training modes: firstly, collecting a large number of sample face images of a driver, determining a face direction angle label of each sample face image in a manual labeling mode, and then inputting the large number of sample face images and the face direction angle labels corresponding to the sample face images into an initial model for training, so as to obtain the type of attention classification model by aiming at improving the recognition capability of the initial model on the face direction angles in the sample face images.

The initial model of the attention classification model may be a decision tree model, an artificial neural network model, a support vector machine model, or the like, which is not specifically limited in this embodiment of the present application.

The attention classification fusion model obtained after fusion can also be subjected to joint training, for example, the attention classification fusion model can be trained by adopting a global loss function with the aim of reducing the overall loss of the attention classification fusion model.

According to the attention recognition method, the input face image of the driver to be recognized is recognized through the attention classification fusion model, and the face direction angle of the driver is obtained. The attention classification fusion model is obtained by fusing a plurality of attention classification models of different types, so that the face characteristics of the driver can be fully recognized, the generalization capability is strong, and the accuracy of the driver attention recognition is improved.

Based on the above embodiment, fig. 2 is a schematic flow chart of a method for determining an attention classification fusion model provided in the present application, and as shown in fig. 2, the method for determining an attention classification fusion model includes:

step 210, selecting the attention classification model with the highest performance score from the plurality of attention classification models of each type as the basic model of each type.

Specifically, the base model is a model with optimal performance in each type of the multiple attention classification models, and can be used as a basic component unit of the attention classification fusion model.

For each type of initial model, multiple attention classification models may be trained using multiple data sets. For example, a convolutional neural network is selected as an initial model, and 4 different data sets are used for training to obtain 4 attention classification models, which are model 1, model 2, model 3 and model 4 in sequence. Due to sample differences in the data sets, the accuracy, convergence speed, etc. of the 4 attention classification models are different.

The accuracy can be set as a performance index according to needs, the convergence rate can also be set as a performance index, and a value obtained by weighting and summing the accuracy and the convergence rate can also be set as a performance index. And according to the set performance indexes, performing performance grading on each attention classification model, and taking the attention classification model with the highest performance grade as a basic model of each type. For example, the accuracy is set as a performance index, the performance of the 4 attention classification models obtained according to the convolutional neural network is scored, the accuracy of the model 3 is the highest, the performance score is also the highest, and the model 3 can be used as a basic model in a model of the convolutional neural network type.

And step 220, fusing the basic models of each type to obtain an attention classification fusion model.

Specifically, model fusion is to train multiple models, and fuse the multiple models into one model according to a certain method. The model fusion method comprises a linear weighted fusion method, a cross fusion method, a waterfall fusion method, a feature fusion method, a prediction fusion method and the like.

By fusing each type of basic model, the advantages of each different type of model can be fully utilized, the adaptability of the fusion model to different samples is improved, and the overall performance of the fusion model is improved.

Based on any of the above embodiments, fig. 3 is a schematic flowchart of the method for determining a basic model provided in the present application, and as shown in fig. 3, step 210 includes:

step 2101, determine a plurality of face training sets and their corresponding face verification sets.

Specifically, the face training set is used for training each type of initial model, and the face verification set is used for adjusting the hyper-parameters of the initial model and evaluating the training effect of the initial model. The hyper-parameters are parameters which are preset before the initial model is trained.

The face training set and the face verification set may be determined from the same face data set. For example, a face training set and a face verification set can be obtained by splitting the same face data set. The sizes of the face training set and the face verification set can be set according to needs. According to different splitting modes, a plurality of face training sets and corresponding face verification sets can be obtained.

Step 2102, training any type of initial model based on each face training set to obtain an attention classification model corresponding to each face training set.

Specifically, for any type of initial model, a plurality of face training sets are adopted for training, and an attention classification model corresponding to each face training set is obtained respectively. For example, the initial model selection may be selected as DenseNet121, and different face training sets a1, a2, A3, a4 and a5 are used to train the DenseNet121, so as to obtain an attention classification model M1 corresponding to a1, an attention classification model M2 corresponding to a2, an attention classification model M3 corresponding to A3, an attention classification model M4 corresponding to a4 and an attention classification model M5 corresponding to a5, respectively.

And 2103, performing performance scoring on the attention classification model corresponding to each face training set based on the face verification set corresponding to each face training set.

Specifically, the face verification sets corresponding to the face training sets a1, a2, A3, a4 and a5 are B1, B2, B3, B4 and B5, respectively. Performance scoring was performed on the attention classification model M1 using B1, performance scoring was performed on the attention classification model M2 using B2, performance scoring was performed on the attention classification model M3 using B3, performance scoring was performed on the attention classification model M4 using B4, and performance scoring was performed on the attention classification model M5 using B5.

At step 2104, the attention classification model with the highest performance score is used as the base model corresponding to the type.

Specifically, the performance scores of the attention classification models M1, M2, M3, M4 and M5 were compared, and the attention classification model M3 with the highest score was used as the base model corresponding to DenseNet 121.

According to the attention recognition method provided by the embodiment of the application, the performance grading is carried out on the plurality of attention classification models of the same type to obtain the basic model corresponding to the type, and the influence of the plurality of training sets on the performance of the attention classification models is fully considered.

Based on any of the above embodiments, fig. 4 is a schematic flow chart of the cross-validation method provided in the present application, and as shown in fig. 4, step 2101 includes:

step 21011, determining a sample face data set;

step 21012, the sample face data set is split into a plurality of subsets, and a plurality of face training sets and corresponding face verification sets thereof are determined by a cross-validation method.

Specifically, the sample face data set may be obtained from a public data source, or a real photograph of the driver may be acquired to obtain the sample face data set.

Cross validation (Cross validation) takes out most samples from a given modeling sample to build a model, and leaves a small portion of samples to be forecasted by the just built model. K-fold cross validation (K-fold cross validation) may be used. The K-fold cross validation is to divide a data set into K parts in equal proportion, wherein one part is used as a validation set, and the other K-1 parts are used as a training set. Then, the experiment is performed once, and the K-fold cross validation is completed only after K times of the experiment, that is, the cross validation actually repeats the experiment for K times, each time of the experiment selects a different data part from the K parts as a validation set, the data of the K parts are ensured to be respectively tested, and the rest K-1 parts are taken as training sets.

For example, the sample face data set may be split into 5 subsets by an average splitting method, each subset is respectively used as a primary verification set, and the remaining 5 subsets are used as training sets, so as to obtain 5 training sets and face verification sets corresponding to the training sets.

By adopting the K-fold cross validation, an average value obtained by scoring a plurality of attention classification models of the same type by the K validation sets may be used as the score of the attention classification model of the type, which is not specifically limited in the embodiment of the present application.

According to the attention recognition method provided by the embodiment of the application, the cross verification method is adopted to determine the face training sets and the face verification sets corresponding to the face training sets, so that the over-fitting and under-fitting in model training are effectively avoided, and the accuracy of the attention classification model in driver attention recognition is improved.

Based on any of the above embodiments, fig. 5 is a schematic flow chart of the model fusion method provided in the present application, and as shown in fig. 5, step 220 includes:

step 2201, determining a weight corresponding to each basic model based on the performance score of each type of basic model;

and 2202, fusing the plurality of basic models based on the weight corresponding to each basic model to obtain an attention classification fusion model.

Specifically, a linear weighting method may be adopted to fuse a plurality of base models into one attention classification fusion model.

And determining a weight corresponding to each basic model according to the performance score of each type of basic model, and fusing the basic models by using a weighted summation mode to obtain the attention classification fusion model.

According to the attention recognition method provided by the embodiment of the application, the attention classification fusion model is obtained by fusing the plurality of basic models by adopting a linear weighting method, the complexity of the algorithm is reduced, and the method is simple and easy to implement.

Based on any of the above embodiments, fig. 6 is a schematic flow chart of the face image determination method provided by the present application, and as shown in fig. 6, step 110 includes:

step 1101, acquiring image data containing a face of a driver;

step 1102, inputting image data into a face detection model to obtain a face image output by the face detection model; the Face detection model is established based on a Retina Face model, and the feature extraction layer of the Retina Face model is increment V4.

Specifically, the image data including the face of the driver is sent to a face detection model, and the face detection model detects the face region in the image data to obtain a face image.

The face detection model may be pre-trained before step 1102 is performed. First, a large amount of sample image data including the face of the driver is collected. Secondly, a human face area mark in each sample image data is determined in a manual marking mode. Then, a large amount of sample image data and the face region marks in each sample image data are input into the initial model for training, so that the recognition capability of the initial model on the face regions in the image data is improved, and a face detection model is obtained.

Here, the initial model may select a Retina Face model. The Retina Face model adopts the technology of a characteristic pyramid, realizes multi-scale fusion and has an important effect on detecting small objects.

The inclusion V4 can be used as a feature extraction layer of the Retina Face model. The Incep V4 adopts a plurality of basic modules connected in series, each basic module comprises convolution kernels with different scales and different connection modes, and the features of input image data are extracted, so that human faces with different sizes can be detected.

For example, 3 basic modules, module 1, module 2, and module 3, respectively, may be provided in the feature extraction layer.

Fig. 7 is a schematic structural diagram of basic modules of the feature extraction layer provided in the present application, and as shown in fig. 7, the

modules

1, 2, and 3 respectively employ convolution kernels of different scales, such as 1 × 1 convolution kernel, 3 × 3 convolution kernel, 7 × 1 convolution kernel, and 3 × 1 convolution kernel, and overlap in different connection manners. Firstly, the image data I is adjusted to 299 × 299 (pixels), then the input module 1 obtains a feature map F1, the feature map F1 is input into the module 2 to obtain a feature map F2, and the feature map F2 is input into the module 3 to obtain a feature map F3. And respectively proposing (proposal) to each layer in the three-layer feature maps of F1, F2 and F3, generating detection frames on 3 different scales, wherein each scale has anchors (anchors) with different sizes, and ensuring that human faces with different sizes can be detected.

In the training process of the face detection model, the loss function can adopt Focal loss to solve the problem of serious unbalance of positive and negative samples in target detection, and the loss function reduces the weight occupied by a large number of simple samples in training. Focal loss function L_flCan be formulated as:

wherein y is a real label of the sample, y' is a predicted value of the sample, gamma is an influence factor, and gamma is more than 0, so that the loss of the sample which is easy to classify can be reduced, and the sample which is difficult to classify and is wrongly classified is concerned more; alpha is a balance factor to balance the importance of positive and negative samples. For example, γ is 2, and for the positive type sample, the prediction result is 0.95, which is a simple sample, so the γ power of (1-0.95) is small, and the loss function value becomes smaller. Whereas the loss is relatively large for samples with a prediction probability of 0.3. For negative class samples as well, the result of predicting 0.1 should be much smaller than the sample loss value of predicting 0.7. For a prediction probability of 0.5, the loss is reduced by only 0.25 times, so this indistinguishable sample is of greater interest. Therefore, the influence of simple samples is reduced, and the effect of overlapping a large number of samples with small prediction probability is more effective.

According to the attention recognition method provided by the embodiment of the application, the Face detection model is established based on the Retina Face model, the characteristic extraction layer is set as the inclusion V4, the Face image of a driver can be accurately extracted, and interference factors of image data are eliminated.

Based on any of the above embodiments, the plurality of different types of attention classification models includes at least one of DenseNet121, EffectientNet-B0, EffectientNet-B3, and EffectientNet-B7.

Specifically, DenseNet proposes a more aggressive dense connection mechanism than ResNet: i.e. interconnecting all layers, in particular each layer will accept as its additional input all layers in front of it. ResNet is the short-circuiting of each layer to a previous layer (typically 2-3 layers) by element-level addition. In DenseNet, each layer is connected (concat) dimensionally with all previous layers and serves as input for the next layer. For a L-layer network, DenseNet contains L (L +1)/2 connections in total, which is a dense connection compared to ResNet. And DenseNet is directly connected with feature maps from different layers, so that feature reuse can be realized, and efficiency is improved.

The EfficientNet has an excellent effect on image classification. A better Backbone network (Backbone) can be searched by utilizing the neural network, namely EfficientNet-B0. On the basis, in order to improve the complexity of the model and further improve the performance of the model, the model can be scaled appropriately. Common scaling methods are: increasing the width of the model, increasing the depth of the model, increasing the resolution of the input picture of the model, etc. The combined scaling method can be formulated as:

depth：d＝α^φ

width：w＝β^φ

resolution：r＝γ^φ

s.t.α·β²·γ²≈2，α≥1，β≥1，γ≥1

where phi is a single combined scaling factor, scaling the width d, depth w and picture resolution r simultaneously. α, β, γ are scaling bases for depth, width, resolution.

In the above formula, α · β is restricted²·γ²≈2，α≥1，β≥1，γ≥1。

For EfficientNet-B0, the corresponding scaling basis is α ═ 1.2, β ═ 1.1, and γ ═ 1.15. EfficientNet-B0 through EfficientNet-B7 can be derived from the scaling factors of Table 1.

TABLE 1 Scale factor Table

DenseNet121, EfficientNet-B0, EfficientNet-B3 and EfficientNet-B7 can be selected as initial models to be trained to obtain a plurality of attention classification models of different types, and the training method comprises the following steps:

(1) adjusting the size of the face image to be the same size, such as 224 × 224 (pixel points), and obtaining eye marks (landmarks) in the face image;

(2) inputting the adjusted face image into DenseNet121, adding an eye mark, selecting cross entropy loss as a loss function for training, and obtaining a basic model 1 corresponding to DenseNet 121;

(3) inputting the adjusted face image into EfficientNet-B0, adding eye marks, selecting cross entropy loss as a loss function for training, and obtaining a basic model 2 corresponding to EfficientNet-B0;

(4) inputting the adjusted face image into EfficientNet-B3, adding eye marks, selecting cross entropy loss as a loss function for training, and obtaining a basic model 3 corresponding to EfficientNet-B3;

(5) inputting the adjusted face image into EfficientNet-B7, adding eye marks, selecting cross entropy loss as a loss function for training, and obtaining a basic model 4 corresponding to EfficientNet-B7;

(6) and inputting the adjusted face image into EfficientNet-B7, selecting cross entropy loss as a loss function for training without adding eye marks, and obtaining a basic model 5 corresponding to EfficientNet-B7.

According to the attention recognition method provided by the embodiment of the application, the DenseNet121, the EfficientNet-B0, the EfficientNet-B3 and the EfficientNet-B7 are adopted for training to obtain a plurality of attention classification models of different types, so that the attention classification fusion model obtained after fusion has high accuracy and generalization capability.

Based on any of the above embodiments, fig. 8 is a schematic flowchart of a training method of an attention classification fusion model provided in the present application, and as shown in fig. 8, the method includes:

inputting a plurality of image data containing the Face of a driver into a Retina Face model to obtain a Face image corresponding to each image data, wherein a feature extraction layer of the Retina Face model is expression V4;

performing five-fold cross validation on the obtained multiple face images to obtain 5 groups of training sets and validation sets corresponding to the training sets;

and step three, inputting the 5 groups of training sets and the corresponding verification sets into DenseNet121, EfficientNet-B0, EfficientNet-B3 and EfficientNet-B7 respectively for training. Wherein, 2 groups of training sets and corresponding verification sets are respectively input into EfficientNet-B7 for training, the first group is added with eye marks during training, and the second group is not added with eye marks during training. Obtaining 5 attention classification models of each type after training;

performing performance grading on the 5 attention classification models of each type, and selecting the attention classification model with the highest performance grading as a basic model of each type to obtain 5 basic models;

and step five, fusing the 5 basic models by adopting a linear weighting method to obtain an attention classification fusion model.

And inputting the face image of any driver into the attention classification fusion model to obtain the face direction angle of the driver.

The following describes the attention recognition device provided in the present application, and the attention recognition device described below and the attention recognition method described above may be referred to correspondingly.

Based on any of the above embodiments, fig. 9 is a schematic structural diagram of an attention recognition device provided in the present application, and as shown in fig. 9, the device includes:

a determination unit 910 configured to determine a face image of a driver to be recognized;

the recognition unit 920 is configured to input the face image into the attention classification fusion model to obtain a face direction angle output by the attention classification fusion model;

Specifically, the determination unit 910 determines a face image of the driver to be recognized. The recognition unit 920 inputs the face image into the attention classification fusion model to obtain the face direction angle output by the attention classification fusion model.

The attention recognition device provided by the application recognizes the input face image of the driver to be recognized through the attention classification fusion model to obtain the face direction angle of the driver. The attention classification fusion model is obtained by fusing a plurality of attention classification models of different types, so that the face characteristics of the driver can be fully recognized, the generalization capability is strong, and the accuracy of the driver attention recognition is improved.

Based on any of the above embodiments, fig. 10 is a schematic structural diagram of a fusion model determining unit provided in the present application, as shown in fig. 10, the apparatus further includes a fusion model determining unit 930, and the fusion model determining unit 930 includes:

a basic model determining subunit 9301, configured to select, from the multiple attention classification models of each type, an attention classification model with the highest performance score as a basic model of each type;

and a basic model fusion subunit 9302 for fusing the basic models of each type to obtain an attention classification fusion model.

Based on any of the above embodiments, fig. 11 is a schematic structural diagram of the basic model determining subunit provided in the present application, and as shown in fig. 11, the basic model determining subunit 9301 includes:

a training verification determining module 93011, configured to determine a plurality of face training sets and face verification sets corresponding to the face training sets;

the model training module 93012 is configured to train any type of initial model based on each face training set to obtain an attention classification model corresponding to each face training set;

the model scoring module 93013 is configured to perform performance scoring on the attention classification model corresponding to each face training set based on the face verification set corresponding to each face training set;

and the model screening module 93014 is used for taking the attention classification model with the highest performance score as a basic model corresponding to any type.

Based on any of the above embodiments, fig. 12 is a schematic structural diagram of a training verification determining module provided in the present application, and as shown in fig. 12, the training verification determining module 93011 includes:

a data set determination submodule 930111 for determining a sample face data set;

and the data set splitting submodule 930112 is configured to split the sample face data set into multiple subsets, and determine multiple face training sets and corresponding face verification sets by using a cross-validation method.

Based on any of the above embodiments, fig. 13 is a schematic structural diagram of the basic model fusion subunit provided in the present application, and as shown in fig. 13, the basic model fusion subunit 9302 includes:

a weight determination module 93021, configured to determine a weight corresponding to each basic model based on the performance score of each type of basic model;

and the model fusion module 93022 is configured to fuse the multiple basic models based on the weight corresponding to each basic model to obtain an attention classification fusion model.

Based on any of the above embodiments, fig. 14 is a schematic structural diagram of a determining unit provided in the present application, and as shown in fig. 14, the determining unit 910 includes:

an image acquiring subunit 9101, configured to acquire image data including a face of a driver;

a face detection subunit 9102, configured to input image data to a face detection model, so as to obtain a face image output by the face detection model; the Face detection model is established based on a Retina Face model, and the feature extraction layer of the Retina Face model is increment V4.

The attention recognition device provided in the embodiment of the present application is used for executing the above-mentioned attention recognition method, and the implementation manner of the attention recognition device is consistent with that of the attention recognition method provided in the present application, and the same beneficial effects can be achieved, and details are not described here.

Based on any of the above embodiments, fig. 15 is a schematic structural diagram of an electronic device provided in the present application, and as shown in fig. 15, the electronic device may include: a Processor (Processor)1510, a communication Interface (Communications Interface)1520, a Memory (Memory)1530 and a communication Bus (Communications Bus)1540, wherein the Processor 1510, the communication Interface 1520 and the Memory 1530 communicate with each other via the communication Bus 1540. The processor 1510 may call the logic command in the memory 1530 to execute the method provided by the above embodiments, the method includes:

determining a face image of a driver to be recognized; inputting the face image into an attention classification fusion model to obtain a face direction angle output by the attention classification fusion model; the attention classification fusion model is obtained by fusing a plurality of attention classification models of different types.

In addition, the logic commands in the memory 1530 can be implemented in the form of software functional units and stored in a computer readable storage medium when the logic commands are sold or used as independent products. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including commands for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The processor in the electronic device provided in the embodiment of the present application may call a logic instruction in a memory to implement the above attention identification method, and a specific implementation manner of the method is consistent with the method implementation manner and may achieve the same beneficial effects, which is not described herein again.

The present application also provides a non-transitory computer-readable storage medium, which is described below, and the non-transitory computer-readable storage medium described below and the attention recognition method described above are referred to in correspondence.

Embodiments of the present application provide a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program is implemented to perform the method provided in the foregoing embodiments when executed by a processor, and the method includes:

When the computer program stored on the non-transitory computer readable storage medium provided in the embodiment of the present application is executed, the above attention identification method is implemented, and the specific implementation manner is consistent with the method implementation manner and can achieve the same beneficial effects, which is not described herein again.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes commands for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. An attention recognition method, comprising:

determining a face image of a driver to be recognized;

2. The attention recognition method according to claim 1, wherein the determination method of the attention classification fusion model comprises:

3. The attention recognition method according to claim 2, wherein the step of selecting the attention classification model with the highest performance score from the plurality of attention classification models of each type as the base model of each type comprises:

4. The attention recognition method of claim 3, wherein the determining a plurality of face training sets and corresponding face verification sets comprises:

determining a sample face data set;

5. The attention recognition method of claim 2, wherein the fusing the base models of each type to obtain the attention classification fusion model comprises:

6. The attention recognition method according to any one of claims 1 to 5, wherein the determining a face image of a driver to be recognized includes:

acquiring image data containing the face of the driver;

7. The attention recognition method of any one of claims 1 to 5, wherein the plurality of different types of attention classification models comprise at least one of DenseNet121, EfficientNet-B0, EfficientNet-B3, and EfficientNet-B7.

8. An attention recognition device, comprising:

a determination unit for determining a face image of a driver to be recognized;

9. The attention recognition device according to claim 8, further comprising a fusion model determination unit including:

the basic model determining subunit is used for respectively selecting the attention classification model with the highest performance score from the plurality of attention classification models of each type as the basic model of each type;

and the basic model fusion subunit is used for fusing the basic models of each type to obtain the attention classification fusion model.

10. The attention recognition device of claim 9, wherein the base model determining subunit comprises:

the training verification determining module is used for determining a plurality of face training sets and corresponding face verification sets;

the model training module is used for training any type of initial model based on each face training set to obtain an attention classification model corresponding to each face training set;

the model scoring module is used for performing performance scoring on the attention classification model corresponding to each face training set based on the face verification set corresponding to each face training set;

and the model screening module is used for taking the attention classification model with the highest performance score as the basic model corresponding to any type.

11. The attention recognition device of claim 10, wherein the training verification determination module comprises:

a data set determining submodule for determining a sample face data set;

and the data set splitting submodule is used for splitting the sample face data set into a plurality of subsets and determining a plurality of face training sets and corresponding face verification sets by adopting a cross verification method.

12. The attention recognition device of claim 9, wherein the base model fusion subunit comprises:

the weight determination module is used for determining the weight corresponding to each basic model based on the performance score of each type of basic model;

and the model fusion module is used for fusing the plurality of basic models based on the weight corresponding to each basic model to obtain the attention classification fusion model.

13. The attention recognition device according to claim 8, wherein the determination unit includes:

an image acquisition subunit configured to acquire image data including a face of the driver;

the face detection subunit is used for inputting the image data into a face detection model to obtain a face image output by the face detection model;

14. The attention recognition device of any one of claims 8 to 13, wherein the plurality of different types of attention classification models comprises at least one of DenseNet121, EfficientNet-B0, EfficientNet-B3, and EfficientNet-B7.

15. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the attention recognition method according to any one of claims 1 to 7 when executing the computer program.

16. A non-transitory computer readable storage medium, having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, realizes the steps of the attention recognition method according to any one of claims 1 to 7.