CN117173758A - Learning attention state assessment method based on multidimensional feature fusion network - Google Patents
Learning attention state assessment method based on multidimensional feature fusion network
- Publication number: CN117173758A (application CN202211662783.0A)
- Authority: CN (China)
- Prior art keywords: feature, learner, module, graph, attention state
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
To address the difficulty a learner faces in assessing his or her own attention state in real time, the invention discloses a learning attention state assessment method based on a multidimensional feature fusion network. The method comprises the following steps: 1) acquiring learner video captured by a binocular imaging device (a short-wave infrared camera and a lidar scanner) placed on the desk and dividing it into multiple frames of images, while a hand-worn wearable device acquires the learner's blood oxygen saturation and heart rate signals; 2) locating the face region and facial feature points in the learner's SWIR images and segmenting the 3D point cloud of the head region; 3) inputting the face-region SWIR image and the head 3D point cloud into the corresponding feature extraction networks to obtain feature topological graphs, fusing them with a self-attention weighting module, and inputting the result into a Cauchy label distribution regression module to obtain the learner's head posture angles, while extracting blood oxygen saturation and heart rate variability features to judge the learner's fatigue level; 4) comprehensively evaluating the attention state from the learner's head posture angles, facial feature point changes, and fatigue level, and reminding the learner when attention is not concentrated; 5) collecting statistics on the attention state over the learning session and feeding back a statistical analysis report. By comprehensively evaluating the learner's attention state and feeding back statistics, the invention helps the learner improve concentration and develop good learning habits.
Description
Technical Field
The invention relates to the field of computer vision and behavior analysis, in particular to a learning attention state evaluation method based on a multidimensional feature fusion network.
Background
As the channels for acquiring learning resources become more numerous and comprehensive, improving personal ability through self-learning is becoming the future trend of learning. In an unsupervised environment, however, and especially at home, many learners are easily distracted and learn inefficiently. In recent years, artificial intelligence technology has been widely applied in many fields thanks to its convenience and efficiency. In a self-learning environment a learner cannot always notice and correct his or her own behavior in time, so artificial intelligence, and in particular pose recognition technology, can be used to supervise the learner's state in real time and judge whether attention is concentrated, thereby better helping the learner develop good learning habits.
Head posture is an important cue reflecting a learner's attention. By analyzing the change of the learner's head posture angles during learning and combining it with changes in facial feature points, the learner's attention can be judged effectively: if the learner's head deflects outside the desk or screen region, or the learner frequently yawns or even closes the eyes, the system can detect this in time and remind the learner to concentrate. In addition, blood oxygen saturation and heart rate variability features reflect the learner's fatigue level well, and fatigue directly affects concentration. However, current head pose estimation still faces several challenges:
the learner has the problems that hands are blocked, hair styles are blocked, heads are blocked by clothes and the like easily in the learning process, and in addition, the problem of insufficient illumination of indoor scenes easily occurs. These all result in poor quality images that are acquired and complete information cannot be obtained. There is therefore a need for an image acquisition device that can acquire head pose information in multiple dimensions and is immune to illumination variations.
In the datasets currently used to train head posture angle estimation, the distribution of training samples is extremely unbalanced: there are not enough large-pose training samples, and many training images carry mislabeled head posture angles. As a result, robust network parameters cannot be trained.
Most existing head pose estimation methods regress the head posture angle from RGB images alone or from head 3D point cloud data alone. The accuracy of the regressed angle is hard to improve for methods based on two-dimensional images, while methods based on three-dimensional point clouds are often too computationally expensive. A lightweight head pose estimation method that combines two-dimensional and three-dimensional features is therefore needed.
Disclosure of Invention
To meet the above needs for improvement over the prior art, the invention adopts a binocular imaging device consisting of a short-wave infrared camera and a lidar scanner, and provides a learning attention state evaluation method based on a multidimensional feature fusion network. Combined with the learner's fatigue level, the method monitors in real time whether the learner's head deflection angle points outside the desk or screen region, i.e. whether attention has drifted, promptly reminds the learner to concentrate, and generates an attention concentration report for the learning session, helping the learner develop good learning habits.
The technical scheme adopted for solving the technical problems is as follows: a learning attention state evaluation method based on a multidimensional feature fusion network comprises the following steps:
acquiring learner video captured by a binocular imaging device (a short-wave infrared camera and a lidar scanner) placed on the desk and dividing it into multiple frames of images, while a hand-worn wearable device acquires the learner's blood oxygen saturation and heart rate signals;
locating the face region and facial feature points in the learner's SWIR images, and segmenting the 3D point cloud of the head region;
inputting the face-region SWIR image and the head 3D point cloud into the corresponding feature extraction networks to obtain feature topological graphs, fusing them with a self-attention weighting module, and inputting the result into a Cauchy label distribution regression module to obtain the learner's head posture angles; extracting blood oxygen saturation and heart rate variability features and judging the learner's fatigue level;
comprehensively evaluating the attention state from the learner's head posture angles, facial feature point changes, and fatigue level, and reminding the learner when attention is not concentrated;
collecting statistics on the attention state over the learning session and feeding back a statistical analysis report.
According to the scheme, the face region and facial feature point positioning module works as follows:
Step 1.1.1: each frame of the SWIR image of the interactive object is resized to 624×624 pixels and input into a lightweight Mask R-CNN network pre-trained on a face dataset to obtain the face region (I_x, I_y, m, n);
Step 1.2.1: the cropped face-region SWIR image is input into the global coarse feature extraction network RG-Net, whose structure can be expressed as {conv1-res1-res2-res3-glDSC-fc}, where conv1 denotes a convolution layer, res a residual connection layer, glDSC a global channel-separable convolution, and fc a fully connected layer; the network regresses the final global coarse feature point coordinate vector P_0;
Step 1.2.2: the output feature map of the res1 layer of RG-Net is taken, and a feature map of size p×q centered on each coarse feature point (x_j, y_j) is cropped out to obtain the first-level refinement feature map F_R1; F_R1 is input into the local refinement network FL-Net to extract a feature vector and regress the first-level refined facial feature point coordinate vector P_1;
Step 1.2.3: the output feature map of the conv1 layer of RG-Net is taken, and a feature map of size p×q centered on each coarse feature point (x_j, y_j) is cropped out to obtain the second-level refinement feature map F_R2; F_R2 is input into the local refinement network FL-Net to extract a feature vector and regress the second-level refined facial feature point coordinate vector P_2. P_2^T is the sparse facial feature point coordinate vector.
The head posture two-dimensional feature extraction model comprises a channel-separable convolution module, a pixel-space Transformer module, a fused feature topological graph construction module, and an adaptive graph convolution module. The channel-separable convolution module extracts local features from the pixel space of the preprocessed SWIR face-region image. The pixel-space Transformer extracts the global feature relationships of the pixel space from the local feature maps. The adaptive graph convolution module updates the values of the graph vertices to obtain a head posture fused feature topological graph of new dimension.
According to the scheme, the channel-separable convolution module works as follows:
Step 2.1.1: a group of cropped 328×328-pixel SWIR face-region images I_swir ∈ R^{N×H×W×C} is input into a dual-branch channel-separable convolution network to extract local image features;
Step 2.1.2: branch I has the structure {SC_MAX(16)-SC_1(32)-SC_MAX(32)}, where the SC_1 module has the structure [SC, BN, RL]. SC denotes channel-separable convolution, which extracts local features for each channel by pointwise convolution. BN denotes batch normalization of the input batch of images, applied to each of the C channels separately; it can be expressed as
BN(Γ_i) = a · (Γ_i − μ_Γ) / (σ_Γ + ζ) + b,
where Γ = {Γ_1, ..., Γ_{N×H×W}} is the set of elements (pixel values) of the corresponding channel, μ_Γ is the mean of the set, i.e. μ_Γ = (1/(N×H×W)) Σ_i Γ_i, σ_Γ is its standard deviation, ζ is an extremely small positive number that prevents the denominator, the standard deviation, from being 0, and a, b are trainable network parameters that scale and translate the normalized result. The RL activation function replaces negative elements with zero, making the feature map values easier to converge. SC_MAX applies local maximization Patch_Max on top of SC_1 to obtain the head posture local feature map I_{s1_1} ∈ R^{N×H′×W′×C′};
Step 2.1.3: branch II has the structure {SC_AVE(16)-SC_2(32)-SC_AVE(32)}, where the SC_2 module has the structure [SC, BN, TH]. The TH activation function normalizes element values to the range (−1, 1), making the network more prone to converge. SC_AVE applies local averaging on top of SC_2 to obtain the head posture local feature map I_{s2_1} ∈ R^{N×H′×W′×C′}.
According to the scheme, the pixel-space Transformer module is trained as follows:
Step 2.2.1: the local feature maps I_{s1_1} and I_{s2_1} are input into a dual-branch two-stage pixel-space Transformer network to extract global pixel-space features and generate fused feature maps;
Step 2.2.2: I_{s1_1} is input into branch I, whose first stage has the structure {SC_MAX(32)-Transformer-Patch_Max} and whose second stage has the structure {SC_MAX(32)-Transformer}. The pixel-space Transformer layer is a cascade of three pixel-space Transformer encoders and extracts the global feature relationships of the pixel space;
Step 2.2.3: the pixel-space Transformer encoder stretches the output feature map I_sc ∈ R^{N×H″×W″×C′} of the separable convolution layer SC into a three-dimensional embedded vector I_emb ∈ R^{N×A′×C′}, where A′ = H″×W″;
Step 2.2.4: a sinusoidal position code I_P is added to every element (pixel) of the embedded vector I_emb^i of each image, i ∈ [1, N], with sine components at even channel indices 2n and cosine components at odd indices 2n+1, where m ∈ [0, A′−1] indexes the position and n ∈ [0, (C′−1)/2]. The embedded vector I_emb is updated to I_emb + I_P, and I_emb + I_P is input into the multi-head self-attention mapping module;
Step 2.2.5: the multi-head self-attention mapping module contains 8 self-attention heads (channels). Each channel derives a self-mapping weight matrix from the input; the input is dot-multiplied with this weight matrix and passed through a nonlinear transformation to obtain the channel's self-attention map I_A^i. The outputs of the 8 channels are combined to form the final output I_A of the multi-head self-attention mapping module;
Step 2.2.6: the output I_A of the multi-head self-attention mapping module passes through a residual normalization layer and a fully connected layer to give the output of the pixel-space Transformer encoder. Cascading three pixel-space Transformer encoders finally yields the head posture fused feature map MAP_1 ∈ R^{N×H‴×W‴×C′};
Step 2.2.7: I_{s2_1} is input into branch II, which has a similar structure to branch I but extracts different feature maps based on local averaging: its first stage has the structure {SC_AVE(32)-Transformer-Patch_Ave} and its second stage {SC_AVE(32)-Transformer}, finally yielding the head posture fused feature map MAP_2 ∈ R^{N×H‴×W‴×C′}.
According to the above scheme, the fused feature topological graph construction module constructs the fused feature graph vertices V^M and the topological connection matrix T. MAP_1 and MAP_2 are multiplied element-wise to give the overall fused feature map MAP, which is mapped to a low-dimensional fused feature vector through a fully connected layer; N denotes the number of images in the batch, and a fused feature topological graph is constructed for each frame. The value of each fused feature graph vertex V^M is the fused feature vector of the corresponding single image, and the fused feature topological graph shares the topological connection matrix T with the 3D point cloud topological graph. The fused feature topological graph is constructed as G_2 = (V^M, T).
The head point cloud segmentation module works as follows: according to the two-dimensional coordinate information (I_x, I_y, m, n) provided by the face-region detection box, each frame's point cloud image is compared with its corresponding dense point cloud set pic, and the points lying in the range [I_x, I_y] to [I_x+m, I_y+n] are selected to obtain the dense point cloud set of the head region pic_1 = {(x_1, y_1, z_1), ..., (x_n, y_n, z_n)}.
The head posture three-dimensional feature extraction model comprises a facial feature point 3D point cloud topological graph construction module and an adaptive graph convolution module. The facial feature point 3D point cloud topological graph construction module constructs the 3D point cloud graph vertices V^D and the topological connection matrix T. The adaptive graph convolution module extracts the weight relationships between all vertex pairs of the topological graph and updates the vertex values to obtain a head posture 3D point cloud topological graph of new dimension.
According to the scheme, the facial feature point 3D point cloud topological graph construction module works as follows:
Step 3.1.1: according to the two-dimensional coordinate information P_2^T of the facial feature points, the corresponding 3D point cloud coordinates are selected from the point cloud set pic_1. The value of each 3D point cloud vertex V^D is the 3D point cloud coordinate of the corresponding facial key point, pic_key = (x_key_i, y_key_i, z_key_i), i = 1, ..., 25;
Step 3.1.2: for each graph vertex, the 5 vertices closest in Euclidean space are found with a KD-Tree algorithm and connected to it, constructing the topological connection matrix T ∈ R^{N×N}, where N is the number of feature points; T(i, j) = 1 indicates that the graph vertices are connected, otherwise T(i, j) = 0;
Step 3.1.3: the 3D point cloud topological graph is constructed as G_1 = (V^D, T).
According to the scheme, the adaptive graph convolution module is trained as follows:
Step 4.1.1: the network structure of the adaptive graph convolution module is an adaptive graph convolution layer, a batch normalization layer, an RL activation layer, a 1-dimensional convolution layer, a batch normalization layer, and an RL activation layer; it updates the graph vertex values v_i of each input feature topological graph G to 192-dimensional feature values;
Step 4.1.2: the adaptive graph convolution layer selects, for each vertex v_n of the feature graph G, the K vertices nearest to it in its neighborhood to form vertex pairs. For each vertex pair, M channels are constructed and each channel computes a feature value independently; the feature values of the K vertex pairs are concatenated and the updated graph vertex is obtained through channel-wise max pooling, where K is set to 6 and M to 192.
The blood oxygen saturation and electrocardiosignal feature extraction module works as follows:
Step 5.1.1: for the blood oxygen saturation spo2, the mean square deviation of the sampled values within one period is computed as σ_sp = (1/N) Σ_{i=1}^{N} (sp_i − μ_sp)², where N is the number of samples, sp_i is the i-th sampled value, and μ_sp is the mean of the samples within the period;
Step 5.1.2: for the electrocardiosignal, the standard deviation σ_RR of the intervals between adjacent R waves of consecutive heartbeat signals is computed, where each interval is the spacing between two adjacent peaks; from the spectrogram over the interval, the energy spectral densities of the high-frequency and ultra-low-frequency bands Θ_HF and Θ_SLF are computed and their ratio γ is obtained;
Step 5.1.3: the changes of σ_sp, σ_RR, and γ are analyzed jointly. If Δσ_sp < 0.005, Δσ_RR < 10 ms, and Δγ < 0.2, the fatigue level is judged to be 1: consciousness is awake and thinking is active. If Δσ_sp ∈ [0.005, 0.01), Δσ_RR ∈ [10 ms, 35 ms), and Δγ ∈ [0.2, 0.8), the fatigue level is 2: consciousness is blurred and thinking is relaxed. If Δσ_sp ≥ 0.01, Δσ_RR ≥ 35 ms, and Δγ ≥ 0.8, the fatigue level is 3: consciousness is fuzzy and thinking cannot be concentrated.
The self-attention weighting module works as follows:
Step 6.1.1: the self-attention weighting module consists of a self-attention layer, a fully connected layer, and a softmax regression layer. The updated three-dimensional point cloud topological graph and the two-dimensional fused feature topological graph obtained from the previous modules are input into the self-attention layer and updated;
Step 6.1.2: the fully connected layer maps each updated graph to a vector of dimension 1×N′; the softmax layer then computes the weighting parameters α_1 and α_2 of the two graphs, and the final weighted fused feature topological graph is obtained as their weighted combination.
The Cauchy label distribution regression module works as follows:
Step 7.1.1: the weighted fused feature topological graph is mapped into a multidimensional feature vector through a fully connected layer, the precise head posture angles are regressed, and the mean absolute error (MAE) with respect to the ground-truth angles is computed as the loss function Loss_M;
Step 7.1.2: for each training image I_i, the ground-truth angle labels are converted into Cauchy label distributions; at the same time the module trains the network to generate three groups of parameters δ, η, ζ, yielding the predicted Cauchy label probability distributions (P_A(I_i; δ), P_B(I_i; η), P_C(I_i; ζ));
Step 7.1.3: the spatial distance Loss_θ and the KL divergence between the predicted and ground-truth Cauchy label distributions are computed and combined into the loss function Loss_G, which is weighted with the loss function Loss_M to give the final loss Loss_total = Loss_G + 0.06·Loss_M.
According to the scheme, the optimal network parameters are obtained in advance by training on the training set with this loss function. The learner's short-wave infrared images and 3D point cloud data are then input into the pre-trained multidimensional feature fusion self-attention network to obtain the learner's real-time head posture angles Yaw, Pitch, and Roll, from which it is judged whether the head lies in the inattention zone. Combined with the positions of the facial feature points and the learner's fatigue level, the learner's attention concentration is comprehensively judged, and the learner is reminded if attention is not concentrated.
Overall, compared with the prior art, the invention has the following beneficial effects:
(1) The invention acquires short-wave infrared video images and 3D point cloud data separately, obtaining head posture information in multiple dimensions without being affected by illumination changes. Two-dimensional and three-dimensional head posture information is considered jointly, so a more accurate head posture angle can be regressed.
(2) The multidimensional feature fusion self-attention network uses the spatial information of facial key points to build topological graph structures, constructing head posture topological graphs from the two-dimensional and three-dimensional features respectively. The two-dimensional feature extraction combines convolution, which captures local information, with a pixel-space Transformer, which captures global information, yielding more comprehensive local-global fused two-dimensional features. The Cauchy label distribution regression module fully exploits the similarity between adjacent head poses, alleviating the lack of large-pose samples in the training set.
(3) To complement the head posture angle in characterizing the learner's concentration, blood oxygen saturation and electrocardiosignals are collected, and the parameters σ_sp, σ_RR, and γ are extracted; analyzing their changes jointly allows the learner's fatigue level to be judged qualitatively.
Drawings
FIG. 1 is a flow chart of a learning attention state assessment method based on a multidimensional feature fusion network according to an embodiment of the invention
FIG. 2 is a schematic diagram of data acquisition in a home environment;
fig. 3 is a schematic diagram of a multi-dimensional feature fusion network according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
As shown in fig. 1, the embodiment of the invention is a learning attention state evaluation method based on a multidimensional feature fusion network, which comprises the following steps:
step 1: and acquiring a learner video resource acquired by a binocular imaging device (a short-wave infrared camera and a laser radar scanner) on the office table, and dividing the learner video resource into multiple frames of images according to time sequence. Simultaneously, the blood oxygen saturation and heart rate signals of the learner are acquired through the hand wearable equipment.
Step 2: and (3) locating the face region and the facial feature points of the SWIR image of the learner, and simultaneously dividing the head region 3D point cloud set.
Step 3: inputting the SWIR image of the face area and the head 3D point cloud set into a corresponding head gesture two-dimensional, three-dimensional feature extraction network to obtain a feature topological graph, and inputting the feature topological graph into a Cauchy tag distribution regression module to obtain the head gesture angle of a learner after the feature topological graph is fused by a self-attention weighting module. And simultaneously extracting the blood oxygen saturation and heart rate variation characteristics, and judging the fatigue level of the learner.
According to the scheme, the blood oxygen saturation and electrocardiosignal feature extraction module works as follows. Taking 5 minutes as one period, the mean square deviation of the blood oxygen saturation spo2 samples within the period is computed as σ_sp = (1/N) Σ_{i=1}^{N} (sp_i − μ_sp)², where N is the number of samples, sp_i is the i-th sampled value, and μ_sp is the mean of the samples within the period. At the same time, all peaks of the electrocardiosignal time-domain waveform within the period are detected by wavelet transform, the interval between two adjacent peaks is calculated, and the standard deviation σ_RR of the intervals between adjacent R waves of consecutive heartbeat signals is obtained. The time-domain signal of the period is converted into a frequency-domain signal with the fast Fourier transform, the spectrogram is analyzed, the energy spectral densities of the ultra-low-frequency and high-frequency bands Θ_SLF and Θ_HF are computed, and the second parameter γ = Θ_HF/Θ_SLF is obtained.
The changes of σ_sp, σ_RR, and γ are analyzed jointly. If Δσ_sp < 0.005, Δσ_RR < 10 ms, and Δγ < 0.2, the fatigue level is judged to be 1: consciousness is awake and thinking is active. If Δσ_sp ∈ [0.005, 0.01), Δσ_RR ∈ [10 ms, 35 ms), and Δγ ∈ [0.2, 0.8), the fatigue level is 2: consciousness is blurred and thinking is relaxed. If Δσ_sp ≥ 0.01, Δσ_RR ≥ 35 ms, and Δγ ≥ 0.8, the fatigue level is 3: consciousness is fuzzy and thinking cannot be concentrated.
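For illustration only, the following Python sketch computes σ_sp, σ_RR, and γ for one 5-minute window and applies the threshold rules above. The frequency band limits, the use of a simple peak picker in place of the wavelet transform, and all function names are editorial assumptions, not part of the patent.

```python
import numpy as np
from scipy.signal import find_peaks

def spo2_hrv_features(spo2, ecg, fs_ecg):
    """Illustrative extraction of sigma_sp, sigma_RR and gamma over one 5-minute window."""
    # Mean square deviation of the SpO2 samples in the window (sigma_sp).
    sigma_sp = np.mean((spo2 - np.mean(spo2)) ** 2)

    # R-peak detection; the patent uses a wavelet transform, a simple peak picker stands in here.
    peaks, _ = find_peaks(ecg, distance=int(0.4 * fs_ecg))
    rr = np.diff(peaks) / fs_ecg                  # R-R intervals in seconds
    sigma_rr = np.std(rr) * 1000.0                # standard deviation in milliseconds

    # Spectral densities of the R-R series: high-frequency vs. ultra-low-frequency band
    # (band limits are assumed, the patent does not state them).
    rr_detrended = rr - np.mean(rr)
    spectrum = np.abs(np.fft.rfft(rr_detrended)) ** 2
    freqs = np.fft.rfftfreq(len(rr_detrended), d=float(np.mean(rr)))
    theta_hf = spectrum[(freqs >= 0.15) & (freqs < 0.4)].sum()
    theta_slf = spectrum[(freqs > 0.0) & (freqs < 0.04)].sum()
    gamma = theta_hf / max(theta_slf, 1e-12)
    return sigma_sp, sigma_rr, gamma

def fatigue_level(d_sigma_sp, d_sigma_rr_ms, d_gamma):
    """Threshold rules from the description; combinations not covered by the three stated
    ranges are mapped to the next higher level here as a simplification."""
    if d_sigma_sp < 0.005 and d_sigma_rr_ms < 10 and d_gamma < 0.2:
        return 1   # awake, active thinking
    if d_sigma_sp < 0.01 and d_sigma_rr_ms < 35 and d_gamma < 0.8:
        return 2   # consciousness blurred, relaxed thinking
    return 3       # cannot concentrate
```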
As shown in fig. 2, the learner is studying at home, and a short-wave infrared camera and a lidar scanner capture a video sequence of the learner's face. The multi-frame SWIR images and 3D point cloud images acquired in this scene provide the main data source for the head posture estimation module.
As shown in fig. 3, in this embodiment the multidimensional feature fusion self-attention network includes a face region and facial feature point positioning module, a head point cloud segmentation module, a head posture two-dimensional feature extraction module, a head posture three-dimensional feature extraction module, a self-attention weighting module, and a Cauchy label distribution regression module.
According to the scheme, the face region and facial feature point positioning module works as follows:
Step 3.1.1: each frame of the SWIR image of the interactive object is resized to 624×624 pixels and input into a lightweight Mask R-CNN network trained on a face dataset to obtain the face region (I_x, I_y, m, n);
Step 3.1.2: each frame of the SWIR image of the interactive object is cropped according to the face region (I_x, I_y, m, n) and input into a sparse facial feature point extraction network, which consists of a global coarse feature point extraction network RG-Net and a cascaded local refinement network FL-Net;
Step 3.1.3: the cropped face-region SWIR image is input into RG-Net, whose structure can be expressed as {conv1-res1-res2-res3-glDSC-fc}, where conv1 denotes a convolution layer, res a residual connection layer, glDSC a global channel-separable convolution, and fc a fully connected layer; the network regresses the final global coarse feature point coordinate vector P_0;
Step 3.1.4: the output feature map of the res1 layer of RG-Net is taken, and a feature map of size p×q centered on each coarse feature point (x_j, y_j) is cropped out to obtain the first-level refinement feature map F_R1. F_R1 is input into the local refinement network FL-Net: a convolution first reduces the multichannel feature map to a two-dimensional vector, normalization and a relu activation follow, and a fully connected layer finally regresses the first-level feature vector and the first-level refined facial feature point coordinate vector P_1;
Step 3.1.5: the output feature map of the conv1 layer of RG-Net is taken, and a feature map of size p×q centered on each coarse feature point (x_j, y_j) is cropped out to obtain the second-level refinement feature map F_R2, which is input into the local refinement network FL-Net to obtain the second-level feature vector and the second-level refined facial feature point coordinate vector P_2. P_2^T is the final sparse facial feature point coordinate vector. The extraction procedure can be expressed as:
P_l = P_{l−1} + FL_l(ψ(RG(I)_l, P_{l−1}))    (3)
where P_0 is the output of the global coarse feature point extraction network RG-Net, l denotes the level, FL_l denotes the local refinement network FL-Net cascaded l times, RG(I)_l denotes the output feature map of the l-th layer of RG-Net, and ψ(·) denotes feature reuse, i.e. cropping a feature map of size p×q centered on the coarse feature point (x_j, y_j).
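As a reading aid, the following PyTorch-style sketch mirrors Eq. (3): coarse landmarks P_0 from RG-Net are refined level by level by cropping p×q patches from intermediate RG-Net feature maps and adding the offsets predicted by FL-Net. RG-Net and FL-Net themselves are not reproduced; the interface (a callable returning intermediate maps and P_0, and per-level refinement callables) and the patch-cropping helper are assumptions.

```python
import torch

def crop_patches(feature_map, coords, p, q):
    """Crop a p x q patch around each (x_j, y_j) coordinate (batch size 1 for brevity)."""
    _, _, h, w = feature_map.shape
    patches = []
    for x, y in coords.round().long():
        x0 = int(x.clamp(0, w - q)); y0 = int(y.clamp(0, h - p))
        patches.append(feature_map[:, :, y0:y0 + p, x0:x0 + q])
    return torch.cat(patches, dim=0)

def refine_landmarks(rg_net, refiners, image):
    """Coarse-to-fine regression in the spirit of Eq. (3):
       P_l = P_{l-1} + FL_l(psi(RG(I)_l, P_{l-1}))."""
    feats, coords = rg_net(image)                  # feats: dict of intermediate maps, coords: P_0
    for layer_name, fl_net, p, q in refiners:      # e.g. [('res1', fl1, 7, 7), ('conv1', fl2, 7, 7)]
        patches = crop_patches(feats[layer_name], coords, p, q)
        coords = coords + fl_net(patches)          # add the offsets regressed by FL-Net
    return coords                                  # P_2: sparse facial feature point coordinates
```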
The head posture two-dimensional feature extraction model comprises a channel-separable convolution module, a pixel-space Transformer module, a fused feature topological graph construction module, and an adaptive graph convolution module. The channel-separable convolution module converts the SWIR image into a multichannel local feature map, extracting local features from the pixel space of the preprocessed SWIR face-region image. The pixel-space Transformer extracts the global feature relationships of the pixel space from the multichannel local feature map and generates a pixel-space fused feature map. The fused feature topological graph construction module constructs the fused feature graph vertices V^M and the topological connection matrix T. The adaptive graph convolution module extracts the weight relationships between all vertex pairs of the topological graph and updates the vertex values to obtain a head posture fused feature topological graph of new dimension.
According to the scheme, the channel-separable convolution module works as follows:
Step 3.2.1: the located face-region window (I_x, I_y, m, n) is resized to 328×328 pixels, giving a batch of cropped SWIR face-region images I_swir ∈ R^{N×H×W×C}, which are input into the dual-branch channel-separable convolution network to extract local image features;
Step 3.2.2: I_swir is input into branch I, whose structure is {SC_MAX(16), SC_1(32), SC_MAX(32)}, where the SC_1 module has the structure [SC, BN, RL]. SC denotes channel-separable convolution, which extracts local features for each channel by pointwise convolution; BN denotes batch normalization of the input batch of images, computed per channel over the set Γ = {Γ_1, ..., Γ_{N×H×W}} of pixel values as described above. The RL activation function replaces negative elements with zero, making the feature map values easier to converge. SC_MAX applies local maximization Patch_Max on top of SC_1 to obtain the head posture local feature map I_{s1_1} ∈ R^{N×H′×W′×C′};
Step 3.2.3: I_swir is input into branch II, whose structure is {SC_AVE(16), SC_2(32), SC_AVE(32)}, where the SC_2 module has the structure [SC, BN, TH]. The TH activation function normalizes element values to the range (−1, 1), making the network more prone to converge. SC_AVE applies local averaging Patch_Ave on top of SC_2 to obtain the head posture local feature map I_{s2_1} ∈ R^{N×H′×W′×C′}.
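A minimal PyTorch sketch of the dual-branch structure is given below. It uses a depthwise-plus-pointwise convolution as a stand-in for the "channel-separable convolution" SC; kernel and pooling sizes, and the placement of the activations inside the SC_MAX / SC_AVE blocks, are assumptions.

```python
import torch
import torch.nn as nn

class SCBlock(nn.Module):
    """Channel-separable convolution block [SC, BN, activation]."""
    def __init__(self, in_ch, out_ch, act):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)  # per-channel conv
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)                          # pointwise conv
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = act

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class DualBranchLocalFeatures(nn.Module):
    """Branch I (RL/ReLU with local max) and branch II (TH/Tanh with local average)."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.branch1 = nn.Sequential(
            SCBlock(in_ch, 16, nn.ReLU()), nn.MaxPool2d(2),   # SC_MAX(16)
            SCBlock(16, 32, nn.ReLU()),                        # SC_1(32)
            SCBlock(32, 32, nn.ReLU()), nn.MaxPool2d(2))       # SC_MAX(32)
        self.branch2 = nn.Sequential(
            SCBlock(in_ch, 16, nn.Tanh()), nn.AvgPool2d(2),    # SC_AVE(16)
            SCBlock(16, 32, nn.Tanh()),                        # SC_2(32)
            SCBlock(32, 32, nn.Tanh()), nn.AvgPool2d(2))       # SC_AVE(32)

    def forward(self, x):
        return self.branch1(x), self.branch2(x)                # I_s1_1, I_s2_1
```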
According to the scheme, the pixel-space Transformer module is trained as follows:
Step 3.3.1: the local feature maps I_{s1_1} and I_{s2_1} are input into a dual-branch two-stage pixel-space Transformer network to extract global pixel-space features and generate fused feature maps;
Step 3.3.2: I_{s1_1} is input into branch I, whose first stage has the structure {SC_MAX(32)-Transformer-Patch_Max} and whose second stage has the structure {SC_MAX(32)-Transformer}. The pixel-space Transformer layer is a cascade of three pixel-space Transformer encoders and extracts the global feature relationships of the pixel space;
Step 3.3.3: the pixel-space Transformer encoder stretches the output feature map I_sc ∈ R^{N×H″×W″×C′} of the separable convolution layer SC into a three-dimensional embedded vector I_emb ∈ R^{N×A′×C′}, where A′ = H″×W″;
Step 3.3.4: a sinusoidal position code I_P is added to every element (pixel) of the embedded vector I_emb^i of each image, i ∈ [1, N], with sine components at even channel indices 2n and cosine components at odd indices 2n+1, where m ∈ [0, A′−1] indexes the position and n ∈ [0, (C′−1)/2]. The embedded vector I_emb is updated to I_emb + I_P, and I_emb + I_P is input into the multi-head self-attention mapping module;
Step 3.3.5: the multi-head self-attention mapping module contains 8 self-attention heads (channels). For each channel, a self-mapping weight matrix is derived from the input: the query vector R_v and key vector P_v are obtained by dot-multiplying the input I_emb + I_P with the channel's weight matrices, and the channel's self-attention map I_A^i is obtained after a nonlinear transformation. The outputs of the 8 channels are combined to form the final output I_A of the multi-head self-attention mapping module;
Step 3.3.6: the output I_A of the multi-head self-attention mapping module passes through a residual normalization layer and a fully connected layer to give the normalized output of the pixel-space Transformer encoder. The computation can be summarized as
I_TF = Norm(f(max(0, Norm(I_emb + I_A))) + Norm(I_emb + I_A))    (5)
where Norm normalizes the A′×C′ pixels of each layer to a standard normal distribution and f(·) denotes a linear transformation. Cascading three pixel-space Transformer encoders finally yields the head posture fused feature map MAP_1 ∈ R^{N×H‴×W‴×C′};
Step 3.3.7: I_{s2_1} is input into branch II, which has a similar structure to branch I but extracts different feature maps based on local averaging Patch_Ave: its first stage has the structure {SC_AVE(32)-Transformer-Patch_Ave} and its second stage {SC_AVE(32)-Transformer}, finally yielding the head posture fused feature map MAP_2 ∈ R^{N×H‴×W‴×C′}.
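The encoder equations are only partially reproduced above, so the following sketch is an approximation, not the patent's exact formulation: it flattens the feature map, adds the standard sinusoidal position code (the base 10000 is an assumption; the text fixes only the index ranges), applies PyTorch's 8-head self-attention as a stand-in for the multi-head self-attention mapping module, and follows Eq. (5) in spirit for the residual normalization and fully connected layer.

```python
import math
import torch
import torch.nn as nn

class PixelSpaceEncoder(nn.Module):
    """One pixel-space Transformer encoder over a flattened N x A' x C' embedding."""
    def __init__(self, channels, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)
        self.fc = nn.Linear(channels, channels)

    @staticmethod
    def position_code(length, channels):
        # Assumed standard sine/cosine encoding; m indexes position, n the channel pair.
        pos = torch.arange(length).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, channels, 2).float() * (-math.log(10000.0) / channels))
        pe = torch.zeros(length, channels)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div[: channels // 2])
        return pe

    def forward(self, feature_map):                       # feature_map: N x C' x H'' x W''
        n, c, h, w = feature_map.shape
        emb = feature_map.flatten(2).transpose(1, 2)      # I_emb: N x A' x C', A' = H''*W''
        emb = emb + self.position_code(h * w, c).to(emb.device)
        attn_out, _ = self.attn(emb, emb, emb)            # multi-head self-attention, I_A
        x = self.norm1(emb + attn_out)                    # residual normalization
        x = self.norm2(x + torch.relu(self.fc(x)))        # FC layer + residual norm, cf. Eq. (5)
        return x.transpose(1, 2).reshape(n, c, h, w)
```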
According to the above scheme, the fused feature topological graph construction module constructs the fused feature graph vertices V^M and the topological connection matrix T. MAP_1 and MAP_2 are multiplied element-wise to give the overall fused feature map MAP, which is mapped to a low-dimensional fused feature vector through a fully connected layer; N denotes the number of images in the batch, and a fused feature topological graph is constructed for each frame. The value of each fused feature graph vertex V^M is the fused feature vector of the corresponding single image, and the fused feature topological graph shares the topological connection matrix T with the 3D point cloud topological graph. The fused feature topological graph is constructed as G_2 = (V^M, T).
The head point cloud segmentation module works as follows: according to the two-dimensional coordinate information (I_x, I_y, m, n) provided by the face-region detection box, each frame's point cloud image is compared with its corresponding dense point cloud set pic, and the points lying in the range [I_x, I_y] to [I_x+m, I_y+n] are selected to obtain the dense point cloud set of the head region pic_1 = {(x_1, y_1, z_1), ..., (x_n, y_n, z_n)}.
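The segmentation step amounts to filtering the lidar points by the 2D face box. The sketch below assumes the point cloud is already registered to the SWIR camera so that each 3D point carries the pixel coordinates it projects to; argument names are illustrative.

```python
import numpy as np

def segment_head_cloud(points_xyz, pixel_uv, face_box):
    """Keep the 3D points whose image projection falls inside the face detection box.

    points_xyz : (n, 3) lidar points registered to the SWIR camera
    pixel_uv   : (n, 2) pixel coordinates of each point in the SWIR image
    face_box   : (I_x, I_y, m, n) as produced by the face detector
    """
    ix, iy, m, n = face_box
    inside = ((pixel_uv[:, 0] >= ix) & (pixel_uv[:, 0] <= ix + m) &
              (pixel_uv[:, 1] >= iy) & (pixel_uv[:, 1] <= iy + n))
    return points_xyz[inside]        # dense head-region point cloud pic_1
```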
The head posture three-dimensional feature extraction model comprises a facial feature point 3D point cloud topological graph construction module and an adaptive graph convolution module. The facial feature point 3D point cloud topological graph construction module constructs the 3D point cloud graph vertices V^D and the topological connection matrix T. The adaptive graph convolution module extracts the weight relationships between all vertex pairs of the topological graph and updates the vertex values to obtain a head posture 3D point cloud topological graph of new dimension.
According to the scheme, the facial feature point 3D point cloud topological graph construction module works as follows:
Step 3.4.1: the 3D point cloud topological graph construction module constructs the 3D point cloud graph vertices V^D and the topological connection matrix T. According to the two-dimensional coordinate information P_2^T of the facial feature points, the corresponding 3D point cloud coordinates are selected from the point cloud set pic_1; the value of each 3D point cloud vertex V^D is the 3D point cloud coordinate of the corresponding facial key point, pic_key = (x_key_i, y_key_i, z_key_i), i = 1, ..., 25;
Step 3.4.2: for each graph vertex, the 5 vertices closest in Euclidean space are found with a KD-Tree algorithm and connected to it, constructing the topological connection matrix T ∈ R^{N×N}, where N is the number of feature points; T(i, j) = 1 indicates that the graph vertices are connected, otherwise T(i, j) = 0;
Step 3.4.3: the 3D point cloud topological graph is constructed as G_1 = (V^D, T).
According to the scheme, the adaptive graph convolution module is trained as follows:
Step 3.5.1: the network structure of the adaptive graph convolution module is an adaptive graph convolution layer, a batch normalization layer, an RL activation layer, a 1-dimensional convolution layer, a batch normalization layer, and an RL activation layer. The adaptive graph convolution layer extracts the weight relationships between all vertex pairs of the topological graph and updates the graph vertex values accordingly; the 1-dimensional convolution layer further extracts the relationships within the sequences, and the batch normalization operations and RL activation functions make the network easier to converge. Finally, the graph vertex values v_i of each input feature topological graph G are updated to 192-dimensional feature values;
Step 3.5.2: the adaptive graph convolution layer selects, for each vertex v_n of the feature graph G, the K vertices nearest to it in its neighborhood to form vertex pairs. For each vertex pair, M channels are constructed and each channel computes a feature value independently: the vertex pair is cascaded into a single vector, passed through an MLP layer, dot-multiplied with a channel weight, and passed through the RL activation. In this notation, [A, B] denotes the cascade of vectors A and B, ⊙ denotes the dot product, (·) denotes the MLP layer, and RL(·) denotes the RL nonlinear activation, which converts negative elements to 0;
Step 3.5.3: the feature value of each vertex is updated to an M-dimensional feature vector; the feature values of the K vertex pairs are concatenated and channel-wise max pooling produces the updated graph vertex, where K is set to 6 and M to 192.
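Because the exact per-channel expression is only given schematically, the following EdgeConv-style sketch is one plausible reading of the adaptive graph convolution layer: cascade each vertex with each of its K nearest neighbours, map the pair through a shared MLP to M channels, and max-pool over the K neighbours (K=6, M=192 in the description).

```python
import torch
import torch.nn as nn

class AdaptiveGraphConv(nn.Module):
    """EdgeConv-style sketch of the adaptive graph convolution layer."""
    def __init__(self, in_dim, m=192, k=6):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * in_dim, m), nn.BatchNorm1d(m), nn.ReLU())

    def forward(self, verts):                       # verts: (n_vertices, in_dim)
        n = verts.shape[0]
        # K nearest neighbours by Euclidean distance (excluding the vertex itself)
        dist = torch.cdist(verts, verts)
        knn = dist.topk(self.k + 1, largest=False).indices[:, 1:]      # (n, K)
        pairs = torch.cat([verts.unsqueeze(1).expand(-1, self.k, -1),  # v_n
                           verts[knn]], dim=-1)                        # [v_n, v_m] cascade
        feats = self.mlp(pairs.reshape(n * self.k, -1)).reshape(n, self.k, -1)
        return feats.max(dim=1).values              # channel-wise max pooling over K pairs
```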
According to the scheme, the self-attention weighting module works as follows:
Step 3.6.1: the self-attention weighting module consists of a self-attention layer, a fully connected layer, and a softmax regression layer. The three-dimensional point cloud topological graph and the two-dimensional fused feature topological graph updated by the previous modules are input into the self-attention layer and updated;
Step 3.6.2: the fully connected layer maps each updated graph to a vector of dimension 1×N′, where f(·) denotes a linear transformation; the softmax function then computes the weighting parameters α_1 and α_2 of the two graphs, and the final weighted fused feature topological graph is obtained as their weighted combination.
According to the scheme, the Cauchy label distribution regression module works as follows:
Step 3.6.3: the weighted fused feature topological graph is mapped into a multidimensional feature vector through a fully connected layer, the precise head posture angles are regressed, and the mean absolute error (MAE) with respect to the ground-truth angles is computed as the loss function Loss_M.
Step 3.6.4: considering that, for the same change of head posture angle, the pose similarity differs along the three directions Yaw, Pitch, and Roll, the deflection angle range {−90°, ..., 0°, ..., 90°} of the three directions is divided into 46, 100, and 62 segments respectively, i.e. the angles are encoded into the corresponding label sets A = {A_1, ..., A_45}, B = {B_1, ..., B_99}, C = {C_1, ..., C_61}.
Step 3.6.5: for each training set image I i Converting actual angle labels into Cauchy label distributionWherein->Element value->Can be expressed as
Wherein i represents the i-th tag, t y Representing the code value corresponding to the real yaw angle, and the standard deviation delta of the label 1 Set to 4.Element value->Can be expressed as
Wherein j represents the j-th tag, t p Representing the corresponding coding value of the real pitch angle and the standard deviation delta of the label 2 Set to 10.Element value->Can be expressed as
Wherein k represents the kth tag, t r Representing the corresponding coded value of the real rolling angle and the standard deviation delta of the label 3 Set to 6.
At the same time, the module trains the network to generate three groups of parameters δ, η, ζ corresponding to the three label sets A, B, C, yielding the predicted Cauchy label probability distributions (P_A(I_i; δ), P_B(I_i; η), P_C(I_i; ζ)).
Step 3.6.6: the spatial distance Loss_θ and the KL divergence between the predicted and ground-truth Cauchy label distributions are computed and combined into the loss function Loss_G, which is weighted with the loss function Loss_M to give the final loss Loss_total = Loss_G + 0.06·Loss_M.
According to the scheme, the optimal network parameters are obtained in advance by training on the training set with this loss function; the learner's short-wave infrared images and 3D point cloud data are then input into the pre-trained multidimensional feature fusion self-attention network to obtain the final head posture angles (Yaw, Pitch, Roll).
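The definitions of the spatial distance Loss_θ and of the discretization are only partially given, so the sketch below assumes a Cauchy kernel normalized over the bins and takes Loss_θ as an L2 distance between distributions; only the 0.06 weighting follows the description directly.

```python
import torch
import torch.nn.functional as F

def cauchy_label_distribution(t_encoded, n_bins, delta):
    """Discretized Cauchy label distribution centred on the encoded ground-truth angle
    (normalization over the bins is an assumption; the patent gives only the width)."""
    idx = torch.arange(n_bins, dtype=torch.float32)
    kernel = 1.0 / (torch.pi * delta * (1.0 + ((idx - t_encoded) / delta) ** 2))
    return kernel / kernel.sum()

def total_loss(pred_logits, t_encoded, angle_pred, angle_gt, n_bins, delta, w_mae=0.06):
    """Loss_total = Loss_G + 0.06 * Loss_M for one direction (yaw, pitch or roll)."""
    target = cauchy_label_distribution(t_encoded, n_bins, delta)
    log_pred = F.log_softmax(pred_logits, dim=-1)
    loss_kl = F.kl_div(log_pred, target, reduction='sum')   # KL divergence term
    loss_theta = torch.norm(log_pred.exp() - target)        # assumed spatial-distance term
    loss_g = loss_kl + loss_theta
    loss_m = torch.abs(angle_pred - angle_gt).mean()         # MAE on the regressed angle
    return loss_g + w_mae * loss_m
```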
Step 4: and judging the fatigue level according to the head attitude angle and the facial feature point positions of the learner at different moments and combining the blood oxygen saturation and electrocardiosignal change conditions, and comprehensively evaluating the attention state of the learner. And judging whether the learning machine is located in a non-concentration zone, if so, the learner is not concentrated at the moment, otherwise, the learner is concentrated.
Table 1 learner attentional state comprehensive evaluation rules
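The contents of Table 1 are not reproduced in the text, so the following sketch only encodes one plausible rule set consistent with the description: head deflection within the desk/screen zone, no eye closure or frequent yawning from the facial feature points, and a fatigue level below 3. All thresholds are illustrative assumptions.

```python
def attention_state(yaw, pitch, eyes_closed, yawning, fatigue_level,
                    yaw_limit=45.0, pitch_limit=30.0):
    """Illustrative stand-in for the Table 1 rules: True when attention is judged concentrated."""
    head_in_zone = abs(yaw) <= yaw_limit and abs(pitch) <= pitch_limit
    facial_ok = not eyes_closed and not yawning
    return head_in_zone and facial_ok and fatigue_level < 3
```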
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (10)
1. A learning attention state evaluation method based on a multidimensional feature fusion network, characterized by comprising the following steps:
acquiring learner video captured by a binocular imaging device (a short-wave infrared camera and a lidar scanner) placed on the desk and dividing it into multiple frames of images, while a hand-worn wearable device acquires the learner's blood oxygen saturation and heart rate signals;
locating the face region and facial feature points in the learner's SWIR images, and segmenting the 3D point cloud of the head region;
inputting the face-region SWIR image and the head 3D point cloud into the corresponding feature extraction networks to obtain feature topological graphs, fusing them with a self-attention weighting module, and inputting the result into a Cauchy label distribution regression module to obtain the learner's head posture angles; extracting blood oxygen saturation and heart rate variability features and judging the learner's fatigue level;
comprehensively evaluating the attention state from the learner's head posture angles, facial feature point changes, and fatigue level, and reminding the learner when attention is not concentrated;
collecting statistics on the attention state over the learning session and feeding back a statistical analysis report.
2. The learning attention state evaluation method based on the multidimensional feature fusion network according to claim 1, wherein the face region and facial feature point positioning module works as follows:
Step 1.1.1: each frame of the SWIR image of the interactive object is resized to 624×624 pixels and input into a pre-trained lightweight Mask R-CNN network to obtain the face region (I_x, I_y, m, n);
Step 1.2.1: the cropped face-region SWIR image is input into the global coarse feature point extraction network RG-Net with structure {conv1-res1-res2-res3-glDSC-fc}, where res denotes a residual connection layer, glDSC a global channel-separable convolution, and fc a fully connected layer, and the final global coarse feature point coordinate vector P_0 is regressed;
Step 1.2.2: the output feature map of the res1 layer of RG-Net is taken, a feature map of size p×q centered on each coarse feature point (x_j, y_j) is cropped out and input into the local refinement network FL-Net to extract a feature vector, and the first-level refined facial feature point coordinate vector P_1 is regressed;
Step 1.2.3: the output feature map of the conv1 layer of RG-Net is taken, a feature map of size p×q centered on each coarse feature point (x_j, y_j) is cropped out and input into the local refinement network FL-Net to extract a feature vector, and the second-level refined facial feature point coordinate vector P_2 is regressed; P_2^T is the sparse facial feature point coordinate vector.
3. The learning attention state evaluation method based on the multidimensional feature fusion network according to claim 1, wherein the head posture two-dimensional feature extraction model comprises a channel-separable convolution module, a pixel-space Transformer module, a fused feature topological graph construction module, and an adaptive graph convolution module.
4. The learning attention state evaluation method based on the multidimensional feature fusion network according to claim 3, wherein the channel-separable convolution module is trained as follows:
Step 2.1.1: the SWIR face-region images I_swir ∈ R^{N×H×W×C} are input into the dual-branch channel-separable convolution network to extract local features of the two-dimensional images;
Step 2.1.2: branch I has the structure {SC_MAX(16)-SC_1(32)-SC_MAX(32)}, where the SC_1 module has the structure [SC, BN, RL]; SC denotes channel-separable convolution, which extracts local features for each channel by pointwise convolution, BN denotes batch normalization of the input batch of images, and the RL activation function replaces negative elements with zero. SC_MAX applies local maximization on top of SC_1 to obtain the head posture local feature map I_{s1_1} ∈ R^{N×H′×W′×C′};
Step 2.1.3: branch II has the structure {SC_AVE(16)-SC_2(32)-SC_AVE(32)}, where the SC_2 module has the structure [SC, BN, TH]; the TH activation function normalizes elements to (−1, 1), and SC_AVE applies local averaging on top of SC_2 to finally obtain the head posture local feature map I_{s2_1} ∈ R^{N×H′×W′×C′}.
5. The learning attention state evaluation method based on the multidimensional feature fusion network according to claim 3, wherein the pixel-space Transformer module is trained as follows:
Step 2.2.1: I_{s1_1} is input into branch I, whose first stage is {SC_MAX(32)-Transformer-Patch_Max} and whose second stage is {SC_MAX(32)-Transformer}; the pixel-space Transformer layer is a cascade of three pixel-space Transformer encoders and extracts the global feature relationships of the pixel space;
Step 2.2.2: the pixel-space Transformer encoder stretches the output feature map I_sc ∈ R^{N×H″×W″×C′} of the SC layer into a three-dimensional vector I_emb and adds position codes I_P; I_emb + I_P is input into the multi-head self-attention mapping module, whose final output I_A is obtained by dot-multiplying the input with the self-mapping weight matrices and applying a nonlinear transformation; I_A passes through the residual normalization layer and the fully connected layer to give the output of the Transformer encoder. Cascading three pixel-space Transformer encoders finally yields the head posture fused feature map MAP_1 ∈ R^{N×H‴×W‴×C′};
Step 2.2.3: I_{s2_1} is input into branch II, which has a similar structure to branch I but extracts different feature maps based on the local average Patch_Ave, finally yielding the head posture fused feature map MAP_2 ∈ R^{N×H‴×W‴×C′}.
6. The learning attention state evaluation method based on the multidimensional feature fusion network according to claim 4, wherein the fused feature topological graph construction module constructs the fused feature graph vertices V^M and the topological connection matrix T; MAP_1 and MAP_2 are multiplied element-wise and mapped to a low-dimensional fused feature vector M through a fully connected layer, the value of each head posture fused feature graph vertex V^M is the fused feature vector M of the corresponding single image, and the head posture fused feature topological graph shares the topological connection matrix T with the 3D point cloud topological graph.
7. The learning attention state evaluation method based on the multidimensional feature fusion network according to claim 1, wherein the head posture three-dimensional feature extraction model comprises a facial feature point 3D point cloud topological graph construction module and an adaptive graph convolution module.
8. The learning attention state evaluation method based on the multidimensional feature fusion network according to claim 7, wherein the facial feature point 3D point cloud topological graph construction module works as follows:
Step 3.1.1: the 3D point cloud topological graph construction module constructs the 3D point cloud graph vertices V^D and the topological connection matrix T; according to the two-dimensional coordinate information P_2^T of the facial feature points, the corresponding 3D point cloud coordinates are selected from the point cloud set pic_1, and the value of each 3D point cloud vertex V^D is the 3D point cloud coordinate of the corresponding facial key point;
Step 3.1.2: for each graph vertex, the 5 vertices closest in Euclidean space are found with a KD-Tree algorithm and connected to it, constructing the topological connection matrix T ∈ R^{N×N}, where N is the number of feature points; T(i, j) = 1 indicates that the graph vertices are connected, otherwise T(i, j) = 0.
9. The learning attention state evaluation method based on the multidimensional feature fusion network according to claim 1, wherein the adaptive graph convolution module works as follows:
Step 4.1.1: for each vertex v_n of the feature graph G, the K vertices nearest to it in its neighborhood are selected to form vertex pairs, and the feature values are updated;
Step 4.1.2: for each vertex pair, M channels are constructed and each channel computes a feature value independently, where [A, B] denotes the cascade of vectors A and B, ⊙ denotes the dot product, (·) denotes the MLP layer, and RL(·) denotes the RL nonlinear activation, which converts negative elements to 0; the feature values of all vertex pairs are cascaded and channel-wise max pooling yields the updated M-dimensional graph vertices.
10. The learning attention state evaluation method based on the multidimensional feature fusion network according to claim 1, wherein the blood oxygen saturation and electrocardiosignal feature extraction module computes the mean square deviation σ_sp of the blood oxygen saturation spo2 samples within one period, the standard deviation σ_RR of the intervals between adjacent R waves of consecutive heartbeat signals, and the ratio γ of the high-frequency to ultra-low-frequency energy spectral density within the adjacent R-wave periods.
Priority Applications (1)
- CN202211662783.0A (priority date 2022-12-23, filing date 2022-12-23): Learning attention state assessment method based on multidimensional feature fusion network
Publications (1)
- CN117173758A, published 2023-12-05
Family Applications (1)
- Family ID: 88935740
- CN202211662783.0A, filed 2022-12-23 in China (CN), published as CN117173758A, status pending
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination