CN117152815A - Student activity accompanying data analysis method, device and equipment - Google Patents

Student activity accompanying data analysis method, device and equipment

Info

Publication number
CN117152815A
Authority
CN
China
Prior art keywords
facial feature
face
facial
self
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311119659.4A
Other languages
Chinese (zh)
Inventor
徐建
张昭理
刘海
吴晨
吴砥
郭惠敏
代书铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University filed Critical Central China Normal University
Priority to CN202311119659.4A priority Critical patent/CN117152815A/en
Publication of CN117152815A publication Critical patent/CN117152815A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/166Detection; Localisation; Normalisation using acquisition arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/52Scale-space analysis, e.g. wavelet analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Tourism & Hospitality (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a student activity accompanying data analysis method, device and equipment. The method comprises the following steps: acquiring accompanying video data of an object to be analyzed during a learning activity and performing face detection to generate a face image; sampling facial feature points from the face image with a local-face-based self-attention feature point sampling model to obtain a set of multiple groups of facial feature points; inputting the set of facial feature points into a residual-prompt-based self-attention feature extraction model to extract facial features; comparing the extracted facial features with the facial features stored in a facial feature library to generate facial feature data and the corresponding identity information; and performing visual analysis based on the accompanying video data and the facial feature data. The application improves the accuracy of face recognition and visualizes the accompanying face data based on the recognition result, meeting the need to precisely guide students' learning.

Description

Student activity accompanying data analysis method, device and equipment
Technical Field
The application relates to the technical field of face data acquisition and recognition, in particular to a student activity accompanying data analysis method, device and equipment.
Background
The development of artificial intelligence and the arrival of the big data era have promoted personalized learning for students, making it convenient for schools to provide specialized, individualized teaching and to abandon the "assembly line" teaching mode of the past. Through data enablement, a school can rely on a data management platform to collect accompanying data on students' learning activities and carry out dynamic analysis on that basis, so that teachers and parents can use the dynamic analysis to precisely guide students' learning and better serve their development. However, because of the complexity and uniqueness of student learning activities, existing digital traces and digital portraits of student learning remain superficial, relying mainly on static, periodic data that captures only one facet of the learning activity. Accompanying data acquisition for student learning activities is therefore urgently needed to achieve uninterrupted, all-element data coverage and collection.
At present, the acquisition and analysis of students' learning concentration is mainly concentrated in schools, while students' concentration during learning activities at home is ignored. In addition, visual dynamic analysis of students' learning data presupposes accurate face detection and face recognition on that data, but existing face recognition techniques achieve low detection accuracy on blurred or occluded face images and have high overall computational complexity.
Disclosure of Invention
Aiming at at least one defect or improvement requirement of the prior art, the application provides a student activity accompanying data analysis method, device and equipment, which aim to improve the accuracy of face recognition and to perform visual analysis on the accompanying face data based on the recognition result, so as to meet the need to precisely guide students' learning.
To achieve the above object, according to a first aspect of the present application, there is provided a student activity accompanying data analysis method comprising the following steps:
s1, acquiring accompanying video data of an object to be analyzed in a learning activity process, and carrying out face detection to generate a face image;
s2, carrying out facial feature point sampling on the face image by using a self-attention feature point sampling model based on the local face to obtain a set of multiple groups of facial feature points; comprising the following steps:
extracting image features from the face image to generate a feature map;
cutting out a local image block with a preset size from the feature map by taking the predicted facial feature points as the center;
extracting local coordinates of each facial feature point in the corresponding image block from the local image block, and generating a set with facial feature points by taking the sum of the local coordinates of the facial feature points and the relative position coordinates of the corresponding local image block as final predicted coordinates;
s3, inputting the set of facial feature points into a self-attention feature extraction model based on residual prompt to extract facial features;
s4, comparing the facial features with the facial features stored in the facial feature library to generate facial feature data and corresponding identity information thereof;
and S5, carrying out visual analysis based on the companion video data and the face feature data.
Further, in the student activity accompanying data analysis method, the extracting local coordinates of each facial feature point in the corresponding image block from the local image block includes:
mapping the local image blocks into vectors, and coding the distance between the facial feature points in each local image block to obtain the relative position relation of the facial feature points in the local image blocks;
modeling the relative position relation among the facial feature points in each local image block according to the coding result, and outputting the local coordinates of each facial feature point relative to the local image block.
Further, in the student activity accompanying data analysis method, the step of inputting the set of facial feature points into a residual-prompt-based self-attention feature extraction model to extract facial features includes:
dividing image blocks on the feature map by taking the coordinates of each facial feature point as the center, mapping each image block to a two-dimensional vector, and adding classification marks and position codes;
superposing the image blocks subjected to position coding to obtain an aggregate vector;
inputting the aggregate vector into a self-attention feature extraction model to perform feature extraction, wherein the extraction process is expressed as follows:
where a prompt token S_0 ∈ R^(M×d) is introduced; the prompt token S_0 is propagated through the encoder together with the image blocks and the classification token, and each image block interacts with the other image blocks through the self-attention layer at every layer; Split denotes splitting the propagated sequence of length N+L+1 into N+1 image blocks and L prompts; S_J is the prompt vector, and average pooling S_J yields the final representation; I_0^J is the output value of the classification token after the Transformer encoder; x denotes the final output of the whole model, on which face recognition supervised training is performed with the loss function, where N is the number of samples, d is the margin, W is the weight matrix of the last linear layer, W_i is the normalized i-th column of the weight matrix, y_j is the corresponding ground-truth value, and b_j is the j-th sample.
Further, in the student activity accompanying data analysis method, in the feature extraction process, the first-layer MSA calculation may be expressed as:
where the projection matrices are learnable parameters; each self-attention head s_h performs the self-attention computation, and a fusion matrix is used to fuse the self-attention heads s_h to obtain R;
a residual prompt is added to the propagated prompts in the computation within the MSA block of intermediate layer l; the calculation of the residual prompt can be expressed as:
where the first term is a zero matrix of dimension b×(N+1) and the second term is the residual prompt; residual prompts are added only at the positions of the propagated prompts, not at the positions of the image blocks or the classification token.
According to a second aspect of the present application, there is also provided a student activity accompanying data analysis device, comprising:
the video collector is used for collecting accompanying video data of the object to be analyzed in the learning activity process;
a face detector for performing face detection on the companion video data to generate a face image;
the face recognition module comprises a self-attention feature point sampling model based on a local face and a self-attention feature extraction model based on residual prompting;
wherein the self-attention feature point sampling model is configured to:
extracting image features from the face image to generate a feature map;
cutting out a local image block with a preset size from the feature map by taking the predicted facial feature points as the center;
extracting local coordinates of each facial feature point in the corresponding image block from the local image block, and generating a set with facial feature points by taking the sum of the local coordinates of the facial feature points and the relative position coordinates of the corresponding local image block as final predicted coordinates;
the self-attention feature extraction model is configured to perform feature extraction on the set of facial feature points to generate facial features;
the comparison module is used for comparing the facial features with the facial features stored in the facial feature library to generate facial feature data and corresponding identity information thereof;
and the analysis module is used for carrying out visual analysis based on the accompanying video data and the face characteristic data.
Further, in the student activity accompanying data analysis device, the self-attention feature point sampling model includes:
the backbone network is used for extracting image features from the face image and generating a feature map;
a linear embedding layer, which is used for cutting out a local image block with a preset size from the feature map by taking the facial feature point as the center and mapping the local image block into a vector;
a distance coding layer for coding the distance between the facial feature points in each local image block to obtain the relative position relationship of the facial feature points in the local image block;
a relationship modeling layer for modeling the relative position relationship between the facial feature points in each local image block and outputting the local coordinates of each facial feature point relative to the local image block
And the coordinate prediction layer outputs the coordinates of the finally predicted facial feature points, wherein the coordinates are the sum of the local coordinates of the facial feature points and the relative position coordinates of the corresponding local image blocks.
Further, in the student activity companion data analysis device described above, the relationship modeling layer includes an MSA block, an MMA block, and an MLP block, and a normalization layer is provided before each block.
Further, in the student activity-accompanied data analysis device, the self-attention feature extraction model includes:
a preprocessing layer for dividing the image block on the feature map with the coordinates of each facial feature point as the center;
the position embedding layer is used for mapping each image block to a two-dimensional vector, adding a classification mark and position codes, and superposing the image blocks subjected to the position codes to obtain an aggregate vector;
the feature extraction layer is used for extracting facial features according to the aggregate vectors, and the extraction process is expressed as follows:
where a prompt token S_0 ∈ R^(M×d) is introduced; the prompt token S_0 is propagated through the encoder together with the image blocks and the classification token, and each image block interacts with the other image blocks through the self-attention layer at every layer; Split denotes splitting the propagated sequence of length N+L+1 into N+1 image blocks and L prompts; S_J is the prompt vector, and average pooling S_J yields the final representation; I_0^J is the output value of the classification token after the Transformer encoder; x denotes the final output of the whole model, on which face recognition supervised training is performed with the loss function, where N is the number of samples, d is the margin, W is the weight matrix of the last linear layer, W_i is the normalized i-th column of the weight matrix, y_j is the corresponding ground-truth value, and b_j is the j-th sample.
Further, in the student activity accompanying data analysis device, in the feature extraction layer, the first layer MSA calculation may be expressed as:
where the projection matrices are learnable parameters; each self-attention head s_h performs the self-attention computation, and a fusion matrix is used to fuse the self-attention heads s_h to obtain R;
a residual prompt is added to the propagated prompts in the computation within the MSA block of intermediate layer l; the calculation of the residual prompt can be expressed as:
where the first term is a zero matrix of dimension b×(N+1) and the second term is the residual prompt; residual prompts are added only at the positions of the propagated prompts, not at the positions of the image blocks or the classification token.
According to a third aspect of the present application there is also provided a student activity companion data analysis device comprising at least one processing unit and at least one storage unit, wherein the storage unit stores a computer program which, when executed by the processing unit, causes the processing unit to perform the steps of any one of the methods described above.
In general, the above technical solutions conceived by the present application, compared with the prior art, enable the following beneficial effects to be obtained:
(1) The face recognition model used by the application learns the inherent associations between facial feature points with a local-face-based self-attention mechanism, which improves the robustness of face alignment under blur and heavy occlusion and reduces computational complexity. The features of face images carrying facial feature points are learned with a residual-prompt-based self-attention mechanism, in which residual markers are added to the propagated prompts to facilitate layered modulation in the various layered computations of the Transformer encoder without increasing the number of prompts per layer. This expressive prompt-tuning method with residual markers enables effective parameter adaptation while markedly improving performance in the face recognition task.
(2) The application stores the face data collected alongside students' learning activities together with the learning activity video stream; dynamic analysis can be carried out on this basis, and the dynamic analysis can be used to precisely guide students' learning.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a logic block diagram of a student activity companion data analysis device according to an embodiment of the present application;
fig. 2 is a schematic diagram of a composition structure of a face detector according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a network structure of a self-attention feature point sampling model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a composition structure of a relational modeling layer according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a network structure of a self-attention feature extraction model according to an embodiment of the present application;
fig. 6 is a flow chart of a student activity accompanying data analysis method according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. In addition, the technical features of the embodiments of the present application described below may be combined with each other as long as they do not collide with each other.
The terms first, second, third and the like in the description and in the claims and in the above drawings, are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Furthermore, well-known or widely-used techniques, elements, structures, and processes may not be described or shown in detail in order to avoid obscuring the understanding of the present application by the skilled artisan. Although the drawings represent exemplary embodiments of the present application, the drawings are not necessarily to scale and certain features may be exaggerated or omitted in order to better illustrate and explain the present application.
The embodiment provides a student activity accompanying data analysis device which can be realized in a software and/or hardware mode and can be integrated on electronic equipment; fig. 1 is a schematic diagram of an analysis device provided in this embodiment, referring to fig. 1, the device includes a video collector, a face detector, a face recognition module, a comparison module and an analysis module; wherein,
the video collector is used for collecting accompanying video data of the object to be analyzed in the learning activity process;
in a specific example, the video collector adopts a monocular visible light camera, places the face of an object to be analyzed (such as a student and a parent) in a detectable area of the monocular visible light camera, and shoots accompanying video data of the object to be analyzed in the learning process.
The face detector is used for carrying out face detection on the accompanying video data to generate a face image;
Fig. 2 shows the structure of the face detector. Specifically, the face detector first reads the input accompanying video data, performs an adaptive image scaling operation on each frame, and then feeds the image into the backbone network for feature extraction. The extracted features are input into a feature fusion module for multi-scale feature fusion; a prediction head module performs target prediction and regression on the fused features, the prediction boxes output by the prediction head are converted into a 4-dimensional format through normalization and convolution operations, the classification prediction branches are processed with a sigmoid activation function, and the bounding-box branches are decoded. The 3 classification prediction branches and bounding-box branches of different scales output by the prediction head are concatenated, and non-maximum suppression (NMS) is applied to generate the detected face images X_l ∈ R^(H×W×C) (C=3), l = 1, ..., L−1, where L represents the number of face images.
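As a minimal, non-authoritative sketch of the post-processing just described (sigmoid on the classification branch followed by non-maximum suppression), the Python snippet below shows one way these operations fit together; the tensor layout, thresholds and the iou helper are assumptions rather than the patented detector:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def iou(box, boxes):
        # boxes given as [x1, y1, x2, y2]
        x1 = np.maximum(box[0], boxes[:, 0])
        y1 = np.maximum(box[1], boxes[:, 1])
        x2 = np.minimum(box[2], boxes[:, 2])
        y2 = np.minimum(box[3], boxes[:, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_a = (box[2] - box[0]) * (box[3] - box[1])
        area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        return inter / (area_a + area_b - inter + 1e-9)

    def decode_and_nms(cls_logits, boxes, score_thr=0.5, iou_thr=0.45):
        """cls_logits: (N,) raw face scores; boxes: (N, 4) in [x1, y1, x2, y2]."""
        scores = sigmoid(cls_logits)               # classification branch -> probabilities
        keep = scores > score_thr
        boxes, scores = boxes[keep], scores[keep]
        order = np.argsort(-scores)                # high-confidence boxes first
        kept = []
        while order.size > 0:
            i = order[0]
            kept.append(i)
            if order.size == 1:
                break
            overlaps = iou(boxes[i], boxes[order[1:]])
            order = order[1:][overlaps < iou_thr]  # suppress highly overlapping boxes
        return boxes[kept], scores[kept]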
In this embodiment, a multi-head pooled self-attention module PSAE and a lightweight convolutional attention module CBAM are introduced into the network structure of the face detector; the spatial and channel attention mechanisms of CBAM extract the attention regions of the image so that the network model focuses on the target object. The processing flow of the PSAE module can be expressed as follows:
LN in the above equation refers to layer normalization (Layer Norm), which normalizes the whole image. The pooled self-attention mechanism PSA is an improvement over the multi-head self-attention mechanism MSA that reduces the memory footprint of self-attention. The multi-layer perceptron MLP consists of a fully connected layer, a GELU activation function, a depthwise (DW) convolution and another fully connected layer. The self-attention computation of the MSA can be expressed in the standard form Attention(Q, K, V) = softmax(QK^T / sqrt(d_h)) V,
where Q, K and V are the matrices composed of the query vectors, key vectors and value vectors respectively, n is the sequence length and d_h is the vector dimension.
The calculation of each self-attention head of the PSA can be expressed as:
where the dimension of U is kept unchanged, while the E and Y matrices are reduced to dimension P^2 × C_i after average pooling; P is the pooling size in the PSA and C_i is the number of output channels in stage i; d_h is the vector dimension. SR(E) and SR(Y) denote downsampling of the matrices E and Y by a pooling operation, which reduces the complexity of multi-head self-attention to linear complexity.
Like the MSA, the PSA comprises multiple self-attention heads; the PSA concatenates the result of each self-attention head, and the concatenated result is fused through a projection, which can be expressed as:
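For intuition only (the fusion formula itself appears as an image in the source), the following Python sketch implements a pooled self-attention head in the spirit described above, with keys and values downsampled by average pooling; the module name, pooling size and head layout are assumptions:

    import torch
    import torch.nn as nn

    class PooledSelfAttentionHead(nn.Module):
        """One attention head whose keys/values are average-pooled to a fixed P x P grid."""
        def __init__(self, channels: int, pool_size: int = 7):
            super().__init__()
            self.q = nn.Linear(channels, channels)
            self.k = nn.Linear(channels, channels)
            self.v = nn.Linear(channels, channels)
            self.pool = nn.AdaptiveAvgPool2d(pool_size)   # SR(.): spatial reduction by pooling
            self.scale = channels ** -0.5

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (B, C, H, W) feature map from the backbone
            b, c, h, w = x.shape
            q = self.q(x.flatten(2).transpose(1, 2))            # (B, H*W, C)
            pooled = self.pool(x).flatten(2).transpose(1, 2)    # (B, P*P, C)
            k, v = self.k(pooled), self.v(pooled)
            attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
            out = attn @ v                                      # (B, H*W, C), linear in H*W
            return out.transpose(1, 2).reshape(b, c, h, w)

    # usage sketch
    feat = torch.randn(1, 64, 40, 40)
    head = PooledSelfAttentionHead(64)
    print(head(feat).shape)   # torch.Size([1, 64, 40, 40])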
the face recognition module comprises a self-attention feature point sampling model based on a local face and a self-attention feature extraction model based on residual prompting;
fig. 3 is a schematic diagram of a network structure of a self-attention feature point sampling model provided in this embodiment, and as shown in fig. 3, the self-attention feature point sampling model includes a backbone network, a linear embedding layer, a distance coding layer, a relational modeling layer, and a coordinate prediction layer;
the backbone network is used for extracting image features from the face image and generating a feature map;
in one specific example, the backbone network employs an HRNet network; the HRNet network inputs the extracted feature map into a linear embedded layer for processing;
the linear embedding layer is used for cutting out local image blocks with preset sizes from the feature map by taking the facial feature points as the centers, and embedding each image block into a vector which can be regarded as representing the corresponding facial feature points;
the distance coding layer codes the distance between the facial feature points in each local image block to obtain the relative position relation of the facial feature points in the local image blocks;
the relation modeling layer models the relative position relation between the facial feature points in each local image block and outputs the local coordinates of each facial feature point relative to the local image block
The coordinate prediction layer outputs the coordinates f of the final predicted facial feature points i =[x i ,y i ],i=1,...S 2 The coordinates are local coordinates of the facial feature pointsAnd the coordinates of the corresponding local image block; will have facial feature points f i As input to the self-attention feature extraction model.
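A minimal sketch of the coordinate-composition step performed by the coordinate prediction layer (final coordinate = local coordinate inside the patch plus the patch's position in the feature map); the patch size and variable names are illustrative assumptions:

    import numpy as np

    def compose_coordinates(local_coords, patch_origins):
        """local_coords: (P, 2) offsets predicted inside each local image block.
        patch_origins: (P, 2) position of each block in the feature map.
        Returns the final predicted feature point coordinates f_i = [x_i, y_i]."""
        return np.asarray(local_coords) + np.asarray(patch_origins)

    # usage sketch: two feature points with 16x16 blocks cropped around earlier predictions
    local = [[3.2, 7.9], [10.1, 4.5]]
    origins = [[32, 48], [80, 96]]
    print(compose_coordinates(local, origins))   # [[ 35.2  55.9] [ 90.1 100.5]]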
Fig. 4 is a schematic diagram of the composition of the relation modeling layer provided in this embodiment. In an alternative implementation, the relation modeling layer mainly comprises MSA blocks, MMA blocks and MLP blocks, with layer normalization (Layer Norm) applied before each block. The MSA block learns a query vector L_q for each facial feature point, and the MMA block learns the relationship between the point-location feature of each facial feature point and the query vector L_q. The query vectors L_q of the facial feature points are then fed into the decoder to learn the inherent relations between feature points; the output is passed to a shared MLP that predicts the position of each facial feature point, and the set F of predicted facial feature points f_i is finally output.
The MSA block calculation can be expressed as:
where the learnable parameters belong to the linear layers; the two operands denote the input of the i-th-layer self-attention head and L_q respectively, and C_I is the dimension.
The MMA block uses a cross-attention mechanism to learn the relationship between the point-location feature representation of the feature points and the query vector L_q, as follows:
where the learnable parameters belong to the linear layers, and the remaining terms denote the input of the i-th-layer MMA block, the position encoding, and the point-location feature representation of the feature points, respectively.
As shown in fig. 4, the MLP block contains a fully connected layer, a GELU activation function, DW convolution (Depthwise convolution), and a fully connected layer.
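The MLP block just described (fully connected layer, GELU, depthwise convolution, fully connected layer) could be realized as in the following sketch; the hidden-dimension ratio and the way tokens are reshaped for the depthwise convolution are assumptions:

    import torch
    import torch.nn as nn

    class MLPBlock(nn.Module):
        """FC -> GELU -> depthwise conv -> FC, operating on a sequence of tokens."""
        def __init__(self, dim: int, hidden: int = None):
            super().__init__()
            hidden = hidden or dim * 4
            self.fc1 = nn.Linear(dim, hidden)
            self.act = nn.GELU()
            # depthwise convolution: groups == channels
            self.dwconv = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1, groups=hidden)
            self.fc2 = nn.Linear(hidden, dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (B, N, dim) token sequence
            x = self.act(self.fc1(x))
            x = self.dwconv(x.transpose(1, 2)).transpose(1, 2)  # convolve along the token axis
            return self.fc2(x)

    tokens = torch.randn(2, 49, 128)
    print(MLPBlock(128)(tokens).shape)   # torch.Size([2, 49, 128])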
The self-attention feature extraction model is configured to perform feature extraction on a set of facial feature points to generate facial features; FIG. 5 is a schematic diagram of a network structure of a self-attention feature extraction model provided in this embodiment, and as an alternative implementation manner, the self-attention feature extraction model includes a preprocessing layer, a position embedding layer and a feature extraction layer;
the preprocessing layer is used for dividing the image blocks on the feature map by taking the coordinates of each facial feature point as the center; in one specific example, each image block is extracted using an STN differentiable grid sampling method.
The position embedding layer is used for mapping each image block to a two-dimensional vector, adding a classification mark and position codes, and superposing the image blocks subjected to the position codes to obtain an aggregate vector;
in this embodiment, a linear embedded layer is utilizedMapping each image block to two-dimensional vectors, marking as Token, adding a classification Token for classification before the Token, adding position codes to the Token, and then superposing the position-coded Token to obtain an aggregate vector I 0 . The process of position embedding is as follows:
wherein vector I 0 For a Token that is superposition-position-coded,M pos ∈R (N+1)×D
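A minimal sketch of this token-preparation step (linear embedding of each image block, a prepended classification Token, added position codes); the dimensions and parameter names are assumptions:

    import torch
    import torch.nn as nn

    class PatchEmbedding(nn.Module):
        def __init__(self, num_patches: int, patch_dim: int, embed_dim: int):
            super().__init__()
            self.proj = nn.Linear(patch_dim, embed_dim)                       # linear embedding layer
            self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))       # classification Token
            self.pos = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))  # M_pos

        def forward(self, patches: torch.Tensor) -> torch.Tensor:
            # patches: (B, N, patch_dim) image blocks cut around each facial feature point
            tokens = self.proj(patches)                                       # (B, N, D)
            cls = self.cls_token.expand(tokens.size(0), -1, -1)
            tokens = torch.cat([cls, tokens], dim=1)                          # prepend classification Token
            return tokens + self.pos                                          # I_0: position-coded Tokens

    patches = torch.randn(2, 49, 16 * 16 * 3)   # e.g. 49 feature points, 16x16 RGB blocks
    print(PatchEmbedding(49, 16 * 16 * 3, 256)(patches).shape)   # torch.Size([2, 50, 256])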
the feature extraction layer is used for extracting facial features according to the aggregate vectors, and the extraction process is expressed as follows:
where a prompt token S_0 ∈ R^(M×d) is introduced; the prompt token S_0 is propagated through the encoder together with the image blocks and the classification token, and each image block interacts with the other image blocks through the self-attention layer at every layer; Split denotes splitting the propagated sequence of length N+L+1 into N+1 image blocks and L prompts; S_J is the prompt vector, and average pooling S_J yields the final representation; I_0^J is the output value of the classification token after the Transformer encoder; x denotes the final output of the whole model, on which face recognition supervised training is performed with the loss function, where N is the number of samples, d is the margin, W is the weight matrix of the last linear layer, W_i is the normalized i-th column of the weight matrix, y_j is the corresponding ground-truth value, and b_j is the j-th sample.
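The exact loss formula appears only as an image in the source; based on the quantities named above (normalized weight columns W_i, margin d, ground-truth label y_j, sample count N), a cosine-margin classification loss of the following form would be consistent with the description. The sketch below is an assumption of that form, not the patent's verbatim loss, and the margin and scale values are placeholders:

    import torch
    import torch.nn.functional as F

    def cosine_margin_loss(features, weight, labels, margin=0.35, scale=30.0):
        """features: (N, D) model outputs x; weight: (C, D) last linear layer;
        labels: (N,) ground-truth identities. Margin and scale values are assumptions."""
        f = F.normalize(features, dim=1)
        w = F.normalize(weight, dim=1)           # rows here play the role of the normalized columns W_i
        cos = f @ w.t()                          # (N, C) cosine similarities
        # subtract the margin d from the target-class cosine only
        target = torch.zeros_like(cos).scatter_(1, labels.view(-1, 1), margin)
        return F.cross_entropy(scale * (cos - target), labels)

    x = torch.randn(8, 256)
    W = torch.randn(1000, 256)
    y = torch.randint(0, 1000, (8,))
    print(cosine_margin_loss(x, W, y).item())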
In the feature extraction layer, the first layer MSA calculation can be expressed as:
where the projection matrices are learnable parameters; each self-attention head s_h performs the self-attention computation, and a fusion matrix is used to fuse the self-attention heads s_h to obtain R.
Shallow prompts can model the Token relationships desired for a task, while residual prompts help re-weight the task-specific attention weights at each layer independently, aggregating and adjusting the context information of the image-block Tokens. To enhance the prompting capability, this embodiment introduces layered residual prompts: in the computation within the MSA block of intermediate layer l, a residual prompt is added to the propagated prompts. The calculation of the residual prompt can be expressed as:
where the first term is a zero matrix of dimension b×(N+1) and the second term is the residual prompt.
In this embodiment, residual prompts are added only at the positions of the propagated prompts, not at the positions of the image blocks or the classification token. In the various layered computations of the Transformer encoder, residual markers are added to the propagated prompts to facilitate layered modulation without increasing the number of prompts per layer. This expressive prompt-tuning method with residual markers enables effective parameter adaptation while markedly improving performance in the face recognition task.
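The following sketch illustrates adding a residual prompt only at the prompt positions of the propagated sequence while leaving the classification token and image-block tokens untouched; the sequence layout [classification token | N image blocks | L prompts] and the tensor names are assumptions:

    import torch

    def add_residual_prompt(sequence: torch.Tensor, residual_prompt: torch.Tensor,
                            num_patches: int) -> torch.Tensor:
        """sequence: (B, 1 + N + L, D) = [classification token | N image blocks | L prompts].
        residual_prompt: (L, D) per-layer residual added only to the prompt slots."""
        out = sequence.clone()
        out[:, 1 + num_patches:, :] += residual_prompt   # leave cls token and image blocks unchanged
        return out

    seq = torch.randn(2, 1 + 49 + 5, 256)     # batch of propagated sequences, L = 5 prompts
    res = torch.zeros(5, 256)                  # residual prompt for this layer
    print(add_residual_prompt(seq, res, 49).shape)   # torch.Size([2, 55, 256])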
The comparison module is used for comparing the facial features with the facial features stored in the facial feature library to generate facial feature data and corresponding identity information thereof;
Specifically, the comparison module compares the extracted facial features with the facial features of the students and their parents recorded in the facial feature library to obtain a face recognition result: if recognition succeeds, the identity information of the student or parent is output; if recognition fails, a recognition failure is reported.
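A minimal sketch of such a comparison step, matching an extracted feature against an enrolled feature library by cosine similarity with a decision threshold; the threshold value and the library layout are assumptions rather than the patent's comparison rule:

    import numpy as np

    def identify(feature, library_features, library_ids, threshold=0.6):
        """feature: (D,) facial feature of the detected face.
        library_features: (K, D) stored features of students and parents; library_ids: (K,) names."""
        f = feature / (np.linalg.norm(feature) + 1e-9)
        lib = library_features / (np.linalg.norm(library_features, axis=1, keepdims=True) + 1e-9)
        scores = lib @ f                          # cosine similarity to every enrolled face
        best = int(np.argmax(scores))
        if scores[best] >= threshold:
            return library_ids[best], float(scores[best])   # recognition succeeded
        return None, float(scores[best])                     # recognition failed

    lib = np.random.randn(3, 256)
    ids = ["student_A", "parent_A", "student_B"]
    print(identify(lib[1] + 0.01 * np.random.randn(256), lib, ids))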
The analysis module performs visual analysis based on the companion video data and the face feature data.
In this embodiment, the face images detected each minute during a student's learning activity and the identity information recognized from them are recorded; when the learning activity ends, the video stream and the accompanying face data of the activity are stored in a student accompanying database. Visual analysis is then performed on the stored data. As an optional implementation, the analysis module counts the periods in which no face is detected and the periods in which face recognition does not identify the student, which indicate that the student's face has left the learning area, and visualizes the frequency with which the student's face leaves the learning area with a pie chart; this frequency reflects the student's concentration from one angle. In addition, the periods in which a parent appears in the student's learning area are counted, and the frequency with which the parent's face appears in the learning area is visualized with a pie chart; this frequency reflects the degree to which the parent participates in the student's learning activity. A heat map is generated from the positions where the student's face appears and the corresponding dwell times: a coordinate system is established with the centre of the face at the start of the learning activity as the origin, and the positions to which the student's face moves and the dwell time at each position are recorded during learning. Each moved-to position produces a point on the heat map; the longer the dwell time, the closer the point's colour is to dark red, and the shorter the dwell time, the closer it is to light red. The generated heat map shows how the face position changes during learning and supports analysis of the student's concentration during the learning activity.
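The two visualizations described above could be produced along the lines of the following matplotlib sketch (a pie chart of the away-from-area frequency and a dwell-time heat map); the per-minute presence log, trajectory, bin count and colour map are illustrative assumptions:

    import numpy as np
    import matplotlib.pyplot as plt

    # assumed per-minute log: 1 = student's face detected in the learning area, 0 = absent
    presence = np.random.binomial(1, 0.85, size=120)            # a two-hour session
    away_minutes = int((presence == 0).sum())

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.pie([len(presence) - away_minutes, away_minutes],
            labels=["face in learning area", "face left learning area"],
            autopct="%1.1f%%")
    ax1.set_title("Frequency of leaving the learning area")

    # assumed face-centre trajectory relative to the starting position (origin of the coordinate system)
    xs, ys = np.random.randn(2, 500).cumsum(axis=1)
    heat, xe, ye = np.histogram2d(xs, ys, bins=30)              # dwell time per position bin
    ax2.imshow(heat.T, origin="lower", cmap="Reds",             # longer dwell -> darker red
               extent=[xe[0], xe[-1], ye[0], ye[-1]])
    ax2.set_title("Face position dwell-time heat map")
    plt.tight_layout()
    plt.savefig("student_activity_analysis.png")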
Fig. 6 is a flow chart of a student activity accompanying data analysis method provided in this embodiment, please refer to fig. 6, the method mainly includes the following steps:
s1, acquiring accompanying video data of an object to be analyzed in a learning activity process, and carrying out face detection to generate a face image;
in this step, the object to be analyzed is a student who completes the learning activity in the home, and may further include parents who participate in the learning process of the student.
In a specific example, a visible-light camera is used to collect accompanying video data of the student during the activity, and the faces of the student and parent are kept within the detectable area of the monocular visible-light camera during collection. The accompanying video data is then input into the face detector for face detection; if no face image is detected, the time at which no face was detected is saved in the cloud database, and if a face image is detected, the detected face image X_l is output.
S2, carrying out facial feature point sampling on the face image by using a self-attention feature point sampling model based on the local face to obtain a set of multiple groups of facial feature points; comprising the following steps:
s21, extracting image features from the face image to generate a feature map;
in a specific example, the face image is input into the HRNet network for feature extraction to form a feature map.
S22, cutting out a local image block with a preset size from the feature map by taking the predicted facial feature point position coordinates as the center;
in this embodiment, the prediction of the facial feature points is divided into a plurality of stages, and as the stages increase, the more accurate the predicted feature point positions are; the first stage uses the coordinates of a group of P=S×S standard frontal facial feature points as a priori knowledge, and each subsequent stage cuts out a local image block with preset size from the feature map by taking the facial feature points of the previous stage as the center;
s23, extracting local coordinates of each facial feature point in the corresponding image block from the local image block, and generating a set with facial feature points by taking the sum of the local coordinates of the facial feature points and the relative position coordinates of the corresponding local image block as final predicted coordinates;
as an optional implementation manner, extracting the local coordinates of each facial feature point in the corresponding image block from the local image block includes:
mapping the local image blocks into vectors, and coding the distance between the facial feature points in each local image block to obtain the relative position relation of the facial feature points in the local image blocks;
modeling the relative position relation among the facial feature points in each local image block according to the coding result, and outputting the local coordinates of each facial feature point relative to the local image block.
S3, inputting the set of facial feature points into a self-attention feature extraction model based on residual prompt to extract facial features;
as an alternative embodiment, the facial feature extraction process includes:
s31, dividing the image block on the feature map by taking the coordinates of each facial feature point as the center;
s32, mapping each image block to a two-dimensional vector, adding a classification mark and position codes, and superposing the image blocks subjected to the position codes to obtain an aggregate vector;
in particular, using linear embedded layersMapping each image block to two-dimensional vectors, marking as Token, adding a classification Token for classification before the Token, adding position codes to the Token, and then superposing the position-coded Token to obtain a vector I 0 . The process of location embedding is expressed as:
wherein vector I 0 For a Token that is superposition-position-coded,M pos ∈R (N+1)×D
s33, inputting the aggregate vector into a self-attention feature extraction model to perform feature extraction, wherein the extraction process is expressed as follows:
where a prompt token S_0 ∈ R^(M×d) is introduced; the prompt token S_0 is propagated through the encoder together with the image blocks and the classification token, and each image block interacts with the other image blocks through the self-attention layer at every layer; Split denotes splitting the propagated sequence of length N+L+1 into N+1 image blocks and L prompts; S_J is the prompt vector, and average pooling S_J yields the final representation; I_0^J is the output value of the classification token after the Transformer encoder; x denotes the final output of the whole model, on which face recognition supervised training is performed with the loss function, where N is the number of samples, d is the margin, W is the weight matrix of the last linear layer, W_i is the normalized i-th column of the weight matrix, y_j is the corresponding ground-truth value, and b_j is the j-th sample.
In the feature extraction process, the first layer MSA calculation can be expressed as:
where the projection matrices are learnable parameters; each self-attention head s_h performs the self-attention computation, and a fusion matrix is used to fuse the self-attention heads s_h to obtain R;
a residual prompt is added to the propagated prompts in the computation within the MSA block of intermediate layer l; the calculation of the residual prompt can be expressed as:
where the first term is a zero matrix of dimension b×(N+1) and the second term is the residual prompt.
In this embodiment, residual prompts are added only at the positions of the propagated prompts, not at the positions of the image blocks or the classification token.
S4, comparing the facial features with the facial features stored in the facial feature library to generate facial feature data and corresponding identity information thereof;
and S5, carrying out visual analysis based on the companion video data and the face feature data.
For specific details of the student activity accompanying data analysis method, reference may be made to the above description of the student activity accompanying data analysis device; details already described are not repeated here.
It should be noted that while in the above-described embodiments the operations of the methods of the embodiments of the present specification are described in a particular order, this does not require or imply that the operations must be performed in that particular order or that all of the illustrated operations be performed in order to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
The embodiment also provides a student activity accompanying data analysis device, which comprises at least one processor and at least one memory, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the student activity accompanying data analysis method, and the specific steps are referred to above and are not repeated herein; in the present embodiment, the types of the processor and the memory are not particularly limited, for example: the processor may be a microprocessor, digital information processor, on-chip programmable logic system, or the like; the memory may be volatile memory, non-volatile memory, a combination thereof, or the like.
The electronic device may also communicate with one or more external devices (e.g., keyboard, pointing terminal, display, etc.), with one or more terminals that enable a user to interact with the electronic device, and/or with any terminal (e.g., network card, modem, etc.) that enables the electronic device to communicate with one or more other computing terminals. Such communication may be through an input/output (I/O) interface. And, the electronic device may also communicate with one or more networks such as a local area network (Local Area Network, LAN), a wide area network (Wide Area Network, WAN), and/or a public network such as the internet via a network adapter.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, such as the division of the units, merely a logical function division, and there may be additional manners of dividing the actual implementation, such as multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some service interface, device or unit indirect coupling or communication connection, electrical or otherwise.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The foregoing is merely exemplary embodiments of the present disclosure and is not intended to limit the scope of the present disclosure. That is, equivalent changes and modifications are contemplated by the teachings of this disclosure, which fall within the scope of the present disclosure. Embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a scope and spirit of the disclosure being indicated by the claims.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the application and is not intended to limit the application, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the application are intended to be included within the scope of the application.

Claims (10)

1. The student activity accompanying data analysis method is characterized by comprising the following steps:
s1, acquiring accompanying video data of an object to be analyzed in a learning activity process, and carrying out face detection to generate a face image;
s2, carrying out facial feature point sampling on the face image by using a self-attention feature point sampling model based on the local face to obtain a set of multiple groups of facial feature points; comprising the following steps:
extracting image features from the face image to generate a feature map;
cutting out a local image block with a preset size from the feature map by taking the predicted facial feature points as the center;
extracting local coordinates of each facial feature point in the corresponding image block from the local image block, and generating a set with facial feature points by taking the sum of the local coordinates of the facial feature points and the relative position coordinates of the corresponding local image block as final predicted coordinates;
s3, inputting the set of facial feature points into a self-attention feature extraction model based on residual prompt to extract facial features;
s4, comparing the facial features with the facial features stored in the facial feature library to generate facial feature data and corresponding identity information thereof;
and S5, carrying out visual analysis based on the companion video data and the face feature data.
2. The student activity companion data analysis method of claim 1, wherein the extracting the local coordinates of each facial feature point in the corresponding image block from the local image block comprises:
mapping the local image blocks into vectors, and coding the distance between the facial feature points in each local image block to obtain the relative position relation of the facial feature points in the local image blocks;
modeling the relative position relation among the facial feature points in each local image block according to the coding result, and outputting the local coordinates of each facial feature point relative to the local image block.
3. The student activity companion data analysis method of claim 1, wherein inputting the set of facial feature points into a residual hint based self-attention feature extraction model to extract facial features comprises:
dividing image blocks on the feature map by taking the coordinates of each facial feature point as the center, mapping each image block to a two-dimensional vector, and adding classification marks and position codes;
superposing the image blocks subjected to position coding to obtain an aggregate vector;
inputting the aggregate vector into a self-attention feature extraction model to perform feature extraction, wherein the extraction process is expressed as follows:
where a prompt token S_0 ∈ R^(M×d) is introduced; the prompt token S_0 is propagated through the encoder together with the image blocks and the classification token, and each image block interacts with the other image blocks through the self-attention layer at every layer; Split denotes splitting the propagated sequence of length N+L+1 into N+1 image blocks and L prompts; S_J is the prompt vector, and average pooling S_J yields the final representation; I_0^J is the output value of the classification token after the Transformer encoder; x denotes the final output of the whole model, on which face recognition supervised training is performed with the loss function, where N is the number of samples, d is the margin, W is the weight matrix of the last linear layer, W_i is the normalized i-th column of the weight matrix, y_j is the corresponding ground-truth value, and b_j is the j-th sample.
4. The student activity companion data analysis method of claim 1, wherein in the feature extraction process the first-layer MSA computation is performed with learnable parameter matrices: each self-attention head s_h carries out the self-attention calculation, and the self-attention heads s_h are fused by a fusion matrix to obtain R;
in the computation within the MSA block of an intermediate layer l, a residual prompt is added to the propagated prompts, wherein a zero matrix of dimension b×(N+1) occupies the positions of the image blocks and the classification mark and the residual prompt occupies the prompt positions, so that the residual prompt is added only at the positions of the propagated prompts and not at the positions of the image blocks or the classification mark.
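For illustration only, a hedged sketch of the residual-prompt addition in claim 4, assuming the residual prompt is a learnable per-layer parameter and that a zero matrix pads the N+1 image-block and classification-mark positions; the class name and hyperparameters are hypothetical:

```python
import torch
import torch.nn as nn

class ResidualPromptMSA(nn.Module):
    """Standard multi-head self-attention plus a learnable residual prompt that
    is added only at the L prompt positions (zeros are padded over the N+1
    image-block / classification-mark positions)."""
    def __init__(self, dim=256, heads=4, n_prompts=8):
        super().__init__()
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.residual_prompt = nn.Parameter(torch.zeros(n_prompts, dim))

    def forward(self, seq: torch.Tensor, n_tokens: int) -> torch.Tensor:
        # seq: (B, n_tokens + L, dim); the first n_tokens entries are the
        # classification mark and the image blocks, the rest are prompts.
        out, _ = self.msa(seq, seq, seq)            # fuse the self-attention heads -> R
        B, dim = seq.shape[0], seq.shape[-1]
        zeros = seq.new_zeros(B, n_tokens, dim)     # zero matrix over the N+1 positions
        res = torch.cat([zeros,
                         self.residual_prompt.expand(B, -1, -1)], dim=1)
        return out + res                             # residual added only at prompt slots
```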
5. A student activity companion data analysis device comprising:
the video collector is used for collecting accompanying video data of the object to be analyzed in the learning activity process;
a face detector for performing face detection on the accompanying video data to generate a face image;
the face recognition module comprises a self-attention feature point sampling model based on a local face and a self-attention feature extraction model based on residual prompting;
wherein the self-attention feature point sampling model is configured to:
extracting image features from the face image to generate a feature map;
cutting out a local image block with a preset size from the feature map by taking the predicted facial feature points as the center;
extracting the local coordinates of each facial feature point in the corresponding image block from the local image block, and generating a set of facial feature points by taking the sum of the local coordinates of each facial feature point and the relative position coordinates of the corresponding local image block as its final predicted coordinates;
the self-attention feature extraction model is configured to perform feature extraction on the set of facial feature points to generate facial features;
the comparison module is used for comparing the facial features with the facial features stored in the facial feature library to generate facial feature data and corresponding identity information thereof;
and the analysis module is used for carrying out visual analysis based on the accompanying video data and the facial feature data.
6. The student activity companion data analysis device of claim 5, wherein the self-attention feature point sampling model comprises:
the backbone network is used for extracting image features from the face image and generating a feature map;
a linear embedding layer, which is used for cutting out a local image block with a preset size from the feature map by taking the facial feature point as the center and mapping the local image block into a vector;
a distance coding layer for coding the distance between the facial feature points in each local image block to obtain the relative position relationship of the facial feature points in the local image block;
the relation modeling layer is used for modeling the relative position relation among the facial feature points in each local image block and outputting the local coordinates of each facial feature point relative to the local image block;
and the coordinate prediction layer outputs the coordinates of the finally predicted facial feature points, wherein the coordinates are the sum of the local coordinates of the facial feature points and the relative position coordinates of the corresponding local image blocks.
7. The student activity companion data analysis device of claim 6, wherein the relational modeling layer comprises an MSA block, an MMA block, and an MLP block, and a normalization layer is provided before each block.
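For illustration only, a hedged sketch of claim 7's relation modeling layer with a normalization layer before each block; the MMA block is interpreted here as multi-head mutual (cross) attention from feature-point tokens to image-block tokens, which is an assumption, and all names and sizes are hypothetical:

```python
import torch
import torch.nn as nn

class RelationModelingLayer(nn.Module):
    """Pre-norm stack: LayerNorm before each of the MSA, MMA, and MLP blocks.
    MMA is assumed to be mutual (cross) attention to the image-block tokens."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(dim) for _ in range(3))
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mma = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, point_toks: torch.Tensor, block_toks: torch.Tensor) -> torch.Tensor:
        x = self.norm1(point_toks)
        point_toks = point_toks + self.msa(x, x, x)[0]                      # self-attention over points
        q = self.norm2(point_toks)
        point_toks = point_toks + self.mma(q, block_toks, block_toks)[0]    # mutual attention to blocks
        point_toks = point_toks + self.mlp(self.norm3(point_toks))          # MLP block
        return point_toks
```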
8. The student activity companion data analysis device of claim 5, wherein the self-attention feature extraction model comprises:
a preprocessing layer for dividing the image block on the feature map with the coordinates of each facial feature point as the center;
the position embedding layer is used for mapping each image block to a two-dimensional vector, adding a classification mark and position codes, and superposing the image blocks subjected to the position codes to obtain an aggregate vector;
the feature extraction layer is used for extracting facial features from the aggregate vector, wherein the extraction process is as follows: a prompt token S_0 ∈ R^(M×d) is introduced, and the prompt token S_0 is propagated through the encoder together with the image blocks and the classification mark, each image block interacting with the other image blocks through the self-attention layer at every layer; Split denotes splitting the propagation sequence of length N+L+1 into the N+1 image-block and classification-mark tokens and the L prompts; S_J is the prompt vector, and the final representation is obtained by average pooling of S_J; I_0^J is the output value of the classification mark after passing through the Transformer encoder; x represents the final output of the whole model, and a loss function is used to perform supervised face recognition training on the output x, wherein N is the number of samples, d is the margin, W is the weight matrix of the last linear layer, W_i is the normalized i-th column of the weight matrix, y_j is the corresponding true label, and b_j is the j-th sample.
9. The student activity companion data analysis device of claim 8, wherein in the feature extraction layer the first-layer MSA calculation is performed with learnable parameter matrices: each self-attention head s_h carries out the self-attention calculation, and the self-attention heads s_h are fused by a fusion matrix to obtain R;
in the computation within the MSA block of an intermediate layer l, a residual prompt is added to the propagated prompts, wherein a zero matrix of dimension b×(N+1) occupies the positions of the image blocks and the classification mark and the residual prompt occupies the prompt positions, so that the residual prompt is added only at the positions of the propagated prompts and not at the positions of the image blocks or the classification mark.
10. A student activity companion data analysis device comprising at least one processing unit and at least one storage unit, wherein the storage unit stores a computer program which, when executed by the processing unit, causes the processing unit to perform the steps of the method of any one of claims 1 to 4.
CN202311119659.4A 2023-08-30 2023-08-30 Student activity accompanying data analysis method, device and equipment Pending CN117152815A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311119659.4A CN117152815A (en) 2023-08-30 2023-08-30 Student activity accompanying data analysis method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311119659.4A CN117152815A (en) 2023-08-30 2023-08-30 Student activity accompanying data analysis method, device and equipment

Publications (1)

Publication Number Publication Date
CN117152815A true CN117152815A (en) 2023-12-01

Family

ID=88909492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311119659.4A Pending CN117152815A (en) 2023-08-30 2023-08-30 Student activity accompanying data analysis method, device and equipment

Country Status (1)

Country Link
CN (1) CN117152815A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117558056A (en) * 2024-01-11 2024-02-13 东云睿连(武汉)计算技术有限公司 Accompanying relation recognition method and system based on face image
CN117558056B (en) * 2024-01-11 2024-04-02 东云睿连(武汉)计算技术有限公司 Accompanying relation recognition method and system based on face image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination