CN116665086A - Teaching method and system based on intelligent analysis of learning behaviors - Google Patents


Info

Publication number
CN116665086A
Authority
CN
China
Prior art keywords
learning
context
vector
image
feature vectors
Legal status
Pending
Application number
CN202310435863.0A
Other languages
Chinese (zh)
Inventor
吴仲毓
龙行
杨世义
沈思宇
Current Assignee
Hangzhou Ruishu Technology Co ltd
Original Assignee
Hangzhou Ruishu Technology Co ltd
Application filed by Hangzhou Ruishu Technology Co ltd
Priority to CN202310435863.0A
Publication of CN116665086A

Classifications

    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06Q50/205 Education administration or guidance
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Educational Administration (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Strategic Management (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Technology (AREA)
  • General Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the field of intelligent teaching and discloses a teaching method and system based on intelligent analysis of learning behaviors. The method applies deep-learning-based artificial intelligence detection to a learning monitoring video of a monitored person. A ViT image encoder and a Transformer-based context encoder extract high-dimensional, time-series implicit correlation features of that person's experimental operations during the experiment. The method then uses these features to classify whether the person's learning behavior is standard. Non-standard experimental operations can therefore be corrected promptly, which improves learning efficiency and avoids safety accidents.

Description

Teaching method and system based on intelligent analysis of learning behaviors
Technical Field
The application relates to the field of intelligent teaching, in particular to a teaching method and system based on intelligent analysis of learning behaviors.
Background
As teaching conditions and educational standards improve, many schools in China have adopted laboratory teaching to develop students' ability to apply theory in practice.
In traditional laboratory teaching, such as a chemistry experiment course for middle school students, the teacher usually demonstrates the operating steps and the experimental procedure. Students then complete the practical part by imitating the teacher, which helps them understand the course content. However, a teacher's attention is limited, so students cannot be guided one by one. Non-standard operations are therefore inevitable during practice, and they reduce learning efficiency and may create safety hazards.
Therefore, a teaching scheme based on intelligent analysis of learning behaviors is desired.
Disclosure of Invention
The present application has been made to solve the above technical problems. Embodiments of the application provide a teaching method and system based on intelligent analysis of learning behaviors. The method applies deep-learning-based artificial intelligence detection to a learning monitoring video of a monitored person. A ViT image encoder and a Transformer-based context encoder extract high-dimensional, time-series implicit correlation features of that person's experimental operations during the experiment. The method then uses these features to classify whether the person's learning behavior is standard. Non-standard experimental operations can therefore be corrected promptly, which improves learning efficiency and avoids safety accidents.
According to one aspect of the present application, there is provided a teaching method based on intelligent analysis of learning behaviors, comprising:
acquiring a learning monitoring video of a monitored person;
extracting a plurality of learning key frames from the learning monitoring video;
passing each of the learning key frames through an autoencoder-based image noise reducer to obtain a plurality of noise-reduced learning key frames;
passing each of the noise-reduced learning key frames through a ViT image encoder to obtain a plurality of learning semantic understanding feature vectors;
passing the plurality of learning semantic understanding feature vectors through a Transformer-based context encoder to obtain a time-series associated learning feature vector; and
passing the time-series associated learning feature vector through a classifier to obtain a classification result that indicates whether the learning behavior of the monitored person is standard.
In the above teaching method, extracting the plurality of learning key frames from the learning monitoring video includes extracting them at a predetermined sampling frequency.
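The patent does not specify how the predetermined sampling frequency is applied. The sketch below assumes the video's frame rate is known and that sampling keeps every n-th frame. The function `sample_key_frame_indices` and its parameters are hypothetical names for illustration only.

```python
from __future__ import annotations


def sample_key_frame_indices(num_frames: int, video_fps: float, sample_hz: float) -> list[int]:
    """Return indices of frames taken at roughly `sample_hz` frames per second."""
    if video_fps <= 0 or sample_hz <= 0:
        raise ValueError("frame rate and sampling frequency must be positive")
    # Keep every `step`-th frame; a step of at least 1 means we never upsample.
    step = max(1, round(video_fps / sample_hz))
    return list(range(0, num_frames, step))


# A 10-second clip at 30 fps sampled at 2 Hz yields 20 key frames.
indices = sample_key_frame_indices(num_frames=300, video_fps=30, sample_hz=2)
print(len(indices), indices[:4])  # 20 [0, 15, 30, 45]
```

Raising `sample_hz` keeps more of the motion detail at the cost of more frames to encode. This matches the text's point that the frequency can be tuned to the scenario.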
In the above teaching method, passing each learning key frame through the autoencoder-based image noise reducer to obtain the noise-reduced learning key frames includes two steps. First, the learning key frame is input into the encoder of the image noise reducer, which uses convolutional layers to explicitly spatially encode the frame into image features. Second, the image features are input into the decoder of the image noise reducer, which uses deconvolution layers to deconvolve the image features into the noise-reduced learning key frame.
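The encoder/decoder structure described above can be sketched as a single-channel convolution followed by a transposed convolution ("deconvolution"). The patent gives no layer counts, kernel sizes, channels, or activations. Here one layer of each, a ReLU, and hand-set (untrained) weights are assumptions. In practice a denoising autoencoder's weights are typically learned by minimizing reconstruction error. All function names are hypothetical.

```python
def conv2d(image, kernel, stride):
    """Valid (unpadded) single-channel 2-D convolution (cross-correlation, as in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(
                kernel[a][b] * image[i * stride + a][j * stride + b]
                for a in range(kh) for b in range(kw)
            )
    return out


def conv_transpose2d(features, kernel, stride):
    """Single-channel transposed convolution ("deconvolution") without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(features) - 1) * stride + kh
    out_w = (len(features[0]) - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    # Each input value scatters a scaled copy of the kernel into the output.
    for i, row in enumerate(features):
        for j, value in enumerate(row):
            for a in range(kh):
                for b in range(kw):
                    out[i * stride + a][j * stride + b] += value * kernel[a][b]
    return out


def denoise(frame, enc_kernel, dec_kernel, stride=2):
    """Encoder (convolution + ReLU) followed by decoder (transposed convolution)."""
    features = [[max(0.0, v) for v in row] for row in conv2d(frame, enc_kernel, stride)]
    return conv_transpose2d(features, dec_kernel, stride)


# Hand-picked, untrained weights: 2x2 averaging in, 2x2 copy out (a box filter).
ENC = [[0.25, 0.25], [0.25, 0.25]]
DEC = [[1.0, 1.0], [1.0, 1.0]]

# An 8x8 flat frame (value 1.0) corrupted by a +/-0.1 checkerboard of noise.
noisy = [[1.0 + (0.1 if (r + c) % 2 else -0.1) for c in range(8)] for r in range(8)]
clean = denoise(noisy, ENC, DEC)
print(len(clean), len(clean[0]))                               # 8 8
print(max(abs(v - 1.0) for row in clean for v in row) < 1e-9)  # True
```

The stride-2 encoder halves the spatial size and the matching transposed convolution restores it. This is why the decoder output has the same shape as the input frame.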
In the above teaching method, passing each noise-reduced learning key frame through the ViT image encoder to obtain the learning semantic understanding feature vectors includes three steps. First, each noise-reduced learning key frame is split into blocks to obtain a sequence of image blocks. Second, the embedding layer of the ViT image encoder embeds each image block in the sequence to obtain a sequence of image block embedding vectors. Third, the sequence of image block embedding vectors is input into the Transformer module of the ViT image encoder to obtain the learning semantic understanding feature vector.
In the above teaching method, embedding each image block with the embedding layer of the ViT image encoder includes two steps. First, each image block in the sequence is flattened into a one-dimensional pixel input vector. Second, the embedding layer of the ViT image encoder applies fully connected encoding to each of these one-dimensional pixel input vectors to obtain the sequence of image block embedding vectors.
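The blocking, flattening, and fully connected embedding steps above can be sketched as follows. Several choices are assumptions, not taken from the patent: single-channel 8x8 frames, 4x4 non-overlapping patches, a 6-dimensional embedding, and random weights standing in for learned ones. A standard ViT also adds position embeddings (and usually a class token) before the Transformer layers. The text above does not mention them, so they are omitted. Function names are hypothetical.

```python
import random


def split_into_patches(image, patch):
    """Cut an H x W image into non-overlapping patch x patch blocks, row by row."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    return [
        [row[c:c + patch] for row in image[r:r + patch]]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]


def flatten(block):
    """Unroll a 2-D block into a one-dimensional pixel vector."""
    return [v for row in block for v in row]


def embed(vector, weight, bias):
    """Fully connected (linear) projection; weight has shape d_model x len(vector)."""
    return [sum(w * x for w, x in zip(w_row, vector)) + b for w_row, b in zip(weight, bias)]


rng = random.Random(0)
H = W = 8
PATCH, D_MODEL = 4, 6
frame = [[rng.random() for _ in range(W)] for _ in range(H)]

# Randomly initialised embedding parameters stand in for learned ones.
weight = [[rng.gauss(0.0, 0.02) for _ in range(PATCH * PATCH)] for _ in range(D_MODEL)]
bias = [0.0] * D_MODEL

patches = split_into_patches(frame, PATCH)
tokens = [embed(flatten(p), weight, bias) for p in patches]
print(len(patches), len(flatten(patches[0])), len(tokens[0]))  # 4 16 6
```

The same weight matrix is shared by every patch, so the embedding layer's size depends only on the patch size and `D_MODEL`, not on how many patches a frame yields.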
In the above teaching method, passing the learning semantic understanding feature vectors through the Transformer-based context encoder to obtain the time-series associated learning feature vector includes the following steps:
(1) The Transformer-based context encoder performs global context semantic encoding on the learning semantic understanding feature vectors to obtain a plurality of context learning feature vectors.
(2) A Gaussian mixture model of the context learning feature vectors is computed. Its mean vector is the per-position mean vector of the context learning feature vectors. The value at each position of its covariance matrix is the variance between the feature values at the corresponding two positions of that per-position mean vector.
(3) The Gaussian probability density distribution distance index of each context learning feature vector relative to the Gaussian mixture model is computed, giving a plurality of such indexes.
(4) Each context learning feature vector is weighted by its index to obtain a plurality of corrected context learning feature vectors.
(5) The corrected context learning feature vectors are concatenated to obtain the time-series associated learning feature vector.
In the above teaching method, performing global context semantic encoding on the learning semantic understanding feature vectors with the Transformer-based context encoder to obtain the context learning feature vectors includes the following steps:
(1) Arrange the learning semantic understanding feature vectors into an input vector.
(2) Convert the input vector into a query vector and a key vector through learnable embedding matrices.
(3) Multiply the query vector by the transpose of the key vector to obtain a self-attention association matrix.
(4) Normalize the self-attention association matrix.
(5) Apply a Softmax activation function to the normalized matrix to obtain a self-attention feature matrix.
(6) Multiply the self-attention feature matrix by each learning semantic understanding feature vector, used as a value vector, to obtain the context learning feature vectors.
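The self-attention steps above can be sketched as a single attention head over T per-frame feature vectors. Some details are assumptions. The patent does not say how the association matrix is "normalized", so this sketch divides by the square root of the key size, as in standard scaled dot-product attention. The projection weights are random stand-ins for learned ones. Following the text, the input feature vectors themselves serve as the value vectors. Multi-head splitting, residual connections, and feed-forward layers are not described in the text and are omitted. Function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def context_encode(features, w_q, w_k):
    """Single-head self-attention over T frame vectors (the rows of `features`)."""
    q = matmul(features, w_q)                  # T x d_k query vectors
    k = matmul(features, w_k)                  # T x d_k key vectors
    scores = matmul(q, transpose(k))           # T x T association matrix
    scale = 1.0 / math.sqrt(len(w_q[0]))       # assumed normalisation step
    attn = softmax_rows([[s * scale for s in row] for row in scores])
    return matmul(attn, features), attn        # input vectors act as the values


rng = random.Random(42)
T, D, D_K = 5, 8, 4                            # frames, feature size, key size
frames = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(T)]
w_q = [[rng.gauss(0, 0.3) for _ in range(D_K)] for _ in range(D)]
w_k = [[rng.gauss(0, 0.3) for _ in range(D_K)] for _ in range(D)]

context, attn = context_encode(frames, w_q, w_k)
print(len(context), len(context[0]))                     # 5 8
print(all(abs(sum(row) - 1.0) < 1e-9 for row in attn))   # True
```

Each output row is a convex combination of all frame vectors. This is how every context learning feature vector comes to carry information from the whole sequence rather than from one frame.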
In the above teaching method, the Gaussian probability density distribution distance index of each context learning feature vector relative to the Gaussian mixture model is computed with the following Gaussian probability density distribution distance index formula, which yields the plurality of Gaussian probability density distribution distance indexes:
where $V_i$ is the $i$-th context learning feature vector, and $\mu_u$ and $\Sigma_u$ are the mean vector and covariance matrix of the Gaussian mixture model. That is, $\mu_u$ is the per-position mean vector of the plurality of context learning feature vectors. The value at each position of $\Sigma_u$ is the variance between the feature values at the corresponding two positions of that mean vector. All vectors are column vectors. The formula uses matrix multiplication, position-wise subtraction, and the natural exponential function $\exp(\cdot)$, and $w_i$ is the $i$-th Gaussian probability density distribution distance index.
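The weighting-and-concatenation procedure can be sketched as below. Several points are assumptions. The formula itself is not reproduced in the text above, so the sketch assumes the index is the exponential of a Gaussian density's exponent: w_i = exp(-1/2 (V_i - μ_u)ᵀ Σ_u⁻¹ (V_i - μ_u)). This form is consistent with the operations listed (position-wise subtraction, matrix multiplication, exp), but it is not confirmed by the source. The sketch also reads Σ_u as the ordinary sample covariance across the context vectors, with a small ridge term added so the matrix can be inverted. Finally, although the text calls it a Gaussian mixture model, the model as described has one mean vector and one covariance matrix, so a single Gaussian is used. All function names are hypothetical.

```python
import math


def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def covariance(vectors, mu, ridge=1e-3):
    """Sample covariance across the vectors, plus ridge * I so it stays invertible."""
    n, d = len(vectors), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for v in vectors:
        diff = [x - m for x, m in zip(v, mu)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += diff[a] * diff[b] / n
    for a in range(d):
        cov[a][a] += ridge
    return cov


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    d = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(d):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][d] / aug[r][r] for r in range(d)]


def gaussian_weights(vectors, ridge=1e-3):
    """w_i = exp(-0.5 * (V_i - mu)^T Sigma^{-1} (V_i - mu))  (assumed form)."""
    mu = mean_vector(vectors)
    cov = covariance(vectors, mu, ridge)
    weights = []
    for v in vectors:
        diff = [x - m for x, m in zip(v, mu)]
        mahalanobis_sq = sum(x * y for x, y in zip(diff, solve(cov, diff)))
        weights.append(math.exp(-0.5 * mahalanobis_sq))
    return weights


def fuse(vectors, ridge=1e-3):
    """Weight each context vector by its index, then concatenate in time order."""
    weights = gaussian_weights(vectors, ridge)
    return [w * x for w, v in zip(weights, vectors) for x in v], weights


# Four clustered 2-D context vectors and one outlier.
context = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [4.0, 4.0]]
fused, weights = fuse(context)
print(len(fused))                    # 10
print(weights.index(min(weights)))   # 4: the outlying vector gets the smallest weight
```

Under this reading, vectors far from the joint distribution are down-weighted before concatenation, which is the "convergence at the Gaussian probability density level" the description aims for.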
In the above teaching method, passing the time-series associated learning feature vector through the classifier to obtain the classification result includes two steps. First, the time-series associated learning feature vector is input, as the classification feature vector, into the Softmax classification function of the classifier to obtain the probability that it belongs to each classification label. Second, the classification label with the highest probability is taken as the classification result, which indicates whether the learning behavior of the monitored person is standard.
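The classification step above can be sketched as a linear layer followed by Softmax and an argmax over the two labels. The description later names these labels as "standard" (first label) and "non-standard" (second label). The linear layer, the 3-dimensional toy feature, and the hand-set weights are assumptions for illustration. They are not trained values or the patent's actual classifier dimensions.

```python
import math

LABELS = ("learning behavior standard", "learning behavior non-standard")


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature, weight, bias):
    """Linear layer -> Softmax -> label with the highest probability."""
    logits = [sum(w * x for w, x in zip(row, feature)) + b for row, b in zip(weight, bias)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Toy feature vector and hand-set parameters (not trained values).
feature = [0.2, -0.5, 1.0]
weight = [[1.0, 0.0, 0.5], [-1.0, 1.0, 0.0]]
bias = [0.0, 0.0]
label, probs = classify(feature, weight, bias)
print(label, [round(p, 3) for p in probs])  # learning behavior standard [0.802, 0.198]
```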
According to another aspect of the present application, there is provided a teaching system based on intelligent analysis of learning behaviors, comprising:
a monitoring module for acquiring a learning monitoring video of a monitored person;
a sampling module for extracting a plurality of learning key frames from the learning monitoring video;
a noise reduction module for passing each of the learning key frames through an autoencoder-based image noise reducer to obtain a plurality of noise-reduced learning key frames;
a key frame semantic understanding module for passing each of the noise-reduced learning key frames through a ViT image encoder to obtain a plurality of learning semantic understanding feature vectors;
a timing correlation module for passing the learning semantic understanding feature vectors through a Transformer-based context encoder to obtain a time-series associated learning feature vector; and
a detection result generation module for passing the time-series associated learning feature vector through a classifier to obtain a classification result that indicates whether the learning behavior of the monitored person is standard.
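The six modules above form a linear pipeline. A minimal sketch of that data flow injects each module as a callable so that real implementations can be swapped in. The class name `TeachingSystem` and its field names are hypothetical. The demo's stand-ins (identity denoising, a mean over each frame, concatenation, and a threshold rule) are deliberately trivial placeholders, not the models described in the patent.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class TeachingSystem:
    """Chains the six modules in order; each stage is injected as a callable."""
    monitor: Callable[[], Sequence]          # monitoring module: returns video frames
    sample: Callable[[Sequence], list]       # sampling module: picks key frames
    denoise: Callable[[object], object]      # noise reduction module (per frame)
    encode_frame: Callable[[object], list]   # key frame semantic understanding module
    encode_sequence: Callable[[list], list]  # timing correlation module
    classify: Callable[[list], str]          # detection result generation module

    def run(self) -> str:
        key_frames = self.sample(self.monitor())
        vectors = [self.encode_frame(self.denoise(f)) for f in key_frames]
        return self.classify(self.encode_sequence(vectors))


# Placeholder stages only; real stages would be the trained networks.
video = [[0.1, 0.2], [0.1, 0.3], [0.9, 0.8], [0.2, 0.2], [0.2, 0.1], [0.1, 0.1]]
system = TeachingSystem(
    monitor=lambda: video,
    sample=lambda frames: list(frames[::2]),                 # every 2nd frame
    denoise=lambda frame: frame,                             # identity stand-in
    encode_frame=lambda frame: [sum(frame) / len(frame)],    # toy 1-D "feature"
    encode_sequence=lambda vecs: [x for v in vecs for x in v],
    classify=lambda feat: "non-standard" if max(feat) > 0.5 else "standard",
)
print(system.run())  # non-standard
```

Keeping the stages injectable mirrors the module split above: each module can be tested or replaced without touching the others.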
According to still another aspect of the present application, there is provided an electronic device comprising a processor and a memory. The memory stores computer program instructions that, when executed by the processor, cause the processor to perform the teaching method based on intelligent analysis of learning behaviors described above.
According to yet another aspect of the present application, there is provided a computer-readable medium storing computer program instructions that, when executed by a processor, cause the processor to perform the teaching method based on intelligent analysis of learning behaviors described above.
Compared with the prior art, the teaching method and system provided by the application apply deep-learning-based artificial intelligence detection to a learning monitoring video of the monitored person. A ViT image encoder and a Transformer-based context encoder extract high-dimensional, time-series implicit correlation features of that person's experimental operations during the experiment. The method then uses these features to classify whether the person's learning behavior is standard. Non-standard experimental operations can therefore be corrected promptly, which improves learning efficiency and avoids safety accidents.
Drawings
The above and other objects, features and advantages of the present application will become more apparent from the following detailed description of its embodiments with reference to the accompanying drawings. The drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. Together with the embodiments, they serve to explain the application and do not limit it. In the drawings, like reference numerals generally denote like parts or steps.
Fig. 1 is a flowchart of a teaching method based on intelligent analysis of learning behaviors according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a teaching method based on intelligent analysis of learning behaviors according to an embodiment of the present application.
Fig. 3 is a flowchart of passing the plurality of noise-reduced learning key frames through a ViT image encoder to obtain a plurality of learning semantic understanding feature vectors, in the teaching method based on intelligent analysis of learning behaviors according to an embodiment of the present application.
Fig. 4 is a flowchart of passing the plurality of learning semantic understanding feature vectors through a Transformer-based context encoder to obtain a time-series associated learning feature vector, in the teaching method based on intelligent analysis of learning behaviors according to an embodiment of the present application.
Fig. 5 is a block diagram of a teaching system based on intelligent analysis of learning behavior according to an embodiment of the present application.
Fig. 6 is a block diagram of an electronic device according to an embodiment of the application.
Detailed Description
Hereinafter, exemplary embodiments of the present application are described in detail with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application, and the application is not limited by the example embodiments described herein.
Summary of the application
As noted above, a teacher's attention is limited, so students cannot be guided one by one. Non-standard operations are therefore inevitable, and they reduce learning efficiency and may create safety hazards. The technical solution of the application aims to mine, from the learning monitoring video of the monitored person, high-dimensional implicit correlation features of that person's experimental operations during the experiment. It then classifies, based on these features, whether the learning behavior is standard. The difficulty is that the experimental operations are implicitly correlated over time, which makes it hard to mine these time-series, high-dimensional implicit correlation features fully and accurately from the video.
In recent years, deep learning and neural networks have been widely used in computer vision, natural language processing, text signal processing, and other fields. In tasks such as image classification, object detection, semantic segmentation, and text translation, they have reached or even exceeded human-level performance.
The development of deep learning and neural networks offers a new approach to mining the time-series, high-dimensional implicit correlation features of the monitored person's experimental operations during the experiment. Those of ordinary skill in the art will appreciate that the parameters of a deep neural network can be adjusted with a suitable training strategy, such as back-propagation with gradient descent, so that the network models complex nonlinear correlations. This makes such networks well suited to mining these features.
Specifically, in the technical solution of the application, a camera first acquires a learning monitoring video of the monitored person.
Next, changes in the experimental operations can be represented in the video by the differences between adjacent frames, that is, by the image characteristics of adjacent frames. However, adjacent frames differ only slightly, so the video contains a large amount of redundant data. To reduce computation and keep this redundancy from degrading the classification, the video is key-frame sampled at a predetermined sampling frequency, which yields a plurality of learning key frames. The sampling frequency is not fixed at a default value and can be adjusted to the requirements of the actual application scenario.
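The redundancy argument above can be made concrete by measuring how much consecutive frames actually change. In this sketch the change metric is the mean absolute pixel difference between adjacent grayscale frames, which is an assumption since the patent names no metric. The function names and the tiny 2x2 frames are hypothetical. Near-zero values mark frames that add little information, which is what key-frame sampling exploits.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute pixel change between two equally sized grayscale frames."""
    total = sum(
        abs(a - b)
        for row_a, row_b in zip(frame_a, frame_b)
        for a, b in zip(row_a, row_b)
    )
    return total / (len(frame_a) * len(frame_a[0]))


def adjacent_differences(frames):
    """Change score for each pair of consecutive frames."""
    return [mean_abs_diff(prev, cur) for prev, cur in zip(frames, frames[1:])]


frames = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],   # identical to the previous frame: fully redundant
    [[0, 0], [0, 4]],   # small change
    [[8, 8], [8, 4]],   # large change, e.g. a new operation step
]
print(adjacent_differences(frames))  # [0.0, 1.0, 6.0]
```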
The learning key frames also contain much information unrelated to the monitored person's experimental operations, such as large areas of floor and tabletop. To improve classification accuracy, the learning key frames are further passed through an autoencoder-based image noise reducer to obtain a plurality of noise-reduced learning key frames. The noise reducer consists of an encoder and a decoder. The encoder explicitly spatially encodes each learning key frame with convolutional layers to obtain image features. The decoder deconvolves those image features with deconvolution layers to obtain the noise-reduced learning key frame.
To mine the implicit correlation pattern features of the monitored person's experimental operations, each noise-reduced learning key frame is further passed through a ViT image encoder to obtain a plurality of learning semantic understanding feature vectors. In this step, a convolutional neural network model, which performs well at extracting local correlation features, applies convolution-kernel-based feature filtering to each noise-reduced learning key frame to capture its high-dimensional local implicit correlation features. The ViT encoding itself proceeds in three stages. First, each noise-reduced learning key frame is split into non-overlapping image blocks of fixed size, giving a sequence of image blocks. Second, the embedding layer of the ViT image encoder embeds each image block by linearly projecting it with a learnable embedding matrix, giving a sequence of image block embedding vectors. Third, that sequence is input into the Transformer module of the ViT image encoder to obtain the learning semantic understanding feature vector.
The Transformer module of the ViT image encoder performs global context-based semantic encoding on the sequence of image block embedding vectors, producing one context semantic understanding feature vector per image block embedding vector. These context semantic understanding feature vectors are concatenated to obtain the learning semantic understanding feature vector.
In an actual experiment, some steps can be swapped as needed, while others follow a strict causal order and cannot be swapped. The monitored person's experimental operations therefore have a global implicit correlation over time. To mine this correlation, the learning semantic understanding feature vectors are passed through a Transformer-based context encoder to obtain a time-series associated learning feature vector.
Because it is built on the Transformer architecture, the encoder can capture long-range context dependencies. It encodes each learning semantic understanding feature vector using the overall semantic association of all of them as context. The resulting context semantic association feature representation is the time-series associated learning feature vector. In this way, the Transformer-based encoder captures how the implicit features of the monitored person's experimental operations in each learning key frame depend on one another across the whole time dimension.
The time-series associated learning feature vector is then passed through a classifier to obtain a classification result that indicates whether the learning behavior of the monitored person is standard. The classifier has two labels: "the learning behavior of the monitored person is standard" (first label) and "the learning behavior of the monitored person is non-standard" (second label). It uses a Softmax function to determine which label the time-series associated learning feature vector belongs to. After the classification result is obtained, prompt information can be generated from it so that the monitored person's experimental operations are corrected in time, which improves learning efficiency and avoids safety accidents.
Here, if the context learning feature vectors that the Transformer-based context encoder produces from the learning semantic understanding feature vectors are simply concatenated to form the time-series associated learning feature vector, the result may show poor consistency and correlation across the fused feature dimension of those context learning feature vectors, which serves as the target classification dimension. This in turn can reduce the accuracy of the classification result obtained from the time-series associated learning feature vector.
Therefore, it is desirable to reduce the differences among the context learning feature vectors at the level of Gaussian probability density. Specifically, a Gaussian mixture model of the context learning feature vectors is calculated first, and then the Gaussian probability density distribution distance index of each context learning feature vector relative to the Gaussian mixture model is calculated, expressed as:
where V_i is the i-th context learning feature vector, and μ_u and Σ_u are the mean vector and covariance matrix of the Gaussian mixture model: μ_u is the per-position mean vector of the context learning feature vectors, and the value at each position of Σ_u is the variance between the feature values at the corresponding two positions of the mean vector. All vectors are column vectors.
Here, the Gaussian probability density distribution distance index between each context learning feature vector and the Gaussian mixture model expresses how far that vector's feature distribution lies from the joint Gaussian probability density distribution represented by the mixture model. Weighting each context learning feature vector by its index improves how well the concatenated time-series associated learning feature vector adapts to the joint probability density distribution of the Gaussian density in the target domain. This improves the consistency and correlation of the Gaussian probability density distribution along the fused feature dimension of the context learning feature vectors, which serves as the target classification dimension, and thereby improves the accuracy of the classification result of the time-series associated learning feature vector.
Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings.
Exemplary method
Fig. 1 is a flowchart of a teaching method based on intelligent analysis of learning behavior according to an embodiment of the present application. As shown in Fig. 1, the method includes: S110, acquiring a learning monitoring video of a person to be detected; S120, extracting a plurality of learning key frames from the learning monitoring video; S130, passing each of the learning key frames through an autoencoder-based image noise reducer to obtain a plurality of noise reduction learning key frames; S140, passing each of the noise reduction learning key frames through a ViT image encoder to obtain a plurality of learning semantic understanding feature vectors; S150, passing the learning semantic understanding feature vectors through a Transformer-based context encoder to obtain a time-series associated learning feature vector; and S160, passing the time-series associated learning feature vector through a classifier to obtain a classification result indicating whether the learning behavior of the person to be detected is standard.
Fig. 2 is a schematic diagram of a teaching method based on intelligent analysis of learning behavior according to an embodiment of the present application. As shown in Fig. 2, in this architecture, a learning monitoring video of the person to be detected is first acquired; a plurality of learning key frames are then extracted from it; each learning key frame is passed through an autoencoder-based image noise reducer to obtain a noise reduction learning key frame; each noise reduction learning key frame is passed through a ViT image encoder to obtain a learning semantic understanding feature vector; the learning semantic understanding feature vectors are passed through a Transformer-based context encoder to obtain a time-series associated learning feature vector; and finally, the time-series associated learning feature vector is passed through a classifier to obtain a classification result indicating whether the learning behavior of the person to be detected is standard.
Because a teacher's attention is limited, students cannot be guided one by one, and some students will inevitably perform non-standard operations, which lowers learning efficiency and may create safety hazards. The technical solution of the present application therefore aims to mine, from the learning monitoring video of the person to be detected, high-dimensional implicit association features of that person's experimental operations during the experiment, and to classify on that basis whether the learning behavior is standard. The difficulty is that the experimental operations of the person to be detected are implicitly associated in time, which makes it hard to mine these time-series high-dimensional implicit association features fully and accurately from the learning monitoring video.
In recent years, deep learning and neural networks have been widely applied in computer vision, natural language processing, text signal processing, and other fields. In image classification, object detection, semantic segmentation, text translation, and similar tasks, they have also reached performance approaching or even exceeding human level.
The development of deep learning and neural networks offers a new approach to mining time-series high-dimensional implicit association features of the experimental operations of the person to be detected during the experiment. As those of ordinary skill in the art will appreciate, a deep neural network can adjust its parameters through a suitable training strategy, for example gradient descent with back-propagation, and can thereby model complex nonlinear correlations between things, which makes it well suited to mining these features.
In step S110, a learning monitoring video of a person to be detected is acquired. In the technical scheme of the application, a camera is used for acquiring the learning monitoring video of the personnel to be detected.
In step S120, a plurality of learning key frames are extracted from the learning monitoring video. In the learning monitoring video, changes in experimental operation behavior can be represented by the differences between adjacent image frames. However, adjacent frames differ only slightly, so the video contains a large amount of redundant data. To reduce computation and to prevent this redundancy from degrading the classification of learning behavior normalization, the learning monitoring video is sampled at a predetermined sampling frequency to extract the plurality of learning key frames. It is worth noting that the sampling frequency can be adjusted to the requirements of the actual application scenario rather than being fixed at a default value.
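As a sketch of the fixed-rate key-frame sampling in step S120, assuming the video frame rate is known and the predetermined sampling frequency is given in samples per second (both assumptions, since the text does not state units), the indices of the frames to keep can be computed as follows. The function name sample_key_frame_indices is hypothetical, and decoding the frames themselves is left out.

```python
def sample_key_frame_indices(num_frames: int, video_fps: float, sample_hz: float) -> list[int]:
    """Return the indices of frames kept when sampling a video at sample_hz frames per second.

    Hypothetical helper: assumes a constant frame rate and a sampling frequency in Hz.
    """
    if num_frames <= 0:
        return []
    if video_fps <= 0 or sample_hz <= 0:
        raise ValueError("video_fps and sample_hz must be positive")
    # Number of source frames between two kept frames; never below 1
    # (sampling faster than the video frame rate simply keeps every frame).
    step = max(1, round(video_fps / sample_hz))
    return list(range(0, num_frames, step))


# A 4-second clip at 25 fps sampled at 5 Hz keeps every fifth frame.
indices = sample_key_frame_indices(100, 25.0, 5.0)
print(len(indices), indices[:3])
```

Raising the sampling frequency keeps more frames and costs more computation; lowering it removes more redundancy, which is the trade-off the text leaves to the application scenario.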
In step S130, the plurality of learning key frames are each passed through an autoencoder-based image noise reducer to obtain a plurality of noise reduction learning key frames. The learning key frames contain a large amount of information unrelated to the experimental operations of the person to be detected, such as large areas of floor and desktop. To improve the accuracy of classifying whether the learning behavior of the person to be detected is standard, the learning key frames are therefore processed by the autoencoder-based image noise reducer, which comprises an encoder and a decoder.
Specifically, in the embodiment of the present application, passing the plurality of learning key frames through the autoencoder-based image noise reducer to obtain the plurality of noise reduction learning key frames includes: first, inputting each learning key frame into the encoder of the image noise reducer, which uses convolution layers to perform explicit spatial encoding of the learning key frame and obtain image features; then, inputting the image features into the decoder of the image noise reducer, which uses deconvolution layers to decode the image features and obtain the noise reduction learning key frame.
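The encoder/decoder pair described above can be illustrated with a minimal single-channel NumPy sketch: a strided convolution downsamples the frame into a feature map, a ReLU is applied, and a transposed convolution (deconvolution) maps it back to the original size. The kernel size, stride, ReLU, and hand-set weights are assumptions for illustration only; in the described method the weights would be learned, and a practical noise reducer would use several channels and layers.

```python
import numpy as np


def conv2d_stride(x, kernel, stride):
    """Valid (no padding) 2-D convolution with a stride; stands in for an encoder convolution layer."""
    kh, kw = kernel.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out


def conv_transpose2d(z, kernel, stride):
    """Transposed convolution (deconvolution); stands in for a decoder deconvolution layer."""
    kh, kw = kernel.shape
    out_h = (z.shape[0] - 1) * stride + kh
    out_w = (z.shape[1] - 1) * stride + kw
    out = np.zeros((out_h, out_w))
    for i in range(z.shape[0]):
        for j in range(z.shape[1]):
            # Each feature value scatters a scaled copy of the kernel into the output.
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += z[i, j] * kernel
    return out


def denoise_frame(frame, enc_kernel, dec_kernel, stride=2):
    """Encode with convolution + ReLU, then decode with a transposed convolution."""
    features = np.maximum(conv2d_stride(frame, enc_kernel, stride), 0.0)
    return conv_transpose2d(features, dec_kernel, stride)


# Hand-set (untrained) weights: 2x2 averaging encoder kernel, 2x2 all-ones decoder kernel.
frame = 3.0 + (np.indices((4, 4)).sum(axis=0) % 2) * 2.0 - 1.0  # constant 3 plus a +/-1 checkerboard
restored = denoise_frame(frame, np.full((2, 2), 0.25), np.ones((2, 2)))
print(restored)
```

With these weights the pair behaves as a 2x2 block smoother, so the pixel-level checkerboard noise is averaged out and the constant frame is recovered; a trained autoencoder would instead learn which structures to keep.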
In step S140, the plurality of noise reduction learning key frames are each passed through a ViT image encoder to obtain a plurality of learning semantic understanding feature vectors. The purpose is to mine the implicit association pattern features of the experimental operations of the person to be detected that are contained in each noise reduction learning key frame, that is, to capture the high-dimensional implicit correlation features in each of these frames.
In the technical solution of the present application, each noise reduction learning key frame is first divided into image blocks to obtain a sequence of image blocks; the blocking operation yields non-overlapping image blocks of a fixed size. Next, the embedding layer of the ViT image encoder embeds each image block in the sequence, that is, projects each block linearly with a learnable embedding matrix, to obtain a sequence of image block embedding vectors. The sequence of image block embedding vectors is then input into the Transformer module of the ViT image encoder, which performs global context-based semantic encoding on it to obtain a plurality of context semantic understanding feature vectors, one for each image block embedding vector. These context semantic understanding feature vectors are concatenated to obtain the learning semantic understanding feature vector.
More specifically, in an embodiment of the present application, using the embedding layer of the ViT image encoder to embed each image block in the sequence of image blocks and obtain the sequence of image block embedding vectors includes: first, flattening each image block in the sequence into a one-dimensional pixel input vector, giving a plurality of one-dimensional pixel input vectors; then, applying fully connected encoding to each of these one-dimensional pixel input vectors with the embedding layer of the ViT image encoder to obtain the sequence of image block embedding vectors.
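The blocking, flattening, and fully connected embedding described above can be sketched in NumPy as follows. The function name, the height x width x channels frame layout, and the row-major block order are assumptions; the embedding matrix would be learned in practice, and the Transformer module that consumes the resulting sequence is not shown.

```python
import numpy as np


def image_to_patch_embeddings(image, patch_size, embed_matrix):
    """Split an (H, W, C) frame into non-overlapping patch_size x patch_size blocks,
    flatten each block into a 1-D pixel vector, and project it with embed_matrix."""
    h, w, c = image.shape
    p = patch_size
    if h % p or w % p:
        raise ValueError("frame height and width must be divisible by patch_size")
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C): group pixels by block, row-major block order.
    blocks = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    # One-dimensional pixel input vectors, one row per block: shape (num_blocks, p*p*C).
    pixel_vectors = blocks.reshape(-1, p * p * c)
    # Fully connected (linear) embedding: (num_blocks, p*p*C) @ (p*p*C, D) -> (num_blocks, D).
    return pixel_vectors @ embed_matrix


rng = np.random.default_rng(0)
frame = rng.random((32, 32, 3))           # hypothetical 32x32 RGB key frame
embed = rng.normal(size=(8 * 8 * 3, 64))  # stands in for the learned embedding matrix
tokens = image_to_patch_embeddings(frame, 8, embed)
print(tokens.shape)  # (16, 64)
```

A 32x32 frame with 8x8 blocks gives 16 blocks, each flattened to 192 values and projected to a 64-dimensional embedding vector.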
Fig. 3 is a flowchart of the step of passing the plurality of noise reduction learning key frames through a ViT image encoder to obtain a plurality of learning semantic understanding feature vectors, in the teaching method based on intelligent analysis of learning behavior according to an embodiment of the present application. As shown in Fig. 3, this step includes: S210, dividing each noise reduction learning key frame into image blocks to obtain a sequence of image blocks; S220, using the embedding layer of the ViT image encoder to embed each image block in the sequence and obtain a sequence of image block embedding vectors; and S230, inputting the sequence of image block embedding vectors into the Transformer module of the ViT image encoder to obtain the learning semantic understanding feature vector.
In step S150, the plurality of learning semantic understanding feature vectors are passed through a Transformer-based context encoder to obtain a time-series associated learning feature vector. In an actual experiment, some steps can be swapped as needed, while others follow a strict causal order and cannot be swapped. The experimental operations of the person to be detected therefore have a global implicit association in time, and to mine it, the learning semantic understanding feature vectors are passed through the Transformer-based context encoder. Following the Transformer design, the encoder can capture long-range contextual dependencies: it performs global context semantic encoding on each learning semantic understanding feature vector, using the overall semantic association of all the vectors as the context, and so yields a contextual semantic association feature representation, namely the time-series associated learning feature vector. In the technical solution of the present application, the Transformer-based encoder thus captures the global long-range dependencies, along the time dimension, among the implicit features of the experimental operations of the person to be detected in the individual learning key frames.
Fig. 4 is a flowchart of the step of passing the plurality of learning semantic understanding feature vectors through a Transformer-based context encoder to obtain a time-series associated learning feature vector, in the teaching method based on intelligent analysis of learning behavior according to an embodiment of the present application. As shown in Fig. 4, this step includes: S310, performing global context semantic encoding on the learning semantic understanding feature vectors with the Transformer-based context encoder to obtain a plurality of context learning feature vectors; S320, calculating a Gaussian mixture model of the context learning feature vectors, whose mean vector is the per-position mean vector of the context learning feature vectors and whose covariance matrix holds, at each position, the variance between the feature values at the corresponding two positions of the per-position mean vector; S330, calculating the Gaussian probability density distribution distance index of each context learning feature vector relative to the Gaussian mixture model to obtain a plurality of Gaussian probability density distribution distance indexes; S340, weighting each context learning feature vector by its Gaussian probability density distribution distance index to obtain a plurality of corrected context learning feature vectors; and S350, concatenating the corrected context learning feature vectors to obtain the time-series associated learning feature vector.
More specifically, in an embodiment of the present application, performing global context semantic encoding on the learning semantic understanding feature vectors with the Transformer-based context encoder to obtain the context learning feature vectors includes: first, arranging the learning semantic understanding feature vectors into an input vector; then, converting the input vector into a query vector and a key vector through learnable embedding matrices; next, computing the product of the query vector and the transpose of the key vector to obtain a self-attention association matrix; normalizing the self-attention association matrix to obtain a normalized self-attention association matrix; passing the normalized matrix through a Softmax activation function to obtain a self-attention feature matrix; and finally, multiplying the self-attention feature matrix by each of the learning semantic understanding feature vectors, used as value vectors, to obtain the context learning feature vectors.
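The self-attention computation listed above can be sketched for a small matrix of feature vectors. Two points are assumptions because the text does not fix them: the "normalization" step is taken to be the common scaling by the square root of the key dimension, and, as the text states, the input feature vectors themselves serve as the value vectors (no separate value projection). The function names are hypothetical, and only a single attention head without residual connections or feed-forward layers is shown.

```python
import numpy as np


def softmax_rows(scores):
    """Row-wise Softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def context_encode(features, w_query, w_key):
    """Single-head self-attention over n feature vectors (the rows of `features`).

    The input vectors themselves act as the value vectors, as described in the text.
    """
    q = features @ w_query               # query vectors, shape (n, d_k)
    k = features @ w_key                 # key vectors, shape (n, d_k)
    assoc = q @ k.T                      # self-attention association matrix, shape (n, n)
    assoc = assoc / np.sqrt(k.shape[1])  # assumed normalization: scale by sqrt(d_k)
    attn = softmax_rows(assoc)           # self-attention feature matrix, rows sum to 1
    return attn @ features, attn         # context learning feature vectors, shape (n, d)


rng = np.random.default_rng(0)
frames = rng.random((6, 16))  # hypothetical: 6 key-frame feature vectors of length 16
context, attn = context_encode(frames, rng.normal(size=(16, 8)), rng.normal(size=(16, 8)))
print(context.shape, attn.sum(axis=1))
```

Each output row is a convex combination of all input vectors, which is how every key frame's representation comes to reflect the whole sequence.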
Here, if the context learning feature vectors that the Transformer-based context encoder produces from the learning semantic understanding feature vectors are simply concatenated to form the time-series associated learning feature vector, the result may show poor consistency and correlation across the fused feature dimension of those context learning feature vectors, which serves as the target classification dimension. This in turn can reduce the accuracy of the classification result obtained from the time-series associated learning feature vector.
Therefore, it is desirable to reduce the differences among the context learning feature vectors at the level of Gaussian probability density. Specifically, a Gaussian mixture model of the context learning feature vectors is calculated first, and then the Gaussian probability density distribution distance index of each context learning feature vector relative to the Gaussian mixture model is calculated, expressed as:
where V_i is the i-th context learning feature vector; μ_u and Σ_u are the mean vector and covariance matrix of the Gaussian mixture model, μ_u being the per-position mean vector of the context learning feature vectors and the value at each position of Σ_u being the variance between the feature values at the corresponding two positions of the per-position mean vector; all vectors are column vectors; ⊗ denotes matrix multiplication; ⊖ denotes position-wise subtraction; exp(·) denotes the natural exponential function; and w_i is the i-th Gaussian probability density distribution distance index.
Here, the Gaussian probability density distribution distance index between each context learning feature vector and the Gaussian mixture model expresses how far that vector's feature distribution lies from the joint Gaussian probability density distribution represented by the mixture model. Weighting each context learning feature vector by its index improves how well the concatenated time-series associated learning feature vector adapts to the joint probability density distribution of the Gaussian density in the target domain. This improves the consistency and correlation of the Gaussian probability density distribution along the fused feature dimension of the context learning feature vectors, which serves as the target classification dimension, and thereby improves the accuracy of the classification result of the time-series associated learning feature vector.
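Because the exact distance-index formula is not reproduced in this text, the following is only a sketch of steps S320 to S350 under stated assumptions. It scores each context learning feature vector with a standard Gaussian kernel, w_i = exp(-0.5 (V_i - μ_u)^T (Σ_u + εI)^(-1) (V_i - μ_u)), where Σ_u is estimated as the ordinary (biased) sample covariance of the vectors rather than by the per-position rule described above, and ε is a small ridge term added so the matrix stays invertible when there are fewer vectors than dimensions. The function name gaussian_weighted_concat and the ridge parameter are hypothetical.

```python
import numpy as np


def gaussian_weighted_concat(context_vectors, ridge=1e-3):
    """Weight each context vector by a Gaussian density score, then concatenate (S320-S350 sketch)."""
    v = np.asarray(context_vectors, dtype=float)  # shape (n, d), one vector per row
    mu = v.mean(axis=0)                           # per-position mean vector
    # Assumption: biased sample covariance plus a ridge term so the matrix is invertible.
    sigma = np.atleast_2d(np.cov(v, rowvar=False, bias=True)) + ridge * np.eye(v.shape[1])
    diff = v - mu                                 # position-wise subtraction
    # Quadratic form diff_i^T sigma^-1 diff_i, via a linear solve instead of an explicit inverse.
    solved = np.linalg.solve(sigma, diff.T).T
    sq_dist = np.einsum("ij,ij->i", diff, solved)
    weights = np.exp(-0.5 * sq_dist)              # in (0, 1], equal to 1 at the mean
    corrected = v * weights[:, None]              # S340: weight each context vector
    return corrected.reshape(-1), weights         # S350: concatenate the corrected vectors


vectors = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [10.0, 10.0]])
fused, w = gaussian_weighted_concat(vectors)
print(w)  # the outlying last vector receives the smallest weight
```

Vectors close to the joint distribution keep weights near 1, while outlying ones are shrunk before concatenation, which is the effect the paragraph above attributes to the weighting.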
In step S160, the time-series associated learning feature vector is passed through a classifier to obtain a classification result indicating whether the learning behavior of the person to be detected is standard. In the technical solution of the present application, the classifier has two labels: the learning behavior of the person to be detected is standard (first label), and the learning behavior of the person to be detected is non-standard (second label). The classifier uses a Softmax function to determine which label the time-series associated learning feature vector belongs to. It should be noted that the first and second labels carry no human-defined concept. During training, the model has no notion of "whether the learning behavior of the person to be detected is standard"; it has only two classification labels, say L1 and L2, and outputs the probabilities p1 and p2 that the feature belongs to each, with p1 + p2 = 1. The judgment of whether the learning behavior is standard is thus converted, through the classification labels, into a probability distribution over the two classes that follows natural laws, and the result relies on the physical meaning of this probability distribution over labels rather than on the linguistic meaning of the phrase "whether the learning behavior of the person to be detected is standard". After the classification result is obtained, prompt information can be generated from it so that the experimental operations of the person to be detected are corrected in time, which improves learning efficiency and helps avoid safety accidents.
Specifically, in the embodiment of the present application, passing the time-series associated learning feature vector through the classifier to obtain the classification result includes: first, inputting the time-series associated learning feature vector, as the classification feature vector, into the Softmax classification function of the classifier to obtain the probability that the classification feature vector belongs to each classification label; then, taking the classification label with the highest probability as the classification result.
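The Softmax-then-argmax decision rule described above can be sketched in plain Python. The linear layer that maps the feature vector to two logits, its weights, and the label strings are assumptions for illustration; the text only specifies the Softmax classification function and the choice of the label with the highest probability.

```python
import math


def softmax(logits):
    """Softmax over a list of logits, shifted by the maximum for numerical stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature, weights, bias, labels):
    """Hypothetical linear layer + Softmax; returns the most probable label and all probabilities."""
    logits = [sum(w * f for w, f in zip(row, feature)) + b for row, b in zip(weights, bias)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


label, probs = classify([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                        ["standard", "non-standard"])
print(label, probs)
```

The two probabilities always sum to one, matching the p1 + p2 = 1 property noted for the labels L1 and L2.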
In summary, the teaching method based on intelligent analysis of learning behavior according to the embodiment of the present application has been described. It uses deep-learning-based artificial intelligence detection, namely a ViT image encoder and a Transformer-based context encoder, to mine from the learning monitoring video of the person to be detected the high-dimensional time-series implicit association features of that person's experimental operations during the experiment, and classifies on that basis whether the learning behavior is standard. The experimental operations of the person to be detected can therefore be corrected in time, which improves learning efficiency and helps avoid safety accidents.
Exemplary System
Fig. 5 is a block diagram of a teaching system based on intelligent analysis of learning behavior according to an embodiment of the present application. As shown in Fig. 5, the teaching system 100 based on intelligent analysis of learning behavior includes: a monitoring module 110 for acquiring a learning monitoring video of a person to be detected; a sampling module 120 for extracting a plurality of learning key frames from the learning monitoring video; a noise reduction module 130 for passing each learning key frame through an autoencoder-based image noise reducer to obtain a plurality of noise reduction learning key frames; a key frame semantic understanding module 140 for passing each noise reduction learning key frame through a ViT image encoder to obtain a plurality of learning semantic understanding feature vectors; a timing correlation module 150 for passing the learning semantic understanding feature vectors through a Transformer-based context encoder to obtain a time-series associated learning feature vector; and a detection result generating module 160 for passing the time-series associated learning feature vector through a classifier to obtain a classification result indicating whether the learning behavior of the person to be detected is standard.
Here, it will be understood by those skilled in the art that the specific functions and operations of the respective units and modules in the above-described learning behavior intelligent analysis-based teaching system 100 have been described in detail in the above description of the learning behavior intelligent analysis-based teaching method with reference to fig. 1 to 4, and thus, repetitive descriptions thereof will be omitted.
As described above, the teaching system 100 based on intelligent analysis of learning behavior according to the embodiment of the present application can be implemented in various terminal devices, for example a server used for teaching based on intelligent analysis of learning behavior. In one example, the teaching system 100 may be integrated into a terminal device as a software module and/or a hardware module. For example, it may be a software module in the operating system of the terminal device or an application developed for the terminal device; it may equally be one of the hardware modules of the terminal device.
Alternatively, in another example, the teaching system 100 based on intelligent analysis of learning behavior and the terminal device may be separate devices, with the teaching system 100 connected to the terminal device through a wired and/or wireless network and exchanging information in an agreed data format.
Exemplary electronic device
Next, an electronic device according to an embodiment of the present application is described with reference to fig. 6. Fig. 6 is a block diagram of an electronic device according to an embodiment of the application. As shown in fig. 6, the electronic device 10 includes one or more processors 11 and a memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
Memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read-Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 11 to implement the functions of the teaching method based on intelligent analysis of learning behavior in the various embodiments of the present application described above and/or other desired functions. Various contents, such as the learning monitoring video of the person to be detected, may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
The input means 13 may comprise, for example, a keyboard, a mouse, etc.
The output device 14 may output various information including the classification result and the like to the outside. The output means 14 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.
Of course, only some of the components of the electronic device 10 that are relevant to the present application are shown in fig. 6 for simplicity, components such as buses, input/output interfaces, etc. are omitted. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage medium
In addition to the methods and apparatus described above, embodiments of the application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform steps in the functionality of the teaching method based on learning behavior intelligent analysis according to various embodiments of the application described in the "exemplary methods" section of this specification.
Program code for performing the operations of the embodiments of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
Furthermore, an embodiment of the present application may also be a computer-readable storage medium storing computer program instructions which, when executed by a processor, cause the processor to perform the steps of the teaching method based on intelligent analysis of learning behaviors according to the various embodiments of the present application described in the "Exemplary methods" section above.
The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present application have been described above in connection with specific embodiments. It should be noted, however, that the advantages, benefits, effects, and the like mentioned in the present application are merely examples and are not limiting; they should not be regarded as necessarily possessed by every embodiment of the present application. Furthermore, the specific details disclosed above are provided only for purposes of illustration and ease of understanding and are not limiting; the present application is not required to be practiced with those specific details.
The block diagrams of components, apparatuses, devices, and systems referred to in the present application are merely illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown. As will be appreciated by those skilled in the art, these components, apparatuses, devices, and systems may be connected, arranged, or configured in any manner. Words such as "including," "comprising," and "having" are open-ended terms meaning "including but not limited to," and may be used interchangeably therewith. The terms "or" and "and" as used herein refer to, and may be used interchangeably with, the term "and/or," unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and may be used interchangeably with, the phrase "such as, but not limited to."
It should also be noted that, in the apparatuses, devices, and methods of the present application, the components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalent solutions of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the application to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (10)

1. A teaching method based on intelligent analysis of learning behaviors, characterized by comprising the following steps:
acquiring a learning monitoring video of a person to be detected;
extracting a plurality of learning key frames from the learning monitoring video;
passing the plurality of learning key frames respectively through an autoencoder-based image noise reducer to obtain a plurality of noise-reduced learning key frames;
passing the plurality of noise-reduced learning key frames respectively through a ViT image encoder to obtain a plurality of learning semantic understanding feature vectors;
passing the plurality of learning semantic understanding feature vectors through a Transformer-based context encoder to obtain a time-series associated learning feature vector; and
passing the time-series associated learning feature vector through a classifier to obtain a classification result, wherein the classification result indicates whether the learning behavior of the person to be detected is standard.
2. The teaching method based on intelligent analysis of learning behaviors according to claim 1, wherein extracting a plurality of learning key frames from the learning monitoring video comprises:
extracting the plurality of learning key frames from the learning monitoring video at a predetermined sampling frequency.
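As a rough illustration of claim 2, the sketch below computes which frame indices are kept when key frames are sampled at a fixed rate. It assumes the video has already been decoded into a known number of frames at a known frame rate; the function name `sample_key_frame_indices` and its parameters are hypothetical and do not come from the application.

```python
import math


def sample_key_frame_indices(num_frames: int, video_fps: float, sample_hz: float) -> list:
    """Return the indices of frames kept when sampling at `sample_hz` key frames per second.

    Hypothetical helper: assumes the video is already decoded into `num_frames`
    frames recorded at `video_fps` frames per second.
    """
    if video_fps <= 0 or sample_hz <= 0:
        raise ValueError("video_fps and sample_hz must be positive")
    if num_frames <= 0:
        return []
    # Number of source frames between consecutive key frames (at least one frame).
    step = max(video_fps / sample_hz, 1.0)
    count = math.ceil(num_frames / step)
    # int() floors each position, so every index stays below num_frames.
    return [int(k * step) for k in range(count)]


# Usage: keep one frame per second of a 4-second, 25 fps clip.
print(sample_key_frame_indices(100, 25.0, 1.0))  # [0, 25, 50, 75]
```

Clamping `step` to at least one frame means a sampling rate higher than the video's frame rate simply keeps every frame instead of producing duplicate indices.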
3. The teaching method based on intelligent analysis of learning behaviors according to claim 2, wherein passing the plurality of learning key frames respectively through the autoencoder-based image noise reducer to obtain the plurality of noise-reduced learning key frames comprises:
inputting each learning key frame into an encoder of the image noise reducer, wherein the encoder uses convolutional layers to perform explicit spatial encoding on the learning key frame to obtain image features; and
inputting the image features into a decoder of the image noise reducer, wherein the decoder uses deconvolution layers to deconvolve the image features to obtain the noise-reduced learning key frame.
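The encoder/decoder structure of claim 3 can be sketched in NumPy as follows. This is a minimal, untrained illustration under assumed choices: a single-channel grayscale frame with even height and width, one 2x2 convolution with stride 2 in the encoder, one 2x2 transposed convolution ("deconvolution") with stride 2 in the decoder, and ReLU activation. The application does not specify layer counts, kernel sizes, or training, and every name here (`to_blocks`, `TinyConvAutoencoder`, and so on) is hypothetical. An actual noise reducer would need learned weights; training is omitted.

```python
import numpy as np


def to_blocks(frame):
    """Split an (H, W) frame into non-overlapping 2x2 blocks -> (H/2, W/2, 4)."""
    h, w = frame.shape
    return frame.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)


def from_blocks(blocks):
    """Inverse of to_blocks: (H/2, W/2, 4) -> (H, W)."""
    h2, w2, _ = blocks.shape
    return blocks.reshape(h2, w2, 2, 2).transpose(0, 2, 1, 3).reshape(h2 * 2, w2 * 2)


class TinyConvAutoencoder:
    """Hypothetical one-layer convolutional autoencoder (untrained, for shape illustration)."""

    def __init__(self, channels=8, seed=0):
        rng = np.random.default_rng(seed)
        # Each column of w_enc is one flattened 2x2 convolution kernel.
        self.w_enc = rng.normal(0.0, 0.5, size=(4, channels))
        self.b_enc = np.zeros(channels)
        # Each row of w_dec is one flattened 2x2 transposed-convolution kernel.
        self.w_dec = rng.normal(0.0, 0.5, size=(channels, 4))
        self.b_dec = 0.0

    def encode(self, frame):
        # 2x2 convolution with stride 2 == matrix product on non-overlapping blocks, then ReLU.
        return np.maximum(to_blocks(frame) @ self.w_enc + self.b_enc, 0.0)

    def decode(self, features):
        # 2x2 transposed convolution with stride 2: each feature position expands to one 2x2 block.
        return from_blocks(features @ self.w_dec + self.b_dec)

    def __call__(self, frame):
        return self.decode(self.encode(frame))


frame = np.random.default_rng(1).random((8, 6))  # hypothetical noisy grayscale key frame
model = TinyConvAutoencoder()
print(model.encode(frame).shape, model(frame).shape)  # (4, 3, 8) (8, 6)
```

Choosing the kernel size equal to the stride keeps the blocks non-overlapping, so both layers reduce to plain matrix products; real implementations typically use overlapping kernels and several layers, which this sketch does not attempt.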
4. The teaching method based on intelligent analysis of learning behaviors according to claim 3, wherein passing the plurality of noise-reduced learning key frames respectively through the ViT image encoder to obtain the plurality of learning semantic understanding feature vectors comprises:
performing image blocking on each noise-reduced learning key frame to obtain a sequence of image blocks;
using an embedding layer of the ViT image encoder to perform embedding encoding on each image block in the sequence of image blocks to obtain a sequence of image block embedding vectors; and
inputting the sequence of image block embedding vectors into a Transformer module of the ViT image encoder to obtain the learning semantic understanding feature vector.
5. The teaching method based on intelligent analysis of learning behaviors according to claim 4, wherein using the embedding layer of the ViT image encoder to perform embedding encoding on each image block in the sequence of image blocks to obtain the sequence of image block embedding vectors comprises:
flattening each image block in the sequence of image blocks into a one-dimensional pixel input vector to obtain a plurality of one-dimensional pixel input vectors; and
performing fully connected encoding on each of the plurality of one-dimensional pixel input vectors using the embedding layer of the ViT image encoder to obtain the sequence of image block embedding vectors.
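Claims 4 and 5 describe the usual ViT front end: cut each frame into blocks, flatten each block into a one-dimensional pixel vector, and map it through a fully connected embedding layer. A minimal NumPy sketch, assuming an H x W x C frame whose sides are divisible by a hypothetical patch size and a randomly initialized embedding matrix. It omits the class token and position embeddings that ViT implementations commonly add before the Transformer module, and the function names are illustrative only.

```python
import numpy as np


def image_to_patch_vectors(frame, patch):
    """Cut an (H, W, C) frame into patch x patch blocks and flatten each block.

    Returns an array of shape (num_blocks, patch * patch * C), blocks in row-major order.
    """
    h, w, c = frame.shape
    if h % patch or w % patch:
        raise ValueError("frame height and width must be divisible by the patch size")
    gh, gw = h // patch, w // patch
    blocks = frame.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(gh * gw, patch * patch * c)


def embed_patches(pixel_vectors, w_embed, b_embed):
    """Fully connected embedding: one embedding vector per flattened block."""
    return pixel_vectors @ w_embed + b_embed


rng = np.random.default_rng(0)
frame = rng.random((32, 32, 3))             # hypothetical noise-reduced key frame
vectors = image_to_patch_vectors(frame, 8)  # 16 blocks, each 8*8*3 = 192 pixels
w_embed = rng.normal(0.0, 0.02, size=(192, 64))  # hypothetical embedding width of 64
tokens = embed_patches(vectors, w_embed, np.zeros(64))
print(vectors.shape, tokens.shape)  # (16, 192) (16, 64)
```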
6. The teaching method based on intelligent analysis of learning behaviors according to claim 5, wherein passing the plurality of learning semantic understanding feature vectors through the Transformer-based context encoder to obtain the time-series associated learning feature vector comprises:
performing global context semantic encoding on the plurality of learning semantic understanding feature vectors using the Transformer-based context encoder to obtain a plurality of context learning feature vectors;
calculating a Gaussian mixture model of the plurality of context learning feature vectors, wherein the mean vector of the Gaussian mixture model is the per-position mean vector of the plurality of context learning feature vectors, and the value of each position in the covariance matrix of the Gaussian mixture model is the variance between the feature values of the corresponding two positions in the per-position mean vector;
calculating a Gaussian probability density distribution distance index of each of the plurality of context learning feature vectors with respect to the Gaussian mixture model to obtain a plurality of Gaussian probability density distribution distance indexes;
weighting each of the plurality of context learning feature vectors by its corresponding Gaussian probability density distribution distance index to obtain a plurality of corrected context learning feature vectors; and
concatenating the plurality of corrected context learning feature vectors to obtain the time-series associated learning feature vector.
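The weighting and concatenation steps of claim 6 can be sketched as below. The application's exact distance-index formula is not reproduced in this text, so this sketch assumes the unnormalized Gaussian kernel $w_i = \exp\!\big(-\tfrac{1}{2}(V_i - \mu_u)^\top \Sigma_u^{-1} (V_i - \mu_u)\big)$, with $\mu_u$ the per-position mean and $\Sigma_u$ the covariance computed across the $n$ context vectors (dividing by $n$). A small ridge term `eps` (a hypothetical parameter) keeps $\Sigma_u$ invertible when there are fewer vectors than dimensions. Vectors near the mean receive weights close to 1 and outlying frames are down-weighted before concatenation.

```python
import numpy as np


def gaussian_distance_weights(context_vectors, eps=1e-6):
    """Assumed distance index: w_i = exp(-0.5 * (V_i - mu)^T Sigma^{-1} (V_i - mu))."""
    v = np.asarray(context_vectors, dtype=float)  # shape (n, d): one row per key frame
    mu = v.mean(axis=0)                            # per-position mean vector
    diff = v - mu                                  # position-wise subtraction
    cov = diff.T @ diff / v.shape[0]               # (d, d) covariance across the n vectors
    precision = np.linalg.inv(cov + eps * np.eye(v.shape[1]))  # ridge keeps it invertible
    squared_dist = np.einsum("nd,de,ne->n", diff, precision, diff)
    return np.exp(-0.5 * squared_dist)


def time_series_feature(context_vectors, eps=1e-6):
    """Weight each context vector by its index, then concatenate into one vector."""
    v = np.asarray(context_vectors, dtype=float)
    weights = gaussian_distance_weights(v, eps)
    corrected = v * weights[:, None]
    return corrected.reshape(-1)


vectors = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [0.0, 0.0]])
# Here mu = 0 and Sigma is about 0.4 * I, so the last vector gets weight 1
# and each of the other four gets about exp(-1.25).
weights = gaussian_distance_weights(vectors)
feature = time_series_feature(vectors)  # 5 vectors x 2 dims -> length-10 vector
```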
7. The teaching method based on intelligent analysis of learning behaviors according to claim 6, wherein performing global context semantic encoding on the plurality of learning semantic understanding feature vectors using the Transformer-based context encoder to obtain the plurality of context learning feature vectors comprises:
arranging the plurality of learning semantic understanding feature vectors into an input vector;
converting the input vector into a query vector and a key vector through respective learnable embedding matrices;
calculating the product of the query vector and the transpose of the key vector to obtain a self-attention association matrix;
standardizing the self-attention association matrix to obtain a standardized self-attention association matrix;
inputting the standardized self-attention association matrix into a Softmax activation function to obtain a self-attention feature matrix; and
multiplying the self-attention feature matrix by each of the plurality of learning semantic understanding feature vectors, used as value vectors, to obtain the plurality of context learning feature vectors.
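Claim 7 describes a single self-attention step in which the learning semantic understanding feature vectors themselves act as the values. A minimal NumPy sketch, assuming the "standardization" is the common division by the square root of the key dimension (the application does not say which normalization is used) and that the query/key embedding matrices `w_q` and `w_k` are supplied rather than trained here. Names and dimensions are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def context_encode(features, w_q, w_k):
    """Single-head self-attention where the input vectors themselves are the values.

    features: (n, d) array, one learning semantic understanding feature vector per row.
    w_q, w_k: (d, d_k) query/key embedding matrices (given here, learned in practice).
    """
    x = np.asarray(features, dtype=float)
    q = x @ w_q
    k = x @ w_k
    scores = q @ k.T                       # (n, n) self-attention association matrix
    scores = scores / np.sqrt(k.shape[1])  # assumed standardization: scaled dot product
    attention = softmax(scores, axis=-1)   # self-attention feature matrix, rows sum to 1
    return attention @ x                   # (n, d) context learning feature vectors


rng = np.random.default_rng(0)
frames = rng.normal(size=(6, 16))  # hypothetical: 6 key frames, 16-dim features
out = context_encode(frames, rng.normal(size=(16, 8)), rng.normal(size=(16, 8)))
print(out.shape)  # (6, 16)
```

Because every output row is a convex combination of all input rows, each context vector mixes information from every key frame in the clip, which is what gives the encoder its global temporal context.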
8. The teaching method based on intelligent analysis of learning behaviors according to claim 7, wherein calculating the Gaussian probability density distribution distance index of each of the plurality of context learning feature vectors with respect to the Gaussian mixture model to obtain the plurality of Gaussian probability density distribution distance indexes comprises:
calculating Gaussian probability density distribution distance indexes of each context learning feature vector in the plurality of context learning feature vectors relative to the Gaussian mixture model according to the following Gaussian probability density distribution distance index formula to obtain a plurality of Gaussian probability density distribution distance indexes;
the Gaussian probability density distribution distance index formula is as follows:
Wherein V is i Is the i-th context learning feature vector, μ u Sum sigma u Is the mean vector and covariance matrix of the Gaussian mixture model, i.e. μ u A per-position mean vector representing the plurality of context learning feature vectors, and Σ u The value of each position in the per-position mean vector is the variance between the eigenvalues of the corresponding two positions, wherein the vector is a column vector,representing matrix multiplication->Represents the subtraction by position, exp (·) represents the natural exponential function operation, w i Is the i-th gaussian probability density distribution distance index.
9. The teaching method based on intelligent analysis of learning behaviors according to claim 8, wherein passing the time-series associated learning feature vector through the classifier to obtain the classification result, the classification result indicating whether the learning behavior of the person to be detected is standard, comprises:
inputting the time-series associated learning feature vector, as a classification feature vector, into a Softmax classification function of the classifier to obtain the probability that the classification feature vector belongs to each classification label; and
determining the classification label corresponding to the maximum probability as the classification result.
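Claim 9 is a standard Softmax classification head followed by an arg-max. The sketch assumes a linear layer (hypothetical weights `w` and bias `b`) produces one logit per label and uses a hypothetical two-label set; only the Softmax and the selection of the most probable label come from the claim.

```python
import numpy as np

LABELS = ("standard", "non-standard")  # hypothetical label set


def classify(feature, w, b, labels=LABELS):
    """Softmax classification head: returns (predicted label, per-label probabilities)."""
    logits = np.asarray(feature, dtype=float) @ w + b  # one logit per label (assumed linear layer)
    z = logits - logits.max()                          # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return labels[int(np.argmax(probs))], probs


label, probs = classify([0.2, -0.1, 0.4], np.array([[1.0, -1.0], [0.5, 0.5], [2.0, 0.0]]), np.zeros(2))
print(label)  # standard
```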
10. A teaching system based on intelligent analysis of learning behaviors, comprising:
a monitoring module, configured to acquire a learning monitoring video of a person to be detected;
a sampling module, configured to extract a plurality of learning key frames from the learning monitoring video;
a noise reduction module, configured to pass the plurality of learning key frames respectively through an autoencoder-based image noise reducer to obtain a plurality of noise-reduced learning key frames;
a key frame semantic understanding module, configured to pass the plurality of noise-reduced learning key frames respectively through a ViT image encoder to obtain a plurality of learning semantic understanding feature vectors;
a timing correlation module, configured to pass the plurality of learning semantic understanding feature vectors through a Transformer-based context encoder to obtain a time-series associated learning feature vector; and
a detection result generation module, configured to pass the time-series associated learning feature vector through a classifier to obtain a classification result, wherein the classification result indicates whether the learning behavior of the person to be detected is standard.
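The system of claim 10 is a chain of modules, each feeding the next. The sketch below shows one assumed way to compose them in Python, with each module injected as a callable; the class and field names are hypothetical, and the monitoring module is represented simply by the frame sequence passed to `run`. The stand-in functions in the usage are trivial placeholders chosen only to show the data flow, not models.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class LearningBehaviorSystem:
    """Hypothetical composition of the claim-10 modules; each module is a callable."""

    sampling: Callable[[Sequence[Any]], List[Any]]  # video frames -> key frames
    noise_reduction: Callable[[Any], Any]           # key frame -> noise-reduced key frame
    semantic_understanding: Callable[[Any], Any]    # key frame -> feature vector
    timing_correlation: Callable[[List[Any]], Any]  # feature vectors -> one time-series vector
    result_generation: Callable[[Any], str]         # time-series vector -> classification result

    def run(self, video_frames: Sequence[Any]) -> str:
        # The monitoring module is represented by the caller supplying video_frames.
        key_frames = self.sampling(video_frames)
        denoised = [self.noise_reduction(f) for f in key_frames]
        features = [self.semantic_understanding(f) for f in denoised]
        time_series_vector = self.timing_correlation(features)
        return self.result_generation(time_series_vector)


# Wiring check with trivial stand-ins (not real models).
system = LearningBehaviorSystem(
    sampling=lambda frames: list(frames[::2]),
    noise_reduction=lambda f: f,
    semantic_understanding=lambda f: [f],
    timing_correlation=lambda vs: [x for v in vs for x in v],
    result_generation=lambda v: "standard" if sum(v) < 10 else "non-standard",
)
print(system.run([1, 2, 3, 4, 5]))  # standard
```

Injecting each module as a callable lets the sampling, noise reduction, encoding, and classification steps be swapped or tested in isolation without changing the pipeline itself.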
CN202310435863.0A 2023-04-12 2023-04-12 Teaching method and system based on intelligent analysis of learning behaviors Pending CN116665086A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310435863.0A CN116665086A (en) 2023-04-12 2023-04-12 Teaching method and system based on intelligent analysis of learning behaviors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310435863.0A CN116665086A (en) 2023-04-12 2023-04-12 Teaching method and system based on intelligent analysis of learning behaviors

Publications (1)

Publication Number Publication Date
CN116665086A true CN116665086A (en) 2023-08-29

Family

ID=87716072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310435863.0A Pending CN116665086A (en) 2023-04-12 2023-04-12 Teaching method and system based on intelligent analysis of learning behaviors

Country Status (1)

Country Link
CN (1) CN116665086A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117158904A (en) * 2023-09-08 2023-12-05 上海市第四人民医院 Old people cognitive disorder detection system and method based on behavior analysis
CN117158904B (en) * 2023-09-08 2024-05-24 上海市第四人民医院 Old people cognitive disorder detection system and method based on behavior analysis
CN117257302A (en) * 2023-09-20 2023-12-22 湖北万维科技发展有限责任公司 Personnel mental health state assessment method and system
CN117314709A (en) * 2023-11-30 2023-12-29 吉林省拓达环保设备工程有限公司 Intelligent monitoring system for sewage treatment progress
CN117438024A (en) * 2023-12-15 2024-01-23 吉林大学 Intelligent acquisition and analysis system and method for acute diagnosis patient sign data
CN117438024B (en) * 2023-12-15 2024-03-08 吉林大学 Intelligent acquisition and analysis system and method for acute diagnosis patient sign data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination