CN110807585A - Student classroom learning state online evaluation method and system - Google Patents

Student classroom learning state online evaluation method and system

Info

Publication number
CN110807585A
CN110807585A
Authority
CN
China
Prior art keywords
student
state
expression
class
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911047730.6A
Other languages
Chinese (zh)
Inventor
李霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Institute of Commerce and Technology
Original Assignee
Shandong Institute of Commerce and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Institute of Commerce and Technology filed Critical Shandong Institute of Commerce and Technology
Priority to CN201911047730.6A priority Critical patent/CN110807585A/en
Publication of CN110807585A publication Critical patent/CN110807585A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395Quality analysis or management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • G06Q50/205Education administration or guidance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition

Abstract

The disclosure provides a method and a system for online evaluation of a student's classroom learning state. The online evaluation method comprises the following steps: synchronously collecting video images and sound signals of a student; inputting the video images and the sound signals into a trained expression recognition model and a trained emotional state recognition model, respectively, and outputting the student's expression category and emotional state category together with the corresponding probabilities, wherein the expression categories, the emotional state categories and the classroom learning state categories are identical; and inputting the expression recognition result and the emotional state recognition result into a student state evaluation model, which outputs the evaluated classroom learning state of the student.

Description

Student classroom learning state online evaluation method and system
Technical Field
The disclosure belongs to the field of classroom learning state evaluation of students, and particularly relates to an online classroom learning state evaluation method and system for students.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Correct evaluation of classroom teaching effectiveness is an important means of promoting student growth, teachers' professional development and classroom teaching quality, and the classroom performance of students, as the main participants in class, is an important component of that evaluation. At present, however, because classes contain many students, the classroom performance of each student is difficult to evaluate individually, and the evaluation results are general rather than specific. This is not conducive to quantitative assessment of classroom effectiveness, nor to targeted cultivation of each student. The present invention enables online monitoring and evaluation of each student's in-class state, effectively compensates for the shortcomings of current student evaluation methods, and provides a basis for evaluating teachers' teaching effectiveness. The online evaluation system for students' classroom learning states collects each student's facial images and voice signals in real time, and comprehensively infers the student's current learning state by analyzing facial expression, speech rate and tone.
The inventor has found that, for online evaluation of students' classroom learning states, the core problem is how to evaluate each student's learning state accurately and in real time. Specifically, the following sub-problems can be distinguished: (1) real-time acquisition of each student's facial expression and sound signal: the existing approach of installing cameras at the four corners of a classroom can only monitor the overall classroom dynamics macroscopically, and it is difficult to capture each student's facial expression and sound signal in real time; (2) real-time recognition of facial expressions, which involves the real-time performance and accuracy of locating and processing the student's facial image and of recognizing the expression; (3) combining the facial expression recognition result with the speech emotion recognition result to recognize the learning state: relying on the facial expression or the speech recognition result alone makes it difficult to judge the student's current mental state accurately and comprehensively, so the two recognition results are combined, and the student's mental state at the previous moment is also taken into account to improve recognition accuracy.
Disclosure of Invention
In order to solve the above problems, the present disclosure provides a method and system for online evaluation of a student's classroom learning state, which recognize the student's facial expression in real time by processing video images, recognize the student's emotional condition in real time by processing sound signals, and fuse the expression recognition result and the emotion recognition result through a specific model, thereby recognizing the student's learning state in real time.
In order to achieve the purpose, the following technical scheme is adopted in the disclosure:
a first aspect of the present disclosure provides an online assessment method for a class learning state of a student, which includes:
synchronously collecting video images and sound signals of a student;
inputting the video image and the sound signal into the trained expression recognition model and the trained emotional state recognition model, respectively, and outputting the student's expression category and emotional state category together with the corresponding probabilities; the expression categories, the emotional state categories and the classroom learning state categories are identical;
inputting the expression recognition result and the emotion state recognition result into a student state evaluation model, and outputting the evaluated class learning state category of the student in the classroom; wherein, the student state evaluation model is as follows:
R_ti = (α·E_ti + β·S_ti)·γ_(t-1)i

wherein i denotes the i-th classroom learning state, i ranges from 1 to M, and M is the total number of classroom learning state categories; R_ti is the probability value of the i-th classroom learning state at time t; E_ti is the probability value of the i-th expression category at time t; S_ti is the probability value of the i-th emotional state category at time t; α and β are known weight coefficients with α + β = 1; γ_(t-1)i is a coefficient determined by the recognition result at time t-1: if the state recognized at time t-1 is the i-th classroom learning state, its value is 1, otherwise it is 0.9; t is a positive integer greater than or equal to 1, and when t = 1, γ_(t-1)i = 1.
A second aspect of the present disclosure provides an online evaluation system for classroom learning states of trainees, comprising:
a state detector configured to:
synchronously collecting video images and sound signals of a student;
inputting the video image and the sound signal into the trained expression recognition model and the trained emotional state recognition model, respectively, and outputting the student's expression category and emotional state category together with the corresponding probabilities; the expression categories, the emotional state categories and the classroom learning state categories are identical;
the upper computer is configured to receive the expression recognition result and the emotional state recognition result, input them into the student state evaluation model, and output the evaluated classroom learning state category of the student; wherein the student state evaluation model is as follows:
R_ti = (α·E_ti + β·S_ti)·γ_(t-1)i

wherein i denotes the i-th classroom learning state, i ranges from 1 to M, and M is the total number of classroom learning state categories; R_ti is the probability value of the i-th classroom learning state at time t; E_ti is the probability value of the i-th expression category at time t; S_ti is the probability value of the i-th emotional state category at time t; α and β are known weight coefficients with α + β = 1; γ_(t-1)i is a coefficient determined by the recognition result at time t-1: if the state recognized at time t-1 is the i-th classroom learning state, its value is 1, otherwise it is 0.9; t is a positive integer greater than or equal to 1, and when t = 1, γ_(t-1)i = 1.
A third aspect of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps in the on-line assessment method for the classroom learning state of a student as described above.
A fourth aspect of the present disclosure provides a computer device, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the online evaluation method for classroom learning state of trainees as described above.
The beneficial effects of this disclosure are:
the method and the device have the advantages that the facial expressions of the students during learning are recognized in real time through processing of video images, meanwhile, the emotion conditions of the students during learning are recognized in real time through processing of sound signals, the expression recognition results and the emotion recognition results are fused through the specific model, the learning states of the students are recognized in real time, the recognition effect of the learning states of each student is effectively improved, the on-line monitoring and evaluation of the class state of each student are realized, the defects of the evaluation mode of the current students are effectively overcome, and reliable and full bases are provided for the targeted cultivation of the students and the evaluation of the teaching effects of teachers.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
Fig. 1 is a flowchart of an online evaluation method for classroom learning states of trainees according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of an online evaluation system for classroom learning states of trainees according to an embodiment of the present disclosure;
fig. 3(a) is a front view of a state detector provided by an embodiment of the present disclosure;
fig. 3(b) is a side view of a state detector provided in an embodiment of the disclosure;
FIG. 4 is a schematic structural diagram of a state detector provided in an embodiment of the present disclosure;
fig. 5 is a face detection feature provided in an embodiment of the present disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example 1
As shown in fig. 1, the method for online evaluating the classroom learning state of a student of the present embodiment includes:
step S101: synchronously collecting video images and sound signals of the trainees.
In the specific implementation, the video image and the sound signal of the student are synchronously collected from the video image collecting area; the video image acquisition area is set according to the position of a student; each video image acquisition area corresponds to a trainee.
Step S102: correspondingly inputting the video image and the sound signal into the trained expression recognition model and emotion state recognition model respectively, and outputting the expression type and emotion state type corresponding to the student and the corresponding probability; the categories of the expression, the emotion state and the class learning state are the same.
In this embodiment, the categories into which the expression, the emotional state, and the classroom learning state are classified include: happy, calm, tired, confused, excited and depressed.
Specifically, the process of inputting the video image into the trained expression recognition model and outputting the expression category and the corresponding probability corresponding to the student comprises the following steps:
(1.1) face detection
The AdaBoost-based face detection algorithm is a method grounded in statistical learning theory; compared with other detection algorithms it offers higher accuracy and faster detection, but it still struggles to meet the real-time requirement of this system. The AdaBoost algorithm is generally combined with Haar-like features to realize face detection; because a large number of windows must be examined, this detection method loses some efficiency. Therefore, if the number of detection windows is reduced, the real-time performance of detection can be improved.
The embodiment adopts a mode of combining coarse detection and fine detection to improve the real-time performance of detection.
Coarse detection process: first, the contour image of the upper half of the human body is obtained by boundary tracking; then the invariant-moment feature vector of the contour image is extracted and matched against the feature vector of a template image, using the Euclidean distance as the similarity measure; if the Euclidean distance is smaller than a set threshold, the processed region is regarded as a human-body contour region. That region is retained and the other regions are discarded.
Fine detection process: the image region retained after coarse detection is processed to finally realize face detection. Fine detection uses the AdaBoost face detection algorithm based on Haar-like features. Features like those shown in Fig. 5 are added to the original Haar-like feature set to improve the success rate of face detection. The Haar-like features are computed with the integral-image method; weak classifiers are trained on these features, the trained weak classifiers are selected as optimal weak classifiers and combined by weighted superposition into strong classifiers, and the trained strong classifiers are cascaded into a cascade classifier. Detection takes the form of a binary decision tree, and the sub-window images screened by the cascade classifier are taken as face regions.
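A rough sketch of the coarse-plus-fine pipeline follows; it uses OpenCV's stock Hu-moment contour matching and a pre-trained Haar cascade as stand-ins for the classifiers trained in this embodiment, so the template contour, threshold and cascade file are illustrative assumptions rather than the disclosed models:

```python
import cv2

def coarse_detect(frame_gray, template_contour, dist_threshold=0.3):
    """Coarse step: keep regions whose contour moments resemble an upper-body template."""
    _, binary = cv2.threshold(frame_gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # [-2] keeps this compatible with both OpenCV 3 and 4 return signatures
    contours = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    kept = []
    for c in contours:
        # matchShapes compares Hu-moment invariants; a small value means a similar shape
        d = cv2.matchShapes(c, template_contour, cv2.CONTOURS_MATCH_I1, 0.0)
        if d < dist_threshold:
            kept.append(cv2.boundingRect(c))
    return kept

def fine_detect(frame_gray, rois, cascade_file="haarcascade_frontalface_default.xml"):
    """Fine step: run a Haar cascade only inside the regions retained by coarse detection."""
    cascade = cv2.CascadeClassifier(cv2.data.haarcascades + cascade_file)
    faces = []
    for (x, y, w, h) in rois:
        sub = frame_gray[y:y + h, x:x + w]
        for (fx, fy, fw, fh) in cascade.detectMultiScale(sub, 1.1, 5):
            faces.append((x + fx, y + fy, fw, fh))
    return faces
```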
(1.2) face image cropping
The detected face region is cropped from the image to obtain the face image region.
(1.3) rotation correction
Since the target image may have different degrees of angular deviation in each frame, this may affect the effect of expression recognition. For this reason, the obtained face image region needs to be subjected to rotation correction. The rotation correction of the face image is realized using equation 1 to obtain a standard face image.
x'_i = x_i·cosθ + y_j·sinθ,  y'_j = −x_i·sinθ + y_j·cosθ    (1)

wherein (x_i, y_j) are the original coordinates of a pixel in the face image and (x'_i, y'_j) are its coordinates after the rotation transform; the pixel value at coordinate point (i, j) is carried over to the rotated position. The rotation angle θ is the angle between the line connecting the two eyes and the horizontal axis, so the rotated face image is closer to a standard frontal pose.
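As an illustrative sketch, the correction of equation 1 can be applied with OpenCV; the eye coordinates are assumed to come from a separate landmark detector, which the disclosure does not specify:

```python
import cv2
import numpy as np

def rotate_to_standard(face_img, left_eye, right_eye):
    """Rotate the face image so the line between the eyes becomes horizontal (Eq. 1)."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    theta = np.degrees(np.arctan2(ry - ly, rx - lx))   # angle of the eye line vs. the horizontal axis
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)
    M = cv2.getRotationMatrix2D(center, theta, 1.0)    # rotation about the midpoint between the eyes
    h, w = face_img.shape[:2]
    return cv2.warpAffine(face_img, M, (w, h))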
(1.4) expression recognition
To realize expression recognition, the invention uses a two-channel weighted hybrid CNN-LSTM network, in which each channel consists of a partial VGG16 network connected in series with an LSTM network. The two channels take as input the face image and the local binary pattern (LBP) image computed from the face image, respectively. Each image is fed into its partial VGG16 network to extract feature vectors, the feature-vector sequence is sent to the LSTM network for training, and after the LSTM networks are trained, the output vectors of the two channels are weighted and fused to obtain the final recognition result.
The weighted fusion and recognition process is as follows. Let the output vectors of the two channels be F_1 and F_2. F_1 and F_2 each pass through three fully connected layers Fc_1 = {v_1, v_2, ..., v_800}, Fc_2 = {s_1, s_2, ..., s_400} and Fc_3 = {c_1, c_2, ..., c_6}; in this embodiment the numbers of neurons in these three fully connected layers are 800, 400 and 6 respectively, yielding the feature vectors M_1 and M_2. Averaging M_1 and M_2 gives O_1 = {o_11, o_12, ..., o_16} and O_2 = {o_21, o_22, ..., o_26}, and the two vectors are weighted and fused according to equation 2 to obtain the output R = {r_1, r_2, ..., r_6}. For the output R, the probability of each expression category is obtained with the softmax function (equation 3).

r_i = k·o_1i + (1-k)·o_2i   (i = 1, 2, ..., 6; k ∈ [0, 1])    (2)

where k is the fusion weight, obtained by 10-fold cross validation.

y'_i = exp(r_i) / Σ_{j=1}^{6} exp(r_j)    (3)

wherein y'_i is the probability of the i-th expression category.
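A minimal numerical sketch of equations 2 and 3; the channel outputs and the weight k below are placeholder values, not trained results:

```python
import numpy as np

def fuse_and_classify(o1, o2, k=0.5):
    """Weighted fusion (Eq. 2) followed by softmax (Eq. 3) over the six expression categories."""
    o1, o2 = np.asarray(o1, dtype=float), np.asarray(o2, dtype=float)
    r = k * o1 + (1 - k) * o2              # Eq. 2
    e = np.exp(r - r.max())                # numerically stabilized softmax
    return e / e.sum()                     # Eq. 3: probability of each expression category

# example: averaged outputs of the two channels for {happy, calm, tired, confused, excited, depressed}
probs = fuse_and_classify([2.1, 0.3, -0.5, 0.1, 1.2, -1.0],
                          [1.8, 0.2, -0.2, 0.4, 0.9, -0.8], k=0.6)
```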
The present embodiment replaces conventional convolutions with separable convolutions and prunes the network appropriately to cut down the amount of computation.
Specifically, the process of inputting the voice signal into the trained emotional state recognition model and outputting the emotional state category and the corresponding probability corresponding to the trainee is as follows:
pre-emphasis processing is carried out on the acquired voice signals by adopting a pre-emphasis filter;
a finite impulse response filter is used, as shown in equation 4.
H(z) = 1 − α·z⁻¹    (4)

wherein α is the pre-emphasis coefficient, whose value in this disclosure is 0.938.
The speech signal filtered by the pre-emphasis filter is framed into 25 ms frames with a 10 ms overlap between frames. Each frame is multiplied by a Hamming window function to reduce discontinuities in the speech signal and to avoid spectral leakage. A double-threshold method based on short-time energy and the short-time average zero-crossing rate is used for endpoint detection, i.e. detecting the starting point and ending point of the speech.
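As an illustrative sketch only (the 16 kHz sampling rate is an assumed value, not stated in the disclosure), pre-emphasis according to equation 4 and Hamming-window framing could look as follows:

```python
import numpy as np

def preemphasize_and_frame(signal, sr=16000, alpha=0.938, frame_ms=25, overlap_ms=10):
    """Apply H(z) = 1 - alpha*z^-1, then split into 25 ms Hamming-windowed frames with 10 ms overlap."""
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    frame_len = int(sr * frame_ms / 1000)
    hop = frame_len - int(sr * overlap_ms / 1000)   # 10 ms overlap -> 15 ms hop
    window = np.hamming(frame_len)
    frames, start = [], 0
    while start + frame_len <= len(emphasized):
        frames.append(emphasized[start:start + frame_len] * window)
        start += hop
    return np.array(frames)
```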
Extracting the characteristics of a pre-emphasis processed sound signal, wherein the characteristics of the sound signal comprise a fundamental tone frequency, short-time energy, a short-time zero-crossing rate, a formant and a Mel frequency domain cepstrum coefficient;
The pitch frequency, called the fundamental frequency for short, reflects the vibration pattern of the vocal cords during phonation and is one of the most important features in speech-signal analysis. In order to avoid interference from formants, the cepstrum method is chosen to extract the pitch signal. The cepstrum of a signal x(n) is defined as the inverse discrete Fourier transform of the logarithm of its spectrum, as shown in equation 5:

c(n) = IDFT( log| DFT( x(n) ) | )    (5)
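An illustrative cepstrum-based pitch estimate for one windowed frame; the 50 to 400 Hz search range and 16 kHz sampling rate are assumptions, not values from the disclosure:

```python
import numpy as np

def pitch_cepstrum(frame, sr=16000, fmin=50, fmax=400):
    """Estimate the fundamental frequency of one windowed frame via the real cepstrum (Eq. 5)."""
    spectrum = np.fft.rfft(frame)
    cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-10))
    # the pitch period corresponds to the largest cepstral peak in the allowed quefrency range
    qmin, qmax = int(sr / fmax), int(sr / fmin)
    peak = qmin + np.argmax(cepstrum[qmin:qmax])
    return sr / peak
```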
Short-time energy reflects the amplitude characteristics of the speech signal and can be used to distinguish speech from noise. Moreover, the intensity of a speaker's voice varies markedly under different emotional conditions; for example, the energy of the speech signal for happy and angry emotions is much higher than that for sadness. The short-time energy is defined in equation 6:

E_n = Σ_{m=1}^{N} x_n(m)²    (6)

wherein x_n(m) is the m-th data point in the n-th frame and N is the total number of data points in the frame.
The short-time zero-crossing rate is the number of times the speech waveform crosses the horizontal axis in each frame and can be used for endpoint detection and silence removal. The short-time zero-crossing rate is given by equation 7:

Z_n = (1/2) Σ_{m=1}^{N-1} | sgn(x_n(m+1)) − sgn(x_n(m)) |    (7)
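A small sketch of equations 6 and 7 applied to the framed signal above; this is illustrative only:

```python
import numpy as np

def short_time_energy(frames):
    """Eq. 6: sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def short_time_zcr(frames):
    """Eq. 7: half the accumulated sign changes in each frame."""
    signs = np.sign(frames)
    return 0.5 * np.sum(np.abs(np.diff(signs, axis=1)), axis=1)
```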
Formants are regions of the sound spectrum where energy is relatively concentrated; they are not only determinants of sound quality but also reflect the physical characteristics of the vocal tract. Formants are usually extracted by linear prediction, and the prediction error can be expressed by equation 8:

e(m) = x_n(m) − Σ_{i=1}^{p} a_i·x_n(m−i)    (8)

wherein x_n(m) is the speech signal, a_i are the prediction coefficients, n is the frame index, p is the prediction order and e(m) is the prediction error. Future samples are predicted from the previous p samples, and the goal is to adjust a_i so that the prediction error is minimized.
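One illustrative way to realize the linear-prediction analysis of equation 8 is with librosa's LPC routine, taking formant candidates from the angles of the complex roots of the prediction polynomial; the order p = 12 and the 90 Hz lower cutoff are assumptions of this sketch:

```python
import numpy as np
import librosa

def formants_lpc(frame, sr=16000, order=12):
    """Estimate formant frequencies from the LPC polynomial of one frame (Eq. 8)."""
    a = librosa.lpc(frame.astype(float), order=order)    # prediction coefficients a_i
    roots = [r for r in np.roots(a) if np.imag(r) > 0]   # keep one root per conjugate pair
    freqs = sorted(np.angle(roots) * sr / (2 * np.pi))   # convert pole angles to Hz
    return [f for f in freqs if f > 90]                  # discard very low spurious candidates
```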
Mel-frequency cepstral coefficients (MFCCs) are among the most widely used features in speech analysis; the parameters are extracted on the basis of the human auditory system combined with the speech production mechanism, so they provide a natural and realistic reference for speech recognition. The MFCC parameters are obtained as follows: after the speech signal is preprocessed, a fast Fourier transform is applied to each frame to obtain its energy spectrum; the spectrum is filtered by a bank of 24 Mel filters; the logarithm of all filter outputs gives the corresponding log power spectrum; an inverse discrete cosine transform then yields 14 static MFCCs; finally, first-order and second-order differences of the static MFCCs give the corresponding dynamic features.
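For illustration, the static-plus-dynamic MFCC features described above can be approximated with librosa; the exact filter-bank implementation used in the disclosure may differ, and the 16 kHz sampling rate is assumed:

```python
import numpy as np
import librosa

def mfcc_features(y, sr=16000):
    """14 static MFCCs from 24 Mel filters, plus first- and second-order differences."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=14, n_mels=24,
                                n_fft=int(0.025 * sr), hop_length=int(0.015 * sr))
    delta1 = librosa.feature.delta(mfcc, order=1)
    delta2 = librosa.feature.delta(mfcc, order=2)
    return np.vstack([mfcc, delta1, delta2])   # shape: (42, n_frames)
```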
Inputting the extracted characteristics of the sound signals into a trained emotional state recognition model, and outputting the emotional state category and the probability corresponding to the student; the emotional state recognition model is a DBN (deep belief network) + softmax classifier structure.
In this embodiment, speech emotion recognition is realized by training a DBN; the DBN is then modified by combining it with an SVM to further improve the speech emotion recognition rate. First, the RBM parameters of each layer are trained layer by layer from bottom to top through unsupervised learning so that the DBN parameters reach a global optimum; then a BP network is used for supervised training, i.e. the output of the DBN is compared with the training data through the BP neural network, and the resulting error is propagated back from top to bottom to correct the network parameters toward their optimal values. After the DBN training is finished, the softmax classifier at the top layer of the network is replaced with an SVM to improve classification accuracy. In the new speech-emotion classifier, the DBN part is used to extract features, the SVM part is trained with the obtained features and the corresponding SVM parameters are obtained, completing the training of the whole network.
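A highly simplified sketch of the DBN-as-feature-extractor plus SVM idea, using scikit-learn's BernoulliRBM layers as a stand-in for the stacked RBMs of a full DBN; the layer sizes and hyperparameters are assumptions, and this is not the disclosed training procedure:

```python
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

def build_emotion_classifier():
    """Unsupervised RBM layers learn features; an SVM replaces the softmax layer for classification."""
    return Pipeline([
        ("scale", MinMaxScaler()),                       # RBMs expect inputs in [0, 1]
        ("rbm1", BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=20)),
        ("rbm2", BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=20)),
        ("svm", SVC(kernel="rbf", probability=True)),    # probability=True yields per-class probabilities
    ])

# usage sketch:
# clf = build_emotion_classifier(); clf.fit(X_train, y_train); probs = clf.predict_proba(X_test)
```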
Step S103: inputting the expression recognition result and the emotion state recognition result into a student state evaluation model, and outputting the evaluated class learning state category of the student in the classroom; wherein, the student state evaluation model is as follows:
R_ti = (α·E_ti + β·S_ti)·γ_(t-1)i    (9)

wherein i denotes the i-th classroom learning state, i ranges from 1 to M, and M is the total number of classroom learning state categories; R_ti is the probability value of the i-th classroom learning state at time t; E_ti is the probability value of the i-th expression category at time t; S_ti is the probability value of the i-th emotional state category at time t; α and β are known weight coefficients with α + β = 1; γ_(t-1)i is a coefficient determined by the recognition result at time t-1: if the state recognized at time t-1 is the i-th classroom learning state, its value is 1, otherwise it is 0.9; t is a positive integer greater than or equal to 1, and when t = 1, γ_(t-1)i = 1.
In this example, α, β are 0.6 and 0.4, respectively.
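For illustration only (not part of the claimed method), the evaluation of equation 9 with these weights can be sketched as follows; the six-element array layout over the state categories is an assumption of this sketch:

```python
import numpy as np

def evaluate_state(E_t, S_t, prev_state=None, alpha=0.6, beta=0.4):
    """Sketch of R_ti = (alpha*E_ti + beta*S_ti) * gamma_(t-1)i (Eq. 9).

    E_t, S_t   : arrays of length M with the expression / emotional-state probabilities
                 for the M classroom learning states at time t.
    prev_state : index of the state recognized at time t-1, or None when t = 1.
    Returns the index of the evaluated classroom learning state and the R values.
    """
    E_t, S_t = np.asarray(E_t, dtype=float), np.asarray(S_t, dtype=float)
    gamma = np.full(E_t.shape, 0.9)
    if prev_state is None:          # t = 1: gamma_(t-1)i = 1 for every i
        gamma[:] = 1.0
    else:                           # the state recognized at t-1 keeps weight 1
        gamma[prev_state] = 1.0
    R_t = (alpha * E_t + beta * S_t) * gamma
    return int(np.argmax(R_t)), R_t
```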
As an embodiment, the online evaluation method for the classroom learning state of the trainee further comprises the following steps:
and recording the class and the state duration of the classroom learning state of the evaluated student, and further calculating the classroom learning state of the student in a preset time period and the proportion of the classroom learning state of the student in the preset time period.
The learning state of the student over a period of time is judged comprehensively using historical data (for a week, a month, a semester or a school year).

The classroom learning states of a student during a fixed time period within one week (or month, semester or school year) and the proportion of each state can be counted, as shown in equation 10; for example, the student's mental states and the proportion of a given state during the 8:00 to 10:00 period of every Tuesday in a month. The results can be displayed as a table, a bar chart or a pie chart.
p_i = ( Σ_{j=1}^{n} t_ij ) / ( n·T )    (10)

wherein p_i is the proportion of the i-th mental state; n is the number of counted time periods within the statistical range (for example, over one month, if the counted period is 8:00 to 10:00 every Tuesday, then n = 4); t_ij is the duration of the i-th state within the j-th time period; and T is the length of one time period.
Alternatively, the mental states of all students during a fixed time period within a week (or month, semester or school year) and the proportion of each mental state can be counted, as shown in equation 11.

A_i = (1/m) · Σ_{k=1}^{m} P_ik    (11)

wherein m is the total number of students; P_ik is the proportion of the i-th mental state for the k-th student; and A_i is the proportion of the i-th mental state over all students. This can serve as a basis for evaluating the teaching effectiveness of a given course.
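An illustrative sketch of the statistics of equations 10 and 11 over logged state records; the record format and field names are assumptions made for the example:

```python
from collections import defaultdict

STATES = ["happy", "calm", "tired", "confused", "excited", "depressed"]

def state_proportions(records, period_seconds, n_periods):
    """Eq. 10: records is a list of (state, duration_seconds) logged within the counted periods."""
    totals = defaultdict(float)
    for state, duration in records:
        totals[state] += duration
    return {s: totals[s] / (n_periods * period_seconds) for s in STATES}

def class_proportions(per_student):
    """Eq. 11: per_student is a list of per-student proportion dicts; returns the class-wide average."""
    m = len(per_student)
    return {s: sum(p[s] for p in per_student) / m for s in STATES}
```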
The facial expression of the student during learning is recognized in real time by processing video images, and at the same time the student's emotional condition during learning is recognized in real time by processing sound signals; the expression recognition result and the emotion recognition result are fused through the specific model, so that the learning state of the student is recognized in real time. This effectively improves the recognition of each student's learning state, realizes online monitoring and evaluation of each student's in-class state, effectively compensates for the shortcomings of current student evaluation methods, and provides a reliable and sufficient basis for targeted cultivation of students and for evaluation of teachers' classroom teaching effectiveness.
Example 2
As shown in fig. 2, an online evaluation system for classroom learning states of trainees comprises:
a state detector 1 configured to:
synchronously collecting video images and sound signals of a student;
inputting the video image and the sound signal into the trained expression recognition model and the trained emotional state recognition model, respectively, and outputting the student's expression category and emotional state category together with the corresponding probabilities; the expression categories, the emotional state categories and the classroom learning state categories are identical.
In the specific implementation, a state detector is arranged right in front of each student, the state detectors are mounted on a desk and used for collecting front video signals and sound signals of the students in class, processing the two signals in real time and transmitting the processing result to an upper computer through WiFi; the upper computer is arranged in the teacher office and used for archiving and counting the learning state data of the students.
As shown in fig. 3(a) and 3(b), the state detector includes a camera 11 for collecting a frontal video image signal that contains the student's face; a sound pickup 12 for collecting the student's voice signal; an LED lamp 13 for auxiliary lighting when classroom light is insufficient, so as to optimize the student's learning environment; and a state detector main case 14. A state detector support 15 is provided at the bottom of the main case 14, and a state detector base 16 is provided at the bottom of the support 15.
The state detectors and the upper computer use an N:1 link: each detector has an independent IP address and communicates over WiFi. The upper computer reads the expression and speech emotion recognition results from the state detectors in turn by polling, and stores and analyzes the results. A direct transmission mode can also be used, in which the video and sound signals of a specified state detector are transmitted directly to the upper computer.
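A minimal sketch of what such polling might look like on the upper-computer side; the IP addresses, port and message format are assumptions made for illustration, since the disclosure does not specify a protocol:

```python
import json
import socket

DETECTORS = ["192.168.1.11", "192.168.1.12"]   # assumed IP addresses of the state detectors
PORT = 9000                                     # assumed port

def poll_detectors():
    """Query each state detector in turn and collect its latest recognition result."""
    results = {}
    for ip in DETECTORS:
        with socket.create_connection((ip, PORT), timeout=2.0) as conn:
            conn.sendall(b"GET_RESULT\n")                 # assumed request message
            payload = conn.recv(4096)
            results[ip] = json.loads(payload.decode())    # e.g. {"E": [...], "S": [...]}
    return results
```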
As shown in fig. 4, a schematic structural diagram of the state detector, a video image capture card 21 and a sound signal capture card 22 are arranged in the main case of the state detector and perform the acquisition of the video and sound signals. The captured signals are sent to a video image processing chip 23 and a sound processing chip 24, respectively; after the signals are processed, the results are sent to the upper computer through a communication circuit 25. The communication circuit 25 can also forward instructions from the upper computer to the video image processing chip 23 or the sound processing chip 24, for example to enable direct transmission of the video image and sound signals.
Video images and sound signals of the student are synchronously collected from a video image acquisition area; the video image acquisition area is set according to the student's position; each video image acquisition area corresponds to one student.
In the expression and emotion state classification module, inputting a video image into a trained expression recognition model, and outputting the expression category and the corresponding probability corresponding to the student as follows:
firstly, acquiring a contour image of the upper half of a human body from a video image by utilizing boundary tracking, then extracting a constant-moment characteristic vector of the contour image, matching the characteristic vector with a characteristic vector of a template image, and roughly detecting a human body contour region by adopting Euclidean distance as similarity measurement; adopting an AdaBoost face detection algorithm based on Haar-Like characteristics to the image area reserved after the coarse detection, and finely detecting the face area;
cutting the finely detected face region to obtain a face image region, inputting the face image region into a trained expression recognition model after rotation correction, and outputting the expression type and the probability corresponding to the student; the expression recognition model is a CNN-LSTM network with a double-channel weighted mixture.
In the expression and emotional state classification module, a voice signal is input into the trained emotional state recognition model, and the process of outputting the emotional state category and the corresponding probability corresponding to the student is as follows:
pre-emphasis processing is carried out on the acquired voice signals by adopting a pre-emphasis filter;
extracting the characteristics of a pre-emphasis processed sound signal, wherein the characteristics of the sound signal comprise a fundamental tone frequency, short-time energy, a short-time zero-crossing rate, a formant and a Mel frequency domain cepstrum coefficient;
inputting the extracted characteristics of the sound signals into a trained emotional state recognition model, and outputting the emotional state category and the probability corresponding to the student; the emotional state recognition model is a DBN (database-based network + softmax) classifier structure.
The upper computer 2 is configured to receive the expression recognition result and the emotion state recognition result, input the expression recognition result and the emotion state recognition result into a student state evaluation model, and output evaluated class learning state categories of the student in a classroom; wherein, the student state evaluation model is as follows:
R_ti = (α·E_ti + β·S_ti)·γ_(t-1)i

wherein i denotes the i-th classroom learning state, i ranges from 1 to M, and M is the total number of classroom learning state categories; R_ti is the probability value of the i-th classroom learning state at time t; E_ti is the probability value of the i-th expression category at time t; S_ti is the probability value of the i-th emotional state category at time t; α and β are known weight coefficients with α + β = 1; γ_(t-1)i is a coefficient determined by the recognition result at time t-1: if the state recognized at time t-1 is the i-th classroom learning state, its value is 1, otherwise it is 0.9; t is a positive integer greater than or equal to 1, and when t = 1, γ_(t-1)i = 1.
The upper computer is further configured to:
and recording the class and the state duration of the classroom learning state of the evaluated student, and further calculating the classroom learning state of the student in a preset time period and the proportion of the classroom learning state of the student in the preset time period.
The facial expression of the student during learning is recognized in real time by processing video images, and at the same time the student's emotional condition during learning is recognized in real time by processing sound signals; the expression recognition result and the emotion recognition result are fused through the specific model, so that the learning state of the student is recognized in real time. This effectively improves the recognition of each student's learning state, realizes online monitoring and evaluation of each student's in-class state, effectively compensates for the shortcomings of current student evaluation methods, and provides a reliable and sufficient basis for targeted cultivation of students and for evaluation of teachers' classroom teaching effectiveness.
Example 3
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps in the on-line assessment method for the classroom learning state of a trainee as described in embodiment 1.
The facial expression of the student during learning is recognized in real time by processing video images, and at the same time the student's emotional condition during learning is recognized in real time by processing sound signals; the expression recognition result and the emotion recognition result are fused through the specific model, so that the learning state of the student is recognized in real time. This effectively improves the recognition of each student's learning state, realizes online monitoring and evaluation of each student's in-class state, effectively compensates for the shortcomings of current student evaluation methods, and provides a reliable and sufficient basis for targeted cultivation of students and for evaluation of teachers' classroom teaching effectiveness.
Example 4
This embodiment provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the online evaluation method for classroom learning status of trainees in embodiment 1.
The facial expression of the student during learning is recognized in real time by processing video images, and at the same time the student's emotional condition during learning is recognized in real time by processing sound signals; the expression recognition result and the emotion recognition result are fused through the specific model, so that the learning state of the student is recognized in real time. This effectively improves the recognition of each student's learning state, realizes online monitoring and evaluation of each student's in-class state, effectively compensates for the shortcomings of current student evaluation methods, and provides a reliable and sufficient basis for targeted cultivation of students and for evaluation of teachers' classroom teaching effectiveness.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (10)

1. An online assessment method for classroom learning state of a student is characterized by comprising the following steps:
synchronously collecting video images and sound signals of a student;
inputting the video image and the sound signal into the trained expression recognition model and the trained emotional state recognition model, respectively, and outputting the student's expression category and emotional state category together with the corresponding probabilities; the expression categories, the emotional state categories and the classroom learning state categories are identical;
inputting the expression recognition result and the emotion state recognition result into a student state evaluation model, and outputting the evaluated class learning state category of the student in the classroom; wherein, the student state evaluation model is as follows:
R_ti = (α·E_ti + β·S_ti)·γ_(t-1)i

wherein i denotes the i-th classroom learning state, i ranges from 1 to M, and M is the total number of classroom learning state categories; R_ti is the probability value of the i-th classroom learning state at time t; E_ti is the probability value of the i-th expression category at time t; S_ti is the probability value of the i-th emotional state category at time t; α and β are known weight coefficients with α + β = 1; γ_(t-1)i is a coefficient determined by the recognition result at time t-1: if the state recognized at time t-1 is the i-th classroom learning state, its value is 1, otherwise it is 0.9; t is a positive integer greater than or equal to 1, and when t = 1, γ_(t-1)i = 1.
2. The on-line assessment method for classroom learning status of trainees as claimed in claim 1, wherein the video image and the sound signal of the trainees are synchronously collected from the video image collecting area; the video image acquisition area is set according to the position of a student; each video image acquisition area corresponds to a trainee.
3. The on-line assessment method for classroom learning status of a student as claimed in claim 1, wherein said method further comprises:
and recording the class and the state duration of the classroom learning state of the evaluated student, and further calculating the classroom learning state of the student in a preset time period and the proportion of the classroom learning state of the student in the preset time period.
4. The on-line assessment method for classroom learning status of trainees as claimed in claim 1, wherein the process of inputting the video images into the trained expression recognition model and outputting the corresponding expression classes and corresponding probabilities of the trainees comprises:
firstly, acquiring a contour image of the upper half of a human body from a video image by utilizing boundary tracking, then extracting a constant-moment characteristic vector of the contour image, matching the characteristic vector with a characteristic vector of a template image, and roughly detecting a human body contour region by adopting Euclidean distance as similarity measurement; adopting an AdaBoost face detection algorithm based on Haar-Like characteristics to the image area reserved after the coarse detection, and finely detecting the face area;
cutting the finely detected face region to obtain a face image region, inputting the face image region into a trained expression recognition model after rotation correction, and outputting the expression type and the probability corresponding to the student; the expression recognition model is a CNN-LSTM network with a double-channel weighted mixture.
5. The on-line assessment method for classroom learning status of trainees as claimed in claim 1, wherein the process of inputting the voice signal into the trained emotional state recognition model and outputting the corresponding emotional state type and probability of the trainees comprises:
pre-emphasis processing is carried out on the acquired voice signals by adopting a pre-emphasis filter;
extracting the characteristics of a pre-emphasis processed sound signal, wherein the characteristics of the sound signal comprise a fundamental tone frequency, short-time energy, a short-time zero-crossing rate, a formant and a Mel frequency domain cepstrum coefficient;
inputting the extracted characteristics of the sound signals into a trained emotional state recognition model, and outputting the emotional state category and the probability corresponding to the student; the emotional state recognition model is a DBN (deep belief network) + softmax classifier structure.
6. An online assessment system for classroom learning states of trainees, comprising:
a state detector configured to:
synchronously collecting video images and sound signals of a student;
inputting the video image and the sound signal into the trained expression recognition model and the trained emotional state recognition model, respectively, and outputting the student's expression category and emotional state category together with the corresponding probabilities; the expression categories, the emotional state categories and the classroom learning state categories are identical;
the upper computer is configured to receive the expression recognition result and the emotional state recognition result, input them into the student state evaluation model and output the evaluated classroom learning state category of the student; wherein the student state evaluation model is as follows:
R_ti = (α·E_ti + β·S_ti)·γ_(t-1)i

wherein i denotes the i-th classroom learning state, i ranges from 1 to M, and M is the total number of classroom learning state categories; R_ti is the probability value of the i-th classroom learning state at time t; E_ti is the probability value of the i-th expression category at time t; S_ti is the probability value of the i-th emotional state category at time t; α and β are known weight coefficients with α + β = 1; γ_(t-1)i is a coefficient determined by the recognition result at time t-1: if the state recognized at time t-1 is the i-th classroom learning state, its value is 1, otherwise it is 0.9; t is a positive integer greater than or equal to 1, and when t = 1, γ_(t-1)i = 1.
7. The system of claim 6, wherein the video image and the audio signal of the student are synchronously collected from the video image collecting area; the video image acquisition area is set according to the position of a student; each video image acquisition area corresponds to a trainee.
8. The student classroom learning state online evaluation system of claim 6, wherein the host computer is further configured to:
recording the class and the state duration of the classroom learning state of the evaluated student, and further calculating the classroom learning state of the student within a preset time period and the proportion of the classroom learning state of the student within the preset time period;
or in the expression and emotion state classification module, inputting the video image into the trained expression recognition model, and outputting the expression category and the corresponding probability corresponding to the student as follows:
firstly, acquiring a contour image of the upper half of a human body from a video image by utilizing boundary tracking, then extracting a constant-moment characteristic vector of the contour image, matching the characteristic vector with a characteristic vector of a template image, and roughly detecting a human body contour region by adopting Euclidean distance as similarity measurement; adopting an AdaBoost face detection algorithm based on Haar-Like characteristics to the image area reserved after the coarse detection, and finely detecting the face area;
cutting the finely detected face region to obtain a face image region, inputting the face image region into a trained expression recognition model after rotation correction, and outputting the expression type and the probability corresponding to the student; the expression recognition model is a CNN-LSTM network with a double-channel weighted mixture;
or in the expression and emotional state classification module, inputting the voice signal into the trained emotional state recognition model, and outputting the emotional state category and the corresponding probability corresponding to the student as follows:
pre-emphasis processing is carried out on the acquired voice signals by adopting a pre-emphasis filter;
extracting the characteristics of a pre-emphasis processed sound signal, wherein the characteristics of the sound signal comprise a fundamental tone frequency, short-time energy, a short-time zero-crossing rate, a formant and a Mel frequency domain cepstrum coefficient;
inputting the extracted characteristics of the sound signals into a trained emotional state recognition model, and outputting the emotional state category and the probability corresponding to the student; the emotional state recognition model is a DBN (deep belief network) + softmax classifier structure.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps in the on-line assessment method for the classroom learning state of a trainee as claimed in any one of claims 1 to 5.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the on-line assessment method for classroom learning status of a student as claimed in any one of claims 1-5 when executing the program.
CN201911047730.6A 2019-10-30 2019-10-30 Student classroom learning state online evaluation method and system Pending CN110807585A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911047730.6A CN110807585A (en) 2019-10-30 2019-10-30 Student classroom learning state online evaluation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911047730.6A CN110807585A (en) 2019-10-30 2019-10-30 Student classroom learning state online evaluation method and system

Publications (1)

Publication Number Publication Date
CN110807585A true CN110807585A (en) 2020-02-18

Family

ID=69489723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911047730.6A Pending CN110807585A (en) 2019-10-30 2019-10-30 Student classroom learning state online evaluation method and system

Country Status (1)

Country Link
CN (1) CN110807585A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831411A (en) * 2012-09-07 2012-12-19 云南晟邺科技有限公司 Quick face detection method
CN106878677A (en) * 2017-01-23 2017-06-20 西安电子科技大学 Student classroom Grasping level assessment system and method based on multisensor
CN107785061A (en) * 2017-10-10 2018-03-09 东南大学 Autism-spectrum disorder with children mood ability interfering system
CN108491835A (en) * 2018-06-12 2018-09-04 常州大学 Binary channels convolutional neural networks towards human facial expression recognition
US20190311188A1 (en) * 2018-12-05 2019-10-10 Sichuan University Face emotion recognition method based on dual-stream convolutional neural network
CN109493886A (en) * 2018-12-13 2019-03-19 西安电子科技大学 Speech-emotion recognition method based on feature selecting and optimization
CN110268444A (en) * 2019-02-26 2019-09-20 武汉资联虹康科技股份有限公司 A kind of number of people posture tracing system for transcranial magnetic stimulation diagnosis and treatment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Meng Xin: "Research on Emotion Recognition Based on SVM and DBN", China Master's Theses Full-text Database, Information Science and Technology Series *
Huang Yiwei: "Research and Implementation of Facial Expression Recognition Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507227A (en) * 2020-04-10 2020-08-07 南京汉韬科技有限公司 Multi-student individual segmentation and state autonomous identification method based on deep learning
CN111507227B (en) * 2020-04-10 2023-04-18 南京汉韬科技有限公司 Multi-student individual segmentation and state autonomous identification method based on deep learning
CN111683289A (en) * 2020-08-17 2020-09-18 江苏清微智能科技有限公司 System and method for acquiring online duration data
CN112185191A (en) * 2020-09-21 2021-01-05 信阳职业技术学院 Intelligent digital teaching model
CN112418068A (en) * 2020-11-19 2021-02-26 中国平安人寿保险股份有限公司 On-line training effect evaluation method, device and equipment based on emotion recognition
CN113221784A (en) * 2021-05-20 2021-08-06 杭州麦淘淘科技有限公司 Multi-mode-based student learning state analysis method and device
CN116757524A (en) * 2023-05-08 2023-09-15 广东保伦电子股份有限公司 Teacher teaching quality evaluation method and device
CN116757524B (en) * 2023-05-08 2024-02-06 广东保伦电子股份有限公司 Teacher teaching quality evaluation method and device
CN116797090A (en) * 2023-06-26 2023-09-22 国信蓝桥教育科技股份有限公司 Online assessment method and system for classroom learning state of student
CN116797090B (en) * 2023-06-26 2024-03-26 国信蓝桥教育科技股份有限公司 Online assessment method and system for classroom learning state of student

Similar Documents

Publication Publication Date Title
CN110807585A (en) Student classroom learning state online evaluation method and system
CN105023573B (en) It is detected using speech syllable/vowel/phone boundary of auditory attention clue
CN106599881A (en) Student state determination method, device and system
US11138989B2 (en) Sound quality prediction and interface to facilitate high-quality voice recordings
Jaumard-Hakoun et al. An articulatory-based singing voice synthesis using tongue and lips imaging
CN101199207A (en) Method, system, and program product for measuring audio video synchronization independent of speaker characteristics
CN110544481B (en) S-T classification method and device based on voiceprint recognition and equipment terminal
Muckenhirn et al. Understanding and Visualizing Raw Waveform-Based CNNs.
Le Cornu et al. Reconstructing intelligible audio speech from visual speech features.
CN110827793A (en) Language identification method
Sefara The effects of normalisation methods on speech emotion recognition
CN109584888A (en) Whistle recognition methods based on machine learning
CN113920534A (en) Method, system and storage medium for extracting video highlight
Murugaiya et al. Probability enhanced entropy (PEE) novel feature for improved bird sound classification
Chowdhury et al. Extracting sub-glottal and supra-glottal features from MFCC using convolutional neural networks for speaker identification in degraded audio signals
CN114582355A (en) Audio and video fusion-based infant crying detection method and device
CN111932056A (en) Customer service quality scoring method and device, computer equipment and storage medium
CN110956142A (en) Intelligent interactive training system
CN114302301B (en) Frequency response correction method and related product
Milner et al. Reconstructing intelligible audio speech from visual speech features
CN111091816B (en) Data processing system and method based on voice evaluation
CN111210845B (en) Pathological voice detection device based on improved autocorrelation characteristics
CN116935889B (en) Audio category determining method and device, electronic equipment and storage medium
Pathonsuwan et al. RS-MSConvNet: A Novel End-to-End Pathological Voice Detection Model
Francisco Carlos et al. An analysis of visual speech features for recognition of non-articulatory sounds using machine learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200218