CN110807585A - Student classroom learning state online evaluation method and system - Google Patents

Student classroom learning state online evaluation method and system

Info

Publication number
CN110807585A
CN110807585A
Authority
CN
China
Prior art keywords
student
state
expression
class
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911047730.6A
Other languages
Chinese (zh)
Inventor
李霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Institute of Commerce and Technology
Original Assignee
Shandong Institute of Commerce and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Institute of Commerce and Technology filed Critical Shandong Institute of Commerce and Technology
Priority to CN201911047730.6A priority Critical patent/CN110807585A/en
Publication of CN110807585A publication Critical patent/CN110807585A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395Quality analysis or management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • G06Q50/205Education administration or guidance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition

Abstract

The disclosure provides a method and a system for online evaluation of a student's classroom learning state. The online evaluation method comprises the following steps: synchronously collecting video images and sound signals of a student; inputting the video images and the sound signals into a trained expression recognition model and a trained emotional state recognition model, respectively, and outputting the student's expression category and emotional state category together with the corresponding probabilities, wherein the expression categories, the emotional state categories and the classroom learning state categories are identical; and inputting the expression recognition result and the emotional state recognition result into a student state evaluation model, which outputs the evaluated classroom learning state of the student.

Description

Student classroom learning state online evaluation method and system
Technical Field
The disclosure belongs to the field of classroom learning state evaluation of students, and particularly relates to an online classroom learning state evaluation method and system for students.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Correct evaluation of classroom teaching effectiveness is an important means of promoting student growth, teachers' professional development and classroom teaching quality, and the classroom performance of students, as the main participants in class, is an important component of that evaluation. At present, however, because classes contain many students, the classroom performance of each student is difficult to evaluate individually, and the evaluation results are general rather than specific. This is not conducive to quantitative assessment of classroom effectiveness, nor to targeted cultivation of each student. The present invention enables online monitoring and evaluation of each student's in-class state, effectively compensates for the shortcomings of current student evaluation methods, and provides a basis for evaluating teachers' teaching effectiveness. The online evaluation system for students' classroom learning states collects each student's facial images and voice signals in real time, and comprehensively infers the student's current learning state by analyzing facial expression, speech rate and tone.
The inventor has found that, for online evaluation of students' classroom learning states, the core problem is how to evaluate each student's learning state accurately and in real time. Specifically, the following sub-problems can be distinguished: (1) real-time acquisition of each student's facial expression and sound signal: the existing approach of installing cameras at the four corners of a classroom can only monitor the overall classroom dynamics macroscopically, and it is difficult to capture each student's facial expression and sound signal in real time; (2) real-time recognition of facial expressions, which involves the real-time performance and accuracy of locating and processing the student's facial image and of recognizing the expression; (3) combining the facial expression recognition result with the speech emotion recognition result to recognize the learning state: relying on the facial expression or the speech recognition result alone makes it difficult to judge the student's current mental state accurately and comprehensively, so the two recognition results are combined, and the student's mental state at the previous moment is also taken into account to improve recognition accuracy.
Disclosure of Invention
In order to solve the above problems, the present disclosure provides a method and system for online evaluation of a student's classroom learning state, which recognize the student's facial expression in real time by processing video images, recognize the student's emotional condition in real time by processing sound signals, and fuse the expression recognition result and the emotion recognition result through a specific model, thereby recognizing the student's learning state in real time.
In order to achieve the purpose, the following technical scheme is adopted in the disclosure:
a first aspect of the present disclosure provides an online assessment method for a class learning state of a student, which includes:
synchronously collecting video images and sound signals of a student;
inputting the video image and the sound signal into the trained expression recognition model and the trained emotional state recognition model, respectively, and outputting the student's expression category and emotional state category together with the corresponding probabilities; the expression categories, the emotional state categories and the classroom learning state categories are identical;
inputting the expression recognition result and the emotion state recognition result into a student state evaluation model, and outputting the evaluated class learning state category of the student in the classroom; wherein, the student state evaluation model is as follows:
R_ti = (α·E_ti + β·S_ti)·γ_(t-1)i

wherein i denotes the i-th classroom learning state, i ranges from 1 to M, and M is the total number of classroom learning state categories; R_ti is the probability value of the i-th classroom learning state at time t; E_ti is the probability value of the i-th expression category at time t; S_ti is the probability value of the i-th emotional state category at time t; α and β are known weight coefficients with α + β = 1; γ_(t-1)i is a coefficient determined by the recognition result at time t-1: if the state recognized at time t-1 is the i-th classroom learning state, its value is 1, otherwise it is 0.9; t is a positive integer greater than or equal to 1, and when t = 1, γ_(t-1)i = 1.
A second aspect of the present disclosure provides an online evaluation system for classroom learning states of trainees, comprising:
a state detector configured to:
synchronously collecting video images and sound signals of a student;
inputting the video image and the sound signal into the trained expression recognition model and the trained emotional state recognition model, respectively, and outputting the student's expression category and emotional state category together with the corresponding probabilities; the expression categories, the emotional state categories and the classroom learning state categories are identical;
the upper computer is configured to receive the expression recognition result and the emotional state recognition result, input them into the student state evaluation model, and output the evaluated classroom learning state category of the student; wherein the student state evaluation model is as follows:
R_ti = (α·E_ti + β·S_ti)·γ_(t-1)i

wherein i denotes the i-th classroom learning state, i ranges from 1 to M, and M is the total number of classroom learning state categories; R_ti is the probability value of the i-th classroom learning state at time t; E_ti is the probability value of the i-th expression category at time t; S_ti is the probability value of the i-th emotional state category at time t; α and β are known weight coefficients with α + β = 1; γ_(t-1)i is a coefficient determined by the recognition result at time t-1: if the state recognized at time t-1 is the i-th classroom learning state, its value is 1, otherwise it is 0.9; t is a positive integer greater than or equal to 1, and when t = 1, γ_(t-1)i = 1.
A third aspect of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps in the on-line assessment method for the classroom learning state of a student as described above.
A fourth aspect of the present disclosure provides a computer device, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the online evaluation method for classroom learning state of trainees as described above.
The beneficial effects of this disclosure are:
the method and the device have the advantages that the facial expressions of the students during learning are recognized in real time through processing of video images, meanwhile, the emotion conditions of the students during learning are recognized in real time through processing of sound signals, the expression recognition results and the emotion recognition results are fused through the specific model, the learning states of the students are recognized in real time, the recognition effect of the learning states of each student is effectively improved, the on-line monitoring and evaluation of the class state of each student are realized, the defects of the evaluation mode of the current students are effectively overcome, and reliable and full bases are provided for the targeted cultivation of the students and the evaluation of the teaching effects of teachers.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
Fig. 1 is a flowchart of an online evaluation method for classroom learning states of trainees according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of an online evaluation system for classroom learning states of trainees according to an embodiment of the present disclosure;
fig. 3(a) is a front view of a state detector provided by an embodiment of the present disclosure;
fig. 3(b) is a side view of a state detector provided in an embodiment of the disclosure;
FIG. 4 is a schematic structural diagram of a state detector provided in an embodiment of the present disclosure;
fig. 5 is a face detection feature provided in an embodiment of the present disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example 1
As shown in fig. 1, the method for online evaluating the classroom learning state of a student of the present embodiment includes:
step S101: synchronously collecting video images and sound signals of the trainees.
In the specific implementation, the video image and the sound signal of the student are synchronously collected from the video image collecting area; the video image acquisition area is set according to the position of a student; each video image acquisition area corresponds to a trainee.
Step S102: correspondingly inputting the video image and the sound signal into the trained expression recognition model and emotion state recognition model respectively, and outputting the expression type and emotion state type corresponding to the student and the corresponding probability; the categories of the expression, the emotion state and the class learning state are the same.
In this embodiment, the categories into which the expression, the emotional state, and the classroom learning state are classified include: happy, calm, tired, confused, excited and depressed.
Specifically, the process of inputting the video image into the trained expression recognition model and outputting the expression category and the corresponding probability corresponding to the student comprises the following steps:
(1.1) face detection
The AdaBoost-based face detection algorithm is a method grounded in statistical learning theory; compared with other detection algorithms it offers higher accuracy and faster detection, but it still struggles to meet the real-time requirement of this system. The AdaBoost algorithm is generally combined with Haar-like features to realize face detection; because a large number of windows must be examined, this detection method loses some efficiency. Therefore, if the number of detection windows is reduced, the real-time performance of detection can be improved.
The embodiment adopts a mode of combining coarse detection and fine detection to improve the real-time performance of detection.
Coarse detection process: first, the contour image of the upper half of the human body is obtained by boundary tracking; then the invariant-moment feature vector of the contour image is extracted and matched against the feature vector of a template image, using the Euclidean distance as the similarity measure; if the Euclidean distance is smaller than a set threshold, the processed region is regarded as a human-body contour region. That region is retained and the other regions are discarded.
Fine detection process: the image region retained after coarse detection is processed to finally realize face detection. Fine detection uses the AdaBoost face detection algorithm based on Haar-like features. Features like those shown in Fig. 5 are added to the original Haar-like feature set to improve the success rate of face detection. The Haar-like features are computed with the integral-image method; weak classifiers are trained on these features, the trained weak classifiers are selected as optimal weak classifiers and combined by weighted superposition into strong classifiers, and the trained strong classifiers are cascaded into a cascade classifier. Detection takes the form of a binary decision tree, and the sub-window images screened by the cascade classifier are taken as face regions.
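A rough sketch of the coarse-plus-fine pipeline follows; it uses OpenCV's stock Hu-moment contour matching and a pre-trained Haar cascade as stand-ins for the classifiers trained in this embodiment, so the template contour, threshold and cascade file are illustrative assumptions rather than the disclosed models:

```python
import cv2

def coarse_detect(frame_gray, template_contour, dist_threshold=0.3):
    """Coarse step: keep regions whose contour moments resemble an upper-body template."""
    _, binary = cv2.threshold(frame_gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # [-2] keeps this compatible with both OpenCV 3 and 4 return signatures
    contours = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    kept = []
    for c in contours:
        # matchShapes compares Hu-moment invariants; a small value means a similar shape
        d = cv2.matchShapes(c, template_contour, cv2.CONTOURS_MATCH_I1, 0.0)
        if d < dist_threshold:
            kept.append(cv2.boundingRect(c))
    return kept

def fine_detect(frame_gray, rois, cascade_file="haarcascade_frontalface_default.xml"):
    """Fine step: run a Haar cascade only inside the regions retained by coarse detection."""
    cascade = cv2.CascadeClassifier(cv2.data.haarcascades + cascade_file)
    faces = []
    for (x, y, w, h) in rois:
        sub = frame_gray[y:y + h, x:x + w]
        for (fx, fy, fw, fh) in cascade.detectMultiScale(sub, 1.1, 5):
            faces.append((x + fx, y + fy, fw, fh))
    return faces
```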
(1.2) face image cropping
The detected face region is cropped from the image to obtain the face image region.
(1.3) rotation correction
Since the target image may have different degrees of angular deviation in each frame, this may affect the effect of expression recognition. For this reason, the obtained face image region needs to be subjected to rotation correction. The rotation correction of the face image is realized using equation 1 to obtain a standard face image.
x'_i = x_i·cosθ + y_j·sinθ,  y'_j = −x_i·sinθ + y_j·cosθ    (1)

wherein (x_i, y_j) are the original coordinates of a pixel in the face image and (x'_i, y'_j) are its coordinates after the rotation transform; the pixel value at coordinate point (i, j) is carried over to the rotated position. The rotation angle θ is the angle between the line connecting the two eyes and the horizontal axis, so the rotated face image is closer to a standard frontal pose.
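As an illustrative sketch, the correction of equation 1 can be applied with OpenCV; the eye coordinates are assumed to come from a separate landmark detector, which the disclosure does not specify:

```python
import cv2
import numpy as np

def rotate_to_standard(face_img, left_eye, right_eye):
    """Rotate the face image so the line between the eyes becomes horizontal (Eq. 1)."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    theta = np.degrees(np.arctan2(ry - ly, rx - lx))   # angle of the eye line vs. the horizontal axis
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)
    M = cv2.getRotationMatrix2D(center, theta, 1.0)    # rotation about the midpoint between the eyes
    h, w = face_img.shape[:2]
    return cv2.warpAffine(face_img, M, (w, h))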
(1.4) expression recognition
To realize expression recognition, the invention uses a two-channel weighted hybrid CNN-LSTM network, in which each channel consists of a partial VGG16 network connected in series with an LSTM network. The two channels take as input the face image and the local binary pattern (LBP) image computed from the face image, respectively. Each image is fed into its partial VGG16 network to extract feature vectors, the feature-vector sequence is sent to the LSTM network for training, and after the LSTM networks are trained, the output vectors of the two channels are weighted and fused to obtain the final recognition result.
The weighted fusion and recognition process is as follows. Let the output vectors of the two channels be F_1 and F_2. F_1 and F_2 each pass through three fully connected layers Fc_1 = {v_1, v_2, ..., v_800}, Fc_2 = {s_1, s_2, ..., s_400} and Fc_3 = {c_1, c_2, ..., c_6}; in this embodiment the numbers of neurons in these three fully connected layers are 800, 400 and 6 respectively, yielding the feature vectors M_1 and M_2. Averaging M_1 and M_2 gives O_1 = {o_11, o_12, ..., o_16} and O_2 = {o_21, o_22, ..., o_26}, and the two vectors are weighted and fused according to equation 2 to obtain the output R = {r_1, r_2, ..., r_6}. For the output R, the probability of each expression category is obtained with the softmax function (equation 3).

r_i = k·o_1i + (1-k)·o_2i   (i = 1, 2, ..., 6; k ∈ [0, 1])    (2)

where k is the fusion weight, obtained by 10-fold cross validation.

y'_i = exp(r_i) / Σ_{j=1}^{6} exp(r_j)    (3)

wherein y'_i is the probability of the i-th expression category.
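A minimal numerical sketch of equations 2 and 3; the channel outputs and the weight k below are placeholder values, not trained results:

```python
import numpy as np

def fuse_and_classify(o1, o2, k=0.5):
    """Weighted fusion (Eq. 2) followed by softmax (Eq. 3) over the six expression categories."""
    o1, o2 = np.asarray(o1, dtype=float), np.asarray(o2, dtype=float)
    r = k * o1 + (1 - k) * o2              # Eq. 2
    e = np.exp(r - r.max())                # numerically stabilized softmax
    return e / e.sum()                     # Eq. 3: probability of each expression category

# example: averaged outputs of the two channels for {happy, calm, tired, confused, excited, depressed}
probs = fuse_and_classify([2.1, 0.3, -0.5, 0.1, 1.2, -1.0],
                          [1.8, 0.2, -0.2, 0.4, 0.9, -0.8], k=0.6)
```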
The present embodiment replaces conventional convolutions with separable convolutions and prunes the network appropriately to cut down the amount of computation.
Specifically, the process of inputting the voice signal into the trained emotional state recognition model and outputting the emotional state category and the corresponding probability corresponding to the trainee is as follows:
pre-emphasis processing is carried out on the acquired voice signals by adopting a pre-emphasis filter;
a finite impulse response filter is used, as shown in equation 4.
H(z) = 1 − α·z⁻¹    (4)

wherein α is the pre-emphasis coefficient, whose value in this disclosure is 0.938.
The speech signal filtered by the pre-emphasis filter is framed into 25 ms frames with a 10 ms overlap between frames. Each frame is multiplied by a Hamming window function to reduce discontinuities in the speech signal and to avoid spectral leakage. A double-threshold method based on short-time energy and the short-time average zero-crossing rate is used for endpoint detection, i.e. detecting the starting point and ending point of the speech.
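As an illustrative sketch only (the 16 kHz sampling rate is an assumed value, not stated in the disclosure), pre-emphasis according to equation 4 and Hamming-window framing could look as follows:

```python
import numpy as np

def preemphasize_and_frame(signal, sr=16000, alpha=0.938, frame_ms=25, overlap_ms=10):
    """Apply H(z) = 1 - alpha*z^-1, then split into 25 ms Hamming-windowed frames with 10 ms overlap."""
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    frame_len = int(sr * frame_ms / 1000)
    hop = frame_len - int(sr * overlap_ms / 1000)   # 10 ms overlap -> 15 ms hop
    window = np.hamming(frame_len)
    frames, start = [], 0
    while start + frame_len <= len(emphasized):
        frames.append(emphasized[start:start + frame_len] * window)
        start += hop
    return np.array(frames)
```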
Extracting the characteristics of a pre-emphasis processed sound signal, wherein the characteristics of the sound signal comprise a fundamental tone frequency, short-time energy, a short-time zero-crossing rate, a formant and a Mel frequency domain cepstrum coefficient;
The pitch frequency, called the fundamental frequency for short, reflects the vibration pattern of the vocal cords during phonation and is one of the most important features in speech-signal analysis. In order to avoid interference from formants, the cepstrum method is chosen to extract the pitch signal. The cepstrum of a signal x(n) is defined as the inverse discrete Fourier transform of the logarithm of its spectrum, as shown in equation 5:

c(n) = IDFT( log| DFT( x(n) ) | )    (5)
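An illustrative cepstrum-based pitch estimate for one windowed frame; the 50 to 400 Hz search range and 16 kHz sampling rate are assumptions, not values from the disclosure:

```python
import numpy as np

def pitch_cepstrum(frame, sr=16000, fmin=50, fmax=400):
    """Estimate the fundamental frequency of one windowed frame via the real cepstrum (Eq. 5)."""
    spectrum = np.fft.rfft(frame)
    cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-10))
    # the pitch period corresponds to the largest cepstral peak in the allowed quefrency range
    qmin, qmax = int(sr / fmax), int(sr / fmin)
    peak = qmin + np.argmax(cepstrum[qmin:qmax])
    return sr / peak
```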
Short-time energy reflects the amplitude characteristics of the speech signal and can be used to distinguish speech from noise. Moreover, the intensity of a speaker's voice varies markedly under different emotional conditions; for example, the energy of the speech signal for happy and angry emotions is much higher than that for sadness. The short-time energy is defined in equation 6:

E_n = Σ_{m=1}^{N} x_n(m)²    (6)

wherein x_n(m) is the m-th data point in the n-th frame and N is the total number of data points in the frame.
The short-time zero-crossing rate is the number of times the speech waveform crosses the horizontal axis in each frame and can be used for endpoint detection and silence removal. The short-time zero-crossing rate is given by equation 7:

Z_n = (1/2) Σ_{m=1}^{N-1} | sgn(x_n(m+1)) − sgn(x_n(m)) |    (7)
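A small sketch of equations 6 and 7 applied to the framed signal above; this is illustrative only:

```python
import numpy as np

def short_time_energy(frames):
    """Eq. 6: sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def short_time_zcr(frames):
    """Eq. 7: half the accumulated sign changes in each frame."""
    signs = np.sign(frames)
    return 0.5 * np.sum(np.abs(np.diff(signs, axis=1)), axis=1)
```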
Formants are regions of the sound spectrum where energy is relatively concentrated; they are not only determinants of sound quality but also reflect the physical characteristics of the vocal tract. Formants are usually extracted by linear prediction, and the prediction error can be expressed by equation 8:

e(m) = x_n(m) − Σ_{i=1}^{p} a_i·x_n(m−i)    (8)

wherein x_n(m) is the speech signal, a_i are the prediction coefficients, n is the frame index, p is the prediction order and e(m) is the prediction error. Future samples are predicted from the previous p samples, and the goal is to adjust a_i so that the prediction error is minimized.
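One illustrative way to realize the linear-prediction analysis of equation 8 is with librosa's LPC routine, taking formant candidates from the angles of the complex roots of the prediction polynomial; the order p = 12 and the 90 Hz lower cutoff are assumptions of this sketch:

```python
import numpy as np
import librosa

def formants_lpc(frame, sr=16000, order=12):
    """Estimate formant frequencies from the LPC polynomial of one frame (Eq. 8)."""
    a = librosa.lpc(frame.astype(float), order=order)    # prediction coefficients a_i
    roots = [r for r in np.roots(a) if np.imag(r) > 0]   # keep one root per conjugate pair
    freqs = sorted(np.angle(roots) * sr / (2 * np.pi))   # convert pole angles to Hz
    return [f for f in freqs if f > 90]                  # discard very low spurious candidates
```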
Mel-frequency cepstral coefficients (MFCCs) are among the most widely used features in speech analysis; the parameters are extracted on the basis of the human auditory system combined with the speech production mechanism, so they provide a natural and realistic reference for speech recognition. The MFCC parameters are obtained as follows: after the speech signal is preprocessed, a fast Fourier transform is applied to each frame to obtain its energy spectrum; the spectrum is filtered by a bank of 24 Mel filters; the logarithm of all filter outputs gives the corresponding log power spectrum; an inverse discrete cosine transform then yields 14 static MFCCs; finally, first-order and second-order differences of the static MFCCs give the corresponding dynamic features.
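For illustration, the static-plus-dynamic MFCC features described above can be approximated with librosa; the exact filter-bank implementation used in the disclosure may differ, and the 16 kHz sampling rate is assumed:

```python
import numpy as np
import librosa

def mfcc_features(y, sr=16000):
    """14 static MFCCs from 24 Mel filters, plus first- and second-order differences."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=14, n_mels=24,
                                n_fft=int(0.025 * sr), hop_length=int(0.015 * sr))
    delta1 = librosa.feature.delta(mfcc, order=1)
    delta2 = librosa.feature.delta(mfcc, order=2)
    return np.vstack([mfcc, delta1, delta2])   # shape: (42, n_frames)
```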
Inputting the extracted characteristics of the sound signals into a trained emotional state recognition model, and outputting the emotional state category and the probability corresponding to the student; the emotional state recognition model is a DBN (deep belief network) + softmax classifier structure.
In this embodiment, speech emotion recognition is realized by training a DBN; the DBN is then modified by combining it with an SVM to further improve the speech emotion recognition rate. First, the RBM parameters of each layer are trained layer by layer from bottom to top through unsupervised learning so that the DBN parameters reach a global optimum; then a BP network is used for supervised training, i.e. the output of the DBN is compared with the training data through the BP neural network, and the resulting error is propagated back from top to bottom to correct the network parameters toward their optimal values. After the DBN training is finished, the softmax classifier at the top layer of the network is replaced with an SVM to improve classification accuracy. In the new speech-emotion classifier, the DBN part is used to extract features, the SVM part is trained with the obtained features and the corresponding SVM parameters are obtained, completing the training of the whole network.
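A highly simplified sketch of the DBN-as-feature-extractor plus SVM idea, using scikit-learn's BernoulliRBM layers as a stand-in for the stacked RBMs of a full DBN; the layer sizes and hyperparameters are assumptions, and this is not the disclosed training procedure:

```python
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

def build_emotion_classifier():
    """Unsupervised RBM layers learn features; an SVM replaces the softmax layer for classification."""
    return Pipeline([
        ("scale", MinMaxScaler()),                       # RBMs expect inputs in [0, 1]
        ("rbm1", BernoulliRBM(n_components=256, learning_rate=0.05, n_iter=20)),
        ("rbm2", BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=20)),
        ("svm", SVC(kernel="rbf", probability=True)),    # probability=True yields per-class probabilities
    ])

# usage sketch:
# clf = build_emotion_classifier(); clf.fit(X_train, y_train); probs = clf.predict_proba(X_test)
```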
Step S103: inputting the expression recognition result and the emotion state recognition result into a student state evaluation model, and outputting the evaluated class learning state category of the student in the classroom; wherein, the student state evaluation model is as follows:
R_ti = (α·E_ti + β·S_ti)·γ_(t-1)i    (9)

wherein i denotes the i-th classroom learning state, i ranges from 1 to M, and M is the total number of classroom learning state categories; R_ti is the probability value of the i-th classroom learning state at time t; E_ti is the probability value of the i-th expression category at time t; S_ti is the probability value of the i-th emotional state category at time t; α and β are known weight coefficients with α + β = 1; γ_(t-1)i is a coefficient determined by the recognition result at time t-1: if the state recognized at time t-1 is the i-th classroom learning state, its value is 1, otherwise it is 0.9; t is a positive integer greater than or equal to 1, and when t = 1, γ_(t-1)i = 1.
In this example, α, β are 0.6 and 0.4, respectively.
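For illustration only (not part of the claimed method), the evaluation of equation 9 with these weights can be sketched as follows; the six-element array layout over the state categories is an assumption of this sketch:

```python
import numpy as np

def evaluate_state(E_t, S_t, prev_state=None, alpha=0.6, beta=0.4):
    """Sketch of R_ti = (alpha*E_ti + beta*S_ti) * gamma_(t-1)i (Eq. 9).

    E_t, S_t   : arrays of length M with the expression / emotional-state probabilities
                 for the M classroom learning states at time t.
    prev_state : index of the state recognized at time t-1, or None when t = 1.
    Returns the index of the evaluated classroom learning state and the R values.
    """
    E_t, S_t = np.asarray(E_t, dtype=float), np.asarray(S_t, dtype=float)
    gamma = np.full(E_t.shape, 0.9)
    if prev_state is None:          # t = 1: gamma_(t-1)i = 1 for every i
        gamma[:] = 1.0
    else:                           # the state recognized at t-1 keeps weight 1
        gamma[prev_state] = 1.0
    R_t = (alpha * E_t + beta * S_t) * gamma
    return int(np.argmax(R_t)), R_t
```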
As an embodiment, the online evaluation method for the classroom learning state of the trainee further comprises the following steps:
and recording the class and the state duration of the classroom learning state of the evaluated student, and further calculating the classroom learning state of the student in a preset time period and the proportion of the classroom learning state of the student in the preset time period.
The learning state of the student over a period of time is judged comprehensively using historical data (for a week, a month, a semester or a school year).

The classroom learning states of a student during a fixed time period within one week (or month, semester or school year) and the proportion of each state can be counted, as shown in equation 10; for example, the student's mental states and the proportion of a given state during the 8:00 to 10:00 period of every Tuesday in a month. The results can be displayed as a table, a bar chart or a pie chart.
p_i = ( Σ_{j=1}^{n} t_ij ) / ( n·T )    (10)

wherein p_i is the proportion of the i-th mental state; n is the number of counted time periods within the statistical range (for example, over one month, if the counted period is 8:00 to 10:00 every Tuesday, then n = 4); t_ij is the duration of the i-th state within the j-th time period; and T is the length of one time period.
Alternatively, the mental states of all students during a fixed time period within a week (or month, semester or school year) and the proportion of each mental state can be counted, as shown in equation 11.

A_i = (1/m) · Σ_{k=1}^{m} P_ik    (11)

wherein m is the total number of students; P_ik is the proportion of the i-th mental state for the k-th student; and A_i is the proportion of the i-th mental state over all students. This can serve as a basis for evaluating the teaching effectiveness of a given course.
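An illustrative sketch of the statistics of equations 10 and 11 over logged state records; the record format and field names are assumptions made for the example:

```python
from collections import defaultdict

STATES = ["happy", "calm", "tired", "confused", "excited", "depressed"]

def state_proportions(records, period_seconds, n_periods):
    """Eq. 10: records is a list of (state, duration_seconds) logged within the counted periods."""
    totals = defaultdict(float)
    for state, duration in records:
        totals[state] += duration
    return {s: totals[s] / (n_periods * period_seconds) for s in STATES}

def class_proportions(per_student):
    """Eq. 11: per_student is a list of per-student proportion dicts; returns the class-wide average."""
    m = len(per_student)
    return {s: sum(p[s] for p in per_student) / m for s in STATES}
```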
The facial expression of the student during learning is recognized in real time by processing video images, and at the same time the student's emotional condition during learning is recognized in real time by processing sound signals; the expression recognition result and the emotion recognition result are fused through the specific model, so that the learning state of the student is recognized in real time. This effectively improves the recognition of each student's learning state, realizes online monitoring and evaluation of each student's in-class state, effectively compensates for the shortcomings of current student evaluation methods, and provides a reliable and sufficient basis for targeted cultivation of students and for evaluation of teachers' classroom teaching effectiveness.
Example 2
As shown in fig. 2, an online evaluation system for classroom learning states of trainees comprises:
a state detector 1 configured to:
synchronously collecting video images and sound signals of a student;
inputting the video image and the sound signal into the trained expression recognition model and the trained emotional state recognition model, respectively, and outputting the student's expression category and emotional state category together with the corresponding probabilities; the expression categories, the emotional state categories and the classroom learning state categories are identical.
In the specific implementation, a state detector is arranged right in front of each student, the state detectors are mounted on a desk and used for collecting front video signals and sound signals of the students in class, processing the two signals in real time and transmitting the processing result to an upper computer through WiFi; the upper computer is arranged in the teacher office and used for archiving and counting the learning state data of the students.
As shown in fig. 3(a) and 3(b), the state detector includes a camera 11 for collecting a frontal video image signal that contains the student's face; a sound pickup 12 for collecting the student's voice signal; an LED lamp 13 for auxiliary lighting when classroom light is insufficient, so as to optimize the student's learning environment; and a state detector main case 14. A state detector support 15 is provided at the bottom of the main case 14, and a state detector base 16 is provided at the bottom of the support 15.
The state detectors and the upper computer use an N:1 link: each detector has an independent IP address and communicates over WiFi. The upper computer reads the expression and speech emotion recognition results from the state detectors in turn by polling, and stores and analyzes the results. A direct transmission mode can also be used, in which the video and sound signals of a specified state detector are transmitted directly to the upper computer.
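A minimal sketch of what such polling might look like on the upper-computer side; the IP addresses, port and message format are assumptions made for illustration, since the disclosure does not specify a protocol:

```python
import json
import socket

DETECTORS = ["192.168.1.11", "192.168.1.12"]   # assumed IP addresses of the state detectors
PORT = 9000                                     # assumed port

def poll_detectors():
    """Query each state detector in turn and collect its latest recognition result."""
    results = {}
    for ip in DETECTORS:
        with socket.create_connection((ip, PORT), timeout=2.0) as conn:
            conn.sendall(b"GET_RESULT\n")                 # assumed request message
            payload = conn.recv(4096)
            results[ip] = json.loads(payload.decode())    # e.g. {"E": [...], "S": [...]}
    return results
```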
As shown in fig. 4, a schematic structural diagram of the state detector, a video image capture card 21 and a sound signal capture card 22 are arranged in the main case of the state detector and perform the acquisition of the video and sound signals. The captured signals are sent to a video image processing chip 23 and a sound processing chip 24, respectively; after the signals are processed, the results are sent to the upper computer through a communication circuit 25. The communication circuit 25 can also forward instructions from the upper computer to the video image processing chip 23 or the sound processing chip 24, for example to enable direct transmission of the video image and sound signals.
Video images and sound signals of the student are synchronously collected from a video image acquisition area; the video image acquisition area is set according to the student's position; each video image acquisition area corresponds to one student.
In the expression and emotion state classification module, inputting a video image into a trained expression recognition model, and outputting the expression category and the corresponding probability corresponding to the student as follows:
firstly, acquiring a contour image of the upper half of a human body from a video image by utilizing boundary tracking, then extracting a constant-moment characteristic vector of the contour image, matching the characteristic vector with a characteristic vector of a template image, and roughly detecting a human body contour region by adopting Euclidean distance as similarity measurement; adopting an AdaBoost face detection algorithm based on Haar-Like characteristics to the image area reserved after the coarse detection, and finely detecting the face area;
cutting the finely detected face region to obtain a face image region, inputting the face image region into a trained expression recognition model after rotation correction, and outputting the expression type and the probability corresponding to the student; the expression recognition model is a CNN-LSTM network with a double-channel weighted mixture.
In the expression and emotional state classification module, a voice signal is input into the trained emotional state recognition model, and the process of outputting the emotional state category and the corresponding probability corresponding to the student is as follows:
pre-emphasis processing is carried out on the acquired voice signals by adopting a pre-emphasis filter;
extracting the characteristics of a pre-emphasis processed sound signal, wherein the characteristics of the sound signal comprise a fundamental tone frequency, short-time energy, a short-time zero-crossing rate, a formant and a Mel frequency domain cepstrum coefficient;
inputting the extracted characteristics of the sound signals into a trained emotional state recognition model, and outputting the emotional state category and the probability corresponding to the student; the emotional state recognition model is a DBN (database-based network + softmax) classifier structure.
The upper computer 2 is configured to receive the expression recognition result and the emotion state recognition result, input the expression recognition result and the emotion state recognition result into a student state evaluation model, and output evaluated class learning state categories of the student in a classroom; wherein, the student state evaluation model is as follows:
R_ti = (α·E_ti + β·S_ti)·γ_(t-1)i

wherein i denotes the i-th classroom learning state, i ranges from 1 to M, and M is the total number of classroom learning state categories; R_ti is the probability value of the i-th classroom learning state at time t; E_ti is the probability value of the i-th expression category at time t; S_ti is the probability value of the i-th emotional state category at time t; α and β are known weight coefficients with α + β = 1; γ_(t-1)i is a coefficient determined by the recognition result at time t-1: if the state recognized at time t-1 is the i-th classroom learning state, its value is 1, otherwise it is 0.9; t is a positive integer greater than or equal to 1, and when t = 1, γ_(t-1)i = 1.
The upper computer is further configured to:
and recording the class and the state duration of the classroom learning state of the evaluated student, and further calculating the classroom learning state of the student in a preset time period and the proportion of the classroom learning state of the student in the preset time period.
The facial expression of the student during learning is recognized in real time by processing video images, and at the same time the student's emotional condition during learning is recognized in real time by processing sound signals; the expression recognition result and the emotion recognition result are fused through the specific model, so that the learning state of the student is recognized in real time. This effectively improves the recognition of each student's learning state, realizes online monitoring and evaluation of each student's in-class state, effectively compensates for the shortcomings of current student evaluation methods, and provides a reliable and sufficient basis for targeted cultivation of students and for evaluation of teachers' classroom teaching effectiveness.
Example 3
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps in the on-line assessment method for the classroom learning state of a trainee as described in embodiment 1.
The facial expression of the student during learning is recognized in real time by processing video images, and at the same time the student's emotional condition during learning is recognized in real time by processing sound signals; the expression recognition result and the emotion recognition result are fused through the specific model, so that the learning state of the student is recognized in real time. This effectively improves the recognition of each student's learning state, realizes online monitoring and evaluation of each student's in-class state, effectively compensates for the shortcomings of current student evaluation methods, and provides a reliable and sufficient basis for targeted cultivation of students and for evaluation of teachers' classroom teaching effectiveness.
Example 4
This embodiment provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the online evaluation method for classroom learning status of trainees in embodiment 1.
The facial expression of the student during learning is recognized in real time by processing video images, and at the same time the student's emotional condition during learning is recognized in real time by processing sound signals; the expression recognition result and the emotion recognition result are fused through the specific model, so that the learning state of the student is recognized in real time. This effectively improves the recognition of each student's learning state, realizes online monitoring and evaluation of each student's in-class state, effectively compensates for the shortcomings of current student evaluation methods, and provides a reliable and sufficient basis for targeted cultivation of students and for evaluation of teachers' classroom teaching effectiveness.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (10)

1. An online assessment method for classroom learning state of a student is characterized by comprising the following steps:
synchronously collecting video images and sound signals of a student;
inputting the video image and the sound signal into the trained expression recognition model and the trained emotional state recognition model, respectively, and outputting the student's expression category and emotional state category together with the corresponding probabilities; the expression categories, the emotional state categories and the classroom learning state categories are identical;
inputting the expression recognition result and the emotion state recognition result into a student state evaluation model, and outputting the evaluated class learning state category of the student in the classroom; wherein, the student state evaluation model is as follows:
R_ti = (α·E_ti + β·S_ti)·γ_(t-1)i

wherein i denotes the i-th classroom learning state, i ranges from 1 to M, and M is the total number of classroom learning state categories; R_ti is the probability value of the i-th classroom learning state at time t; E_ti is the probability value of the i-th expression category at time t; S_ti is the probability value of the i-th emotional state category at time t; α and β are known weight coefficients with α + β = 1; γ_(t-1)i is a coefficient determined by the recognition result at time t-1: if the state recognized at time t-1 is the i-th classroom learning state, its value is 1, otherwise it is 0.9; t is a positive integer greater than or equal to 1, and when t = 1, γ_(t-1)i = 1.
2. The on-line assessment method for classroom learning status of trainees as claimed in claim 1, wherein the video image and the sound signal of the trainees are synchronously collected from the video image collecting area; the video image acquisition area is set according to the position of a student; each video image acquisition area corresponds to a trainee.
3. The on-line assessment method for classroom learning status of a student as claimed in claim 1, wherein said method further comprises:
and recording the class and the state duration of the classroom learning state of the evaluated student, and further calculating the classroom learning state of the student in a preset time period and the proportion of the classroom learning state of the student in the preset time period.
4. The on-line assessment method for classroom learning status of trainees as claimed in claim 1, wherein the process of inputting the video images into the trained expression recognition model and outputting the corresponding expression classes and corresponding probabilities of the trainees comprises:
firstly, acquiring a contour image of the upper half of a human body from a video image by utilizing boundary tracking, then extracting a constant-moment characteristic vector of the contour image, matching the characteristic vector with a characteristic vector of a template image, and roughly detecting a human body contour region by adopting Euclidean distance as similarity measurement; adopting an AdaBoost face detection algorithm based on Haar-Like characteristics to the image area reserved after the coarse detection, and finely detecting the face area;
cutting the finely detected face region to obtain a face image region, inputting the face image region into a trained expression recognition model after rotation correction, and outputting the expression type and the probability corresponding to the student; the expression recognition model is a CNN-LSTM network with a double-channel weighted mixture.
5. The on-line assessment method for classroom learning status of trainees as claimed in claim 1, wherein the process of inputting the voice signal into the trained emotional state recognition model and outputting the corresponding emotional state type and probability of the trainees comprises:
pre-emphasis processing is carried out on the acquired voice signals by adopting a pre-emphasis filter;
extracting the characteristics of a pre-emphasis processed sound signal, wherein the characteristics of the sound signal comprise a fundamental tone frequency, short-time energy, a short-time zero-crossing rate, a formant and a Mel frequency domain cepstrum coefficient;
inputting the extracted characteristics of the sound signals into a trained emotional state recognition model, and outputting the emotional state category and the probability corresponding to the student; the emotional state recognition model is a DBN (deep belief network) + softmax classifier structure.
6. An online assessment system for classroom learning states of trainees, comprising:
a state detector configured to:
synchronously collecting video images and sound signals of a student;
inputting the video image and the sound signal into the trained expression recognition model and the trained emotional state recognition model, respectively, and outputting the student's expression category and emotional state category together with the corresponding probabilities; the expression categories, the emotional state categories and the classroom learning state categories are identical;
the upper computer is configured to receive the expression recognition result and the emotional state recognition result, input them into the student state evaluation model and output the evaluated classroom learning state category of the student; wherein the student state evaluation model is as follows:
R_ti = (α·E_ti + β·S_ti)·γ_(t-1)i

wherein i denotes the i-th classroom learning state, i ranges from 1 to M, and M is the total number of classroom learning state categories; R_ti is the probability value of the i-th classroom learning state at time t; E_ti is the probability value of the i-th expression category at time t; S_ti is the probability value of the i-th emotional state category at time t; α and β are known weight coefficients with α + β = 1; γ_(t-1)i is a coefficient determined by the recognition result at time t-1: if the state recognized at time t-1 is the i-th classroom learning state, its value is 1, otherwise it is 0.9; t is a positive integer greater than or equal to 1, and when t = 1, γ_(t-1)i = 1.
7. The system of claim 6, wherein the video image and the audio signal of the student are synchronously collected from the video image collecting area; the video image acquisition area is set according to the position of a student; each video image acquisition area corresponds to a trainee.
8. The student classroom learning state online evaluation system of claim 6, wherein the host computer is further configured to:
recording the class and the state duration of the classroom learning state of the evaluated student, and further calculating the classroom learning state of the student within a preset time period and the proportion of the classroom learning state of the student within the preset time period;
or in the expression and emotion state classification module, inputting the video image into the trained expression recognition model, and outputting the expression category and the corresponding probability corresponding to the student as follows:
firstly, acquiring a contour image of the upper half of a human body from a video image by utilizing boundary tracking, then extracting a constant-moment characteristic vector of the contour image, matching the characteristic vector with a characteristic vector of a template image, and roughly detecting a human body contour region by adopting Euclidean distance as similarity measurement; adopting an AdaBoost face detection algorithm based on Haar-Like characteristics to the image area reserved after the coarse detection, and finely detecting the face area;
cutting the finely detected face region to obtain a face image region, inputting the face image region into a trained expression recognition model after rotation correction, and outputting the expression type and the probability corresponding to the student; the expression recognition model is a CNN-LSTM network with a double-channel weighted mixture;
or in the expression and emotional state classification module, inputting the voice signal into the trained emotional state recognition model, and outputting the emotional state category and the corresponding probability corresponding to the student as follows:
pre-emphasis processing is carried out on the acquired voice signals by adopting a pre-emphasis filter;
extracting the characteristics of a pre-emphasis processed sound signal, wherein the characteristics of the sound signal comprise a fundamental tone frequency, short-time energy, a short-time zero-crossing rate, a formant and a Mel frequency domain cepstrum coefficient;
inputting the extracted characteristics of the sound signals into a trained emotional state recognition model, and outputting the emotional state category and the probability corresponding to the student; the emotional state recognition model is a DBN (deep belief network) + softmax classifier structure.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps in the on-line assessment method for the classroom learning state of a trainee as claimed in any one of claims 1 to 5.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the on-line assessment method for classroom learning status of a student as claimed in any one of claims 1-5 when executing the program.
CN201911047730.6A 2019-10-30 2019-10-30 Student classroom learning state online evaluation method and system Pending CN110807585A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911047730.6A CN110807585A (en) 2019-10-30 2019-10-30 Student classroom learning state online evaluation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911047730.6A CN110807585A (en) 2019-10-30 2019-10-30 Student classroom learning state online evaluation method and system

Publications (1)

Publication Number Publication Date
CN110807585A true CN110807585A (en) 2020-02-18

Family

ID=69489723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911047730.6A Pending CN110807585A (en) 2019-10-30 2019-10-30 Student classroom learning state online evaluation method and system

Country Status (1)

Country Link
CN (1) CN110807585A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831411A (en) * 2012-09-07 2012-12-19 云南晟邺科技有限公司 Quick face detection method
CN106878677A (en) * 2017-01-23 2017-06-20 西安电子科技大学 Student classroom Grasping level assessment system and method based on multisensor
CN107785061A (en) * 2017-10-10 2018-03-09 东南大学 Autism-spectrum disorder with children mood ability interfering system
CN108491835A (en) * 2018-06-12 2018-09-04 常州大学 Binary channels convolutional neural networks towards human facial expression recognition
US20190311188A1 (en) * 2018-12-05 2019-10-10 Sichuan University Face emotion recognition method based on dual-stream convolutional neural network
CN109493886A (en) * 2018-12-13 2019-03-19 西安电子科技大学 Speech-emotion recognition method based on feature selecting and optimization
CN110268444A (en) * 2019-02-26 2019-09-20 武汉资联虹康科技股份有限公司 A kind of number of people posture tracing system for transcranial magnetic stimulation diagnosis and treatment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Meng Xin: "Research on Emotion Recognition Based on SVM and DBN", China Master's Theses Full-text Database, Information Science and Technology Series *
Huang Yiwei: "Research and Implementation of Facial Expression Recognition Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507227A (en) * 2020-04-10 2020-08-07 南京汉韬科技有限公司 Multi-student individual segmentation and state autonomous identification method based on deep learning
CN111507227B (en) * 2020-04-10 2023-04-18 南京汉韬科技有限公司 Multi-student individual segmentation and state autonomous identification method based on deep learning
CN111683289A (en) * 2020-08-17 2020-09-18 江苏清微智能科技有限公司 System and method for acquiring online duration data
CN112185191A (en) * 2020-09-21 2021-01-05 信阳职业技术学院 Intelligent digital teaching model
CN112418068A (en) * 2020-11-19 2021-02-26 中国平安人寿保险股份有限公司 On-line training effect evaluation method, device and equipment based on emotion recognition
CN113221784A (en) * 2021-05-20 2021-08-06 杭州麦淘淘科技有限公司 Multi-mode-based student learning state analysis method and device
CN116757524A (en) * 2023-05-08 2023-09-15 广东保伦电子股份有限公司 Teacher teaching quality evaluation method and device
CN116757524B (en) * 2023-05-08 2024-02-06 广东保伦电子股份有限公司 Teacher teaching quality evaluation method and device
CN116797090A (en) * 2023-06-26 2023-09-22 国信蓝桥教育科技股份有限公司 Online assessment method and system for classroom learning state of student
CN116797090B (en) * 2023-06-26 2024-03-26 国信蓝桥教育科技股份有限公司 Online assessment method and system for classroom learning state of student

Similar Documents

Publication Publication Date Title
CN110807585A (en) Student classroom learning state online evaluation method and system
CN105023573B (en) It is detected using speech syllable/vowel/phone boundary of auditory attention clue
CN106599881A (en) Student state determination method, device and system
US11138989B2 (en) Sound quality prediction and interface to facilitate high-quality voice recordings
Jaumard-Hakoun et al. An articulatory-based singing voice synthesis using tongue and lips imaging
CN101199207A (en) Method, system, and program product for measuring audio video synchronization independent of speaker characteristics
CN110544481B (en) S-T classification method and device based on voiceprint recognition and equipment terminal
Muckenhirn et al. Understanding and Visualizing Raw Waveform-Based CNNs.
Le Cornu et al. Reconstructing intelligible audio speech from visual speech features.
CN110827793A (en) Language identification method
Sefara The effects of normalisation methods on speech emotion recognition
CN109584888A (en) Whistle recognition methods based on machine learning
CN113920534A (en) Method, system and storage medium for extracting video highlight
Murugaiya et al. Probability enhanced entropy (PEE) novel feature for improved bird sound classification
Chowdhury et al. Extracting sub-glottal and supra-glottal features from MFCC using convolutional neural networks for speaker identification in degraded audio signals
CN114582355A (en) Audio and video fusion-based infant crying detection method and device
CN111932056A (en) Customer service quality scoring method and device, computer equipment and storage medium
CN110956142A (en) Intelligent interactive training system
CN114302301B (en) Frequency response correction method and related product
Milner et al. Reconstructing intelligible audio speech from visual speech features
CN111091816B (en) Data processing system and method based on voice evaluation
CN111210845B (en) Pathological voice detection device based on improved autocorrelation characteristics
CN116935889B (en) Audio category determining method and device, electronic equipment and storage medium
Pathonsuwan et al. RS-MSConvNet: A Novel End-to-End Pathological Voice Detection Model
Francisco Carlos et al. An analysis of visual speech features for recognition of non-articulatory sounds using machine learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200218