CN113221784B - Multi-mode-based student learning state analysis method and device - Google Patents

Multi-mode-based student learning state analysis method and device

Info

Publication number
CN113221784B
CN113221784B (application CN202110552512.9A)
Authority
CN
China
Prior art keywords
student
algorithm
learning
image
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110552512.9A
Other languages
Chinese (zh)
Other versions
CN113221784A (en)
Inventor
胡东明 (Hu Dongming)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Haoxuetong Technology Co., Ltd.
Original Assignee
Hangzhou Haoxuetong Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Haoxuetong Technology Co., Ltd.
Priority to CN202110552512.9A
Publication of CN113221784A
Application granted
Publication of CN113221784B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • G06Q50/205Education administration or guidance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Business, Economics & Management (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Strategic Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Tourism & Hospitality (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Social Psychology (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)

Abstract

The invention provides a multi-mode-based student learning state analysis method and device. The collected first RGB image, depth image, second RGB image and ambient sound signal are processed, and a multi-modal analysis combining the student's liking degree, behavior and ambient sound is carried out to finally obtain the student's attention concentration. This effectively evaluates the student's attention and yields the student's learning state, helps guide teachers and parents to direct and correct the student's learning state in a targeted manner, corrects bad habits that appear during learning in a timely and effective way, and helps the student learn better.

Description

Multi-mode-based student learning state analysis method and device
Technical Field
The invention relates to the technical field of student learning state analysis, and in particular to a multi-mode-based student learning state analysis method and device.
Background
Observation shows that most primary and middle school students today are easily distracted: they tend to fidget or chat at will in class, work on homework only in fits and starts, and are easily disturbed by their surroundings. These are important manifestations of a lack of attention, and they lead to lower learning efficiency and poorer learning outcomes. Moreover, as the pace of life keeps accelerating, parents have little time to guide their children's study; when problems arise in learning, parents cannot quickly locate the cause, have no way to help the student correct it, and this can even lead to problems such as family discord.
Furthermore, studies of students have found that attention deficit can be divided into three subtypes: the predominantly inattentive type, the predominantly hyperactive type, and the combined type. The predominantly inattentive type mainly manifests as being easily distracted, careless, forgetful, finding it hard to stay focused on the same task, and frequently jumping from one thing to another. The predominantly hyperactive type mainly manifests as fidgeting, restlessness, incessant talking, impatience, and difficulty sitting still in class, at meals or while doing homework. The combined type shows both sets of symptoms. Different types of attention deficit should be guided and corrected according to their specific manifestations.
Chinese patent CN111179133B discloses an intelligent classroom interaction system comprising: an image acquisition module that uses several cameras to acquire at least images of the classroom containing all the students; a student identification module that performs student localization detection on the acquired images, marks the detected students with rectangular boxes, and performs face recognition, expression recognition and state recognition on them; an individual learning state analysis module that uses the face recognition results to obtain each student's ID number and name and, based on the expression and state recognition results, further analyzes each student's individual learning effect on the current knowledge point, expressed as a score; a class-level learning state analysis module that synthesizes the individual results to obtain the overall learning effect of the whole class, also expressed as a score; and an effect feedback display module that displays the individual and class-level results on a screen, guiding the teacher to speed up, slow down or re-explain the knowledge point currently being taught, or to explain it with the help of augmented reality, and to dynamically adjust the teaching.
That system obtains the students' ID numbers and names from the face recognition results and analyzes each student's individual learning effect on the current knowledge point from the expression and state recognition results, expressed as a score; however, it does not evaluate the students' attention and cannot guide and correct the students' learning states in a targeted manner.
Disclosure of Invention
The invention addresses the problems in the prior art that students' attention is not evaluated and their learning states cannot be guided and corrected in a targeted manner, and provides a multi-mode-based student learning state analysis method and device that help parents and teachers understand students' learning states more conveniently and quickly, correct bad habits arising during learning in a timely and effective manner according to the student's current learning state, and help students learn better.
To achieve this purpose, the following technical solution is provided:
a multi-mode-based student learning state analysis method comprises the following steps:
S1, collect a first RGB image of the student's upper body, perform face detection on it, locate key points in the face region, and obtain the student's current liking degree through an expression recognition algorithm;
S2, collect a depth image of the student's upper body, fuse it with the first RGB image to generate a new feature fusion image carrying 3D information, recognize the student's behavior type from the fusion image with an action recognition algorithm, and output the behavior type and its confidence Th_pose;
S3, collecting a second RGB image of the desktop area where the student is located, inputting the second RGB image into a desktop object recognition algorithm, and recognizing the type of the current desktop object;
s4, collecting environmental sound signals of the student, and recognizing the noise degree of the current voice through a voice recognition algorithm;
S5, analyze the student's liking degree, behavior type, current desktop object types and noise level over a period of time with a logic processing algorithm, make a comprehensive logical judgment, and output the student's learning state.
By processing the collected first RGB image, depth image, second RGB image and ambient sound signal and performing a multi-modal analysis that combines the student's liking degree, behavior and ambient sound, the invention finally obtains the student's attention concentration. It can effectively evaluate the student's attention, obtain the student's learning state, and help guide teachers and parents to direct and correct the student's learning state in a targeted manner, correct bad habits that appear during learning in a timely and effective way, and help the student learn better.
Preferably, the face detection and face key point positioning adopt the open-source Dlib detection model and specifically comprise the following steps (a code sketch follows these steps):
s101, inputting the RGB image into a face detection algorithm to obtain face position information P (x, y, w, h) in the current image, wherein x and y are coordinates of the upper left corner of the face, and w and h are width and height of the face respectively;
s102, obtaining a face image according to face position information, and performing key point detection on the face image to obtain coordinate information of a plurality of key points;
S103, input the key point coordinate information into the expression recognition algorithm to obtain the expression recognition result category and its confidence; the result categories include like and dislike, and the confidence of the "like" category is used as the liking degree value Th_e.
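A minimal Python sketch of steps S101-S103 is given below. The Dlib face detector and the 68-point shape predictor are standard Dlib APIs; the landmark model file name and the two-class expression classifier (expression_model with a predict_proba-style interface, and the assumption that its first class is "like") are illustrative placeholders rather than details taken from the patent.

```python
# Sketch of S101-S103: Dlib face detection, key point localization,
# and a two-class (like / dislike) expression classifier.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()                                  # S101: face detection
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")    # 68-point model (file name assumed)

def liking_degree(frame_bgr, expression_model):
    """Return (category, Th_e) for the largest detected face, or None if no face is found."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    face = max(faces, key=lambda r: r.width() * r.height())                  # P(x, y, w, h)
    shape = predictor(gray, face)                                            # S102: 68 key points
    pts = np.array([[p.x, p.y] for p in shape.parts()], dtype=np.float32)
    # S103: classify the key point coordinates; class 0 is assumed to be "like"
    probs = expression_model.predict_proba(pts.reshape(1, -1))[0]
    th_e = float(probs[0])                                                   # liking degree value Th_e
    return ("like" if th_e >= 0.5 else "dislike", th_e)
```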
Preferably, the expression recognition algorithm is obtained by training a resnet_v2_34, resnet_v1_34, resnet_v1_50, resnet_v2_50, VGG16 or Inception network under the caffe framework on a self-made data set; the specific training process is as follows:
S131, prepare the training samples: use a human detection algorithm to extract the face image from images containing various human expressions, crop it out and save it to obtain the face samples;
S132, classify the expression samples: sort the face samples as required and, according to the expression, label two sample sets, like and dislike;
S133, build the training and test sets: randomly split the labeled sample set in an 8:2 ratio, with 80% of the samples as the training set and 20% as the test set;
S134, train the classification algorithm model using a deep learning classification algorithm; S135, test and evaluate the algorithm model and select the final expression recognition model, specifically: test the trained models on both the training set and the test set, compute the classification accuracy, and check whether the difference between the test-set and training-set accuracy is within the error range; if so, the model has been trained successfully, and the model with the highest accuracy is selected as the final expression recognition model.
Preferably, S134 specifically comprises: train on a deep learning training platform, convert the training data format as required by the platform, select a suitable network such as resnet_v2_50, configure the relevant parameters (e.g. image size 224x224, 30000 iterations, batch size 32, 2 classes), and train the model until the termination condition is met.
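The 8:2 split, 224x224 input, 30000 iterations, batch size 32 and two classes described above can be sketched as follows. PyTorch/torchvision stands in here for the caffe framework named in the patent; the faces/like and faces/dislike folder layout, the SGD settings and the 0.05 accuracy-gap tolerance are assumptions for illustration only.

```python
# Sketch of S133-S135: 8:2 split, training to 30000 iterations, and selecting
# the model whose train/test accuracy gap stays within an error range.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, models, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
full = datasets.ImageFolder("faces", transform=tfm)                # folders: faces/like, faces/dislike
n_train = int(0.8 * len(full))                                     # S133: 8:2 random split
train_set, test_set = random_split(full, [n_train, len(full) - n_train])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)  # batch size 32
test_loader = DataLoader(test_set, batch_size=32)

model = models.resnet50(weights=None)                              # stand-in for resnet_v2_50
model.fc = nn.Linear(model.fc.in_features, 2)                      # 2 classes: like / dislike
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

def accuracy(loader):
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(1) == y).sum().item()
            total += y.numel()
    return correct / total

steps = 0
while steps < 30000:                                               # S134: train for 30000 iterations
    for x, y in train_loader:
        model.train()
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
        steps += 1
        if steps >= 30000:
            break

# S135: accept the model only if the train/test accuracy gap is within the error range
train_acc, test_acc = accuracy(train_loader), accuracy(test_loader)
if abs(train_acc - test_acc) <= 0.05:                              # 0.05 is an assumed tolerance
    torch.save(model.state_dict(), "expression_model.pt")
```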
Preferably, the types of the behavioral actions in S2 include learning, playing, dozing, or doodling.
The action recognition algorithm model uses a resnet_v2_50 network, which may also be replaced by a resnet_v1_34, resnet_v1_50, resnet_v2_34, VGG16 or Inception network; its training process is similar to that of the expression recognition algorithm.
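The patent does not spell out how the depth image and the first RGB image are fused in S2. One simple reading, sketched below, resizes and normalizes the depth map and stacks it as a fourth channel, so the action recognition network receives a four-channel "feature fusion image"; the channel-stacking choice is an assumption.

```python
# Sketch of one possible RGB + depth fusion for S2 (channel stacking, assumed).
import cv2
import numpy as np

def fuse_rgbd(rgb, depth):
    """Return an HxWx4 uint8 array: the RGB image plus a normalized depth channel."""
    depth = cv2.resize(depth, (rgb.shape[1], rgb.shape[0]))           # align resolutions
    d = depth.astype(np.float32)
    d = (d - d.min()) / max(float(d.max() - d.min()), 1e-6) * 255.0   # scale depth to 0-255
    return np.dstack([rgb, d.astype(np.uint8)])                       # feature fusion image
```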
Preferably, the object types in S3 include a learning class and a playing class: the learning class includes Chinese textbooks, English textbooks, math textbooks or picture books, and the playing class includes toy cars, mobile phones, magic cubes or cloth dolls.
The desktop object recognition algorithm uses a resnet_v2_50 network, which can also be replaced by resnet_v1_34, resnet_v1_50, resnet_v2_34, VGG16, Inception or similar networks; its training process is similar to that of the expression recognition algorithm.
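A small inference sketch for S3, assuming the desktop object recognizer is an image classifier as described above; the class list, its ordering and the mapping of the first four classes to the learning category are illustrative placeholders.

```python
# Sketch of S3: classify the desktop image and map the class to learning / playing.
import torch
from PIL import Image
from torchvision import transforms

ITEM_CLASSES = ["chinese_book", "english_book", "math_book", "picture_book",
                "toy_car", "mobile_phone", "magic_cube", "cloth_doll"]
LEARNING_ITEMS = set(ITEM_CLASSES[:4])                       # first four classes: learning items

prep = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

def classify_desktop(image_path, object_model):
    x = prep(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        idx = object_model(x).argmax(1).item()
    name = ITEM_CLASSES[idx]
    return name, ("learning" if name in LEARNING_ITEMS else "playing")
```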
Preferably, the speech recognition algorithm in S4 uses the open-source SoundNet algorithm: a period of on-site ambient sound signal is input into it, and it outputs the type of the current sound scene, classified as quiet, general, music or noise.
The period is 20, 30, 40 or 60 seconds, and the open-source SoundNet algorithm is trained with a resnet_v1_34 network on the Torch platform.
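A sketch of S4 follows. The SoundNet training details are not reproduced here, so scene_model is a placeholder for the trained sound-scene classifier; the 22.05 kHz resampling rate and the class ordering are assumptions.

```python
# Sketch of S4: classify a 20-60 s ambient sound clip into quiet / general / music / noise.
import torch
import torchaudio

SOUND_SCENES = ["quiet", "general", "music", "noise"]

def classify_sound_scene(wav_path, scene_model, target_sr=22050):
    waveform, sr = torchaudio.load(wav_path)                          # on-site recording
    if sr != target_sr:
        waveform = torchaudio.transforms.Resample(sr, target_sr)(waveform)
    waveform = waveform.mean(dim=0, keepdim=True)                     # mix down to mono
    with torch.no_grad():
        logits = scene_model(waveform.unsqueeze(0))                   # placeholder 4-class model
    return SOUND_SCENES[logits.argmax(1).item()]
```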
Preferably, the S5 specifically includes the following steps:
S501, count the action types N appearing in the action recognition results within the learning time T, and compute the number of action switches P_n;
S502, count the number of object types n_2 and the number of objects n_1 appearing in the desktop object recognition results within the learning time T, and compute the object coefficient O_n = max(0, 0.5*(n_1 - 1) + 0.5*(n_2 - 1));
S503, calculate the average E(X) of the liking-degree values from the expression recognition results within the learning time T:
[Formula image not reproduced: E(X) is the average of the recorded liking-degree values Th_e]
S504, count the noise influence coefficient S_n over the learning time T, computed as follows:
[Formula image not reproduced: the formula for the noise influence coefficient S_n]
S505, calculate the concentration score (a code sketch of S501-S505 follows these steps); the calculation formula is:
R = max(0, min(100*E(X) - 10*P_n - 10*O_n + 15*S_n, 100))
Output the result:
[Formula image not reproduced: the rule mapping the concentration score R to the output learning state]
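The scoring logic of S501-S505 can be written compactly as below. Because the formula images for E(X) and S_n are not reproduced in the text, E(X) is taken here as the plain mean of the recorded liking degrees and S_n is passed in already computed; both are assumptions consistent with the surrounding description.

```python
# Sketch of S501-S505: object coefficient, mean liking degree, and concentration score R.
def concentration_score(liking_values, n_switches, n_items, n_item_types, s_n):
    """Return the concentration score R in [0, 100]."""
    p_n = n_switches                                                   # S501: number of action switches
    o_n = max(0.0, 0.5 * (n_items - 1) + 0.5 * (n_item_types - 1))     # S502: object coefficient O_n
    e_x = sum(liking_values) / len(liking_values)                      # S503: mean liking degree (assumed form)
    # S505: R = max(0, min(100*E(X) - 10*P_n - 10*O_n + 15*S_n, 100))
    return max(0.0, min(100.0 * e_x - 10.0 * p_n - 10.0 * o_n + 15.0 * s_n, 100.0))

# Example: concentration_score([0.9, 0.8, 0.85], n_switches=1, n_items=2, n_item_types=1, s_n=1.0)
```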
A multi-mode-based student learning state monitoring device adopts the above multi-mode-based student learning state analysis method and comprises:
the first camera is used for collecting a first RGB image of the upper half of the student;
the TOF sensor is used for acquiring a depth image of the upper half of the student;
the second camera is used for collecting a second RGB image of a desktop area where the student is located;
the voice recording equipment is used for collecting environmental sound signals of students;
the hardware processing and analyzing module comprises a human body video input processing module for converting the first RGB image into a digital signal, a TOF module for converting the depth image into the digital signal, a desktop video input processing module for converting the second RGB image into the digital signal and a voice recognition module for converting an environment sound signal into the digital signal;
the main control chip is used for receiving the digital signals input by the hardware processing and analyzing module, operating a logic processing algorithm and outputting the learning state of the student;
and the communication equipment is used for being connected with the main control chip, uploading relevant data of the learning state of the student to the cloud and storing the relevant data.
The first camera and the TOF sensor are mounted together on the front of the desk's bookshelf, and the second camera is mounted below the bookshelf; the bookshelf is 45 cm above the desktop and 40 cm from the edge of the desk. The camera and TOF sensor at the front of the desk capture the student's learning state over a period of time, the camera below the bookshelf captures the desktop situation, and a voice recording device is also installed under the desk to record the on-site sound signal. The four kinds of acquired data are transmitted synchronously to the hardware processing and analysis module, which obtains the student's learning state and uploads the data to the cloud for storage so that parents can view it.
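The data flow just described can be summarized schematically as below. All capture, recognizer, logic and upload functions are placeholders passed in by the caller; none of these names come from the patent.

```python
# Schematic sketch of the device data flow: four capture sources feed the per-modality
# recognizers, the logic processing algorithm scores them, and the result is uploaded.
import time

def monitoring_loop(capture, recognizers, logic, upload, period_s=60):
    while True:
        rgb_body = capture["first_camera"]()      # first RGB image of the upper body
        depth = capture["tof_sensor"]()           # TOF depth image
        rgb_desk = capture["second_camera"]()     # second RGB image of the desktop
        audio = capture["recorder"]()             # ambient sound clip

        liking = recognizers["expression"](rgb_body)
        action = recognizers["action"](rgb_body, depth)
        items = recognizers["desktop"](rgb_desk)
        scene = recognizers["sound"](audio)

        upload(logic(liking, action, items, scene))   # learning state -> cloud
        time.sleep(period_s)
```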
The beneficial effects of the invention are: by processing the collected first RGB image, depth image, second RGB image and ambient sound signal and performing a multi-modal analysis that combines the student's liking degree, behavior and ambient sound, the student's attention concentration is finally obtained. This effectively evaluates the student's attention, yields the student's learning state, helps guide teachers and parents to direct and correct the student's learning state in a targeted manner, corrects bad habits that appear during learning in a timely and effective way, and helps the student learn better.
Drawings
FIG. 1 is a schematic view of the overall structure of the apparatus of the embodiment;
FIG. 2 is a diagram of the Dlib keypoint effect.
Detailed Description
Embodiment:
the embodiment provides a student learning state analysis method based on multiple modes, which comprises the following steps:
S1, collect a first RGB image of the student's upper body, perform face detection on it, locate key points in the face region, and obtain the student's current liking degree through an expression recognition algorithm;
the face detection algorithm and the face key point positioning adopt an open source algorithm Dlib detection model, and the method specifically comprises the following steps:
s101, inputting the RGB image into a face detection algorithm to obtain face position information P (x, y, w, h) in the current image, wherein x and y are coordinates of the upper left corner of the face, and w and h are width and height of the face respectively;
s102, obtaining a face image according to the face position information, carrying out key point detection on the face image, and referring to FIG. 2, obtaining coordinate information of 68 key points;
S103, input the key point coordinate information into the expression recognition algorithm to obtain the expression recognition result category and its confidence; the result categories include like and dislike, and the confidence of the "like" category is used as the liking degree value Th_e. The expression recognition algorithm is obtained by training a resnet_v2_34, resnet_v1_34, resnet_v1_50, resnet_v2_50, VGG16 or Inception network under the caffe framework on a self-made data set; the specific training process is as follows:
S131, prepare the training samples: use a human detection algorithm to extract the face image from images containing various human expressions, crop it out and save it to obtain the face samples;
S132, classify the expression samples: sort the face samples as required and, according to the expression, label two sample sets, like and dislike;
S133, build the training and test sets: randomly split the labeled sample set in an 8:2 ratio, with 80% of the samples as the training set and 20% as the test set;
S134, train the classification algorithm model using a deep learning classification algorithm; S135, test and evaluate the algorithm model and select the final expression recognition model, specifically: test the trained models on both the training set and the test set, compute the classification accuracy, and check whether the difference between the test-set and training-set accuracy is within the error range; if so, the model has been trained successfully, and the model with the highest accuracy is selected as the final expression recognition model.
S134 specifically comprises: train on a deep learning training platform, convert the training data format as required by the platform, select a suitable network such as resnet_v2_50, configure the relevant parameters (e.g. image size 224x224, 30000 iterations, batch size 32, 2 classes), and train the model until the termination condition is met.
S2, collect a depth image of the student's upper body, fuse it with the first RGB image to generate a new feature fusion image carrying 3D information, recognize the student's behavior type from the fusion image with the action recognition algorithm, and output the behavior type and its confidence Th_pose. The behavior types include learning, playing, dozing and doodling.
S3, collect a second RGB image of the desktop area where the student is located, input it into the desktop object recognition algorithm, and recognize the types of objects currently on the desktop. The object types include a learning class and a playing class: the learning class includes Chinese textbooks, English textbooks, math textbooks or picture books, and the playing class includes toy cars, mobile phones, magic cubes or cloth dolls.
S4, collect the student's ambient sound signal and recognize the current noise level through a speech recognition algorithm. The speech recognition algorithm uses the open-source SoundNet algorithm: a period of on-site ambient sound signal is input into it, and it outputs the type of the current sound scene, classified as quiet, general, music or noise. The period is 20, 30, 40 or 60 seconds, and SoundNet is trained with a resnet_v1_34 network on the Torch platform.
S5, analyze the student's liking degree, behavior type, current desktop object types and noise level over a period of time with the logic processing algorithm, make a comprehensive logical judgment, and output the student's learning state.
S5 specifically includes the steps of:
S501, count the action types N appearing in the action recognition results within the learning time T, and compute the number of action switches P_n;
S502, count the number of object types n_2 and the number of objects n_1 appearing in the desktop object recognition results within the learning time T, and compute the object coefficient O_n = max(0, 0.5*(n_1 - 1) + 0.5*(n_2 - 1));
S503, calculate the average E(X) of the liking-degree values from the expression recognition results within the learning time T:
[Formula image not reproduced: E(X) is the average of the recorded liking-degree values Th_e]
S504, count the noise influence coefficient S_n over the learning time T, computed as follows:
[Formula image not reproduced: the formula for the noise influence coefficient S_n]
S505, calculate the concentration score; the calculation formula is:
R = max(0, min(100*E(X) - 10*P_n - 10*O_n + 15*S_n, 100))
Output the result:
[Formula image not reproduced: the rule mapping the concentration score R to the output learning state]
the motion recognition algorithm model uses a network, namely a resnet _ v2_50 network, a resnet _ v1_34 network, a resnet _ v1_50 network, a resnet _ v2_34 network, or a VGG16, an inclusion network, and the training process of the motion recognition algorithm model is similar to that of an expression recognition algorithm.
The network used by the desktop object recognition algorithm is resnet _ v2_50, and can also be replaced by networks such as resnet _ v1_34, resnet _ v1_50, resnet _ v2_34, VGG16, inclusion and the like, and the training process of the desktop object recognition algorithm model is similar to that of the expression recognition algorithm.
According to the invention, the collected first RGB image, the depth image, the second RGB image and the environmental sound signal are processed, and multi-mode analysis is carried out by combining the liking degree, behavior action and environmental sound of the student, so that the attention concentration of the student is finally obtained, the learning attention of the student can be effectively evaluated, the learning state of the student is obtained, and the method is beneficial to guiding teachers and parents to pertinently guide and correct the learning state of the student, timely and effectively correct the bad habits of the student in the learning process, and helps the student to better learn.
A multi-mode-based student learning state monitoring device, referring to FIG. 1, adopts the above multi-mode-based student learning state analysis method and comprises:
the first camera is used for collecting a first RGB image of the upper half of the student;
the TOF sensor is used for acquiring a depth image of the upper half of the student; it may adopt a versaScanl20 OPO tunable laser from GWU (Germany), with a wavelength tuning range of 410-2500 nm, a pulse width of 3 ns and a repetition frequency of 100 Hz, or dual-wavelength pulsed laser diodes (Osram model PL520 and Mitsubishi model ML101J23) with wavelengths of 520 nm and 650 nm, output powers of 50 mW and 150 mW, pulse widths adjustable from 1 to 1000 ns and repetition frequencies adjustable from 1 to 30 kHz.
The second camera is used for collecting a second RGB image of a desktop area where the student is located;
the voice recording equipment is used for collecting environmental sound signals of students;
the hardware processing and analyzing module comprises a human body video input processing module for converting the first RGB image into a digital signal, a TOF module for converting the depth image into the digital signal, a desktop video input processing module for converting the second RGB image into the digital signal and a voice recognition module for converting an environmental sound signal into the digital signal;
the main control chip is used for receiving the digital signals input by the hardware processing and analysis module, running the logic processing algorithm and outputting the student's learning state; the main control chip adopts a HiSilicon Hi3516 chip or a Rockchip RV1109.
And the communication equipment is used for being connected with the main control chip, uploading relevant data of the learning state of the student to the cloud and storing the relevant data.
The working process of the device is as follows:
S1) plug in the power supply of the device and turn on the hardware;
S2) press the key to turn on the cameras and the TOF sensor and activate the various functions;
S3) the first camera collects a first RGB image; the face detection algorithm locates the face key points in the RGB image, and the expression recognition algorithm then obtains the student's current liking degree;
S4) the TOF module collects a depth image, which is fused with the first RGB image to generate a new feature fusion image with 3D information; the student's behavior is recognized from the fusion image, with action types including learning, playing, dozing, doodling and the like;
S5) the second camera collects a second RGB image of the desktop area, which is input into the desktop object recognition algorithm to recognize the current desktop objects; the objects are divided into two categories, learning and playing; the learning category includes Chinese textbooks, English textbooks, math textbooks, picture books and the like; when the action type is playing, the play-related object recognition model is called, the playing category including toy cars, mobile phones, magic cubes, cloth dolls and the like;
S6) the sound recording device collects the current ambient sound signal and inputs it to the speech recognition module, which recognizes the current noise level;
S7) the logic processing algorithm analyzes the student's liking value, behavior information, desktop objects and noise level over a period of time, makes a comprehensive logical judgment and outputs the student's learning state.
Referring to table one, the logic processing algorithm outputs the learning status of the student.
Table 1. Example of processing results
[Table image not reproduced: example learning-state results output by the logic processing algorithm]
The first camera and the TOF sensor are mounted together on the front of the desk's bookshelf, and the second camera is mounted below the bookshelf; the bookshelf is 45 cm above the desktop and 40 cm from the edge of the desk. The camera and TOF sensor at the front of the desk capture the student's learning state over a period of time, the camera below the bookshelf captures the desktop situation, and a voice recording device is also installed under the desk to record the on-site sound signal. The four kinds of acquired data are transmitted synchronously to the hardware processing and analysis module, which obtains the student's learning state and uploads the data to the cloud for storage so that parents can view it.

Claims (8)

1. A student learning state analysis method based on multiple modes is characterized by comprising the following steps:
S1, collecting a first RGB image of the student's upper body, carrying out face detection on it, carrying out key point positioning on the face region, and obtaining the student's current liking degree through an expression recognition algorithm;
S2, collecting a depth image of the student's upper body, fusing the depth image and the first RGB image to generate a new feature fusion image with 3D information, recognizing the student's behavior type from the feature fusion image with an action recognition algorithm, and outputting the behavior type and the confidence Th_pose;
S3, collecting a second RGB image of the desktop area where the student is located, inputting the second RGB image into a desktop object recognition algorithm, and recognizing the type of the current desktop object;
s4, collecting environmental sound signals of students, and recognizing the noise degree of the current voice through a voice recognition algorithm;
s5, analyzing the current liking degree, the type of behavior and action, the type of current desktop object and the noise degree of the student in a period of time through a logic processing algorithm, comprehensively and logically judging, and outputting the learning state of the student;
the S5 specifically includes the following steps:
S501, counting the action types N appearing in the action recognition results within the learning time T, and calculating the number of action switches P_n;
S502, counting the number of object types n_2 and the number of objects n_1 appearing in the desktop object recognition results within the learning time T, and calculating the object coefficient O_n = max(0, 0.5*(n_1 - 1) + 0.5*(n_2 - 1));
S503, calculating the average E(X) of the liking-degree values from the expression recognition results within the learning time T:
[Formula image not reproduced: E(X) is the average of the recorded liking-degree values Th_e]
S504, counting the noise influence coefficient S_n over the learning time T, computed as follows:
[Formula image not reproduced: the formula for the noise influence coefficient S_n]
S505, calculating the concentration score according to:
R = max(0, min(100*E(X) - 10*P_n - 10*O_n + 15*S_n, 100))
and outputting the result:
[Formula image not reproduced: the rule mapping the concentration score R to the output learning state]
2. the multi-modality-based student learning state analysis method as claimed in claim 1, wherein the face detection algorithm and face key point positioning adopt an open source algorithm Dlib detection model, and specifically comprises the following steps:
s101, inputting the RGB image into a face detection algorithm to obtain face position information P (x, y, w, h) in the current image, wherein x and y are coordinates of the upper left corner of the face, and w and h are width and height of the face respectively;
s102, obtaining a face image according to face position information, and performing key point detection on the face image to obtain coordinate information of a plurality of key points;
S103, inputting the key point coordinate information into the expression recognition algorithm to obtain the expression recognition result category and its confidence, wherein the result categories include like and dislike, and the confidence of the "like" category is used as the liking degree value Th_e.
3. The multi-mode-based student learning state analysis method according to claim 2, wherein the expression recognition algorithm is trained on a self-made data set using a resnet_v2_34, resnet_v1_34, resnet_v1_50, resnet_v2_50, VGG16 or Inception network under the caffe framework, the specific training process being as follows:
S131, preparing the training samples: using a human detection algorithm to extract the face image from images containing various human expressions, cropping it out and saving it to obtain the face samples;
S132, classifying the expression samples: sorting the face samples as required and, according to the expression, labelling two sample sets, like and dislike;
S133, building the training and test sets: randomly splitting the labeled sample set in an 8:2 ratio, with 80% of the samples as the training set and 20% as the test set;
S134, training the classification algorithm model using a deep learning classification algorithm; S135, testing and evaluating the algorithm model and selecting the final expression recognition model, specifically: testing the trained models on both the training set and the test set, calculating the classification accuracy, and checking whether the difference between the test-set and training-set accuracy is within the error range; if so, the model has been trained successfully, and the model with the highest accuracy is selected as the final expression recognition model.
4. The multi-modality-based student learning state analysis method as claimed in claim 3, wherein the S134 specifically comprises the following steps: training by using a deep learning training platform, converting a training data format according to the requirements of the deep learning training platform, selecting a proper network, configuring relevant parameters, and training a model until a termination condition is met.
5. The method as claimed in claim 1, wherein the types of the behavioral actions in S2 include learning, playing, dozing and doodling.
6. The multi-modal-based student learning state analysis method as claimed in claim 1, wherein the object types in S3 include a learning category and a playing category; the learning category includes Chinese textbooks, English textbooks, math textbooks or picture books, and the playing category includes toy cars, mobile phones, magic cubes or cloth dolls.
7. The multi-modal-based student learning state analysis method as claimed in claim 1, wherein the speech recognition algorithm in S4 uses the open-source SoundNet algorithm, a period of on-site ambient sound signal being input into the speech recognition algorithm, which outputs the type of the current sound scene, classified as quiet, general, music or noise.
8. A multi-modality-based student learning state monitoring apparatus adopting the multi-modality-based student learning state analysis method of claim 1, comprising:
the first camera is used for collecting a first RGB image of the upper body of the student;
the TOF sensor is used for acquiring a depth image of the upper half of the student;
the second camera is used for collecting a second RGB image of a desktop area where the student is located;
the voice recording equipment is used for collecting environmental sound signals of students;
the hardware processing and analyzing module comprises a human body video input processing module for converting the first RGB image into a digital signal, a TOF module for converting the depth image into the digital signal, a desktop video input processing module for converting the second RGB image into the digital signal and a voice recognition module for converting an environment sound signal into the digital signal;
the main control chip is used for receiving the digital signals input by the hardware processing and analyzing module, operating a logic processing algorithm and outputting the learning state of the student;
and the communication equipment is used for being connected with the main control chip, uploading relevant data of the learning state of the student to the cloud and storing the relevant data.
CN202110552512.9A 2021-05-20 2021-05-20 Multi-mode-based student learning state analysis method and device Active CN113221784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110552512.9A CN113221784B (en) 2021-05-20 2021-05-20 Multi-mode-based student learning state analysis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110552512.9A CN113221784B (en) 2021-05-20 2021-05-20 Multi-mode-based student learning state analysis method and device

Publications (2)

Publication Number Publication Date
CN113221784A CN113221784A (en) 2021-08-06
CN113221784B (en) 2022-07-15

Family

ID=77093365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110552512.9A Active CN113221784B (en) 2021-05-20 2021-05-20 Multi-mode-based student learning state analysis method and device

Country Status (1)

Country Link
CN (1) CN113221784B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115470986A (en) * 2022-09-14 2022-12-13 北京工业大学 Behavior monitoring and preventing system and method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170046965A1 (en) * 2015-08-12 2017-02-16 Intel Corporation Robot with awareness of users and environment for use in educational applications
CN106599881A (en) * 2016-12-30 2017-04-26 首都师范大学 Student state determination method, device and system
CN106803913A (en) * 2017-03-10 2017-06-06 武汉东信同邦信息技术有限公司 A kind of detection method and its device of the action that taken the floor for Auto-Sensing student
CN108805009A (en) * 2018-04-20 2018-11-13 华中师范大学 Classroom learning state monitoring method based on multimodal information fusion and system
CN110807585A (en) * 2019-10-30 2020-02-18 山东商业职业技术学院 Student classroom learning state online evaluation method and system

Also Published As

Publication number Publication date
CN113221784A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
CN110991381B (en) Real-time classroom student status analysis and indication reminding system and method based on behavior and voice intelligent recognition
CN106599881A (en) Student state determination method, device and system
CN111027865B (en) Teaching analysis and quality assessment system and method based on behavior and expression recognition
CN109637207A (en) A kind of preschool education interactive teaching device and teaching method
CN110807585A (en) Student classroom learning state online evaluation method and system
CN110930781B (en) Recording and broadcasting system
CN113221784B (en) Multi-mode-based student learning state analysis method and device
CN111428686A (en) Student interest preference evaluation method, device and system
CN111695442A (en) Online learning intelligent auxiliary system based on multi-mode fusion
CN116109455B (en) Language teaching auxiliary system based on artificial intelligence
CN110245253A (en) A kind of Semantic interaction method and system based on environmental information
CN109754653B (en) Method and system for personalized teaching
CN111178263B (en) Real-time expression analysis method and device
KR20180058298A (en) System and method for testing a school readiness of the school-age child
CN113434714B (en) Auxiliary learning device and method
CN117314700A (en) Dual-teacher teaching management method and system for preschool education
KR20180072956A (en) Robot for testing a school readiness of the school-age child
Tseng et al. Collaborative Machine Learning Model Building with Families Using Co-ML
CN115984956A (en) Man-machine cooperation student classroom attendance multi-mode visual analysis system
CN114297418A (en) System and method for identifying learning emotion to carry out personalized recommendation
TWM600908U (en) Learning state improvement management system
Panchal et al. Artificial intelligence used in school’s of China
CN113256453A (en) Learning state improvement management system
CN110353703A (en) Autism based on language paradigm behavioural analysis of repeating the words of others like a parrot assesses apparatus and system
KR102383457B1 (en) Active artificial intelligence tutoring system that support teaching and learning and method for controlling the same

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220424

Address after: 311200 room 1806, building 1, Zhejiang chamber of Commerce building, No. 299, Pinglan Road, ningwei street, Xiaoshan District, Hangzhou, Zhejiang Province

Applicant after: Hangzhou haoxuetong Technology Co.,Ltd.

Address before: Room 1302, building 2, Huanyu business center, 626 kejiguan street, Xixing street, Binjiang District, Hangzhou City, Zhejiang Province, 310051

Applicant before: Hangzhou maitaotao Technology Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant