WO2020024400A1 - Class monitoring method and apparatus, computer device, and storage medium - Google Patents

Class monitoring method and apparatus, computer device, and storage medium Download PDF

Info

Publication number
WO2020024400A1
WO2020024400A1 · PCT/CN2018/106435
Authority
WO
WIPO (PCT)
Prior art keywords
recognition result
target face
face image
expression
lip
Prior art date
Application number
PCT/CN2018/106435
Other languages
French (fr)
Chinese (zh)
Inventor
周建伟 (Zhou Jianwei)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2020024400A1 publication Critical patent/WO2020024400A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/166Detection; Localisation; Normalisation using acquisition arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • G06Q50/205Education administration or guidance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources

Definitions

  • the present application belongs to the field of image processing, and more specifically, relates to a classroom monitoring method, device, computer equipment, and storage medium.
  • the embodiments of the present application provide a classroom monitoring method, device, computer equipment, and storage medium to solve the technical problem of low efficiency in identifying a student's class status due to an excessive amount of data.
  • a classroom monitoring method includes:
  • a classroom monitoring device includes:
  • a first recognition result acquisition module configured to regularly collect a first target face image according to a first time interval, and input the first target face image into an expression recognition model to obtain a first recognition result;
  • a first prompt message sending module configured to send a first prompt message to a monitoring terminal if the first identification result is abnormal
  • the focus attention queue adding module is configured to add the first target face image with an abnormality in the first recognition result to a focus attention queue, where the focus attention queue includes an attention person identification;
  • a second recognition result acquisition module configured to regularly collect a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and input the second target face image into the expression recognition model to obtain a second recognition result;
  • a second prompt message sending module is configured to send a second prompt message to the monitoring terminal if the second identification result is abnormal.
  • a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • the processor executes the computer-readable instructions, the following steps are implemented:
  • One or more non-volatile readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:
  • FIG. 1 is a schematic diagram of an application environment of a classroom monitoring method according to an embodiment of the present application
  • FIG. 2 is a flowchart of a classroom monitoring method according to an embodiment of the present application.
  • FIG. 3 is another flowchart of a classroom monitoring method according to an embodiment of the present application.
  • FIG. 4 is another flowchart of a classroom monitoring method according to an embodiment of the present application.
  • FIG. 5 is another flowchart of a classroom monitoring method according to an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a classroom monitoring device according to an embodiment of the present application.
  • FIG. 7 is a principle block diagram of a first recognition result acquisition module in a classroom monitoring apparatus according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a computer device in an embodiment of the present application.
  • The classroom monitoring method provided by this application can be applied in the application environment shown in FIG. 1, in which the monitoring end communicates with the server through a network. The server collects a first target face image through a camera at a first time interval and inputs it into an expression recognition model to obtain a first recognition result. If the first recognition result is abnormal, the server sends a first prompt message to the monitoring end and adds the abnormal first target face image to a focus attention queue, where the focus attention queue includes attention person identifiers. The server then collects a second target face image through the camera according to a second time interval and the attention person identifiers in the focus attention queue, and inputs it into the expression recognition model to obtain a second recognition result.
  • If the second recognition result is abnormal, the server sends a second prompt message to the monitoring end.
  • the monitoring end may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server can be implemented by an independent server or a server cluster composed of multiple servers.
  • a classroom monitoring method is provided.
  • the method is applied to the server in FIG. 1 as an example, and includes the following steps:
  • S10 Collect a first target face image according to a first time interval, input the first target face image into an expression recognition model, and obtain a first recognition result.
  • the first time interval is a preset time period, and can be specifically set according to actual needs, such as 5 minutes, 8 minutes, or 10 minutes.
  • the first target face image refers to the face images of all students in the entire class.
  • a plurality of cameras can be set in the classroom, and each camera collects a face image of a student in a fixed area, so as to realize the collection of the first target face image. Understandably, the greater the number of cameras, the higher the acquisition accuracy.
  • the camera collects video data, and the face image is obtained after the video data is framed by a predetermined frame rate.
  • the expression recognition model is a recognition model for judging the emotions of faces in the current image.
  • The expression recognition model estimates the probability that the face in the current image corresponds to each of several preset emotions. If the probability value of a certain emotion exceeds its corresponding preset threshold, that emotion is taken as the first recognition result.
  • the emotions in the expression recognition model can be set to five types: listening, doubting, understanding, resisting, and disdain. Specifically, a large number of sample images representing the five emotions can be collected in advance to form a sample image set, and then a corresponding neural network model or classifier is selected for training, and an expression recognition model is finally obtained.
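  • As a non-authoritative illustration of the thresholding step just described (the patent provides no code), a minimal Python sketch might look like the following; the per-emotion threshold of 0.6 is an assumed value, and the trained model that produces the probabilities is stubbed out upstream:

```python
# Hedged sketch of the threshold step described above; the model itself
# is assumed to exist upstream, and the preset thresholds are assumptions.
EMOTIONS = ["listening", "doubt", "understanding", "resistance", "disdain"]
THRESHOLDS = {emotion: 0.6 for emotion in EMOTIONS}  # assumed presets

def first_recognition_result(probabilities):
    """Given {emotion: probability} from the model, return the emotion
    whose probability exceeds its preset threshold, or None."""
    best = max(probabilities, key=probabilities.get)
    return best if probabilities[best] > THRESHOLDS[best] else None
```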
  • a first prompt message is sent according to the output first recognition result.
  • the expression recognition model will determine the expression result corresponding to the facial area in the first target face image; for example, listening, doubting, understanding, resisting, or disdain.
  • Optionally, when the expression recognition model determines that the emotion corresponding to the facial area in the first target face image is resistance or disdain, the output first recognition result is abnormal; when it determines that the emotion is listening, doubt, or understanding, the output first recognition result is normal.
  • If the first recognition result is abnormal, a first prompt message is sent to the monitoring terminal to indicate that the student may be distracted.
  • Optionally, the monitoring end is a mobile phone, computer, or other communication device of the teacher, class teacher, or other relevant person, so that the relevant person can learn the student's class status and respond accordingly, for example by making an evaluation or intervening.
  • The focus attention queue refers to a queue of the students in the class who may be distracted and therefore need special attention.
  • the queue may be embodied by focusing on the identity of the person.
  • The attention person identifier is an identifier used to distinguish students who need special attention; for example, it can be a student number, an ID number, or a seat number.
  • S40 Collect a second target face image at regular intervals according to the second time interval and the identifier of the concerned person in the focus attention queue, and input the second target face image into the expression recognition model to obtain a second recognition result.
  • the second time interval is also a preset time period, which can be specifically set according to actual needs, such as 1 minute, 3 minutes, or 5 minutes. Preferably, the second time interval is smaller than the first time interval.
  • The second target face image refers to the face image of the student corresponding to an attention person identifier. Specifically, the corresponding camera can be located through the attention person identifier, and the second target face image is obtained from that camera.
  • the method for acquiring the second target face image is similar to the method for acquiring the first target face image in step S10, and details are not described herein again.
  • the second target face image is input into the expression recognition model, and the process of obtaining the second recognition result is similar to the corresponding step in step S10, which is not repeated here.
  • a second prompt message is sent according to the output second recognition result.
  • The expression recognition model determines the emotion corresponding to the facial area in the second target face image, such as listening, doubt, understanding, resistance, or disdain.
  • When the model determines that the emotion is resistance or disdain, the output second recognition result is abnormal; when it determines that the emotion is listening, doubt, or understanding, the output second recognition result is normal.
  • If the second recognition result is abnormal, a second prompt message is sent to the monitoring terminal to indicate that the student may be distracted.
  • a second target face image whose second recognition result is abnormal is added to the focus attention queue.
  • At the same time as second target face images whose second recognition result is abnormal are added to the focus attention queue, second target face images whose second recognition result is normal are removed from the focus attention queue.
  • In this embodiment, a first target face image is periodically collected according to a first time interval and input into an expression recognition model to obtain a first recognition result; if the result is abnormal, a first prompt message is sent to the monitoring end. The focus attention queue is then adjusted according to the first recognition result, a second target face image is periodically collected according to the attention person identifiers in the focus attention queue and a second time interval, and the second target face image is input into the expression recognition model to obtain a second recognition result. If the second recognition result is abnormal, a second prompt message is sent to the monitoring end.
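  • For illustration only, the overall two-interval flow described above could be sketched in Python as follows; the interval values, the `collect_all`/`collect_one`/`recognize`/`notify` callables, and the "abnormal" label are assumptions standing in for the cameras, the expression recognition model, and the prompt-message channel:

```python
import time

FIRST_INTERVAL = 5 * 60   # e.g. 5 minutes for the whole class (assumed value)
SECOND_INTERVAL = 60      # shorter interval for the focus queue (assumed value)

def monitor(collect_all, collect_one, recognize, notify):
    """Two-tier polling loop sketched from the description above.
    collect_all() yields (student_id, face_image) pairs for the class;
    collect_one(student_id) returns one face image; recognize() returns
    'normal' or 'abnormal'; notify() sends a prompt to the monitoring end."""
    focus_queue = set()   # attention person identifiers, e.g. seat numbers
    last_first = last_second = 0.0
    while True:
        now = time.time()
        if now - last_first >= FIRST_INTERVAL:
            for student_id, face in collect_all():
                if recognize(face) == "abnormal":
                    notify("first prompt", student_id)
                    focus_queue.add(student_id)
            last_first = now
        if now - last_second >= SECOND_INTERVAL:
            for student_id in list(focus_queue):
                if recognize(collect_one(student_id)) == "abnormal":
                    notify("second prompt", student_id)
                else:
                    focus_queue.discard(student_id)  # back to normal: release
            last_second = now
        time.sleep(1)
```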
  • the expression recognition model is an expression recognition model trained using a convolutional neural network.
  • A Convolutional Neural Network (CNN) is a multilayer neural network well suited to machine learning problems involving images, especially large images.
  • the basic structure of CNN includes two layers: a convolutional layer and a pooling layer.
  • In this embodiment, a 10-layer convolutional neural network is used. Deeper networks take longer to compute but recognize expressions more accurately, so a 10-layer network reaches the required precision while keeping training time short.
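  • The text only states that a 10-layer convolutional network is used, so the following PyTorch sketch (8 convolutional plus 2 fully connected learnable layers, five expression classes, 260x260 input) is one assumed realization rather than the patent's actual architecture:

```python
import torch
import torch.nn as nn

# Minimal sketch of a 10-layer CNN (8 conv + 2 fully connected layers);
# the exact architecture is an assumption.
class ExpressionCNN(nn.Module):
    def __init__(self, num_classes=5):  # five expression classes
        super().__init__()
        layers, in_ch = [], 3
        for out_ch in (32, 32, 64, 64, 128, 128, 256, 256):
            layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
            if out_ch != in_ch:          # downsample whenever channels widen
                layers.append(nn.MaxPool2d(2))
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)  # avoids hand-computed spatial sizes
        self.classifier = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(inplace=True), nn.Linear(128, num_classes))

    def forward(self, x):
        return self.classifier(self.pool(self.features(x)).flatten(1))

logits = ExpressionCNN()(torch.randn(1, 3, 260, 260))  # 260x260 input per the text
```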
  • the expression recognition model includes: an eye recognition model, a lip recognition model, and a head recognition model.
  • step S10 that is, the first target face image is regularly collected according to the first time interval, and the first target face image is input into the expression recognition model, and before the step of obtaining the first recognition result, as shown in FIG. 3
  • the classroom monitoring method provided in the embodiment of the present application further includes:
  • S61 Obtain a sample of a face image, and obtain an eye training image, a lip training image, and a head training image according to the sample of the face image.
  • the sample of the face image may be a face image of a student collected in advance.
  • The corresponding regions can be divided in advance from each face image as an eye region image, a lip region image, and a head region image; the collections of eye region images, lip region images, and head region images obtained from the sample set are then used as the eye training images, lip training images, and head training images, respectively.
  • the method for dividing the area may be preset according to requirements, and is not specifically limited in the embodiment of the present application.
  • the standard eye recognition result refers to a result recognized by the eye recognition model according to eye features in the eye training image.
  • The eyebrow features in the eye training images may be used as the eye features, with the standard defined on the basis of the five preset expressions of listening, doubt, understanding, resistance, and disdain.
  • For example, the eyebrow characteristics corresponding to the listening expression are that the eyebrows stretch naturally and that the angle formed at the midpoint of the eye by the two ends of the eyebrows is less than or equal to 120 degrees.
  • eyebrow features corresponding to five expressions of listening, doubting, understanding, resisting, and disdain are obtained.
  • the combination of the five corresponding eyebrow features and expressions can be used as the standard eye recognition result.
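  • As a hedged sketch of how such an eyebrow angle could be measured (the text does not specify the exact geometric construction), the angle at the eye midpoint subtended by the two eyebrow endpoints can be computed from landmark coordinates:

```python
import math

def eyebrow_angle(brow_left, brow_right, eye_mid):
    """Angle in degrees at the eye midpoint subtended by the two eyebrow
    endpoints; each point is an (x, y) tuple. Per the text, an angle of
    at most 120 degrees is consistent with the listening expression."""
    ax, ay = brow_left[0] - eye_mid[0], brow_left[1] - eye_mid[1]
    bx, by = brow_right[0] - eye_mid[0], brow_right[1] - eye_mid[1]
    cos_a = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

print(eyebrow_angle((30, 40), (90, 40), (60, 70)))  # -> 90.0 for this geometry
```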
  • S63 The lip training image is input into a lip recognition model to obtain a standard lip recognition result.
  • the standard lip recognition result refers to a result recognized by the lip recognition model according to lip features in the lip training image.
  • the mouth corner feature in the lip training image may be used as the lip feature.
  • For example, the corresponding mouth corner feature is the mouth corners moving downward, that is, the mouth corner feature line deviating negatively from its bisector line.
  • the standard definition is based on five preset expressions of listening, doubt, understanding, resistance, and disdain.
  • the classified lip training images are input to the lip recognition model.
  • the mouth corner features corresponding to the five expressions of listening, doubt, understanding, resistance, and disdain are obtained, and the combination of the five corresponding mouth corner features and expressions can be used as the standard lip recognition result.
  • S64 The head training image is input into a head recognition model to obtain a standard head recognition result.
  • the standard head recognition result refers to a result recognized by the head recognition model according to the head features in the head training image.
  • the head corner feature in the head training image may be used as the head feature.
  • For example, the head feature may be bowing or turning the head, with the head angle change greater than 60 degrees.
  • the standard definition is based on the five expressions of listening, doubt, understanding, resistance, and disdain.
  • The classified head training images are input into the head recognition model to obtain the head features corresponding to the five expressions of listening, doubt, understanding, resistance, and disdain.
  • The combination of the five corresponding head features and expressions can be used as the standard head recognition result.
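  • One possible (assumed) way to quantify such a head angle change is to compare head pose angles between consecutive frames, with the pose estimator itself left abstract:

```python
def head_angle_change(prev_pose, curr_pose):
    """Largest absolute per-axis change between two (yaw, pitch, roll)
    head poses in degrees; the pose estimator is assumed to exist
    upstream. Per the text, a change above 60 degrees suggests bowing
    or turning the head."""
    return max(abs(c - p) for p, c in zip(prev_pose, curr_pose))

assert head_angle_change((0, 0, 0), (70, 5, 0)) > 60  # a pronounced head turn
```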
  • S65 The standard eye recognition result, the standard lip recognition result, and the standard head recognition result are combined into a standard expression recognition result set.
  • the features corresponding to the standard eye recognition result, standard lip recognition result, and standard head recognition result are respectively composed into a standard expression recognition result set.
  • For example, for one expression in the set, the corresponding eye feature is that the angle formed at the midpoint of the eye by the two ends of the eyebrows is greater than 120 degrees; the corresponding lip feature is a negative deviation of the mouth corner feature line from its bisector; and the corresponding head feature is a head angle change between 0 and 60 degrees.
  • In this embodiment, eye training images, lip training images, and head training images are obtained from a sample set of face images. The eye training images are input into an eye recognition model to obtain the standard eye recognition result; the lip training images are input into a lip recognition model to obtain the standard lip recognition result; and the head training images are input into a head recognition model to obtain the standard head recognition result.
  • Finally, the standard eye recognition result, standard lip recognition result, and standard head recognition result are combined into a standard expression recognition result set, which supports expression recognition on subsequently collected target face images.
  • step S10 that is, the first target face image is periodically collected according to a first time interval, and specifically includes the following steps:
  • the original video data is video data collected by cameras in the classroom. If there are a plurality of cameras, the original video data of the corresponding cameras can be obtained according to the identifier corresponding to the first target face image.
  • the camera of the classroom is opened to collect video, and the original video data is acquired at a first time interval.
  • the framing processing refers to dividing the original video data according to a preset time to obtain at least one frame of a video image to be identified.
  • Normalization is a way of simplifying computation by scaling values into a uniform, dimensionless range.
  • A facial area of the first target is needed in order to extract the corresponding facial features, so the frames of the video image to be identified are normalized to uniform pixel dimensions, for example 260 x 260, to obtain the first target face image used in the subsequent recognition of each frame.
  • In this way, a first target face image is obtained that can be used in the subsequent face recognition steps.
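  • A minimal sketch of the framing and normalization steps, assuming OpenCV as the capture library (the text does not mandate any particular library) and a 260x260 target size, might be:

```python
import cv2

def sample_frames(video_path, interval_seconds=60.0, size=(260, 260)):
    """Split raw video into frames at a preset time step and normalize
    each kept frame to uniform 260x260 pixels, as described above."""
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS unknown
    step = max(1, int(fps * interval_seconds))
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(cv2.resize(frame, size))  # pixel normalization
        index += 1
    capture.release()
    return frames
```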
  • the standard expression recognition result set includes five expression results of listening, doubting, understanding, resisting, and disdain.
  • In step S10, the first target face image is input into the expression recognition model to obtain the first recognition result, which may specifically include the following steps:
  • S13 Use a face feature point detection algorithm to obtain a first target face feature point from a first target face image.
  • the first target face feature point refers to a feature coordinate point obtained from the first target face image according to a preset requirement.
  • the facial feature point detection algorithm refers to an algorithm that automatically locates facial feature points based on the input facial image.
  • the following facial feature point detection algorithms may be adopted to obtain facial feature point information:
  • OpenCV is a cross-platform computer vision library that can run on Linux, Windows, Android, and Mac OS operating systems. It consists of a series of C functions and a small number of C++ classes, and it also provides interfaces for languages such as Python, Ruby, and MATLAB.
  • The Viola-Jones algorithm based on Haar features is one facial feature point detection algorithm.
  • A Haar feature reflects gray-level changes in the image through differences between pixel sub-modules. Haar features are divided into three categories: edge features, linear features, and center-diagonal features.
  • The Viola-Jones algorithm is a method for face detection based on the Haar feature values of a face.
  • HOG refers to the Histogram of Oriented Gradients feature.
  • SVM refers to a Support Vector Machine, a common discriminative method usually used for pattern recognition, classification, and regression analysis. HOG features combined with SVM classifiers are widely used in image recognition.
  • DPM refers to the Deformable Part Model.
  • The HeadHunter and HeadHunter_baseline algorithms are the same in method as DPM; the difference is that the models used are different.
  • In this embodiment, the facial feature point model used is the Viola-Jones algorithm based on Haar features.
  • the eye area image, the lip area image, and the head area image are respectively obtained, specifically:
  • For the eye area image, the facial feature point detection algorithm first locates the left and right eye corners and the eyebrow center coordinates of the same eye area. The left and right eye corner abscissas are used as the left and right coordinates, the eyebrow center ordinate is used as the upper coordinate, and the eye corner ordinate plus the vertical distance from the eyebrow center to the eye corner is used as the lower coordinate. The rectangular area formed by these four coordinates is the eye area, and the eye area image is obtained by cropping it.
  • For the lip area image, the facial feature points are used to first locate the left and right mouth corners and the nose tip coordinates. The left and right mouth corner abscissas are used as the left and right coordinates, the nose tip ordinate is used as the upper coordinate, and the mouth corner ordinate plus the vertical distance from the nose tip to the mouth corner is used as the lower coordinate; the rectangular area formed by these four coordinates is the lip area. The lip area image is obtained by cropping the lip area.
  • the head area image may directly use the first target face image as the head area image.
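  • For illustration, the coordinate recipes above can be expressed directly in Python; the landmark points are assumed to come from the facial feature point detection algorithm, with image y coordinates increasing downward:

```python
def eye_region(eye_left, eye_right, brow_center):
    """Rectangle (x1, y1, x2, y2) for the eye area per the recipe above;
    points are (x, y) landmarks, with y increasing downward."""
    x1, x2 = eye_left[0], eye_right[0]           # left/right: eye corner abscissas
    y1 = brow_center[1]                          # top: eyebrow center ordinate
    brow_to_eye = eye_left[1] - brow_center[1]   # vertical brow-to-corner distance
    y2 = eye_left[1] + brow_to_eye               # bottom: corner ordinate + distance
    return x1, y1, x2, y2

def lip_region(mouth_left, mouth_right, nose_tip):
    """Rectangle (x1, y1, x2, y2) for the lip area, same convention."""
    x1, x2 = mouth_left[0], mouth_right[0]       # left/right: mouth corner abscissas
    y1 = nose_tip[1]                             # top: nose tip ordinate
    nose_to_mouth = mouth_left[1] - nose_tip[1]  # vertical nose-to-corner distance
    y2 = mouth_left[1] + nose_to_mouth           # bottom: corner ordinate + distance
    return x1, y1, x2, y2
```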
  • S15 Input the eye region image into the eye recognition model to obtain the test eye recognition result; input the lip region image into the lip recognition model to obtain the test lip recognition result; and input the head region image into the head recognition model to obtain the test head recognition result.
  • The test eye recognition result refers to the eye recognition result obtained from the target face image in the real-time video data; the test lip recognition result and the test head recognition result are defined analogously.
  • the eye region image is input into the eye recognition model, and according to the expression defined by the eye recognition model, the expression corresponding to the eye region image can be obtained, which is the test eye recognition result.
  • the lip area image is input to the lip recognition model to obtain the test lip recognition result;
  • the head area image is input to the head recognition model to obtain the test head recognition result.
  • S16 Match the test eye recognition result, test lip recognition result, and test head recognition result with the standard expression recognition result set, and use the standard expression recognition result whose matching degree is greater than a preset threshold as the output expression result. If the output expression result is listening, doubt, or understanding, the first recognition result is normal; if the output expression result is resistance or disdain, the first recognition result is abnormal.
  • the preset threshold is a preset threshold for the degree of matching between the test recognition result and the standard expression recognition result.
  • The preset threshold can be set according to the specific situation of the expression recognition model and is not specifically limited here. Understandably, the expression results corresponding to the test eye recognition result, test lip recognition result, and test head recognition result may be inconsistent with one another and may also differ from the standard expression recognition results. Therefore, the test recognition results are matched against each standard expression recognition result in the standard expression recognition result set, and if the degree of matching exceeds the preset threshold, the corresponding standard expression recognition result is used as the output expression result.
  • The test eye recognition result, test lip recognition result, and test head recognition result are matched with the standard expression recognition result set; if the overall matching degree with a standard expression recognition result in the set is greater than the preset threshold, that standard recognition result is used as the output expression result.
  • For example, if the preset threshold is 80% and a student's test eye recognition result, test lip recognition result, and test head recognition result, when matched against the standard expression recognition result set, are found to match the disdain expression with a degree exceeding the 80% threshold, then the standard expression recognition result of disdain is used as the output expression result, and the first recognition result is abnormal.
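  • A hedged sketch of this matching step follows; the similarity measure is an assumption, since the text only speaks of a "degree of matching", and the feature representations are left abstract:

```python
PRESET_THRESHOLD = 0.8  # the 80% example used in the text

def cosine(a, b):
    """Toy similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return dot / norm if norm else 0.0

def match_expression(test_results, standard_set):
    """Match the eye/lip/head test results against every standard
    expression and return the best label whose overall matching degree
    exceeds the preset threshold, else None."""
    best_label, best_score = None, 0.0
    for label, parts in standard_set.items():
        score = sum(cosine(test_results[p], parts[p])
                    for p in ("eye", "lip", "head")) / 3
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score > PRESET_THRESHOLD else None
```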
  • In this embodiment, a first target face feature point is obtained from the first target face image by using a face feature point detection algorithm, and the first target face image is then divided according to the first target face feature point to obtain the eye area image, lip area image, and head area image.
  • the test eye recognition result, test lip recognition result, and test head recognition result are obtained.
  • the test eye recognition result, test lip recognition result, and test head recognition result are matched with a standard expression recognition result set.
  • the standard expression recognition result with a matching degree greater than a preset threshold is used as the output expression result.
  • This refinement improves the accuracy of the expression recognition model and thus the efficiency of classroom monitoring.
  • the classroom monitoring method provided in the embodiment of the present application further includes:
  • the reference point position information refers to the position information of the teacher.
  • the position information of the teacher can be obtained by obtaining the GPS positioning information of the mobile phone of the teacher, or the teacher can be tracked and collected through the camera to locate the position information of the teacher.
  • Optionally, when the teacher is on the podium and has a clear view of the class, the third target face image is not collected; when the teacher walks off the podium, a blind spot appears, or the teacher leaves the classroom, the third target face image is collected.
  • a position information mapping table may be established in advance, and according to the position information mapping table, a corresponding third target face image to be collected may be acquired from the position information mapping table according to the real-time position information of the teacher. For example, when the teacher is standing on the left side of the classroom, the student on the right side of the classroom is taken as the object that needs to be collected for the third target face image; when the teacher is standing on the right side of the classroom, the student on the left side of the classroom is taken Objects that need to be collected for the third target face image.
  • the setting of the specific location information mapping table may be set according to actual needs and scenarios, and is not limited herein.
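  • As an illustrative assumption of what such a position information mapping table could look like (the zone and region names are invented for the sketch):

```python
# Hypothetical position information mapping table: the teacher's current
# zone maps to the student regions whose third target face images should
# be collected, mirroring the left/right example above.
POSITION_MAP = {
    "left":   ["right_half"],
    "right":  ["left_half"],
    "podium": [],                         # full view: nothing extra to collect
    "absent": ["left_half", "right_half"],
}

def regions_to_collect(teacher_zone):
    """Look up which student regions need a third target face image
    given the teacher's real-time position."""
    return POSITION_MAP.get(teacher_zone, [])
```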
  • the collected third target face image is input into an expression recognition model to obtain a third recognition result. If the third recognition result is abnormal, a third prompt message is sent to the monitoring end. This identification and message sending process is similar to the corresponding steps of the foregoing embodiment, and is not repeated here.
  • Further, a third target face image whose third recognition result is abnormal may be added to the focus attention queue, where the focus attention queue includes attention person identifiers. It should be understood that adding a third target face image whose third recognition result is abnormal to the focus attention queue is a process parallel to adding a face image whose recognition result is abnormal to the focus attention queue in the above embodiment.
  • a face image whose third recognition result is abnormal may also be added to the focus attention queue to implement the update of the focus attention queue and more specifically monitor the classroom effect.
  • In this embodiment, by acquiring the third target face image according to the reference point position information, the amount of video data can be reduced. By inputting the third target face image into the expression recognition model to obtain a third recognition result and sending a third prompt message to the monitoring end when that result is abnormal, the class status of students can be obtained in a more targeted way, distracted students can be found, and the amount of data collection and recognition computation is reduced, improving the efficiency of classroom monitoring.
  • a classroom monitoring device corresponds to the classroom monitoring method in the foregoing embodiment.
  • the classroom monitoring apparatus includes a first recognition result obtaining module 10, a first prompt message sending module 20, a focus attention queue adding module 30, a second recognition result obtaining module 40, and a second prompt message sending module 50.
  • the detailed description of each function module is as follows:
  • a first recognition result acquisition module 10 is configured to periodically collect a first target face image according to a first time interval, and input the first target face image into an expression recognition model to obtain a first recognition result.
  • the first prompt message sending module 20 is configured to send a first prompt message to the monitoring end if the first identification result is abnormal.
  • The focus attention queue adding module 30 is configured to add a first target face image whose first recognition result is abnormal to the focus attention queue, where the focus attention queue includes attention person identifiers.
  • A second recognition result acquisition module 40 is configured to regularly collect a second target face image according to the second time interval and the attention person identifiers in the focus attention queue, and input the second target face image into the expression recognition model to obtain a second recognition result.
  • the second prompt message sending module 50 is configured to send a second prompt message to the monitoring end if the second identification result is abnormal.
  • the expression recognition model is an expression recognition model trained using a convolutional neural network, and includes: an eye recognition model, a lip recognition model, and a head recognition model.
  • the classroom monitoring device further includes a standard result set acquisition module.
  • The standard result set acquisition module includes a training image acquisition unit, a standard eye recognition result acquisition unit, a standard lip recognition result acquisition unit, a standard head recognition result acquisition unit, and a standard result set acquisition unit.
  • a training image obtaining unit is configured to obtain a sample of a face image, and obtain an eye training image, a lip training image, and a head training image according to the sample of the face image.
  • a standard eye recognition result acquisition unit is configured to input an eye training image into an eye recognition model to obtain a standard eye recognition result.
  • a standard lip recognition result acquisition unit is used to input a lip training image into a lip recognition model to obtain a standard lip recognition result.
  • a standard head recognition result acquisition unit is configured to input a head training image into a head recognition model to obtain a standard head recognition result.
  • the standard result set acquisition unit is configured to combine a standard eye recognition result, a standard lip recognition result, and a standard head recognition result into a standard expression recognition result set.
  • the standard expression recognition result set includes five expression results of listening, doubt, understanding, resistance, and disdain.
  • the first recognition result acquisition module 10 includes a video data acquisition unit and a first target face image acquisition unit.
  • the video data acquiring unit is configured to acquire the original video data at a timing according to the first time interval.
  • the first target face image acquisition unit is configured to perform frame framing and normalization processing on the original video data to obtain a first target face image.
  • the first recognition result acquisition module 10 further includes a facial feature point acquisition unit 13, an area image acquisition unit 14, a test recognition result acquisition unit 15, and a first recognition result unit 16.
  • the facial feature point acquiring unit 13 is configured to acquire a first target facial feature point from a first target facial image by using a facial feature point detection algorithm.
  • An area image acquisition unit 14 is configured to divide a first target face image according to a first target face feature point to obtain an eye area image, a lip area image, and a head area image.
  • the test recognition result acquisition unit 15 is configured to input an eye area image into an eye recognition model to obtain a test eye recognition result; input a lip area image into a lip recognition model to obtain a test lip recognition result; The head region image is input into the head recognition model, and the test head recognition result is obtained.
  • The first recognition result obtaining unit 16 is configured to match the test eye recognition result, test lip recognition result, and test head recognition result with the standard expression recognition result set, and to use the standard expression recognition result whose matching degree is greater than a preset threshold as the output expression result. If the output expression result is listening, doubt, or understanding, the first recognition result is normal; if the output expression result is resistance or disdain, the first recognition result is abnormal.
  • the classroom monitoring device further includes a reference point position acquisition module, a third recognition result acquisition module, and a third prompt message sending module.
  • the reference point position acquisition module is configured to acquire the reference point position information in real time, and acquire a third target face image according to the reference point position information.
  • a third recognition result acquisition module is configured to input a third target face image into an expression recognition model to obtain a third recognition result.
  • the third prompt message sending module is configured to send a third prompt message to the monitoring end if the third identification result is abnormal.
  • Each module in the above-mentioned classroom monitoring device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • The above-mentioned modules may be embedded in hardware in, or be independent of, the processor in the computer device, or may be stored in software in the memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 8.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for operating the operating system and computer-readable instructions in a non-volatile storage medium.
  • the database of the computer device is used to store the original video data, the first target face image, the second target face image, the third target face image, the focus attention queue, and the prompt message.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by a processor to implement a classroom monitoring method.
  • a computer device including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor.
  • the processor executes the computer-readable instructions, the following steps are implemented:
  • one or more non-volatile readable storage media storing computer-readable instructions are provided, and when the computer-readable instructions are executed by one or more processors, the one or more Each processor performs the following steps:
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

A class monitoring method and apparatus, a computer device, and a storage medium. The class monitoring method comprises: regularly collecting a first target human face image according to a first time interval, and inputting the first target human face image into an expression recognition model to obtain a first recognition result (S10); if the first recognition result is abnormal, sending a first prompt message to a monitoring terminal (S20); adding the first target human face image with the abnormal first recognition result to a focus queue, wherein the focus queue comprises a focus person identifier (S30); regularly collecting a second target human face image according to a second time interval and the focus person identifier in the focus queue, and inputting the second target human face image into the expression recognition model to obtain a second recognition result (S40); and if the second recognition result is abnormal, sending a second prompt message to the monitoring terminal (S50). According to the class monitoring method, the amount of computation of data collection and recognition is reduced while ensuring the monitoring of a class condition, thereby improving overall efficiency.

Description

Classroom monitoring method, apparatus, computer device, and storage medium
This application is based on, and claims priority from, Chinese invention patent application No. 201810870693.8, filed on August 2, 2018 and entitled "Classroom Monitoring Method, Apparatus, Computer Equipment, and Storage Medium".
Technical Field
The present application belongs to the field of image processing, and more specifically relates to a classroom monitoring method, apparatus, computer device, and storage medium.
Background
In the teaching classroom, the teacher may fail to notice that some students' attention is wandering; students who do not listen carefully may miss important knowledge points. With the rapid development of image recognition technology, some existing approaches attempt to use expression recognition to capture students' expressions in the classroom in real time: a camera collects images, the collected images are analyzed, and if multiple faces are present, the facial expression and movements of each face are analyzed to judge whether the student is currently distracted. However, because a class lasts a long time and the number of students is large, continuous data collection and recognition produce an excessive amount of data, resulting in low overall efficiency.
Summary
The embodiments of the present application provide a classroom monitoring method, apparatus, computer device, and storage medium to solve the technical problem of low efficiency in identifying students' class status caused by an excessive amount of data.
A classroom monitoring method includes:
regularly collecting a first target face image according to a first time interval, and inputting the first target face image into an expression recognition model to obtain a first recognition result;
if the first recognition result is abnormal, sending a first prompt message to a monitoring end;
adding the first target face image whose first recognition result is abnormal to a focus attention queue, where the focus attention queue includes attention person identifiers;
regularly collecting a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and inputting the second target face image into the expression recognition model to obtain a second recognition result;
if the second recognition result is abnormal, sending a second prompt message to the monitoring end.
A classroom monitoring apparatus includes:
a first recognition result acquisition module, configured to regularly collect a first target face image according to a first time interval and input the first target face image into an expression recognition model to obtain a first recognition result;
a first prompt message sending module, configured to send a first prompt message to a monitoring end if the first recognition result is abnormal;
a focus attention queue adding module, configured to add the first target face image whose first recognition result is abnormal to a focus attention queue, where the focus attention queue includes attention person identifiers;
a second recognition result acquisition module, configured to regularly collect a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and input the second target face image into the expression recognition model to obtain a second recognition result;
a second prompt message sending module, configured to send a second prompt message to the monitoring end if the second recognition result is abnormal.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the following steps are implemented:
regularly collecting a first target face image according to a first time interval, and inputting the first target face image into an expression recognition model to obtain a first recognition result;
if the first recognition result is abnormal, sending a first prompt message to a monitoring end;
adding the first target face image whose first recognition result is abnormal to a focus attention queue, where the focus attention queue includes attention person identifiers;
regularly collecting a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and inputting the second target face image into the expression recognition model to obtain a second recognition result;
if the second recognition result is abnormal, sending a second prompt message to the monitoring end.
One or more non-volatile readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
regularly collecting a first target face image according to a first time interval, and inputting the first target face image into an expression recognition model to obtain a first recognition result;
if the first recognition result is abnormal, sending a first prompt message to a monitoring end;
adding the first target face image whose first recognition result is abnormal to a focus attention queue, where the focus attention queue includes attention person identifiers;
regularly collecting a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and inputting the second target face image into the expression recognition model to obtain a second recognition result;
if the second recognition result is abnormal, sending a second prompt message to the monitoring end.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below; other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an application environment of the classroom monitoring method in an embodiment of the present application;
FIG. 2 is a flowchart of the classroom monitoring method in an embodiment of the present application;
FIG. 3 is another flowchart of the classroom monitoring method in an embodiment of the present application;
FIG. 4 is another flowchart of the classroom monitoring method in an embodiment of the present application;
FIG. 5 is another flowchart of the classroom monitoring method in an embodiment of the present application;
FIG. 6 is a schematic block diagram of the classroom monitoring apparatus in an embodiment of the present application;
FIG. 7 is a schematic block diagram of the first recognition result acquisition module in the classroom monitoring apparatus in an embodiment of the present application;
FIG. 8 is a schematic diagram of a computer device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
The classroom monitoring method provided by this application can be applied in the application environment shown in FIG. 1, in which the monitoring end communicates with the server through a network. The server collects a first target face image through a camera at a first time interval and inputs it into an expression recognition model to obtain a first recognition result. If the first recognition result is abnormal, the server sends a first prompt message to the monitoring end and adds the abnormal first target face image to a focus attention queue, where the focus attention queue includes attention person identifiers. The server then collects a second target face image through the camera according to a second time interval and the attention person identifiers in the focus attention queue, and inputs it into the expression recognition model to obtain a second recognition result; if the second recognition result is abnormal, the server sends a second prompt message to the monitoring end. The monitoring end may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device. The server can be implemented as an independent server or as a server cluster composed of multiple servers.
在一实施例中,如图2所示,提供一种课堂监控方法,以该方法应用在图1中的服务端为例进行说明,包括如下步骤:In one embodiment, as shown in FIG. 2, a classroom monitoring method is provided. The method is applied to the server in FIG. 1 as an example, and includes the following steps:
S10:根据第一时间间隔定时采集第一目标人脸图像,将第一目标人脸图像输入到表情识别模型中,得到第一识别结果。S10: Collect a first target face image according to a first time interval, input the first target face image into an expression recognition model, and obtain a first recognition result.
其中,第一时间间隔是一个预设的时间段,具体可以根据实际需要设置,例如5分钟、8分钟或者10分钟。而第一目标人脸图像是指整个班所有学生的人脸图像。可以通过在教室中设置多个摄像头,每一摄像头采集一固定区域中的学生的人脸图像,以实现对第一目标人脸图像的采集。可以理解地,摄像头个数越多,采集精度就越高。可选地,摄像头采集的是视频数据,而人脸图像为视频数据通过预定的帧率进行分帧之后获得。The first time interval is a preset time period, and can be specifically set according to actual needs, such as 5 minutes, 8 minutes, or 10 minutes. The first target face image refers to the face images of all students in the entire class. A plurality of cameras can be set in the classroom, and each camera collects a face image of a student in a fixed area, so as to realize the collection of the first target face image. Understandably, the greater the number of cameras, the higher the acquisition accuracy. Optionally, the camera collects video data, and the face image is obtained after the video data is framed by a predetermined frame rate.
The expression recognition model is a recognition model for judging the emotion of a face in the current image. The expression recognition model can compute the probability that the face in the current image corresponds to each of several preset emotions; if the probability of a certain emotion exceeds its corresponding preset threshold, that emotion is taken as the first recognition result for the face image. For example, in this embodiment, for a classroom scenario, the emotions in the expression recognition model can be set to five types: listening, doubt, understanding, resistance, and disdain. Specifically, a large number of sample images representing these five emotions can be collected and annotated in advance to form a sample image set, and a corresponding neural network model or classifier is then selected and trained to finally obtain the expression recognition model.
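By way of illustration only, the per-emotion probability thresholding described above might be sketched as follows in Python; the helper predict_probabilities and the threshold values are assumptions introduced here and are not part of the application:

    # Minimal sketch of per-emotion probability thresholding. The function
    # `predict_probabilities` is a hypothetical stand-in for the trained
    # model's forward pass; the 0.6 thresholds are illustrative only.
    EMOTIONS = ["listening", "doubt", "understanding", "resistance", "disdain"]
    THRESHOLDS = {e: 0.6 for e in EMOTIONS}

    def recognize_emotion(face_image, predict_probabilities):
        """Return the most likely emotion if it clears its threshold."""
        probs = predict_probabilities(face_image)  # dict: emotion -> probability
        best = max(EMOTIONS, key=lambda e: probs[e])
        if probs[best] >= THRESHOLDS[best]:
            return best
        return None  # no emotion is confident enough; caller may skip this frame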
S20: If the first recognition result is abnormal, send a first prompt message to the monitoring end.
After the expression recognition model outputs the first recognition result, the first prompt message is sent according to the output first recognition result.
The expression recognition model determines the expression corresponding to the facial area in the first target face image, for example listening, doubt, understanding, resistance, or disdain. Optionally, when the expression recognition model determines that the emotion corresponding to the facial area in the first target face image is resistance or disdain, the corresponding output first recognition result is abnormal; when the model determines that the emotion is listening, doubt, or understanding, the corresponding output first recognition result is normal. If the first recognition result is abnormal, a first prompt message is sent to the monitoring end, indicating that the student may be distracted. Optionally, the monitoring end is the mobile phone, computer, or other communication device of the teacher, head teacher, or other relevant personnel, so that they can learn the student's in-class state and respond with an evaluation or intervention accordingly.
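The mapping from a recognized emotion to a normal/abnormal recognition result can be expressed as a small helper; the sketch below uses the emotion labels above and is illustrative rather than the application's exact code:

    # Sketch: map the recognized emotion to a recognition result.
    ABNORMAL_EMOTIONS = {"resistance", "disdain"}

    def classify_result(emotion):
        """Resistance/disdain count as abnormal; the rest as normal."""
        return "abnormal" if emotion in ABNORMAL_EMOTIONS else "normal"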
S30: Add the first target face image whose first recognition result is abnormal to a focus attention queue, where the focus attention queue includes attention person identifiers.
The focus attention queue refers to a queue composed of those students in the class who may be distracted and therefore require focused attention. Specifically, the queue can be represented by attention person identifiers. An attention person identifier is an identifier used to distinguish a student who requires focused attention, for example a student number, an ID card number, or a seat number.
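One possible in-memory representation of such a queue is sketched below; keying it by student identifier and using a set are assumptions made here for illustration:

    # Sketch: a focus attention queue keyed by attention person identifiers.
    # A set keeps membership checks and removals O(1); the identifier format
    # (e.g. student number or seat number) is illustrative.
    focus_queue = set()

    def add_to_focus_queue(person_id):
        focus_queue.add(person_id)

    def release_from_focus_queue(person_id):
        focus_queue.discard(person_id)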
S40: Periodically collect a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and input the second target face image into the expression recognition model to obtain a second recognition result.
The second time interval is also a preset time period that can be set according to actual needs, for example 1 minute, 3 minutes, or 5 minutes. Preferably, the second time interval is shorter than the first time interval. The second target face image refers to the face image of the student corresponding to an attention person identifier. Specifically, the corresponding camera can be located through the attention person identifier, and the second target face image is obtained from that camera. The method of collecting the second target face image is similar to that of collecting the first target face image in step S10 and is not repeated here. Likewise, the process of inputting the second target face image into the expression recognition model to obtain the second recognition result is similar to the corresponding part of step S10 and is not repeated here.
S50: If the second recognition result is abnormal, send a second prompt message to the monitoring end.
After the expression recognition model outputs the second recognition result, the second prompt message is sent according to the output second recognition result. The expression recognition model determines the emotion corresponding to the facial area in the second target face image, for example listening, doubt, understanding, resistance, or disdain. Optionally, when the model determines that the emotion is resistance or disdain, the corresponding output second recognition result is abnormal; when the emotion is listening, doubt, or understanding, the corresponding output second recognition result is normal. If the second recognition result is abnormal, a second prompt message is sent to the monitoring end, indicating that the student may be distracted.
Further, the second target face image whose second recognition result is abnormal is added to the focus attention queue. In one embodiment, while the second target face image whose second recognition result is abnormal is added to the focus attention queue, the second target face image whose second recognition result is normal is released from the focus attention queue. Understandably, a student whose first recognition result is abnormal may be momentarily inattentive but soon return to a normal state, which is acceptable. By updating the focus attention queue, the computational load of data collection and recognition can be effectively reduced and classroom monitoring can be made more targeted.
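Continuing the queue sketch above, the update performed after the second recognition might read as follows (again an illustrative sketch, not the application's code):

    # Sketch: update the focus attention queue after the second recognition.
    def update_focus_queue(person_id, second_result):
        if second_result == "abnormal":
            add_to_focus_queue(person_id)        # keep watching this student
        else:
            release_from_focus_queue(person_id)  # back to normal; stop tracking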
In the embodiment corresponding to FIG. 2, a first target face image is periodically collected according to a first time interval and input into an expression recognition model to obtain a first recognition result; if the first recognition result is abnormal, a first prompt message is sent to the monitoring end. The focus attention queue is then adjusted according to the first recognition result, a second target face image is periodically collected according to the attention person identifiers in the focus attention queue and a second time interval and input into the expression recognition model to obtain a second recognition result, and if the second recognition result is abnormal, a second prompt message is sent to the monitoring end. By collecting images at set time intervals and establishing a focus attention queue, classroom monitoring can be carried out in a targeted manner: the classroom situation is monitored well while the computational load of data collection and recognition is reduced, improving overall efficiency.
In one embodiment, the expression recognition model is an expression recognition model trained using a convolutional neural network. A convolutional neural network (CNN) is a multilayer neural network that excels at machine learning problems involving images, especially large images. The basic structure of a CNN includes two kinds of layers: convolutional layers and pooling layers. Optionally, a 10-layer convolutional neural network is used: the more layers a neural network has, the longer the computation takes and the finer the distinctions in expression recognition, and a 10-layer convolutional neural network can reach the required training accuracy in a relatively short time. In this embodiment, the expression recognition model includes an eye recognition model, a lip recognition model, and a head recognition model. Before step S10, that is, before periodically collecting the first target face image according to the first time interval and inputting the first target face image into the expression recognition model to obtain the first recognition result, as shown in FIG. 3, the classroom monitoring method provided in this embodiment of the present application further includes:
S61: Obtain samples of face images, and obtain eye training images, lip training images, and head training images from the face image samples.
The face image samples may be face images of students collected in advance.
Specifically, corresponding regions can be pre-divided from each face image as an eye region image, a lip region image, and a head region image, and the sets of eye region images, lip region images, and head region images obtained from the sample set are then used as the eye training images, lip training images, and head training images, respectively. The method of dividing the regions can be preset as needed and is not specifically limited in this embodiment of the present application.
S62: Input the eye training images into the eye recognition model to obtain standard eye recognition results.
The standard eye recognition result refers to the result recognized by the eye recognition model from the eye features in the eye training images. Optionally, the eyebrow features in the eye training images can be used as the eye features, with standard definitions given for the five preset expressions of listening, doubt, understanding, resistance, and disdain. For example, the eyebrow feature corresponding to the listening expression should be that the eyebrows extend naturally and the angle β between the two ends of the eyebrow and the midpoint of the eye is less than or equal to 120 degrees.
Specifically, the eye training images are classified according to eyebrow features, and the classified eye training images are input into the eye recognition model to obtain the eyebrow features corresponding to the five expressions of listening, doubt, understanding, resistance, and disdain. The combination of these five corresponding eyebrow features and expressions can then serve as the standard eye recognition results.
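The eyebrow angle β used above can be computed from landmark coordinates. The sketch below assumes 2-D (x, y) landmark points for the two eyebrow ends and the eye midpoint, an illustrative choice rather than the application's exact geometry:

    import math

    # Sketch: angle beta at the eye midpoint, formed by the two eyebrow ends.
    # Landmarks are assumed to be (x, y) tuples from a feature point detector.
    def eyebrow_angle(brow_left, brow_right, eye_mid):
        v1 = (brow_left[0] - eye_mid[0], brow_left[1] - eye_mid[1])
        v2 = (brow_right[0] - eye_mid[0], brow_right[1] - eye_mid[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        norm = math.hypot(*v1) * math.hypot(*v2)
        return math.degrees(math.acos(dot / norm))

    # Per the definition above, a "listening" eyebrow satisfies beta <= 120 degrees.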
S63: Input the lip training images into the lip recognition model to obtain standard lip recognition results.
The standard lip recognition result refers to the result recognized by the lip recognition model from the lip features in the lip training images. Optionally, the mouth corner features in the lip training images can be used as the lip features. For example, when the expression is doubt, the corresponding mouth corner feature should be that the corners of the mouth move downward, that is, the mouth corner feature line shows a negative deviation from its bisector.
Specifically, standard definitions are given for the five preset expressions of listening, doubt, understanding, resistance, and disdain. The lip training images are classified according to mouth corner features, and the classified lip training images are input into the lip recognition model to obtain the mouth corner features corresponding to the five expressions. The combination of these five corresponding mouth corner features and expressions can then serve as the standard lip recognition results.
S64: Input the head training images into the head recognition model to obtain standard head recognition results.
The standard head recognition result refers to the result recognized by the head recognition model from the head features in the head training images. Optionally, the head rotation features in the head training images can be used as the head features. For example, when the expression is resistance, the head rotation feature should be a lowered or turned head, with the head angle change α greater than 60 degrees.
Specifically, standard definitions are given for the five expressions of listening, doubt, understanding, resistance, and disdain. The head training images are classified according to head rotation features, and the classified head training images are input into the head recognition model to obtain the head rotation features corresponding to the five expressions. The combination of these five corresponding head rotation features and expressions can then serve as the standard head recognition results.
S65: Combine the standard eye recognition results, standard lip recognition results, and standard head recognition results into a standard expression recognition result set.
Specifically, the features corresponding to the standard eye recognition results, standard lip recognition results, and standard head recognition results are combined into the standard expression recognition result set. For example, when the expression is doubt, the corresponding eye feature is that the angle β between the two ends of the eyebrow and the midpoint of the eye is greater than 120 degrees, the corresponding lip feature is that the mouth corner feature line shows a negative deviation from its bisector, and the corresponding head feature is that the head angle change is between 0 and 60 degrees. Combining all the standard eye recognition results, standard lip recognition results, and standard head recognition results forms the standard expression recognition result set.
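Such a result set can be represented as a simple lookup table. The entry below encodes only the "doubt" example just given; the field names and the predicate-style encoding are assumptions introduced for illustration:

    # Sketch: one entry of a standard expression recognition result set,
    # encoding the "doubt" example above. Field names are illustrative.
    STANDARD_RESULTS = {
        "doubt": {
            "eyebrow_angle_deg": lambda beta: beta > 120,
            "mouth_corner_deviation": lambda d: d < 0,       # negative deviation
            "head_angle_change_deg": lambda a: 0 <= a <= 60,
        },
    }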
In the embodiment corresponding to FIG. 3, eye training images, lip training images, and head training images are obtained from the sample set of face images; the eye training images are input into the eye recognition model to obtain standard eye recognition results, the lip training images are input into the lip recognition model to obtain standard lip recognition results, and the head training images are input into the head recognition model to obtain standard head recognition results; finally, the standard eye recognition results, standard lip recognition results, and standard head recognition results are combined into a standard expression recognition result set, which can support expression recognition on subsequently collected target face images.
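For reference, the small convolutional classifier mentioned at the start of this embodiment might be sketched as below. This PyTorch definition is an assumption introduced here for illustration (the application does not fix layer counts or sizes) and only shows the alternation of convolutional and pooling layers feeding a five-class output:

    import torch.nn as nn

    # Illustrative sketch of a small CNN classifier for the five expression
    # classes (listening, doubt, understanding, resistance, disdain).
    class ExpressionCNN(nn.Module):
        def __init__(self, num_classes=5):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.LazyLinear(num_classes),  # infers the flattened size at first use
            )

        def forward(self, x):
            return self.classifier(self.features(x))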
In one embodiment, as shown in FIG. 4, step S10 of periodically collecting the first target face image according to the first time interval may specifically include the following steps:
S11: Periodically obtain raw video data according to the first time interval.
The raw video data is the video data collected by the cameras in the classroom. If there are multiple cameras, the raw video data of the corresponding camera can be obtained according to the identifier corresponding to the first target face image.
Specifically, the classroom cameras are turned on to capture video, and the raw video data is obtained at the first time interval.
S12: Split the raw video data into frames and normalize them to obtain the first target face image.
Specifically, frame splitting refers to dividing the raw video data according to a preset time to obtain at least one frame of the video image to be recognized. Normalization is a way of simplifying computation, transforming a dimensional expression into a dimensionless one. For example, in the raw video data of this embodiment, the facial area of the first target is needed in order to extract the corresponding expression features, so the pixels of the framed video images to be recognized need to be normalized to a uniform size, for example 260 x 260, to obtain the first target face image for subsequent recognition of each frame of the video image to be recognized.
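A minimal sketch of this framing-and-resizing step using OpenCV is shown below; the sampling stride and the video path are assumptions made here for illustration:

    import cv2

    # Sketch: split raw video into frames and normalize each frame to 260x260.
    # The stride (keep every Nth frame) stands in for the predetermined frame rate.
    def extract_frames(video_path, stride=30, size=(260, 260)):
        cap = cv2.VideoCapture(video_path)
        frames, index = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % stride == 0:
                frames.append(cv2.resize(frame, size))
            index += 1
        cap.release()
        return frames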
In the embodiment corresponding to FIG. 4, the raw video data is periodically obtained according to the first time interval and then split into frames and normalized to obtain the first target face image, which supports the subsequent recognition of the expressions corresponding to the face images.
In one embodiment, the standard expression recognition result set includes five expression results: listening, doubt, understanding, resistance, and disdain. As shown in FIG. 5, step S10 of inputting the first target face image into the expression recognition model to obtain the first recognition result may specifically include the following steps:
S13: Use a face feature point detection algorithm to obtain first target face feature points from the first target face image.
The first target face feature points refer to the feature coordinate points obtained from the first target face image according to preset requirements. A face feature point detection algorithm is an algorithm that automatically locates face feature points from an input face image. Optionally, any of the following face feature point detection algorithms can be used to obtain face feature point information:
(1) The Viola-Jones algorithm based on Haar features, provided with OpenCV;
OpenCV is a cross-platform computer vision library that can run on the Linux, Windows, Android, and Mac OS operating systems. It consists of a series of C functions and a small number of C++ classes, and also provides interfaces for languages such as Python, Ruby, and MATLAB. It implements many general algorithms in image processing and computer vision, and the Viola-Jones algorithm based on Haar features is one such face feature point detection algorithm. A Haar feature reflects the gray-level changes in an image, expressing the differences between pixel blocks. Haar features fall into three categories: edge features, linear features, and center-diagonal features. The Viola-Jones algorithm performs face detection based on the Haar feature values of a face.
(2) dlib based on HOG + SVM features;
dlib is a modern C++ toolbox containing machine learning algorithms and tools for creating complex software in C++ to solve real-world problems. HOG refers to the Histogram of Oriented Gradients, and SVM (Support Vector Machine) refers to the support vector machine, a common discriminative method usually used for pattern recognition, classification, and regression analysis. HOG features combined with SVM classifiers are widely used in image recognition.
(3) The three face detection methods of the doppia library (DPM, HeadHunter, and HeadHunter_baseline).
DPM (Deformable Part Model) is an object detection algorithm that has become an important component of many classifiers and of segmentation, human pose estimation, and behavior classification. DPM can be seen as an extension of HOG: the gradient orientation histogram is computed first, an SVM is then trained to obtain a target gradient model, and classification is performed so that the model matches the target. The HeadHunter and HeadHunter_baseline algorithms are methodologically the same as DPM; the difference lies in the models used.
The following uses face feature point detection algorithm (1) as an example to explain the process of obtaining the first target face feature points. First, sample images of input face images are obtained, preprocessed (normalized), and used for training to obtain a face feature point model, namely the Viola-Jones algorithm with Haar features. The input first target face image is then obtained and given the same preprocessing, followed in turn by the steps of skin color region segmentation, face feature region segmentation, and face feature region classification. Finally, matching is computed between the Haar-feature Viola-Jones model and the face feature region classes to obtain the first target face feature points.
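For reference, a minimal OpenCV-based detection step in this spirit might look like the sketch below. It uses the stock frontal-face Haar cascade shipped with OpenCV and only finds face rectangles, leaving finer landmark localization to a later stage; the detectMultiScale parameters are common defaults, not values fixed by the application:

    import cv2

    # Sketch: Haar-cascade face detection with OpenCV (Viola-Jones). This
    # yields face bounding boxes; landmark localization would follow.
    def detect_faces(image_bgr):
        cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
        detector = cv2.CascadeClassifier(cascade_path)
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        return detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)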
S14: Divide the first target face image according to the first target face feature points to obtain an eye region image, a lip region image, and a head region image.
The eye region image, lip region image, and head region image are obtained separately according to the acquired first target face feature points. Specifically:
For the eye region image, a face feature point detection algorithm can first locate the position coordinates of the left and right eye corners and the eyebrow center within the same eye area. The horizontal coordinates of the left and right eye corners are taken as the left and right coordinates, the vertical coordinate of the eyebrow center as the upper coordinate, and the vertical coordinate of the eye corner plus the vertical distance from the eyebrow center to the eye corner as the lower coordinate. The rectangular area formed by these four coordinates is the eye area, and capturing it yields the eye region image.
For the lip region image, face feature points can first locate the position coordinates of the left and right mouth corners and the nose tip. The horizontal coordinates of the left and right mouth corners are taken as the left and right coordinates, the vertical coordinate of the nose tip as the upper coordinate, and the vertical coordinate of the mouth corner plus the vertical distance from the nose tip to the mouth corner as the lower coordinate. The rectangular area formed by these four coordinates is the lip area, and capturing it yields the lip region image.
For the head region image, the first target face image itself can be used directly.
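These rectangle constructions can be written down directly. In the sketch below the landmarks are assumed to be integer (x, y) pixel coordinates with the y axis pointing down, as is usual for image arrays, and the helper names are illustrative:

    # Sketch: crop the eye and lip regions from landmark coordinates.
    # Image coordinates: x grows rightward, y grows downward (NumPy order).
    def crop_eye_region(image, eye_left, eye_right, brow_center):
        top = brow_center[1]
        bottom = eye_left[1] + (eye_left[1] - brow_center[1])  # eye y + brow-to-eye gap
        return image[top:bottom, eye_left[0]:eye_right[0]]

    def crop_lip_region(image, mouth_left, mouth_right, nose_tip):
        top = nose_tip[1]
        bottom = mouth_left[1] + (mouth_left[1] - nose_tip[1])  # mouth y + nose-to-mouth gap
        return image[top:bottom, mouth_left[0]:mouth_right[0]]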
S15: Input the eye region image into the eye recognition model to obtain a test eye recognition result; input the lip region image into the lip recognition model to obtain a test lip recognition result; input the head region image into the head recognition model to obtain a test head recognition result.
The test eye recognition result refers to the eye recognition result corresponding to the target face image obtained from the real-time video data, and likewise for the test lip recognition result and the test head recognition result.
Specifically, the eye region image is input into the eye recognition model, and according to the expressions defined by the eye recognition model, the expression corresponding to the eye region image is obtained; this is the test eye recognition result. Similarly, inputting the lip region image into the lip recognition model yields the test lip recognition result, and inputting the head region image into the head recognition model yields the test head recognition result.
S16: Match the test eye recognition result, the test lip recognition result, and the test head recognition result against the standard expression recognition result set, and take the standard expression recognition result whose matching degree exceeds a preset threshold as the output expression result. If the output expression result is listening, doubt, or understanding, the first recognition result is normal; if the output expression result is resistance or disdain, the first recognition result is abnormal.
The preset threshold is a preset threshold on the degree of matching between the test recognition results and a standard expression recognition result. It can be set according to the specific circumstances of the expression recognition model and is not specifically limited here. Understandably, the expressions corresponding to the test eye recognition result, the test lip recognition result, and the test head recognition result may be inconsistent with one another, and they may also differ from the standard expression recognition results. Therefore, the test recognition results are matched against each standard expression recognition result in the set, and if the matching degree exceeds the preset threshold, the corresponding standard expression recognition result is taken as the output expression result.
Specifically, the test eye recognition result, test lip recognition result, and test head recognition result are matched against the standard expression recognition result set, and if the overall match with a certain standard expression recognition result in the set exceeds the preset threshold, that standard recognition result is taken as the output expression result. Optionally, when the output expression result is listening, doubt, or understanding, the first recognition result is normal; when the output expression result is resistance or disdain, the first recognition result is abnormal. For example, if the preset threshold is 80% and, after matching a student's test eye, lip, and head recognition results against the standard expression recognition result set, the match with the disdain expression in the set exceeds the 80% threshold, the disdain standard expression recognition result is taken as the output expression result, and the first recognition result is abnormal.
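One simple way to realize this matching is to score each candidate expression by the fraction of the three part-level results that agree with it; this scoring rule is an assumption made here for illustration, since the application does not fix a particular matching formula:

    # Sketch: match the three part-level results against each candidate
    # expression. The agreement-fraction score is an illustrative choice;
    # with three parts and threshold 0.8, all three must agree.
    EXPRESSIONS = ["listening", "doubt", "understanding", "resistance", "disdain"]

    def match_expression(eye_result, lip_result, head_result, threshold=0.8):
        parts = [eye_result, lip_result, head_result]
        best, best_score = None, 0.0
        for expression in EXPRESSIONS:
            score = sum(p == expression for p in parts) / len(parts)
            if score > best_score:
                best, best_score = expression, score
        return best if best_score >= threshold else None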
In the embodiment corresponding to FIG. 5, the first target face feature points are obtained from the first target face image using a face feature point detection algorithm, and the first target face image is divided according to the first target face feature points to obtain an eye region image, a lip region image, and a head region image. These are input into the eye recognition model, lip recognition model, and head recognition model respectively to obtain the test eye, lip, and head recognition results, which are finally matched against the standard expression recognition result set; the standard expression recognition result whose matching degree exceeds the preset threshold is taken as the output expression result. If the output expression result is listening, doubt, or understanding, the first recognition result is normal; if the output expression result is resistance or disdain, the first recognition result is abnormal. Implementing the expression recognition model through three recognition models improves recognition accuracy and the efficiency of classroom monitoring.
In one embodiment, the classroom monitoring method provided in this embodiment of the present application further includes:
obtaining reference point position information in real time and collecting a third target face image according to the reference point position information; then inputting the third target face image into the expression recognition model to obtain a third recognition result, and if the third recognition result is abnormal, sending a third prompt message to the monitoring end.
The reference point position information refers to the teacher's position information. The teacher's position can be obtained from the GPS positioning information of the teacher's mobile phone, or by tracking the teacher with a camera and locating the teacher's position.
Understandably, when the teacher is in the podium area facing the students, the whole class is within the teacher's line of sight, so no third target face image is collected. When the teacher steps down from the podium and a blind spot appears in the line of sight, or when the teacher leaves the classroom, the third target face image is collected.
Specifically, a position information mapping table can be established in advance. According to this table and the teacher's real-time position information, the corresponding third target face image to be collected can be determined. For example, when the teacher stands on the left side of the classroom, the students on the right side of the classroom are taken as the targets for third target face image collection; when the teacher stands on the right side, the students on the left side are taken as the targets. The specific position information mapping table can be configured according to actual needs and scenarios, and is not limited here. The collected third target face image is then input into the expression recognition model to obtain a third recognition result, and if the third recognition result is abnormal, a third prompt message is sent to the monitoring end. This recognition and messaging process is similar to the corresponding steps of the foregoing embodiments and is not repeated here.
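The position information mapping table might be as simple as a zone-to-region lookup; the zone names and region groupings below are assumptions for illustration, and a real deployment would derive the zone from GPS or camera tracking:

    # Sketch: a position information mapping table from the teacher's zone
    # to the student region(s) whose faces should be collected.
    POSITION_MAP = {
        "podium_facing_class": [],           # whole class in view: collect nothing
        "classroom_left": ["right_region"],  # collect students out of sight
        "classroom_right": ["left_region"],
        "outside_classroom": ["left_region", "right_region"],
    }

    def regions_to_collect(teacher_zone):
        return POSITION_MAP.get(teacher_zone, [])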
Further, the third target face image whose third recognition result is abnormal can be added to the focus attention queue, where the focus attention queue includes attention person identifiers. It should be understood that adding the third target face image whose third recognition result is abnormal to the focus attention queue runs in parallel with the process, in the above embodiments, of adding face images whose recognition results are abnormal to the focus attention queue. Optionally, adding face images whose third recognition result is abnormal to the focus attention queue updates the queue and makes the monitoring of classroom effect more targeted.
In this embodiment, by obtaining the teacher's position information in real time and collecting the third target face image according to the position of the teacher as a reference point, the amount of video data collected can be reduced. By inputting the third target face image into the expression recognition model to obtain a third recognition result and, if it is abnormal, sending a third prompt message to the monitoring end, students' in-class states can be obtained in a more targeted manner, distracted students can be found, the computational load of data collection and recognition is reduced, and the efficiency of classroom monitoring is improved.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic and should not constitute any limitation on the implementation of the embodiments of the present application.
In one embodiment, a classroom monitoring apparatus is provided, and the apparatus corresponds one-to-one with the classroom monitoring method in the above embodiments. As shown in FIG. 6, the classroom monitoring apparatus includes a first recognition result obtaining module 10, a first prompt message sending module 20, a focus attention queue adding module 30, a second recognition result obtaining module 40, and a second prompt message sending module 50. The functional modules are described in detail as follows:
The first recognition result obtaining module 10 is configured to periodically collect a first target face image according to a first time interval, and input the first target face image into an expression recognition model to obtain a first recognition result.
The first prompt message sending module 20 is configured to send a first prompt message to the monitoring end if the first recognition result is abnormal.
The focus attention queue adding module 30 is configured to add the first target face image whose first recognition result is abnormal to the focus attention queue, where the focus attention queue includes attention person identifiers.
The second recognition result obtaining module 40 is configured to periodically collect a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and input the second target face image into the expression recognition model to obtain a second recognition result.
The second prompt message sending module 50 is configured to send a second prompt message to the monitoring end if the second recognition result is abnormal.
Further, the expression recognition model is an expression recognition model trained using a convolutional neural network and includes an eye recognition model, a lip recognition model, and a head recognition model. The classroom monitoring apparatus further includes a standard result set obtaining module, which includes a training image obtaining unit, a standard eye recognition result obtaining unit, a standard lip recognition result obtaining unit, a standard head recognition result obtaining unit, and a standard result set obtaining unit.
The training image obtaining unit is configured to obtain samples of face images, and obtain eye training images, lip training images, and head training images from the face image samples.
The standard eye recognition result obtaining unit is configured to input the eye training images into the eye recognition model to obtain standard eye recognition results.
The standard lip recognition result obtaining unit is configured to input the lip training images into the lip recognition model to obtain standard lip recognition results.
The standard head recognition result obtaining unit is configured to input the head training images into the head recognition model to obtain standard head recognition results.
The standard result set obtaining unit is configured to combine the standard eye recognition results, standard lip recognition results, and standard head recognition results into a standard expression recognition result set.
Further, the standard expression recognition result set includes five expression results: listening, doubt, understanding, resistance, and disdain. The first recognition result obtaining module 10 includes a video data obtaining unit and a first target face image obtaining unit.
The video data obtaining unit is configured to periodically obtain raw video data according to the first time interval.
The first target face image obtaining unit is configured to split the raw video data into frames and normalize them to obtain the first target face image.
Further, as shown in FIG. 7, the first recognition result obtaining module 10 further includes a face feature point obtaining unit 13, a region image obtaining unit 14, a test recognition result obtaining unit 15, and a first recognition result obtaining unit 16.
The face feature point obtaining unit 13 is configured to use a face feature point detection algorithm to obtain first target face feature points from the first target face image.
The region image obtaining unit 14 is configured to divide the first target face image according to the first target face feature points to obtain an eye region image, a lip region image, and a head region image.
The test recognition result obtaining unit 15 is configured to input the eye region image into the eye recognition model to obtain a test eye recognition result; input the lip region image into the lip recognition model to obtain a test lip recognition result; and input the head region image into the head recognition model to obtain a test head recognition result.
The first recognition result obtaining unit 16 is configured to match the test eye recognition result, test lip recognition result, and test head recognition result against the standard expression recognition result set, and take the standard expression recognition result whose matching degree exceeds a preset threshold as the output expression result. If the output expression result is listening, doubt, or understanding, the first recognition result is normal; if the output expression result is resistance or disdain, the first recognition result is abnormal.
Further, the classroom monitoring apparatus includes a reference point position obtaining module, a third recognition result obtaining module, and a third prompt message sending module.
The reference point position obtaining module is configured to obtain reference point position information in real time and collect a third target face image according to the reference point position information.
The third recognition result obtaining module is configured to input the third target face image into the expression recognition model to obtain a third recognition result.
The third prompt message sending module is configured to send a third prompt message to the monitoring end if the third recognition result is abnormal.
For specific limitations on the classroom monitoring apparatus, refer to the limitations on the classroom monitoring method above; details are not repeated here. Each module in the above classroom monitoring apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or independent of, the processor in a computer device, or stored in software form in the memory of the computer device, so that the processor can invoke and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 8. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device is used to store the raw video data, the first target face image, the second target face image, the third target face image, the focus attention queue, the prompt messages, and the like. The network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer-readable instructions implement a classroom monitoring method.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:
periodically collecting a first target face image according to a first time interval, and inputting the first target face image into an expression recognition model to obtain a first recognition result;
if the first recognition result is abnormal, sending a first prompt message to the monitoring end;
adding the first target face image whose first recognition result is abnormal to a focus attention queue, where the focus attention queue includes attention person identifiers;
periodically collecting a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and inputting the second target face image into the expression recognition model to obtain a second recognition result;
if the second recognition result is abnormal, sending a second prompt message to the monitoring end.
In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
periodically collecting a first target face image according to a first time interval, and inputting the first target face image into an expression recognition model to obtain a first recognition result;
if the first recognition result is abnormal, sending a first prompt message to the monitoring end;
adding the first target face image whose first recognition result is abnormal to a focus attention queue, where the focus attention queue includes attention person identifiers;
periodically collecting a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and inputting the second target face image into the expression recognition model to obtain a second recognition result;
if the second recognition result is abnormal, sending a second prompt message to the monitoring end.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions instructing the relevant hardware. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the above division of functional units and modules is used as an example. In practical applications, the above functions can be allocated to different functional units and modules as needed; that is, the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only used to describe the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included within the protection scope of the present application.

Claims (20)

  1. A classroom monitoring method, comprising the following steps:
    periodically collecting a first target face image according to a first time interval, and inputting the first target face image into an expression recognition model to obtain a first recognition result;
    if the first recognition result is abnormal, sending a first prompt message to a monitoring end;
    adding the first target face image whose first recognition result is abnormal to a focus attention queue, the focus attention queue comprising attention person identifiers;
    periodically collecting a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and inputting the second target face image into the expression recognition model to obtain a second recognition result;
    if the second recognition result is abnormal, sending a second prompt message to the monitoring end.
  2. The classroom monitoring method according to claim 1, wherein the expression recognition model is trained with a convolutional neural network and comprises an eye recognition model, a lip recognition model, and a head recognition model;
    before the step of periodically capturing a first target face image at a first time interval and inputting the first target face image into an expression recognition model to obtain a first recognition result, the classroom monitoring method further comprises:
    obtaining face image samples, and obtaining eye training images, lip training images, and head training images from the face image samples;
    inputting the eye training images into the eye recognition model to obtain standard eye recognition results;
    inputting the lip training images into the lip recognition model to obtain standard lip recognition results;
    inputting the head training images into the head recognition model to obtain standard head recognition results;
    combining the standard eye recognition results, the standard lip recognition results, and the standard head recognition results into a standard expression recognition result set.
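The per-region training of claim 2 might be set up as below. This is a minimal PyTorch sketch; the framework choice, layer sizes, 96×96 single-channel input crops, and five output classes (one per expression in claim 4) are all assumptions, not details from the application.

import torch
import torch.nn as nn

def make_region_cnn(num_classes=5):
    # One small CNN per facial region; the architecture is illustrative only.
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(32 * 24 * 24, num_classes),  # assumes 96x96 input crops
    )

eye_model, lip_model, head_model = (make_region_cnn() for _ in range(3))

def standard_result_set(eye_batch, lip_batch, head_batch):
    # Compose the three per-region outputs into one result set, as in the
    # final step of claim 2.
    with torch.no_grad():
        return {
            "eye": eye_model(eye_batch).softmax(dim=1),
            "lip": lip_model(lip_batch).softmax(dim=1),
            "head": head_model(head_batch).softmax(dim=1),
        }

Each sub-model would first be trained on its labeled crops (for example, cross-entropy over the five expression classes); the stored per-region outputs then serve as the standard expression recognition result set.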
  3. The classroom monitoring method according to claim 2, wherein the periodically capturing a first target face image at a first time interval comprises:
    periodically obtaining raw video data at the first time interval;
    splitting the raw video data into frames and normalizing them to obtain the first target face image.
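For claim 3, a minimal OpenCV sketch is shown below. Reading a single frame per tick, and treating "normalization" as grayscale conversion, a fixed resize, and pixel scaling to [0, 1], are this sketch's assumptions, since the claim does not spell the operations out.

import cv2

def first_target_face_image(video_source, size=(96, 96)):
    # Framing: grab one frame of raw video per first-interval tick.
    cap = cv2.VideoCapture(video_source)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return None
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    face = cv2.resize(gray, size)          # geometric normalization to a fixed size
    return face.astype("float32") / 255.0  # intensity normalization to [0, 1]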
  4. The classroom monitoring method according to claim 3, wherein the standard expression recognition result set comprises five expression results: listening, doubt, understanding, resistance, and disdain;
    the inputting the first target face image into an expression recognition model to obtain a first recognition result comprises:
    obtaining first target face feature points from the first target face image using a facial feature point detection algorithm;
    dividing the first target face image according to the first target face feature points to obtain an eye region image, a lip region image, and a head region image;
    inputting the eye region image into the eye recognition model to obtain a test eye recognition result; inputting the lip region image into the lip recognition model to obtain a test lip recognition result; and inputting the head region image into the head recognition model to obtain a test head recognition result;
    matching the test eye recognition result, the test lip recognition result, and the test head recognition result against the standard expression recognition result set, and taking the standard expression recognition result whose matching degree exceeds a preset threshold as the output expression result, where if the output expression result is listening, doubt, or understanding, the first recognition result is normal, and if the output expression result is resistance or disdain, the first recognition result is abnormal.
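One plausible reading of claim 4, sketched with dlib's standard 68-point landmark model: the landmark index ranges are dlib's convention, and scoring the match by cosine similarity against stored reference vectors is this sketch's assumption, not a method stated in the application.

import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

NORMAL = {"listening", "doubt", "understanding"}  # claim 4's normal expressions

def region_images(gray):
    # Divide the face image into eye / lip / head regions via landmarks.
    faces = detector(gray)
    if not faces:
        return None
    pts = np.array([(p.x, p.y) for p in predictor(gray, faces[0]).parts()])
    def crop(idx):
        x0, y0 = pts[idx].min(axis=0)
        x1, y1 = pts[idx].max(axis=0)
        return gray[y0:y1 + 1, x0:x1 + 1]
    return {
        "eye": crop(list(range(36, 48))),   # both eyes in the 68-point scheme
        "lip": crop(list(range(48, 68))),   # outer and inner mouth contours
        "head": gray,                       # whole detected face stands in for the head region
    }

def first_recognition_result(test_vec, standard_set, labels, threshold=0.8):
    # Match the combined test results against the standard expression set.
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    label, score = max(((lbl, cos(test_vec, ref))
                        for lbl, ref in zip(labels, standard_set)),
                       key=lambda kv: kv[1])
    if score <= threshold:
        return None  # no standard expression exceeds the preset threshold
    return "normal" if label in NORMAL else "abnormal"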
  5. The classroom monitoring method according to claim 1, wherein after the step of, if the second recognition result is abnormal, sending a second prompt message to the monitoring end, the method further comprises:
    releasing from the focus queue the second target face image whose second recognition result is normal.
  6. The classroom monitoring method according to claim 1, further comprising:
    obtaining reference point position information in real time, and capturing a third target face image according to the reference point position information;
    inputting the third target face image into the expression recognition model to obtain a third recognition result;
    if the third recognition result is abnormal, sending a third prompt message to the monitoring end.
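Claim 6 leaves open how the reference point is obtained; it might, for instance, be a seat coordinate reported in real time. Assuming it is a pixel coordinate in the camera frame, the third capture reduces to a window crop, as in this sketch (the crop size and coordinate convention are assumptions):

def capture_third_target(frame, ref_point, half_size=64):
    # Crop a face-sized window centered on the reference point;
    # frame is a NumPy-style image array of shape (height, width[, channels]).
    x, y = ref_point
    h, w = frame.shape[:2]
    x0, x1 = max(0, x - half_size), min(w, x + half_size)
    y0, y1 = max(0, y - half_size), min(h, y + half_size)
    return frame[y0:y1, x0:x1]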
  7. A classroom monitoring apparatus, comprising:
    a first recognition result obtaining module, configured to periodically capture a first target face image at a first time interval and input the first target face image into an expression recognition model to obtain a first recognition result;
    a first prompt message sending module, configured to send a first prompt message to a monitoring end if the first recognition result is abnormal;
    a focus queue adding module, configured to add the first target face image whose first recognition result is abnormal to a focus queue, the focus queue comprising focused-person identifiers;
    a second recognition result obtaining module, configured to periodically capture a second target face image at a second time interval according to the focused-person identifiers in the focus queue, and input the second target face image into the expression recognition model to obtain a second recognition result;
    a second prompt message sending module, configured to send a second prompt message to the monitoring end if the second recognition result is abnormal.
  8. The classroom monitoring apparatus according to claim 7, wherein the standard expression recognition result set comprises five expression results: listening, doubt, understanding, resistance, and disdain;
    the first recognition result obtaining module comprises a facial feature point obtaining unit, a region image obtaining unit, a test recognition result obtaining unit, and a first recognition result obtaining unit;
    the facial feature point obtaining unit is configured to obtain first target face feature points from the first target face image using a facial feature point detection algorithm;
    the region image obtaining unit is configured to divide the first target face image according to the first target face feature points to obtain an eye region image, a lip region image, and a head region image;
    the test recognition result obtaining unit is configured to input the eye region image into an eye recognition model to obtain a test eye recognition result, input the lip region image into a lip recognition model to obtain a test lip recognition result, and input the head region image into a head recognition model to obtain a test head recognition result;
    the first recognition result obtaining unit is configured to match the test eye recognition result, the test lip recognition result, and the test head recognition result against a standard expression recognition result set, and take the standard expression recognition result whose matching degree exceeds a preset threshold as the output expression result, where if the output expression result is listening, doubt, or understanding, the first recognition result is normal, and if the output expression result is resistance or disdain, the first recognition result is abnormal.
  9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    periodically capturing a first target face image at a first time interval, and inputting the first target face image into an expression recognition model to obtain a first recognition result;
    if the first recognition result is abnormal, sending a first prompt message to a monitoring end;
    adding the first target face image whose first recognition result is abnormal to a focus queue, the focus queue comprising focused-person identifiers;
    periodically capturing a second target face image at a second time interval according to the focused-person identifiers in the focus queue, and inputting the second target face image into the expression recognition model to obtain a second recognition result;
    if the second recognition result is abnormal, sending a second prompt message to the monitoring end.
  10. The computer device according to claim 9, wherein the expression recognition model is trained with a convolutional neural network and comprises an eye recognition model, a lip recognition model, and a head recognition model;
    before the step of periodically capturing a first target face image at a first time interval and inputting the first target face image into an expression recognition model to obtain a first recognition result, the processor, when executing the computer-readable instructions, further implements the following steps:
    obtaining face image samples, and obtaining eye training images, lip training images, and head training images from the face image samples;
    inputting the eye training images into the eye recognition model to obtain standard eye recognition results;
    inputting the lip training images into the lip recognition model to obtain standard lip recognition results;
    inputting the head training images into the head recognition model to obtain standard head recognition results;
    combining the standard eye recognition results, the standard lip recognition results, and the standard head recognition results into a standard expression recognition result set.
  11. The computer device according to claim 10, wherein the periodically capturing a first target face image at a first time interval comprises:
    periodically obtaining raw video data at the first time interval;
    splitting the raw video data into frames and normalizing them to obtain the first target face image.
  12. The computer device according to claim 11, wherein the standard expression recognition result set comprises five expression results: listening, doubt, understanding, resistance, and disdain;
    the inputting the first target face image into an expression recognition model to obtain a first recognition result comprises:
    obtaining first target face feature points from the first target face image using a facial feature point detection algorithm;
    dividing the first target face image according to the first target face feature points to obtain an eye region image, a lip region image, and a head region image;
    inputting the eye region image into the eye recognition model to obtain a test eye recognition result; inputting the lip region image into the lip recognition model to obtain a test lip recognition result; and inputting the head region image into the head recognition model to obtain a test head recognition result;
    matching the test eye recognition result, the test lip recognition result, and the test head recognition result against the standard expression recognition result set, and taking the standard expression recognition result whose matching degree exceeds a preset threshold as the output expression result, where if the output expression result is listening, doubt, or understanding, the first recognition result is normal, and if the output expression result is resistance or disdain, the first recognition result is abnormal.
  13. The computer device according to claim 9, wherein after the step of, if the second recognition result is abnormal, sending a second prompt message to the monitoring end, the processor, when executing the computer-readable instructions, further implements the following step:
    releasing from the focus queue the second target face image whose second recognition result is normal.
  14. The computer device according to claim 9, wherein the processor, when executing the computer-readable instructions, further implements the following steps:
    obtaining reference point position information in real time, and capturing a third target face image according to the reference point position information;
    inputting the third target face image into the expression recognition model to obtain a third recognition result;
    if the third recognition result is abnormal, sending a third prompt message to the monitoring end.
  15. One or more non-volatile readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    periodically capturing a first target face image at a first time interval, and inputting the first target face image into an expression recognition model to obtain a first recognition result;
    if the first recognition result is abnormal, sending a first prompt message to a monitoring end;
    adding the first target face image whose first recognition result is abnormal to a focus queue, the focus queue comprising focused-person identifiers;
    periodically capturing a second target face image at a second time interval according to the focused-person identifiers in the focus queue, and inputting the second target face image into the expression recognition model to obtain a second recognition result;
    if the second recognition result is abnormal, sending a second prompt message to the monitoring end.
  16. The non-volatile readable storage medium according to claim 15, wherein the expression recognition model is trained with a convolutional neural network and comprises an eye recognition model, a lip recognition model, and a head recognition model;
    before the step of periodically capturing a first target face image at a first time interval and inputting the first target face image into an expression recognition model to obtain a first recognition result, the computer-readable instructions, when executed by the processor, further implement the following steps:
    obtaining face image samples, and obtaining eye training images, lip training images, and head training images from the face image samples;
    inputting the eye training images into the eye recognition model to obtain standard eye recognition results;
    inputting the lip training images into the lip recognition model to obtain standard lip recognition results;
    inputting the head training images into the head recognition model to obtain standard head recognition results;
    combining the standard eye recognition results, the standard lip recognition results, and the standard head recognition results into a standard expression recognition result set.
  17. The non-volatile readable storage medium according to claim 16, wherein the periodically capturing a first target face image at a first time interval comprises:
    periodically obtaining raw video data at the first time interval;
    splitting the raw video data into frames and normalizing them to obtain the first target face image.
  18. The non-volatile readable storage medium according to claim 17, wherein the standard expression recognition result set comprises five expression results: listening, doubt, understanding, resistance, and disdain;
    the inputting the first target face image into an expression recognition model to obtain a first recognition result comprises:
    obtaining first target face feature points from the first target face image using a facial feature point detection algorithm;
    dividing the first target face image according to the first target face feature points to obtain an eye region image, a lip region image, and a head region image;
    inputting the eye region image into the eye recognition model to obtain a test eye recognition result; inputting the lip region image into the lip recognition model to obtain a test lip recognition result; and inputting the head region image into the head recognition model to obtain a test head recognition result;
    matching the test eye recognition result, the test lip recognition result, and the test head recognition result against the standard expression recognition result set, and taking the standard expression recognition result whose matching degree exceeds a preset threshold as the output expression result, where if the output expression result is listening, doubt, or understanding, the first recognition result is normal, and if the output expression result is resistance or disdain, the first recognition result is abnormal.
  19. The non-volatile readable storage medium according to claim 15, wherein after the step of, if the second recognition result is abnormal, sending a second prompt message to the monitoring end, the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to further perform the following step:
    releasing from the focus queue the second target face image whose second recognition result is normal.
  20. The non-volatile readable storage medium according to claim 15, wherein the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to further perform the following steps:
    obtaining reference point position information in real time, and capturing a third target face image according to the reference point position information;
    inputting the third target face image into the expression recognition model to obtain a third recognition result;
    if the third recognition result is abnormal, sending a third prompt message to the monitoring end.
PCT/CN2018/106435 2018-08-02 2018-09-19 Class monitoring method and apparatus, computer device, and storage medium WO2020024400A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810870693.8A CN109344682A (en) 2018-08-02 2018-08-02 Classroom monitoring method, device, computer equipment and storage medium
CN201810870693.8 2018-08-02

Publications (1)

Publication Number Publication Date
WO2020024400A1 (en)

Family

ID=65291504

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/106435 WO2020024400A1 (en) 2018-08-02 2018-09-19 Class monitoring method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN109344682A (en)
WO (1) WO2020024400A1 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582611A (en) * 2019-02-18 2020-08-25 北京入思技术有限公司 Classroom teaching evaluation method and system based on emotion perception
CN109919079A (en) * 2019-03-05 2019-06-21 百度在线网络技术(北京)有限公司 Method and apparatus for detecting learning state
CN110164209A (en) * 2019-04-24 2019-08-23 薄涛 Instructional terminal, server and live teaching broadcast system
CN110287807A (en) * 2019-05-31 2019-09-27 上海亿童科技有限公司 A kind of human body information acquisition method, apparatus and system
CN110418095B (en) * 2019-06-28 2021-09-14 广东虚拟现实科技有限公司 Virtual scene processing method and device, electronic equipment and storage medium
CN110737535B (en) * 2019-09-09 2023-02-07 平安证券股份有限公司 Data processing method and device based on message queue and computer equipment
CN110580470A (en) * 2019-09-12 2019-12-17 深圳壹账通智能科技有限公司 Monitoring method and device based on face recognition, storage medium and computer equipment
CN110598632B (en) * 2019-09-12 2022-09-09 深圳市商汤科技有限公司 Target object monitoring method and device, electronic equipment and storage medium
CN112584091B (en) * 2019-09-29 2022-04-26 杭州海康威视数字技术股份有限公司 Alarm information generation method, alarm information analysis method, system and device
CN110942086B (en) * 2019-10-30 2024-04-23 平安科技(深圳)有限公司 Data prediction optimization method, device, equipment and readable storage medium
CN111507241A (en) * 2020-04-14 2020-08-07 四川聚阳科技集团有限公司 Lightweight network classroom expression monitoring method
CN111556279A (en) * 2020-05-22 2020-08-18 腾讯科技(深圳)有限公司 Monitoring method and communication method of instant session
CN111738177B (en) * 2020-06-28 2022-08-02 四川大学 Student classroom behavior identification method based on attitude information extraction
CN112528777A (en) * 2020-11-27 2021-03-19 富盛科技股份有限公司 Student facial expression recognition method and system used in classroom environment
CN113239841B (en) * 2021-05-24 2023-03-24 桂林理工大学博文管理学院 Classroom concentration state detection method based on face recognition and related instrument
CN113570484B (en) * 2021-09-26 2022-02-08 广州华赛数据服务有限责任公司 Online primary school education management system and method based on big data
CN114724229B (en) * 2022-05-23 2022-09-02 北京英华在线科技有限公司 Learning state detection system and method for online education platform
CN116757524B (en) * 2023-05-08 2024-02-06 广东保伦电子股份有限公司 Teacher teaching quality evaluation method and device
CN117152688A (en) * 2023-10-31 2023-12-01 江西拓世智能科技股份有限公司 Intelligent classroom behavior analysis method and system based on artificial intelligence

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250822A (en) * 2016-07-21 2016-12-21 苏州科大讯飞教育科技有限公司 Student's focus based on recognition of face monitoring system and method
CN106599881A (en) * 2016-12-30 2017-04-26 首都师范大学 Student state determination method, device and system
CN107169900A (en) * 2017-05-19 2017-09-15 南京信息工程大学 A kind of student listens to the teacher rate detection method
CN107292271A (en) * 2017-06-23 2017-10-24 北京易真学思教育科技有限公司 Learning-memory behavior method, device and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106851216B (en) * 2017-03-10 2019-05-28 山东师范大学 A kind of classroom behavior monitoring system and method based on face and speech recognition

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175534A (en) * 2019-05-08 2019-08-27 长春师范大学 Teaching assisting system based on multitask concatenated convolutional neural network
CN111428671A (en) * 2020-03-31 2020-07-17 杭州博雅鸿图视频技术有限公司 Face structured information identification method, system, device and storage medium
CN111563449A (en) * 2020-04-30 2020-08-21 上海交通大学 Real-time classroom attention detection method and system
CN112228036A (en) * 2020-06-04 2021-01-15 精英数智科技股份有限公司 Gas extraction monitoring method, system and equipment and readable storage medium
CN112055212A (en) * 2020-08-24 2020-12-08 深圳市青柠互动科技开发有限公司 System and method for centralized analysis and processing of multiple paths of videos
CN112052815B (en) * 2020-09-14 2024-02-20 北京易华录信息技术股份有限公司 Behavior detection method and device and electronic equipment
CN112052815A (en) * 2020-09-14 2020-12-08 北京易华录信息技术股份有限公司 Behavior detection method and device and electronic equipment
CN112580522A (en) * 2020-12-22 2021-03-30 北京每日优鲜电子商务有限公司 Method, device and equipment for detecting sleeper and storage medium
CN112699774A (en) * 2020-12-28 2021-04-23 深延科技(北京)有限公司 Method and device for recognizing emotion of person in video, computer equipment and medium
CN112699774B (en) * 2020-12-28 2024-05-24 深延科技(北京)有限公司 Emotion recognition method and device for characters in video, computer equipment and medium
CN113096808A (en) * 2021-04-23 2021-07-09 深圳壹账通智能科技有限公司 Event prompting method and device, computer equipment and storage medium
CN113298159A (en) * 2021-05-28 2021-08-24 平安科技(深圳)有限公司 Target detection method and device, electronic equipment and storage medium
CN113627335A (en) * 2021-08-10 2021-11-09 浙江大华技术股份有限公司 Method and device for monitoring behavior of examinee, storage medium and electronic device
CN113989606A (en) * 2021-10-29 2022-01-28 北京环境特性研究所 Target person tracking system and method
CN114612977A (en) * 2022-03-10 2022-06-10 苏州维科苏源新能源科技有限公司 Big data based acquisition and analysis method
CN115100600A (en) * 2022-06-30 2022-09-23 苏州市新方纬电子有限公司 Intelligent detection method and system for production line of battery pack
CN115100600B (en) * 2022-06-30 2024-05-31 苏州市新方纬电子有限公司 Intelligent detection method and system for production line of battery pack

Also Published As

Publication number Publication date
CN109344682A (en) 2019-02-15

Similar Documents

Publication Publication Date Title
WO2020024400A1 (en) Class monitoring method and apparatus, computer device, and storage medium
WO2021004112A1 (en) Anomalous face detection method, anomaly identification method, device, apparatus, and medium
WO2019232866A1 (en) Human eye model training method, human eye recognition method, apparatus, device and medium
WO2019232862A1 (en) Mouth model training method and apparatus, mouth recognition method and apparatus, device, and medium
JP7078803B2 (en) Risk recognition methods, equipment, computer equipment and storage media based on facial photographs
WO2020015076A1 (en) Facial image comparison method and apparatus, computer device, and storage medium
WO2020098374A1 (en) Face key point detection method, apparatus, computer device and storage medium
WO2021078157A1 (en) Image processing method and apparatus, electronic device, and storage medium
WO2020015075A1 (en) Facial image comparison method and apparatus, computer device, and storage medium
CN108829900B (en) Face image retrieval method and device based on deep learning and terminal
WO2020024395A1 (en) Fatigue driving detection method and apparatus, computer device, and storage medium
US9547808B2 (en) Head-pose invariant recognition of facial attributes
CN109086711B (en) Face feature analysis method and device, computer equipment and storage medium
CN108629336B (en) Face characteristic point identification-based color value calculation method
EP2889805A2 (en) Method and system for emotion and behavior recognition
WO2019033571A1 (en) Facial feature point detection method, apparatus and storage medium
WO2022252642A1 (en) Behavior posture detection method and apparatus based on video image, and device and medium
CN111209845A (en) Face recognition method and device, computer equipment and storage medium
CN111598038B (en) Facial feature point detection method, device, equipment and storage medium
CN110598638A (en) Model training method, face gender prediction method, device and storage medium
CN110197107B (en) Micro-expression recognition method, micro-expression recognition device, computer equipment and storage medium
US11062126B1 (en) Human face detection method
US10360441B2 (en) Image processing method and apparatus
CN109002776B (en) Face recognition method, system, computer device and computer-readable storage medium
CN116863522A (en) Acne grading method, device, equipment and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18928702

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18928702

Country of ref document: EP

Kind code of ref document: A1