WO2020024400A1 - Class monitoring method and apparatus, computer device, and storage medium - Google Patents

Class monitoring method and apparatus, computer device, and storage medium Download PDF

Info

Publication number
WO2020024400A1
WO2020024400A1 · PCT/CN2018/106435
Authority
WO
WIPO (PCT)
Prior art keywords
recognition result
target face
face image
expression
lip
Prior art date
Application number
PCT/CN2018/106435
Other languages
French (fr)
Chinese (zh)
Inventor
周建伟 (Zhou Jianwei)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2020024400A1 publication Critical patent/WO2020024400A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/166Detection; Localisation; Normalisation using acquisition arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • G06Q50/205Education administration or guidance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources

Definitions

  • the present application belongs to the field of image processing, and more specifically, relates to a classroom monitoring method, device, computer equipment, and storage medium.
  • the embodiments of the present application provide a classroom monitoring method, device, computer equipment, and storage medium to solve the technical problem of low efficiency in identifying a student's class status due to an excessive amount of data.
  • a classroom monitoring method includes:
  • a classroom monitoring device includes:
  • a first recognition result acquisition module configured to regularly collect a first target face image according to a first time interval, and input the first target face image into an expression recognition model to obtain a first recognition result;
  • a first prompt message sending module configured to send a first prompt message to a monitoring terminal if the first identification result is abnormal
  • the focus attention queue adding module is configured to add the first target face image with an abnormality in the first recognition result to a focus attention queue, where the focus attention queue includes an attention person identification;
  • a second recognition result acquisition module configured to regularly collect a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and input the second target face image into the expression recognition model to obtain a second recognition result;
  • a second prompt message sending module is configured to send a second prompt message to the monitoring terminal if the second identification result is abnormal.
  • a computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • the processor executes the computer-readable instructions, the following steps are implemented:
  • One or more non-volatile readable storage media storing computer-readable instructions, which when executed by one or more processors, cause the one or more processors to perform the following steps:
  • FIG. 1 is a schematic diagram of an application environment of a classroom monitoring method according to an embodiment of the present application
  • FIG. 2 is a flowchart of a classroom monitoring method according to an embodiment of the present application.
  • FIG. 3 is another flowchart of a classroom monitoring method according to an embodiment of the present application.
  • FIG. 4 is another flowchart of a classroom monitoring method according to an embodiment of the present application.
  • FIG. 5 is another flowchart of a classroom monitoring method according to an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a classroom monitoring device according to an embodiment of the present application.
  • FIG. 7 is a principle block diagram of a first recognition result acquisition module in a classroom monitoring apparatus according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a computer device in an embodiment of the present application.
  • The classroom monitoring method provided by this application can be applied in the application environment shown in FIG. 1, in which the monitoring end communicates with the server through a network. The server collects a first target face image through a camera at a first time interval and inputs it into an expression recognition model to obtain a first recognition result. If the first recognition result is abnormal, the server sends a first prompt message to the monitoring end and adds the abnormal first target face image to a focus attention queue, where the focus attention queue includes attention person identifiers. The server then collects a second target face image through the camera according to a second time interval and the attention person identifiers in the focus attention queue, and inputs it into the expression recognition model to obtain a second recognition result.
  • If the second recognition result is abnormal, the server sends a second prompt message to the monitoring end.
  • the monitoring end may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server can be implemented by an independent server or a server cluster composed of multiple servers.
  • a classroom monitoring method is provided.
  • the method is applied to the server in FIG. 1 as an example, and includes the following steps:
  • S10 Collect a first target face image according to a first time interval, input the first target face image into an expression recognition model, and obtain a first recognition result.
  • the first time interval is a preset time period, and can be specifically set according to actual needs, such as 5 minutes, 8 minutes, or 10 minutes.
  • the first target face image refers to the face images of all students in the entire class.
  • a plurality of cameras can be set in the classroom, and each camera collects a face image of a student in a fixed area, so as to realize the collection of the first target face image. Understandably, the greater the number of cameras, the higher the acquisition accuracy.
  • the camera collects video data, and the face image is obtained after the video data is framed by a predetermined frame rate.
  • the expression recognition model is a recognition model for judging the emotions of faces in the current image.
  • The expression recognition model estimates the probability that the face in the current image corresponds to each of several preset emotions. If the probability value of a certain emotion exceeds its corresponding preset threshold, that emotion is taken as the first recognition result.
  • the emotions in the expression recognition model can be set to five types: listening, doubting, understanding, resisting, and disdain. Specifically, a large number of sample images representing the five emotions can be collected in advance to form a sample image set, and then a corresponding neural network model or classifier is selected for training, and an expression recognition model is finally obtained.
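  • As a non-authoritative illustration of the thresholding step just described (the patent provides no code), a minimal Python sketch might look like the following; the per-emotion threshold of 0.6 is an assumed value, and the trained model that produces the probabilities is stubbed out upstream:

```python
# Hedged sketch of the threshold step described above; the model itself
# is assumed to exist upstream, and the preset thresholds are assumptions.
EMOTIONS = ["listening", "doubt", "understanding", "resistance", "disdain"]
THRESHOLDS = {emotion: 0.6 for emotion in EMOTIONS}  # assumed presets

def first_recognition_result(probabilities):
    """Given {emotion: probability} from the model, return the emotion
    whose probability exceeds its preset threshold, or None."""
    best = max(probabilities, key=probabilities.get)
    return best if probabilities[best] > THRESHOLDS[best] else None
```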
  • a first prompt message is sent according to the output first recognition result.
  • the expression recognition model will determine the expression result corresponding to the facial area in the first target face image; for example, listening, doubting, understanding, resisting, or disdain.
  • Optionally, when the expression recognition model determines that the emotion corresponding to the facial area in the first target face image is resistance or disdain, the output first recognition result is abnormal; when it determines that the emotion is listening, doubt, or understanding, the output first recognition result is normal.
  • If the first recognition result is abnormal, a first prompt message is sent to the monitoring terminal to indicate that the student may be distracted.
  • Optionally, the monitoring end is a mobile phone, computer, or other communication device of the teacher, class teacher, or other relevant person, so that the relevant person can learn the student's class status and respond accordingly, for example by making an evaluation or intervening.
  • The focus attention queue refers to a queue of the students in the class who may be distracted and therefore need special attention.
  • the queue may be embodied by focusing on the identity of the person.
  • The attention person identifier is an identifier used to distinguish students who need special attention; for example, it can be a student number, an ID number, or a seat number.
  • S40 Collect a second target face image at regular intervals according to the second time interval and the identifier of the concerned person in the focus attention queue, and input the second target face image into the expression recognition model to obtain a second recognition result.
  • the second time interval is also a preset time period, which can be specifically set according to actual needs, such as 1 minute, 3 minutes, or 5 minutes. Preferably, the second time interval is smaller than the first time interval.
  • The second target face image refers to the face image of the student corresponding to an attention person identifier. Specifically, the corresponding camera can be located through the attention person identifier, and the second target face image is obtained from that camera.
  • the method for acquiring the second target face image is similar to the method for acquiring the first target face image in step S10, and details are not described herein again.
  • the second target face image is input into the expression recognition model, and the process of obtaining the second recognition result is similar to the corresponding step in step S10, which is not repeated here.
  • a second prompt message is sent according to the output second recognition result.
  • The expression recognition model determines the emotion corresponding to the facial area in the second target face image, such as listening, doubt, understanding, resistance, or disdain.
  • When the model determines that the emotion is resistance or disdain, the output second recognition result is abnormal; when it determines that the emotion is listening, doubt, or understanding, the output second recognition result is normal.
  • If the second recognition result is abnormal, a second prompt message is sent to the monitoring terminal to indicate that the student may be distracted.
  • a second target face image whose second recognition result is abnormal is added to the focus attention queue.
  • At the same time as second target face images whose second recognition result is abnormal are added to the focus attention queue, second target face images whose second recognition result is normal are removed from the focus attention queue.
  • In this embodiment, a first target face image is periodically collected according to a first time interval and input into an expression recognition model to obtain a first recognition result; if the result is abnormal, a first prompt message is sent to the monitoring end. The focus attention queue is then adjusted according to the first recognition result, a second target face image is periodically collected according to the attention person identifiers in the focus attention queue and a second time interval, and the second target face image is input into the expression recognition model to obtain a second recognition result. If the second recognition result is abnormal, a second prompt message is sent to the monitoring end.
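  • For illustration only, the overall two-interval flow described above could be sketched in Python as follows; the interval values, the `collect_all`/`collect_one`/`recognize`/`notify` callables, and the "abnormal" label are assumptions standing in for the cameras, the expression recognition model, and the prompt-message channel:

```python
import time

FIRST_INTERVAL = 5 * 60   # e.g. 5 minutes for the whole class (assumed value)
SECOND_INTERVAL = 60      # shorter interval for the focus queue (assumed value)

def monitor(collect_all, collect_one, recognize, notify):
    """Two-tier polling loop sketched from the description above.
    collect_all() yields (student_id, face_image) pairs for the class;
    collect_one(student_id) returns one face image; recognize() returns
    'normal' or 'abnormal'; notify() sends a prompt to the monitoring end."""
    focus_queue = set()   # attention person identifiers, e.g. seat numbers
    last_first = last_second = 0.0
    while True:
        now = time.time()
        if now - last_first >= FIRST_INTERVAL:
            for student_id, face in collect_all():
                if recognize(face) == "abnormal":
                    notify("first prompt", student_id)
                    focus_queue.add(student_id)
            last_first = now
        if now - last_second >= SECOND_INTERVAL:
            for student_id in list(focus_queue):
                if recognize(collect_one(student_id)) == "abnormal":
                    notify("second prompt", student_id)
                else:
                    focus_queue.discard(student_id)  # back to normal: release
            last_second = now
        time.sleep(1)
```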
  • the expression recognition model is an expression recognition model trained using a convolutional neural network.
  • A Convolutional Neural Network (CNN) is a multilayer neural network well suited to machine learning problems involving images, especially large images.
  • the basic structure of CNN includes two layers: a convolutional layer and a pooling layer.
  • In this embodiment, a 10-layer convolutional neural network is used. Deeper networks take longer to compute but recognize expressions more accurately, so a 10-layer network reaches the required precision while keeping training time short.
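  • The text only states that a 10-layer convolutional network is used, so the following PyTorch sketch (8 convolutional plus 2 fully connected learnable layers, five expression classes, 260x260 input) is one assumed realization rather than the patent's actual architecture:

```python
import torch
import torch.nn as nn

# Minimal sketch of a 10-layer CNN (8 conv + 2 fully connected layers);
# the exact architecture is an assumption.
class ExpressionCNN(nn.Module):
    def __init__(self, num_classes=5):  # five expression classes
        super().__init__()
        layers, in_ch = [], 3
        for out_ch in (32, 32, 64, 64, 128, 128, 256, 256):
            layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
            if out_ch != in_ch:          # downsample whenever channels widen
                layers.append(nn.MaxPool2d(2))
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)  # avoids hand-computed spatial sizes
        self.classifier = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(inplace=True), nn.Linear(128, num_classes))

    def forward(self, x):
        return self.classifier(self.pool(self.features(x)).flatten(1))

logits = ExpressionCNN()(torch.randn(1, 3, 260, 260))  # 260x260 input per the text
```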
  • the expression recognition model includes: an eye recognition model, a lip recognition model, and a head recognition model.
  • step S10 that is, the first target face image is regularly collected according to the first time interval, and the first target face image is input into the expression recognition model, and before the step of obtaining the first recognition result, as shown in FIG. 3
  • the classroom monitoring method provided in the embodiment of the present application further includes:
  • S61 Obtain a sample of a face image, and obtain an eye training image, a lip training image, and a head training image according to the sample of the face image.
  • the sample of the face image may be a face image of a student collected in advance.
  • The corresponding regions can be divided in advance from each face image as an eye region image, a lip region image, and a head region image; the collections of eye region images, lip region images, and head region images obtained from the sample set are then used as the eye training images, lip training images, and head training images, respectively.
  • the method for dividing the area may be preset according to requirements, and is not specifically limited in the embodiment of the present application.
  • the standard eye recognition result refers to a result recognized by the eye recognition model according to eye features in the eye training image.
  • The eyebrow features in the eye training images may be used as the eye features, with the standard defined on the basis of the five preset expressions of listening, doubt, understanding, resistance, and disdain.
  • For example, the eyebrow characteristics corresponding to the listening expression are that the eyebrows stretch naturally and that the angle formed at the midpoint of the eye by the two ends of the eyebrows is less than or equal to 120 degrees.
  • eyebrow features corresponding to five expressions of listening, doubting, understanding, resisting, and disdain are obtained.
  • the combination of the five corresponding eyebrow features and expressions can be used as the standard eye recognition result.
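  • As a hedged sketch of how such an eyebrow angle could be measured (the text does not specify the exact geometric construction), the angle at the eye midpoint subtended by the two eyebrow endpoints can be computed from landmark coordinates:

```python
import math

def eyebrow_angle(brow_left, brow_right, eye_mid):
    """Angle in degrees at the eye midpoint subtended by the two eyebrow
    endpoints; each point is an (x, y) tuple. Per the text, an angle of
    at most 120 degrees is consistent with the listening expression."""
    ax, ay = brow_left[0] - eye_mid[0], brow_left[1] - eye_mid[1]
    bx, by = brow_right[0] - eye_mid[0], brow_right[1] - eye_mid[1]
    cos_a = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

print(eyebrow_angle((30, 40), (90, 40), (60, 70)))  # -> 90.0 for this geometry
```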
  • S63 The lip training image is input into a lip recognition model to obtain a standard lip recognition result.
  • the standard lip recognition result refers to a result recognized by the lip recognition model according to lip features in the lip training image.
  • the mouth corner feature in the lip training image may be used as the lip feature.
  • For example, the corresponding mouth corner feature is the mouth corners moving downward, that is, the mouth corner feature line deviating negatively from its bisector line.
  • the standard definition is based on five preset expressions of listening, doubt, understanding, resistance, and disdain.
  • the classified lip training images are input to the lip recognition model.
  • the mouth corner features corresponding to the five expressions of listening, doubt, understanding, resistance, and disdain are obtained, and the combination of the five corresponding mouth corner features and expressions can be used as the standard lip recognition result.
  • S64 The head training image is input into a head recognition model to obtain a standard head recognition result.
  • the standard head recognition result refers to a result recognized by the head recognition model according to the head features in the head training image.
  • the head corner feature in the head training image may be used as the head feature.
  • For example, the head feature may be bowing or turning the head, with the head angle change greater than 60 degrees.
  • the standard definition is based on the five expressions of listening, doubt, understanding, resistance, and disdain.
  • The classified head training images are input into the head recognition model to obtain the head features corresponding to the five expressions of listening, doubt, understanding, resistance, and disdain.
  • The combination of the five corresponding head features and expressions can be used as the standard head recognition result.
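  • One possible (assumed) way to quantify such a head angle change is to compare head pose angles between consecutive frames, with the pose estimator itself left abstract:

```python
def head_angle_change(prev_pose, curr_pose):
    """Largest absolute per-axis change between two (yaw, pitch, roll)
    head poses in degrees; the pose estimator is assumed to exist
    upstream. Per the text, a change above 60 degrees suggests bowing
    or turning the head."""
    return max(abs(c - p) for p, c in zip(prev_pose, curr_pose))

assert head_angle_change((0, 0, 0), (70, 5, 0)) > 60  # a pronounced head turn
```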
  • S65 The standard eye recognition result, the standard lip recognition result, and the standard head recognition result are combined into a standard expression recognition result set.
  • the features corresponding to the standard eye recognition result, standard lip recognition result, and standard head recognition result are respectively composed into a standard expression recognition result set.
  • For example, for one expression in the set, the corresponding eye feature is that the angle formed at the midpoint of the eye by the two ends of the eyebrows is greater than 120 degrees; the corresponding lip feature is a negative deviation of the mouth corner feature line from its bisector; and the corresponding head feature is a head angle change between 0 and 60 degrees.
  • In this embodiment, eye training images, lip training images, and head training images are obtained from a sample set of face images. The eye training images are input into an eye recognition model to obtain the standard eye recognition result; the lip training images are input into a lip recognition model to obtain the standard lip recognition result; and the head training images are input into a head recognition model to obtain the standard head recognition result.
  • Finally, the standard eye recognition result, standard lip recognition result, and standard head recognition result are combined into a standard expression recognition result set, which supports expression recognition on subsequently collected target face images.
  • step S10 that is, the first target face image is periodically collected according to a first time interval, and specifically includes the following steps:
  • the original video data is video data collected by cameras in the classroom. If there are a plurality of cameras, the original video data of the corresponding cameras can be obtained according to the identifier corresponding to the first target face image.
  • the camera of the classroom is opened to collect video, and the original video data is acquired at a first time interval.
  • the framing processing refers to dividing the original video data according to a preset time to obtain at least one frame of a video image to be identified.
  • Normalization is a way of simplifying computation by scaling values into a uniform, dimensionless range.
  • A facial area of the first target is needed in order to extract the corresponding facial features, so the frames of the video image to be identified are normalized to uniform pixel dimensions, for example 260 x 260, to obtain the first target face image used in the subsequent recognition of each frame.
  • In this way, a first target face image is obtained that can be used in the subsequent face recognition steps.
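  • A minimal sketch of the framing and normalization steps, assuming OpenCV as the capture library (the text does not mandate any particular library) and a 260x260 target size, might be:

```python
import cv2

def sample_frames(video_path, interval_seconds=60.0, size=(260, 260)):
    """Split raw video into frames at a preset time step and normalize
    each kept frame to uniform 260x260 pixels, as described above."""
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS unknown
    step = max(1, int(fps * interval_seconds))
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(cv2.resize(frame, size))  # pixel normalization
        index += 1
    capture.release()
    return frames
```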
  • the standard expression recognition result set includes five expression results of listening, doubting, understanding, resisting, and disdain.
  • In step S10, the first target face image is input into the expression recognition model to obtain the first recognition result, which may specifically include the following steps:
  • S13 Use a face feature point detection algorithm to obtain a first target face feature point from a first target face image.
  • the first target face feature point refers to a feature coordinate point obtained from the first target face image according to a preset requirement.
  • the facial feature point detection algorithm refers to an algorithm that automatically locates facial feature points based on the input facial image.
  • the following facial feature point detection algorithms may be adopted to obtain facial feature point information:
  • OpenCV is a cross-platform computer vision library that can run on Linux, Windows, Android, and Mac OS operating systems. It consists of a series of C functions and a small number of C++ classes, and it also provides interfaces for languages such as Python, Ruby, and MATLAB.
  • The Viola-Jones algorithm based on Haar features is one facial feature point detection algorithm.
  • A Haar feature reflects gray-level changes in the image through differences between pixel sub-modules. Haar features are divided into three categories: edge features, linear features, and center-diagonal features.
  • The Viola-Jones algorithm is a method for face detection based on the Haar feature values of a face.
  • HOG refers to the Histogram of Oriented Gradients feature.
  • SVM refers to a Support Vector Machine, a common discriminative method usually used for pattern recognition, classification, and regression analysis. HOG features combined with SVM classifiers are widely used in image recognition.
  • DPM refers to the Deformable Part Model.
  • The HeadHunter and HeadHunter_baseline algorithms are the same in method as DPM; the difference is that the models used are different.
  • In this embodiment, the facial feature point model used is the Viola-Jones algorithm based on Haar features.
  • the eye area image, the lip area image, and the head area image are respectively obtained, specifically:
  • For the eye area image, the facial feature point detection algorithm first locates the left and right eye corners and the eyebrow center coordinates of the same eye area. The left and right eye corner abscissas are used as the left and right coordinates, the eyebrow center ordinate is used as the upper coordinate, and the eye corner ordinate plus the vertical distance from the eyebrow center to the eye corner is used as the lower coordinate. The rectangular area formed by these four coordinates is the eye area, and the eye area image is obtained by cropping it.
  • For the lip area image, the facial feature points are used to first locate the left and right mouth corners and the nose tip coordinates. The left and right mouth corner abscissas are used as the left and right coordinates, the nose tip ordinate is used as the upper coordinate, and the mouth corner ordinate plus the vertical distance from the nose tip to the mouth corner is used as the lower coordinate; the rectangular area formed by these four coordinates is the lip area. The lip area image is obtained by cropping the lip area.
  • the head area image may directly use the first target face image as the head area image.
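  • For illustration, the coordinate recipes above can be expressed directly in Python; the landmark points are assumed to come from the facial feature point detection algorithm, with image y coordinates increasing downward:

```python
def eye_region(eye_left, eye_right, brow_center):
    """Rectangle (x1, y1, x2, y2) for the eye area per the recipe above;
    points are (x, y) landmarks, with y increasing downward."""
    x1, x2 = eye_left[0], eye_right[0]           # left/right: eye corner abscissas
    y1 = brow_center[1]                          # top: eyebrow center ordinate
    brow_to_eye = eye_left[1] - brow_center[1]   # vertical brow-to-corner distance
    y2 = eye_left[1] + brow_to_eye               # bottom: corner ordinate + distance
    return x1, y1, x2, y2

def lip_region(mouth_left, mouth_right, nose_tip):
    """Rectangle (x1, y1, x2, y2) for the lip area, same convention."""
    x1, x2 = mouth_left[0], mouth_right[0]       # left/right: mouth corner abscissas
    y1 = nose_tip[1]                             # top: nose tip ordinate
    nose_to_mouth = mouth_left[1] - nose_tip[1]  # vertical nose-to-corner distance
    y2 = mouth_left[1] + nose_to_mouth           # bottom: corner ordinate + distance
    return x1, y1, x2, y2
```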
  • S15 Input the eye region image into the eye recognition model to obtain the test eye recognition result; input the lip region image into the lip recognition model to obtain the test lip recognition result; and input the head region image into the head recognition model to obtain the test head recognition result.
  • The test eye recognition result refers to the eye recognition result obtained from the target face image in the real-time video data; the test lip recognition result and the test head recognition result are defined analogously.
  • the eye region image is input into the eye recognition model, and according to the expression defined by the eye recognition model, the expression corresponding to the eye region image can be obtained, which is the test eye recognition result.
  • the lip area image is input to the lip recognition model to obtain the test lip recognition result;
  • the head area image is input to the head recognition model to obtain the test head recognition result.
  • S16 Match the test eye recognition result, test lip recognition result, and test head recognition result with the standard expression recognition result set, and use the standard expression recognition result whose matching degree is greater than a preset threshold as the output expression result. If the output expression result is listening, doubt, or understanding, the first recognition result is normal; if the output expression result is resistance or disdain, the first recognition result is abnormal.
  • the preset threshold is a preset threshold for the degree of matching between the test recognition result and the standard expression recognition result.
  • The preset threshold can be set according to the specific situation of the expression recognition model and is not specifically limited here. Understandably, the expression results corresponding to the test eye recognition result, test lip recognition result, and test head recognition result may be inconsistent with one another and may also differ from the standard expression recognition results. Therefore, the test recognition results are matched against each standard expression recognition result in the standard expression recognition result set, and if the degree of matching exceeds the preset threshold, the corresponding standard expression recognition result is used as the output expression result.
  • The test eye recognition result, test lip recognition result, and test head recognition result are matched with the standard expression recognition result set; if the overall matching degree with a standard expression recognition result in the set is greater than the preset threshold, that standard recognition result is used as the output expression result.
  • For example, if the preset threshold is 80% and a student's test eye recognition result, test lip recognition result, and test head recognition result, when matched against the standard expression recognition result set, are found to match the disdain expression with a degree exceeding the 80% threshold, then the standard expression recognition result of disdain is used as the output expression result, and the first recognition result is abnormal.
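  • A hedged sketch of this matching step follows; the similarity measure is an assumption, since the text only speaks of a "degree of matching", and the feature representations are left abstract:

```python
PRESET_THRESHOLD = 0.8  # the 80% example used in the text

def cosine(a, b):
    """Toy similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return dot / norm if norm else 0.0

def match_expression(test_results, standard_set):
    """Match the eye/lip/head test results against every standard
    expression and return the best label whose overall matching degree
    exceeds the preset threshold, else None."""
    best_label, best_score = None, 0.0
    for label, parts in standard_set.items():
        score = sum(cosine(test_results[p], parts[p])
                    for p in ("eye", "lip", "head")) / 3
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score > PRESET_THRESHOLD else None
```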
  • In this embodiment, a first target face feature point is obtained from the first target face image by using a face feature point detection algorithm, and the first target face image is then divided according to the first target face feature point to obtain the eye area image, lip area image, and head area image.
  • the test eye recognition result, test lip recognition result, and test head recognition result are obtained.
  • the test eye recognition result, test lip recognition result, and test head recognition result are matched with a standard expression recognition result set.
  • the standard expression recognition result with a matching degree greater than a preset threshold is used as the output expression result.
  • This refinement improves the accuracy of the expression recognition model and thus the efficiency of classroom monitoring.
  • the classroom monitoring method provided in the embodiment of the present application further includes:
  • the reference point position information refers to the position information of the teacher.
  • the position information of the teacher can be obtained by obtaining the GPS positioning information of the mobile phone of the teacher, or the teacher can be tracked and collected through the camera to locate the position information of the teacher.
  • Optionally, when the teacher is on the podium and has a clear view of the class, the third target face image is not collected; when the teacher walks off the podium, a blind spot appears, or the teacher leaves the classroom, the third target face image is collected.
  • a position information mapping table may be established in advance, and according to the position information mapping table, a corresponding third target face image to be collected may be acquired from the position information mapping table according to the real-time position information of the teacher. For example, when the teacher is standing on the left side of the classroom, the student on the right side of the classroom is taken as the object that needs to be collected for the third target face image; when the teacher is standing on the right side of the classroom, the student on the left side of the classroom is taken Objects that need to be collected for the third target face image.
  • the setting of the specific location information mapping table may be set according to actual needs and scenarios, and is not limited herein.
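  • As an illustrative assumption of what such a position information mapping table could look like (the zone and region names are invented for the sketch):

```python
# Hypothetical position information mapping table: the teacher's current
# zone maps to the student regions whose third target face images should
# be collected, mirroring the left/right example above.
POSITION_MAP = {
    "left":   ["right_half"],
    "right":  ["left_half"],
    "podium": [],                         # full view: nothing extra to collect
    "absent": ["left_half", "right_half"],
}

def regions_to_collect(teacher_zone):
    """Look up which student regions need a third target face image
    given the teacher's real-time position."""
    return POSITION_MAP.get(teacher_zone, [])
```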
  • the collected third target face image is input into an expression recognition model to obtain a third recognition result. If the third recognition result is abnormal, a third prompt message is sent to the monitoring end. This identification and message sending process is similar to the corresponding steps of the foregoing embodiment, and is not repeated here.
  • Further, a third target face image whose third recognition result is abnormal may be added to the focus attention queue, where the focus attention queue includes attention person identifiers. It should be understood that adding a third target face image whose third recognition result is abnormal to the focus attention queue is a process parallel to adding a face image whose recognition result is abnormal to the focus attention queue in the above embodiment.
  • a face image whose third recognition result is abnormal may also be added to the focus attention queue to implement the update of the focus attention queue and more specifically monitor the classroom effect.
  • In this embodiment, by acquiring the third target face image according to the reference point position information, the amount of video data can be reduced. By inputting the third target face image into the expression recognition model to obtain a third recognition result and sending a third prompt message to the monitoring end when that result is abnormal, the class status of students can be obtained in a more targeted way, distracted students can be found, and the amount of data collection and recognition computation is reduced, improving the efficiency of classroom monitoring.
  • a classroom monitoring device corresponds to the classroom monitoring method in the foregoing embodiment.
  • the classroom monitoring apparatus includes a first recognition result obtaining module 10, a first prompt message sending module 20, a focus attention queue adding module 30, a second recognition result obtaining module 40, and a second prompt message sending module 50.
  • the detailed description of each function module is as follows:
  • a first recognition result acquisition module 10 is configured to periodically collect a first target face image according to a first time interval, and input the first target face image into an expression recognition model to obtain a first recognition result.
  • the first prompt message sending module 20 is configured to send a first prompt message to the monitoring end if the first identification result is abnormal.
  • The focus attention queue adding module 30 is configured to add a first target face image whose first recognition result is abnormal to the focus attention queue, where the focus attention queue includes attention person identifiers.
  • A second recognition result acquisition module 40 is configured to regularly collect a second target face image according to the second time interval and the attention person identifiers in the focus attention queue, and input the second target face image into the expression recognition model to obtain a second recognition result.
  • the second prompt message sending module 50 is configured to send a second prompt message to the monitoring end if the second identification result is abnormal.
  • the expression recognition model is an expression recognition model trained using a convolutional neural network, and includes: an eye recognition model, a lip recognition model, and a head recognition model.
  • the classroom monitoring device further includes a standard result set acquisition module.
  • The standard result set acquisition module includes a training image acquisition unit, a standard eye recognition result acquisition unit, a standard lip recognition result acquisition unit, a standard head recognition result acquisition unit, and a standard result set acquisition unit.
  • a training image obtaining unit is configured to obtain a sample of a face image, and obtain an eye training image, a lip training image, and a head training image according to the sample of the face image.
  • a standard eye recognition result acquisition unit is configured to input an eye training image into an eye recognition model to obtain a standard eye recognition result.
  • a standard lip recognition result acquisition unit is used to input a lip training image into a lip recognition model to obtain a standard lip recognition result.
  • a standard head recognition result acquisition unit is configured to input a head training image into a head recognition model to obtain a standard head recognition result.
  • the standard result set acquisition unit is configured to combine a standard eye recognition result, a standard lip recognition result, and a standard head recognition result into a standard expression recognition result set.
  • the standard expression recognition result set includes five expression results of listening, doubt, understanding, resistance, and disdain.
  • the first recognition result acquisition module 10 includes a video data acquisition unit and a first target face image acquisition unit.
  • the video data acquiring unit is configured to acquire the original video data at a timing according to the first time interval.
  • the first target face image acquisition unit is configured to perform frame framing and normalization processing on the original video data to obtain a first target face image.
  • the first recognition result acquisition module 10 further includes a facial feature point acquisition unit 13, an area image acquisition unit 14, a test recognition result acquisition unit 15, and a first recognition result unit 16.
  • the facial feature point acquiring unit 13 is configured to acquire a first target facial feature point from a first target facial image by using a facial feature point detection algorithm.
  • An area image acquisition unit 14 is configured to divide a first target face image according to a first target face feature point to obtain an eye area image, a lip area image, and a head area image.
  • the test recognition result acquisition unit 15 is configured to input an eye area image into an eye recognition model to obtain a test eye recognition result; input a lip area image into a lip recognition model to obtain a test lip recognition result; The head region image is input into the head recognition model, and the test head recognition result is obtained.
  • The first recognition result obtaining unit 16 is configured to match the test eye recognition result, test lip recognition result, and test head recognition result with the standard expression recognition result set, and to use the standard expression recognition result whose matching degree is greater than a preset threshold as the output expression result. If the output expression result is listening, doubt, or understanding, the first recognition result is normal; if the output expression result is resistance or disdain, the first recognition result is abnormal.
  • the classroom monitoring device further includes a reference point position acquisition module, a third recognition result acquisition module, and a third prompt message sending module.
  • the reference point position acquisition module is configured to acquire the reference point position information in real time, and acquire a third target face image according to the reference point position information.
  • a third recognition result acquisition module is configured to input a third target face image into an expression recognition model to obtain a third recognition result.
  • the third prompt message sending module is configured to send a third prompt message to the monitoring end if the third identification result is abnormal.
  • Each module in the above-mentioned classroom monitoring device may be implemented in whole or in part by software, hardware, and a combination thereof.
  • The above-mentioned modules may be embedded in hardware in, or be independent of, the processor in the computer device, or may be stored in software in the memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 8.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for operating the operating system and computer-readable instructions in a non-volatile storage medium.
  • the database of the computer device is used to store the original video data, the first target face image, the second target face image, the third target face image, the focus attention queue, and the prompt message.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by a processor to implement a classroom monitoring method.
  • a computer device including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor.
  • the processor executes the computer-readable instructions, the following steps are implemented:
  • one or more non-volatile readable storage media storing computer-readable instructions are provided, and when the computer-readable instructions are executed by one or more processors, the one or more Each processor performs the following steps:
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

A class monitoring method and apparatus, a computer device, and a storage medium. The class monitoring method comprises: regularly collecting a first target human face image according to a first time interval, and inputting the first target human face image into an expression recognition model to obtain a first recognition result (S10); if the first recognition result is abnormal, sending a first prompt message to a monitoring terminal (S20); adding the first target human face image with the abnormal first recognition result to a focus queue, wherein the focus queue comprises a focus person identifier (S30); regularly collecting a second target human face image according to a second time interval and the focus person identifier in the focus queue, and inputting the second target human face image into the expression recognition model to obtain a second recognition result (S40); and if the second recognition result is abnormal, sending a second prompt message to the monitoring terminal (S50). According to the class monitoring method, the amount of computation of data collection and recognition is reduced while ensuring the monitoring of a class condition, thereby improving overall efficiency.

Description

Classroom monitoring method, apparatus, computer device, and storage medium
This application is based on, and claims priority from, Chinese invention patent application No. 201810870693.8, filed on August 2, 2018 and entitled "Classroom Monitoring Method, Apparatus, Computer Equipment, and Storage Medium".
Technical Field
The present application belongs to the field of image processing, and more specifically relates to a classroom monitoring method, apparatus, computer device, and storage medium.
Background
In the teaching classroom, the teacher may fail to notice that some students' attention is wandering; students who do not listen carefully may miss important knowledge points. With the rapid development of image recognition technology, some existing approaches attempt to use expression recognition to capture students' expressions in the classroom in real time: a camera collects images, the collected images are analyzed, and if multiple faces are present, the facial expression and movements of each face are analyzed to judge whether the student is currently distracted. However, because a class lasts a long time and the number of students is large, continuous data collection and recognition produce an excessive amount of data, resulting in low overall efficiency.
Summary
The embodiments of the present application provide a classroom monitoring method, apparatus, computer device, and storage medium to solve the technical problem of low efficiency in identifying students' class status caused by an excessive amount of data.
A classroom monitoring method includes:
regularly collecting a first target face image according to a first time interval, and inputting the first target face image into an expression recognition model to obtain a first recognition result;
if the first recognition result is abnormal, sending a first prompt message to a monitoring end;
adding the first target face image whose first recognition result is abnormal to a focus attention queue, where the focus attention queue includes attention person identifiers;
regularly collecting a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and inputting the second target face image into the expression recognition model to obtain a second recognition result;
if the second recognition result is abnormal, sending a second prompt message to the monitoring end.
A classroom monitoring apparatus includes:
a first recognition result acquisition module, configured to regularly collect a first target face image according to a first time interval and input the first target face image into an expression recognition model to obtain a first recognition result;
a first prompt message sending module, configured to send a first prompt message to a monitoring end if the first recognition result is abnormal;
a focus attention queue adding module, configured to add the first target face image whose first recognition result is abnormal to a focus attention queue, where the focus attention queue includes attention person identifiers;
a second recognition result acquisition module, configured to regularly collect a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and input the second target face image into the expression recognition model to obtain a second recognition result;
a second prompt message sending module, configured to send a second prompt message to the monitoring end if the second recognition result is abnormal.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the following steps are implemented:
regularly collecting a first target face image according to a first time interval, and inputting the first target face image into an expression recognition model to obtain a first recognition result;
if the first recognition result is abnormal, sending a first prompt message to a monitoring end;
adding the first target face image whose first recognition result is abnormal to a focus attention queue, where the focus attention queue includes attention person identifiers;
regularly collecting a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and inputting the second target face image into the expression recognition model to obtain a second recognition result;
if the second recognition result is abnormal, sending a second prompt message to the monitoring end.
One or more non-volatile readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
regularly collecting a first target face image according to a first time interval, and inputting the first target face image into an expression recognition model to obtain a first recognition result;
if the first recognition result is abnormal, sending a first prompt message to a monitoring end;
adding the first target face image whose first recognition result is abnormal to a focus attention queue, where the focus attention queue includes attention person identifiers;
regularly collecting a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and inputting the second target face image into the expression recognition model to obtain a second recognition result;
if the second recognition result is abnormal, sending a second prompt message to the monitoring end.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below; other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an application environment of the classroom monitoring method in an embodiment of the present application;
FIG. 2 is a flowchart of the classroom monitoring method in an embodiment of the present application;
FIG. 3 is another flowchart of the classroom monitoring method in an embodiment of the present application;
FIG. 4 is another flowchart of the classroom monitoring method in an embodiment of the present application;
FIG. 5 is another flowchart of the classroom monitoring method in an embodiment of the present application;
FIG. 6 is a schematic block diagram of the classroom monitoring apparatus in an embodiment of the present application;
FIG. 7 is a schematic block diagram of the first recognition result acquisition module in the classroom monitoring apparatus in an embodiment of the present application;
FIG. 8 is a schematic diagram of a computer device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
The classroom monitoring method provided by this application can be applied in the application environment shown in FIG. 1, in which the monitoring end communicates with the server through a network. The server collects a first target face image through a camera at a first time interval and inputs it into an expression recognition model to obtain a first recognition result. If the first recognition result is abnormal, the server sends a first prompt message to the monitoring end and adds the abnormal first target face image to a focus attention queue, where the focus attention queue includes attention person identifiers. The server then collects a second target face image through the camera according to a second time interval and the attention person identifiers in the focus attention queue, and inputs it into the expression recognition model to obtain a second recognition result; if the second recognition result is abnormal, the server sends a second prompt message to the monitoring end. The monitoring end may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device. The server can be implemented as an independent server or as a server cluster composed of multiple servers.
在一实施例中,如图2所示,提供一种课堂监控方法,以该方法应用在图1中的服务端为例进行说明,包括如下步骤:In one embodiment, as shown in FIG. 2, a classroom monitoring method is provided. The method is applied to the server in FIG. 1 as an example, and includes the following steps:
S10:根据第一时间间隔定时采集第一目标人脸图像,将第一目标人脸图像输入到表情识别模型中,得到第一识别结果。S10: Collect a first target face image according to a first time interval, input the first target face image into an expression recognition model, and obtain a first recognition result.
其中,第一时间间隔是一个预设的时间段,具体可以根据实际需要设置,例如5分钟、8分钟或者10分钟。而第一目标人脸图像是指整个班所有学生的人脸图像。可以通过在教室中设置多个摄像头,每一摄像头采集一固定区域中的学生的人脸图像,以实现对第一目标人脸图像的采集。可以理解地,摄像头个数越多,采集精度就越高。可选地,摄像头采集的是视频数据,而人脸图像为视频数据通过预定的帧率进行分帧之后获得。The first time interval is a preset time period, and can be specifically set according to actual needs, such as 5 minutes, 8 minutes, or 10 minutes. The first target face image refers to the face images of all students in the entire class. A plurality of cameras can be set in the classroom, and each camera collects a face image of a student in a fixed area, so as to realize the collection of the first target face image. Understandably, the greater the number of cameras, the higher the acquisition accuracy. Optionally, the camera collects video data, and the face image is obtained after the video data is framed by a predetermined frame rate.
The expression recognition model is a recognition model for judging the emotion of a face in the current image. The expression recognition model can compute the probability that the face in the current image corresponds to each of several preset emotions; if the probability of a certain emotion exceeds its corresponding preset threshold, that emotion is taken as the first recognition result for the face image. For example, in this embodiment, for a classroom scenario, the emotions in the expression recognition model can be set to five types: listening, doubt, understanding, resistance, and disdain. Specifically, a large number of sample images representing these five emotions can be collected and annotated in advance to form a sample image set, and a corresponding neural network model or classifier is then selected and trained to finally obtain the expression recognition model.
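By way of illustration only, the per-emotion probability thresholding described above might be sketched as follows in Python; the helper predict_probabilities and the threshold values are assumptions introduced here and are not part of the application:

    # Minimal sketch of per-emotion probability thresholding. The function
    # `predict_probabilities` is a hypothetical stand-in for the trained
    # model's forward pass; the 0.6 thresholds are illustrative only.
    EMOTIONS = ["listening", "doubt", "understanding", "resistance", "disdain"]
    THRESHOLDS = {e: 0.6 for e in EMOTIONS}

    def recognize_emotion(face_image, predict_probabilities):
        """Return the most likely emotion if it clears its threshold."""
        probs = predict_probabilities(face_image)  # dict: emotion -> probability
        best = max(EMOTIONS, key=lambda e: probs[e])
        if probs[best] >= THRESHOLDS[best]:
            return best
        return None  # no emotion is confident enough; caller may skip this frame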
S20: If the first recognition result is abnormal, send a first prompt message to the monitoring end.
After the expression recognition model outputs the first recognition result, the first prompt message is sent according to the output first recognition result.
The expression recognition model determines the expression corresponding to the facial area in the first target face image, for example listening, doubt, understanding, resistance, or disdain. Optionally, when the expression recognition model determines that the emotion corresponding to the facial area in the first target face image is resistance or disdain, the corresponding output first recognition result is abnormal; when the model determines that the emotion is listening, doubt, or understanding, the corresponding output first recognition result is normal. If the first recognition result is abnormal, a first prompt message is sent to the monitoring end, indicating that the student may be distracted. Optionally, the monitoring end is the mobile phone, computer, or other communication device of the teacher, head teacher, or other relevant personnel, so that they can learn the student's in-class state and respond with an evaluation or intervention accordingly.
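The mapping from a recognized emotion to a normal/abnormal recognition result can be expressed as a small helper; the sketch below uses the emotion labels above and is illustrative rather than the application's exact code:

    # Sketch: map the recognized emotion to a recognition result.
    ABNORMAL_EMOTIONS = {"resistance", "disdain"}

    def classify_result(emotion):
        """Resistance/disdain count as abnormal; the rest as normal."""
        return "abnormal" if emotion in ABNORMAL_EMOTIONS else "normal"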
S30: Add the first target face image whose first recognition result is abnormal to a focus attention queue, where the focus attention queue includes attention person identifiers.
The focus attention queue refers to a queue composed of those students in the class who may be distracted and therefore require focused attention. Specifically, the queue can be represented by attention person identifiers. An attention person identifier is an identifier used to distinguish a student who requires focused attention, for example a student number, an ID card number, or a seat number.
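One possible in-memory representation of such a queue is sketched below; keying it by student identifier and using a set are assumptions made here for illustration:

    # Sketch: a focus attention queue keyed by attention person identifiers.
    # A set keeps membership checks and removals O(1); the identifier format
    # (e.g. student number or seat number) is illustrative.
    focus_queue = set()

    def add_to_focus_queue(person_id):
        focus_queue.add(person_id)

    def release_from_focus_queue(person_id):
        focus_queue.discard(person_id)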
S40: Periodically collect a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and input the second target face image into the expression recognition model to obtain a second recognition result.
The second time interval is also a preset time period that can be set according to actual needs, for example 1 minute, 3 minutes, or 5 minutes. Preferably, the second time interval is shorter than the first time interval. The second target face image refers to the face image of the student corresponding to an attention person identifier. Specifically, the corresponding camera can be located through the attention person identifier, and the second target face image is obtained from that camera. The method of collecting the second target face image is similar to that of collecting the first target face image in step S10 and is not repeated here. Likewise, the process of inputting the second target face image into the expression recognition model to obtain the second recognition result is similar to the corresponding part of step S10 and is not repeated here.
S50: If the second recognition result is abnormal, send a second prompt message to the monitoring end.
After the expression recognition model outputs the second recognition result, the second prompt message is sent according to the output second recognition result. The expression recognition model determines the emotion corresponding to the facial area in the second target face image, for example listening, doubt, understanding, resistance, or disdain. Optionally, when the model determines that the emotion is resistance or disdain, the corresponding output second recognition result is abnormal; when the emotion is listening, doubt, or understanding, the corresponding output second recognition result is normal. If the second recognition result is abnormal, a second prompt message is sent to the monitoring end, indicating that the student may be distracted.
Further, the second target face image whose second recognition result is abnormal is added to the focus attention queue. In one embodiment, while the second target face image whose second recognition result is abnormal is added to the focus attention queue, the second target face image whose second recognition result is normal is released from the focus attention queue. Understandably, a student whose first recognition result is abnormal may be momentarily inattentive but soon return to a normal state, which is acceptable. By updating the focus attention queue, the computational load of data collection and recognition can be effectively reduced and classroom monitoring can be made more targeted.
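Continuing the queue sketch above, the update performed after the second recognition might read as follows (again an illustrative sketch, not the application's code):

    # Sketch: update the focus attention queue after the second recognition.
    def update_focus_queue(person_id, second_result):
        if second_result == "abnormal":
            add_to_focus_queue(person_id)        # keep watching this student
        else:
            release_from_focus_queue(person_id)  # back to normal; stop tracking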
In the embodiment corresponding to FIG. 2, a first target face image is periodically collected according to a first time interval and input into an expression recognition model to obtain a first recognition result; if the first recognition result is abnormal, a first prompt message is sent to the monitoring end. The focus attention queue is then adjusted according to the first recognition result, a second target face image is periodically collected according to the attention person identifiers in the focus attention queue and a second time interval and input into the expression recognition model to obtain a second recognition result, and if the second recognition result is abnormal, a second prompt message is sent to the monitoring end. By collecting images at set time intervals and establishing a focus attention queue, classroom monitoring can be carried out in a targeted manner: the classroom situation is monitored well while the computational load of data collection and recognition is reduced, improving overall efficiency.
In one embodiment, the expression recognition model is an expression recognition model trained using a convolutional neural network. A convolutional neural network (CNN) is a multilayer neural network that excels at machine learning problems involving images, especially large images. The basic structure of a CNN includes two kinds of layers: convolutional layers and pooling layers. Optionally, a 10-layer convolutional neural network is used: the more layers a neural network has, the longer the computation takes and the finer the distinctions in expression recognition, and a 10-layer convolutional neural network can reach the required training accuracy in a relatively short time. In this embodiment, the expression recognition model includes an eye recognition model, a lip recognition model, and a head recognition model. Before step S10, that is, before periodically collecting the first target face image according to the first time interval and inputting the first target face image into the expression recognition model to obtain the first recognition result, as shown in FIG. 3, the classroom monitoring method provided in this embodiment of the present application further includes:
S61: Obtain samples of face images, and obtain eye training images, lip training images, and head training images from the face image samples.
The face image samples may be face images of students collected in advance.
Specifically, corresponding regions can be pre-divided from each face image as an eye region image, a lip region image, and a head region image, and the sets of eye region images, lip region images, and head region images obtained from the sample set are then used as the eye training images, lip training images, and head training images, respectively. The method of dividing the regions can be preset as needed and is not specifically limited in this embodiment of the present application.
S62: Input the eye training images into the eye recognition model to obtain standard eye recognition results.
The standard eye recognition result refers to the result recognized by the eye recognition model from the eye features in the eye training images. Optionally, the eyebrow features in the eye training images can be used as the eye features, with standard definitions given for the five preset expressions of listening, doubt, understanding, resistance, and disdain. For example, the eyebrow feature corresponding to the listening expression should be that the eyebrows extend naturally and the angle β between the two ends of the eyebrow and the midpoint of the eye is less than or equal to 120 degrees.
Specifically, the eye training images are classified according to eyebrow features, and the classified eye training images are input into the eye recognition model to obtain the eyebrow features corresponding to the five expressions of listening, doubt, understanding, resistance, and disdain. The combination of these five corresponding eyebrow features and expressions can then serve as the standard eye recognition results.
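The eyebrow angle β used above can be computed from landmark coordinates. The sketch below assumes 2-D (x, y) landmark points for the two eyebrow ends and the eye midpoint, an illustrative choice rather than the application's exact geometry:

    import math

    # Sketch: angle beta at the eye midpoint, formed by the two eyebrow ends.
    # Landmarks are assumed to be (x, y) tuples from a feature point detector.
    def eyebrow_angle(brow_left, brow_right, eye_mid):
        v1 = (brow_left[0] - eye_mid[0], brow_left[1] - eye_mid[1])
        v2 = (brow_right[0] - eye_mid[0], brow_right[1] - eye_mid[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        norm = math.hypot(*v1) * math.hypot(*v2)
        return math.degrees(math.acos(dot / norm))

    # Per the definition above, a "listening" eyebrow satisfies beta <= 120 degrees.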
S63: Input the lip training images into the lip recognition model to obtain standard lip recognition results.
The standard lip recognition result refers to the result recognized by the lip recognition model from the lip features in the lip training images. Optionally, the mouth corner features in the lip training images can be used as the lip features. For example, when the expression is doubt, the corresponding mouth corner feature should be that the corners of the mouth move downward, that is, the mouth corner feature line shows a negative deviation from its bisector.
Specifically, standard definitions are given for the five preset expressions of listening, doubt, understanding, resistance, and disdain. The lip training images are classified according to mouth corner features, and the classified lip training images are input into the lip recognition model to obtain the mouth corner features corresponding to the five expressions. The combination of these five corresponding mouth corner features and expressions can then serve as the standard lip recognition results.
S64: Input the head training images into the head recognition model to obtain standard head recognition results.
The standard head recognition result refers to the result recognized by the head recognition model from the head features in the head training images. Optionally, the head rotation features in the head training images can be used as the head features. For example, when the expression is resistance, the head rotation feature should be a lowered or turned head, with the head angle change α greater than 60 degrees.
Specifically, standard definitions are given for the five expressions of listening, doubt, understanding, resistance, and disdain. The head training images are classified according to head rotation features, and the classified head training images are input into the head recognition model to obtain the head rotation features corresponding to the five expressions. The combination of these five corresponding head rotation features and expressions can then serve as the standard head recognition results.
S65: Combine the standard eye recognition results, standard lip recognition results, and standard head recognition results into a standard expression recognition result set.
Specifically, the features corresponding to the standard eye recognition results, standard lip recognition results, and standard head recognition results are combined into the standard expression recognition result set. For example, when the expression is doubt, the corresponding eye feature is that the angle β between the two ends of the eyebrow and the midpoint of the eye is greater than 120 degrees, the corresponding lip feature is that the mouth corner feature line shows a negative deviation from its bisector, and the corresponding head feature is that the head angle change is between 0 and 60 degrees. Combining all the standard eye recognition results, standard lip recognition results, and standard head recognition results forms the standard expression recognition result set.
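Such a result set can be represented as a simple lookup table. The entry below encodes only the "doubt" example just given; the field names and the predicate-style encoding are assumptions introduced for illustration:

    # Sketch: one entry of a standard expression recognition result set,
    # encoding the "doubt" example above. Field names are illustrative.
    STANDARD_RESULTS = {
        "doubt": {
            "eyebrow_angle_deg": lambda beta: beta > 120,
            "mouth_corner_deviation": lambda d: d < 0,       # negative deviation
            "head_angle_change_deg": lambda a: 0 <= a <= 60,
        },
    }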
In the embodiment corresponding to FIG. 3, eye training images, lip training images, and head training images are obtained from the sample set of face images; the eye training images are input into the eye recognition model to obtain standard eye recognition results, the lip training images are input into the lip recognition model to obtain standard lip recognition results, and the head training images are input into the head recognition model to obtain standard head recognition results; finally, the standard eye recognition results, standard lip recognition results, and standard head recognition results are combined into a standard expression recognition result set, which can support expression recognition on subsequently collected target face images.
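For reference, the small convolutional classifier mentioned at the start of this embodiment might be sketched as below. This PyTorch definition is an assumption introduced here for illustration (the application does not fix layer counts or sizes) and only shows the alternation of convolutional and pooling layers feeding a five-class output:

    import torch.nn as nn

    # Illustrative sketch of a small CNN classifier for the five expression
    # classes (listening, doubt, understanding, resistance, disdain).
    class ExpressionCNN(nn.Module):
        def __init__(self, num_classes=5):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.LazyLinear(num_classes),  # infers the flattened size at first use
            )

        def forward(self, x):
            return self.classifier(self.features(x))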
In one embodiment, as shown in FIG. 4, step S10 of periodically collecting the first target face image according to the first time interval may specifically include the following steps:
S11: Periodically obtain raw video data according to the first time interval.
The raw video data is the video data collected by the cameras in the classroom. If there are multiple cameras, the raw video data of the corresponding camera can be obtained according to the identifier corresponding to the first target face image.
Specifically, the classroom cameras are turned on to capture video, and the raw video data is obtained at the first time interval.
S12: Split the raw video data into frames and normalize them to obtain the first target face image.
Specifically, frame splitting refers to dividing the raw video data according to a preset time to obtain at least one frame of the video image to be recognized. Normalization is a way of simplifying computation, transforming a dimensional expression into a dimensionless one. For example, in the raw video data of this embodiment, the facial area of the first target is needed in order to extract the corresponding expression features, so the pixels of the framed video images to be recognized need to be normalized to a uniform size, for example 260 x 260, to obtain the first target face image for subsequent recognition of each frame of the video image to be recognized.
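A minimal sketch of this framing-and-resizing step using OpenCV is shown below; the sampling stride and the video path are assumptions made here for illustration:

    import cv2

    # Sketch: split raw video into frames and normalize each frame to 260x260.
    # The stride (keep every Nth frame) stands in for the predetermined frame rate.
    def extract_frames(video_path, stride=30, size=(260, 260)):
        cap = cv2.VideoCapture(video_path)
        frames, index = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % stride == 0:
                frames.append(cv2.resize(frame, size))
            index += 1
        cap.release()
        return frames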
In the embodiment corresponding to FIG. 4, the raw video data is periodically obtained according to the first time interval and then split into frames and normalized to obtain the first target face image, which supports the subsequent recognition of the expressions corresponding to the face images.
In one embodiment, the standard expression recognition result set includes five expression results: listening, doubt, understanding, resistance, and disdain. As shown in FIG. 5, step S10 of inputting the first target face image into the expression recognition model to obtain the first recognition result may specifically include the following steps:
S13: Use a face feature point detection algorithm to obtain first target face feature points from the first target face image.
The first target face feature points refer to the feature coordinate points obtained from the first target face image according to preset requirements. A face feature point detection algorithm is an algorithm that automatically locates face feature points from an input face image. Optionally, any of the following face feature point detection algorithms can be used to obtain face feature point information:
(1) The Viola-Jones algorithm based on Haar features, provided with OpenCV;
OpenCV is a cross-platform computer vision library that can run on the Linux, Windows, Android, and Mac OS operating systems. It consists of a series of C functions and a small number of C++ classes, and also provides interfaces for languages such as Python, Ruby, and MATLAB. It implements many general algorithms in image processing and computer vision, and the Viola-Jones algorithm based on Haar features is one such face feature point detection algorithm. A Haar feature reflects the gray-level changes in an image, expressing the differences between pixel blocks. Haar features fall into three categories: edge features, linear features, and center-diagonal features. The Viola-Jones algorithm performs face detection based on the Haar feature values of a face.
(2) dlib based on HOG + SVM features;
dlib is a modern C++ toolbox containing machine learning algorithms and tools for creating complex software in C++ to solve real-world problems. HOG refers to the Histogram of Oriented Gradients, and SVM (Support Vector Machine) refers to the support vector machine, a common discriminative method usually used for pattern recognition, classification, and regression analysis. HOG features combined with SVM classifiers are widely used in image recognition.
(3) The three face detection methods of the doppia library (DPM, HeadHunter, and HeadHunter_baseline).
DPM (Deformable Part Model) is an object detection algorithm that has become an important component of many classifiers and of segmentation, human pose estimation, and behavior classification. DPM can be seen as an extension of HOG: the gradient orientation histogram is computed first, an SVM is then trained to obtain a target gradient model, and classification is performed so that the model matches the target. The HeadHunter and HeadHunter_baseline algorithms are methodologically the same as DPM; the difference lies in the models used.
The following uses face feature point detection algorithm (1) as an example to explain the process of obtaining the first target face feature points. First, sample images of input face images are obtained, preprocessed (normalized), and used for training to obtain a face feature point model, namely the Viola-Jones algorithm with Haar features. The input first target face image is then obtained and given the same preprocessing, followed in turn by the steps of skin color region segmentation, face feature region segmentation, and face feature region classification. Finally, matching is computed between the Haar-feature Viola-Jones model and the face feature region classes to obtain the first target face feature points.
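For reference, a minimal OpenCV-based detection step in this spirit might look like the sketch below. It uses the stock frontal-face Haar cascade shipped with OpenCV and only finds face rectangles, leaving finer landmark localization to a later stage; the detectMultiScale parameters are common defaults, not values fixed by the application:

    import cv2

    # Sketch: Haar-cascade face detection with OpenCV (Viola-Jones). This
    # yields face bounding boxes; landmark localization would follow.
    def detect_faces(image_bgr):
        cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
        detector = cv2.CascadeClassifier(cascade_path)
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        return detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)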
S14: Divide the first target face image according to the first target face feature points to obtain an eye region image, a lip region image, and a head region image.
The eye region image, lip region image, and head region image are obtained separately according to the acquired first target face feature points. Specifically:
For the eye region image, a face feature point detection algorithm can first locate the position coordinates of the left and right eye corners and the eyebrow center within the same eye area. The horizontal coordinates of the left and right eye corners are taken as the left and right coordinates, the vertical coordinate of the eyebrow center as the upper coordinate, and the vertical coordinate of the eye corner plus the vertical distance from the eyebrow center to the eye corner as the lower coordinate. The rectangular area formed by these four coordinates is the eye area, and capturing it yields the eye region image.
For the lip region image, face feature points can first locate the position coordinates of the left and right mouth corners and the nose tip. The horizontal coordinates of the left and right mouth corners are taken as the left and right coordinates, the vertical coordinate of the nose tip as the upper coordinate, and the vertical coordinate of the mouth corner plus the vertical distance from the nose tip to the mouth corner as the lower coordinate. The rectangular area formed by these four coordinates is the lip area, and capturing it yields the lip region image.
For the head region image, the first target face image itself can be used directly.
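These rectangle constructions can be written down directly. In the sketch below the landmarks are assumed to be integer (x, y) pixel coordinates with the y axis pointing down, as is usual for image arrays, and the helper names are illustrative:

    # Sketch: crop the eye and lip regions from landmark coordinates.
    # Image coordinates: x grows rightward, y grows downward (NumPy order).
    def crop_eye_region(image, eye_left, eye_right, brow_center):
        top = brow_center[1]
        bottom = eye_left[1] + (eye_left[1] - brow_center[1])  # eye y + brow-to-eye gap
        return image[top:bottom, eye_left[0]:eye_right[0]]

    def crop_lip_region(image, mouth_left, mouth_right, nose_tip):
        top = nose_tip[1]
        bottom = mouth_left[1] + (mouth_left[1] - nose_tip[1])  # mouth y + nose-to-mouth gap
        return image[top:bottom, mouth_left[0]:mouth_right[0]]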
S15: Input the eye region image into the eye recognition model to obtain a test eye recognition result; input the lip region image into the lip recognition model to obtain a test lip recognition result; input the head region image into the head recognition model to obtain a test head recognition result.
The test eye recognition result refers to the eye recognition result corresponding to the target face image obtained from the real-time video data, and likewise for the test lip recognition result and the test head recognition result.
Specifically, the eye region image is input into the eye recognition model, and according to the expressions defined by the eye recognition model, the expression corresponding to the eye region image is obtained; this is the test eye recognition result. Similarly, inputting the lip region image into the lip recognition model yields the test lip recognition result, and inputting the head region image into the head recognition model yields the test head recognition result.
S16: Match the test eye recognition result, the test lip recognition result, and the test head recognition result against the standard expression recognition result set, and take the standard expression recognition result whose matching degree exceeds a preset threshold as the output expression result. If the output expression result is listening, doubt, or understanding, the first recognition result is normal; if the output expression result is resistance or disdain, the first recognition result is abnormal.
The preset threshold is a preset threshold on the degree of matching between the test recognition results and a standard expression recognition result. It can be set according to the specific circumstances of the expression recognition model and is not specifically limited here. Understandably, the expressions corresponding to the test eye recognition result, the test lip recognition result, and the test head recognition result may be inconsistent with one another, and they may also differ from the standard expression recognition results. Therefore, the test recognition results are matched against each standard expression recognition result in the set, and if the matching degree exceeds the preset threshold, the corresponding standard expression recognition result is taken as the output expression result.
Specifically, the test eye recognition result, test lip recognition result, and test head recognition result are matched against the standard expression recognition result set, and if the overall match with a certain standard expression recognition result in the set exceeds the preset threshold, that standard recognition result is taken as the output expression result. Optionally, when the output expression result is listening, doubt, or understanding, the first recognition result is normal; when the output expression result is resistance or disdain, the first recognition result is abnormal. For example, if the preset threshold is 80% and, after matching a student's test eye, lip, and head recognition results against the standard expression recognition result set, the match with the disdain expression in the set exceeds the 80% threshold, the disdain standard expression recognition result is taken as the output expression result, and the first recognition result is abnormal.
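One simple way to realize this matching is to score each candidate expression by the fraction of the three part-level results that agree with it; this scoring rule is an assumption made here for illustration, since the application does not fix a particular matching formula:

    # Sketch: match the three part-level results against each candidate
    # expression. The agreement-fraction score is an illustrative choice;
    # with three parts and threshold 0.8, all three must agree.
    EXPRESSIONS = ["listening", "doubt", "understanding", "resistance", "disdain"]

    def match_expression(eye_result, lip_result, head_result, threshold=0.8):
        parts = [eye_result, lip_result, head_result]
        best, best_score = None, 0.0
        for expression in EXPRESSIONS:
            score = sum(p == expression for p in parts) / len(parts)
            if score > best_score:
                best, best_score = expression, score
        return best if best_score >= threshold else None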
In the embodiment corresponding to FIG. 5, the first target face feature points are obtained from the first target face image using a face feature point detection algorithm, and the first target face image is divided according to the first target face feature points to obtain an eye region image, a lip region image, and a head region image. These are input into the eye recognition model, lip recognition model, and head recognition model respectively to obtain the test eye, lip, and head recognition results, which are finally matched against the standard expression recognition result set; the standard expression recognition result whose matching degree exceeds the preset threshold is taken as the output expression result. If the output expression result is listening, doubt, or understanding, the first recognition result is normal; if the output expression result is resistance or disdain, the first recognition result is abnormal. Implementing the expression recognition model through three recognition models improves recognition accuracy and the efficiency of classroom monitoring.
In one embodiment, the classroom monitoring method provided in this embodiment of the present application further includes:
obtaining reference point position information in real time and collecting a third target face image according to the reference point position information; then inputting the third target face image into the expression recognition model to obtain a third recognition result, and if the third recognition result is abnormal, sending a third prompt message to the monitoring end.
The reference point position information refers to the teacher's position information. The teacher's position can be obtained from the GPS positioning information of the teacher's mobile phone, or by tracking the teacher with a camera and locating the teacher's position.
Understandably, when the teacher is in the podium area facing the students, the whole class is within the teacher's line of sight, so no third target face image is collected. When the teacher steps down from the podium and a blind spot appears in the line of sight, or when the teacher leaves the classroom, the third target face image is collected.
Specifically, a position information mapping table can be established in advance. According to this table and the teacher's real-time position information, the corresponding third target face image to be collected can be determined. For example, when the teacher stands on the left side of the classroom, the students on the right side of the classroom are taken as the targets for third target face image collection; when the teacher stands on the right side, the students on the left side are taken as the targets. The specific position information mapping table can be configured according to actual needs and scenarios, and is not limited here. The collected third target face image is then input into the expression recognition model to obtain a third recognition result, and if the third recognition result is abnormal, a third prompt message is sent to the monitoring end. This recognition and messaging process is similar to the corresponding steps of the foregoing embodiments and is not repeated here.
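The position information mapping table might be as simple as a zone-to-region lookup; the zone names and region groupings below are assumptions for illustration, and a real deployment would derive the zone from GPS or camera tracking:

    # Sketch: a position information mapping table from the teacher's zone
    # to the student region(s) whose faces should be collected.
    POSITION_MAP = {
        "podium_facing_class": [],           # whole class in view: collect nothing
        "classroom_left": ["right_region"],  # collect students out of sight
        "classroom_right": ["left_region"],
        "outside_classroom": ["left_region", "right_region"],
    }

    def regions_to_collect(teacher_zone):
        return POSITION_MAP.get(teacher_zone, [])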
Further, the third target face image whose third recognition result is abnormal can be added to the focus attention queue, where the focus attention queue includes attention person identifiers. It should be understood that adding the third target face image whose third recognition result is abnormal to the focus attention queue runs in parallel with the process, in the above embodiments, of adding face images whose recognition results are abnormal to the focus attention queue. Optionally, adding face images whose third recognition result is abnormal to the focus attention queue updates the queue and makes the monitoring of classroom effect more targeted.
In this embodiment, by obtaining the teacher's position information in real time and collecting the third target face image according to the position of the teacher as a reference point, the amount of video data collected can be reduced. By inputting the third target face image into the expression recognition model to obtain a third recognition result and, if it is abnormal, sending a third prompt message to the monitoring end, students' in-class states can be obtained in a more targeted manner, distracted students can be found, the computational load of data collection and recognition is reduced, and the efficiency of classroom monitoring is improved.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic and should not constitute any limitation on the implementation of the embodiments of the present application.
In one embodiment, a classroom monitoring apparatus is provided, and the apparatus corresponds one-to-one with the classroom monitoring method in the above embodiments. As shown in FIG. 6, the classroom monitoring apparatus includes a first recognition result obtaining module 10, a first prompt message sending module 20, a focus attention queue adding module 30, a second recognition result obtaining module 40, and a second prompt message sending module 50. The functional modules are described in detail as follows:
The first recognition result obtaining module 10 is configured to periodically collect a first target face image according to a first time interval, and input the first target face image into an expression recognition model to obtain a first recognition result.
The first prompt message sending module 20 is configured to send a first prompt message to the monitoring end if the first recognition result is abnormal.
The focus attention queue adding module 30 is configured to add the first target face image whose first recognition result is abnormal to the focus attention queue, where the focus attention queue includes attention person identifiers.
The second recognition result obtaining module 40 is configured to periodically collect a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and input the second target face image into the expression recognition model to obtain a second recognition result.
The second prompt message sending module 50 is configured to send a second prompt message to the monitoring end if the second recognition result is abnormal.
Further, the expression recognition model is an expression recognition model trained using a convolutional neural network and includes an eye recognition model, a lip recognition model, and a head recognition model. The classroom monitoring apparatus further includes a standard result set obtaining module, which includes a training image obtaining unit, a standard eye recognition result obtaining unit, a standard lip recognition result obtaining unit, a standard head recognition result obtaining unit, and a standard result set obtaining unit.
The training image obtaining unit is configured to obtain samples of face images, and obtain eye training images, lip training images, and head training images from the face image samples.
The standard eye recognition result obtaining unit is configured to input the eye training images into the eye recognition model to obtain standard eye recognition results.
The standard lip recognition result obtaining unit is configured to input the lip training images into the lip recognition model to obtain standard lip recognition results.
The standard head recognition result obtaining unit is configured to input the head training images into the head recognition model to obtain standard head recognition results.
The standard result set obtaining unit is configured to combine the standard eye recognition results, standard lip recognition results, and standard head recognition results into a standard expression recognition result set.
Further, the standard expression recognition result set includes five expression results: listening, doubt, understanding, resistance, and disdain. The first recognition result obtaining module 10 includes a video data obtaining unit and a first target face image obtaining unit.
The video data obtaining unit is configured to periodically obtain raw video data according to the first time interval.
The first target face image obtaining unit is configured to split the raw video data into frames and normalize them to obtain the first target face image.
Further, as shown in FIG. 7, the first recognition result obtaining module 10 further includes a face feature point obtaining unit 13, a region image obtaining unit 14, a test recognition result obtaining unit 15, and a first recognition result obtaining unit 16.
The face feature point obtaining unit 13 is configured to use a face feature point detection algorithm to obtain first target face feature points from the first target face image.
The region image obtaining unit 14 is configured to divide the first target face image according to the first target face feature points to obtain an eye region image, a lip region image, and a head region image.
The test recognition result obtaining unit 15 is configured to input the eye region image into the eye recognition model to obtain a test eye recognition result; input the lip region image into the lip recognition model to obtain a test lip recognition result; and input the head region image into the head recognition model to obtain a test head recognition result.
The first recognition result obtaining unit 16 is configured to match the test eye recognition result, test lip recognition result, and test head recognition result against the standard expression recognition result set, and take the standard expression recognition result whose matching degree exceeds a preset threshold as the output expression result. If the output expression result is listening, doubt, or understanding, the first recognition result is normal; if the output expression result is resistance or disdain, the first recognition result is abnormal.
Further, the classroom monitoring apparatus includes a reference point position obtaining module, a third recognition result obtaining module, and a third prompt message sending module.
The reference point position obtaining module is configured to obtain reference point position information in real time and collect a third target face image according to the reference point position information.
The third recognition result obtaining module is configured to input the third target face image into the expression recognition model to obtain a third recognition result.
The third prompt message sending module is configured to send a third prompt message to the monitoring end if the third recognition result is abnormal.
For specific limitations on the classroom monitoring apparatus, refer to the limitations on the classroom monitoring method above; details are not repeated here. Each module in the above classroom monitoring apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or independent of, the processor in a computer device, or stored in software form in the memory of the computer device, so that the processor can invoke and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 8. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database of the computer device is used to store the raw video data, the first target face image, the second target face image, the third target face image, the focus attention queue, the prompt messages, and the like. The network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer-readable instructions implement a classroom monitoring method.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the following steps are implemented:
periodically collecting a first target face image according to a first time interval, and inputting the first target face image into an expression recognition model to obtain a first recognition result;
if the first recognition result is abnormal, sending a first prompt message to the monitoring end;
adding the first target face image whose first recognition result is abnormal to a focus attention queue, where the focus attention queue includes attention person identifiers;
periodically collecting a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and inputting the second target face image into the expression recognition model to obtain a second recognition result;
if the second recognition result is abnormal, sending a second prompt message to the monitoring end.
In one embodiment, one or more non-volatile readable storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
periodically collecting a first target face image according to a first time interval, and inputting the first target face image into an expression recognition model to obtain a first recognition result;
if the first recognition result is abnormal, sending a first prompt message to the monitoring end;
adding the first target face image whose first recognition result is abnormal to a focus attention queue, where the focus attention queue includes attention person identifiers;
periodically collecting a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and inputting the second target face image into the expression recognition model to obtain a second recognition result;
if the second recognition result is abnormal, sending a second prompt message to the monitoring end.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions instructing the relevant hardware. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the above division of functional units and modules is used as an example. In practical applications, the above functions can be allocated to different functional units and modules as needed; that is, the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only used to describe the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included within the protection scope of the present application.

Claims (20)

  1. A classroom monitoring method, comprising the following steps:
    periodically collecting a first target face image according to a first time interval, and inputting the first target face image into an expression recognition model to obtain a first recognition result;
    if the first recognition result is abnormal, sending a first prompt message to a monitoring end;
    adding the first target face image whose first recognition result is abnormal to a focus attention queue, the focus attention queue comprising attention person identifiers;
    periodically collecting a second target face image according to a second time interval and the attention person identifiers in the focus attention queue, and inputting the second target face image into the expression recognition model to obtain a second recognition result;
    if the second recognition result is abnormal, sending a second prompt message to the monitoring end.
  2. The classroom monitoring method according to claim 1, wherein the expression recognition model is trained with a convolutional neural network and comprises an eye recognition model, a lip recognition model, and a head recognition model;
    before the step of periodically capturing a first target face image at a first time interval and inputting the first target face image into an expression recognition model to obtain a first recognition result, the classroom monitoring method further comprises:
    obtaining face image samples, and obtaining eye training images, lip training images, and head training images from the face image samples;
    inputting the eye training images into the eye recognition model to obtain standard eye recognition results;
    inputting the lip training images into the lip recognition model to obtain standard lip recognition results;
    inputting the head training images into the head recognition model to obtain standard head recognition results;
    combining the standard eye recognition results, the standard lip recognition results, and the standard head recognition results into a standard expression recognition result set.
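The per-region training of claim 2 might be set up as below. This is a minimal PyTorch sketch; the framework choice, layer sizes, 96×96 single-channel input crops, and five output classes (one per expression in claim 4) are all assumptions, not details from the application.

import torch
import torch.nn as nn

def make_region_cnn(num_classes=5):
    # One small CNN per facial region; the architecture is illustrative only.
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(32 * 24 * 24, num_classes),  # assumes 96x96 input crops
    )

eye_model, lip_model, head_model = (make_region_cnn() for _ in range(3))

def standard_result_set(eye_batch, lip_batch, head_batch):
    # Compose the three per-region outputs into one result set, as in the
    # final step of claim 2.
    with torch.no_grad():
        return {
            "eye": eye_model(eye_batch).softmax(dim=1),
            "lip": lip_model(lip_batch).softmax(dim=1),
            "head": head_model(head_batch).softmax(dim=1),
        }

Each sub-model would first be trained on its labeled crops (for example, cross-entropy over the five expression classes); the stored per-region outputs then serve as the standard expression recognition result set.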
  3. The classroom monitoring method according to claim 2, wherein the periodically capturing a first target face image at a first time interval comprises:
    periodically obtaining raw video data at the first time interval;
    splitting the raw video data into frames and normalizing them to obtain the first target face image.
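For claim 3, a minimal OpenCV sketch is shown below. Reading a single frame per tick, and treating "normalization" as grayscale conversion, a fixed resize, and pixel scaling to [0, 1], are this sketch's assumptions, since the claim does not spell the operations out.

import cv2

def first_target_face_image(video_source, size=(96, 96)):
    # Framing: grab one frame of raw video per first-interval tick.
    cap = cv2.VideoCapture(video_source)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return None
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    face = cv2.resize(gray, size)          # geometric normalization to a fixed size
    return face.astype("float32") / 255.0  # intensity normalization to [0, 1]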
  4. The classroom monitoring method according to claim 3, wherein the standard expression recognition result set comprises five expression results: listening, doubt, understanding, resistance, and disdain;
    the inputting the first target face image into an expression recognition model to obtain a first recognition result comprises:
    obtaining first target face feature points from the first target face image using a facial feature point detection algorithm;
    dividing the first target face image according to the first target face feature points to obtain an eye region image, a lip region image, and a head region image;
    inputting the eye region image into the eye recognition model to obtain a test eye recognition result; inputting the lip region image into the lip recognition model to obtain a test lip recognition result; and inputting the head region image into the head recognition model to obtain a test head recognition result;
    matching the test eye recognition result, the test lip recognition result, and the test head recognition result against the standard expression recognition result set, and taking the standard expression recognition result whose matching degree exceeds a preset threshold as the output expression result, where if the output expression result is listening, doubt, or understanding, the first recognition result is normal, and if the output expression result is resistance or disdain, the first recognition result is abnormal.
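One plausible reading of claim 4, sketched with dlib's standard 68-point landmark model: the landmark index ranges are dlib's convention, and scoring the match by cosine similarity against stored reference vectors is this sketch's assumption, not a method stated in the application.

import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

NORMAL = {"listening", "doubt", "understanding"}  # claim 4's normal expressions

def region_images(gray):
    # Divide the face image into eye / lip / head regions via landmarks.
    faces = detector(gray)
    if not faces:
        return None
    pts = np.array([(p.x, p.y) for p in predictor(gray, faces[0]).parts()])
    def crop(idx):
        x0, y0 = pts[idx].min(axis=0)
        x1, y1 = pts[idx].max(axis=0)
        return gray[y0:y1 + 1, x0:x1 + 1]
    return {
        "eye": crop(list(range(36, 48))),   # both eyes in the 68-point scheme
        "lip": crop(list(range(48, 68))),   # outer and inner mouth contours
        "head": gray,                       # whole detected face stands in for the head region
    }

def first_recognition_result(test_vec, standard_set, labels, threshold=0.8):
    # Match the combined test results against the standard expression set.
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    label, score = max(((lbl, cos(test_vec, ref))
                        for lbl, ref in zip(labels, standard_set)),
                       key=lambda kv: kv[1])
    if score <= threshold:
        return None  # no standard expression exceeds the preset threshold
    return "normal" if label in NORMAL else "abnormal"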
  5. The classroom monitoring method according to claim 1, wherein after the step of, if the second recognition result is abnormal, sending a second prompt message to the monitoring end, the method further comprises:
    releasing from the focus queue the second target face image whose second recognition result is normal.
  6. The classroom monitoring method according to claim 1, further comprising:
    obtaining reference point position information in real time, and capturing a third target face image according to the reference point position information;
    inputting the third target face image into the expression recognition model to obtain a third recognition result;
    if the third recognition result is abnormal, sending a third prompt message to the monitoring end.
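Claim 6 leaves open how the reference point is obtained; it might, for instance, be a seat coordinate reported in real time. Assuming it is a pixel coordinate in the camera frame, the third capture reduces to a window crop, as in this sketch (the crop size and coordinate convention are assumptions):

def capture_third_target(frame, ref_point, half_size=64):
    # Crop a face-sized window centered on the reference point;
    # frame is a NumPy-style image array of shape (height, width[, channels]).
    x, y = ref_point
    h, w = frame.shape[:2]
    x0, x1 = max(0, x - half_size), min(w, x + half_size)
    y0, y1 = max(0, y - half_size), min(h, y + half_size)
    return frame[y0:y1, x0:x1]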
  7. A classroom monitoring apparatus, comprising:
    a first recognition result obtaining module, configured to periodically capture a first target face image at a first time interval and input the first target face image into an expression recognition model to obtain a first recognition result;
    a first prompt message sending module, configured to send a first prompt message to a monitoring end if the first recognition result is abnormal;
    a focus queue adding module, configured to add the first target face image whose first recognition result is abnormal to a focus queue, the focus queue comprising focused-person identifiers;
    a second recognition result obtaining module, configured to periodically capture a second target face image at a second time interval according to the focused-person identifiers in the focus queue, and input the second target face image into the expression recognition model to obtain a second recognition result;
    a second prompt message sending module, configured to send a second prompt message to the monitoring end if the second recognition result is abnormal.
  8. The classroom monitoring apparatus according to claim 7, wherein the standard expression recognition result set comprises five expression results: listening, doubt, understanding, resistance, and disdain;
    the first recognition result obtaining module comprises a facial feature point obtaining unit, a region image obtaining unit, a test recognition result obtaining unit, and a first recognition result obtaining unit;
    the facial feature point obtaining unit is configured to obtain first target face feature points from the first target face image using a facial feature point detection algorithm;
    the region image obtaining unit is configured to divide the first target face image according to the first target face feature points to obtain an eye region image, a lip region image, and a head region image;
    the test recognition result obtaining unit is configured to input the eye region image into an eye recognition model to obtain a test eye recognition result, input the lip region image into a lip recognition model to obtain a test lip recognition result, and input the head region image into a head recognition model to obtain a test head recognition result;
    the first recognition result obtaining unit is configured to match the test eye recognition result, the test lip recognition result, and the test head recognition result against a standard expression recognition result set, and take the standard expression recognition result whose matching degree exceeds a preset threshold as the output expression result, where if the output expression result is listening, doubt, or understanding, the first recognition result is normal, and if the output expression result is resistance or disdain, the first recognition result is abnormal.
  9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    periodically capturing a first target face image at a first time interval, and inputting the first target face image into an expression recognition model to obtain a first recognition result;
    if the first recognition result is abnormal, sending a first prompt message to a monitoring end;
    adding the first target face image whose first recognition result is abnormal to a focus queue, the focus queue comprising focused-person identifiers;
    periodically capturing a second target face image at a second time interval according to the focused-person identifiers in the focus queue, and inputting the second target face image into the expression recognition model to obtain a second recognition result;
    if the second recognition result is abnormal, sending a second prompt message to the monitoring end.
  10. The computer device according to claim 9, wherein the expression recognition model is trained with a convolutional neural network and comprises an eye recognition model, a lip recognition model, and a head recognition model;
    before the step of periodically capturing a first target face image at a first time interval and inputting the first target face image into an expression recognition model to obtain a first recognition result, the processor, when executing the computer-readable instructions, further implements the following steps:
    obtaining face image samples, and obtaining eye training images, lip training images, and head training images from the face image samples;
    inputting the eye training images into the eye recognition model to obtain standard eye recognition results;
    inputting the lip training images into the lip recognition model to obtain standard lip recognition results;
    inputting the head training images into the head recognition model to obtain standard head recognition results;
    combining the standard eye recognition results, the standard lip recognition results, and the standard head recognition results into a standard expression recognition result set.
  11. The computer device according to claim 10, wherein the periodically capturing a first target face image at a first time interval comprises:
    periodically obtaining raw video data at the first time interval;
    splitting the raw video data into frames and normalizing them to obtain the first target face image.
  12. The computer device according to claim 11, wherein the standard expression recognition result set comprises five expression results: listening, doubt, understanding, resistance, and disdain;
    the inputting the first target face image into an expression recognition model to obtain a first recognition result comprises:
    obtaining first target face feature points from the first target face image using a facial feature point detection algorithm;
    dividing the first target face image according to the first target face feature points to obtain an eye region image, a lip region image, and a head region image;
    inputting the eye region image into the eye recognition model to obtain a test eye recognition result; inputting the lip region image into the lip recognition model to obtain a test lip recognition result; and inputting the head region image into the head recognition model to obtain a test head recognition result;
    matching the test eye recognition result, the test lip recognition result, and the test head recognition result against the standard expression recognition result set, and taking the standard expression recognition result whose matching degree exceeds a preset threshold as the output expression result, where if the output expression result is listening, doubt, or understanding, the first recognition result is normal, and if the output expression result is resistance or disdain, the first recognition result is abnormal.
  13. The computer device according to claim 9, wherein after the step of, if the second recognition result is abnormal, sending a second prompt message to the monitoring end, the processor, when executing the computer-readable instructions, further implements the following step:
    releasing from the focus queue the second target face image whose second recognition result is normal.
  14. The computer device according to claim 9, wherein the processor, when executing the computer-readable instructions, further implements the following steps:
    obtaining reference point position information in real time, and capturing a third target face image according to the reference point position information;
    inputting the third target face image into the expression recognition model to obtain a third recognition result;
    if the third recognition result is abnormal, sending a third prompt message to the monitoring end.
  15. One or more non-volatile readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
    periodically capturing a first target face image at a first time interval, and inputting the first target face image into an expression recognition model to obtain a first recognition result;
    if the first recognition result is abnormal, sending a first prompt message to a monitoring end;
    adding the first target face image whose first recognition result is abnormal to a focus queue, the focus queue comprising focused-person identifiers;
    periodically capturing a second target face image at a second time interval according to the focused-person identifiers in the focus queue, and inputting the second target face image into the expression recognition model to obtain a second recognition result;
    if the second recognition result is abnormal, sending a second prompt message to the monitoring end.
  16. The non-volatile readable storage medium according to claim 15, wherein the expression recognition model is trained with a convolutional neural network and comprises an eye recognition model, a lip recognition model, and a head recognition model;
    before the step of periodically capturing a first target face image at a first time interval and inputting the first target face image into an expression recognition model to obtain a first recognition result, the computer-readable instructions, when executed by the processor, further implement the following steps:
    obtaining face image samples, and obtaining eye training images, lip training images, and head training images from the face image samples;
    inputting the eye training images into the eye recognition model to obtain standard eye recognition results;
    inputting the lip training images into the lip recognition model to obtain standard lip recognition results;
    inputting the head training images into the head recognition model to obtain standard head recognition results;
    combining the standard eye recognition results, the standard lip recognition results, and the standard head recognition results into a standard expression recognition result set.
  17. The non-volatile readable storage medium according to claim 16, wherein the periodically capturing a first target face image at a first time interval comprises:
    periodically obtaining raw video data at the first time interval;
    splitting the raw video data into frames and normalizing them to obtain the first target face image.
  18. The non-volatile readable storage medium according to claim 17, wherein the standard expression recognition result set comprises five expression results: listening, doubt, understanding, resistance, and disdain;
    the inputting the first target face image into an expression recognition model to obtain a first recognition result comprises:
    obtaining first target face feature points from the first target face image using a facial feature point detection algorithm;
    dividing the first target face image according to the first target face feature points to obtain an eye region image, a lip region image, and a head region image;
    inputting the eye region image into the eye recognition model to obtain a test eye recognition result; inputting the lip region image into the lip recognition model to obtain a test lip recognition result; and inputting the head region image into the head recognition model to obtain a test head recognition result;
    matching the test eye recognition result, the test lip recognition result, and the test head recognition result against the standard expression recognition result set, and taking the standard expression recognition result whose matching degree exceeds a preset threshold as the output expression result, where if the output expression result is listening, doubt, or understanding, the first recognition result is normal, and if the output expression result is resistance or disdain, the first recognition result is abnormal.
  19. The non-volatile readable storage medium according to claim 15, wherein after the step of, if the second recognition result is abnormal, sending a second prompt message to the monitoring end, the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to further perform the following step:
    releasing from the focus queue the second target face image whose second recognition result is normal.
  20. The non-volatile readable storage medium according to claim 15, wherein the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to further perform the following steps:
    obtaining reference point position information in real time, and capturing a third target face image according to the reference point position information;
    inputting the third target face image into the expression recognition model to obtain a third recognition result;
    if the third recognition result is abnormal, sending a third prompt message to the monitoring end.
PCT/CN2018/106435 2018-08-02 2018-09-19 Class monitoring method and apparatus, computer device, and storage medium WO2020024400A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810870693.8A CN109344682A (en) 2018-08-02 2018-08-02 Classroom monitoring method, device, computer equipment and storage medium
CN201810870693.8 2018-08-02

Publications (1)

Publication Number Publication Date
WO2020024400A1 (en)

Family

ID=65291504

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/106435 WO2020024400A1 (en) 2018-08-02 2018-09-19 Class monitoring method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN109344682A (en)
WO (1) WO2020024400A1 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582611A (en) * 2019-02-18 2020-08-25 北京入思技术有限公司 Classroom teaching evaluation method and system based on emotion perception
CN109919079A (en) * 2019-03-05 2019-06-21 百度在线网络技术(北京)有限公司 Method and apparatus for detecting learning state
CN110164209A (en) * 2019-04-24 2019-08-23 薄涛 Instructional terminal, server and live teaching broadcast system
CN110287807A (en) * 2019-05-31 2019-09-27 上海亿童科技有限公司 A kind of human body information acquisition method, apparatus and system
CN110418095B (en) * 2019-06-28 2021-09-14 广东虚拟现实科技有限公司 Virtual scene processing method and device, electronic equipment and storage medium
CN110737535B (en) * 2019-09-09 2023-02-07 平安证券股份有限公司 Data processing method and device based on message queue and computer equipment
CN110580470A (en) * 2019-09-12 2019-12-17 深圳壹账通智能科技有限公司 Monitoring method and device based on face recognition, storage medium and computer equipment
CN110598632B (en) * 2019-09-12 2022-09-09 深圳市商汤科技有限公司 Target object monitoring method and device, electronic equipment and storage medium
CN112584091B (en) * 2019-09-29 2022-04-26 杭州海康威视数字技术股份有限公司 Alarm information generation method, alarm information analysis method, system and device
CN110942086B (en) * 2019-10-30 2024-04-23 平安科技(深圳)有限公司 Data prediction optimization method, device, equipment and readable storage medium
CN111507241A (en) * 2020-04-14 2020-08-07 四川聚阳科技集团有限公司 Lightweight network classroom expression monitoring method
CN111556279A (en) * 2020-05-22 2020-08-18 腾讯科技(深圳)有限公司 Monitoring method and communication method of instant session
CN111738177B (en) * 2020-06-28 2022-08-02 四川大学 Student classroom behavior identification method based on attitude information extraction
CN112528777A (en) * 2020-11-27 2021-03-19 富盛科技股份有限公司 Student facial expression recognition method and system used in classroom environment
CN113239841B (en) * 2021-05-24 2023-03-24 桂林理工大学博文管理学院 Classroom concentration state detection method based on face recognition and related instrument
CN113570484B (en) * 2021-09-26 2022-02-08 广州华赛数据服务有限责任公司 Online primary school education management system and method based on big data
CN114724229B (en) * 2022-05-23 2022-09-02 北京英华在线科技有限公司 Learning state detection system and method for online education platform
CN116757524B (en) * 2023-05-08 2024-02-06 广东保伦电子股份有限公司 Teacher teaching quality evaluation method and device
CN117152688A (en) * 2023-10-31 2023-12-01 江西拓世智能科技股份有限公司 Intelligent classroom behavior analysis method and system based on artificial intelligence

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250822A (en) * 2016-07-21 2016-12-21 苏州科大讯飞教育科技有限公司 Student's focus based on recognition of face monitoring system and method
CN106599881A (en) * 2016-12-30 2017-04-26 首都师范大学 Student state determination method, device and system
CN107169900A (en) * 2017-05-19 2017-09-15 南京信息工程大学 A kind of student listens to the teacher rate detection method
CN107292271A (en) * 2017-06-23 2017-10-24 北京易真学思教育科技有限公司 Learning-memory behavior method, device and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106851216B (en) * 2017-03-10 2019-05-28 山东师范大学 A kind of classroom behavior monitoring system and method based on face and speech recognition

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175534A (en) * 2019-05-08 2019-08-27 长春师范大学 Teaching assisting system based on multitask concatenated convolutional neural network
CN111428671A (en) * 2020-03-31 2020-07-17 杭州博雅鸿图视频技术有限公司 Face structured information identification method, system, device and storage medium
CN111563449A (en) * 2020-04-30 2020-08-21 上海交通大学 Real-time classroom attention detection method and system
CN112228036A (en) * 2020-06-04 2021-01-15 精英数智科技股份有限公司 Gas extraction monitoring method, system and equipment and readable storage medium
CN112055212A (en) * 2020-08-24 2020-12-08 深圳市青柠互动科技开发有限公司 System and method for centralized analysis and processing of multiple paths of videos
CN112052815B (en) * 2020-09-14 2024-02-20 北京易华录信息技术股份有限公司 Behavior detection method and device and electronic equipment
CN112052815A (en) * 2020-09-14 2020-12-08 北京易华录信息技术股份有限公司 Behavior detection method and device and electronic equipment
CN112580522A (en) * 2020-12-22 2021-03-30 北京每日优鲜电子商务有限公司 Method, device and equipment for detecting sleeper and storage medium
CN112699774A (en) * 2020-12-28 2021-04-23 深延科技(北京)有限公司 Method and device for recognizing emotion of person in video, computer equipment and medium
CN112699774B (en) * 2020-12-28 2024-05-24 深延科技(北京)有限公司 Emotion recognition method and device for characters in video, computer equipment and medium
CN113096808A (en) * 2021-04-23 2021-07-09 深圳壹账通智能科技有限公司 Event prompting method and device, computer equipment and storage medium
CN113298159A (en) * 2021-05-28 2021-08-24 平安科技(深圳)有限公司 Target detection method and device, electronic equipment and storage medium
CN113627335A (en) * 2021-08-10 2021-11-09 浙江大华技术股份有限公司 Method and device for monitoring behavior of examinee, storage medium and electronic device
CN113989606A (en) * 2021-10-29 2022-01-28 北京环境特性研究所 Target person tracking system and method
CN114612977A (en) * 2022-03-10 2022-06-10 苏州维科苏源新能源科技有限公司 Big data based acquisition and analysis method
CN115100600A (en) * 2022-06-30 2022-09-23 苏州市新方纬电子有限公司 Intelligent detection method and system for production line of battery pack
CN115100600B (en) * 2022-06-30 2024-05-31 苏州市新方纬电子有限公司 Intelligent detection method and system for production line of battery pack

Also Published As

Publication number Publication date
CN109344682A (en) 2019-02-15

Similar Documents

Publication Publication Date Title
WO2020024400A1 (en) Class monitoring method and apparatus, computer device, and storage medium
WO2021004112A1 (en) Anomalous face detection method, anomaly identification method, device, apparatus, and medium
WO2019232866A1 (en) Human eye model training method, human eye recognition method, apparatus, device and medium
WO2019232862A1 (en) Mouth model training method and apparatus, mouth recognition method and apparatus, device, and medium
JP7078803B2 (en) Risk recognition methods, equipment, computer equipment and storage media based on facial photographs
WO2020015076A1 (en) Facial image comparison method and apparatus, computer device, and storage medium
WO2020098374A1 (en) Face key point detection method, apparatus, computer device and storage medium
WO2021078157A1 (en) Image processing method and apparatus, electronic device, and storage medium
WO2020015075A1 (en) Facial image comparison method and apparatus, computer device, and storage medium
CN108829900B (en) Face image retrieval method and device based on deep learning and terminal
WO2020024395A1 (en) Fatigue driving detection method and apparatus, computer device, and storage medium
US9547808B2 (en) Head-pose invariant recognition of facial attributes
CN109086711B (en) Face feature analysis method and device, computer equipment and storage medium
CN108629336B (en) Face characteristic point identification-based color value calculation method
EP2889805A2 (en) Method and system for emotion and behavior recognition
WO2019033571A1 (en) Facial feature point detection method, apparatus and storage medium
WO2022252642A1 (en) Behavior posture detection method and apparatus based on video image, and device and medium
CN111209845A (en) Face recognition method and device, computer equipment and storage medium
CN111598038B (en) Facial feature point detection method, device, equipment and storage medium
CN110598638A (en) Model training method, face gender prediction method, device and storage medium
CN110197107B (en) Micro-expression recognition method, micro-expression recognition device, computer equipment and storage medium
US11062126B1 (en) Human face detection method
US10360441B2 (en) Image processing method and apparatus
CN109002776B (en) Face recognition method, system, computer device and computer-readable storage medium
CN116863522A (en) Acne grading method, device, equipment and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18928702

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18928702

Country of ref document: EP

Kind code of ref document: A1