WO2023101117A1

WO2023101117A1 - Method for non-face-to-face education management using deep learning-based person recognition

Info

Publication number: WO2023101117A1
Application number: PCT/KR2022/007400
Authority: WO
Inventors: 이충건; 김준회
Original assignee: 주식회사 마블러스
Priority date: 2021-11-30
Filing date: 2022-05-25
Publication date: 2023-06-08
Also published as: KR20230081013A

Abstract

The present invention relates to an education method comprising the steps of: receiving image data from an external electronic device; primarily detecting face data from the received image data; secondarily detecting person data when no face data is detected; and managing non-face-to-face image information on the basis of the detected face data or person data.

Description

Non-face-to-face education management method using deep learning-based human recognition

The present invention relates to a deep learning-based human recognition method and a non-face-to-face education management method, and more particularly, to a non-face-to-face education management method using deep learning-based human recognition.

With the development of IT technology, non-face-to-face meetings and lectures are increasingly held through online video conferencing platforms such as ZOOM, Google MEET, MS Teams, and Skype. Since the COVID-19 pandemic, this trend has become more prominent as not only work but also daily life has moved from offline to online. Online video conferencing platforms can be used conveniently anytime and anywhere, but it has been pointed out that, because the situation is not face-to-face, students lose concentration and frequently leave their seats. In particular, in elementary, middle, and high school classes, which require continuous concentration and staying seated, students leave their seats while remaining logged in to the non-face-to-face lecture, and this is becoming a social problem.

Although object recognition imaging technology already exists, existing programs were developed for industrial use and are used in special circumstances such as building management, factory management, or autonomous driving. What is required is a deep learning model that is easy to use in the field and usable in a mobile environment, but the existing technology did not sufficiently provide such a solution.

An object of the present invention is to provide a non-face-to-face education management method using deep learning-based human recognition in order to solve the above problems.

A method of driving a deep learning module for a mobile environment, performed by a computing device according to an embodiment of the present invention, comprises: receiving image data from an external electronic device; primarily detecting face data from the received image data; secondarily detecting human data when face data is not detected; and managing non-face-to-face image information based on the detected face data or human data.

In an embodiment, an emotion index may be derived based on the detected face data and human data.

In one embodiment, managing the non-face-to-face image information may be managing based on the derived emotion index.

In one embodiment, the method may further include feedback-updating the deep learning model through transfer learning on data collected in an online situation.

An education method performed by a computing device according to an embodiment of the present invention comprises: receiving image data from an external electronic device; primarily detecting face data from the received image data; secondarily detecting human data when face data is not detected; and tracking the detected face data or human data in real time and managing non-face-to-face training image information.

In an embodiment, the managing of the non-face-to-face training image information may include recognizing the person corresponding to the detected face data or person data as being away and sending a warning alarm when the detection state of the detected face data or person data changes for longer than a predetermined period of time.

In one embodiment, the method may further include deriving an emotion index based on the detected face data and human data.

In one embodiment, the emotion index includes an emotion index and a concentration index. The emotion index is obtained by inputting face data to an emotion recognition model based on facial expression recognition technology, which derives, as a probability value, any one emotion type among positive, negative, and neutral, or any one emotion type among enjoyment, surprise, sadness, anger, fear, displeasure, and calmness. The concentration index is obtained by deriving heart rate and heart rate variability through rPPG after face detection from the input image, and the concentration types of normal, concentration, and immersion can be derived as probability values for each stage.

According to another embodiment of the present invention, a computer program recorded in a computer-readable storage medium is configured to perform the above-described education method through an electronic device.

In the present invention, real-time face recognition and person recognition are possible even with image data taken by a general RGB camera using AI deep learning technology-based object recognition technology.

According to the present invention, it is possible to implement a fast and accurate algorithm that can be driven even in a mobile environment.

Emotion indicators are extracted so that they can be applied in a non-face-to-face education environment, and the educational environment can be actively managed by checking whether the learner has left the seat as well as by checking the emotion indicators and concentration indicators.

In addition, it is possible to actively update the AI deep learning module through transfer learning.

Effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 illustrates a communication system according to various embodiments of the present invention.

FIG. 2 is a block diagram of a configuration of a video electronic device based on changes in emotion and concentration state according to various embodiments of the present disclosure.

FIG. 3 shows a block diagram of the configuration of a server according to various embodiments of the present invention.

FIG. 4 illustrates a deep learning-based human recognition method and a non-face-to-face education management method according to various embodiments of the present disclosure.

FIG. 5 illustrates the structure of a machine learning model according to various embodiments of the present invention.

FIG. 6 illustrates an example in which the artificial intelligence logic according to the present invention derives result values for emotion, posture, and concentration.

FIG. 7 is an exemplary diagram for explaining the architecture and learning of a deep learning module according to the present invention.

FIG. 8 illustrates in detail an example of a process of the detailed models constituting an emotion recognition model based on transfer learning.

FIG. 9 is a flowchart illustrating acquisition of an emotion index and a concentration index according to the present invention.

FIG. 10 shows the result of performing a demonstration of the face and person recognition model according to the present invention.

[Description of reference numerals]

100: user 110: electronic device

111: memory 112: transceiver

113: processor 114: camera

115: recording device 116: output device

120: wired/wireless communication network 130: server

131: memory 132: transceiver

133: processor 500: artificial neural network

510: input layer 511: input information

530: hidden layer 531: first hidden layer

532: first unit 533: second hidden layer

534: second unit 550: output layer

551: prediction result unit

Hereinafter, with reference to the accompanying drawings, embodiments of the present invention will be described in detail so that those skilled in the art can easily carry out the present invention. This invention may be embodied in many different forms and is not limited to the embodiments set forth herein.

Referring to FIG. 1, a communication system according to various embodiments of the present disclosure includes an electronic device 110, a wired/wireless communication network 120, and a server 130. The server 130 obtains image data from the user's electronic device 110 through the wired/wireless communication network 120, derives an emotional state and a concentration state, and then transmits a chatbot message UI corresponding to that state back to the user's electronic device 110 through the wired/wireless communication network 120.

The electronic device 110 captures image data including face and posture information on the learning state of the user and transmits it through the wired/wireless communication network 120 according to a request of the server 130. The electronic device 110 may be an electronic device, such as a personal computer, a cellular phone, a smart phone, or a tablet computer, that includes a memory capable of storing information, a transceiver capable of transmitting and receiving information, and at least one processor capable of performing information calculation. The type of electronic device 110 is not limited.

The wired/wireless communication network 120 provides a communication path through which the electronic device 110 and the server 130 can transmit and receive signals and data to and from each other. The wired/wireless communication network 120 is not limited to a communication method according to a specific communication protocol, and an appropriate communication method may be used according to the implementation. For example, when configured as an Internet Protocol (IP) based system, the wired/wireless communication network 120 may be implemented as a wired/wireless Internet network, and when the electronic device 110 and the server 130 are implemented as mobile communication terminals, the wired/wireless communication network 120 may be implemented as a wireless network such as a cellular network or a wireless local area network (WLAN).

The server 130 receives image data including face and posture information for the learning state of the user from the electronic device 110 through the wired/wireless communication network 120 . The server 130 may be an electronic device including a memory capable of storing information, a transmitting/receiving unit capable of transmitting and receiving information, and at least one processor capable of performing information calculation.

FIG. 2 illustrates a block diagram of a configuration of an electronic device according to various embodiments of the present disclosure.

Referring to FIG. 2 , an electronic device 110 according to various embodiments of the present disclosure includes a memory 111, a transceiver 112, and a processor 113.

The memory 111 may include volatile memory, non-volatile memory, or a combination of volatile and non-volatile memories. Also, the memory 111 may provide stored data according to a request of the processor 113 .

The transceiver 112 is connected to the processor 113 and transmits and/or receives signals. All or part of the transceiver 112 may be referred to as a transmitter, a receiver, or a transceiver. The transceiver 112 may support at least one of various communication standards for wired and wireless access systems, such as an Institute of Electrical and Electronics Engineers (IEEE) 802.xx system, an IEEE Wi-Fi system, a 3rd Generation Partnership Project (3GPP) system, a 3GPP Long Term Evolution (LTE) system, a 3GPP 5G New Radio (NR) system, a 3GPP2 system, and Bluetooth.

The processor 113 may be configured to implement the procedures and/or methods proposed in the present invention. The processor 113 controls overall operations of the electronic device 110 to provide content based on machine learning analysis of biometric information. For example, the processor 113 transmits or receives information or the like through the transceiver 112. The processor 113 also writes data to and reads data from the memory 111. The processor 113 may include at least one processor.

FIG. 3 shows a block diagram of the configuration of the server 130 according to various embodiments of the present invention.

Referring to FIG. 3 , a server 130 according to various embodiments of the present disclosure includes a memory 131 , a transceiver 132 and a processor 133 . The server 130 may be a type of electronic device.

The server 130 receives image data including face and posture information about the learning state of the user from the electronic device 110 through the wired/wireless communication network 120. The server 130 converts the received image data into Mat data, converts the Mat data into emotional state and concentration state values by inputting the Mat data into the emotion recognition SDK module, inputs the emotional state and concentration state values to the control logic, and calls the chatbot message UI linked to the control logic.

The memory 131 is connected to the transceiver 132 and may store information received through communication. In addition, the memory 131 is connected to the processor 133 and may store data such as a basic program for operation of the processor 133, an application program, setting information, and information generated by operation of the processor 133. The memory 131 may include volatile memory, non-volatile memory, or a combination of volatile and non-volatile memories. Also, the memory 131 may provide stored data according to a request of the processor 133.

The transceiver 132 is connected to the processor 133 and transmits and/or receives signals. All or part of the transceiver 132 may be referred to as a transmitter, a receiver, or a transceiver. The transceiver 132 may support at least one of various communication standards for wired and wireless access systems, such as an Institute of Electrical and Electronics Engineers (IEEE) 802.xx system, an IEEE Wi-Fi system, a 3rd Generation Partnership Project (3GPP) system, a 3GPP Long Term Evolution (LTE) system, a 3GPP 5G New Radio (NR) system, a 3GPP2 system, and Bluetooth.

The processor 133 may be configured to implement the procedures and/or methods proposed in the present invention. The processor 133 converts image data into emotional state and concentration state values by inputting the image data to the emotion recognition SDK module, and controls overall operations of the server 130, such as running the control logic that takes the emotional state and concentration state values as input values. For example, the processor 133 transmits or receives information or the like through the transceiver 132. Also, the processor 133 writes data to and reads data from the memory 131. The processor 133 may include at least one processor.

FIG. 4 illustrates an education method according to various embodiments of the present invention. Referring to FIG. 4, the method includes receiving image data from an external electronic device (S100), primarily detecting face data from the received image data (S200), determining whether face data is detected in the received image data (S300), secondarily detecting human data if face data is not detected (S400), deriving an emotion index through a deep learning module (S500), managing non-face-to-face image information based on the emotion index (S600), and feedback-updating a deep learning model through transfer learning on data collected in an online situation (S700).

Step S100 is a step of receiving image data including a face captured or transmitted in real time from the user's electronic device 110. The education method or the deep learning module driving method according to the present invention may receive image data including a user's facial expression or human shape.

Step S200 is a step of primarily detecting face data from received image data.

Step S300 is a step of determining whether face data is detected in received image data.

Step S400 is a step of secondarily detecting human data when face data is not detected.
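The face-first, person-second order of steps S200 to S400 can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: `detect_face` and `detect_person` are hypothetical callables standing in for whichever deep learning detectors are used, and the bounding-box tuple format is an assumption.

```python
from typing import Callable, Optional, Tuple

# Hypothetical bounding box: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]
Detector = Callable[[object], Optional[Box]]


def detect_subject(frame: object,
                   detect_face: Detector,
                   detect_person: Detector) -> Tuple[str, Optional[Box]]:
    """Return ("face", box), ("person", box), or ("none", None)."""
    face = detect_face(frame)          # S200: primary face detection
    if face is not None:               # S300: was a face detected?
        return "face", face
    person = detect_person(frame)      # S400: secondary person detection
    if person is not None:
        return "person", person
    return "none", None


# Stand-in detectors for illustration only; real ones would run a model.
def no_face(frame: object) -> Optional[Box]:
    return None


def fixed_person(frame: object) -> Optional[Box]:
    return (10, 20, 100, 200)


result = detect_subject("frame-0", no_face, fixed_person)
print(result)
```

Running the person detector only when no face is found keeps the common case to a single model call, which may matter in a mobile environment.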

Step S500 is a step of deriving an emotion index through a deep learning module. In this step, image data received through the OpenCV module or the like can be processed. For example, in this step, image data can be converted into a Mat (matrix) format handled by the OpenCV image processing library. The derived face data may be converted into Mat data, and the Mat data may be input into an emotion recognition SDK module to be converted into emotion and concentration values. The emotional state may include a plurality of emotional state values with a sum of 1.0, and the concentration state may include a plurality of concentration state values with a sum of 1.0. Specifically, the emotional state value and the concentration state value represent probability values, and the sum of the corresponding states must always satisfy 1.0. The 7 emotions and 3 concentration states are calculated as probability values, and the sum of the 7 emotions always has a probability value of 1.0, and the sum of the 3 concentration states also always has a probability value of 1.0. That is, it is possible to form a closed variable space without additional variables.
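The constraint that the seven emotion values and the three concentration values each sum to 1.0 can be illustrated with a softmax normalization. The text does not state how the emotion recognition SDK module produces its probabilities, so the softmax step and the raw scores below are assumptions made only for illustration.

```python
import math
from typing import Dict, List

EMOTIONS = ["joy", "surprise", "sadness", "anger", "fear", "displeasure", "calmness"]
CONCENTRATION = ["normal", "concentration", "immersion"]


def softmax(scores: List[float]) -> List[float]:
    """Map raw scores to probabilities that sum to 1.0."""
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def to_state(labels: List[str], scores: List[float]) -> Dict[str, float]:
    """Pair each label with its normalized probability."""
    if len(labels) != len(scores):
        raise ValueError("one score per label is required")
    return dict(zip(labels, softmax(scores)))


# Hypothetical raw model scores for a single frame.
emotion_state = to_state(EMOTIONS, [2.0, 0.1, -1.0, -0.5, -1.2, 0.0, 1.0])
concentration_state = to_state(CONCENTRATION, [0.2, 1.5, 0.3])
print(emotion_state)
print(concentration_state)
```

Because each group is normalized on its own, any one value in a group is determined by the others, which is the closed variable space the text describes.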

Step S600 may be a step of managing non-face-to-face image information based on the emotion index. Specifically, it will be described later with reference to FIG. 6 .

Step S700 may include a step of feedback-updating the deep learning model through transfer learning on data collected in an online situation.

FIG. 5 shows the structure of a multi-layer perceptron (MLP) for providing content based on machine learning analysis of biometric information according to various embodiments of the present invention.

Deep learning, as one of the emerging technologies in the field of machine learning, is a neural network composed of a plurality of hidden layers and a plurality of hidden units included in them. When low-level features are input to a deep learning model, these basic features are transformed into high-level features that can better explain the problem to be predicted while passing through a plurality of hidden layers. In this process, since prior knowledge or intuition of an expert is not required, subjective factors in feature extraction can be removed, and a model with higher generalization ability can be developed. Furthermore, in the case of deep learning, since feature extraction and model construction are composed of one set, there is an advantage in that the final model can be formed through a simpler process compared to existing machine learning theories.

A multi-layer perceptron (MLP) is a type of artificial neural network (ANN) with multiple nodes based on deep learning. Each node is a neuron that uses a non-linear activation function, with connection patterns loosely modeled on those of animal neurons. This non-linearity makes it possible to distinguish data that is not linearly separable.

Referring to FIG. 5, the artificial neural network 500 of the MLP model according to various embodiments of the present invention consists of one or more input layers 510, a plurality of hidden layers 530, and one or more output layers 550.

Input data such as an RGB value of each pixel in at least one ultrasound image per unit time is input to a node of the input layer 510. Here, the user's biometric information (e.g., electrocardiogram information, concentration level, and happiness emotion intensity information) and the adjusted content information (e.g., content genre, content topic, and content channel information) 511 correspond to the basic characteristics (low-level features) of the deep learning model.

A node of the hidden layer 530 performs calculations based on input factors. The hidden layer 530 is a layer in which units defined by a plurality of nodes formed by integrating the user's biometric information and the information 511 of the adjusted content are stored. As shown in FIG. 5 , the hidden layer 530 may include a plurality of hidden layers.

For example, when the hidden layer 530 is composed of a first hidden layer 531 and a second hidden layer 533, the first hidden layer 531 is a layer in which first units 532, defined by a plurality of nodes formed by consolidating the user's biometric information and the adjusted content information 511, are stored, and the first units 532 correspond to higher-level characteristics of the user's biometric information and the adjusted content information 511. The second hidden layer 533 is a layer in which second units 534, defined by a plurality of nodes formed by consolidating the first units 532 of the first hidden layer 531, are stored, and the second units 534 correspond to higher-level characteristics of the first units 532.

Nodes of the output layer 550 represent calculated prediction results. The output layer 550 may include a plurality of prediction result units 551. Specifically, the plurality of prediction result units 551 may include two units, a true unit and a false unit. The true unit is a prediction result unit meaning that, after the content is changed to the adjusted content, the emotional state value and the concentration state value of the user's face data are highly likely to be higher than the threshold value, and the false unit is a prediction result unit meaning that, after the content is changed to the adjusted content, that likelihood is low.

Weights are assigned to connections between the prediction result units 551 and the second units 534 included in the second hidden layer 533, which is the last layer among the hidden layers 530. Based on these weights, it is predicted whether the emotional state value and the concentration state value of the user's face data are greater than or equal to a threshold value after adjusting the content to the adjusted content.

The artificial neural network 500 of the MLP model learns by adjusting learning parameters. According to one embodiment, the learning parameters include at least one of a weight and a bias. The learning parameters are iteratively adjusted through an optimization algorithm called gradient descent. Each time a prediction result is computed from a given data sample (forward propagation), the performance of the network is evaluated through a loss function that measures the prediction error. Each learning parameter of the artificial neural network 500 is then adjusted gradually in the direction that minimizes the value of the loss function, and this process is called back-propagation.
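The training loop described above (forward propagation, loss evaluation, and parameter updates by gradient descent) can be sketched with a very small network. Everything concrete here is an assumption chosen for illustration: the layer sizes, the sigmoid activation, the cross-entropy loss, the learning rate, and the toy data are not taken from the invention.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer and one sigmoid output; sizes are illustrative."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Forward propagation: return hidden activations and the prediction."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def loss(self, data) -> float:
        """Mean binary cross-entropy over (input, target) pairs."""
        total = 0.0
        for x, t in data:
            _, y = self.forward(x)
            total -= t * math.log(y) + (1 - t) * math.log(1 - y)
        return total / len(data)

    def step(self, data, lr: float) -> None:
        """One full-batch gradient descent update using back-propagation."""
        n = len(data)
        gw1 = [[0.0] * len(row) for row in self.w1]
        gb1 = [0.0] * len(self.b1)
        gw2 = [0.0] * len(self.w2)
        gb2 = 0.0
        for x, t in data:
            h, y = self.forward(x)
            d_out = (y - t) / n                  # gradient at the output pre-activation
            gb2 += d_out
            for j, hj in enumerate(h):
                gw2[j] += d_out * hj
                d_hid = d_out * self.w2[j] * hj * (1 - hj)
                gb1[j] += d_hid
                for i, xi in enumerate(x):
                    gw1[j][i] += d_hid * xi
        # Move every weight and bias against its gradient.
        self.b2 -= lr * gb2
        for j in range(len(self.w2)):
            self.w2[j] -= lr * gw2[j]
            self.b1[j] -= lr * gb1[j]
            for i in range(len(self.w1[j])):
                self.w1[j][i] -= lr * gw1[j][i]


# Toy data: the target is 1 (the "true" unit) when the first feature is larger.
data = [([0.9, 0.1], 1), ([0.8, 0.3], 1), ([0.2, 0.7], 0), ([0.1, 0.9], 0)]
net = TinyMLP(n_in=2, n_hidden=3)
loss_before = net.loss(data)
for _ in range(500):
    net.step(data, lr=0.3)
loss_after = net.loss(data)
print(loss_before, loss_after)
```

The loss printed after training should be lower than the loss before it, which is the behavior the paragraph attributes to gradient descent.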

The control logic is described with reference to FIG. 6. After the learner's face is recognized from the image data provided by the user's electronic device, the first depth can be divided into emotion determination, face detection, and concentration state determination. The learner's emotional state value and concentration state value derived from the above-described emotion recognition SDK module are allocated to the emotion determination unit and the concentration state determination unit, respectively, and basic face detection data may correspond to the face detection unit.

In the second depth corresponding to the emotion determination unit, one of seven types of emotional states of the learner may be derived based on the result input to the emotion determination unit. The emotion determination unit can derive the corresponding result through machine learning modules such as deep learning and artificial intelligence, and is not limited to a specific technology.

In the third depth corresponding to the emotion determination unit, an appropriate control logic may be calculated based on emotional state values corresponding to the seven emotions determined in the second depth. For example, when a positive emotional state value corresponds to level 0 (less than 0.5) for a long period of time, control logic may be activated to drive a care chatbot UI for user management.

In the second depth corresponding to the face detection unit, based on the extracted face detection data, a result of the appropriateness of the posture and a determination of whether to leave the seat may be derived through the posture determination unit and the seat departure determination unit.

In the third depth corresponding to the face detection unit, when the posture judgment value derived from the posture determination unit falls outside the appropriate range, the chatbot UI is controlled and an output signal can be sent to the user's electronic device to present a dialog and a guide to the face position reset screen.

In the third depth corresponding to the seat departure determination unit, when the derived seat departure judgment value falls outside the appropriate range, the chatbot UI is controlled and an output signal can be transmitted to the user's electronic device to provide guidance through a voice interface during learning.

In the second depth corresponding to the concentration state determiner, it can be divided into a normal state, a concentration state, and an immersion state corresponding to three categories.

In the third depth corresponding to the concentration state determination unit, if the normal state continues for longer than a preset time, it is determined that care is required, the chatbot UI is controlled, and an output signal can be sent to the user's electronic device to provide guidance through a voice interface during learning.

In some embodiments, the step of managing the non-face-to-face image information (S600) manages the non-face-to-face image information based on the derived emotion index, and may be a step of transmitting a predetermined signal to the electronic device that captures the non-face-to-face image when the emotion index is derived.

As described above, in the three-depth structure of the control logic, when the control logic determines, based on the user's emotion and concentration values, that learning efficiency has dropped, that the seat is empty, or that a change is required, a chatbot message UI may be called under the control of the control logic and a signal may be transmitted so that it is presented on the user's electronic device. Through output according to the control logic, non-face-to-face image information obtained by photographing a learner's face in real time may be managed in real time.
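The duration-based rules of the third depth can be sketched as a small rule engine. The 0.5 threshold for level 0 comes from the text; the hold times, the action names, and the `Sample` structure are hypothetical values introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    t: float                 # timestamp in seconds
    positive: float          # positive-emotion probability, 0.0 to 1.0
    concentration: str       # "normal", "concentration", or "immersion"


class SustainedCondition:
    """Fires once a condition has held continuously for hold_seconds."""

    def __init__(self, hold_seconds: float) -> None:
        self.hold_seconds = hold_seconds
        self.since: Optional[float] = None

    def update(self, t: float, active: bool) -> bool:
        if not active:
            self.since = None            # condition broken; restart the timer
            return False
        if self.since is None:
            self.since = t
        return t - self.since >= self.hold_seconds


def run_control_logic(samples: List[Sample],
                      low_positive_hold: float = 60.0,
                      normal_hold: float = 120.0) -> List[str]:
    """Return the chatbot actions triggered over a stream of samples."""
    low_positive = SustainedCondition(low_positive_hold)
    stuck_normal = SustainedCondition(normal_hold)
    actions: List[str] = []
    for s in samples:
        # Level 0 in the text means a positive value below 0.5.
        if low_positive.update(s.t, s.positive < 0.5):
            actions.append("care_chatbot")
            low_positive.since = None    # re-arm after firing
        if stuck_normal.update(s.t, s.concentration == "normal"):
            actions.append("voice_guidance")
            stuck_normal.since = None
    return actions


# One sample every 30 s; positive emotion stays low and concentration stays normal.
stream = [Sample(t=30.0 * i, positive=0.3, concentration="normal") for i in range(6)]
actions = run_control_logic(stream)
print(actions)  # ['care_chatbot', 'voice_guidance', 'care_chatbot']
```

Keeping each rule as its own timer means further third-depth rules, such as posture, could be added without changing the existing ones.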

In some embodiments, managing the non-face-to-face image information (S600) may include tracking detected face data or human data in real time and managing non-face-to-face training image information. The managing of the non-face-to-face education image information may include recognizing the detected face data or person data as being away and transmitting a warning alarm when the detection state changes over a predetermined time period.

As shown in FIG. 6, if it is determined that the user has been out of the seat for a certain period of time, the control logic may provide a warning alarm through the chatbot message UI.

Referring to FIG. 7, the facial expression recognition deep learning module based on AI deep learning technology according to the present invention is a model trained using an image dataset of Korean elementary school students (5 to 13 years old) as input data. FIG. 7 is an exemplary diagram for explaining the architecture and learning of the deep learning module according to the present invention, and describes a transfer learning module.

Referring to FIG. 7, pre-processing to a predetermined resolution and size may be performed on an image data set of 5,000 to 10,000 Korean elementary school students.
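Away detection from the real-time tracking result can be sketched as below. The per-frame `detected` flag is assumed to come from the face or person detection described earlier, and the three-second threshold in the example is a hypothetical value; the text only says "a predetermined time period".

```python
from typing import Iterable, List, Tuple


def find_away_alarms(observations: Iterable[Tuple[float, bool]],
                     away_after: float) -> List[float]:
    """Return the timestamps at which a warning alarm would be sent.

    observations: (timestamp_seconds, detected) pairs in time order, where
    detected is True if a face or a person was found in that frame.
    """
    alarms: List[float] = []
    missing_since = None
    alarmed = False
    for t, detected in observations:
        if detected:
            missing_since = None     # learner is present; reset the timer
            alarmed = False
            continue
        if missing_since is None:
            missing_since = t
        if not alarmed and t - missing_since >= away_after:
            alarms.append(t)         # send one alarm per absence
            alarmed = True
    return alarms


frames = [(0, True), (1, False), (2, False), (3, True), (4, False),
          (5, False), (6, False), (7, False), (8, True)]
alarms = find_away_alarms(frames, away_after=3)
print(alarms)  # [7]
```

The short gap at seconds 1 to 2 does not raise an alarm, while the longer absence starting at second 4 does once it has lasted three seconds.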

Thereafter, the preprocessed image dataset is inserted into a first convolutional network, and at this time, the number of channels of the first convolutional network may be determined as n by n, where n may be 24 by way of example.

Thereafter, max-pooling is performed on the first convolution to prevent overfitting, and the size of the network can be reduced by extracting the maximum value from the data set. For example, the network after max pooling may be determined as n/2 by n/2, and may be illustratively a 12 by 12 network, but is not limited thereto.

Thereafter, the max-pooled data may be inserted into a second convolutional network. In this case, the number of channels of the second convolutional network may be determined as m by m, where m may be 8 by way of example, but is not limited thereto.

The second convolution can be max-pooled again.

Afterwards, a rectified linear unit (ReLU), which rectifies the final network, can be inserted as an activation function at the front of the neural network used for deep learning. Through this training, the deep learning model can learn to derive which of a plurality of emotional states or a plurality of concentration states an input image corresponds to.
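The size reduction from n by n to n/2 by n/2 and the effect of the ReLU activation can be illustrated on a single feature map. The 2 by 2 pooling window with stride 2 is an assumption consistent with the halving described above, and the input values are arbitrary.

```python
from typing import List

Matrix = List[List[float]]


def max_pool_2x2(feature_map: Matrix) -> Matrix:
    """2x2 max pooling with stride 2; height and width must be even."""
    rows, cols = len(feature_map), len(feature_map[0])
    if rows % 2 or cols % 2:
        raise ValueError("feature map dimensions must be even")
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]


def relu(feature_map: Matrix) -> Matrix:
    """Rectified linear unit: negative values become zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# A 24 x 24 map with arbitrary values, matching the n = 24 example.
n = 24
fmap = [[float((r * n + c) % 7 - 3) for c in range(n)] for r in range(n)]
pooled = max_pool_2x2(fmap)          # 12 x 12
activated = relu(pooled)
print(len(pooled), len(pooled[0]))   # 12 12
```

Each pooled value is the maximum of a 2 by 2 block, so a 24 by 24 map becomes 12 by 12 while the strongest responses are kept.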

The present invention can derive quantitative indicators for the emotional state, learning state, and concentration state of Korean students by deep learning through the transfer learning architecture of these images of Korean elementary school students.

Referring to FIG. 8 , the emotion recognition derivation model according to the present invention may be composed of the following three detailed models.

The first emotion recognition model may perform facial expression analysis by applying the AI deep learning module after detecting a face from an input image, and calculate probability values for each of three emotion types (positive, negative, and neutral). A quantitative emotional index may be derived based on the probability values.

The second emotion recognition model may perform facial expression analysis by applying the above-described AI deep learning module after detecting a face from an input image, and calculate probability values for each of the seven emotion types (joy, surprise, sadness, anger, fear, displeasure, calmness). A quantitative emotional index may be derived based on the probability values. The first emotion recognition model and the second emotion recognition model may be applied alternatively or complementarily, so that a more precise emotion index may be calculated.

The concentration recognition model may derive heart rate and heart rate variability after face detection from an input image. Heart rate and heart rate variability can be measured and analyzed by applying rPPG (remote photoplethysmography) technology. Based on the heart rate variability analysis result, the concentration recognition model calculates a probability value for each of the three stages of concentration (normal → concentration → immersion), and derives the concentration state index. A concentration index can be derived based on the concentration state index. The concentration state index can be used as a basic value for deriving a learning index.

Finally, the emotion recognition derivation model may derive a learning index based on quantitative index values derived from the first emotion recognition model, the second emotion recognition model, and the concentration recognition model. The learning index is an indicator for quantitatively grasping the user's learning status, and it can be calculated more precisely by additionally utilizing student data and data generated in the learning situation, such as learning time, number of questions, and absentee status.
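One way to obtain a heart rate from an rPPG trace is to look for the dominant frequency in the pulse band. The sketch below makes several assumptions that the text does not state: the trace is taken to be a per-frame average intensity over the detected face region, the pulse band is set to 0.7 to 4.0 Hz, and a plain discrete Fourier transform is used. Heart rate variability analysis is not covered here.

```python
import cmath
import math
from typing import List


def estimate_heart_rate_bpm(signal: List[float], fps: float,
                            min_hz: float = 0.7, max_hz: float = 4.0) -> float:
    """Return the dominant frequency in the pulse band, in beats per minute."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]        # remove the constant component
    best_k, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fps / n
        if not (min_hz <= freq <= max_hz):
            continue
        # Discrete Fourier transform coefficient at frequency bin k.
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        raise ValueError("signal too short for the requested band")
    return 60.0 * best_k * fps / n


# Synthetic 10 s trace at 30 frames per second with a 1.2 Hz (72 bpm) pulse.
fps = 30.0
trace = [0.5 + 0.01 * math.sin(2 * math.pi * 1.2 * i / fps) for i in range(300)]
bpm = estimate_heart_rate_bpm(trace, fps)
print(round(bpm, 1))  # 72.0
```

With a 10 second window the frequency resolution is 0.1 Hz, or 6 beats per minute, so a real system would likely need a longer window or interpolation around the peak.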

Referring to FIG. 9, the education method according to the present invention may include receiving image data from an external electronic device (S110), detecting face data from the received image data (S210), obtaining an emotion index by inputting the face data to an emotion recognition model based on facial expression recognition technology (S310), obtaining a concentration index by inputting the face data to an rPPG model based on optically measured changes in blood flow (S410), and calculating a learning index (S510).

Step S110 is a step of receiving image data including a face captured or transmitted in real time from the user's electronic device 110 .

Step S210 is a step of extracting a user's facial expression from the received image data and extracting a data set for deriving emotion and concentration based on the facial expression.

In step S310, based on the first emotion recognition module and the second emotion recognition module, a probability value may be calculated for each of three emotion types (positive, negative, neutral) or for each of seven emotion types (joy, surprise, sadness, anger, fear, displeasure, calmness).

In step S410, a face is detected in the input image, and heart rate and heart rate variability are then derived. This step measures and analyzes heart rate and heart rate variability by applying rPPG (remote photoplethysmography) technology after face detection.

Step S510 is a step of calculating a learning index based on the concentration index and the emotion index.
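The ordering of steps S110 to S510 can be summarized as a small orchestration function. This is only a sketch under stated assumptions: the detector, the two models, and the combination rule are passed in as callables, and all names (`run_pipeline`, `LearningResult`, the parameter names) are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class LearningResult:
    emotion_index: float
    concentration_index: float
    learning_index: float


def run_pipeline(
    frames: Sequence[Any],                                # S110: received image data
    detect_face: Callable[[Any], Optional[Any]],
    emotion_model: Callable[[Sequence[Any]], float],
    rppg_model: Callable[[Sequence[Any]], float],
    combine: Callable[[float, float], float],
) -> Optional[LearningResult]:
    """Run S210 to S510 in order; return None when no face is found."""
    # S210: keep only frames in which a face was detected.
    faces = [face for face in (detect_face(f) for f in frames) if face is not None]
    if not faces:
        return None
    emotion = emotion_model(faces)        # S310: emotion index
    concentration = rppg_model(faces)     # S410: concentration index
    learning = combine(emotion, concentration)  # S510: learning index
    return LearningResult(emotion, concentration, learning)
```

Injecting the models as callables keeps the step order explicit while leaving the actual deep learning components unspecified.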

Emotional index and learning index may be built into a database. The database may store the emotional index and learning index in the form of a time series to enable comprehensive analysis.
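A time-series store of this kind could be sketched with SQLite as follows. The table name, column names, and helper functions are hypothetical, and timestamps are assumed to be ISO 8601 strings so that lexical order matches chronological order.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store holding one row per user and timestamp."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_log ("
        " user_id TEXT NOT NULL,"
        " recorded_at TEXT NOT NULL,"
        " emotional_index REAL NOT NULL,"
        " learning_index REAL NOT NULL,"
        " PRIMARY KEY (user_id, recorded_at))"
    )
    return conn


def record(conn, user_id, recorded_at, emotional_index, learning_index):
    """Append one measurement; the transaction commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO index_log VALUES (?, ?, ?, ?)",
            (user_id, recorded_at, emotional_index, learning_index),
        )


def series(conn, user_id):
    """Return the user's measurements ordered by time."""
    cursor = conn.execute(
        "SELECT recorded_at, emotional_index, learning_index"
        " FROM index_log WHERE user_id = ? ORDER BY recorded_at",
        (user_id,),
    )
    return cursor.fetchall()
```

Keeping the timestamp in the primary key makes each user's history directly queryable as an ordered series for later analysis.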

When an embodiment of the present invention is implemented using hardware, ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), and the like configured to perform the present invention may be provided in the processor of the present invention.

Meanwhile, the above-described method can be written as a program that can be executed on a computer, and can be implemented in a general-purpose digital computer that runs the program using a computer-readable medium. In addition, the structure of data used in the above-described method may be recorded on a computer-readable storage medium through various means. Program storage devices, a term that may be used to describe a storage device containing executable computer code for performing various methods of the present invention, should not be construed as including transitory objects such as carrier waves or signals. The computer-readable storage media include storage media such as magnetic storage media (e.g., ROM, floppy disk, hard disk, etc.) and optical reading media (e.g., CD-ROM, DVD, etc.).

The embodiments described above are those in which elements and features of the present invention are combined in a predetermined form. Each component or feature should be considered optional unless explicitly stated otherwise. Each component or feature may be implemented in a form not combined with other components or features. In addition, it is also possible to configure an embodiment of the present invention by combining some elements and/or features. The order of operations described in the embodiments of the invention may be changed. Some components or features of one embodiment may be included in another embodiment, or may be replaced with corresponding components or features of another embodiment. It is obvious that claims that do not have an explicit citation relationship in the claims can be combined to form an embodiment or can be included as new claims by amendment after filing.

It will be clear to those skilled in the art that the present invention can be embodied in other forms without departing from the technical spirit and essential characteristics of the present invention. Accordingly, the above embodiments should be considered in all respects as illustrative rather than restrictive. The scope of the present invention should be determined by reasonable interpretation of the appended claims and all possible changes within the equivalent scope of the present invention.

The education method according to the present invention uses object recognition technology based on AI deep learning, so that real-time face recognition and person recognition are possible even with image data captured by a general RGB camera. It can therefore support user learning in a mobile environment and has the potential to be widely used in the education industry.

Claims

An education method performed by a computing device, the method comprising:

receiving image data from an external electronic device;

primarily detecting face data from the received image data;

secondarily detecting person data when face data is not detected; and

managing non-face-to-face image information based on the detected face data or person data.
The education method according to claim 1,

further comprising deriving an emotion index corresponding to the input data, based on the detected face data and person data, using a pre-trained deep learning model.
The education method according to claim 2,

wherein the step of managing the non-face-to-face image information manages the non-face-to-face image information based on the derived emotional index and, when the emotional index is derived, transmits a predetermined signal to an electronic device that captures the non-face-to-face image.
The education method according to claim 1,

wherein the step of managing the non-face-to-face image information comprises:

tracking the detected face data or person data in real time and managing non-face-to-face education image information; and

recognizing the detected face data or person data as being away and transmitting a warning alarm when the detected state changes over a predetermined period of time.
The education method according to claim 1,

further comprising updating the deep learning model through feedback by transferring data collected in the online situation.
The education method according to claim 5,

wherein the emotion index includes an emotion index and a concentration index,

the emotion index is derived as a probability value by inputting the face data into an emotion recognition model based on facial expression recognition technology, for any one emotion type among positive, negative, and neutral emotions, or for any one emotion type among joy, surprise, sadness, anger, fear, displeasure, and calmness, and

the concentration index is obtained by deriving heart rate and heart rate variability through rPPG after face detection from the input image, and deriving the concentration types of normal, concentration, and immersion as probability values.
An electronic device comprising a memory, a transceiver and at least one processor, wherein the at least one processor is configured to perform the education method according to any one of claims 1 to 6.
A computer program recorded on a computer readable storage medium configured to perform the education method according to any one of claims 1 to 6 through an electronic device.