KR20230081012A

KR20230081012A - Method, device, and program for education comprising deep-learning module enabling generating indicators of emotion

Info

Publication number: KR20230081012A
Application number: KR1020210168670A
Authority: KR
Inventors: 김성엽; 오하람; 정문원
Original assignee: 주식회사 마블러스
Priority date: 2021-11-30
Filing date: 2021-11-30
Publication date: 2023-06-07

Abstract

The present invention relates to a video education method, performed by a computing device, that is based on changes in emotion and concentration states. The method includes the steps of: receiving image data from an external electronic device; converting the received image data into Mat data; inputting the Mat data into a deep learning module trained on Korean input data to convert the Mat data into emotion and concentration state values; and inserting the emotion and concentration state values into control logic.

Description

Educational device, method, and program for calculating emotional quotient based on emotion index and concentration index

The present invention relates to an educational device, method, and program for calculating an emotional quotient based on an emotion index and a concentration index.

In the past, education focused on raising intellectual ability, but in the 21st century emotional ability is becoming increasingly important. In particular, many studies report that emotional education in the preschool and elementary years plays an important role in emotional self-development, the formation of empathy, sociality, and creativity. Although such emotional care is becoming a key factor in improving learning outcomes, educators remain reluctant to make use of students' emotional states, and learning efficiency has not improved as a result. Since the COVID-19 pandemic, learning has shifted from offline to online, and because the emotional support available in face-to-face settings is lost, emotional care is growing in importance.

Accordingly, there is a need for technology that recognizes a student's emotional expressions and physical reactions in real time in a non-face-to-face learning situation and measures various emotional indicators, such as the emotion the student currently feels and the student's concentration state. In particular, there is a need for technology that, on the basis of these measurements, empathizes with the student's emotions and guides them in a positive direction through various devices. Because deriving a student's emotions effectively requires qualitative as well as quantitative indicators, related technology needs to be developed.

To solve the problems described above, the present invention aims to provide an educational device, method, and program for calculating an emotional quotient based on an emotion index and a concentration index.

A video education method based on changes in emotion and concentration state, performed by a computing device according to an embodiment of the present invention, includes: receiving image data from an external electronic device; detecting face data in the received image data; obtaining an emotion index by inputting the face data into an emotion recognition model based on facial expression recognition; and obtaining a concentration index by inputting the face data into an rPPG (remote photoplethysmography) model based on changes in optically measured blood volume.

The emotion recognition model may include a first emotion recognition model that determines one emotion type among positive, negative, and neutral and calculates a probability value for each type.

The emotion recognition model may include a second emotion recognition model that determines one emotion type among joy, surprise, sadness, anger, fear, displeasure, and calmness and calculates a probability value for each type.

The rPPG model may detect a face in an input image, derive heart rate and heart rate variability, determine the concentration state as one of three stages (normal, concentration, and immersion), and calculate a probability value for each stage.
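The description does not state how heart rate and heart rate variability are computed from the rPPG signal. As a minimal sketch, assuming the pulse peaks have already been detected and are available as timestamps in seconds, heart rate can be taken as 60 divided by the mean inter-beat interval, and variability can be summarized with RMSSD. The helper name and the choice of RMSSD as the variability metric are assumptions made for illustration, not details from the source.

```python
import math

def heart_rate_and_rmssd(peak_times_s):
    """Return (beats per minute, RMSSD in milliseconds) from pulse-peak timestamps.

    peak_times_s: ascending timestamps (seconds) of detected pulse peaks.
    Hypothetical helper; RMSSD is an assumed HRV metric, not taken from the source.
    """
    if len(peak_times_s) < 3:
        raise ValueError("need at least three peaks")
    # Inter-beat intervals between consecutive peaks, in seconds.
    ibis = [b - a for a, b in zip(peak_times_s, peak_times_s[1:])]
    bpm = 60.0 / (sum(ibis) / len(ibis))
    # Successive differences between intervals, converted to milliseconds.
    diffs_ms = [(b - a) * 1000.0 for a, b in zip(ibis, ibis[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs_ms) / len(diffs_ms))
    return bpm, rmssd
```

Mapping these two values to the normal, concentration, and immersion stages would require a trained classifier, which is outside the scope of this sketch.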

The emotion recognition model may be trained on input data from Korean preschool and elementary school students.

The emotion recognition model may generate network data through a convolutional network and max pooling, and may perform deep learning using a Rectified Linear Unit (ReLU) as the activation function for the generated network data.
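The three operations named here (convolution, max pooling, and ReLU) can be illustrated on a single-channel image with one kernel. The following is a dependency-free sketch, not the architecture of the described model; the function names, the 2x2 pooling window, and the single-kernel setup are assumptions. Because ReLU and max are both monotone, applying ReLU before or after max pooling gives the same result.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling; a trailing odd row or column is dropped."""
    h = len(feature_map) // 2 * 2
    w = len(feature_map[0]) // 2 * 2
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]

def relu(feature_map):
    """Element-wise Rectified Linear Unit: negative values become 0."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def forward(image, kernel):
    """Convolution, then max pooling, then ReLU activation."""
    return relu(max_pool_2x2(conv2d_valid(image, kernel)))
```

A real model would stack many such layers with learned multi-channel kernels; this sketch only shows what each named operation does to the data.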

A learning index may be calculated based on the concentration index and the emotion index.

An electronic device according to another embodiment of the present invention includes a camera, a recording device, an output device, a memory, a transceiver, and at least one processor, wherein the at least one processor is configured to perform the education method described above.

A computer program according to another embodiment of the present invention is recorded on a computer-readable storage medium and is configured to perform the education method described above through an electronic device.

Using this image recognition technology, three detailed types of emotional indicators can be measured. The first emotional state indicator and the second emotional state indicator use facial expression recognition based on AI deep learning, and the concentration state indicator uses rPPG, which analyzes changes in blood volume observed optically in the facial image.

A learning index can be calculated based on the measured emotion and concentration indicators.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 illustrates a communication system according to various embodiments of the present invention.
FIG. 2 is a block diagram of the configuration of a video electronic device based on changes in emotion and concentration state according to various embodiments of the present invention.
FIG. 3 is a block diagram of the configuration of a server according to various embodiments of the present invention.
FIG. 4 illustrates an education method, performed by an electronic device, that enables conversion of emotion and concentration states according to various embodiments of the present invention.
FIG. 5 illustrates the structure of a machine learning model according to various embodiments of the present invention.
FIG. 6 illustrates an example in which the artificial intelligence logic according to the present invention derives result values for emotion, posture, and concentration.
FIG. 7 is an exemplary diagram for explaining the architecture and training of the deep learning module according to the present invention.
FIG. 8 illustrates in detail an example of the process of the sub-models that make up the emotion recognition model based on transfer learning.
FIG. 9 exemplarily illustrates the learning effect of the emotional indicators derived in the present invention.
FIG. 10 illustrates an education method capable of deriving emotional indicators according to various embodiments of the present invention.
FIG. 11 illustrates examples of additional indicators, beyond the deep learning module, needed to calculate the emotional quotient.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those skilled in the art can readily practice the invention. The present invention may be embodied in many different forms and is not limited to the embodiments described herein.

A video education method based on changes in emotion and concentration state, performed by a computing device, includes: receiving image data from an external electronic device; converting the received image data into Mat data; inputting the Mat data into an emotion recognition SDK module to convert it into emotion and concentration state values; inserting the emotional state and concentration state values into control logic; and calling a chatbot message UI linked to the control logic.

FIG. 1 illustrates a communication system according to various embodiments of the present invention.

Referring to FIG. 1, a communication system according to various embodiments of the present invention includes an electronic device 110, a wired/wireless communication network 120, and a server 130. The server 130 obtains image data from the user's electronic device 110 through the wired/wireless communication network 120, derives an emotional state and a concentration state, and then transmits a chatbot message UI corresponding to those states back to the user's electronic device 110 through the wired/wireless communication network 120.

At the request of the server 130, the electronic device 110 captures image data containing face and posture information about the user's learning state and transmits it through the wired/wireless communication network 120. The electronic device 110 may be any electronic device, such as a personal computer, cellular phone, smartphone, or tablet computer, that includes a memory capable of storing information, a transceiver capable of transmitting and receiving information, and at least one processor capable of performing computations on information. The type of the electronic device 110 is not limited.

The wired/wireless communication network 120 provides a communication path through which the electronic device 110 and the server 130 can exchange signals and data. The wired/wireless communication network 120 is not limited to a communication method based on a specific communication protocol, and an appropriate communication method may be used depending on the implementation. For example, when configured as an Internet Protocol (IP) based system, the wired/wireless communication network 120 may be implemented as a wired or wireless Internet network, and when the electronic device 110 and the server 130 are implemented as mobile communication terminals, the wired/wireless communication network 120 may be implemented as a wireless network such as a cellular network or a WLAN (wireless local area network).

The server 130 receives image data containing face and posture information about the user's learning state from the electronic device 110 through the wired/wireless communication network 120. The server 130 may be an electronic device that includes a memory capable of storing information, a transceiver capable of transmitting and receiving information, and at least one processor capable of performing computations on information.

FIG. 2 illustrates a block diagram of the configuration of an electronic device according to various embodiments of the present invention.

Referring to FIG. 2, an electronic device 110 according to various embodiments of the present invention includes a memory 111, a transceiver 112, and a processor 113.

The memory 111 may consist of volatile memory, non-volatile memory, or a combination of volatile and non-volatile memory. The memory 111 may provide stored data at the request of the processor 113.

The transceiver 112 is connected to the processor 113 and transmits and/or receives signals. All or part of the transceiver 112 may be referred to as a transmitter, a receiver, or a transceiver. The transceiver 112 may support at least one of various communication standards for wired and wireless access systems, such as IEEE (Institute of Electrical and Electronics Engineers) 802.xx systems, IEEE Wi-Fi systems, 3GPP (3rd Generation Partnership Project) systems, 3GPP LTE (Long Term Evolution) systems, 3GPP 5G NR (New Radio) systems, 3GPP2 systems, and Bluetooth.

The processor 113 may be configured to implement the procedures and/or methods proposed in the present invention. The processor 113 controls the overall operations of the electronic device 110 for providing content based on machine learning analysis of biometric information. For example, the processor 113 transmits or receives information through the transceiver 112. The processor 113 also writes data to and reads data from the memory 111. The processor 113 may include at least one processor.

FIG. 3 illustrates a block diagram of the configuration of the server 130 according to various embodiments of the present invention.

Referring to FIG. 3, a server 130 according to various embodiments of the present invention includes a memory 131, a transceiver 132, and a processor 133. The server 130 may be a type of electronic device.

The server 130 receives image data containing face and posture information about the user's learning state from the electronic device 110 through the wired/wireless communication network 120. The server 130 converts the received image data into Mat data, inputs the Mat data into the emotion recognition SDK module to convert it into emotion and concentration state values, inserts the emotional state and concentration state values into the control logic, and calls the chatbot message UI linked to the control logic.

The memory 131 is connected to the transceiver 132 and may store information received through communication. The memory 131 is also connected to the processor 133 and may store data such as basic programs for the operation of the processor 133, application programs, configuration information, and information generated by the computations of the processor 133. The memory 131 may consist of volatile memory, non-volatile memory, or a combination of volatile and non-volatile memory. The memory 131 may provide stored data at the request of the processor 133.

The transceiver 132 is connected to the processor 133 and transmits and/or receives signals. All or part of the transceiver 132 may be referred to as a transmitter, a receiver, or a transceiver. The transceiver 132 may support at least one of various communication standards for wired and wireless access systems, such as IEEE (Institute of Electrical and Electronics Engineers) 802.xx systems, IEEE Wi-Fi systems, 3GPP (3rd Generation Partnership Project) systems, 3GPP LTE (Long Term Evolution) systems, 3GPP 5G NR (New Radio) systems, 3GPP2 systems, and Bluetooth.

The processor 133 may be configured to implement the procedures and/or methods proposed in the present invention. The processor 133 controls the overall operations of the server 130, such as inputting image data into the emotion recognition SDK module to convert it into emotion and concentration state values, and running control logic that takes the emotional state and concentration state values as inputs. For example, the processor 133 transmits or receives information through the transceiver 132. The processor 133 also writes data to and reads data from the memory 131. The processor 133 may include at least one processor.

FIG. 4 illustrates an education method, performed by an electronic device, that enables conversion of emotion and concentration states according to various embodiments of the present invention.

Referring to FIG. 4, the method includes: receiving image data from an external electronic device (S100); converting the received image data into Mat data (S200); inputting the Mat data into the emotion recognition SDK module to convert it into emotion and concentration state values (S300); and inserting the emotional state and concentration state values into control logic and calling a chatbot message UI linked to the control logic (S400).

Step S100 is a step of receiving image data that contains a face and is captured by, or transmitted in real time from, the user's electronic device 110. The education method according to the present invention derives facial-expression-based emotion and concentration states from image data containing the user's facial expression and, through logic control driven by the quantitative values of those states, provides a responsive chatbot UI to the user's electronic device 110, thereby acting as an assistant teacher that can respond to the user's reactions in a non-face-to-face situation. Accordingly, the image data may be required to have a resolution and image quality sufficient for facial expressions to be identified at a plurality of points in time. The server 130 described above may receive the data from the electronic device 110 through the wired/wireless communication network 120.

In step S200, the received image data may be processed using, for example, an OpenCV module. In this step, the image data may be converted into Mat (matrix) form, the data format handled by the OpenCV image processing library. OpenCV (Open Source Computer Vision Library) is an open-source library used for real-time image processing.

Mat data denotes a data format with an m-by-n structure and one or more channels. It can hold not only elements of basic types (char, int) but also the various data types defined by OpenCV, and it can represent complex numbers, monochrome and color images, voxel volumes, vector fields, point clouds, tensors, and histograms.
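The row, column, and channel structure of Mat data can be illustrated without OpenCV itself. In OpenCV's Python bindings a Mat is exposed as a NumPy array; the sketch below avoids that dependency and only shows how a flat, row-major, channel-interleaved pixel buffer maps onto `mat[row][col][channel]` indexing. The helper name `to_mat_like` is hypothetical and is not an OpenCV function.

```python
def to_mat_like(buffer, rows, cols, channels):
    """Reshape a flat, row-major, channel-interleaved byte buffer into nested
    lists indexed as mat[row][col][channel]. Hypothetical helper for illustration."""
    if len(buffer) != rows * cols * channels:
        raise ValueError("buffer size does not match rows * cols * channels")
    row_stride = cols * channels  # number of values in one image row
    return [
        [
            list(buffer[r * row_stride + c * channels:
                        r * row_stride + (c + 1) * channels])
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```

For a 2x2 three-channel image stored in 12 bytes, `to_mat_like(buf, 2, 2, 3)[1][0]` returns the three channel values of the pixel in the second row, first column.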

Step S300 is a step of inputting the Mat data into the emotion recognition SDK module to convert it into emotion and concentration state values. In the present invention, the module may recognize seven types of emotion that the user exhibits during learning, and three concentration states that the user exhibits during learning.

The emotional state may include a plurality of emotional state values that sum to 1.0, and the concentration state may include a plurality of concentration state values that sum to 1.0. Specifically, the emotional state values and concentration state values are probabilities, and each set must always sum to 1.0: the seven emotion values always sum to 1.0, and the three concentration state values also always sum to 1.0. In other words, they form a closed variable space with no additional variables.
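The description does not say how the module produces values that sum to 1.0. One common way to obtain such a distribution from raw model scores is the softmax function; the sketch below assumes that approach purely for illustration. The label lists follow the seven emotions and three concentration states named in this description, while the function names are hypothetical.

```python
import math

EMOTIONS = ["joy", "surprise", "fear", "anger", "displeasure", "sadness", "calmness"]
CONCENTRATION = ["normal", "concentration", "immersion"]

def softmax(scores):
    """Map raw scores to probabilities that sum to 1.0."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def label_probabilities(labels, scores):
    """Pair each label with its probability (assumed softmax normalization)."""
    if len(labels) != len(scores):
        raise ValueError("one score is required per label")
    return dict(zip(labels, softmax(scores)))
```

The most likely state is then the label with the largest probability, for example `max(probs, key=probs.get)`.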

The emotion recognition SDK module may take the user's facial image as input and determine emotion based on facial expression. The module may also determine the concentration state by calculating cardiac response information.

The emotion recognition SDK module returns concentration/immersion values as decimal probability values, and emotion can be recognized and determined on the basis of the returned values.

Referring to FIG. 7, the seven emotions may specifically include joy, surprise, fear, anger, displeasure, sadness, and calmness.

Specifically, the three concentration states may be normal, concentration, and immersion.

Step S400 may be a step of inserting the emotional state and concentration state values into the control logic. The control logic may control the chatbot so that, depending on the emotional state and concentration state values, changes in those values, or their change per unit time, an appropriate response is output to improve the user's learning efficiency.

When the emotional state value or the concentration state value reaches a preset value, the control logic calls the corresponding chatbot message UI. More specifically, the control logic may call the corresponding chatbot message UI when the emotional state value or the concentration state value, averaged over a preset reference time, reaches the preset value. Details are described below with reference to FIG. 6.
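The averaging rule can be sketched as a sliding window over recent state values. This is a minimal illustration, assuming that the reference time corresponds to a fixed number of samples and that "reaches" means the average is greater than or equal to the threshold; the class name and both parameters are hypothetical, and the comparison can be flipped for conditions that fire when a value falls below a reference.

```python
from collections import deque

class AverageTrigger:
    """Fires when the mean of the last `window` samples reaches `threshold`."""

    def __init__(self, window, threshold):
        self.samples = deque(maxlen=window)  # oldest sample is dropped automatically
        self.threshold = threshold

    def update(self, value):
        """Add one state value; return True if the chatbot UI should be called."""
        self.samples.append(value)
        # Decide only once a full reference window has been observed.
        if len(self.samples) < self.samples.maxlen:
            return False
        return sum(self.samples) / len(self.samples) >= self.threshold
```

Averaging over a window keeps a single noisy frame from triggering a chatbot message.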

Specifically, the control logic of FIG. 6 is as follows. After the learner's face is recognized in the image data provided by the user's electronic device, the first depth branches into emotion determination, face detection, and concentration state determination. The learner's emotional state values and concentration state values derived by the emotion recognition SDK module described above are assigned to the emotion determination unit and the concentration state determination unit, respectively, and the basic face detection data is assigned to the face detection unit.

At the second depth corresponding to the emotion determination unit, one of the seven learner emotional states may be derived from the result input to the emotion determination unit. The emotion determination unit may derive this result through a machine learning module, such as a deep learning or other artificial intelligence module, and is not limited to a specific technology.

At the third depth corresponding to the emotion determination unit, appropriate control logic may be derived from the emotional state values corresponding to the seven emotions determined at the second depth. For example, when the positive emotional state value remains at level 0 (less than 0.5) for a long period, the control logic may be activated to run a care chatbot UI for looking after the user. When the positive emotional state value remains at level 1 (0.5 to 0.75) or level 2 (0.75 to 1.0) for a long period, the control logic may direct the chatbot UI to present guidance in various forms depending on the question, such as a short-answer response and reaction screen in the text interface, an activity end and screen transition, or an activity end and activity recommendation screen.
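The level boundaries given above can be written as a small mapping function. This sketch assumes that a value of exactly 0.75 belongs to level 2, since the description lists 0.75 in both ranges; the function name is hypothetical, and the "long period" condition is not modeled here.

```python
def positive_emotion_level(value):
    """Map a positive emotional state value in [0.0, 1.0] to level 0, 1, or 2.

    Level 0: value < 0.5; level 1: 0.5 <= value < 0.75; level 2: value >= 0.75.
    Placing exactly 0.75 in level 2 is an assumption; the source lists it in both ranges.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be a probability in [0.0, 1.0]")
    if value < 0.5:
        return 0
    if value < 0.75:
        return 1
    return 2
```

Under this mapping, level 0 corresponds to the care chatbot UI, and levels 1 and 2 correspond to the guidance screens described above.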

At the second depth corresponding to the face detection unit, the extracted face detection data may be used by a posture determination unit and a seat-departure determination unit to derive a result on whether the posture is appropriate and a determination of whether the user has left the seat.

At the third depth corresponding to the face detection unit, when the posture determination value derived by the posture determination unit falls outside the appropriate range, an output signal may be transmitted to the user's electronic device so that the chatbot UI presents guidance through dialogue and a face position reset screen.

At the third depth corresponding to the seat-departure determination unit, when the derived seat-departure determination value falls outside the appropriate range, that is, when the user has been away for a long time, an output signal may be transmitted to the user's electronic device so that the chatbot UI presents guidance through the voice interface during learning.

At the second depth corresponding to the concentration state determination unit, the state may be classified into one of three categories: a normal state, a concentration state, and an immersion state.

At the third depth corresponding to the concentration state determination unit, when the normal state lasts longer than a preset time, the user is determined to be in need of care, and an output signal may be transmitted to the user's electronic device so that the chatbot UI is controlled to provide guidance through the voice interface during learning.

When the concentration or immersion state drops by more than a preset amount per unit time, the control logic may direct the chatbot UI to provide guidance in various forms depending on the question, such as a short-answer response and reaction screen in the text interface, an activity end and screen transition, or an activity end and activity recommendation screen.

As described above, in the three-depth structure of the control logic, when the control logic determines from the user's emotional and concentration state values that learning efficiency has dropped, that the user has left the seat, or that a change is needed, the chatbot message UI may be called under the control of the control logic, and a signal may be transmitted so that the UI is operated on the user's electronic device.

In short, the method (S400) may call the care chatbot message UI when, in the emotion derivation result, the positive emotional state value stays below a reference value for a preset time, or when, in the concentration derivation result, the concentration state value stays below a reference value for a preset time.

In addition, the method (S400) may call the guide chatbot message UI when, in the emotion derivation result, the positive emotional state value shows a large negative change within a unit time, or when, in the concentration derivation result, the concentration state value shows a large negative change within a unit time.
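The two triggers of step S400 can be sketched as follows. This is only an illustration under stated assumptions: the state values are taken to be sampled at a fixed interval, and the function names and the defaults `reference`, `min_samples`, `window`, and `max_drop` are hypothetical, since the description gives no concrete numbers for the preset time or the size of a "large" negative change.

```python
from typing import Optional, Sequence


def below_reference_for(values: Sequence[float], reference: float,
                        min_samples: int) -> bool:
    """True if the most recent `min_samples` values are all below `reference`."""
    if len(values) < min_samples:
        return False
    return all(v < reference for v in values[-min_samples:])


def large_negative_change(values: Sequence[float], window: int,
                          max_drop: float) -> bool:
    """True if the value fell by more than `max_drop` over the last `window` samples."""
    if len(values) <= window:
        return False
    return values[-1] - values[-1 - window] < -max_drop


def chatbot_to_call(emotion: Sequence[float], concentration: Sequence[float],
                    reference: float = 0.5, min_samples: int = 5,
                    window: int = 2, max_drop: float = 0.2) -> Optional[str]:
    """Return 'care', 'guide', or None for the current state histories."""
    # Care chatbot: either state value stays below the reference value.
    if (below_reference_for(emotion, reference, min_samples)
            or below_reference_for(concentration, reference, min_samples)):
        return "care"
    # Guide chatbot: either state value drops sharply within the unit time.
    if (large_negative_change(emotion, window, max_drop)
            or large_negative_change(concentration, window, max_drop)):
        return "guide"
    return None
```

Checking the care condition first is a design choice of this sketch; the description does not say which UI takes priority when both conditions hold.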

The method includes a step (S500) of calculating the change in the user's state before and after the point at which the UI appears, following the call to the chatbot message UI.

Step S500 may be a step of calculating the change in the user's emotional and/or concentration state after the chatbot message UI is called, and evaluating the effectiveness of the chatbot message UI as feedback. Specifically, according to at least one of a change in the emotional state value, a change in the concentration state value, a change in the emotional state category, or a change in the concentration state category, the step may include applying feedback to the chatbot message UI corresponding to the relevant control logic when the emotional state changes from a first state to a second state.

When the first state is a negative emotion and the second state is a positive emotion, reinforcement feedback may be applied to the chatbot message UI corresponding to the control logic. When the first state is a positive emotion and the second state is a negative emotion, negative feedback (negative reinforcement) may be applied to the chatbot message UI corresponding to the control logic.
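One way to picture the feedback of step S500 is a per-UI score that is raised or lowered according to the state transition. The sketch below is an assumption for illustration only: the score table, the unit step of +1 or -1, and the identifiers such as `care_01` do not appear in the description.

```python
from typing import Dict

POSITIVE = "positive"
NEGATIVE = "negative"


def feedback_delta(first_state: str, second_state: str) -> int:
    """Return +1 for reinforcement feedback, -1 for negative feedback, else 0."""
    if first_state == NEGATIVE and second_state == POSITIVE:
        return 1   # the UI was followed by an improvement: reinforce it
    if first_state == POSITIVE and second_state == NEGATIVE:
        return -1  # the UI was followed by a decline: penalize it
    return 0       # no category change: leave the score as it is


def apply_feedback(scores: Dict[str, int], ui_id: str,
                   first_state: str, second_state: str) -> Dict[str, int]:
    """Update the (hypothetical) score of the chatbot message UI `ui_id` in place."""
    scores[ui_id] = scores.get(ui_id, 0) + feedback_delta(first_state, second_state)
    return scores
```

A score of this kind could later be used to prefer chatbot messages that tend to be followed by a positive change, although the description leaves the use of the feedback open.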

In addition, the present invention may further include a step (not shown) in which the chatbot message UI communicates with an STT/TTS API to transmit voice data to an external electronic device.

FIG. 5 illustrates the structure of a machine learning model according to various embodiments of the present invention.

Specifically, FIG. 5 shows the structure of a multi-layer perceptron (MLP) for providing content based on machine learning analysis of biometric information according to various embodiments of the present invention.

Deep learning, one of the technologies that has recently come to prominence in the field of machine learning, uses a neural network composed of a plurality of hidden layers and a plurality of hidden units contained in those layers. When low-level features are input to a deep learning model, they are transformed, as they pass through the hidden layers, into high-level features that better explain the problem to be predicted. Because this process does not require an expert's prior knowledge or intuition, subjective factors in feature extraction can be removed and a model with higher generalization ability can be developed. Furthermore, because feature extraction and model construction form a single unit in deep learning, the final model can be built through a simpler process than with conventional machine learning approaches.

A multi-layer perceptron (MLP) is a type of artificial neural network (ANN) with multiple nodes, based on deep learning. Each node is a neuron, loosely analogous to the connection patterns of neurons in animals, that uses a nonlinear activation function. This nonlinearity allows the network to separate data that is not linearly separable.

Referring to FIG. 5, the artificial neural network 500 of the MLP model according to various embodiments of the present invention consists of one or more input layers 510, a plurality of hidden layers 530, and one or more output layers 550.

Input data, such as the RGB value of each pixel in at least one ultrasound image per unit time, is input to the nodes of the input layer 510. Here, each item 511 of the user's biometric information (for example, electrocardiogram information, a concentration level, and the intensity of the happiness emotion) and of the adjusted content information (for example, content genre, content topic, and content channel) corresponds to a low-level feature of the deep learning model.

The nodes of the hidden layer 530 perform calculations based on the input factors. The hidden layer 530 is a layer that stores units defined as a plurality of nodes formed by combining the user's biometric information and the adjusted content information 511. As shown in FIG. 5, the hidden layer 530 may consist of a plurality of hidden layers.

For example, when the hidden layer 530 consists of a first hidden layer 531 and a second hidden layer 533, the first hidden layer 531 stores first units 532, defined as a plurality of nodes formed by combining the user's biometric information and the adjusted content information 511, and each first unit 532 corresponds to a higher-level feature of that information 511. The second hidden layer 533 stores second units 534, defined as a plurality of nodes formed by combining the first units of the first hidden layer 531, and each second unit 534 corresponds to a higher-level feature of the first units 532.

The nodes of the output layer 550 represent the calculated prediction results. The output layer 550 may include a plurality of prediction result units 551. Specifically, the prediction result units 551 may consist of two units, a true unit and a false unit. The true unit is a prediction result unit indicating that, after the content is changed to the adjusted content, the emotional state value and the concentration state value in the user's face data are likely to be higher than a threshold. The false unit is a prediction result unit indicating that, after the content is changed to the adjusted content, the emotional state value and the concentration state value in the user's face data are unlikely to be higher than the threshold.

A weight is assigned to each connection between the second units 534 in the second hidden layer 533, which is the last of the hidden layers 530, and the prediction result units 551. Based on these weights, the network predicts whether the emotional state value and the concentration state value in the user's face data will be at or above the threshold after the content is changed to the adjusted content.
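The layer structure of FIG. 5 can be sketched as a forward pass through two hidden layers into a two-unit output (true and false). The layer sizes, the random initial weights, the ReLU and softmax functions, and the example feature vector are all assumptions chosen for illustration; the description does not specify them for the MLP.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def dense(x: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """Weighted sum of the inputs for every unit of a layer."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x: List[float], layers: List[Layer]) -> List[float]:
    """Input layer 510 -> hidden layers 531 and 533 -> output layer 550."""
    h = x
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(h, weights, biases))


rng = random.Random(0)
# 6 low-level features -> 4 first units -> 3 second units -> 2 prediction units
layers = [init_layer(6, 4, rng), init_layer(4, 3, rng), init_layer(3, 2, rng)]
features = [0.7, 0.6, 0.8, 1.0, 0.0, 0.0]  # hypothetical encoded inputs 511
p_true, p_false = forward(features, layers)
```

With untrained weights the two probabilities carry no meaning; the point of the sketch is only the flow of low-level features through successively higher-level units.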

The artificial neural network 500 of the MLP model learns by adjusting its learning parameters. According to one embodiment, the learning parameters include at least one of a weight and a bias. The learning parameters are adjusted iteratively through an optimization algorithm called gradient descent. Each time a prediction result is calculated from a given data sample (forward propagation), the performance of the network is evaluated through a loss function that measures the prediction error. Each learning parameter of the artificial neural network 500 is adjusted in small increments in the direction that minimizes the value of the loss function, and this process is called back-propagation.
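The training loop described above can be illustrated with a very small network: one hidden layer, forward propagation, a cross-entropy loss, and weight and bias updates by gradient descent with gradients obtained through back-propagation. The toy data, the tanh and sigmoid activations, the layer size, and the learning rate are assumptions for illustration and are not taken from the description.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params):
    """Forward propagation: returns hidden activations and the prediction."""
    w1, b1, w2, b2 = params
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    y = sigmoid(sum(w * v for w, v in zip(w2, h)) + b2)
    return h, y


def mean_loss(data, params):
    """Mean binary cross-entropy over the data set."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for x, t in data:
        _, y = forward(x, params)
        total -= t * math.log(y + eps) + (1 - t) * math.log(1 - y + eps)
    return total / len(data)


def train_step(data, params, lr):
    """One full-batch gradient descent step using back-propagation."""
    w1, b1, w2, b2 = params
    g_w1 = [[0.0] * len(row) for row in w1]
    g_b1 = [0.0] * len(b1)
    g_w2 = [0.0] * len(w2)
    g_b2 = 0.0
    for x, t in data:
        h, y = forward(x, params)
        dz2 = y - t  # gradient of the loss at the output pre-activation
        g_b2 += dz2
        for j in range(len(w2)):
            g_w2[j] += dz2 * h[j]
            dz1 = dz2 * w2[j] * (1.0 - h[j] ** 2)  # propagate back through tanh
            g_b1[j] += dz1
            for i in range(len(x)):
                g_w1[j][i] += dz1 * x[i]
    n = len(data)
    new_w1 = [[w - lr * g / n for w, g in zip(row, g_row)] for row, g_row in zip(w1, g_w1)]
    new_b1 = [b - lr * g / n for b, g in zip(b1, g_b1)]
    new_w2 = [w - lr * g / n for w, g in zip(w2, g_w2)]
    return new_w1, new_b1, new_w2, b2 - lr * g_b2 / n


# Toy samples: (emotional state value, concentration state value) -> label
data = [([0.9, 0.8], 1), ([0.2, 0.3], 0), ([0.8, 0.9], 1), ([0.1, 0.4], 0)]
rng = random.Random(0)
params = (
    [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)],  # w1
    [0.0] * 3,                                                       # b1
    [rng.uniform(-0.5, 0.5) for _ in range(3)],                      # w2
    0.0,                                                             # b2
)
initial_loss = mean_loss(data, params)
for _ in range(200):
    params = train_step(data, params, lr=0.5)
final_loss = mean_loss(data, params)
```

Comparing `final_loss` with `initial_loss` shows the effect of the repeated small adjustments on the prediction error.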

Referring to FIG. 7, the facial expression recognition deep learning module based on AI deep learning technology according to the present invention is a model trained on an image dataset of Korean kindergarten and elementary school students (ages 5 to 13) as input data. FIG. 7 is an exemplary diagram for explaining the architecture and training of the deep learning module according to the present invention.

Referring to FIG. 7, an image dataset of roughly 5,000 to 10,000 images of Korean kindergarten and elementary school students may be pre-processed to a preset resolution and size.

The pre-processed image dataset is then fed into a first convolutional network. The number of channels of the first convolutional network may be set to n by n, where n may be, for example, 24.

The output of the first convolution is then max-pooled, which helps prevent overfitting and reduces the size of the network by keeping only the maximum values in the data. For example, the network after max pooling may be n/2 by n/2, such as 12 by 12, but is not limited thereto.

The max-pooled data may then be fed into a second convolutional network. The number of channels of the second convolutional network may be set to m by m, where m may be, for example, 8, but is not limited thereto.

The output of the second convolution may be max-pooled again.

Thereafter, with a rectified linear unit (ReLU) that rectifies the final network serving as the activation function, the result may be fed into the front end of the neural network used for deep learning training. Through this training, the deep learning model can learn to determine which of a plurality of emotional states or a plurality of concentration states an input image corresponds to.
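The size reduction from 24 by 24 to 12 by 12 and the rectification step can be sketched in a few lines. The 2 by 2 pooling window with stride 2 is an assumption consistent with the halving described above, and the synthetic feature map stands in for a real convolution output, which is not reproduced here.

```python
from typing import List

FeatureMap = List[List[float]]


def max_pool_2x2(fmap: FeatureMap) -> FeatureMap:
    """Keep the maximum of every non-overlapping 2x2 block (stride 2)."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


def relu(fmap: FeatureMap) -> FeatureMap:
    """Rectified linear unit: negative values become zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


# Synthetic 24x24 feature map with both negative and positive values.
fmap = [[float((r * 24 + c) % 7 - 3) for c in range(24)] for r in range(24)]
pooled = max_pool_2x2(fmap)   # 24x24 -> 12x12
activated = relu(pooled)      # same size, no negative values
```

Each pooling stage halves both dimensions, which is why a second pooling after the second convolution shrinks the data again.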

By training the deep learning model on these images of Korean kindergarten and elementary school students through a transfer learning architecture, the present invention can derive quantitative indicators of Korean students' emotional state, learning state, and concentration state.

Referring to FIG. 8, the emotion recognition derivation model according to the present invention may consist of the following three sub-models.

The first emotion recognition model may detect a face in the input image, apply the AI deep learning module described above to analyze the facial expression, and calculate a probability value for each of three emotion types (positive, negative, and neutral). A quantitative emotion index may be derived from these probability values.

The second emotion recognition model may detect a face in the input image, apply the AI deep learning module described above to analyze the facial expression, and calculate a probability value for each of seven emotion types (joy, surprise, sadness, anger, fear, displeasure, and calmness). A quantitative emotion index may be derived from these probability values. The first and second emotion recognition models may be applied alternatively, or both may be applied in a complementary manner to allow a more precise calculation of the emotion index.
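How per-type probability values might be produced and how the two models could complement each other can be sketched as follows. The softmax step and, in particular, the grouping of the seven emotion types into positive, negative, and neutral are assumptions made for illustration; the description does not state how the seven types map onto the three.

```python
import math
from typing import Dict, Sequence

EMOTIONS_7 = ("joy", "surprise", "sadness", "anger", "fear", "displeasure", "calmness")

# Hypothetical grouping of the seven types into the three coarse types.
GROUP_OF = {
    "joy": "positive", "surprise": "positive",
    "sadness": "negative", "anger": "negative",
    "fear": "negative", "displeasure": "negative",
    "calmness": "neutral",
}


def softmax(logits: Sequence[float]) -> list:
    """Turn raw model outputs into probability values that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def seven_type_probabilities(logits: Sequence[float]) -> Dict[str, float]:
    return dict(zip(EMOTIONS_7, softmax(logits)))


def three_type_probabilities(probs7: Dict[str, float]) -> Dict[str, float]:
    """Sum the seven-type probabilities within each assumed group."""
    probs3 = {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    for emotion, p in probs7.items():
        probs3[GROUP_OF[emotion]] += p
    return probs3


probs7 = seven_type_probabilities([0.0] * 7)  # uniform placeholder logits
probs3 = three_type_probabilities(probs7)
```

Under this grouping, the coarse probabilities derived from the second model could be compared with the direct output of the first model as a consistency check.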

The concentration recognition model may detect a face in the input image and then derive heart rate and heart rate variability. Heart rate and heart rate variability can be measured and analyzed by applying remote photoplethysmography (rPPG) technology to the derived heart rate variability data. Based on the analysis of heart rate variability, the concentration recognition model calculates a probability value for each of three concentration stages (normal → concentration → immersion) and derives a concentration state indicator from them. A concentration index may be derived from the concentration state indicator, and the indicator may also serve as a base value for deriving the learning index.
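The heart rate and heart rate variability quantities mentioned above can be computed from inter-beat intervals once those intervals are available. The sketch below assumes that an rPPG stage has already produced inter-beat intervals in milliseconds, and it uses SDNN and RMSSD as variability measures; the description does not say which measures are used, and the extraction of the pulse signal from video is not shown.

```python
import math
import statistics
from typing import Dict, Sequence


def heart_metrics(ibi_ms: Sequence[float]) -> Dict[str, float]:
    """Heart rate (bpm), SDNN (ms), and RMSSD (ms) from inter-beat intervals."""
    if len(ibi_ms) < 2:
        raise ValueError("at least two inter-beat intervals are required")
    mean_ibi = statistics.mean(ibi_ms)
    # Successive differences between neighbouring intervals.
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return {
        "heart_rate_bpm": 60000.0 / mean_ibi,
        "sdnn_ms": statistics.stdev(ibi_ms),
        "rmssd_ms": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
    }


metrics = heart_metrics([800.0, 810.0, 790.0, 805.0, 795.0])
```

For the sample intervals, whose mean is 800 ms, the heart rate works out to 75 beats per minute.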

Finally, the emotion recognition derivation model may derive a learning index based on the quantitative indicator values obtained from the first emotion recognition model, the second emotion recognition model, and the concentration recognition model described above. The learning index is an indicator for quantitatively assessing the user's learning state. It may be calculated more precisely by additionally using student data and data generated in the learning situation, such as learning time, number of questions, and seat departure status.
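The description does not give a formula for the learning index, so the following is only a sketch of one possible combination. The stage weights, the equal weighting of the two indices, and the clipping to the range 0 to 1 are all hypothetical.

```python
from typing import Dict

# Hypothetical weights for the three concentration stages.
STAGE_WEIGHT = {"normal": 0.0, "concentration": 0.5, "immersion": 1.0}


def concentration_index(stage_probs: Dict[str, float]) -> float:
    """Expected stage weight under the three stage probabilities."""
    return sum(STAGE_WEIGHT[stage] * p for stage, p in stage_probs.items())


def learning_index(emotion_index: float, conc_index: float,
                   w_emotion: float = 0.5, w_concentration: float = 0.5) -> float:
    """Weighted combination of the two indices, clipped to [0, 1]."""
    raw = w_emotion * emotion_index + w_concentration * conc_index
    return min(1.0, max(0.0, raw))
```

Additional inputs such as learning time or the number of questions could be added as further weighted terms in the same way.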

Specifically, referring to FIG. 10, the education method according to the present invention may include receiving image data from an external electronic device (S110), detecting face data in the received image data (S210), obtaining an emotion indicator by inputting the face data into an emotion recognition model based on facial expression recognition technology (S310), obtaining a concentration indicator by inputting the face data into an rPPG model based on changes in photoplethysmographic blood flow (S410), and calculating a learning index based on the concentration indicator and the emotion indicator (S510).

Step S110 is a step of receiving image data that includes a face, captured by or transmitted in real time from the user's electronic device 110.

Step S210 is a step of extracting the user's facial expression from the received image data to obtain a dataset for deriving emotional and concentration states based on the facial expression.

Step S310 may be a step of calculating, based on the first emotion recognition module and the second emotion recognition module, a probability value for each of three emotion types (positive, negative, and neutral) or for each of seven emotion types (joy, surprise, sadness, anger, fear, displeasure, and calmness).

Step S410 is a step of detecting a face in the input image, deriving heart rate and heart rate variability, and measuring and analyzing them by applying remote photoplethysmography (rPPG) technology to the derived heart rate variability data.

Step S510 is a step of calculating the learning index based on the concentration indicator and the emotion indicator.
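The order of steps S110 to S510 can be sketched as a small pipeline in which each model is passed in as a function. The function names and the dictionary layout are assumptions for illustration, and the stand-in functions in the usage lines are placeholders rather than real models.

```python
from typing import Any, Callable, Dict, Optional


def run_education_pipeline(
    image: Any,                                   # S110: received image data
    detect_face: Callable[[Any], Optional[Any]],
    emotion_model: Callable[[Any], float],
    rppg_model: Callable[[Any], float],
    combine: Callable[[float, float], float],
) -> Optional[Dict[str, float]]:
    """Run S210 to S510 on one received image; None if no face is found."""
    face = detect_face(image)                     # S210: detect face data
    if face is None:
        return None
    emotion_indicator = emotion_model(face)       # S310: emotion indicator
    concentration_indicator = rppg_model(face)    # S410: concentration indicator
    return {
        "emotion": emotion_indicator,
        "concentration": concentration_indicator,
        # S510: learning index from the two indicators
        "learning_index": combine(concentration_indicator, emotion_indicator),
    }


# Usage with placeholder stand-ins for the real detector and models.
result = run_education_pipeline(
    image="frame-0",
    detect_face=lambda img: img,
    emotion_model=lambda face: 0.8,
    rppg_model=lambda face: 0.6,
    combine=lambda c, e: (c + e) / 2,
)
```

Passing the models in as arguments keeps the step order visible without tying the sketch to any particular face detector or network.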

The emotional index, the learning indicators, and similar values may be built into a database. By storing the emotional index and learning indicators as time series, the database can support comprehensive analysis.
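A time-series store of this kind can be sketched with an in-memory SQLite database. The table name, the column names, and the use of SQLite itself are assumptions for illustration; the description does not name a database system or schema.

```python
import sqlite3
from typing import List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the (hypothetical) index table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS indices ("
        "user_id TEXT NOT NULL, ts REAL NOT NULL, "
        "emotional_index REAL, learning_index REAL)"
    )
    return conn


def record(conn: sqlite3.Connection, user_id: str, ts: float,
           emotional_index: float, learning_index: float) -> None:
    """Append one timestamped sample for a user."""
    conn.execute(
        "INSERT INTO indices VALUES (?, ?, ?, ?)",
        (user_id, ts, emotional_index, learning_index),
    )
    conn.commit()


def series(conn: sqlite3.Connection, user_id: str) -> List[Tuple[float, float, float]]:
    """Return a user's samples ordered by time."""
    cursor = conn.execute(
        "SELECT ts, emotional_index, learning_index FROM indices "
        "WHERE user_id = ? ORDER BY ts",
        (user_id,),
    )
    return cursor.fetchall()


conn = open_store()
record(conn, "user-1", 20.0, 0.6, 0.7)
record(conn, "user-1", 10.0, 0.4, 0.5)
history = series(conn, "user-1")
```

Ordering by timestamp at query time means samples can be recorded in any order and still be read back as a time series.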

Referring to FIG. 11, examples of additional indicators needed to calculate the emotional index, beyond those from the deep learning module, are described below.

When an embodiment of the present invention is implemented in hardware, the processor of the present invention may be provided with application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), or the like configured to carry out the present invention.

Meanwhile, the method described above can be written as a program executable on a computer and can be implemented on a general-purpose digital computer that runs the program using a computer-readable medium. In addition, the data structures used in the method described above may be recorded on a computer-readable storage medium by various means. Program storage devices, a term that may be used to describe a storage device containing executable computer code for performing the various methods of the present invention, should not be understood to include transitory objects such as carrier waves or signals. The computer-readable storage medium includes storage media such as magnetic storage media (for example, ROM, floppy disks, and hard disks) and optically readable media (for example, CD-ROMs and DVDs).

The embodiments described above combine the elements and features of the present invention in particular forms. Each element or feature should be considered optional unless explicitly stated otherwise. Each element or feature may be implemented without being combined with other elements or features. It is also possible to form an embodiment of the present invention by combining some of the elements and/or features. The order of the operations described in the embodiments of the invention may be changed. Some elements or features of one embodiment may be included in another embodiment, or may be replaced with corresponding elements or features of another embodiment. It is evident that claims that do not explicitly cite one another may be combined to form an embodiment, or may be included as new claims by amendment after filing.

It will be apparent to those of ordinary skill in the art to which the present invention pertains that the invention may be embodied in other forms without departing from its technical spirit and essential characteristics. Accordingly, the above embodiments should be considered in all respects as illustrative rather than restrictive. The scope of the present invention should be determined by a reasonable interpretation of the appended claims, and all changes within the equivalent scope of the present invention are included in that scope.

100: user 110: electronic device
111: memory 112: transceiver
113: processor 114: camera
115: recording device 116: output device
120: wired/wireless communication network 130: server
131: memory 132: transceiver
133: processor 500: artificial neural network
510: input layer 511: input information
530: hidden layer 531: first hidden layer
532: first unit 533: second hidden layer
534: second unit 550: output layer
551: prediction result unit

Claims

A video education method based on changes in emotion and concentration state, performed by a computing device, the method comprising:
receiving image data from an external electronic device;
detecting face data in the received image data;
obtaining an emotion indicator by inputting the face data into an emotion recognition model based on facial expression recognition technology; and
obtaining a concentration indicator by inputting the face data into an rPPG model based on changes in photoplethysmographic blood flow.

The education method of claim 1,
wherein the emotion recognition model includes a first emotion recognition model that determines one of the positive, negative, and neutral emotion types and calculates a probability value for each.

The education method of claim 2,
wherein the emotion recognition model includes a second emotion recognition model that determines one emotion type among joy, surprise, sadness, anger, fear, displeasure, and calmness and calculates a probability value for each.

The education method of claim 1,
wherein the rPPG model derives heart rate and heart rate variability after detecting a face in the input image, classifies the concentration state into normal, concentration, and immersion stages, and calculates a probability value for each.

The education method of claim 1,
wherein the emotion recognition model
is trained on input data from Korean kindergarten and elementary school students.

The education method of claim 1,
wherein the emotion recognition model
generates network data through convolutional networks and max pooling, and
performs deep learning training using, as the activation function, a rectified linear unit (ReLU) that rectifies the generated network data.

The education method of claim 1,
further comprising calculating a learning index based on the concentration indicator and the emotion indicator.

An educational device, being an electronic device, comprising:
a memory, a transceiver, and at least one processor,
wherein the at least one processor is configured to perform the education method according to any one of claims 1 to 7.

A computer program recorded on a computer-readable storage medium and configured to perform, through an electronic device, the education method according to any one of claims 1 to 7.