KR20230076733A

KR20230076733A - English speaking teaching method using interactive artificial intelligence avatar based on emotion and memory, device and system therefor

Info

Publication number: KR20230076733A
Application number: KR1020220082017A
Authority: KR
Inventors: 임재원; 정종현
Original assignee: 주식회사 유나이티드어소시에이츠
Priority date: 2021-11-22
Filing date: 2022-07-04
Publication date: 2023-05-31
Also published as: KR102644992B1; KR20230076734A; KR102418558B1

Abstract

According to one embodiment of the present invention, an emotion and memory-based interactive artificial intelligence avatar English speaking teaching method comprises the steps of: receiving a user input including a first English sentence; performing a natural language understanding operation for the first English sentence; determining whether there is a preset intention classification matching a conversation intention included in the first English sentence as a result of performing the natural language understanding operation; generating at least one first natural language based on the first English sentence when there is no preset intent classification matching the conversation intent; calling a template associated with the preset intent classification when there is the preset intent classification matching the conversation intention and generating at least one second natural language based on the called template; extracting an embedding value from the at least one first or second natural language and extracting the first or second natural language based on the extracted embedding value; and outputting the extracted first or second natural language as a second English sentence in response to the first English sentence. The present invention has the effect of improving an English-speaking learning effect by promoting natural and smooth interaction/communication.

Description

English speaking teaching method using interactive artificial intelligence avatar based on emotion and memory, device and system therefor

The present specification proposes an emotion- and memory-based method, apparatus, and system for teaching English speaking with an interactive artificial intelligence avatar.

As overseas exchange becomes more active with the development of communication and transportation, the importance of learning English grows by the day. In Korea, learners are given various opportunities to encounter and learn English from elementary school onward, using textbooks and multimedia learning materials. In particular, the 2015 revised curriculum emphasizes speaking and listening, with the goal of developing the ability to understand and express basic English so that learners can use English naturally in everyday life from elementary school.

However, for many English learners, speaking ability is lower than ability in other areas, and this is often cited as a chronic problem of English education in Korea. In the TOEIC test, an English test that mainly covers listening, reading, and writing, Korea's average score as of 2020 ranked 17th out of 49 countries worldwide and 3rd in Asia, placing it in the upper tier. In contrast, according to ETS's 'Worldwide TOEFL Score Data in 2018', the TOEFL speaking score of Koreans was 20 points, tied for 122nd place, which is near the bottom (Money Today, May 10, 2020).

To overcome this problem, artificial intelligence chatbots are being actively developed to increase learners' exposure to English and to provide interaction-based English education. An artificial intelligence chatbot is a computer program that communicates in a form similar to human conversation through text or voice interaction, and it can respond to a user's instructions and questions or carry out requests. Such chatbots interact with learners through conversation, increase their exposure to English, and support English learning based on a variety of content.

Representative domestic and foreign artificial intelligence chatbots include text-based chatbots such as Cleverbot, Replika, and Mitsuku, and voice-based chatbots such as Siri, Echo, Ellie, and PengTalk.

However, the artificial intelligence chatbots developed so far often provide only short-answer responses during a conversation, or provide too much information, either of which lowers the learning effect and the learner's interest. In addition, a conversation starts only when the learner speaks first, and the chatbot does not readily recognize conversational context outside the specific, scenario-like situation it is given, so natural interaction with the learner is often difficult. Furthermore, some educational chatbots have limited content, which makes it difficult to learn English actively outside the time spent studying a specific topic.

Accordingly, the present specification aims to solve these problems of existing artificial intelligence chatbots and to provide an artificial intelligence avatar/chatbot engine for efficient and active English speaking learning.

An emotion- and memory-based interactive artificial intelligence avatar English speaking teaching method according to an embodiment of the present invention may include: receiving a user input including a first English sentence; performing a natural language understanding operation on the first English sentence; determining, as a result of the natural language understanding operation, whether there is a preset intent classification matching the conversation intent included in the first English sentence; generating at least one first natural language based on the first English sentence when there is no preset intent classification matching the conversation intent; calling a template associated with the preset intent classification and generating at least one second natural language based on the called template when there is a preset intent classification matching the conversation intent; extracting an embedding value from the at least one first or second natural language and extracting the first or second natural language based on the extracted embedding value; and outputting the extracted first or second natural language as a second English sentence in response to the first English sentence.
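The final selection step (extracting an embedding value from each candidate and choosing one on that basis) can be illustrated with a minimal Python sketch. The text above does not state how the embedding is computed or which criterion picks the winner, so everything here is an assumption for illustration: a toy bag-of-words vector stands in for the embedding, and the candidate most similar (by cosine similarity) to the user's sentence is chosen. The names `embed`, `cosine`, and `select_response` are hypothetical.

```python
import math
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy stand-in for an embedding: lowercase bag-of-words counts (assumption)."""
    return Counter(sentence.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_response(user_sentence: str, candidates: list[str]) -> str:
    """Pick the candidate whose vector is closest to the user's sentence.

    The selection criterion is assumed; the source only says the choice is
    made 'based on the extracted embedding value'.
    """
    query = embed(user_sentence)
    return max(candidates, key=lambda c: cosine(query, embed(c)))
```

In a real system the toy `embed` would presumably be replaced by a learned sentence encoder; the surrounding selection logic would stay the same shape.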

According to an embodiment of the present invention, the user/learner's intent is identified as the highest priority and the answer that best matches that intent is output, so that natural and smooth interaction/communication takes place and the effect of English speaking learning is improved.

In addition, according to an embodiment of the present invention, because the conversation with the artificial intelligence avatar is based on emotion and memory, the tension/stress that a compulsory English learning environment may cause in the learner is prevented and alleviated in advance, and repeated conversations on the same topic are avoided, so that the user/learner can continue the conversation without easily tiring of it or losing interest.

In addition, according to an embodiment of the present invention, the artificial intelligence avatar provides real-time feedback on spoken English sentences, so that English speaking learning can proceed faster and more efficiently.

Other effects of the present invention will be described in detail below with reference to the drawings.

FIG. 1 is a diagram illustrating an English learning progress screen with an artificial intelligence avatar according to an embodiment of the present invention.
FIG. 2 is a flowchart of an English speaking education method using an interactive artificial intelligence avatar of a user device according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a user device that initiates a conversation by sensing a motion input according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a Rasa open source framework creation method according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating programming for a policy step according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a template according to an embodiment of the present invention.
FIGS. 7 and 8 are diagrams illustrating a first natural language generation method of an educational content model according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating an English speaking education method through educational linkage with a school/academy system according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating a conversation flow with an artificial intelligence avatar according to an embodiment of the present invention.
FIG. 11 illustrates an artificial intelligence avatar according to an embodiment of the present invention.
FIG. 12 is a block diagram of a user device according to an embodiment of the present invention.

The technology described below may be modified in various ways and may have various embodiments, so specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the technology described below to specific embodiments, and it should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the technology described below.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms; the terms are used only to distinguish one component from another. For example, without departing from the scope of the technology described below, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component. The term 'and/or' includes a combination of a plurality of related listed items or any one of a plurality of related listed items. For example, 'A and/or B' may be interpreted as 'at least one of A or B'. Also, '/' may be interpreted as 'and' or 'or'.

In the terms used in this specification, a singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "comprising" mean that the stated features, numbers, steps, operations, components, parts, or combinations thereof are present, and should be understood not to exclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Before the drawings are described in detail, it should be made clear that the components in this specification are divided merely according to the main function of each component. That is, two or more components described below may be combined into one component, or one component may be divided into two or more components by more finely subdivided function. In addition, each component described below may additionally perform some or all of the functions of other components besides its own main function, and some of the main functions of each component may of course be carried out exclusively by another component.

In addition, in performing a method or an operating method, the processes constituting the method may occur in an order different from the stated order unless a specific order is clearly described in context. That is, the processes may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

The user device described below may correspond to an electronic server/apparatus/device on which an application/program/software, implemented to perform/execute the English speaking education method/embodiments using the interactive artificial intelligence avatar proposed in this specification, is installed in advance and running. Therefore, even where not separately stated below, the methods/embodiments proposed in this specification may be interpreted as being performed/executed by the server/apparatus/device on which the application/program/software is running, and the operation of each server/apparatus/device may be interpreted as an operation/function of the application/program/software. For convenience, the description below centers on the 'user device', which is the user's terminal on which the application/program/software is installed, but the 'user device' may be replaced in the description by the 'application' or by the 'application server' that operates/manages/controls the application/program/software.

FIG. 1 is a diagram illustrating an English learning progress screen with an artificial intelligence avatar according to an embodiment of the present invention.

As illustrated in this figure, the user device may output an interface screen 110 for English speaking conversation with the artificial intelligence avatar 120. The interface screen 110 may include the artificial intelligence avatar 120 and a chat window 130. The learner can converse with the artificial intelligence avatar 120 by typing a chat message directly into the chat window 130 (150) or by selecting the voice input button 140 and speaking. The learner can set the conversation language directly to the language the learner wants to learn (for example, English or Korean).

The conversation between the two parties is recorded and displayed in the chat window 130, and the artificial intelligence avatar 120 can carry on a natural conversation/communication with the learner in the target language using a pre-trained AI model, conversation templates, and the like. In particular, the artificial intelligence avatar 120 proposed in this specification focuses on identifying the learner's conversation intent and outputs an answer that fits the identified intent, thereby communicating naturally with the learner. In addition, for more natural and smooth communication, the artificial intelligence avatar 120 may extract answer candidates by comprehensively considering the user's emotion, past conversation records, and/or the conversation situation, and may select and output the most appropriate of the answer candidates.

As a result, communication between the learner and the artificial intelligence avatar 120 proceeds naturally and smoothly, the learner's tension/stress about learning is relieved, and the learner's ability can improve further through active conversation.

The following describes the specific system/device structure and method for implementing the artificial intelligence avatar 120 proposed in this specification.

FIG. 2 is a flowchart of an English speaking education method using an interactive artificial intelligence avatar of a user device according to an embodiment of the present invention.

In this flowchart, at least one step may be omitted or a new step may be added, and the order of the steps may be changed depending on the embodiment. Furthermore, each step may correspond to a functional component of the user device that performs the step. For example, the user input step may correspond to a user input unit that senses/detects/receives the user's input. Each functional component may be implemented using at least one hardware/software component.

Referring to FIG. 2, the user device may first receive a user input including a first English sentence (S201). A user input means any of various inputs to the user device, for example a text input or a voice input. The user device normally stays in a standby/power-saving mode and, upon detecting a user input, may begin a natural English conversation for learning.

Depending on the embodiment, the user device may itself initiate the conversation when it detects a preset input other than a user input including the first English sentence. This is described below with reference to FIG. 3.

FIG. 3 is a diagram illustrating a user device that initiates a conversation by sensing a motion input according to an embodiment of the present invention.

Referring to FIG. 3, the user device includes at least one camera sensor and may use it to recognize/identify a visual object 330 of the learner 320 (for example, the face, movement, or facial expression). In particular, when movement is detected through the camera sensor, the user device may recognize/identify the visual object 330 of the target whose movement was detected and determine whether that target is a learner 320 registered in advance in the learning system.

If the detected target is a learner 320 registered in the system, the user device may determine an English sentence with which to open the conversation by comprehensively considering the current time, the facial expression of the learner 320, the learning topic, and/or past conversation records, and may actively initiate a conversation with the learner 320 by outputting the determined English sentence (310). The learner may answer the opening sentence 310 of the user device, and that answer may be recognized/detected and processed by the user device as a user input including the first English sentence, as in step S201.
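One way such an opening sentence might be chosen is sketched below in Python. The source names the inputs (current time, facial expression, learning topic, past conversation records) but not the decision rules, so the rules, the priority order, the sentences, and the function name `choose_opening` are all hypothetical.

```python
from datetime import time
from typing import Optional


def choose_opening(
    now: time,
    expression: str,
    topic: Optional[str] = None,
    last_topic: Optional[str] = None,
) -> str:
    """Build an opening sentence from context (all rules are illustrative assumptions)."""
    # Time-of-day greeting.
    if now < time(12, 0):
        greeting = "Good morning!"
    elif now < time(18, 0):
        greeting = "Good afternoon!"
    else:
        greeting = "Good evening!"

    # Follow-up: the learner's expression takes priority, then memory of the
    # previous conversation, then today's learning topic.
    if expression == "sad":
        follow_up = "You look a little down. Do you want to talk about it?"
    elif last_topic is not None:
        follow_up = f"Last time we talked about {last_topic}. Shall we continue?"
    elif topic is not None:
        follow_up = f"Today, let's talk about {topic}."
    else:
        follow_up = "How are you today?"

    return f"{greeting} {follow_up}"
```

A call such as `choose_opening(time(9, 0), "neutral", last_topic="movies")` would open with a morning greeting followed by a reference to the earlier conversation.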

Referring back to FIG. 2, the user device may next perform a natural language understanding operation on the first English sentence (S202). Here, the natural language understanding operation (S202) may mean a series of operations by which the user device understands the meaning, intent, emotion, and the like contained in the input first English sentence.

The natural language understanding operation (S202) may broadly consist of a step of preprocessing the first English sentence (not shown), a step of classifying the emotion category of the preprocessed first English sentence (S202c), and a step of extracting the conversation intent and entity names from the preprocessed first English sentence (S202a, S202b).

The step of preprocessing the first English sentence may mean a series of processes that preprocess the sentence in order to improve the natural language understanding recognition rate. The user device may tokenize the first English sentence on whitespace characters ('WhitespaceTokenizer') and may filter out words unnecessary for natural language understanding (for example, articles such as a, an, and the), extra whitespace, and the like ('StopwordFilter'). Here, the tokenizer splits a given sentence/phrase into tokens, that is, into its meaningful elements.
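The preprocessing described above can be sketched in a few lines of Python. This is a plain-Python illustration rather than the framework components named in the text: the function names are hypothetical, and the stopword list contains only the three articles the text gives as examples.

```python
# Hypothetical stopword list; the source names only articles as examples.
STOPWORDS = {"a", "an", "the"}


def whitespace_tokenize(sentence: str) -> list[str]:
    """Split on runs of whitespace; str.split() with no argument drops empty tokens."""
    return sentence.split()


def filter_stopwords(tokens: list[str]) -> list[str]:
    """Remove tokens whose lowercase form is in the stopword list."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def preprocess(sentence: str) -> list[str]:
    """Tokenize on whitespace, then filter stopwords."""
    return filter_stopwords(whitespace_tokenize(sentence))
```

Because `str.split()` collapses consecutive whitespace, the extra-space filtering mentioned in the text falls out of the tokenization step in this sketch.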

The step of classifying the emotion category (S202c) may correspond to inputting the preprocessed first English sentence into an emotion classification model built in advance with a preset learning model, and classifying the emotion contained in the first English sentence into one of the preset emotion categories. Various AI learning models may be used to build the emotion classification model, but this specification proposes the ELECTRA model. ELECTRA is a language pre-training approach that retains the advantages of BERT while learning more efficiently, and it is used here to recognize the emotion contained in a sentence. The preset emotion categories may be defined in advance by an operator or administrator, for example as various emotions such as "happy", "sadness", and "anger".

The step of extracting the conversation intent and entity names (S202a, S202b) may correspond to inputting the preprocessed first English sentence into a conversation intent extraction model built in advance with a preset learning model, and extracting the conversation intent and entity names contained in the first English sentence. Various AI learning models may be used to build the conversation intent and entity name extraction model, but this specification proposes the DIET (Dual Intent and Entity Transformer) classifier model. The DIET classifier is a transformer architecture that handles intent classification and entity recognition together.

The conversation intent may be extracted into various categories, for example "chitchat" (everyday conversation) and "greeting". An entity name may correspond to a name, title, personal pronoun, and/or demonstrative pronoun, for example "user_name" (the learner's name) and "user_persona" (the artificial intelligence avatar's name).

Next, the user device may determine whether there is a preset intent classification that matches the conversation intent extracted from the first English sentence (S203). If there is a preset intent classification matching the extracted conversation intent, the user device may perform a template-based natural language generation operation (S204 to S206). Conversely, if there is no preset intent classification matching the extracted conversation intent, the user device may perform a DL-based natural language generation operation (S207).
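The branch at S203 can be sketched as a small dispatcher in Python. The set of preset intents, the function name `generate_candidates`, and the two generator callables are hypothetical placeholders; the source specifies only that a matched intent leads to template-based generation and an unmatched one to DL-based generation.

```python
from typing import Callable, Optional

# Hypothetical set of preset intent classifications.
PRESET_INTENTS = frozenset({"greet", "chitchat"})


def generate_candidates(
    sentence: str,
    intent: Optional[str],
    template_generator: Callable[[str, str], list[str]],
    dl_generator: Callable[[str], list[str]],
) -> list[str]:
    """Route to template-based or DL-based generation depending on the intent match."""
    if intent is not None and intent in PRESET_INTENTS:
        # Matching preset intent: template-based path (S204 to S206).
        return template_generator(sentence, intent)
    # No matching preset intent: DL-based path (S207).
    return dl_generator(sentence)
```

Passing the generators in as callables keeps the routing decision separate from how each path produces its candidate sentences.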

The case in which there is a preset intent classification matching the extracted conversation intent is described first, followed by the case in which there is none.

If there is a preset intent classification matching the extracted conversation intent, the user device may search for and call a template associated with the preprocessed first English sentence in order to generate the template-based second natural language (S204).

For this operation, this specification proposes an approach that uses the Rasa open source framework. The Rasa open source framework is an open source machine learning framework for automating text- and voice-based conversations.

FIG. 4 is a flowchart illustrating a Rasa open source framework creation method according to an embodiment of the present invention, and FIG. 5 is a diagram illustrating programming for a policy step according to an embodiment of the present invention.

Referring to FIGS. 3 and 4, the Rasa open source framework broadly includes a tracking step (S204a, S420) and a policy step (S204b, S430).

The tracking step (S204a, S420) is the step of storing the information needed to carry on the conversation. In this step, the user device may extract the input text, time, intent classification result, entity name extraction result, and/or emotion classification result from the preprocessed first English sentence, and may generate and store a conversation scenario in the format defined by the Rasa open source framework. An example of such a conversation scenario is shown in Table 1 below.
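The kind of record the tracking step stores can be sketched as a small Python data structure. This is not the format defined by the Rasa open source framework; `TurnRecord` and `ConversationTracker` are hypothetical names, and the fields simply mirror the items listed above (input text, time, intent, entity names, emotion).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class TurnRecord:
    """One stored turn; fields mirror the items the tracking step extracts."""
    text: str
    timestamp: datetime
    intent: str
    entities: dict[str, str] = field(default_factory=dict)
    emotion: Optional[str] = None


class ConversationTracker:
    """Minimal in-memory store of turns (illustrative only)."""

    def __init__(self) -> None:
        self._turns: list[TurnRecord] = []

    def record(self, turn: TurnRecord) -> None:
        self._turns.append(turn)

    def latest(self) -> Optional[TurnRecord]:
        return self._turns[-1] if self._turns else None

    def history(self) -> list[TurnRecord]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._turns)
```

Keeping past turns available in this way is what would let later steps take the previous conversation into account.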

The policy step (S204b, S430) determines what action to take for the intent classification result extracted in the preceding tracking step. As illustrated in FIG. 5, an action may be defined in advance on the user device for each intent classification result, and the user device may perform the action defined for the recognized intent classification result. For example, referring to FIG. 5, when the user device classifies the conversation intent of the first English sentence as 'greet', the user device may perform the action defined in 'action_chitchat_generator' (for example, outputting 'Hi!! How are you today?' by voice and chat).

The policy step (S204b, S430) may be performed only when, as a result of performing step S205 described below, no template matches the conversation scenario of the first English sentence.

Referring back to FIG. 2, the user device may next determine whether there is a template that matches (corresponds to) the conversation scenario created on the basis of the Rasa open source framework (S205).

If there is no preset conversation template that matches the conversation scenario, the user device may take the action defined in the policy step (S430), as described above with reference to FIG. 5. Conversely, if there is a preset conversation template that matches the conversation scenario, the user device may call the matching template and take the action defined in the called template (S206).
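As a non-limiting illustration, the branch between the template (S205, S206) and the policy step (S430) may be sketched in Python as follows. The table names, the function names, and the rule that an intent missing from a matched template also falls back to the policy step are assumptions made for this sketch; they are not taken from the drawings or from the Rasa API.

```python
from typing import Optional

# Hypothetical policy table: intent label -> predefined action (cf. the 'greet' example).
POLICY_ACTIONS = {"greet": "action_chitchat_generator"}

# Hypothetical template store: scenario name -> {intent label: action name}.
TEMPLATES = {"morning": {"morning": "utter_resp_morning"}}


def find_template(scenario: str) -> Optional[dict]:
    """S205: return the template matching the conversation scenario, if any."""
    return TEMPLATES.get(scenario)


def choose_action(scenario: str, intent: str) -> Optional[str]:
    """Pick the template action (S206) or fall back to the policy step (S430)."""
    template = find_template(scenario)
    if template is not None and intent in template:
        return template[intent]        # action defined in the called template
    return POLICY_ACTIONS.get(intent)  # action predefined per intent, or None
```

With these tables, `choose_action("morning", "morning")` selects the template action, while an unknown scenario with the intent 'greet' falls back to 'action_chitchat_generator'.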

FIG. 6 is a diagram illustrating a template according to an embodiment of the present invention.

In particular, this drawing corresponds to a template for the 'morning' scenario. Referring to FIG. 6, the template may define the actions (or responses) to be taken for each preset intent classification in a one-to-N correspondence. The user device may select at least one action (or response) corresponding to the preset intent classification that matches the first English sentence (S206b) and generate it as a second natural language (S206a). For example, when the user device extracts the 'morning' (morning greeting) intent classification from the first English sentence, it may select the correspondingly defined action 'utter_resp_morning' (S206b) and, as a result, generate 'Good morning to you, too! Did you sleep well, {name}?' as the second natural language (S206a).
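The one-to-N correspondence and the '{name}' placeholder described above may be sketched as follows. This is a minimal illustration only: the dictionary layout and the function name are hypothetical, and the actual template of FIG. 6 follows the Rasa format rather than this structure.

```python
import random

# Hypothetical 'morning' template: one intent maps to N candidate responses.
MORNING_TEMPLATE = {
    "morning": {
        "action": "utter_resp_morning",
        "responses": [
            "Good morning to you, too! Did you sleep well, {name}?",
        ],
    },
}


def generate_from_template(template, intent, name, rng=random):
    """Select one response for the intent (S206b) and fill its slot (S206a)."""
    entry = template.get(intent)
    if entry is None:
        return None  # no action is defined for this intent in the template
    return rng.choice(entry["responses"]).format(name=name)
```

Adding further strings to the "responses" list is one way to vary the reply for the same intent.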

When a conversation is conducted using a preset template in this way, natural communication becomes possible because the current situation and the learner's intent are grasped more accurately. Furthermore, templates can be configured freely according to learning purposes, educational-technology purposes, and the like, which improves the overall freedom and flexibility of the system. As a result, the invention can overcome a limitation of existing chatbots, in which the same conversation format was repeated and learners lost interest.

The case in which there is a preset intent classification matching the conversation intent in step S203 has been described above. The case in which there is no preset intent classification matching the conversation intent in step S203 is described below.

Referring back to FIG. 2, when there is no preset intent classification that matches the conversation intent extracted in step S202, the user device may generate at least one improvised first natural language based on the first English sentence (S207). This may be performed using various AI learning models.

In one embodiment, the user device may call the davinci model of OpenAI GPT-3 through an API (S207b) and input the first English sentence (in particular, the preprocessed first English sentence) into the davinci model to generate the first natural language (S207a).

In another embodiment, the user device may generate the first natural language, which is a response to the user input, using an empathic conversation model built by learning in advance an empathic conversation method that is meaningful from an educational-technology standpoint (S207c). In building this empathic conversation model, the Electra model, a learning model for identifying the learner's emotion, may be actively used, so that the model is trained to identify the learner's emotion and to respond mainly with empathy.

In another embodiment, the user device may generate the first natural language by inputting the first English sentence into the Blender Bot model (S207d). Here, the Blender Bot model is a chatbot model developed by the Facebook team. It can generate Internet search queries, can converse with reference to earlier knowledge as knowledge is added over time, and generates responses to user input that reflect a persona.

In another embodiment, the user device may generate a knowledge-base-based (or web-search-based) first natural language (S207e). More specifically, the user device may extract a keyword from the first English sentence, search the web for the extracted keyword, and generate the first natural language based on the search result.

In another embodiment, the user device may generate the first natural language based on past conversation records (S207f). More specifically, the user device may group each conversation between the learner and the artificial intelligence avatar into one set, learn it, and store it in a database, and may build a long-term-memory-based conversation model that generates the first natural language from the learned past conversation records. The user device may input the first English sentence into the long-term-memory-based conversation model built in this way and generate the first natural language based on the response it outputs. The long-term memory conversation model may extract keywords from the first English sentence, search the learned content or the database for them, and output the search result.
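The keyword extraction and database lookup of the long-term memory embodiment may be illustrated with the following simplified sketch. It replaces the learned conversation model with plain keyword overlap over an in-memory list, which is an assumption made for illustration; the stop-word list and all names are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "i", "you", "my", "do", "did", "is", "to", "about", "what", "we"}


@dataclass
class Exchange:
    learner: str  # learner utterance
    avatar: str   # avatar reply stored with it as one set


def extract_keywords(sentence: str) -> set:
    """Lower-case the sentence and keep the words that are not stop words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}


def recall(history, sentence):
    """Return the stored exchange sharing the most keywords, or None if none overlap."""
    keys = extract_keywords(sentence)
    best, best_overlap = None, 0
    for ex in history:
        overlap = len(keys & extract_keywords(ex.learner + " " + ex.avatar))
        if overlap > best_overlap:
            best, best_overlap = ex, overlap
    return best
```

The recalled exchange could then be passed to a response generator so that the avatar refers back to what the learner said earlier.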

In another embodiment, the user device may receive an educational content model that has been trained and built in advance and generate the first natural language based on that model (S207g). This is described in detail below with reference to FIGS. 7 and 8.

FIGS. 7 and 8 are diagrams illustrating a method by which the educational content model generates the first natural language according to an embodiment of the present invention.

A creator of English speaking educational content may build an educational content model trained on educational material for a given English teaching topic and may input the model built in this way into the user device. For example, as illustrated in FIG. 7, when the teaching topic is 'tense', the content creator may build the educational content model by training an AI learning model on a list of questions that elicit English answers involving tense, together with appropriate responses to those questions.

The educational content model built in this way may be input to (received by) the user device, and the user device may generate at least one question predefined in the educational content model (for example, ask_past in FIG. 7) as the first natural language and output it as shown in FIG. 8 (810-1, 810-4). Furthermore, the educational content model may receive a user input (810-2) that is a response to the output question and extract a third English sentence from it. The educational content model may evaluate the sentence completeness of the third English sentence based on how closely it matches the responses learned in advance for the questions (810-1, 810-4). In particular, the educational content model may evaluate the sentence completeness of the third English sentence (810-2) by giving weight to whether the condition corresponding to the educational content topic is satisfied. For example, a sentence with errors only in parts other than tense may be rated as more complete than a sentence whose only error is in tense.

When the educational content model finds an error in a part related to the educational content topic, it may, as illustrated in FIG. 8, generate natural language that corrects the error (in this drawing, natural language that corrects the tense) and output it immediately as feedback (810-3).

The educational content model may successively (sequentially) generate the questions defined in it as a third natural language until the evaluated sentence completeness reaches or exceeds a preset sentence completeness, and may successively output a fourth English sentence (810-4) containing the third natural language as a response to the third English sentence (810-2).
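The topic-weighted evaluation and the repeat-until-threshold behaviour described above may be sketched as follows. The weight of 0.6, the penalty of 0.2 per other error, and the threshold of 0.8 are hypothetical values chosen for illustration, and the answers are assumed to arrive already annotated with their errors, since the error detection itself is performed by the trained model.

```python
def completeness(topic_ok: bool, other_errors: int, topic_weight: float = 0.6) -> float:
    """Score in [0, 1]; the topic condition (e.g. correct tense) carries extra weight."""
    other_score = max(0.0, 1.0 - 0.2 * other_errors)
    topic_score = 1.0 if topic_ok else 0.0
    return topic_weight * topic_score + (1.0 - topic_weight) * other_score


def run_drill(questions, annotated_answers, threshold: float = 0.8):
    """Ask the predefined questions in order until an answer scores >= threshold.

    Each annotated answer is a (text, topic_ok, other_errors) tuple.
    """
    transcript = []
    for question, (text, topic_ok, other_errors) in zip(questions, annotated_answers):
        score = completeness(topic_ok, other_errors)
        transcript.append((question, text, score))
        if score >= threshold:
            break  # preset sentence completeness reached; stop asking
    return transcript
```

Under these assumed values, a sentence with one non-tense error scores higher than a sentence whose only error is in tense, which matches the weighting described for the 'tense' topic.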

Referring back to FIG. 2, the user device may next rank the at least one first and/or second natural language generated earlier (S208). More specifically, the user device may extract an embedding value from each generated first and/or second natural language and extract the first or second natural language having the highest embedding value. Here, embedding refers to the result of converting natural language into a vector, a numeric form that a machine can process, or to the process of doing so. The user device may input the embedding values extracted from the at least one first and/or second natural language into a BERT-based ranker, and the ranker may return a list in which the predicted values for the embedding values are sorted in descending order (S208a). The user device may select the first or second natural language ranked at the top of the returned list.
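The descending sort and top-ranked selection of step S208 may be illustrated as follows. The scoring function here is a toy placeholder that stands in for the BERT-based ranker, which is not reproduced; all names are hypothetical.

```python
from typing import Callable, List, Tuple


def rank_candidates(candidates: List[str], score: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs sorted by score, highest first (cf. S208a)."""
    scored = [(text, score(text)) for text in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(text: str) -> float:
    # Placeholder only: favours replies that end with a question.
    return 1.0 if text.rstrip().endswith("?") else 0.0
```

The first element of the returned list corresponds to the candidate the user device would select.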

Next, the user device may detect whether the selected first or second natural language is harmful (S209). More specifically, the user device may input the selected first or second natural language into the GPT-3 content filter to classify its toxicity (S209a). When the input first or second natural language is classified as toxic, the first or second natural language ranked next in the list returned in step S208 may be selected. If toxicity classification is performed at the morpheme/word level, the user device may convert the first or second natural language by paraphrasing the morphemes/words classified as toxic into other morphemes/words that are not classified as toxic (S209b).
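The two handling paths of step S209, falling back to the next-ranked candidate or paraphrasing at the word level, may be sketched as follows. A hypothetical word list stands in for the content filter, which is an assumption made so that the sketch is self-contained; it does not reflect how the actual filter classifies text.

```python
import re

# Hypothetical table: word treated as toxic -> replacement word.
BLOCKED = {"stupid": "unwise"}


def is_toxic(text: str) -> bool:
    """Stand-in classifier: toxic if any word appears in the table."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(w in BLOCKED for w in words)


def first_safe(ranked, classifier=is_toxic):
    """Walk the ranked list from the top and return the first non-toxic candidate."""
    for text in ranked:
        if not classifier(text):
            return text
    return None


def paraphrase(text: str) -> str:
    """Word-level alternative (cf. S209b): swap each flagged word for its replacement."""
    def swap(match):
        return BLOCKED.get(match.group(0).lower(), match.group(0))
    return re.sub(r"[A-Za-z']+", swap, text)
```

Which path is used depends on whether the classification is made per sentence or per word, as described above.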

Finally, the user device may output the finally converted (or selected) first or second natural language as a second English sentence that is a response to the first English sentence (S210). The second English sentence may be output as the voice and/or chat of the artificial intelligence avatar through an interface screen such as the one illustrated in FIG. 1.

Step S209 may be performed or omitted depending on the embodiment. If it is omitted, the first or second natural language selected in step S208 may be output as the second English sentence.

FIG. 9 is a diagram illustrating an English speaking education method linked with a school/academy system according to an embodiment of the present invention.

The English speaking education method/system according to an embodiment of the present invention may be linked with the learner's school and/or academy system (910) (or with an instructor's user device) to provide a higher level of education.

More specifically, through the school/academy system (910), an instructor may create a template or a learning model related to the learning content to be taught and transmit it to the user device. The template may be used to generate the template-based second natural language in step S206 of FIG. 2, and the learning model may be used to generate the first natural language in step S207.

The instructor may create a template by writing the learning content to be taught in a preset format. For example, the instructor may enter the learning content in the format of the Rasa open source framework, and the instructor's user device may convert it into template form and transmit it to the learner's user device.

The instructor may also create a learning model by training an AI model on the learning content to be taught. In doing so, the instructor may train the AI model on the learning content in the language the learner is to be taught (for example, English or Korean). The model created in this way may be transmitted to the learner's user device.

Based on the template and/or learning model received from the school/academy system (910) in this way, the artificial intelligence avatar (920) can converse naturally with the learner in the target language, so that the learner can study other subjects at the same time as practicing speaking.

FIG. 10 is a diagram illustrating a conversation flow with an artificial intelligence avatar according to an embodiment of the present invention.

In particular, this drawing shows a conversation flow between a learner and an artificial intelligence avatar actually implemented according to the present invention. As shown in the drawing, the artificial intelligence avatar accurately recognizes the learner's utterance/conversation intent and situation and can therefore hold a natural conversation.

FIG. 11 illustrates an artificial intelligence avatar according to an embodiment of the present invention.

As illustrated in this drawing, an artificial intelligence avatar can be configured freely for each learner, and even the specific characteristics of each avatar (for example, name, age, place of birth, personality, and pronunciation) can be set directly. The user device may activate the artificial intelligence avatar configured for the recognized learner to conduct the conversation, and may reflect the characteristics configured for that avatar directly in its output.

For example, a conversation conversion step based on the artificial intelligence avatar's characteristics may be added as the last step of FIG. 2, and the user device may convert the finally generated second English sentence based on those characteristics (for example, by adjusting wording, tone, or speed) before final output.
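One possible shape of such a conversion step is sketched below. The profile fields, the word-substitution approach, and the returned dictionary are all hypothetical; the specification does not define how wording, tone, or speed are adjusted.

```python
from dataclasses import dataclass, field


@dataclass
class AvatarProfile:
    name: str
    speech_rate: float = 1.0                        # 1.0 = normal speaking speed
    word_swaps: dict = field(default_factory=dict)  # persona-specific vocabulary


def apply_persona(sentence: str, avatar: AvatarProfile) -> dict:
    """Rewrite wording per the avatar's vocabulary and attach its speaking rate."""
    words = [avatar.word_swaps.get(w, w) for w in sentence.split()]
    return {"text": " ".join(words), "speech_rate": avatar.speech_rate}
```

The resulting text and speaking rate could then be handed to the speech output of the user device.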

FIG. 12 is a block diagram of a user device according to an embodiment of the present invention.

Referring to FIG. 12, the user device (1200) may include a control unit (1210), a memory unit (1220), a communication unit (1230), and/or a user input/output unit (1240).

The control unit (1210) may control at least one component/unit to carry out at least one embodiment proposed in this specification. The control unit (1210) may include at least one of a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), an AP (Application Processor), or any other form of processor well known in the technical field of the present invention. The control unit (1210) may perform computation for at least one application or program for executing the method according to embodiments of the present invention.

The memory unit (1220) may store various digital data such as video, audio, photos, moving images, and applications. The memory unit (1220) represents various digital data storage spaces such as flash memory, an HDD (Hard Disk Drive), and an SSD (Solid State Drive).

The communication unit (1230) may communicate using at least one wired/wireless communication protocol to transmit and receive data. The communication unit (1230) may connect to an external network by wire or wirelessly to transmit and receive digital data.

The user input/output unit (1240) may sense the user's input using at least one sensor and output a visual/auditory screen, effect, or feedback using at least one output means (for example, a display or a speaker). The at least one sensor may include at least one of various sensing means such as a gravity sensor, a geomagnetic sensor, a motion sensor, a gyroscope sensor, an acceleration sensor, an infrared sensor, an inclination sensor, a brightness sensor, an altitude sensor, an olfactory sensor, a temperature sensor, a depth sensor, a pressure sensor, a bending sensor, an audio sensor, a video sensor, a GPS (Global Positioning System) sensor, a touch sensor, and a grip sensor.

Embodiments according to the present invention may be implemented by various means, for example, hardware, firmware, software, or a combination thereof. In the case of implementation by hardware, an embodiment of the present invention may be implemented by one or more ASICs (application specific integrated circuits), DSPs (digital signal processors), DSPDs (digital signal processing devices), PLDs (programmable logic devices), FPGAs (field programmable gate arrays), processors, controllers, microcontrollers, microprocessors, and the like.

In the case of implementation by firmware or software, an embodiment of the present invention may be implemented in the form of a module, procedure, function, or the like that performs the functions or operations described above, and may be recorded on a recording medium readable by various computer means. Here, the recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in computer software. For example, the recording medium includes magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs (Compact Disk Read Only Memory) and DVDs (Digital Video Disk); magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. Such hardware devices may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

In addition, a device or terminal according to the present invention may be driven by instructions that cause one or more processors to perform the functions and processes described above. Such instructions may include, for example, interpreted instructions such as script instructions (for example, JavaScript or ECMAScript instructions), executable code, or other instructions stored on a computer-readable medium. Furthermore, the device according to the present invention may be implemented in a distributed manner across a network, such as a server farm, or may be implemented on a single computer device.

In addition, a computer program (also known as a program, software, software application, script, or code) that is installed on the device according to the present invention and executes the method according to the present invention may be written in any form of programming language, including compiled or interpreted languages and declarative or procedural languages, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program may be stored in a single file dedicated to the program in question, in multiple coordinated files (for example, files that store one or more modules, subprograms, or portions of code), or in a portion of a file that holds other programs or data (for example, one or more scripts stored in a markup language document). A computer program may be deployed to be executed on one computer, or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

Although the drawings have been described separately for convenience of explanation, the embodiments described in the individual drawings may also be combined to implement a new embodiment. Moreover, the present invention is not limited to the configurations and methods of the embodiments described above; all or part of the embodiments may be selectively combined so that various modifications can be made.

Although preferred embodiments have been shown and described above, this specification is not limited to the specific embodiments described. Various modifications may be made by those of ordinary skill in the technical field to which this specification pertains without departing from the subject matter claimed in the claims, and such modifications should not be understood separately from the technical idea or outlook of this specification.

Claims

An emotion- and memory-based interactive artificial intelligence avatar English speaking teaching method, comprising:
receiving a user input including a first English sentence;
performing a natural language understanding operation on the first English sentence;
determining, as a result of performing the natural language understanding operation, whether there is a preset intent classification matching a conversation intent included in the first English sentence;
generating at least one first natural language based on the first English sentence when there is no preset intent classification matching the conversation intent;
calling a template associated with the preset intent classification when there is a preset intent classification matching the conversation intent, and generating at least one second natural language based on the called template;
extracting an embedding value from the at least one first or second natural language, and extracting the first or second natural language based on the extracted embedding value; and
outputting the extracted first or second natural language as a second English sentence that is a response to the first English sentence,
wherein the generating of the at least one first natural language based on the first English sentence when there is no preset intent classification matching the conversation intent comprises:
receiving an educational content model; and
generating, based on the first natural language, a plurality of questions related to learning objects predefined in the educational content model as the second English sentence and a fifth English sentence, respectively,
wherein the outputting as the second English sentence comprises:
receiving a user input including a third English sentence as a response to the second English sentence;
inputting the third English sentence into the educational content model;
evaluating the sentence completeness of the third English sentence as output from the educational content model;
when the evaluated sentence includes an error according to a learning target criterion and its completeness is below a preset sentence completeness, immediately generating, as a third natural language, another question that is predefined in the educational content model and in which the error is corrected;
outputting the third natural language, in response to the third English sentence, as a fourth English sentence that sequentially follows the second English sentence and the third English sentence; and
outputting the generated fifth English sentence after the fourth English sentence is output,
wherein the method further comprises storing the first and second English sentences as a set in a conversation database, and
wherein the generating of the at least one first natural language based on the first English sentence when there is no preset intent classification matching the conversation intent further comprises:
extracting a keyword from the first English sentence;
searching for the keyword in the database; and
generating the first natural language based on a search result.
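
The corrective-question sequencing recited in this claim can be illustrated with a minimal Python sketch. It is not the patented implementation: `respond_to_answer`, `toy_evaluate`, the 0.8 threshold, and the sample sentences are all hypothetical, and `toy_evaluate` merely stands in for the educational content model. The behavior when no error is found (going straight to the held-back question) is an assumption, since the claim only recites the error case.

```python
def toy_evaluate(answer):
    """Hypothetical stand-in for the educational content model.

    Checks a single learning target (past tense of "go") and returns
    (has_error, completeness, corrective_question).
    """
    if "goed" in answer.lower().split():
        return True, 0.5, "You mean you went there? Where did you go?"
    return False, 1.0, None


def respond_to_answer(answer, next_question, evaluate, threshold=0.8):
    """Return the avatar's outputs after the learner answers the first question.

    answer:        the learner's reply (the "third English sentence")
    next_question: a question generated in advance (the "fifth English sentence")
    threshold:     assumed value for the preset sentence completeness
    """
    has_error, completeness, corrective_question = evaluate(answer)
    outputs = []
    if has_error and completeness < threshold:
        # Corrective question comes first (the "fourth English sentence").
        outputs.append(corrective_question)
    # The held-back question always follows.
    outputs.append(next_question)
    return outputs
```

Calling `respond_to_answer("I goed to the park", "What did you see there?", toy_evaluate)` yields the corrective question followed by the held-back question, while an error-free answer yields only the held-back question.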

The method of claim 1, wherein the performing of the natural language understanding operation on the first English sentence comprises:
performing preprocessing on the first English sentence;
classifying an emotion category of the preprocessed first English sentence; and
extracting a conversation intent and an entity name from the preprocessed first English sentence.

The method of claim 2, wherein the performing of the preprocessing on the first English sentence comprises:
performing tokenization based on whitespace characters included in the first English sentence; and
removing articles included in the first English sentence.
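
The two preprocessing steps can be sketched in a few lines of Python. This is an illustrative assumption rather than the applicant's code: the function name `preprocess` and the article list are hypothetical, and punctuation attached to a token (for example "the,") is deliberately left untouched.

```python
ARTICLES = {"a", "an", "the"}  # assumed list of English articles


def preprocess(sentence):
    """Tokenize on whitespace, then drop articles (case-insensitive)."""
    tokens = sentence.split()  # str.split() with no argument splits on any run of whitespace
    return [token for token in tokens if token.lower() not in ARTICLES]
```

For example, `preprocess("I saw a dog in the park")` keeps `I`, `saw`, `dog`, `in`, and `park`.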

The method of claim 2, wherein the classifying of the emotion category comprises:
inputting the preprocessed first English sentence into an emotion classification model built in advance using ELECTRA, and classifying the emotion contained in the first English sentence into a preset emotion category.

The method of claim 2, wherein the extracting of the conversation intent and the entity name comprises:
inputting the preprocessed first English sentence into a conversation intent and entity name extraction model built in advance using a DIET classifier, and extracting the conversation intent and the entity name contained in the first English sentence.

The method of claim 5, wherein the entity name includes at least one of a name, a title, a personal pronoun, and a referential pronoun included in the preprocessed first English sentence.

The method of claim 2, wherein the calling of the template associated with the preset intent classification comprises:
generating a conversation scenario according to a preset framework format by tracking at least one of text, time, the emotion category, the conversation intent, and the entity name from the preprocessed first English sentence;
performing an action according to a preset action policy corresponding to the preset intent classification when there is no template matching the generated conversation scenario; and
calling the template when there is a template matching the generated conversation scenario.
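
The scenario tracking and the template-or-policy decision can be sketched as follows. This is plain Python under stated assumptions and does not use any particular framework's API: the `Turn` record, the `SCENARIO_TEMPLATES` and `FALLBACK_POLICY` tables, the intent labels, and the choice to key a scenario by the sequence of intents are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Turn:
    """One tracked user turn: the fields named in the claim."""
    text: str
    emotion: str
    intent: str
    entities: list
    time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# Hypothetical templates keyed by scenario (the sequence of intents so far).
SCENARIO_TEMPLATES = {
    ("greet",): "Hi! What did you do today?",
    ("greet", "share_activity"): "That sounds fun. Who did you go with?",
}

# Hypothetical per-intent action policy used when no template matches.
FALLBACK_POLICY = {"share_activity": "ask_follow_up"}


def next_action(history):
    """Return ('template', text) on a scenario match, else ('policy', action)."""
    scenario = tuple(turn.intent for turn in history)
    if scenario in SCENARIO_TEMPLATES:
        return ("template", SCENARIO_TEMPLATES[scenario])
    last_intent = history[-1].intent
    return ("policy", FALLBACK_POLICY.get(last_intent, "default_fallback"))
```

A history containing a single `greet` turn matches a template, whereas a conversation that opens with `share_activity` has no matching scenario and falls back to the `ask_follow_up` policy action.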

The method of claim 7, wherein the preset framework is the Rasa Open Source framework.

The method of claim 8, wherein the template defines responses for each preset intent classification in a 1-to-N correspondence (N being a natural number).

The method of claim 9, wherein the generating of the at least one second natural language based on the called template comprises:
generating, as the second natural language, at least one response within the called template that corresponds to the preset intent classification of the first English sentence.
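
A 1-to-N template, combined with the embedding-based extraction step recited in claim 1, can be sketched as below. Every name here is hypothetical. The bag-of-words `embed` function is a toy stand-in for a learned sentence encoder, and choosing the candidate closest to the user's sentence by cosine similarity is an assumption, because the claims do not state the selection criterion.

```python
import math
import re
from collections import Counter

# Hypothetical template: each preset intent maps to N candidate responses.
TEMPLATE = {
    "greet": ["Hello! How are you today?", "Hi there! Nice to see you."],
    "thank": ["You are welcome!"],
}


def candidates_for(intent):
    """Return every response the template defines for the intent (1-to-N)."""
    return list(TEMPLATE.get(intent, []))


def embed(text):
    """Toy bag-of-words vector; an assumption, not a real sentence encoder."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_response(user_sentence, candidates):
    """Pick the candidate whose vector is closest to the user's sentence."""
    query = embed(user_sentence)
    return max(candidates, key=lambda candidate: cosine(query, embed(candidate)))
```

`select_response` expects a non-empty candidate list, so a caller would need to handle intents that have no template entry before calling it.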

The method of claim 1, wherein the generating of the at least one first natural language based on the first English sentence when there is no preset intent classification matching the conversation intent comprises:
generating the first natural language by calling the davinci model of OpenAI GPT-3 through an application programming interface (API) and inputting the first English sentence into the davinci model.

The method of claim 1, wherein the generating of the at least one first natural language based on the first English sentence when there is no preset intent classification matching the conversation intent comprises:
generating the first natural language by inputting the first English sentence into a BlenderBot model.

The method of claim 1, wherein the generating of the at least one first natural language based on the first English sentence when there is no preset intent classification matching the conversation intent comprises:
extracting a keyword from the first English sentence;
searching for the keyword on the web; and
generating the first natural language based on a search result.
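
The keyword-search path can be sketched as follows, assuming a caller-supplied `search` callable so that no particular web search API is implied. The stopword list, the function names, and the response wording are hypothetical, and stopword filtering is only one simple way to pick keywords.

```python
import re

# Assumed stopword list; a real system would likely use a fuller one.
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "were", "do", "does", "did",
    "i", "you", "it", "to", "of", "in", "on", "what", "who", "how",
    "about", "me", "tell", "know",
}


def extract_keywords(sentence):
    """Return the non-stopword words of the sentence, in order, without repeats."""
    keywords = []
    for word in re.findall(r"[a-z']+", sentence.lower()):
        if word not in STOPWORDS and word not in keywords:
            keywords.append(word)
    return keywords


def answer_from_search(sentence, search):
    """Build a reply from the first search hit.

    search: hypothetical callable taking a query string and returning a list
            of text snippets (web or database search).
    """
    keywords = extract_keywords(sentence)
    if not keywords:
        return "Could you tell me more?"
    snippets = search(" ".join(keywords))
    if not snippets:
        return f"I could not find anything about {' '.join(keywords)}."
    return f"Here is what I found: {snippets[0]}"
```

Passing the search function as a parameter keeps the sketch testable with a stub and leaves the choice of search backend open.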