KR102551296B1

KR102551296B1 - Dialogue system and its method for learning to speak foreign language

Info

Publication number: KR102551296B1
Application number: KR1020210126586A
Authority: KR
Inventors: 조민수; 권오욱; 노윤형; 이기영; 이요한; 최승권; 황금하
Original assignee: 한국전자통신연구원
Priority date: 2021-09-24
Filing date: 2021-09-24
Publication date: 2023-07-05
Also published as: KR20230043558A

Abstract

In the foreign-language learning domain, where dialogue corpora are small, the present invention uses a mission item candidate selection module to determine accurately whether the learner is speaking on topic, even for examples that do not appear in the dialogue corpus, and thereby efficiently improves the dialogue performance of the system. In addition, by applying free-dialogue tracking in the dialogue state prediction step, the invention also handles free dialogue that may arise during goal-oriented dialogue, so that even unpredictable free dialogue can be processed.

Description

Dialogue system and its method for learning to speak foreign language

The present invention relates to a dialogue apparatus and method for learning to speak a foreign language and, more particularly, to a dialogue system for foreign-language learning intended for learners who want to learn foreign-language sentence patterns through dialogue with the system.

In a typical goal-oriented dialogue system, identifying the user's requirements involves extracting information corresponding to the user's goal from the user's utterance and the dialogue history. Doing this accurately requires a training corpus that contains a wide variety of examples.

For example, suppose that for the topic 'favorite hobby' the system has been trained or prepared with "I enjoy [$hobby]" as the learner's expected utterance. If the learner says "I enjoy [drawing]" but drawing does not appear as an example of $hobby in the training dialogue corpus or the rules, the system may fail to recognize the utterance as information about 'favorite hobby' even though the learner produced a correct sentence.

As another example, for the topics 'favorite sport' and 'favorite food', the learner may express $sport and $food in a single sentence rather than in separate sentences, saying "I enjoy [skiing] and eating [hamburger]". Although the learner has spoken correctly on both topics, the dialogue corpus contains no example in which the two topics are uttered together, so the system may fail to judge the sentence as a correct expression.

In a dialogue system for foreign-language learning, it is more effective for learners to express sentences that suit them in a variety of forms than to reproduce the given example sentences verbatim. The system must therefore be able to judge, according to the learner's situation, whether a sentence is a correct expression even for examples of a sentence pattern that were not prepared in the training corpus or rules, that is, for unseen words, phrases, or similar sentence patterns.

In addition, in a foreign-language learning dialogue system, the learner may produce non-goal dialogue unrelated to the learning topic, that is, free dialogue, during a session.

For example, while exchanging learning dialogue with the system, a learner may lose concentration or interest and attempt free dialogue such as "This is no fun" or "By the way, how is the weather today?" This is a natural phenomenon that can occur frequently in real learning environments, and to maximize the learning effect the dialogue system should be able to handle even unpredictable free dialogue.

However, conventional goal-oriented dialogue systems have the limitation that they do not handle free dialogue that may arise while the learner is performing a task.

To solve the problems described above, an object of the present invention is to provide a dialogue apparatus and method for learning to speak a foreign language that accurately recognize the learner's utterance even for varied examples absent from the training corpus, determine whether the learner is correctly producing the expression to be learned or is engaging in free dialogue, and respond and carry on the dialogue accordingly.

The foregoing and other objects, advantages, and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings.

To achieve the above object, a method for learning to speak a foreign language performed by a dialogue apparatus includes: recognizing, by a speech recognition module, a learner's voice uttered for any one of a plurality of mission items given by the dialogue apparatus, and generating text corresponding to the learner's voice; selecting, by a mission item candidate selection module, the mission item candidate having the highest similarity to the text from among the plurality of mission items; predicting, by a dialogue state prediction module, a dialogue state including a slot and a slot value linked to the slot, using the text, the mission item candidate, the remaining mission items other than the mission item candidate among the plurality of mission items, and an other item unrelated to the plurality of mission items; and outputting, by a system utterance generation module, a system utterance corresponding to 'mission dialogue' or 'free dialogue' based on the predicted dialogue state.

According to the present invention, in the foreign-language learning domain, where dialogue corpora are small, the mission item candidate selection module makes it possible to determine accurately whether the learner is speaking on topic even for examples that do not appear in the dialogue corpus, which efficiently improves the dialogue performance of the system.

In addition, by applying free-dialogue tracking in the dialogue state prediction step, free dialogue that may arise during goal-oriented dialogue is also handled, so that even unpredictable free dialogue can be processed.

FIG. 1 is an overall block diagram schematically showing the internal configuration of a dialogue apparatus for learning to speak a foreign language according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating examples of the mission items and the dialogue corpus shown in FIG. 1.
FIG. 3 is a diagram illustrating the processing performed by the mission item candidate selection module shown in FIG. 1.
FIG. 4 is a detailed block diagram of the dialogue state prediction module shown in FIG. 1.
FIG. 5 is a detailed block diagram of the system utterance generation module shown in FIG. 1.
FIG. 6 is a table illustrating example outputs of the slot type extension unit and the slot value prediction unit shown in FIG. 4, and example outputs of the dialogue type recognition unit and the system utterance generation unit shown in FIG. 5.
FIG. 7 is a block diagram of a dialogue apparatus for learning to speak a foreign language that uses a language model, according to another embodiment of the present invention.
FIG. 8 is a flowchart of a method for learning to speak a foreign language performed by the dialogue apparatus according to an embodiment of the present invention.

The specific structural and functional descriptions of the embodiments according to the concept of the present invention disclosed in this specification are given only for the purpose of explaining those embodiments. Embodiments according to the concept of the present invention may be implemented in various forms and are not limited to the embodiments described herein.

The terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to indicate that the stated features, numbers, steps, operations, components, parts, or combinations thereof exist, and should be understood not to exclude in advance the existence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The dialogue system for foreign-language learning according to the present invention presents the learner with three or four topics, asks questions that prompt the learner to utter sentences suited to those topics, and carries out a dialogue in which the learner responds.

Although not specifically limited thereto, the present invention may have a structure similar to that of a goal-oriented dialogue system in which, in domains such as hotels and taxis, a reservation is made based on system questions and user responses.

Hereinafter, embodiments of the present invention are described in detail with reference to the drawings.

FIG. 1 is an overall block diagram schematically showing the internal configuration of a dialogue apparatus for learning to speak a foreign language according to an embodiment of the present invention.

Referring to FIG. 1, the dialogue apparatus for learning to speak a foreign language according to an embodiment of the present invention assumes that the foreign language being learned is English.

Operation of the dialogue apparatus can be divided into a training phase and an execution phase. In the training phase, the parameters of the system modules are trained and updated using the dialogue corpus (107). In the execution phase, the trained modules are used to carry out a question-and-answer dialogue between an actual learner and the system.

The example below illustrates the mission item (100) and the dialogue corpus (107), which are components of FIG. 1.

In the execution phase, the learner (101) receives from the dialogue apparatus the mission items (100), which express in the target language the topics or learning expressions the learner is to utter during the dialogue.

Having received the mission items, the learner utters a sentence that fits a given mission item. The learner's utterance may be input as either voice or text.

If the learner's utterance is voice, the speech recognition module (102) converts the voice of the learner (101) into text. If the utterance of the learner (101) is already text, the speech recognition module (102) passes that text unchanged to the mission item candidate selection module (103).

The mission item candidate selection module (103) determines whether the utterance of the learner (101) corresponds to a mission item and, if so, which one. Taking the text corresponding to the learner's utterance as input, it selects and outputs the mission item candidate most similar to the uttered sentence.

The dialogue state prediction module (104) receives the mission item candidate from the mission item candidate selection module (103) together with the learner's utterance, and predicts a dialogue state that holds the information about the mission items up to the current point in the dialogue.

The dialogue state is used to track whether the learner is uttering the mission items correctly and in line with the learning intent. At every turn of the dialogue, each mission item is represented as a slot, and a value is predicted for each slot.
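The slot-based dialogue state described above can be pictured as a small mapping from slots to values. The following is a minimal sketch only; the function name `update_dialogue_state`, the use of `None` for an unfilled slot, and the choice to accumulate state by merging each turn's predictions are assumptions for illustration and are not stated in the source.

```python
# Minimal sketch of a slot-based dialogue state (illustrative assumptions only).
MISSION_ITEMS = ["favorite hobby", "favorite sport", "favorite food"]


def update_dialogue_state(state, turn_slot_values):
    """Return a new state with this turn's predicted slot values merged in."""
    new_state = dict(state)          # copy so the previous turn's state is kept intact
    new_state.update(turn_slot_values)
    return new_state


# Every mission item starts as an unfilled slot.
state = {item: None for item in MISSION_ITEMS}

# Turn 1: the learner says "I enjoy drawing".
state_after_turn1 = update_dialogue_state(state, {"favorite hobby": "drawing"})
```

Keeping each turn's state as a separate mapping makes it easy to compare the state before and after a turn.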

The system utterance generation module (105) takes the dialogue state output by the dialogue state prediction module (104) and the learner's utterance as input, and outputs the next system utterance appropriate to the learner's utterance. The system utterance is either delivered to the learner (101) as text or passed to the voice output module (106), converted into speech, and delivered to the learner.

In the training phase, the dialogue corpus (107), which consists of mission items, learner-system dialogues, and the dialogue state of each turn as in the example below, is taken as input to train the dialogue state prediction module (104) and the system utterance generation module (105).

The loss calculation unit (108) computes a loss value (Loss) representing the difference between the results produced by the two modules (104 and 105) and the correct answers in the actual dialogue corpus (107), and passes it to the parameter update and storage unit (109). Here, the correct answers in the dialogue corpus (107) may be, for example, the system utterances and dialogue states indicated by reference numerals '107A', '107B', '107C', and '107D' in FIG. 2.

The parameter update and storage unit (109) updates the parameters of each module in the direction that minimizes the loss value (Loss) received from the loss calculation unit (108), and stores them.
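As an illustration only, the training step described above could be written as follows. The source does not specify the form of the loss or the optimizer; the split into a dialogue-state term $\mathcal{L}_{\text{state}}$ and a system-utterance term $\mathcal{L}_{\text{gen}}$, the parameter symbol $\theta$, the learning rate $\eta$, and the use of plain gradient descent are all assumptions made for this sketch.

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\text{state}}(\theta) + \mathcal{L}_{\text{gen}}(\theta),
\qquad
\theta \leftarrow \theta - \eta \, \nabla_{\theta} \mathcal{L}(\theta)
```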

The stored parameters are used by the dialogue state prediction module (104) and the system utterance generation module (105) in the execution phase. The mission item candidate selection module (103), however, does not need to be trained when it uses pretrained embeddings. When its word or sentence similarity judgment is instead learned, it is trained as a binary classifier, based on a large language model such as BERT, that decides True or False for the similarity between a mission item expression and a word, phrase, or sentence.

When the mission item candidate selection module (103) selects mission item candidates using pretrained embedding information, the learner utterance in text form (101, 102) and the mission items (100) are embedded as vectors that reflect semantic information, using an embedding model such as Word2Vec, Sent2Vec, FastText, BERT (Bidirectional Encoder Representations from Transformers), or GPT (Generative Pre-trained Transformer). If a similarity measure between the embedding vectors, such as cosine similarity or Euclidean distance, is at or above a specific threshold (e.g., 0.5), the mission item candidate best matching the learner's uttered sentence (the text corresponding to the learner's voice) is selected and passed to the dialogue state prediction module (104). If no mission item reaches the threshold, an other item (e.g., other) for handling free dialogue is selected and passed to the dialogue state prediction module (104).
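The embedding-based selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the three-dimensional vectors are hand-made stand-ins for real embeddings (Word2Vec, BERT, and so on), the names `cosine_similarity`, `select_candidates`, and `ITEM_VECS` are hypothetical, and returning every item at or above the threshold (rather than a single best item) is an assumption chosen to match the multi-candidate case in the description.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_candidates(utterance_vec, item_vecs, threshold=0.5, fallback="other"):
    """Return mission items whose similarity reaches the threshold, else the fallback item."""
    scores = {item: cosine_similarity(utterance_vec, vec) for item, vec in item_vecs.items()}
    selected = [item for item, score in scores.items() if score >= threshold]
    return (selected or [fallback]), scores


# Toy stand-in embeddings (assumption): one axis per mission item.
ITEM_VECS = {
    "favorite hobby": [1.0, 0.0, 0.0],
    "favorite sport": [0.0, 1.0, 0.0],
    "favorite food": [0.0, 0.0, 1.0],
}

# An utterance vector lying close to the "favorite hobby" axis.
candidates, scores = select_candidates([0.9, 0.3, 0.1], ITEM_VECS)
```

Because the comparison is made in embedding space, a word that never appears in the dialogue corpus can still land near the right mission item, which is the point the description makes about "drawing".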

FIG. 2 is a diagram illustrating examples of the mission items and the dialogue corpus shown in FIG. 1, and FIG. 3 is a diagram illustrating the processing performed by the mission item candidate selection module shown in FIG. 1.

Referring to FIGS. 2 and 3, the mission items presented in this embodiment are favorite hobby, favorite sport, and favorite food. At each turn, the mission item most closely related to the learner's utterance is selected, and the selected candidate is passed to the next stage.

Among the table items shown in FIG. 3, the item "(1) Sentence and mission item embedding" shows the learner utterance and each mission item embedded as vectors.

The item "(2) Similarity" shows the similarity between the learner utterance and each mission item.

The items "(3) Mission item candidate determination" and "(4) Mission item candidate selection" show whether each similarity is at or above the threshold or below it, and the final mission item candidate selected on that basis.

In Turn 1, for the utterance "I enjoy drawing", favorite hobby is selected as the mission item candidate because its similarity is at or above the threshold (0.5). In this way, even when the dialogue corpus contains no example for drawing, the mission item candidate selection module (103) can predict through embedding similarity that the utterance corresponds to the favorite hobby mission item.

In Turn 2, for the utterance "This is boring", no mission item has a similarity at or above the threshold (0.5), so the other item is selected.

In Turn 3, favorite sport and favorite hobby, whose similarities are at or above the threshold (0.5), are selected as mission item candidates. Even when the corpus contains no example in which two mission items are uttered together, the selected mission item candidates show that the learner has uttered two mission items.

When the mission item candidate selection module (103) instead uses a learned word or sentence similarity judgment, it takes the learner utterance in text form (101, 102) and each mission item (100) as input, judges the similarity of each pair (True or False), selects the mission items judged True as mission item candidates, and passes them to the dialogue state prediction module (104). If the set of mission item candidates is empty, the other item (e.g., other) is selected and passed to the dialogue state prediction module (104).
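The classifier-based variant described above can be sketched in the same shape. The sketch assumes only that some binary judge exists; `select_candidates_by_classifier` and `toy_is_match` are hypothetical names, and the keyword lookup inside `toy_is_match` is a trivial stand-in for a trained BERT-style binary classifier, not a description of one.

```python
def select_candidates_by_classifier(utterance, mission_items, is_match, fallback="other"):
    """Keep every mission item the binary judge marks True; fall back if none is."""
    selected = [item for item in mission_items if is_match(utterance, item)]
    return selected or [fallback]


def toy_is_match(utterance, item):
    """Stand-in for a learned True/False similarity judge (assumption: keyword lookup)."""
    keywords = {
        "favorite hobby": {"drawing", "reading"},
        "favorite sport": {"skiing", "soccer"},
        "favorite food": {"hamburger", "pizza"},
    }
    words = set(utterance.lower().split())
    return bool(words & keywords.get(item, set()))


ITEMS = ["favorite hobby", "favorite sport", "favorite food"]
two_topics = select_candidates_by_classifier(
    "I enjoy skiing and eating hamburger", ITEMS, toy_is_match
)
off_topic = select_candidates_by_classifier("This is boring", ITEMS, toy_is_match)
```

Passing the judge in as a function keeps the selection logic identical whether the judgment comes from embeddings or from a trained model.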

FIG. 4 is a detailed block diagram of the dialogue state prediction module shown in FIG. 1.

Referring to FIG. 4, the dialogue state prediction module (104) extracts, at every turn, information for the predefined slot types from the current utterance, based on the dialogue history between the system and the learner, so that the dialogue can be continued.

In this dialogue system, the slot types correspond to mission items such as favorite hobby, favorite sport, and favorite food, and a slot value is a value that fills a slot, such as hotdog, pizza, or hamburger for the food slot.

To track free dialogue during dialogue state prediction, the slot type extension unit (104A) adds an other slot to the existing slot types. For example, the other slot type other is added to the existing slot types (favorite hobby, favorite sport, favorite food) to produce the extended slot types (favorite hobby, favorite sport, favorite food, other).
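The slot type extension described above amounts to appending one extra slot. A minimal sketch, in which the function name `extend_slot_types` and the guard against adding the slot twice are illustrative assumptions:

```python
def extend_slot_types(mission_items, other_slot="other"):
    """Return the mission-item slot types plus one extra slot for free dialogue."""
    extended = list(mission_items)       # copy; the original list is left unchanged
    if other_slot not in extended:
        extended.append(other_slot)
    return extended


base_slots = ["favorite hobby", "favorite sport", "favorite food"]
extended_slots = extend_slot_types(base_slots)
```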

The slot value prediction unit (104B) combines the extended slot types, the mission item candidates received from the mission item candidate selection module (103), the learner utterance (101, 102), and so on, to predict the current dialogue state in slot-value form, and passes it to the system utterance generation module (105). To make use of the mission item candidates, the slot value prediction unit (104B) may either simply concatenate them with the learner utterance, or apply attention to weight them before encoding the learner utterance.
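The simple concatenation option mentioned above can be sketched as plain string assembly. The function name `build_predictor_input`, the `[SEP]` separator, and the ordering of the three segments are assumptions for illustration; the source says only that the candidates may be concatenated with the learner utterance.

```python
def build_predictor_input(utterance, candidates, slot_types, sep=" [SEP] "):
    """Join candidates, utterance, and slot types into one input string for an encoder."""
    return sep.join([", ".join(candidates), utterance, ", ".join(slot_types)])


predictor_input = build_predictor_input(
    "I enjoy drawing",
    ["favorite hobby"],
    ["favorite hobby", "favorite sport", "favorite food", "other"],
)
```

The attention-based alternative would weight the utterance tokens by the candidates inside the encoder and is not shown here.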

FIG. 5 is a detailed block diagram of the system utterance generation module shown in FIG. 1.

Referring to FIG. 5, in the system utterance generation module (105), the dialogue type recognition unit (105A) uses the dialogue state received from the dialogue state prediction module (104) to decide between two dialogue types: 'mission dialogue', in which the learner utters a sentence containing a word that corresponds to a mission item and the system responds to it, and 'free dialogue', in which the learner and the system exchange utterances unrelated to the mission items. Reflecting that decision, the system utterance generation unit (105B) generates a response appropriate to the learner utterance (101, 102) it received. The generated response is output as a system utterance in text form and delivered to the next stage (101, 106).
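The decision made by the dialogue type recognition unit can be sketched as a rule over the slot values predicted for the current turn. The rule below (any filled mission slot means 'mission dialogue', otherwise 'free dialogue') and the name `recognize_dialogue_type` are assumptions for illustration; the source does not state the exact decision procedure, which may itself be learned.

```python
def recognize_dialogue_type(turn_slot_values, other_slot="other"):
    """Classify the turn from the slot values predicted for it."""
    mission_filled = any(
        value is not None
        for slot, value in turn_slot_values.items()
        if slot != other_slot
    )
    return "mission dialogue" if mission_filled else "free dialogue"


turn1_type = recognize_dialogue_type({"favorite hobby": "drawing"})
turn2_type = recognize_dialogue_type({"other": "boring"})
```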

FIG. 6 is a table illustrating example outputs of the slot type extension unit and the slot value prediction unit shown in FIG. 4, and example outputs of the dialogue type recognition unit and the system utterance generation unit shown in FIG. 5.

Referring to FIG. 6, the item '(5) Extended slot types' is the output of the slot type extension unit (104A) and shows the slot types extended to include '[other slot]'.

The item '(6) Predicted dialogue state (slot-slot value)' is the dialogue state of the current turn predicted by the slot value prediction unit (104B); it can be seen that a slot value has been predicted for the other slot type (other).

The item '(7) Dialogue type' is the output of the dialogue type recognition unit (105A), which predicts the dialogue type of the current turn from the updated '(6) Predicted dialogue state (slot-slot value)'. Based on this, the system utterance generation unit (105B) outputs '(8) System utterance', the response to the learner utterance. The system utterance generated in Turn 1 acknowledges that the learner spoke appropriately about their hobby and moves on to the next mission item.

The system utterance generated in Turn 2 responds to a learner utterance that is unrelated to the mission item (favorite sport): it helps the learner refocus on learning and asks about the mission item once more.

In Turn 3, the learner has spoken appropriately about the two remaining mission items (favorite sport, favorite food), so the system ends the dialogue with a closing remark.

FIG. 7 is a block diagram of a dialogue apparatus for learning to speak a foreign language that uses a language model, according to another embodiment of the present invention.

Referring to FIG. 7, the dialogue apparatus according to another embodiment of the present invention uses a pretrained language model (200) that integrates the dialogue state prediction module (104) and the system utterance generation module (105), and predicts the dialogue state and the system utterance in a single pass.

The mission items, the learner utterance, and the mission item candidates output by the mission item candidate selection module (103) are input sequentially to the pretrained language model (200), which then outputs the dialogue state and the system utterance.
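The sequential input to the integrated language model can be sketched as a single serialized string. The segment markers `[ITEMS]`, `[USER]`, and `[CANDIDATES]`, the `|` delimiter, and the function name `serialize_for_language_model` are hypothetical; the source specifies only the order in which the three inputs are fed.

```python
def serialize_for_language_model(mission_items, utterance, candidates):
    """Serialize the three inputs in the order given in the description."""
    return (
        "[ITEMS] " + " | ".join(mission_items)
        + " [USER] " + utterance
        + " [CANDIDATES] " + " | ".join(candidates)
    )


prompt = serialize_for_language_model(
    ["favorite hobby", "favorite sport", "favorite food"],
    "This is boring",
    ["other"],
)
```

The model would then be expected to generate the dialogue state followed by the system utterance as one continuation of this string.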

도 8은 본 발명의 실시 예에 따른 외국어 말하기 학습을 위한 대화 장치에 의해 수행되는 외국어 말하기 학습을 위한 방법을 나타내는 흐름도이다.8 is a flowchart illustrating a method for learning to speak a foreign language performed by a conversation apparatus for learning to speak a foreign language according to an embodiment of the present invention.

The entity that performs each step below will be clear from the description of FIGS. 1 to 6. The components shown in FIG. 1 may, however, be implemented as a computing device comprising a processor, a memory, an input/output device, a storage medium, and a system bus connecting them. In that case, the entity performing each step below may be the processor.

Referring to FIG. 8, first, the speech recognition module 102 recognizes the learner's speech uttered for any one of a plurality of mission items given by the conversation apparatus, and generates text corresponding to the learner's speech (810).

Next, the mission item candidate selection module 103 selects, from among the plurality of mission items, the mission item candidate having the highest similarity to the text (820).

Next, the dialogue state prediction module 104 predicts a dialogue state comprising slots and the slot values linked to those slots, using the text, the mission item candidate, the remaining mission items (the plurality of mission items excluding the mission item candidate), and an other item unrelated to the plurality of mission items (830).

Next, the system utterance generation module 105 outputs, based on the predicted dialogue state, a system utterance corresponding to a 'mission dialogue' or a 'free dialogue' (840).

In an embodiment, the mission item may be a foreign-language expression representing a topic or learning expression that the learner should utter in the conversation with the conversation apparatus.

In an embodiment, step 820 of selecting the mission item candidate having the highest similarity to the text may include: representing the words contained in each of the plurality of mission items and in the text as embedding vectors that carry semantic information, using an embedding model; and selecting, using the embedding vectors, the mission item candidate having the highest similarity to the text from among the plurality of mission items.
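As a minimal sketch of this selection step, the example below averages word vectors and ranks the mission items by cosine similarity. The hand-made three-dimensional vectors in `TOY_VECTORS`, the averaging scheme, the choice of cosine similarity, and the function names are all assumptions for illustration; the source names neither a specific embedding model nor a specific similarity measure.

```python
import math
from typing import Dict, List

# Toy word vectors, hand-made for illustration only (not from the source).
TOY_VECTORS: Dict[str, List[float]] = {
    "sport": [1.0, 0.0, 0.0],
    "soccer": [0.9, 0.1, 0.0],
    "food": [0.0, 1.0, 0.0],
    "pizza": [0.1, 0.9, 0.0],
    "hobby": [0.0, 0.0, 1.0],
    "drawing": [0.0, 0.1, 0.9],
}


def embed(text: str) -> List[float]:
    """Average the vectors of known words; unknown words are skipped."""
    words = text.lower().replace(".", "").split()
    vecs = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_candidate(mission_items: List[str], text: str) -> str:
    """Return the mission item whose embedding is most similar to the text."""
    text_vec = embed(text)
    return max(mission_items, key=lambda item: cosine(embed(item), text_vec))


items = ["favorite hobby", "favorite sport", "favorite food"]
candidate = select_candidate(items, "I like soccer.")
print(candidate)
```

Because the comparison is made in embedding space, an utterance can be matched to a mission item even when it shares no surface word with it, which is the property the description relies on for examples absent from the dialogue corpus.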

In an embodiment, step 830 of predicting the dialogue state comprising the slots and the slot values linked to the slots may include: configuring the mission item candidate as a slot and configuring the word in the text that corresponds to the mission item candidate as its slot value; configuring each of the remaining mission items as a slot and configuring 'none' as the slot value linked to each of those slots; and configuring the other item as a slot denoting 'other' and configuring 'false' as the slot value linked to 'other'.
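The slot layout described in this step can be illustrated with a plain dictionary. This is a sketch under stated assumptions: the function name `build_dialogue_state`, the dictionary representation, and the example mission items are hypothetical, and the slot value is passed in directly rather than predicted by a model as in the source.

```python
from typing import Dict, List


def build_dialogue_state(mission_items: List[str], candidate: str, slot_value: str) -> Dict[str, str]:
    """Build a dialogue state for one on-topic turn."""
    state: Dict[str, str] = {}
    for item in mission_items:
        # The candidate slot gets the matching word from the text;
        # every remaining mission item gets the value 'none'.
        state[item] = slot_value if item == candidate else "none"
    # The 'other' slot is 'false' while the learner stays on a mission item.
    state["other"] = "false"
    return state


state = build_dialogue_state(
    ["favorite hobby", "favorite sport", "favorite food"],
    candidate="favorite hobby",
    slot_value="drawing",
)
print(state)
```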

In an embodiment, step 840 of outputting the system utterance corresponding to the 'mission dialogue' may include: when the slot value linked to 'other' is 'false', determining that the learner has uttered appropriately for the one mission item and recognizing the dialogue type between the learner and the conversation apparatus as the 'mission dialogue'; and extracting, from the dialogue corpus (107 in FIGS. 1 and 2), a first response expression indicating an appropriate reaction to the learner's utterance for the one mission item and a second response expression for the next mission item selected from among the remaining mission items, and outputting the system utterance representing the extracted first and second response expressions.

In an embodiment, the first response expression may be, for example, "Oh, that's good drawing" in Turn 1 of FIG. 6 or "Oh, that's good." in Turn 3 of FIG. 6.

In an embodiment, the second response expression is a question about the next mission item and may be, for example, "What's your favorite sport?" in Turn 1 of FIG. 6.

In an embodiment, when step 810 of generating the text corresponding to the learner's speech produces text that contains no word related to the plurality of mission items given by the conversation apparatus, the step of configuring the slot value linked to 'other' may be a step of changing the slot value from 'false' to 'true'.
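The switch of the 'other' slot from 'false' to 'true' might be sketched as follows. The keyword lexicon `related_words`, the token-overlap test, and the function name `update_other_slot` are assumptions for illustration only; in the source, relatedness is judged by the apparatus's own modules rather than by a fixed word list.

```python
import re
from typing import Dict, Set


def update_other_slot(state: Dict[str, str], text: str, related_words: Dict[str, Set[str]]) -> Dict[str, str]:
    """Return a copy of the state with 'other' set to 'true' when the text
    contains no word related to any mission item, and 'false' otherwise."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    on_topic = any(tokens & words for words in related_words.values())
    updated = dict(state)
    updated["other"] = "false" if on_topic else "true"
    return updated


# Hypothetical lexicon of words related to each mission item.
related_words = {
    "favorite sport": {"soccer", "tennis"},
    "favorite food": {"pizza", "noodles"},
}
base_state = {"favorite sport": "none", "favorite food": "none", "other": "false"}

off_topic = update_other_slot(base_state, "I am tired today.", related_words)
on_topic = update_other_slot(base_state, "I like soccer.", related_words)
print(off_topic["other"], on_topic["other"])
```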

In an embodiment, the method may include: when the slot value linked to 'other' is 'true', determining that the learner has not uttered appropriately for any of the plurality of mission items and recognizing the dialogue type between the learner and the conversation apparatus as the 'free dialogue'; and outputting the system utterance comprising an arbitrary first response expression that prompts the learner to concentrate on the conversation, given that the learner has not uttered appropriately for any of the mission items, and a second response expression representing a question about the one mission item or another mission item.

In an embodiment, the arbitrary first response expression that prompts the learner to concentrate on the conversation may be, for example, "Let's try to concentrate." in Turn 2 of FIG. 6.

In an embodiment, the second response expression representing a question about the one mission item or another mission item may be, for example, "What's your favorite sport?" in Turn 2 of FIG. 6.
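The branching between 'mission dialogue' and 'free dialogue' can be summarized in a short sketch that keys on the 'other' slot. The fixed strings reuse the FIG. 6 examples quoted in the description, but the question template and the function name `generate_system_utterance` are assumptions that stand in for extraction from the dialogue corpus 107.

```python
from typing import Dict, Tuple


def generate_system_utterance(state: Dict[str, str], next_item: str) -> Tuple[str, str]:
    """Return (dialogue type, system utterance) for one turn."""
    # Hypothetical question template; the source extracts such
    # expressions from the dialogue corpus instead.
    question = f"What's your {next_item}?"
    if state.get("other") == "true":
        # Off-topic turn: refocus the learner, then ask about a mission item.
        return "free dialogue", f"Let's try to concentrate. {question}"
    # On-topic turn: acknowledge, then move on to the next mission item.
    return "mission dialogue", f"Oh, that's good. {question}"


mission_turn = generate_system_utterance({"other": "false"}, "favorite sport")
free_turn = generate_system_utterance({"other": "true"}, "favorite sport")
print(mission_turn)
print(free_turn)
```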

In an embodiment, after the step of outputting the system utterance representing the extracted first and second response expressions, when the learner, in response to the system utterance according to the second response expression, utters speech containing words related to all of the remaining mission items, the method may further include: generating, by the speech recognition module 102, second text corresponding to the learner's speech containing words related to all of the remaining mission items; selecting, by the mission item candidate selection module 103, all of the remaining mission items as mission item candidates; predicting, by the dialogue state prediction module 104, a dialogue state comprising slots and the slot values linked to those slots, using the second text, the remaining mission items, and an other item unrelated to the remaining mission items; and outputting, by the system utterance generation module, a system utterance corresponding to the 'mission dialogue' based on the predicted dialogue state.

In an embodiment, the step of predicting the dialogue state comprising the slots and the slot values linked to the slots may include: configuring the remaining mission items as slots and configuring, from among the words contained in the second text, the word corresponding to each remaining mission item as its slot value; and configuring the other item as a slot denoting 'other' and configuring 'false' as the slot value linked to 'other'.

In an embodiment, the step of outputting the system utterance corresponding to the 'mission dialogue' may include: when the slot value linked to 'other' is 'false', determining that the learner has uttered appropriately for the remaining mission items and recognizing the dialogue type between the learner and the conversation apparatus as the 'mission dialogue'; and, because the learner has now uttered appropriately for all of the plurality of mission items including the remaining mission items, extracting from the dialogue corpus a third response expression representing a greeting that ends the conversation with the conversation apparatus, and outputting the system utterance representing the extracted third response expression.

In an embodiment, the third response expression may be, for example, "It was nice talking to you!" in FIG. 6.
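The end-of-conversation condition, that every mission item has been uttered appropriately, can be sketched by accumulating filled slots across turns. The helper names (`merge_turn`, `remaining_items`, `closing_or_none`), the dictionary-based bookkeeping, and the example slot values are assumptions for illustration; only the closing greeting is taken from the FIG. 6 example.

```python
from typing import Dict, List, Optional

CLOSING = "It was nice talking to you!"  # third response expression from the FIG. 6 example


def merge_turn(accumulated: Dict[str, str], turn_state: Dict[str, str]) -> Dict[str, str]:
    """Add the slots filled in this turn to the slots filled so far."""
    merged = dict(accumulated)
    for slot, value in turn_state.items():
        if slot != "other" and value != "none":
            merged[slot] = value
    return merged


def remaining_items(mission_items: List[str], accumulated: Dict[str, str]) -> List[str]:
    """Mission items the learner has not yet uttered appropriately."""
    return [item for item in mission_items if item not in accumulated]


def closing_or_none(mission_items: List[str], accumulated: Dict[str, str]) -> Optional[str]:
    """Return the closing greeting once no mission item remains."""
    return CLOSING if not remaining_items(mission_items, accumulated) else None


items = ["favorite hobby", "favorite sport", "favorite food"]
filled: Dict[str, str] = {}

# One turn filling a single item, then a turn filling both remaining items.
filled = merge_turn(filled, {"favorite hobby": "drawing", "favorite sport": "none",
                             "favorite food": "none", "other": "false"})
after_first = closing_or_none(items, filled)
filled = merge_turn(filled, {"favorite sport": "soccer", "favorite food": "pizza",
                             "other": "false"})
after_last = closing_or_none(items, filled)
print(after_first, after_last)
```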

The scope of protection of the present invention is not limited to the description and wording of the embodiments explicitly set out above. It is also noted once more that the scope of protection of the present invention cannot be limited by modifications or substitutions that are obvious in the technical field to which the present invention belongs.

Claims

A method for learning to speak a foreign language, performed by a conversation apparatus, the method comprising:
recognizing, by a speech recognition module, a learner's speech uttered for any one mission item among a plurality of mission items given by the conversation apparatus, and generating text corresponding to the learner's speech;
selecting, by a mission item candidate selection module, a mission item candidate having the highest similarity to the text from among the plurality of mission items;
predicting, by a dialogue state prediction module, a dialogue state comprising a slot and a slot value linked to the slot, using the text, the mission item candidate, the remaining mission items other than the mission item candidate among the plurality of mission items, and an other item unrelated to the plurality of mission items; and
outputting, by a system utterance generation module, a system utterance corresponding to a 'mission dialogue' or a 'free dialogue' based on the predicted dialogue state.

The method of claim 1,
wherein the mission item given by the conversation apparatus is a foreign-language expression representing a topic or learning expression to be uttered by the learner in a conversation with the conversation apparatus.

The method of claim 1,
wherein selecting the mission item candidate having the highest similarity to the text comprises:
representing the words contained in each of the plurality of mission items and in the text as embedding vectors that carry semantic information, using an embedding model; and
selecting, using the embedding vectors, the mission item candidate having the highest similarity to the text from among the plurality of mission items.

The method of claim 1,
wherein predicting the dialogue state comprising the slot and the slot value linked to the slot comprises:
configuring the mission item candidate as the slot, and configuring the word in the text that corresponds to the mission item candidate as the slot value;
configuring each of the remaining mission items as a slot, and configuring 'none' as the slot value linked to each slot configured from the remaining mission items; and
configuring the other item as a slot denoting 'other', and configuring 'false' as the slot value linked to 'other'.

The method of claim 4,
wherein outputting the system utterance corresponding to the 'mission dialogue' comprises:
when the slot value linked to 'other' is 'false', determining that the learner has uttered appropriately for the one mission item, and recognizing the dialogue type between the learner and the conversation apparatus as the 'mission dialogue'; and
extracting, from a dialogue corpus, a first response expression indicating an appropriate reaction to the learner's utterance for the one mission item and a second response expression for a next mission item selected from among the remaining mission items, and outputting the system utterance representing the extracted first and second response expressions.

The method of claim 4,
wherein, when the step of generating the text corresponding to the learner's speech produces text that contains no word related to the plurality of mission items given by the conversation apparatus,
the step of configuring the slot value linked to 'other'
is a step of changing the slot value indicating 'false' to a slot value indicating 'true'.

The method of claim 6, further comprising:
when the slot value linked to 'other' is 'true', determining that the learner has not uttered appropriately for any of the plurality of mission items, and recognizing the dialogue type between the learner and the conversation apparatus as the 'free dialogue'; and
outputting the system utterance comprising an arbitrary first response expression that prompts the learner to concentrate on the conversation, given that the learner has not uttered appropriately for any of the plurality of mission items, and a second response expression representing a question about the one mission item or another mission item.

The method of claim 5, further comprising,
after the step of outputting the system utterance representing the extracted first and second response expressions:
when the learner, in response to the system utterance according to the second response expression, utters speech containing words related to all of the remaining mission items, generating, by the speech recognition module, second text corresponding to the learner's speech containing words related to all of the remaining mission items;
selecting, by the mission item candidate selection module, all of the remaining mission items as mission item candidates;
predicting, by the dialogue state prediction module, a dialogue state comprising a slot and a slot value linked to the slot, using the second text, the remaining mission items, and an other item unrelated to the remaining mission items; and
outputting, by the system utterance generation module, a system utterance corresponding to the 'mission dialogue' based on the predicted dialogue state.

The method of claim 8,
wherein predicting the dialogue state comprising the slot and the slot value linked to the slot comprises:
configuring the remaining mission items as slots, and configuring, from among the words contained in the second text, the word corresponding to each remaining mission item as its slot value; and
configuring the other item as a slot denoting 'other', and configuring 'false' as the slot value linked to 'other'.

The method of claim 9,
wherein outputting the system utterance corresponding to the 'mission dialogue' comprises:
when the slot value linked to 'other' is 'false', determining that the learner has uttered appropriately for the remaining mission items, and recognizing the dialogue type between the learner and the conversation apparatus as the 'mission dialogue'; and
because the learner has uttered appropriately for all of the plurality of mission items including the remaining mission items, extracting from the dialogue corpus a third response expression representing a greeting that ends the conversation with the conversation apparatus, and outputting the system utterance representing the extracted third response expression.