KR102543926B1

KR102543926B1 - User Equipment with Artificial Intelligence for Foreign Language Education and Method for Foreign Language Education

Info

Publication number: KR102543926B1
Application number: KR1020200113694A
Authority: KR
Inventors: 차형경
Original assignee: 차형경
Priority date: 2020-09-07
Filing date: 2020-09-07
Publication date: 2023-06-15
Also published as: KR20230086647A; KR20220032200A

Abstract

The present invention relates to a user device that has an artificial intelligence function for foreign language education and provides a voice response according to a grammatical error or pronunciation error in the user's speech, and to a method of providing foreign language education through online access of the user device.
The user device determines whether an error is detected while recognizing the user's speech as text and, if so, whether the error is a pronunciation error or a grammatical error. If the error is a pronunciation error, the device provides, through a speaker, a first voice response including an expression that can replace the erroneous part. If the error is a grammatical error, the device provides, through the speaker, a second voice response including a pre-prepared grammar explanation corresponding to the erroneous part.

Description

User Equipment with Artificial Intelligence for Foreign Language Education and Method for Foreign Language Education

The present invention relates to a user device that has an artificial intelligence function for foreign language education and provides a voice response according to a grammatical error or pronunciation error in the user's speech, and to a method of providing foreign language education through online access of the user device.

Recently, with the spread of artificial intelligence (AI) speakers, services have become available that control household appliances through voice recognition and, going further, hold conversations with AI characters.

Some users also use these AI speakers for foreign language learning by giving them commands in a foreign language. However, current AI speakers merely recognize a user's foreign-language command and control a device or respond accordingly; they cannot be used for more advanced foreign language education.

Online services also exist that recognize a user's voice and teach pronunciation. However, these services only provide a more accurate pronunciation corresponding to the user's speech; they do not distinguish whether the user's foreign-language speech has a grammatical problem or a pronunciation problem.

To solve these problems, an embodiment of the present invention efficiently determines whether the user's foreign-language speech contains an error and, if so, whether it is a pronunciation problem or a grammatical problem, and provides an appropriate educational service according to this classification.

In addition, an embodiment of the present invention provides an interface for delivering an efficient educational service to the user based on the above classification of the user's speech.

The objects of the present invention are not limited to those described above; various other objects are presented in the following description.

In one aspect of the present invention for solving the above problems, a user device having an artificial intelligence function for foreign language education is proposed, including: a voice input device that receives the user's speech; a processor that recognizes and processes the user's speech as text; and a speaker that provides a voice response to the user according to the processing of the processor. The processor (a) determines whether an error is detected in the process of recognizing the user's speech as text and, if so, whether the error is a pronunciation error or a grammatical error; (b) if the error is a pronunciation error, provides through the speaker a first voice response including an expression that can replace the erroneous part; and (c) if the error is a grammatical error, provides through the speaker a second voice response including a pre-prepared grammar explanation corresponding to the erroneous part.
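
The decision flow of operations (a) to (c) can be sketched as follows. This is a minimal Python illustration, not the patented implementation: the `Recognition` record, the `ErrorType` enum and the wording of each reply are hypothetical, and step (a) (the actual error detection) is assumed to have already filled in the record. The no-error branch is only a placeholder for an ordinary conversational reply.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class ErrorType(Enum):
    NONE = auto()
    PRONUNCIATION = auto()
    GRAMMAR = auto()


@dataclass
class Recognition:
    """Hypothetical result of step (a): recognized text plus its error classification."""
    text: str
    error_type: ErrorType
    erroneous_part: Optional[str] = None
    replacement: Optional[str] = None          # filled in for pronunciation errors
    grammar_explanation: Optional[str] = None  # filled in for grammatical errors


def build_voice_response(rec: Recognition) -> str:
    """Choose the spoken reply for steps (b) and (c)."""
    if rec.error_type is ErrorType.PRONUNCIATION:
        # (b) first voice response: offer an expression that can replace the erroneous part
        return f"Did you mean '{rec.replacement}'?"
    if rec.error_type is ErrorType.GRAMMAR:
        # (c) second voice response: read out the pre-prepared grammar explanation
        return rec.grammar_explanation or ""
    # No error: placeholder for an ordinary conversational reply
    return f"You said: {rec.text}"
```

Keeping classification and response generation separate means the same record could also drive an on-screen display instead of a spoken reply.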

Here, the processor may use a first grammar model, which is used to recognize the user's speech as text, and a second grammar model, which is used after the user's speech has been recognized as text to detect grammatical errors present in the recognized sentence.

The second grammar model may be configured to be updated through learning, based on data from a web server, with pairs consisting of each grammatical error and the grammar explanation corresponding to that error.

When there is no error, the processor may provide, through the speaker, a third voice response that replies conversationally to the content of the user's speech recognized as text.

The processor may recognize the user's speech as text by extracting features from the user's speech, detecting candidate pronunciation sequences corresponding to those features, and selecting from the candidate pronunciation sequences a word sequence that can be formed with at least a predetermined probability using the first grammar model.

The processor may determine that the error is a pronunciation error when no word sequence can be formed from the candidate pronunciation sequences using the first grammar model.

The processor may also determine that the error is a pronunciation error when a word sequence is selected from the candidate pronunciation sequences using the first grammar model, but the difference between that selected word sequence and the text recognized with the highest probability among the candidate pronunciation sequences is greater than or equal to a predetermined threshold.
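
The two pronunciation-error rules above can be sketched as one predicate. The patent does not say how the "difference" between the top-probability text and the selected word sequence is measured; this sketch assumes a character-level dissimilarity of 1 minus `difflib.SequenceMatcher.ratio()`, and the threshold `max_difference=0.2` is a hypothetical placeholder.

```python
from difflib import SequenceMatcher
from typing import Optional


def is_pronunciation_error(
    top_raw_text: str,
    selected_text: Optional[str],
    max_difference: float = 0.2,  # hypothetical threshold
) -> bool:
    """Classify a recognition error as a pronunciation error.

    top_raw_text:  text recognized with the highest probability among the candidates
    selected_text: word sequence chosen with the first grammar model, or None if none
    """
    # Rule 1: no combinable word sequence could be selected at all.
    if selected_text is None:
        return True
    # Rule 2: a sequence was selected, but it differs too much from the top candidate.
    difference = 1.0 - SequenceMatcher(None, top_raw_text, selected_text).ratio()
    return difference >= max_difference
```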

In operation (b), the processor may receive, through the voice input device, the user's response to the expression that can replace the erroneous part, and recognize the user's speech as text according to that response.

The processor may distinguish between native-language speech and foreign-language speech in the user's speech, and perform operations (a) to (c) when the user's speech is determined to be foreign-language speech.
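
The patent does not specify how native-language and foreign-language speech are told apart. Purely as an illustration, assuming Korean is the native language and that a first-pass transcript is available, the check could look at the share of Hangul letters in that transcript; the function names and the 0.5 cutoff are hypothetical.

```python
def is_hangul(ch: str) -> bool:
    """True for Hangul syllables and jamo (the assumed native script)."""
    code = ord(ch)
    return (
        0xAC00 <= code <= 0xD7A3      # precomposed Hangul syllables
        or 0x1100 <= code <= 0x11FF   # Hangul Jamo
        or 0x3130 <= code <= 0x318F   # Hangul Compatibility Jamo
    )


def is_foreign_speech(transcript: str, max_native_ratio: float = 0.5) -> bool:
    """Treat the utterance as foreign-language speech if few of its letters are Hangul."""
    letters = [ch for ch in transcript if ch.isalpha()]
    if not letters:
        return False  # nothing to judge, so skip the error checks
    native_share = sum(is_hangul(ch) for ch in letters) / len(letters)
    return native_share < max_native_ratio
```

Only utterances for which `is_foreign_speech` returns True would go on to operations (a) to (c); a production system would more likely use an acoustic language-identification model.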

The user's speech may include a command for changing the operating mode of the user device. According to the command, the processor may selectively operate in (1) a question-and-answer mode, in which it processes the user's speech one sentence at a time and provides the voice response through the speaker, or (2) a long-form recording mode, in which it continuously records the user's speech for a predetermined time and then provides the voice response through the speaker.
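
A minimal sketch of the two operating modes, assuming spoken mode-change commands are matched as fixed phrases and that each recognized sentence carries a timestamp in seconds. The phrases "question mode" and "recording mode" and the 60-second window are hypothetical; the patent only says the long-form mode records for a predetermined time.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Mode(Enum):
    QA = "question-and-answer"         # respond after every sentence
    LONG_FORM = "long-form recording"  # record for a fixed period, then respond


# Hypothetical spoken commands; the patent does not give the actual wording.
MODE_COMMANDS: Dict[str, Mode] = {
    "question mode": Mode.QA,
    "recording mode": Mode.LONG_FORM,
}


def update_mode(current: Mode, utterance: str) -> Mode:
    """Switch mode if the utterance is a mode command, otherwise keep the current mode."""
    return MODE_COMMANDS.get(utterance.strip().lower(), current)


def group_for_response(
    utterances: List[Tuple[float, str]], mode: Mode, window_s: float = 60.0
) -> List[List[str]]:
    """Group (timestamp, sentence) pairs into units that each receive one voice response."""
    if mode is Mode.QA:
        return [[text] for _, text in utterances]
    windows: Dict[int, List[str]] = {}
    for t, text in utterances:
        windows.setdefault(int(t // window_s), []).append(text)
    return [windows[k] for k in sorted(windows)]
```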

The user device may be wirelessly connected to one or more home devices in an Internet of Things (IoT) manner, and may control the home devices through native-language speech, foreign-language speech, or both.

In another aspect of the present invention for solving the above problems, a method of providing foreign language education through online access of a user device is provided. The method includes: recognizing the pronunciation of a foreign-language sentence spoken by the user and input through the microphone of the user device, and converting it into text; determining whether the converted sentence contains an error; if the erroneous part is a grammatical error, displaying a pre-prepared grammar explanation corresponding to the erroneous part on the display of the user device together with highlighting of the erroneous part; and if the erroneous part is a pronunciation error, recommending an expression that can replace the erroneous part by displaying it on the display of the user device together with highlighting of the erroneous part, and presenting, through the speaker of the user device, the recommended pronunciation of an example sentence containing the recommended expression.

Recommending an expression that can replace the erroneous part may include: recommending a plurality of expressions and displaying them on the display of the user device; determining the expression selected through an input device of the user device as the recommended expression; and storing the mapping between the erroneous part of the user's spoken foreign-language sentence and the determined recommended expression in a database of the server providing the foreign language education, thereby training the server on an artificial intelligence basis.
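
The mapping-storage step can be sketched with Python's built-in `sqlite3` module. Here the "learning" is reduced to counting how often users pick each recommendation for a given erroneous part, so that frequently chosen expressions can be offered first; the table and function names are hypothetical, and a real server would likely use a richer model.

```python
import sqlite3
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pronunciation_map (
               erroneous_part TEXT NOT NULL,
               recommended    TEXT NOT NULL,
               times_selected INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (erroneous_part, recommended)
           )"""
    )
    return conn


def record_selection(conn: sqlite3.Connection, erroneous_part: str, recommended: str) -> None:
    """Store the user's choice; repeated choices increase its count."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO pronunciation_map (erroneous_part, recommended) VALUES (?, ?)",
            (erroneous_part, recommended),
        )
        conn.execute(
            "UPDATE pronunciation_map SET times_selected = times_selected + 1 "
            "WHERE erroneous_part = ? AND recommended = ?",
            (erroneous_part, recommended),
        )


def ranked_recommendations(conn: sqlite3.Connection, erroneous_part: str) -> List[str]:
    """Previously chosen replacements for this erroneous part, most popular first."""
    rows = conn.execute(
        "SELECT recommended FROM pronunciation_map WHERE erroneous_part = ? "
        "ORDER BY times_selected DESC, recommended",
        (erroneous_part,),
    ).fetchall()
    return [row[0] for row in rows]
```

`INSERT OR IGNORE` followed by `UPDATE` is used instead of an upsert so the sketch also runs on older SQLite versions.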

Correspondence data between erroneous parts and pre-prepared grammar explanations may be accumulated by artificial intelligence in the database of the server providing the foreign language education, and an erroneous part may be determined to be a grammatical error when a pre-prepared grammar explanation corresponding to it exists.

According to the embodiments of the present invention described above, it is possible to efficiently determine whether the user's foreign-language speech contains an error and, if so, whether it is a pronunciation problem or a grammatical problem, and to provide an appropriate educational service according to this classification.

In addition, according to an embodiment of the present invention, an interface can be provided for delivering an efficient educational service to the user based on the above classification of the user's speech.

The effects of the present invention are not limited to those described above; various other effects are presented in the following description.

FIG. 1 is a diagram illustrating the concept of a user device according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the operation of a user device according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating in detail the process of recognizing a user's speech according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the concept of controlling home devices through a user device according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a method of providing foreign language education according to an embodiment of the present invention.

Hereinafter, the present invention will be described in detail with reference to the drawings.

FIG. 1 is a diagram illustrating the concept of a user device according to an embodiment of the present invention.

Referring to FIG. 1, the user device having an artificial intelligence function for foreign language education according to this embodiment may take the form of an AI speaker 10. The AI speaker 10 may be connected, wirelessly or by wire, to user devices with various software processing capabilities, such as a smartphone 20a or a laptop computer 20b.

In the following description, the user device may be built into the AI speaker 10 itself. Alternatively, the voice input device and speaker that provide the voice interface with the user may be located in the AI speaker 10, while the processor that processes this information is located in another user device 20a or 20b connected to the AI speaker 10.

Specifically, the user device according to this embodiment may include a voice input device that receives the user's speech; a processor that recognizes and processes the user's speech as text; and a speaker that provides a voice response to the user according to the processing of the processor. The user's speech may be a native-language speech signal or a foreign-language speech signal, as described later.

The voice input device is preferably provided in the AI speaker 10 shown in FIG. 1, but may instead be provided in another user device (20a, 20b, etc.).

FIG. 2 is a diagram illustrating the operation of a user device according to an embodiment of the present invention.

The processor according to this embodiment may first determine whether an error is detected in the process of recognizing the user's speech as text (S310). If there is an error, the processor may further determine whether the error is a pronunciation error or a grammatical error (S320).

If the error is a pronunciation error, the processor may provide, through the speaker, a first voice response including an expression that can replace the erroneous part. For example, suppose the user intends the foreign-language sentence “Today, I'm gonna finish this patent specification drafting.” and speaks it to the user device, but the text is recognized as “Today, I'm gonna finish this patent specticet drafting,” so the problem is judged to lie in the pronunciation of the word 'specification'. The user device according to this embodiment may then provide through the speaker the voice response “Did you mean 'patent specification'?” as a first voice response containing the expression that can replace the erroneous one, namely 'specification'.
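
One simple way to find the replacement expression in this example is fuzzy matching of the misrecognized word against a vocabulary. The sketch below uses `difflib.get_close_matches` from the Python standard library; the vocabulary list and the 0.6 cutoff (the function's default) are assumptions, and the patent does not state that string similarity is used.

```python
from difflib import get_close_matches
from typing import List, Optional


def suggest_replacement(word: str, vocabulary: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the vocabulary entry most similar to the misrecognized word, if any."""
    matches = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def first_voice_response(context_word: str, suspect: str, vocabulary: List[str]) -> Optional[str]:
    """Build the 'Did you mean ...?' prompt, or None when no replacement is found."""
    replacement = suggest_replacement(suspect, vocabulary)
    if replacement is None:
        return None
    return f"Did you mean '{context_word} {replacement}'?"
```

With the vocabulary `["patent", "specification", "drafting", "finish", "today"]`, the call `first_voice_response("patent", "specticet", vocab)` produces the prompt from the example above.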

In one embodiment, the processor may receive through the voice input device a response such as 'Yes, you are right' from the user in reply to this speaker output, that is, to the expression that can replace the erroneous part, and recognize the user's speech as text according to the user's response. The user device according to this embodiment may then provide, through the speaker, the third voice response described later.
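
The confirmation step can be sketched as follows, assuming the user's reply has already been transcribed and that simple keyword cues are enough to tell "yes" from "no". The cue lists and function names are hypothetical.

```python
import re

AFFIRMATIVE = {"yes", "yeah", "yep", "right", "correct"}  # hypothetical cues
NEGATIVE = {"no", "nope", "not"}


def is_affirmative(reply: str) -> bool:
    """Crude yes/no detection; any negative cue overrides affirmative ones."""
    words = re.findall(r"[a-z']+", reply.lower())
    if any(w in NEGATIVE for w in words):
        return False
    return any(w in AFFIRMATIVE for w in words)


def resolve_transcript(recognized: str, erroneous: str, replacement: str, reply: str) -> str:
    """Apply the suggested replacement only if the user confirmed it."""
    if is_affirmative(reply):
        return recognized.replace(erroneous, replacement, 1)
    return recognized
```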

When the error is a grammatical error, the processor may provide through the speaker a second voice response including a pre-prepared grammar explanation corresponding to the erroneous part. For example, when the user inputs the foreign-language sentence “Today, I gonna finish this patent specification drafting,” the device recognizes, based on the grammatical error model described later, that the user's speech contains a grammatical error, and provides a second voice response accordingly. In this case, because the user device can still fully understand the intent of the user's sentence, it may provide a second voice response that also includes the third voice response for the error-free case, described later.

In the above example, the processor of the user device may provide a second voice response such as “Good luck Aaron, go for it! But, you'd better put a 'be' verb in front of 'gonna'.”
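
A rule for the specific error in this example can be sketched with a regular expression. This only illustrates how one (error, explanation) pair might be detected and spoken; the pattern, the explanation wording and the function names are hypothetical, and the conversational part of the reply is passed in rather than generated.

```python
import re
from typing import Optional

# Hypothetical rule: a subject pronoun followed directly by "gonna" with no be verb.
MISSING_BE_BEFORE_GONNA = re.compile(r"\b(?:I|you|we|they|he|she|it)\s+gonna\b", re.IGNORECASE)
EXPLANATION = "But, you'd better put a 'be' verb in front of 'gonna'."


def find_missing_be(sentence: str) -> Optional[str]:
    """Return the erroneous span, or None if the rule does not fire."""
    match = MISSING_BE_BEFORE_GONNA.search(sentence)
    return match.group(0) if match else None


def second_voice_response(sentence: str, conversational_reply: str) -> str:
    """Combine the normal reply with the grammar explanation when the rule fires."""
    if find_missing_be(sentence) is None:
        return conversational_reply
    return f"{conversational_reply} {EXPLANATION}"
```

Note that "I'm gonna" does not match, because the apostrophe after "I" breaks the required whitespace.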

If, in step S310, there is no error in the user's speech, the user device according to this embodiment may provide through the speaker a third voice response that replies conversationally to the content of the user's speech recognized as text. With recent advances in AI technology, the user device according to this embodiment can hold natural-language conversations with the user and, as described later, can also control home appliances wirelessly connected in an Internet of Things (IoT) manner.

FIG. 3 is a diagram illustrating in detail the process of recognizing a user's speech according to an embodiment of the present invention.

Referring to FIG. 3, the device 100 that performs speech recognition may include a feature extractor 110, a candidate pronunciation sequence detector 120, and a language selector 140. The feature extractor 110 extracts feature information from the input speech signal. The candidate pronunciation sequence detector 120 detects at least one candidate pronunciation sequence from the extracted feature information. The language selector 140 selects the final recognized text based on the occurrence probability of each candidate pronunciation sequence. The language selector 140 may also use the pronunciation dictionary 150 to find the word corresponding to each candidate pronunciation sequence, and select the final recognized text based on the occurrence probabilities of the detected words. A word's occurrence probability is the probability that the word appears in the recognized text when speech recognition is performed. Each component of the device 100 is described in detail below.

When the feature extractor 110 receives a speech signal, it may detect only the portion actually uttered by the speaker and extract information representing the characteristics of the speech signal. This information may include, for example, information indicating the shape of the mouth or the position of the tongue according to the waveform of the speech signal. A pronunciation sequence corresponding to the speech signal may be detected based on the feature information extracted by the feature extractor 110.
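
Keeping only the portion actually uttered by the speaker is commonly done with voice activity detection. As an assumption-laden sketch (the patent does not name a method), a short-time energy gate over fixed, non-overlapping frames could look like this; 160 samples per frame corresponds to 10 ms at an assumed 16 kHz sample rate, and the energy threshold is a placeholder.

```python
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame (a trailing partial frame is dropped)."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def voiced_frames(samples: Sequence[float], frame_len: int = 160, threshold: float = 1e-3) -> List[int]:
    """Indices of frames whose energy exceeds the threshold, i.e. frames kept as speech."""
    return [i for i, e in enumerate(frame_energies(samples, frame_len)) if e > threshold]
```

The kept frames would then be passed to a feature computation step; that step is not shown here.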

The candidate pronunciation sequence detector 120 may use the extracted feature information and the acoustic model 130 to detect at least one candidate pronunciation sequence (pronunciation variant) that can match the speech signal. Several candidate pronunciation sequences may be detected for one speech signal. For example, because 'th' and 'd' sound similar, multiple candidate pronunciation sequences containing similar sounds such as 'th' and 'd' may be detected for the same speech signal. Candidate pronunciation sequences may be detected in units of words, but are not limited to this; they may also be detected in various other units such as phonological units, phonemes, and syllables.
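
The 'th'/'d' example can be illustrated by expanding a pronunciation string into every variant allowed by a small confusion table. This is a toy stand-in for the acoustic model 130, which in practice would score alternatives rather than enumerate them; the `CONFUSABLE` table and the tokenization are hypothetical.

```python
from itertools import product
from typing import Dict, List

# Hypothetical confusion sets: each token may have been heard as any of its alternatives.
CONFUSABLE: Dict[str, List[str]] = {"th": ["th", "d"], "d": ["d", "th"]}


def tokenize(pron: str) -> List[str]:
    """Split a pronunciation string into tokens, treating 'th' as one unit."""
    tokens, i = [], 0
    while i < len(pron):
        if pron.startswith("th", i):
            tokens.append("th")
            i += 2
        else:
            tokens.append(pron[i])
            i += 1
    return tokens


def candidate_sequences(pron: str) -> List[str]:
    """All candidate pronunciation sequences produced by swapping confusable tokens."""
    options = [CONFUSABLE.get(tok, [tok]) for tok in tokenize(pron)]
    return ["".join(combo) for combo in product(*options)]
```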

The acoustic model 130 may include information for detecting candidate pronunciation sequences from the feature information of a speech signal. The acoustic model 130 may be generated statistically from a large amount of speech data. For example, it may be generated from utterance data of many unspecified speakers, or from utterance data collected from a specific speaker. Accordingly, there may be acoustic models 130 that are applied individually to each speaker during speech recognition.

The language selector 140 may use the pronunciation dictionary 150 and the grammar model 160 to obtain the occurrence probability of each candidate pronunciation sequence detected by the candidate pronunciation sequence detector 120. The language selector 140 then selects the final recognized text based on these occurrence probabilities.

The language selector 140 may also use the pronunciation dictionary 150 to obtain the word corresponding to each candidate pronunciation sequence, and obtain the occurrence probability of each such word using the grammar model 160. The language selector 140 may finally select the candidate pronunciation sequence whose word has the highest occurrence probability, and the word corresponding to that sequence may be output as the recognized word.

The pronunciation dictionary 150 may include the information needed to obtain the word corresponding to a candidate pronunciation sequence detected by the candidate pronunciation sequence detector 120. The pronunciation dictionary 150 may be built from the pronunciation sequences obtained according to the phonological variations of each word.

The pronunciation of a word is not consistent, because it can change depending on the surrounding words, its position in the sentence, the characteristics of the speaker, and so on. The occurrence probability value is the probability that the current word appears, or the probability that the current word appears together with a specific word. By using the occurrence probability values of the words or pronunciation sequences included in the grammar model 160, the device 100 can perform context-aware speech recognition.

The device 100 may perform speech recognition by obtaining the words for the candidate pronunciation sequences using the pronunciation dictionary 150 and obtaining the occurrence probabilities of those words using the grammar model 160. Alternatively, the device 100 may obtain the occurrence probabilities of the candidate pronunciation sequences directly from the grammar model 160, without first looking up the corresponding words in the pronunciation dictionary 150.

For example, in Korean, when the candidate pronunciation sequence detector 120 detects the candidate pronunciation sequence '학꾜 (hakkkyo)', the language selector 140 may use the pronunciation dictionary 150 to obtain the word '학교' (school) as the word corresponding to it. As another example, in English, when the candidate pronunciation sequence detector 120 detects the candidate pronunciation sequence 'skul', the language selector 140 may use the pronunciation dictionary 150 to obtain the word 'school'. The language selector 140 may then finally select the word corresponding to the speech signal based on the occurrence probability of the word '학교' or 'school', and output the selected word.
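
The dictionary lookup and probability-based selection in this example can be sketched as follows. The dictionary entries (including the extra 'skil' → 'skill' entry) and the unigram probabilities are invented for illustration only; they stand in for the pronunciation dictionary 150 and the grammar model 160.

```python
from typing import Dict, List, Optional, Tuple

# Toy pronunciation dictionary (pronunciation sequence -> word); entries are illustrative.
PRONUNCIATION_DICT: Dict[str, str] = {
    "skul": "school",
    "skil": "skill",
    "hakkkyo": "학교",
}

# Toy unigram occurrence probabilities standing in for the grammar model.
WORD_PROB: Dict[str, float] = {"school": 0.004, "skill": 0.001, "학교": 0.003}


def select_word(candidates: List[str]) -> Optional[Tuple[str, float]]:
    """Map each candidate to a word and keep the word with the highest probability."""
    best: Optional[Tuple[str, float]] = None
    for pron in candidates:
        word = PRONUNCIATION_DICT.get(pron)
        if word is None:
            continue  # no dictionary entry for this candidate
        p = WORD_PROB.get(word, 0.0)
        if best is None or p > best[1]:
            best = (word, p)
    return best
```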

The grammar model 160 may include occurrence probability information for words, which may exist for each word. The device 100 may obtain from the grammar model 160 the occurrence probabilities of the words included in each candidate pronunciation sequence.

For example, the grammar model 160 may include P(B|A), the probability that the current word B appears given that word A appeared before it. In other words, the occurrence probability P(B|A) of word B may be conditioned on word A appearing before word B. As another example, the grammar model 160 may include P(B|A C), which is conditioned on a plurality of words, A and C, appearing before word B; that is, P(B|A C) may be conditioned on both A and C appearing before B. As yet another example, the grammar model 160 may include the unconditional occurrence probability P(B) of word B, which is the probability that word B appears during speech recognition.
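
The three kinds of occurrence probability above, P(B), P(B|A) and P(B|A C), can be estimated from word counts. The sketch below uses unsmoothed maximum-likelihood estimates on a toy corpus and reads P(B|A C) as the probability of B immediately following the word pair A C; both the corpus and that reading are assumptions, and practical language models add smoothing for unseen word combinations.

```python
from collections import Counter
from typing import Iterable, List


class NGramModel:
    """Maximum-likelihood estimates of P(B), P(B|A) and P(B|A C) from tokenized sentences."""

    def __init__(self, sentences: Iterable[List[str]]):
        self.uni: Counter = Counter()
        self.bi: Counter = Counter()
        self.tri: Counter = Counter()
        self.ctx1: Counter = Counter()  # how often a word is followed by another word
        self.ctx2: Counter = Counter()  # how often a word pair is followed by another word
        for w in sentences:
            self.uni.update(w)
            self.bi.update(zip(w, w[1:]))
            self.tri.update(zip(w, w[1:], w[2:]))
            self.ctx1.update(w[:-1])
            self.ctx2.update(zip(w, w[1:-1]))
        self.total = sum(self.uni.values())

    def p(self, b: str) -> float:
        """P(B): unconditional occurrence probability."""
        return self.uni[b] / self.total if self.total else 0.0

    def p_given(self, b: str, a: str) -> float:
        """P(B|A): probability of B right after A."""
        return self.bi[(a, b)] / self.ctx1[a] if self.ctx1[a] else 0.0

    def p_given2(self, b: str, a: str, c: str) -> float:
        """P(B|A C): probability of B right after the pair A C."""
        return self.tri[(a, c, b)] / self.ctx2[(a, c)] if self.ctx2[(a, c)] else 0.0
```

On the corpus `[["i", "am", "gonna", "go"], ["i", "am", "here"], ["you", "are", "gonna", "go"]]`, the model gives P(am|i) = 1.0 and P(gonna|i am) = 0.5.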

Using the grammar model 160, the language selector 140 of the device 100 may make the final determination of the recognized word based on the occurrence probability information of the word corresponding to each candidate pronunciation sequence. That is, the device 100 may select the word with the highest occurrence probability as the recognized word. The language selector 140 may output the recognized word as text.
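The final selection step reduces to taking the candidate with the highest occurrence probability. In this minimal sketch, `select_word` is a hypothetical helper. The probability values are illustrative only and do not come from the specification.

```python
from typing import Callable, Optional


def select_word(candidate_words: list[str], prob: Callable[[str], float]) -> Optional[str]:
    """Return the candidate with the highest occurrence probability, or None if there are none."""
    return max(candidate_words, key=prob, default=None)


# Illustrative probabilities only.
probabilities = {"school": 0.02, "skull": 0.001}
print(select_word(["skull", "school"], lambda w: probabilities.get(w, 0.0)))
```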

Meanwhile, the language selector 140 according to an embodiment of the present invention may use the grammatical error model 170 to determine whether the user language recognized using the grammar model 160 contains grammatical errors.

That is, the processor according to this embodiment can be viewed as using two grammar models. The first grammar model 160 is used to recognize the user's voice as text. The second grammar model 170 is used additionally, after the voice has been recognized as text, to detect grammatical errors present in the recognized text.

The second grammar model 170 may be configured to update, in pairs and on a learning basis using data from a web server, each grammatical error and the grammar explanation corresponding to it. For example, it may update, as pairs, grammatical errors that users frequently make (e.g., tense, singular/plural forms, sentence structure), accumulated through an online education service or the like, together with explanations of those errors.
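One way to picture the pair store of the second grammar model 170 is a table of (error, explanation) pairs with a frequency counter. This is a hypothetical sketch with several assumptions:

* Each record fetched from the web server is an `(error, explanation)` tuple.
* A newer explanation for the same error replaces the older one.
* The class name and record format are illustrations, not the specification's design.

```python
from collections import Counter


class GrammarErrorModel:
    """Hypothetical sketch of the second grammar model 170 as (error, explanation) pairs."""

    def __init__(self) -> None:
        self.explanations: dict[str, str] = {}
        self.frequency: Counter = Counter()

    def update(self, records: list[tuple[str, str]]) -> None:
        """Merge (error, explanation) records, e.g. collected from a web server.

        A newer explanation for the same error replaces the older one, and each
        record also counts as one more observation of that error.
        """
        for error, explanation in records:
            self.explanations[error] = explanation
            self.frequency[error] += 1

    def most_frequent(self, k: int = 3) -> list[str]:
        """Return the k errors observed most often so far."""
        return [error for error, _ in self.frequency.most_common(k)]


model = GrammarErrorModel()
model.update([
    ("tense", "Use the past tense for actions that are already finished."),
    ("plural", "Countable nouns usually take -s in the plural."),
    ("tense", "Use the past tense for actions that are already finished."),
])
print(model.most_frequent(1))
```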

Of course, grammatical errors and their explanations need not have a one-to-one relationship. For example, the second voice response example of FIG. 2 may be paired with 'the grammar explanation of the be going to construction'. That same explanation may also be linked to grammatical errors in many other sentences besides the example of FIG. 2.
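The many-to-one link between errors and explanations can be modeled by keying explanations by a grammar-point ID and pointing several error patterns at the same ID. The pattern strings, the `be_going_to` ID and the helper names below are hypothetical.

```python
from typing import Optional

# Hypothetical explanation table: one explanation per grammar point.
EXPLANATIONS: dict[str, str] = {
    "be_going_to": "'Be going to' needs a form of 'be': I am / you are / she is going to ...",
}

# Several different error patterns can point to the same explanation.
ERROR_TO_EXPLANATION: dict[str, str] = {
    "I gonna": "be_going_to",
    "she going to": "be_going_to",
    "they gonna": "be_going_to",
}


def explanation_for(error_pattern: str) -> Optional[str]:
    """Return the explanation linked to an error pattern, if any."""
    key = ERROR_TO_EXPLANATION.get(error_pattern)
    return EXPLANATIONS.get(key) if key is not None else None


def errors_sharing(explanation_id: str) -> list[str]:
    """List every error pattern linked to one explanation."""
    return [err for err, exp in ERROR_TO_EXPLANATION.items() if exp == explanation_id]
```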

To summarize FIG. 3, the processor according to this embodiment may recognize the user's voice as text in three steps:
(110) it extracts features from the user's voice;
(120) it detects candidate pronunciation sequences corresponding to those features;
(140) it uses the first grammar model 160 to select, from among the candidate pronunciation sequences, language that can be formed with at least a predetermined probability.

Here, if no combinable language can be selected from the candidate pronunciation sequences using the first grammar model 160, the processor may determine that the error is a pronunciation error.

Alternatively, the processor may select a combinable language from among the candidate pronunciation sequences using the first grammar model 160. It may then determine that the error is a pronunciation error if the difference between the text recognized with the highest probability among the candidate pronunciation sequences and the selected language is equal to or greater than a predetermined criterion.
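The two pronunciation-error rules above can be sketched as a single predicate. The specification does not say how the "difference" is measured. This sketch assumes a character-level dissimilarity of 1 minus the `difflib.SequenceMatcher` ratio. It also assumes a hypothetical threshold of 0.3 as the predetermined criterion.

```python
from difflib import SequenceMatcher
from typing import Optional


def is_pronunciation_error(
    top_candidate: str, selected: Optional[str], max_difference: float = 0.3
) -> bool:
    """Classify a recognition error as a pronunciation error.

    Rule 1: no combinable language could be selected (selected is None).
    Rule 2: the selected language differs from the highest-probability
    candidate by at least max_difference (difference = 1 - similarity ratio).
    """
    if selected is None:
        return True
    difference = 1.0 - SequenceMatcher(None, top_candidate, selected).ratio()
    return difference >= max_difference
```

With these assumptions, 'skul' against 'school' (ratio 0.4) counts as a pronunciation error. 'schol' against 'school' (ratio about 0.91) does not.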

The above description assumes that the user speaks in a foreign language, but the invention need not be limited thereto.

For example, the processor according to an embodiment of the present invention may distinguish native-language speech from foreign-language speech in the user's voice. When the processor determines that the user's voice is foreign-language speech, it may be configured to perform the operations described above with reference to FIGS. 2 and 3.
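The specification does not say how native and foreign speech are told apart. A practical system would likely use acoustic language identification. The branching itself can still be illustrated on already-recognized text. This rough sketch assumes Korean is the native language. It uses the share of Hangul syllables as a hypothetical stand-in for a real language-identification step.

```python
HANGUL_FIRST, HANGUL_LAST = "\uac00", "\ud7a3"  # Hangul syllables block


def is_foreign_language(text: str, min_native_ratio: float = 0.5) -> bool:
    """Treat text as foreign-language speech if fewer than min_native_ratio of its letters are Hangul."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    hangul = sum(1 for ch in letters if HANGUL_FIRST <= ch <= HANGUL_LAST)
    return hangul / len(letters) < min_native_ratio


def handle_utterance(text: str) -> str:
    # Only foreign-language speech goes through the error checks of FIGS. 2 and 3.
    return "check_errors" if is_foreign_language(text) else "native_command"
```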

Meanwhile, in one embodiment of the present invention, the user's voice may include a command for changing the operation mode of the user device. According to the command, the processor may selectively operate in one of two modes:
(1) a question-and-answer mode, in which it processes the user's voice one sentence at a time and provides the voice response through the speaker;
(2) a long-form recording mode, in which it continuously records the user's voice for a predetermined time and then provides the voice response through the speaker.
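The mode switching can be sketched as a small state machine. In question-and-answer mode every sentence is answered at once. In long-form mode, utterances are buffered until the recording time has elapsed. The command phrases, class names and 60-second default are hypothetical. The default only echoes the "about one minute" mentioned for the long-form mode.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    QUESTION_ANSWER = "question_answer"
    LONG_FORM = "long_form"


# Hypothetical voice commands for switching modes.
COMMANDS = {
    "switch to question mode": Mode.QUESTION_ANSWER,
    "switch to long form mode": Mode.LONG_FORM,
}


class ModeController:
    def __init__(self, record_seconds: float = 60.0) -> None:
        self.mode = Mode.QUESTION_ANSWER
        self.record_seconds = record_seconds
        self._buffer: list[str] = []
        self._elapsed = 0.0

    def _reset(self) -> None:
        self._buffer.clear()
        self._elapsed = 0.0

    def handle(self, utterance: str, duration: float) -> Optional[str]:
        """Return the text to respond to now, or None if no response is due yet."""
        command = COMMANDS.get(utterance.strip().lower())
        if command is not None:
            self.mode = command
            self._reset()
            return None
        if self.mode is Mode.QUESTION_ANSWER:
            return utterance  # respond after every sentence
        self._buffer.append(utterance)
        self._elapsed += duration
        if self._elapsed >= self.record_seconds:
            text = " ".join(self._buffer)  # whole recording answered at once
            self._reset()
            return text
        return None
```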

(1) Question-and-answer mode

The user may configure this mode in advance for each foreign language on a user device such as a mobile phone or laptop computer. At the beginner level of a foreign language it is generally difficult to speak in long sentences. The speaker of the user device according to this embodiment therefore asks a one-sentence question suited to the situation and receives the user's one-sentence answer, thereby providing the foreign language education service described above.

In the question-and-answer mode, it is preferable to give pronunciation errors more weight than grammatical errors when classifying the user's errors, although the mode need not be limited thereto.

(2) Long-form recording mode

As in the question-and-answer mode, the user may configure this mode in advance for each foreign language on a user device such as a mobile phone or laptop computer. At the advanced level of a foreign language, an educationally effective approach may be to receive about one minute of the user's speech and then respond with general conversation (e.g., the third voice response described above). The user might speak, for example, as if keeping a diary.

While conversing with the user in natural language in the foreign language, the device may point out grammatical improvements in the user's speech at a preset frequency. The frequency of grammar corrections may also be adjusted through conversation with the user device according to this embodiment (e.g., an AI speaker). For example, the user can adjust the frequency by saying “Bob (the speaker's character name), reduce the frequency of error correction”.
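The correction frequency is not defined in detail. One hypothetical reading is "mention only every n-th detected issue", with n raised or lowered by voice command. The class name and the matching of command phrases are assumptions made for this sketch.

```python
class CorrectionThrottle:
    """Point out only every n-th detected grammar issue (n is adjustable by voice)."""

    def __init__(self, every_n: int = 1) -> None:
        self.every_n = every_n
        self._seen = 0

    def should_correct(self) -> bool:
        """Call once per detected issue; True means mention it to the user."""
        self._seen += 1
        return self._seen % self.every_n == 0

    def handle_command(self, text: str) -> None:
        lowered = text.lower()
        if "reduce the frequency of error correction" in lowered:
            self.every_n += 1
        elif "increase the frequency of error correction" in lowered and self.every_n > 1:
            self.every_n -= 1
        else:
            return
        self._seen = 0  # restart counting under the new frequency
```

For example, after "Bob, reduce the frequency of error correction", only every second detected issue is mentioned.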

Meanwhile, in one embodiment of the present invention, the following education modes may be provided in addition to the modes described above.

(3) Expression conversion

For a user's spoken expression, the device may suggest a more formal or a more casual expression with the same meaning, thereby introducing the user to a variety of expressions.

For example, suppose the user says 'I'm going to finish this patent specification drafting'. According to a preset mode or a separate request from the user, the user device according to this embodiment may suggest the more casual expression 'I'm gonna finish this patent specification drafting'. Conversely, it may change the user's spoken expression into a more formal one and suggest that instead.
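A toy version of the casual/formal conversion can use a table of phrase pairs applied with word-boundary regular expressions. The table and function names are hypothetical. The rules are deliberately naive: they ignore context, so 'going to the store' would wrongly become 'gonna the store'. A real system would need grammatical analysis or a learned model.

```python
import re

# Hypothetical formal -> casual pairs (context-free, so only a rough illustration).
CASUAL_FORMS = {"going to": "gonna", "want to": "wanna"}


def to_casual(sentence: str) -> str:
    for formal, casual in CASUAL_FORMS.items():
        sentence = re.sub(rf"\b{re.escape(formal)}\b", casual, sentence)
    return sentence


def to_formal(sentence: str) -> str:
    for formal, casual in CASUAL_FORMS.items():
        sentence = re.sub(rf"\b{re.escape(casual)}\b", formal, sentence)
    return sentence


print(to_casual("I'm going to finish this patent specification drafting"))
```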

As a similar function, the device may also present model American English and British English pronunciations of the user's English expression.

(4) Lecture mode

Lectures may be delivered to the user through the user device according to an embodiment of the present invention, typically an AI speaker. A lecture may include interactions that ask the user for a spoken answer.

If a visual display is needed during an audio lecture, visual materials may be shown on a TV, monitor, or the like that is connected by a wireless communication method such as IoT, as described later. If the user cannot easily get to the monitor or TV at that moment, the user may instruct the device to skip the visual materials and continue.

In addition, in this lecture mode or separately from it, the user device according to an embodiment of the present invention may provide an education scheduling service for the user. It may also change the suggested educational content according to the user's achievement in each course.
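How achievement is scored and turned into suggestions is not specified. As one hypothetical policy, courses scoring below a threshold could be suggested for review, weakest first. The score scale (0 to 1), the 0.6 threshold and the function name are assumptions.

```python
def suggest_review(course_scores: dict[str, float], review_below: float = 0.6) -> list[str]:
    """Return courses scoring below review_below, weakest first (hypothetical policy)."""
    weak = [course for course, score in course_scores.items() if score < review_below]
    return sorted(weak, key=lambda course: course_scores[course])


print(suggest_review({"tense": 0.4, "articles": 0.9, "plurals": 0.55}))
```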

(5) Dictionary mode

The user may ask by voice, at any time, about the meaning of a foreign-language word or phrase. In response, the user device according to this embodiment may connect to the web and tell the user the meaning and correct pronunciation of that word or phrase.

FIG. 4 is a diagram illustrating the concept of controlling in-home devices through a user device according to an embodiment of the present invention.

As shown in FIG. 4, the user device 10 may be wirelessly connected via IoT to one or more in-home devices 201, 202, 203, 204, 205, and 206. The in-home devices that can be connected to the user device 10 may include the mobile phone 201, another AI speaker 202, the washing machine 203, the robot vacuum cleaner 204, the air conditioner 205, and the refrigerator 206 shown in FIG. 4, but are not limited thereto.

The user device 10 according to this embodiment may control the in-home devices 201, 202, 203, 204, 205, and 206 through native-language speech, foreign-language speech, or both. In addition, while the user controls the in-home devices 201, 202, 203, 204, 205, and 206 through foreign-language speech, the device may provide the foreign language education described above with reference to FIGS. 2 and 3.

FIG. 5 is a diagram illustrating a method of providing foreign language education according to an embodiment of the present invention.

The description up to FIG. 4 assumes an AI-speaker-type user device. The user device according to another embodiment of the present invention may instead be another type of user device, such as a laptop or desktop computer, that includes a display means 400 such as a monitor.

For example, this embodiment provides foreign language education through online access of a user device. In this method, the user's spoken pronunciation of a foreign-language sentence, input through the microphone of the user device, is recognized and converted into text. Errors in the converted sentence are then determined.

If the erroneous part is a grammatical error, a pre-prepared grammar explanation corresponding to the erroneous part may be displayed on the display device 400 of the user device together with a highlight 410 of the erroneous part, as illustrated in FIG. 5. In the example shown in FIG. 5, the 'be' verb is missing between 'I' and 'gonna', which is a grammatical error. The part 'I gonna' (410) is therefore highlighted as the erroneous part. In response, the device may display “Good luck Aaron, go for it! But, you'd better put 'be verb' in front of 'gonna'.”
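For the specific error in FIG. 5, the span to highlight can be located with a rule that looks for a subject pronoun followed directly by 'gonna'. This hypothetical sketch covers only that one pattern. Brackets in `render_highlight` stand in for the visual highlight 410. A real checker would rely on the second grammar model rather than a single regular expression.

```python
import re
from typing import Optional

# Subject pronoun directly followed by "gonna" (no am/are/is in between).
MISSING_BE = re.compile(r"\b(?:I|you|we|they|he|she|it)\s+gonna\b", re.IGNORECASE)


def find_missing_be(sentence: str) -> Optional[tuple[int, int]]:
    """Return the (start, end) span of the erroneous part, or None if not found."""
    match = MISSING_BE.search(sentence)
    return match.span() if match else None


def render_highlight(sentence: str, span: tuple[int, int]) -> str:
    """Mark the erroneous part with brackets as a text stand-in for highlighting."""
    start, end = span
    return sentence[:start] + "[" + sentence[start:end] + "]" + sentence[end:]


s = "I gonna finish this patent specification drafting"
print(render_highlight(s, find_missing_be(s)))
```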

In the above example, a link “View grammar explanation for the [be going to] construction (420)” may also be displayed. Clicking this link 420 may let the user view the corresponding detailed grammar explanation or take the corresponding grammar lecture.

Meanwhile, if the erroneous part is a pronunciation error, an expression that can replace it may be recommended by displaying it on the display device 400 of the user device together with a highlight of the erroneous part. The recommended pronunciation of an example sentence containing the recommended expression may then be played through the speaker of the user device. In this way, the user can be taught model pronunciation through the recommended pronunciation. This model pronunciation can be made more fluent by taking into account linking (liaison) and silent-sound relationships between words.

Recommending an expression that can replace the erroneous part may include the following steps:
- recommending a plurality of expressions and displaying them on the display device 400 of the user device;
- determining the expression the user selects through the input device of the user device as the recommended expression;
- storing the mapping between the erroneous part of the user's spoken foreign-language sentence and the determined recommended expression in the database of the server that provides the foreign language education, so as to train the server on an artificial intelligence basis.
Through this interaction with the user, the user device according to this embodiment can evolve on its own to suit a specific user.
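The feedback loop of storing the user's choice and using it for later recommendations can be sketched with Python's built-in `sqlite3`. This is a hypothetical sketch:

* The in-memory database stands in for the server database.
* The table and function names are illustrative.
* The "training" here is only re-ranking by past choices. It is not the AI training the specification refers to.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """In-memory stand-in for the server-side database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE mapping ("
        " error TEXT, expression TEXT, times_chosen INTEGER NOT NULL,"
        " PRIMARY KEY (error, expression))"
    )
    return db


def record_choice(db: sqlite3.Connection, error: str, expression: str) -> None:
    """Store that the user picked `expression` to replace `error`."""
    db.execute("INSERT OR IGNORE INTO mapping VALUES (?, ?, 0)", (error, expression))
    db.execute(
        "UPDATE mapping SET times_chosen = times_chosen + 1 WHERE error = ? AND expression = ?",
        (error, expression),
    )
    db.commit()


def rank_recommendations(db: sqlite3.Connection, error: str, candidates: list[str]) -> list[str]:
    """Order candidates so that the user's most frequent past choices come first."""
    rows = db.execute("SELECT expression, times_chosen FROM mapping WHERE error = ?", (error,))
    counts = dict(rows.fetchall())
    return sorted(candidates, key=lambda c: -counts.get(c, 0))
```

Because `sorted` is stable, candidates the user has never chosen keep their original order.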

Meanwhile, the correspondence data between erroneous parts and pre-prepared grammar explanations may be accumulated in the database of the server providing the foreign language education by an artificial intelligence method. An erroneous part may be determined to be a grammatical error when a pre-prepared grammar explanation corresponding to that part exists.
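Combining this rule with the pronunciation rules and the three voice responses gives a simple dispatch. This sketch makes the following assumptions:

* The pronunciation check has already been done elsewhere and arrives as a flag.
* The response labels are hypothetical.
* An error with no stored explanation is left unclassified instead of being forced into either category.

```python
from typing import Optional


def choose_response(
    error_part: Optional[str],
    explanations: dict[str, str],
    is_pronunciation: bool,
    replacement: Optional[str] = None,
) -> tuple[str, Optional[str]]:
    """Pick which voice response to give and its content."""
    if error_part is None:
        return ("third", None)  # no error: continue the conversation
    if is_pronunciation:
        return ("first", replacement)  # suggest a replacement expression
    explanation = explanations.get(error_part)
    if explanation is not None:
        return ("second", explanation)  # grammatical error with a stored explanation
    return ("unclassified", None)  # no stored explanation, so not treated as grammatical
```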

The various embodiments of the present invention described above are provided for a concrete understanding of the invention. The invention is not limited to the specific examples described above. It should be construed based on the user device having an artificial intelligence function for foreign language education and the foreign language education method provided throughout the specification.

The user device having an artificial intelligence function for foreign language education and the foreign language education method according to the present invention, as described above, can be widely used as a new mode of education in the non-face-to-face (contactless) era.

Claims

A user device having an artificial intelligence function for foreign language education, the user device comprising:
a voice input device that receives a user's voice;
a processor that recognizes the user's voice as text and processes it; and
a speaker that provides a voice response to the user according to the processing of the processor,
wherein the processor:
(a) determines whether an error in the user's voice is a pronunciation error or a grammatical error;
(b) if the error is a pronunciation error, provides, through the speaker, a first voice response including an expression that can replace the erroneous part; and
(c) if the error is a grammatical error, provides, through the speaker, a second voice response including a pre-prepared grammar explanation corresponding to the erroneous part;
wherein the user's voice includes a command for changing the operation mode of the user device; and
wherein, according to the command, the processor selectively operates in (1) a question-and-answer mode, in which it processes the user's voice one sentence at a time and provides the voice response through the speaker, or (2) a long-form recording mode, in which it continuously records the user's voice for a predetermined time and then provides the voice response through the speaker.

The user device according to claim 1,
wherein the processor uses:
a first grammar model used to recognize the user's voice as text; and
a second grammar model used, after the user's voice has been recognized as text, to recognize grammatical errors present in the recognized text.

The user device according to claim 2,
wherein the second grammar model is configured to update, in pairs and on a learning basis using data from a web server, each grammatical error and the grammar explanation corresponding to that grammatical error.

The user device according to claim 1,
wherein the processor,
if there is no error, provides, through the speaker, a third voice response that continues the conversation in accordance with the content of the user's speech recognized as text.

delete