KR20220124351A

KR20220124351A - System and method for providing language learning service based on artificial intelligence

Info

Publication number: KR20220124351A
Application number: KR1020210027825A
Authority: KR
Inventors: 정인명
Original assignee: 정인명
Priority date: 2021-03-03
Filing date: 2021-03-03
Publication date: 2022-09-14

Abstract

The present invention relates to a system and method for providing a language learning service. According to one embodiment of the present invention, the system for providing a language learning service includes: a service server that provides image data and subtitle data corresponding to the image data; and a user terminal that, when playing an image based on the image data, recognizes and evaluates at least one of voice data and text data input by a user in a section preset based on the corresponding subtitle data, and either continues playing the image or performs repetitive learning according to the result of the evaluation.

Description

SYSTEM AND METHOD FOR PROVIDING LANGUAGE LEARNING SERVICE BASED ON ARTIFICIAL INTELLIGENCE

The present invention relates to a system and method for providing an artificial intelligence-based language learning service, and more particularly, to a system and method for providing an artificial intelligence-based language learning service that enable conversation with a character in media based on images and subtitles.

Recently, many methods of learning a language through the Internet or video media have been proposed. These methods mainly provide words, sentences, and grammar for the language to be learned and, in particular, sometimes provide a native speaker's pronunciation of words or sentences as part of improving listening ability.

However, if a foreign language is simply shown with a word-by-word or sentence-by-sentence interpretation in the mother tongue, or if long sentences are listened to repeatedly, the learner listens without understanding the meaning, so the learning effect is reduced and the learner may easily become bored. In addition, a learner who understands only the dictionary meaning of a word and learns its pronunciation cannot use it properly in English composition or in conversation with foreigners, and therefore cannot achieve the purpose of foreign language learning. Repeatedly listening to long sentences whose meaning is not understood does not mean that the expressions can be used automatically, so it only makes the learner lose interest in learning and become tired.

In addition, most such learning has been conducted as lectures delivered through a monitor screen between a single learning provider and one or more learners. As a result, accurate and efficient language learning suited to the learner's level of understanding does not take place, and because the learning proceeds as one-way rote instruction, the learner's individual achievement cannot be brought out to the maximum.

Therefore, there is a need to develop a technology that enables language learning based on images such as animation, movies, and dramas, and that uses artificial intelligence to recognize the user's voice or text so that the language can be learned in a conversational manner.

Korean Patent Application Laid-Open No. 10-2003-0079497 (published on October 10, 2003)

The present invention has been proposed to solve the problems described above. An object of the present invention is to provide a system and method for providing a language learning service that enable a language to be learned based on images and in a conversational manner by recognizing voice or text input from a user, thereby arousing the user's interest while improving learning efficiency.

The problems to be solved by the present invention are not limited to the problems mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.

A system for providing a language learning service according to an embodiment of the present invention for solving the above problems includes: a service server that provides image data and subtitle data corresponding to the image data; and a user terminal that, when playing an image based on the image data, recognizes and evaluates at least one of voice data and text data input by the user in a section preset based on the corresponding subtitle data, and either continues playing the image or performs repetitive learning according to the evaluation result.

Other specific details of the invention are included in the detailed description and drawings.

According to the present invention, a language can be learned based on images and in a conversational manner by recognizing voice or text input from a user, which arouses the user's interest while improving learning efficiency.

Effects of the present invention are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a network configuration diagram of a language learning service providing system according to an embodiment of the present invention. FIG. 2 is a flowchart illustrating a method of providing a language learning service according to an embodiment of the present invention. FIGS. 3A to 3F are diagrams illustrating the display of a user terminal as it changes while a user performs learning based on the method of providing a language learning service according to an embodiment of the present invention.

Advantages and features of the present invention, and methods of achieving them, will become apparent with reference to the embodiments described below in detail in conjunction with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention pertains are fully informed of the scope of the invention, and the present invention is defined only by the scope of the claims.

The terminology used herein is for the purpose of describing the embodiments and is not intended to limit the present invention. In this specification, the singular also includes the plural unless specifically stated otherwise. As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more other components in addition to the stated components. Like reference numerals refer to like elements throughout the specification, and "and/or" includes each of the recited elements and every combination of one or more of them. Although "first", "second", and the like are used to describe various elements, these elements are of course not limited by these terms. These terms are used only to distinguish one component from another. Accordingly, a first component mentioned below may also be a second component within the technical spirit of the present invention.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the meaning commonly understood by those of ordinary skill in the art to which this invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless clearly and specifically defined.

Spatially relative terms such as "below", "beneath", "lower", "above", and "upper" may be used to easily describe the relationship between one component and other components as shown in the drawings. Spatially relative terms should be understood as including different orientations of the components during use or operation in addition to the orientation shown in the drawings. For example, when a component shown in the drawings is turned over, a component described as "below" or "beneath" another component may be placed "above" the other component. Accordingly, the exemplary term "below" may include both the downward and upward directions. Components may also be oriented in other directions, and the spatially relative terms may therefore be interpreted according to the orientation.

As used herein, the term "unit" or "module" refers to software or to a hardware component such as an FPGA or ASIC, and a "unit" or "module" performs certain roles. However, "unit" or "module" is not limited to software or hardware. A "unit" or "module" may be configured to reside on an addressable storage medium and may be configured to run on one or more processors. Thus, by way of example, a "unit" or "module" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, properties, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functionality provided within the components and "units" or "modules" may be combined into a smaller number of components and "units" or "modules", or further separated into additional components and "units" or "modules".

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a network configuration diagram of a language learning service providing system according to an embodiment of the present invention.

Referring to FIG. 1, the language learning service providing system 100 may include a service server 110 and a user terminal 130. Here, the user terminal 130 may be a tablet PC, a smartphone, a wearable device, a laptop, a desktop, or the like, but is not limited thereto.

The service server 110 is a server for providing a language learning service, and provides the user terminal 130 with image data for language learning and subtitle data corresponding to the image data.

Here, the subtitle data includes a subtitle set composed of sentences (text) of a certain length, and each sentence in the subtitle set has a timeline according to the playback time of the image. The subtitle data may also include a plurality of subtitle sets, in which case the subtitle sets may each be in a different language.

For example, the subtitle data may be configured to include a first subtitle set whose first language is Korean, and second subtitle sets obtained by translating the first subtitle set into second languages such as English, Chinese, and Japanese. There is no limitation on the type or number of languages. In addition, depending on the user's settings, at least one of the plurality of subtitle sets may be displayed on the display of the user terminal 130 when an image is played, or the image may be played alone without any subtitle set.
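The subtitle data described above can be pictured as a small data model. The following is a minimal sketch only; the class and method names (`SubtitleLine`, `SubtitleSet`, `SubtitleData`, `line_at`, `visible_text`) are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SubtitleLine:
    """One sentence with its timeline (hypothetical structure)."""
    start: float  # seconds from the beginning of the image
    end: float
    text: str


@dataclass
class SubtitleSet:
    """All sentences of one language."""
    language: str
    lines: list[SubtitleLine] = field(default_factory=list)

    def line_at(self, t: float) -> Optional[SubtitleLine]:
        # Return the sentence whose timeline covers playback time t, if any.
        for line in self.lines:
            if line.start <= t < line.end:
                return line
        return None


@dataclass
class SubtitleData:
    """Subtitle data holding one subtitle set per language."""
    sets: dict[str, SubtitleSet] = field(default_factory=dict)

    def add(self, subtitle_set: SubtitleSet) -> None:
        self.sets[subtitle_set.language] = subtitle_set

    def visible_text(self, t: float, shown_languages: list[str]) -> list[str]:
        # An empty shown_languages list models playback without subtitles.
        texts = []
        for language in shown_languages:
            subtitle_set = self.sets.get(language)
            line = subtitle_set.line_at(t) if subtitle_set else None
            if line is not None:
                texts.append(line.text)
        return texts
```

With a Korean set and an English set that share the same timeline, `visible_text(2.0, ["ko", "en"])` would return both sentences for that moment, while an empty language list yields nothing to display.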

Meanwhile, the file extension of the image data may be any one of avi, mov, wmv, asf, mpeg, and flv, and the file extension of the subtitle data may be either smi or srt.
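To illustrate how a per-sentence timeline could be read from an srt file, here is a minimal sketch of a parser for the common SubRip layout (an index line, a `start --> end` timing line, then text). The function names are hypothetical, and the specification does not describe any particular parsing method.

```python
import re

# Matches timestamps such as 00:01:02,500 (hours:minutes:seconds,milliseconds).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> list[tuple[float, float, str]]:
    """Return (start_seconds, end_seconds, text) for each subtitle block."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip anything that is not a well-formed block
        start, end = lines[1].split("-->")
        text = " ".join(part.strip() for part in lines[2:])
        cues.append((_to_seconds(start), _to_seconds(end), text))
    return cues
```

The sketch ignores styling tags and positioning hints that some srt files carry, and it does not cover the smi format.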

The user terminal 130 is a terminal used by a user to perform language learning, and performs language learning by receiving image data and subtitle data from the service server 110. The user terminal 130 may perform language learning through a web page or a separate application.

Specifically, when the image data and the corresponding subtitle data are downloaded from the service server 110, the user terminal 130 plays the image based on the image data and outputs subtitles according to the timeline based on the reproduction data.

Before playing the image, the user can set the learning sections he or she wants to study by designating a character appearing in the image or by selecting specific sentences.

At the point when the subtitle corresponding to a learning section is to be output, the user terminal 130 stops the speaker output or lowers the volume to a preset level, and calls an API (Application Programming Interface) for recognizing the user's voice and/or text.
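The behavior at a learning section can be sketched as a lookup on the current playback time. This is only an illustration under assumed names: `LearningSection`, `playback_action`, and the volume values are hypothetical, and the recognizers are represented by labels rather than by calls to any real recognition API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearningSection:
    """A user-selected sentence interval (hypothetical structure)."""
    start: float          # seconds
    end: float
    use_voice: bool = True
    use_text: bool = False


def playback_action(t, sections, normal_volume=1.0, lowered_volume=0.0):
    """Decide speaker volume and which recognizers to call at time t.

    lowered_volume=0.0 models stopping the speaker output; a small
    positive value models lowering the volume to a preset level.
    """
    for section in sections:
        if section.start <= t < section.end:
            recognizers = []
            if section.use_voice:
                recognizers.append("voice")
            if section.use_text:
                recognizers.append("text")
            return {"volume": lowered_volume, "recognizers": recognizers}
    # Outside every learning section: play normally, call nothing.
    return {"volume": normal_volume, "recognizers": []}
```

A player loop would call `playback_action` as the playback position advances and apply the returned volume before invoking whichever recognizers are listed.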

The user terminal 130 recognizes voice and/or text based on the voice data input as the user speaks and/or the text data input as the user enters text, and determines the similarity (accuracy) using a preset algorithm. In other words, the similarity (accuracy) of the input voice data or text data is determined against the original data, that is, the image data and the subtitle data.

However, this is only one embodiment; the terminal may be set to recognize only one of voice and text, or to recognize both at the same time, and is not limited thereto. When text recognition is set, the user may input text data by touching a keyboard displayed on the display of the user terminal 130 or by pressing buttons provided on the user terminal 130, or may input text data by writing characters directly on the display with a stylus. Additionally, when voice recognition is set, the input voice data may be converted into text and shown on the display of the user terminal 130.

Meanwhile, the user terminal 130 may determine whether to continue playing the image according to the result of the similarity determination. Here, a numerical value indicating the degree of similarity is calculated by comparing the voice data or text data with the original data. If the calculated value is greater than or equal to a preset threshold, the image continues to play; if the calculated value is less than the preset threshold, repetitive learning is performed.

For example, the ratio of words in the input voice data or text data that match the words in the original data is calculated. If the ratio is greater than or equal to a preset threshold, the image continues to play; if the ratio is less than the preset threshold, repetitive learning is performed.
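The word-matching example can be sketched as follows. This is one possible reading, not the specification's algorithm: it assumes the speech has already been converted to text, counts matches without regard to word order, and uses a hypothetical threshold of 0.8. The names `word_match_ratio` and `next_action` are illustrative.

```python
import re
from collections import Counter


def _tokens(text: str) -> list[str]:
    # Lowercase and keep word characters and apostrophes; drops punctuation.
    return re.findall(r"[\w']+", text.lower())


def word_match_ratio(user_input: str, reference: str) -> float:
    """Fraction of reference words that also occur in the user's input."""
    reference_words = _tokens(reference)
    if not reference_words:
        return 0.0
    remaining = Counter(_tokens(user_input))
    matched = 0
    for word in reference_words:
        if remaining[word] > 0:
            remaining[word] -= 1  # each input word can match only once
            matched += 1
    return matched / len(reference_words)


def next_action(user_input: str, reference: str, threshold: float = 0.8) -> str:
    """Return "continue" to keep playing or "repeat" for repetitive learning."""
    ratio = word_match_ratio(user_input, reference)
    return "continue" if ratio >= threshold else "repeat"
```

For the reference sentence "How are you?", the input "how is you" matches two of three words, which falls below the assumed threshold and leads to repetition. An order-sensitive measure such as edit distance would be a stricter alternative.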

FIG. 2 is a flowchart illustrating a method of providing a language learning service according to an embodiment of the present invention.

Referring to FIG. 2, the user terminal 130 downloads image data and the corresponding subtitle data from the service server 110 and inputs (stores) them (S201). Upon detecting that the user has touched (clicked) the play button (S203), the user terminal 130 checks whether subtitle data exists (S205).

As a result of the check in step S205, if no subtitle data exists, simple image playback is performed based on the image data (S207); if subtitle data exists, it is checked whether the subtitle-based voice recognition API function has been enabled in the settings (S209).

As a result of the check in step S209, if the subtitle-based voice recognition API function has not been enabled, it is checked whether the subtitle-based character recognition API function has been enabled (S211).

As a result of the check in step S211, if the subtitle-based character recognition API function has not been enabled either, simple image playback is performed (S207); if the subtitle-based character recognition API function has been enabled, the API for character recognition operates during image playback (S219).

On the other hand, as a result of the check in step S209, if the subtitle-based voice recognition API function has been enabled, it is checked whether the subtitle-based character recognition API function has been enabled (S213).

As a result of the check in step S213, if the subtitle-based character recognition API function has been enabled, both the API for voice recognition and the API for character recognition operate during image playback (S215); if the subtitle-based character recognition API function has not been enabled, only the API for voice recognition operates during image playback (S217).

That is, depending on whether the user has allowed both voice recognition and character recognition or only one of them, the API for voice recognition and the API for character recognition operate together during image playback, or only one of them operates selectively.
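The branching of steps S205 to S219 reduces to a small decision function. The sketch below mirrors the order of the checks in FIG. 2 as described in the text; the enum and function names are hypothetical.

```python
from enum import Enum


class PlaybackMode(Enum):
    """Outcomes of the flowchart, labeled with their step numbers."""
    SIMPLE = "S207"          # plain playback, no recognition
    VOICE_AND_TEXT = "S215"  # both recognition APIs operate
    VOICE_ONLY = "S217"
    TEXT_ONLY = "S219"


def select_mode(has_subtitles: bool, voice_allowed: bool, text_allowed: bool) -> PlaybackMode:
    if not has_subtitles:                      # S205: no subtitle data
        return PlaybackMode.SIMPLE
    if voice_allowed:                          # S209: voice recognition enabled
        if text_allowed:                       # S213
            return PlaybackMode.VOICE_AND_TEXT
        return PlaybackMode.VOICE_ONLY
    if text_allowed:                           # S211
        return PlaybackMode.TEXT_ONLY
    return PlaybackMode.SIMPLE
```

Note that without subtitle data the recognition settings are never consulted, which matches the flow in which S205 precedes S209.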

Meanwhile, FIG. 2 does not show the step in which, before the image is played, the user sets the learning sections to be studied by designating a character appearing in the image or by selecting specific sentences; however, a step of setting learning sections may be included after step S201 or step S203.

FIGS. 3A to 3F are diagrams illustrating the display of a user terminal as it changes while a user performs learning based on the method of providing a language learning service according to an embodiment of the present invention.

When the user starts language learning through an application or web page on the user terminal 130, a list of images (video list) that the user can select for language learning is displayed on the display, as shown in FIG. 3A. Although FIG. 3A shows a case in which the list contains a single image, this is only an example, and a plurality of images may be displayed.

When the user touches and selects an image in the list, a subtitle list (display sublist) corresponding to the selected image is displayed on the display of the user terminal 130, as shown in FIG. 3B. The subtitle list may include at least one subtitle set for the image, and the subtitle sets may be translations into different languages. Here, the subtitle set is the one to be shown on the display together with the image during playback.

Thereafter, as shown in FIG. 3C, a subtitle list containing the subtitle sets available for voice recognition or character recognition is displayed on the display of the user terminal 130, and the user can select one of the subtitle sets by touching it. Against the selected subtitle set, the user can have the similarity (accuracy) of the voice data he or she has spoken, or of the text data entered by touch or other means, evaluated.

When the user has finished selecting the subtitle set, the learning sections can be set by directly selecting, from the subtitle data in the subtitle set, the sentences the user wants to learn, as shown in FIG. 3D. Specifically, the user selects the desired sentences from among the sentences corresponding to each timeline. To learn a sentence through voice recognition, the user may touch the microphone icon displayed next to that sentence. Although not shown in FIG. 3D, a pen icon may additionally be displayed next to each sentence so that the user can touch it to learn that sentence through character recognition. To learn through both voice recognition and character recognition, the user may select both icons.

When the user plays the image after setting all the learning sections, learning through voice recognition or character recognition may be performed according to the set learning sections. That is, when a learning section is set to be learned through voice recognition, the API for voice recognition is called at the timeline corresponding to the selected sentence, as shown in FIG. 3E, so that the voice uttered by the user is recognized. When a learning section is set to be learned through character recognition, the API for character recognition is called at the timeline corresponding to the selected sentence, as shown in FIG. 3F, so that the characters input by the user through touch are recognized.

The steps of the method or algorithm described in relation to the embodiments of the present invention may be implemented directly in hardware, as a software module executed by hardware, or as a combination thereof. A software module may reside in random access memory (RAM), read only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those skilled in the art to which the present invention pertains will understand that the present invention may be embodied in other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive.

100: language learning service providing system
110: service server
130: user terminal

Claims

A system for providing a language learning service based on artificial intelligence, comprising: a service server that provides image data and subtitle data corresponding to the image data; and a user terminal that, when playing an image based on the image data, recognizes and evaluates at least one of voice data and text data input by a user in a section preset based on the corresponding subtitle data, and either continues playing the image or performs repetitive learning according to the evaluation result.

The system for providing a language learning service based on artificial intelligence of claim 1, wherein the user terminal, when playing an image, calls at least one of an API for voice recognition and an API for character recognition in a learning section set based on the subtitle data.