KR102552857B1

KR102552857B1 - Subtitle processing method for language education and apparatus thereof

Info

Publication number: KR102552857B1
Application number: KR1020180055306A
Authority: KR
Inventors: 남정화
Original assignee: (주)우리랑코리아
Priority date: 2018-05-15
Filing date: 2018-05-15
Publication date: 2023-07-10
Also published as: KR20190130774A

Abstract

The present invention relates to a method and apparatus for processing video subtitles for language education. The subtitle processing method includes: receiving the audio of a video and recognizing it to obtain text; determining, for at least a portion of the text, a word difficulty level or a grammatical element by referring to a database; applying the word difficulty level or grammatical element to at least a portion of the text so that it is distinguished from the other text; and outputting at least a portion of the text so marked as a subtitle on the video. Accordingly, a subtitle service with high user satisfaction can be provided.

Description

Subtitle processing method and apparatus for video for language education {SUBTITLE PROCESSING METHOD FOR LANGUAGE EDUCATION AND APPARATUS THEREOF}

The present invention relates to a method and apparatus for processing video subtitles for language education, and more particularly to a method and apparatus that obtain the audio of a video as text based on speech recognition, determine the difficulty level and grammatical elements of words, and apply them so that the affected text is distinguished from the other text, thereby increasing the efficiency of a user's language learning.

Recently, foreigners' interest in the Korean language has increased greatly. Most foreigners who want to learn Korean make use of various content such as dramas, pop songs, and entertainment programs. Their ultimate goal is to speak Korean fluently, and to that end they begin with the very basics, such as the script and pronunciation. Learners also use various methods such as private institute lectures, language study abroad, and listening to broadcasts, but institute lectures and language study abroad are constrained by cost and time. People who want to learn Korean therefore look for a method that is effective without requiring much cost or time, and learning a language through video is one such method.

Compared with methods such as private institute lectures or language study abroad, learning a language through video has the advantage of being less constrained in terms of cost and time.

However, while videos such as dramas and pop songs allow learning through subtitles by loading pre-stored subtitle information, other content for which no pre-stored subtitle information exists can only be roughly understood.

Moreover, even for video content with pre-stored subtitle information, it is not easy for foreigners to grasp at once the variations of derived words, inflected forms, compound words, and the like. For example, -해요, -했어요, and -했었어요 are all derived from '-하다', and there is a need to reduce the confusion caused by such changes in word endings.

Thus, for learners studying through video content to learn a language more efficiently, it must be possible to provide subtitles for all video content through subtitle recognition technology based on speech recognition.

An object of the present invention is to provide a method and apparatus for processing video subtitles for language education that solve the problems described above.

Specifically, an object of the present invention is to provide a method and apparatus for processing video subtitles for language education that output subtitle information to a user by automatically recognizing the audio of a video and converting the recognized speech into text.

Another object of the present invention is to provide a method and apparatus for processing video subtitles for language education that can increase a user's language learning efficiency by using a video subtitle recognition technology that automatically recognizes the difficulty level and grammatical elements of words in the output text.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

To solve the problems described above, a method for processing video subtitles for language education according to an embodiment of the present invention includes: receiving the audio of a video and recognizing it to obtain text; determining, for at least a portion of the text, a word difficulty level or a grammatical element by referring to a database; applying the word difficulty level or grammatical element to at least a portion of the text so that it is distinguished from the other text; and outputting at least a portion of the text so marked as a subtitle on the video. Accordingly, a subtitle service with high user satisfaction can be provided.
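The four steps of the method can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: speech recognition is not implemented and the transcript is assumed to arrive already segmented into whitespace-separated tokens, the two dictionaries are hypothetical stand-ins for the database, and the bracket markers stand in for whatever visual distinction an implementation would actually apply.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-ins for the database of graded words and grammatical elements.
WORD_DIFFICULTY = {"학교": 1, "경제": 3}            # word -> difficulty level
GRAMMAR_ELEMENTS = {"가": "particle", "에": "particle"}


@dataclass
class Token:
    text: str
    difficulty: Optional[int] = None   # set when the word is in the graded list
    grammar: Optional[str] = None      # set when the token is a grammatical element


def annotate(transcript: str) -> List[Token]:
    """Step 2: look each token of the recognized text up in the database stand-ins."""
    return [
        Token(tok, WORD_DIFFICULTY.get(tok), GRAMMAR_ELEMENTS.get(tok))
        for tok in transcript.split()
    ]


def render_subtitle(tokens: List[Token]) -> str:
    """Steps 3 and 4: mark annotated tokens so they stand out, then join them."""
    parts = []
    for tok in tokens:
        if tok.difficulty is not None:
            parts.append(f"[{tok.text}|L{tok.difficulty}]")
        elif tok.grammar is not None:
            parts.append(f"<{tok.text}|{tok.grammar}>")
        else:
            parts.append(tok.text)
    return " ".join(parts)


# Step 1 (speech recognition) is assumed to have produced this transcript.
print(render_subtitle(annotate("경제 가 어렵다")))
```

Tokens found in neither table pass through unchanged, which matches the idea that only part of the text is distinguished from the rest.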

According to another feature of the present invention, the video may be output in response to a user's input regarding at least one of a difficulty level, a category, a word, and a grammar point for the topic the user wants to learn.
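As a rough sketch of how a video might be selected in response to such input, the following Python example filters a catalog by whichever criteria the user supplied. The `Video` record, its fields, and the sample catalog are all hypothetical; the source does not specify how videos are indexed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Video:
    title: str      # hypothetical fields, chosen only for illustration
    level: int
    category: str


def select_videos(catalog: List[Video],
                  level: Optional[int] = None,
                  category: Optional[str] = None) -> List[Video]:
    """Keep videos matching every criterion the user actually provided."""
    return [
        v for v in catalog
        if (level is None or v.level == level)
        and (category is None or v.category == category)
    ]


CATALOG = [
    Video("Street food tour", 1, "food"),
    Video("Business meeting", 3, "business"),
    Video("Temple stay", 1, "travel"),
]

print([v.title for v in select_videos(CATALOG, level=1)])
```

Treating an omitted criterion as "no constraint" reflects the "at least one of" wording: the user need not specify every field.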

According to another feature of the present invention, the method may further include determining a variant form of a word, the word being at least one of a derivative, a synonym, a compound word, a complex word, and a simple word, so that the variant form is distinguished from the other text in at least a portion of the text.

According to another feature of the present invention, the method may include determining the grammatical element based on text in the form of at least one of a particle, an ending, and an affix.

According to another feature of the present invention, the method may further include outputting, separately from the subtitle, dictionary content corresponding to text in which a word difficulty level or grammatical element has been identified.

According to another feature of the present invention, the particle may further include at least one variant among the nominative, objective, adnominal, adverbial, and predicative cases, and the ending may further include at least one variant among the declarative, interrogative, imperative, propositive, exclamatory, negative, honorific, tense, modality, passive, and causative forms.

According to another feature of the present invention, the step of applying the word difficulty level or grammatical element to at least a portion of the text so that it is distinguished from the other text may include changing a property of the text that includes the word difficulty level or grammatical element.

According to another feature of the present invention, the method may further include selectively outputting dictionary information for a first region.

According to another feature of the present invention, changing the property of the text may include: applying a first property to text that includes a word difficulty level; and applying a second property to text that includes a grammatical element.

According to another feature of the present invention, the first property and the second property may be applied so that the text is distinguished by at least one of size, color, shape, and highlight.

According to another feature of the present invention, when the first property and the second property are changed at the same time, mutually different attributes among size, color, shape, and highlight may be applied as the first property and the second property.
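The requirement that the two properties use different attributes when applied together can be sketched as follows. The function name and the fallback rule (take the first attribute in the list that differs from the first property) are assumptions made for illustration; the source only requires that the two attributes differ.

```python
from typing import Tuple

# The four attributes named in the text.
ATTRIBUTES = ("size", "color", "shape", "highlight")


def pick_properties(first: str, second: str) -> Tuple[str, str]:
    """Return attributes for difficulty text and grammar text, forced to differ."""
    for name in (first, second):
        if name not in ATTRIBUTES:
            raise ValueError(f"unknown attribute: {name}")
    if first == second:
        # Assumed fallback: reassign the second property to another attribute.
        second = next(a for a in ATTRIBUTES if a != first)
    return first, second


print(pick_properties("color", "color"))
```

With distinct attributes, a token that is both a graded word and a grammatical element can carry both markings without one hiding the other.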

According to another feature of the present invention, outputting at least a portion of the text as a subtitle on the video may include displaying and outputting the subtitles so that they are distinguished by speaker, in correspondence with the voice of at least one speaker included in the video.
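A minimal sketch of speaker-distinguished subtitles, assuming that an upstream step (not described in the source) has already tagged each recognized segment with a speaker identifier. The "Speaker N" labels are an illustrative choice; an implementation could equally distinguish speakers by color or position.

```python
from typing import Dict, List, Tuple


def label_by_speaker(segments: List[Tuple[str, str]]) -> List[str]:
    """Prefix each subtitle line with a stable label for its speaker.

    `segments` is a list of (speaker_id, text) pairs in playback order.
    """
    labels: Dict[str, str] = {}
    lines = []
    for speaker_id, text in segments:
        if speaker_id not in labels:
            # Assign labels in order of first appearance.
            labels[speaker_id] = f"Speaker {len(labels) + 1}"
        lines.append(f"{labels[speaker_id]}: {text}")
    return lines


for line in label_by_speaker([("a", "안녕하세요"), ("b", "네"), ("a", "가요")]):
    print(line)
```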

According to another feature of the present invention, the method may further include selectively determining a conversation partner designated for the video and simultaneously outputting the video of the determined conversation partner and the video of the user.

According to another feature of the present invention, the method may further include recognizing the audio of the conversation partner's video and of the user's video, respectively, and outputting subtitles on the videos.

According to another feature of the present invention, the method may further include, when viewing of the video has ended, presenting questions corresponding to the word difficulty levels or grammatical elements in the text output as subtitles.
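One simple question type that fits this description is a fill-in-the-blank item built from the subtitle lines. The sketch below is an assumption about what such a question could look like, not the question format of the source; it takes whitespace-segmented subtitle lines and the list of words that were flagged during viewing.

```python
from typing import Dict, List


def make_cloze_questions(subtitle_lines: List[str],
                         flagged_words: List[str]) -> List[Dict[str, str]]:
    """Blank out each flagged word in every subtitle line that contains it."""
    questions = []
    for line in subtitle_lines:
        tokens = line.split()
        for word in flagged_words:
            if word in tokens:
                prompt = " ".join("____" if t == word else t for t in tokens)
                questions.append({"prompt": prompt, "answer": word})
    return questions


print(make_cloze_questions(["경제 가 어렵다"], ["경제"]))
```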

An apparatus for processing video subtitles for language education according to an embodiment of the present invention may include: a receiving unit that receives the audio of a video and recognizes it to obtain text; a database storing text having word difficulty levels and text having grammatical elements; a determination unit that determines, for at least a portion of the received text, a word difficulty level or a grammatical element by referring to the database; and an output unit that applies the word difficulty level or grammatical element to at least a portion of the text so that it is distinguished from the other text, and outputs at least a portion of the text so marked as a subtitle on the video.

Specific details of other embodiments are included in the detailed description and drawings.

The present invention can provide a subtitle service with high user satisfaction by providing subtitle information to the user based on speech recognition of the video.

In addition, the present invention can increase a user's language learning efficiency by using a video subtitle recognition technology that automatically recognizes words and grammatical elements within the subtitle information.

In addition, after study of a video containing speech-recognition-based subtitle information is complete, the present invention provides questions of a plurality of question types corresponding to the subtitle information, making it possible to gauge the user's understanding of the video.

The effects of the present invention are not limited to those exemplified above, and further effects are included in this specification.

FIG. 1 is a schematic diagram for explaining a video subtitle processing system for language education according to an embodiment of the present invention.
FIG. 2 is a schematic diagram for explaining a user device according to an embodiment of the present invention.
FIG. 3 is a schematic flowchart for explaining a method for processing video subtitles for language education according to an embodiment of the present invention.
FIG. 4A shows an exemplary output screen that displays videos after receiving, through a search category of the video subtitle processing apparatus for language education according to an embodiment of the present invention, the grammar and/or topic the user wants to learn.
FIG. 4B shows an exemplary output screen on which a user device according to an embodiment of the present invention outputs a video output window corresponding to the grammar and/or topic the user wants to learn and a conversation partner information window for that video.
FIG. 4C shows an exemplary output screen that outputs an example sentence window corresponding to the example sentence menu of a video for language education according to an embodiment of the present invention.
FIG. 4D shows an exemplary output screen on which a conversation partner video and a user video are output simultaneously in response to the conversation request menu of the video subtitle processing apparatus for language education according to an embodiment of the present invention.
FIG. 4E is a diagram for explaining a topic search method of the video subtitle processing apparatus for language education according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention belongs are fully informed of the scope of the invention; the present invention is defined only by the scope of the claims.

Although terms such as first and second are used to describe various components, these components are of course not limited by these terms. The terms are used only to distinguish one component from another. Accordingly, a first component mentioned below may also be a second component within the technical spirit of the present invention.

The same reference numerals refer to the same components throughout the specification.

The features of the various embodiments of the present invention may be partially or wholly coupled or combined with one another and, as those skilled in the art will fully understand, may be technically linked and operated in various ways; the embodiments may be implemented independently of one another or together in association.

Hereinafter, various embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a schematic diagram for explaining a video subtitle processing system for language education according to an embodiment of the present invention.

Referring to FIG. 1, a video subtitle processing system for language education according to an embodiment of the present invention includes a service providing device 100 and at least one user device 200. Here, the video subtitle processing system for language education is a system that receives the audio of a video and recognizes it to obtain text, determines a word difficulty level or a grammatical element for at least a portion of the text by referring to a database, applies the word difficulty level or grammatical element to at least a portion of the text so that it is distinguished from the other text, and outputs at least a portion of the text so marked as a subtitle on the video.

The service providing device 100 may receive the grammar and/or topic to be learned from the at least one user device 200, receive at least one video from the database 202 storing a plurality of video contents 403 corresponding to the received grammar and/or topic, receive and recognize the audio of the received video among the plurality of video contents 403, and transmit the resulting text to the at least one user device 200.

Here, the grammar and/or topic to be learned may include a selection of at least one of a level, a category, a grammar point, and a topic for the subject the user wants to learn.

In addition, the plurality of video contents 403 may be content provided by the video subtitle processing system for language education, covering travel, food, K-culture, fashion, beauty, current affairs, lectures, business, and the like.

In addition, the service providing device 100 and the at least one user device 200 are within the same network or are connected by communication, and are configured to carry out the video subtitle processing method for language education of the present invention and the apparatus using it. The specific configuration and functions of the service providing device 100 will be described later with reference to FIG. 2.

The at least one user device 200 is a device that provides a user interface for offering the video subtitle service for language education to a user, and may be the user's computer or the user's portable terminal. Although FIG. 1 shows the at least one user device 200 as a smartphone or a computer, the spirit of the present invention is not limited thereto. Any device on which an application or program for providing the video subtitle service for language education can be installed may be employed without limitation, for example a smartphone, a tablet PC, or a wearable device.

The application or program installed on the at least one user device 200 may connect the at least one user device 200 and the service providing device 100 to each other.

Through the application or program, the user is, by default, provided online with subtitle information based on speech recognition for at least one video according to the user's location. However, this is not limiting: stored subtitle information may also be provided, and the user may be given questions of a plurality of types about the subtitle information provided. In various embodiments, the application or program may, for example, assign an online teacher to each user account and provide a video chat service with the assigned online teacher about the subtitle information or video studied. In addition, questions corresponding to the video may be provided, along with a reward service in which rewards are granted according to the rate of correct answers.

Specifically, the at least one user device 200 may include a display that shows a screen and an input device that receives data from the user.

In various embodiments, the at least one user device 200 and the service providing device 100 may include a camera module for video chat. Specifically, the at least one user device 200 may acquire image information of the user (e.g., image information about the user device) through the camera module and transmit the acquired image information to the service providing device 100.

In various embodiments, the at least one user device 200 and the service providing device 100 may include a communication module for short-range communication (e.g., Bluetooth, BLE, Zigbee, or NFC). Specifically, the at least one user device 200 may receive question information, subtitle information, information on the reward service, and the like from the service providing device 100 through short-range communication, and may send the service providing device 100 requests for question information and messages about correct or incorrect answers to questions.

For convenience of description, FIG. 1 shows one service providing device 100 for language education and one user device, but this is not limiting: a plurality of service providing devices 100 and user devices may be provided, and the service providing device 100 may communicate with at least one user device 200.

Hereinafter, the configuration of the service providing device 100 in the video subtitle processing system for language education will be described. For a more detailed description, reference is also made to FIG. 2.

FIG. 2 is a schematic diagram for explaining a user device according to an embodiment of the present invention.

Referring to FIG. 2, the user device includes a receiving unit 201, a database 202, a determination unit 203, and an output unit 204.

The receiving unit 201 connects the user device so that it can communicate with external devices. For example, the receiving unit 201 may be connected to the service providing device 100 through a wired or wireless network, transmit the user's image information to the service providing device 100, receive from the service providing device 100 the grammar and/or topic to be learned, receive at least one video from the database 202 storing a plurality of video contents 403 corresponding to the received grammar and/or topic, and receive and recognize the audio of the received video among the video contents 403 to obtain text. In various embodiments, the receiving unit 201 may use cellular communication, short-range wireless communication such as WiFi, Bluetooth, or NFC (near field communication), or wireless communication including GPS (Global Positioning System).

The database 202 may store various data for the video subtitle service for language education. For example, the database 202 may store the application or program for providing the video subtitle service for language education, or may store the grammar and/or topic to be learned received from the service providing device 100 through the receiving unit 201, the plurality of video contents 403 corresponding to the received grammar and/or topic, and the text (that is, the subtitle information) obtained by receiving and recognizing the audio of a video among the plurality of video contents 403.

The determination unit 203 may determine a word difficulty level or a grammatical element for at least a portion of the received text by referring to the database 202. Specifically, it may determine the difficulty level of a word by referring to the words stored by difficulty level in the database 202, together with their meanings and their various variant forms. It may likewise determine a grammatical element by referring to the forms and meanings of the grammatical elements stored in the database 202.
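The lookup performed by the determination unit can be sketched as a dictionary keyed by base form, with an index from each stored variant back to its base form. The lexicon layout and function names below are hypothetical; the single entry reuses the -하다 example given in the background, and the difficulty value is invented for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lexicon: base form -> difficulty level and stored variant forms.
LEXICON = {
    "하다": {"difficulty": 1, "variants": {"해요", "했어요", "했었어요"}},
}


def build_variant_index(lexicon: dict) -> Dict[str, str]:
    """Map every base form and every variant to its base form."""
    index = {}
    for base, entry in lexicon.items():
        index[base] = base
        for variant in entry["variants"]:
            index[variant] = base
    return index


def lookup(word: str, lexicon: dict, index: Dict[str, str]) -> Optional[Tuple[str, int]]:
    """Return (base form, difficulty) for a surface form, or None if unknown."""
    base = index.get(word)
    if base is None:
        return None
    return base, lexicon[base]["difficulty"]


INDEX = build_variant_index(LEXICON)
print(lookup("했어요", LEXICON, INDEX))
```

Storing variants explicitly avoids needing a morphological analyzer, at the cost of a larger table; a real system would likely combine both.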

The output unit 204 may display various content (e.g., text, images, video, icons, banners, or symbols) to the user. Specifically, the output unit 204 may display the execution screen of the application or program for providing the video subtitle service for language education, executed at the user's request. The output unit 204 may display a user interface for entering the grammar and/or topic to be learned, as received from the service providing device 100 through the receiving unit 201, or may display the video corresponding to the grammar and/or topic to be learned together with the text obtained by receiving and recognizing the audio of that video. In various embodiments, the output unit 204 may include a touchscreen and may receive, for example, touch, gesture, proximity, drag, swipe, or hovering input made with an electronic pen or a part of the user's body.

Although not shown in FIG. 2, a processor may be operatively connected to the receiving unit 201, the database 202, and the output unit 204, and may execute various instructions for providing the video subtitle service for language education. The following describes in detail the operations of receiving and recognizing a video's audio as text using an application that provides the video subtitle service for language education, and of displaying the corresponding video and subtitle information.

The receiving unit 201 receives the user's input on the grammar and/or topic to be studied and, in response, receives at least one video among the plurality of video contents 403 together with the audio of that video. The audio of the received video is then recognized and received as text. Although this specification describes the receiving unit 201 as receiving the video, receiving and recognizing its audio, and receiving the result as text, the receiving unit 201 may also receive only the video.

The determination unit 203 may determine the difficulty level of words or the grammatical elements in at least a portion of the received text by referring to the database 202.

The output unit 204 may apply the word difficulty or grammatical element to at least a portion of the text so that it is distinguished from the other text, and may output at least a portion of the text so marked as a subtitle on the video. Here, the subtitle means subtitle information matched to at least one video among the video contents 403 selected by the user's input, and it is displayed on the output unit 204 of the user device 200.

The output unit 204 may display, in response to the user's request to run the application, the execution screen of the application that provides the video subtitle service for language education. The execution screen may display subtitle information produced by automatically recognizing the speech in the video and converting the recognized speech to text. It may also display the search category, the content major-category menu, the content subcategory menu, the keyword menu, and other items that receive the user's input.

In addition, the output unit 204 may display the video content corresponding to the user's input and a video output window, along with a word menu 406, a grammar menu 407, an example sentence menu 408, a question menu 409, and a conversation request menu 410 for that video.

In the subtitle information automatically converted to text by the speech recognition system described above, text that carries a word difficulty or a grammatical element can be distinguished from the other text, that is, from text that carries neither.

To that end, the attributes of text carrying a word difficulty or a grammatical element may be changed so that it is distinguished from the other text that carries neither.

Specifically, a first attribute 404a', 404a'' may be applied to text carrying a word difficulty, and a second attribute 404b may be applied to text containing a grammatical element. The first attribute 404a', 404a'' and the second attribute 404b may each be rendered as an underline, a highlight, or a color. In FIG. 4A, described below, the first attributes 404a' and 404a'' are shown as an underline and a highlight, respectively, and the second attribute 404b is shown in red text. The markings are not limited to these, however: other markings such as bold type may be used, and the markings assigned to the first attribute 404a', 404a'' and the second attribute 404b may be swapped.
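One way to picture the first attribute 404a', 404a'' and the second attribute 404b is as markup wrapped around already-labelled subtitle segments. The sketch below is illustrative only: HTML output, the `STYLE_BY_KIND` table, and the `render_caption` name are assumptions, and the underline and red-text choices simply follow the markings described for the figure.

```python
from html import escape

# Illustrative markings: underline for word-difficulty text (first attribute),
# red text for grammatical elements (second attribute). Both are assumptions.
STYLE_BY_KIND = {
    "difficulty": ("<u>", "</u>"),
    "grammar": ('<span style="color:red">', "</span>"),
}


def render_caption(segments):
    """Render (text, kind) pairs as HTML; kind is 'difficulty', 'grammar', or None."""
    parts = []
    for text, kind in segments:
        # Segments with no kind get no markup, so they keep the default style.
        open_tag, close_tag = STYLE_BY_KIND.get(kind, ("", ""))
        parts.append(f"{open_tag}{escape(text)}{close_tag}")
    return "".join(parts)
```

Because the style table is separate from the rendering loop, swapping the markings between the two attributes, which the description allows, only requires editing the table.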

Hereinafter, a method of processing video subtitles for language education is described with reference to FIGS. 3 to 4E, based on the description of the at least one user device 200 according to an embodiment of the present invention given above with reference to FIGS. 1 and 2.

FIG. 3 is a schematic flowchart illustrating a method of processing video subtitles for language education according to an embodiment of the present invention. FIG. 4A illustrates an exemplary output screen that displays videos after the video subtitle processing apparatus for language education according to an embodiment of the present invention receives, through the search category, the grammar and/or topic that the user wants to study. FIG. 4B illustrates an exemplary output screen on which the user device according to an embodiment of the present invention outputs a video output window corresponding to the grammar and/or topic that the user wants to study, together with a conversation partner information window for that video. FIG. 4C illustrates an exemplary output screen that outputs an example sentence window corresponding to the example sentence menu of a video for language education according to an embodiment of the present invention. FIG. 4D illustrates an exemplary output screen on which a conversation partner video and a user video are output simultaneously in response to the conversation request menu of the video subtitle processing apparatus for language education according to an embodiment of the present invention. FIG. 4E is a diagram for explaining the topic search method of the video subtitle processing apparatus for language education according to an embodiment of the present invention.

First, the video subtitle processing apparatus for language education according to an embodiment of the present invention receives the audio of a video from the user device 200, recognizes it, and receives it as text (S100).

Referring to FIG. 4A, the display 203 of at least one user device 200 according to an embodiment of the present invention may output and display a plurality of videos corresponding to the video content 403 to be studied, selected through at least one of the search category 401 and the content category 402.

Referring to FIG. 4A, the search category 401 may be presented to the user through a content major-category menu 401a, a content subcategory menu 401b, and a keyword menu 401c. The user can set the topic to be studied through the content major-category menu 401a and the content subcategory menu 401b, and can set the word or grammar to be studied through the keyword menu 401c. The search category 401 may receive input for at least one of the content major-category menu 401a, the content subcategory menu 401b, and the keyword menu 401c.

Referring to FIG. 4A, the content category 402 groups the topics a user may want to study into categories for easy identification, and may be operated by the user. For example, the content category 402 may be divided into major categories offered by the video subtitle service for language education, such as Urirang patterns, travel, food, K-culture, fashion, beauty, current affairs, lectures, and business. Here, K-culture may be further divided into the subcategories K-POP, K-DRAMA, and others.

Although FIG. 4A shows the search category 401 as consisting of the content major-category menu 401a, the content subcategory menu 401b, and the keyword menu 401c, a more detailed search is also possible, as shown in FIG. 4E. This is described later with reference to FIG. 4E.

For example, when the user selects business as the topic to study, a plurality of video contents 403 related to business may be displayed on the output unit 204, as shown in FIG. 4A. Each video content 403 may display information such as its level (or difficulty), playback time, coins required for playback, view count, and interest level.

The following describes in detail, with reference to FIGS. 4B and 4C, the case where the user selects, from the plurality of video contents 403, a beginner-level video on innovation in the pizza market.

Next, the video subtitle processing apparatus for language education of the present invention determines the difficulty level of words or the grammatical elements in at least a portion of the text by referring to the database 202 (S200). The apparatus according to an embodiment of the present invention then applies the word difficulty or grammatical element to at least a portion of the text so that it is distinguished from the other text (S300).
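Steps S200 and S300 can be sketched as a single labelling pass over the recognized text. The version below uses naive substring matching against two hypothetical lists standing in for the database 202; a real system for Korean would most likely need morphological analysis, which this sketch does not attempt.

```python
import re


def tag_text(text, difficulty_words, grammar_patterns):
    """Split text into (segment, kind) pairs; kind is 'difficulty', 'grammar', or None."""
    kinds = {w: "difficulty" for w in difficulty_words}
    kinds.update({g: "grammar" for g in grammar_patterns})
    if not kinds:
        return [(text, None)] if text else []
    # Try longer entries first so a long entry wins over one of its prefixes.
    ordered = sorted(kinds, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    segments, pos = [], 0
    for match in pattern.finditer(text):
        if match.start() > pos:
            # Unmatched text between two hits stays unlabelled.
            segments.append((text[pos:match.start()], None))
        segments.append((match.group(), kinds[match.group()]))
        pos = match.end()
    if pos < len(text):
        segments.append((text[pos:], None))
    return segments
```

The output is a flat list of labelled segments, which keeps the decision about what to mark separate from the decision about how each mark is drawn on screen.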

Referring to FIG. 4B, a video output window 404 corresponding to the user's selection among the plurality of video contents 403 output through the search category 401 described above may be displayed on the output unit 204.

Referring to FIG. 4B, the video output window 404 may display text that includes the first attribute 404a', 404a'' and the second attribute 404b. Here, the text is the result of automatically recognizing the speech in the video and converting it to text, and it may consist of text to which the first attribute 404a', 404a'' is applied, text to which the second attribute 404b is applied, and the remaining text.

Here, the first attribute 404a', 404a'' is applied to text carrying a word difficulty, and the second attribute 404b is applied to text containing a grammatical element. When only one of the first attribute 404a', 404a'' and the second attribute 404b is changed in the text constituting the subtitle information, at least one of the size, color, shape, and highlight of the text is applied. When the first attribute 404a', 404a'' and the second attribute 404b are changed at the same time, different attributes are applied to each so that the two can be told apart.

In addition, the subtitle information is divided into sentences of a fixed length and output on the image in the video output window 404.
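The specification does not say how the subtitle text is divided. One plausible reading, sketched here purely as an assumption, is a word-boundary split under a maximum character count; `split_caption` and `max_chars` are hypothetical names.

```python
def split_caption(text, max_chars=20):
    """Split text into lines of at most max_chars, breaking only between words."""
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            # The next word would overflow the line, so start a new one.
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines
```

A single word longer than `max_chars` is kept whole on its own line rather than being cut in the middle.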

The plurality of word forms may include variants of words consisting of at least one of derived words, synonyms, compound words, complex words, and simple words, and the method may include determining that such a variant, within at least a portion of the text constituting the subtitle information, is to be distinguished from the other text.

In addition, the grammatical element is determined on the basis of text consisting of at least one of a particle, an ending, and an affix. Here, the particle may further include at least one variant among the nominative, objective, adnominal, adverbial, and predicative cases.

In addition, the ending may further include at least one variant among the declarative, interrogative, imperative, propositive, exclamatory, negative, honorific, tense, modality, passive, and causative forms.

In addition, a video subtitle processing apparatus for language education according to another embodiment of the present invention may output dictionary content corresponding to text carrying a word difficulty or a grammatical element separately from the subtitle.

Specifically, when the user clicks text carrying at least one of a word difficulty and a grammatical element in order to see its dictionary content, a pop-up window containing the dictionary content may be created separately from the subtitle, or the dictionary content may be displayed either above or below the subtitle.

That is, instead of selecting the word menu 406 on the display 203, the user may select a word to which the first attribute 404a', 404a'' is applied, so that the dictionary meaning of the word is displayed in a new window. The dictionary information may be displayed as a pop-up within the video output window 404, and a link to the dictionary in which the meaning was looked up may be connected automatically so that the corresponding site is displayed.

Although not shown in FIG. 4B, a video subtitle processing apparatus for language education according to another embodiment of the present invention may separately store the dictionary content for the word difficulties and grammatical elements described above, thereby building dictionary information of the user's own. The user can accordingly store and display personal dictionary information for each video studied, or store and display dictionary information for all videos studied.
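A user-specific dictionary that can be viewed per video or across all videos could be organized as in the following minimal sketch. The class name, the method names, and the in-memory storage are all assumptions; the specification does not describe how the entries are stored.

```python
from collections import defaultdict


class PersonalDictionary:
    """In-memory sketch of a user's saved entries, grouped by video."""

    def __init__(self):
        # video_id -> {word or grammar pattern: dictionary content}
        self._entries = defaultdict(dict)

    def save(self, video_id, entry, content):
        self._entries[video_id][entry] = content

    def for_video(self, video_id):
        """Entries saved while studying one video (empty if none)."""
        return dict(self._entries.get(video_id, {}))

    def all_entries(self):
        """Entries from every studied video, merged into one mapping."""
        merged = {}
        for entries in self._entries.values():
            merged.update(entries)
        return merged
```

For instance, saving '거대하다' under one video and '고 있다' under another makes each available through `for_video`, while `all_entries` returns both together.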

Accordingly, after finishing a video that includes subtitle information based on speech recognition, the user can effectively study unfamiliar words or grammar through the dictionary information that the user has separately stored.

In addition, a video subtitle processing apparatus for language education according to another embodiment of the present invention may indicate that word-form text, among the text constituting the subtitle information in the video output window 404, corresponds to the matching image in the video. For example, when the word '피자' (pizza) is displayed in a first area and an image related to pizza is displayed in the video output window 404, an indication may be provided showing that the word and the pizza image refer to the same thing. The indication may be a block drawn over the image, a circle, or a line connecting the image and the word.

In addition, the grammatical element may include at least one of a particle, an ending, and an affix. Here, the particle further includes at least one variant among the nominative, objective, adnominal, adverbial, and predicative cases, and the ending may further include at least one variant among the declarative, interrogative, imperative, propositive, exclamatory, negative, honorific, tense, modality, passive, and causative forms.

The first attribute 404a', 404a'' and the second attribute 404b described above may each be at least one of the size, color, shape, and highlight of the characters constituting the subtitle information, and the first attribute 404a', 404a'', the second attribute 404b, and the remaining area may all have different attributes. That is, the remaining area may mean characters to which no attribute is applied.

In FIG. 4B, among the components of the subtitle information, '거대한' (huge), which is text determined by reference to the database 202 to carry a word difficulty, is displayed with the first attribute 404a', and '들썩이' is displayed with the first attribute 404a''.

In addition, among the components of the subtitle information, text determined by reference to the database 202 to contain a grammatical element is displayed with the second marker 404b. Although an underline, a highlight, and red text are applied to the first attributes 404a', 404a'' and the second attribute 404b, respectively, the markings are not limited to these as long as different attributes are applied to the first attribute 404a', 404a'' and the second attribute 404b.

Referring to FIG. 4B, the output unit 204 according to an embodiment of the present invention may output a conversation partner information window 405. The conversation partner information window 405 may, for example, display information about the online teachers associated with the video output window 404, provide profile information for each teacher, and display the conversation request menu 410 alongside it so that the user can select a teacher and start a conversation.

An exemplary output screen shown in response to the user's input on the conversation request menu 410 is described in detail later with reference to FIG. 4D.

Referring to FIG. 4B, the display 203 according to an embodiment of the present invention may further include a word menu 406, a grammar menu 407, an example sentence menu 408, and a question menu 409 for studying the video in the video output window 404.

The word menu 406 may be a menu that provides the pronunciation and meaning of the words contained in the video that the user selected for study. For example, as shown in FIG. 4B, the video with subtitle information output in the video output window 404 contains words such as '거대한' and '들썩이다', and these words are displayed with the first mark 404a and the second mark 404b. When the user selects the word menu 406 to study the words displayed with the first mark 404a and the second mark 404b, the derived form, pronunciation, and meaning of each word may be provided. For example, for '거대한', the derived form, pronunciation, and meaning are displayed as 거대하다, [거대하다], 'to be huge, great', and for '들썩이다', the pronunciation and meaning are displayed as [들써기다], 'to be turn up slightly'. By default, meanings are displayed in the foreign language of the learner studying Korean, to aid understanding, but this is not a limitation.

The grammar menu 407 may be a menu that provides the grammar, that is, the Korean sentence patterns, contained in the video that the user selected for study. For example, as shown in FIG. 4B, in the video with subtitle information output in the video output window 404, '고 있', which is text containing a grammatical element as in '~하고 있다', is displayed with the second attribute 404b. When the user selects the grammar menu 407 to study the grammar displayed with the second attribute 404b, the grammar information for '고 있' may be displayed. That is, the pattern of the grammatical element, the meaning of the pattern, and example sentences using the pattern may be displayed together. As shown in FIG. 4B, '고 있' with the second attribute 404b is grammar following the pattern '(verb) + ~고 있다', an expression indicating that an action or state is in progress, and example sentences using the pattern may include '최근 수박이 많이 팔리고 있다' (Watermelons have been selling well recently) and '지금 공부하고 있다' (I am studying now).

The example sentence menu 408 may be a menu that provides all of the subtitles contained in the video that the user selected for study. Although the subtitle information according to an embodiment of the present invention is output on the basis of speech recognition, subtitle data may also be stored in the database 202 in advance and received from the database 202. For example, as shown in FIG. 4C, subtitle data matched to the video may be received from the database 202, where it was stored in advance, and displayed in an example sentence window 418 beside the video output in the video output window 404. Through the example sentence window 418, the user can see at a glance the words and grammatical patterns contained in the video being studied.

Next, the question menu 409 may be a menu that provides a plurality of questions prepared in advance to gauge the user's understanding of the video studied. After finishing the video, users may be presented in the question menu 409 with questions such as '한국식 피자를 먹어 봤어요? 어땠어요?' (Have you tried Korean-style pizza? How was it?). Although not shown in FIG. 4C, a recording function may also be provided so that feedback can be given on the questions provided in the question menu 409.

In addition, after thinking about answers to the questions provided in the question menu 409, the user can discuss those questions with an online teacher through the conversation request menu 410 shown in FIG. 4C.

Referring to FIG. 4D, a conversation partner video 411 and a user video 412 corresponding to the conversation request menu 410 selected by the user are output simultaneously on the output unit 204, and the example sentence window 418 for the video the user studied may be output on the output unit 204 along with them. The user can therefore view the example sentence window 418 during the conversation with the conversation partner, that is, the teacher, and talk about questions arising from the example sentence window 418 or questions provided in the question menu 409.

The conversation partner video 411 and the user video 412 may output and display subtitle information in real time through the speech recognition system. That is, as shown in FIG. 4D, the speech in the conversation partner video 411 is recognized and a first subtitle 411a is output on the conversation partner video 411, and the speech in the user video 412 is recognized and a second subtitle 412a is output on the user video 412. Although the conversation partner video 411 and the user video 412 are output simultaneously here, this is not a limitation, and they may be output on separate screens.

A user giving feedback on the studied video through the conversation partner video 411 and the user video 412, which carry the first subtitle 411a and the second subtitle 412a, can understand the conversation with the conversation partner more quickly and accurately thanks to the subtitle information produced by automatic speech recognition.

In addition, although not shown in FIG. 4D, text carrying a word difficulty or a grammatical element may be displayed in the first subtitle 411a and the second subtitle 412a with the first attribute and the second attribute, respectively. The user can therefore see word difficulties and grammatical elements at a glance even while talking with the teacher who is the conversation partner.

The topic search method of the video subtitle processing apparatus for language education is described in detail below with reference to FIG. 4E.

Referring to FIG. 4E, in addition to the search category 401 composed of the content major category menu 401a, the content subcategory menu 401b, and the keyword menu 401c, the user may perform a detailed topic search through a topic search menu.

Referring to FIG. 4E, the user may use a writing system by selecting a level, a classification, a grammar pattern, and a topic through the topic search menu. As shown in FIG. 4E, when the user selects 'beginner' as the level, 'new things, trying' as the classification, '-아/어 보다' ("to try doing") as the grammar to learn, and 'travel' as the topic, sentences corresponding to the selection may be provided automatically. For example, in response to the selection, an example sentence such as '전주에서 비빔밥을 먹어 보다' ("to try eating bibimbap in Jeonju") may be output and displayed, as may an example sentence such as '제주도에서 수영을 해 보다' ("to try swimming in Jeju Island"), which contains '-해 보다', a variant of '-아/어 보다'.
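
As a rough illustration of this selection step, the following Python sketch filters a small in-memory list of example sentences by level, classification, grammar, and topic. The record layout, the function name `find_examples`, and the third record are hypothetical; the description does not specify how the example sentences are stored or retrieved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleSentence:
    level: str
    classification: str
    grammar: str
    topic: str
    text: str


# Hypothetical example-sentence store. The first two sentences come from the
# description; the third is an invented record that should not match the query.
EXAMPLES = [
    ExampleSentence("beginner", "new things, trying", "-아/어 보다", "travel",
                    "전주에서 비빔밥을 먹어 보다"),
    ExampleSentence("beginner", "new things, trying", "-아/어 보다", "travel",
                    "제주도에서 수영을 해 보다"),
    ExampleSentence("intermediate", "plans", "-(으)ㄹ 것이다", "travel",
                    "내일 부산에 갈 것이다"),
]


def find_examples(level, classification, grammar, topic, examples=EXAMPLES):
    """Return the texts of all examples that match every selected criterion."""
    return [
        ex.text
        for ex in examples
        if ex.level == level
        and ex.classification == classification
        and ex.grammar == grammar
        and ex.topic == topic
    ]


results = find_examples("beginner", "new things, trying", "-아/어 보다", "travel")
for sentence in results:
    print(sentence)
```

Requiring all four criteria to match mirrors the example of FIG. 4E, in which the user makes one choice in each field. A looser search that ignores unset fields would be an equally plausible design.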

Although FIG. 4E shows sentences output simultaneously in the topic selection menu and the search sentence menu, they may instead be output sequentially.

Specifically, when the user wants to write a composition related to '제주도에서 수영을 해 보다' ("to try swimming in Jeju Island"), one of the topic selection example sentences output in response to the user's input for the classification in the topic search menu, the user may enter a sentence directly in the search sentence menu.

When the user enters a sentence in the search sentence menu and selects the search menu, the sentence may be checked automatically for typographical errors and ungrammatical expressions, and a more accurate sentence may be provided. For example, when the user enters '내일 친구와 전주에서 비빔밥을 먹어 보다' (literally, "tomorrow, with a friend, to try eating bibimbap in Jeonju") as the search sentence and selects the search menu, the suggested sentence menu may correct the incorrect expressions in the search sentence and display a correctly worded sentence. That is, the suggested sentence menu may present expressions such as '내일 친구와 전주에서 비빔밥을 먹어 볼 것이다' ("Tomorrow I will try eating bibimbap in Jeonju with a friend") and '오늘 친구와 전주에서 비빔밥을 먹어 보았다' ("Today I tried eating bibimbap in Jeonju with a friend") together, showing several usage examples so that the user can use the correct expression.
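
The description does not say how the suggested sentences are generated. Purely as a toy stand-in, the sketch below reproduces the single example above with string rules: a sentence ending in the dictionary form '보다' is offered in a future form and a past form, and the time word is changed from '내일' (tomorrow) to '오늘' (today) for the past form. The function name `suggest_sentences` and both rules are assumptions; a practical system would need a real grammar checker or morphological analyzer.

```python
def suggest_sentences(sentence):
    """Offer conjugated variants of a sentence ending in the dictionary form '보다'.

    Toy rule-based stand-in: only the '-아/어 보다' pattern is handled.
    """
    sentence = sentence.strip()
    ending = "보다"
    if not sentence.endswith(ending):
        return []  # pattern not recognized; no suggestion offered
    stem = sentence[: -len(ending)]
    future = stem + "볼 것이다"
    # A past-tense sentence cannot refer to tomorrow, so swap the time word.
    past = stem.replace("내일", "오늘") + "보았다"
    return [future, past]


suggestions = suggest_sentences("내일 친구와 전주에서 비빔밥을 먹어 보다")
for suggestion in suggestions:
    print(suggestion)
```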

Accordingly, the subtitle processing apparatus and method for language education videos according to an embodiment of the present invention use automatic subtitle recognition technology so that the user can easily identify and learn grammar and vocabulary. Furthermore, by presenting real-life usage through example sentences, they can also improve the user's ability to learn colloquial language.

In addition, by matching the user with a desired online teacher for a learned video, feedback on the video can be exchanged with the online teacher in real time, so that the user can learn not only grammar, reading, and writing but also speaking.

Accordingly, a user of the subtitle processing apparatus and method for language education videos according to an embodiment of the present invention can quickly and efficiently identify words and grammatical elements within the subtitle information, thereby increasing language learning efficiency.

In addition, in the subtitle processing apparatus and method for language education videos according to an embodiment of the present invention, after learning of a video containing speech-recognition-based subtitle information is completed, questions of a plurality of question types corresponding to the subtitle information are provided, so that the user's level of understanding of the video can be assessed.
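
The question types themselves are not enumerated in the description. As one hypothetical type, the sketch below builds a fill-in-the-blank question from a subtitle line by blanking out a word that was marked as difficult or grammatical. The function name `make_cloze` and the blank marker are assumptions.

```python
def make_cloze(subtitle, target, blank="____"):
    """Return (question, answer) with the first occurrence of target blanked out."""
    if target not in subtitle:
        raise ValueError(f"target {target!r} does not occur in the subtitle")
    question = subtitle.replace(target, blank, 1)
    return question, target


question, answer = make_cloze("전주에서 비빔밥을 먹어 보다", "비빔밥을")
print(question)  # 전주에서 ____ 먹어 보다
print(answer)    # 비빔밥을
```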

In this specification, each block or step may represent a module, segment, or portion of code that includes one or more executable instructions for performing the specified logical function(s). It should also be noted that, in some alternative embodiments, the functions mentioned in the blocks or steps may occur out of order. For example, two blocks or steps shown in succession may in fact be performed substantially concurrently, or may sometimes be performed in reverse order, depending on the functions involved.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The ASIC may reside in a user terminal. Alternatively, the processor and the storage medium may reside as discrete components in a user terminal.

Although embodiments of the present invention have been described in detail above with reference to the accompanying drawings, the present invention is not necessarily limited to these embodiments and may be modified in various ways without departing from its technical spirit. The embodiments disclosed herein are therefore intended to explain, not to limit, the technical spirit of the present invention, and the scope of that technical spirit is not limited by these embodiments. The embodiments described above should accordingly be understood as illustrative in all respects and not restrictive. The scope of protection of the present invention should be construed according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of the present invention.

100: service providing device
200: at least one user device
201: receiving unit
202: database
203: determination unit
204: output unit
401: search category
401a: content major category menu
401b: content subcategory menu
401c: keyword menu
402: content category
403: video content
404: video output window
404a', 404a'': first property
404b: second property
405: conversation partner information window
406: word menu
407: grammar menu
408: example sentence menu
409: question menu
410: conversation request menu
411: conversation partner video
411a: first subtitle
412: user video
412a: second subtitle
418: example sentence window

Claims

A subtitle processing method for language education videos, comprising:
outputting a video related to a topic that a user wants to learn and to at least one of a difficulty level, a classification, and a grammar for the topic, selected through user input;
obtaining text by receiving and recognizing the audio of the video;
determining, by referring to a database, at least some of the text that has a word difficulty level or a grammatical element;
determining a property for distinguishing the at least some text having the difficulty level or the grammatical element from other text; and
outputting at least a portion of the text for which the property has been determined as a subtitle on the video,
wherein the determining of the at least some text comprises
determining, as text having the difficulty level, words corresponding to a derivative, a synonym, a compound word, a complex word, or a modified form of a simple word, and determining, as text having the grammatical element, words composed of particles including at least one variation among the nominative, objective, adnominal, adverbial, and predicative cases, and of endings and affixes including at least one variation among the declarative, interrogative, imperative, propositive, exclamatory, negative, honorific, tense, modality, passive, and causative forms,
wherein the outputting of the subtitle comprises
outputting, in an area adjacent to a window in which the video is output, an output screen including word menu, grammar menu, and question menu areas for providing a learning service through interaction with the subtitle,
wherein the output screen is configured such that:
when any one word constituting the text is selected through user input in the word menu, a derivative, the pronunciation, and the meaning of the word are output;
when text having the grammatical element is selected from the text through user input in the grammar menu, a pattern for the grammatical element, the meaning of the pattern, and a usage example are output; and
after learning of the video is completed, a question related to the topic, the difficulty level, or the text having the grammatical element is output through user input in the question menu, and a user answer to the question is recorded for feedback,
the method further comprising, after the outputting of the subtitle,
outputting at least a portion of the text having the difficulty level or the grammatical element from among the text of the user answer obtained through recognition of the recording.

delete

The method of claim 1,
further comprising outputting, separately from the subtitle, dictionary content corresponding to the at least some text having the difficulty level or the grammatical element.

The method of claim 5,
further comprising storing the dictionary content for the difficulty level or the grammatical element as separate dictionary information.

delete

The method of claim 1,
wherein the determining of the property comprises:
applying a first property to the text having the difficulty level; and
applying a second property to the text having the grammatical element.

The method of claim 9,
wherein the first property and the second property
distinguish the text by at least one attribute among size, color, shape, and highlight.

The method of claim 9,
wherein, when the first property and the second property are applied simultaneously to any part of the text, the first property and the second property, which differ from each other, are both applied so that they remain distinguishable.

The method of claim 1,
wherein the outputting of the subtitle comprises
displaying the subtitle so that it is distinguished for each speaker, in response to the audio of at least one speaker included in the video.

The method of claim 1,
further comprising selectively determining a conversation partner designated for the video, and simultaneously outputting videos of the determined conversation partner and the user.

The method of claim 13,
further comprising recognizing the audio of the conversation partner's video and of the user's video, respectively, and outputting subtitles on the respective videos.

delete

A subtitle processing apparatus for language education videos, comprising:
a receiving unit configured to receive and recognize the audio of a video related to a topic that a user wants to learn and to at least one of a difficulty level, a classification, and a grammar for the topic, selected through user input, and to obtain text;
a database configured to store text having a word difficulty level and text having a grammatical element;
a determination unit configured to determine, by referring to the database, at least some of the text obtained through the receiving unit that has a word difficulty level or a grammatical element; and
an output unit configured to determine a property for distinguishing the at least some text having the difficulty level or the grammatical element from other text, and to output at least a portion of the text for which the property has been determined as a subtitle on the video,
wherein the determination unit is configured to
determine, as text having the difficulty level, words corresponding to a derivative, a synonym, a compound word, a complex word, or a modified form of a simple word, and to determine, as text having the grammatical element, words composed of particles including at least one variation among the nominative, objective, adnominal, adverbial, and predicative cases, and of endings and affixes including at least one variation among the declarative, interrogative, imperative, propositive, exclamatory, negative, honorific, tense, modality, passive, and causative forms,
wherein the output unit is configured to
output, in an area adjacent to a window in which the video is output, an output screen including word menu, grammar menu, and question menu areas for providing a learning service through interaction with the subtitle,
wherein the output screen is configured such that:
when any one word constituting the text is selected through user input in the word menu, a derivative, the pronunciation, and the meaning of the word are output;
when text having the grammatical element is selected from the text through user input in the grammar menu, a pattern for the grammatical element, the meaning of the pattern, and a usage example are output; and
after learning of the video is completed, a question related to the topic, the difficulty level, or the text having the grammatical element is output through user input in the question menu, and a user answer to the question is recorded for feedback,
wherein the output unit is further configured to
output at least a portion of the text having the difficulty level or the grammatical element from among the text of the user answer obtained through recognition of the recording.