KR20180119101A

KR20180119101A - System and method for creating broadcast subtitle

Info

Publication number: KR20180119101A
Application number: KR1020180025541A
Authority: KR
Inventors: 김정수; 박재우; 노웅래; 지효은; 양경희
Original assignee: 주식회사 소리보기
Priority date: 2017-04-24
Filing date: 2018-03-05
Publication date: 2018-11-01
Also published as: KR102044689B1

Abstract

Disclosed are a system and method for creating broadcast subtitles. According to an embodiment of the present invention, the system generates and stores an original subtitle based on the speech included in a broadcast signal, divides each spoken sentence in the received broadcast signal into words, converts each divided word into text with a speech recognition algorithm, and generates a converted subtitle. It then corrects text-conversion errors based on syllable-level correlation values for each word of the converted subtitle, generates a corrected subtitle in which the error-corrected converted subtitle is synchronized with the speech of the broadcast signal based on the per-word speech recognition delay, and transmits the corrected subtitle to a user terminal. The corrected subtitle is therefore accurately synchronized with the speech of the broadcast signal, which can increase viewers' immersion in and interest in the received broadcast content. The broadcast subtitle production system comprises the user terminal and a subtitle production server.

Description

SYSTEM AND METHOD FOR CREATING BROADCAST SUBTITLE

The present invention relates to a broadcast subtitle production system and method, and more particularly to a technique that converts the speech of a broadcast signal into text to generate a converted subtitle, corrects text-conversion errors in the converted subtitle, and adjusts it to be synchronized with the speech of the broadcast signal, so that subtitles accurately synchronized with the audio of the broadcast content can be delivered to a user terminal.

Apart from some pre-produced content, most closed-caption data for deaf and hard-of-hearing viewers of existing broadcast content is produced live: a stenographer listens to the broadcast audio during the actual broadcast and types it in shorthand, and the resulting caption data is transmitted together with the broadcast signal.

For this reason, a delay of more than 1 second, and often 3 to 5 seconds, between the actual audio and the corresponding caption data is common.

This delay may matter little to most viewers, but it causes considerable inconvenience for deaf and hard-of-hearing viewers who cannot hear the audio.

That is, because the audio and the captions appear at different times, it is difficult to understand exactly who said what.

Accordingly, an object of the present invention is to provide a broadcast subtitle production system and method that generate and store an original subtitle from the speech of a broadcast signal; convert the speech of the broadcast signal into text using a speech recognition algorithm to generate a converted subtitle; correct text-conversion errors based on syllable-level correlation values of the converted subtitle; and adjust the subtitle position to be synchronized with the speech of the broadcast signal based on the delay of the per-word text conversion and the delay of the original subtitle. In this way, errors introduced into the converted subtitle during text conversion are corrected using its correlation with the original subtitle, while the subtitle is accurately synchronized with the speech of the broadcast signal.

Another object of the present invention is to provide a broadcast subtitle production system and method that further improve viewers' concentration on and interest in broadcast content by delivering, in real time, subtitles accurately synchronized with the speech of the broadcast signal to a user terminal.

According to one aspect of the present invention for achieving the above objects, a broadcast subtitle production system includes:

a user terminal; and

a subtitle production server that generates an original subtitle based on speech included in a broadcast signal; divides the speech included in the broadcast signal into a plurality of words and converts each divided word into text using a speech recognition algorithm to generate a converted subtitle; and, based on the delay incurred in running the speech recognition algorithm on each word of the error-corrected converted subtitle, generates a corrected subtitle in which the converted subtitle is synchronized with the speech of the broadcast signal and delivers it to the user terminal.

Preferably, the subtitle production server may include: a broadcast signal receiving device that generates an original subtitle based on speech included in a broadcast signal transmitted from a broadcasting station and transmits it to a subtitle DB; and a subtitle correction device that receives the speech of the broadcast signal, divides each spoken sentence into a plurality of words, runs a speech recognition algorithm on each divided word to convert it into text and generate a converted subtitle, derives correlation values between the syllables of each word of the converted subtitle and the syllables of the original subtitle, corrects errors introduced into the converted subtitle during text conversion based on the derived correlation values, and, based on the delay incurred in running the speech recognition algorithm on each word of the converted subtitle, generates a corrected subtitle in which the error-corrected converted subtitle is synchronized with the speech.

Preferably, the subtitle correction device may include: a speech receiving unit that receives the speech of the broadcast signal and divides each spoken sentence into words; a text conversion unit that converts each divided word into text using a speech recognition algorithm to generate a converted subtitle; a converted subtitle generation unit that derives syllable-level correlation values between each word of the converted subtitle and the words of the original subtitle, finds the corresponding original subtitle based on the derived correlation values, and corrects errors introduced into the converted subtitle during text conversion; and a corrected subtitle generation unit that, based on the delay incurred in running the speech recognition algorithm on each word of the error-corrected converted subtitle, generates a corrected subtitle in which the converted subtitle is synchronized with the speech of the broadcast signal.

Preferably, the corrected subtitle generation unit may be configured to distinguish speakers using a speaker recognition algorithm and to generate a corrected subtitle for each speaker.

Preferably, the subtitle production device may further include a translation unit that translates the error-corrected converted subtitle into a requested language to generate a translated subtitle and delivers the translated subtitle to the corrected subtitle generation unit.

According to another embodiment of the present invention, a subtitle production server may include: a broadcast signal receiving device that generates an original subtitle based on speech included in a broadcast signal transmitted from a broadcasting station and transmits it to a subtitle DB; and a subtitle correction device that receives the speech of the broadcast signal, divides each spoken sentence into a plurality of words, runs a speech recognition algorithm on each divided word to convert it into text and generate a converted subtitle, derives correlation values between the syllables of each word of the converted subtitle and the syllables of the original subtitle, corrects errors introduced into the converted subtitle during text conversion based on the derived correlation values, and, based on the delay incurred in running the speech recognition algorithm on each word of the converted subtitle, generates a corrected subtitle in which the error-corrected converted subtitle is synchronized with the speech of the broadcast signal.

Preferably, the subtitle correction device may include: a speech receiving unit that receives the speech of the broadcast signal and divides each spoken sentence into words; a text conversion unit that converts each divided word into text using a speech recognition algorithm to generate a converted subtitle; a converted subtitle generation unit that derives syllable-level correlation values between each word of the converted subtitle and the words of the original subtitle, finds the original subtitle corresponding to the converted subtitle based on the derived correlation values, and corrects errors introduced into the converted subtitle during text conversion; and a corrected subtitle generation unit that, based on the delay incurred in running the speech recognition algorithm on each word of the error-corrected converted subtitle, generates a corrected subtitle in which the converted subtitle is synchronized with the speech of the broadcast signal.

According to still another aspect of the present invention, a broadcast subtitle production method includes: generating and storing, at a subtitle production server, an original subtitle for speech included in a broadcast signal received from a broadcasting station; dividing each sentence of the speech included in the received broadcast signal into words, applying a speech recognition algorithm to the divided words to convert them into text and generate a converted subtitle, deriving syllable-level correlation values between the syllables of the converted subtitle and those of the original subtitle, and correcting errors introduced into the converted subtitle during text conversion based on the derived correlation values; generating a corrected subtitle by synchronizing the error-corrected converted subtitle with the original subtitle; and matching the corrected subtitle to the broadcast signal received through a set-top box and displaying it on a user terminal.
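The method steps above can be read as a simple pipeline. The Python sketch below is illustrative only: the speech recognizer, the error-correction rule, and the synchronization step are passed in as placeholder callables (`stt`, `correct`, `synchronize`, `Caption` and `produce_corrected_subtitle` are hypothetical names, not part of the disclosure), and the final display step on the user terminal is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Caption:
    text: str
    time: float  # display time on the broadcast timeline, in seconds


def produce_corrected_subtitle(
    content_id: str,
    original_subtitle: List[str],
    word_segments: List[bytes],
    stt: Callable[[bytes], str],
    correct: Callable[[str, List[str]], str],
    synchronize: Callable[[List[str]], List[Caption]],
    subtitle_db: Dict[str, List[str]],
) -> List[Caption]:
    # Step 1: store the stenographer's original subtitle for this content.
    subtitle_db[content_id] = original_subtitle
    # Step 2: convert each word segment to text, then correct it against the original subtitle.
    converted = [stt(segment) for segment in word_segments]
    corrected = [correct(word, subtitle_db[content_id]) for word in converted]
    # Step 3: synchronize the corrected words with the broadcast audio.
    # Step 4 (matching to the set-top box signal and displaying on the terminal) is not modeled.
    return synchronize(corrected)
```

Passing the stages in as callables keeps the sketch neutral about which recognizer, correction measure, or timing rule is actually used.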

Preferably, in generating the corrected subtitle, the corrected subtitle in which the converted subtitle is synchronized with the speech of the broadcast signal may be generated based on the delay incurred in running the speech recognition algorithm on each word of the converted subtitle.
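One way to write this delay-based synchronization, following the averaging scheme given later for the corrected subtitle generation unit 340, is shown below. The symbols are introduced here for illustration only: t_i is the time at which the recognized text of word i becomes available, d_i is the time the speech recognition algorithm took for that word, N is the number of words, and \hat{t}_i is the first-pass display time of word i.

```latex
\bar{d} = \frac{1}{N} \sum_{i=1}^{N} d_i,
\qquad
\hat{t}_i = t_i - \bar{d}, \quad i = 1, \dots, N
```

Every word is shifted earlier by the same mean delay, so the relative spacing between words in the converted subtitle is preserved.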

Preferably, in generating the corrected subtitle, speakers may be distinguished using a speaker recognition algorithm and a corrected subtitle may be generated for each speaker.

According to the present invention, an original subtitle is generated and stored based on the speech included in a broadcast signal; each sentence of the speech in the received broadcast signal is divided into words; a speech recognition algorithm is run on each divided word to convert it into text and generate a converted subtitle; text-conversion errors are corrected based on syllable-level correlation values for each word of the converted subtitle; and a corrected subtitle in which the converted subtitle is synchronized with the speech of the broadcast signal is generated, based on the per-word speech recognition delay, and delivered to the user terminal. As a result, a corrected subtitle accurately synchronized with the speech of the broadcast signal can be produced, which improves viewers' immersion in and interest in the received broadcast content.

The accompanying drawings illustrate preferred embodiments of the present invention and, together with the detailed description below, serve to further explain its technical idea; the present invention should therefore not be construed as limited to what is shown in the drawings.
FIG. 1 is a diagram showing the configuration of a broadcast subtitle production system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the detailed configuration of the subtitle production server of the broadcast subtitle production system according to an embodiment of the present invention.
FIG. 3 is a diagram showing the detailed configuration of the subtitle correction device of the broadcast subtitle production system according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram showing an original subtitle in the broadcast subtitle production system according to an embodiment of the present invention.
FIG. 5 is an exemplary diagram showing a converted subtitle in the broadcast subtitle production system according to an embodiment of the present invention.
FIG. 6 is an exemplary diagram showing the correlation values used to correct text errors in the converted subtitle in the broadcast subtitle production system according to an embodiment of the present invention.
FIG. 7 is a diagram showing another embodiment to which the subtitle production server of the broadcast subtitle production system according to an embodiment of the present invention is applied.

Hereinafter, embodiments of the present invention will be described in more detail with reference to the drawings.

The advantages and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that this disclosure will be complete and will fully convey the scope of the invention to those of ordinary skill in the art, and the present invention is defined only by the scope of the claims.

The terms used in this specification will be briefly described, and then the present invention will be described in detail.

The terms used in the present invention have been selected, as far as possible, from general terms currently in wide use, in consideration of their functions in the present invention, but they may vary depending on the intention of those skilled in the art, legal precedent, the emergence of new technology, and the like. In certain cases, a term has been chosen arbitrarily by the applicant, and its meaning is then described in detail in the corresponding part of the description. Therefore, the terms used in the present invention should be defined based on their meaning and the overall content of the present invention, not simply on their names.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, the term "unit" as used in the specification means software or a hardware component such as an FPGA or ASIC, and a "unit" performs certain roles. However, a "unit" is not limited to software or hardware. A "unit" may be configured to reside in an addressable storage medium or may be configured to execute on one or more processors.

Thus, as an example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functionality provided within the components and "units" may be combined into fewer components and "units" or further separated into additional components and "units".

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily carry them out. In the drawings, parts unrelated to the description are omitted in order to describe the present invention clearly.

Hereinafter, a broadcast subtitle production system and method according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 shows the overall configuration of a broadcast subtitle production system according to an embodiment of the present invention. Referring to FIG. 1, the broadcast subtitle production system broadly consists of a subtitle production server (S1), a set-top box (S2), and a user terminal (S3).

Here, the subtitle production server (S1) stores the original subtitle obtained from the broadcast signal provided by the broadcasting station, converts the speech included in the broadcast signal into text to generate a converted subtitle, corrects the character positions of the converted subtitle so that the original subtitle and the converted subtitle are synchronized, and then delivers the corrected subtitle to the user terminal (S3).

Here, the original subtitle is produced in text form by a stenographer from the speech of the broadcast signal provided by the broadcasting station; it is 98% accurate but lags the speech of the broadcast signal by 4 seconds or more. Subtitles accurately synchronized with the speech of the broadcast signal therefore need to be produced. To this end, the present invention corrects text-conversion errors in the converted subtitle, produced by a speech recognition algorithm, based on syllable-level correlation values between the converted subtitle and the original subtitle, and generates a corrected subtitle in which the converted subtitle is synchronized with the speech of the broadcast signal based on the delay between the converted subtitle and the speech.

In addition, the set-top box (S2) performs user authentication on the broadcast signal provided by the broadcasting station and then displays the successfully authenticated broadcast signal on the user terminal (S3).

The set-top box (S2) receives video content that can be played on a TV, and a real-time content recognizer installed in the user terminal (S3) identifies which content it is from the audio of the received video content. This recognition is referred to as ACR (Automatic Content Recognition); methods include watermarking, fingerprinting, TV-to-terminal pairing, and pairing between a TV set-top box and the terminal, without being limited thereto.
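As a rough illustration of the fingerprint variant of ACR, the Python sketch below derives a binary fingerprint from whether frame energy rises or falls between consecutive audio frames and picks the catalog entry with the smallest Hamming distance. This is a deliberately simplified toy under stated assumptions (raw sample lists, a fixed frame size, an in-memory catalog keyed by hypothetical content IDs); production ACR systems use far more robust spectral features and indexing.

```python
from typing import Dict, List, Optional, Sequence


def frame_energies(samples: Sequence[float], frame: int) -> List[float]:
    """Sum of squared samples per non-overlapping frame (a trailing partial frame is dropped)."""
    return [
        sum(x * x for x in samples[i:i + frame])
        for i in range(0, len(samples) - frame + 1, frame)
    ]


def fingerprint(samples: Sequence[float], frame: int = 4) -> List[int]:
    """One bit per frame transition: 1 if energy rises, else 0."""
    energies = frame_energies(samples, frame)
    return [1 if later > earlier else 0 for earlier, later in zip(energies, energies[1:])]


def hamming(a: List[int], b: List[int]) -> int:
    """Bit mismatches, counting any length difference as mismatches too."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def recognize(query: List[int], catalog: Dict[str, List[int]], max_dist: int = 2) -> Optional[str]:
    """Return the closest content ID, or None if nothing lies within max_dist."""
    if not catalog:
        return None
    best = min(catalog, key=lambda cid: hamming(query, catalog[cid]))
    return best if hamming(query, catalog[best]) <= max_dist else None
```

Only relative energy changes are encoded, so a uniform volume change on the terminal side does not alter the fingerprint.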

The set-top box (S2) then delivers the content recognition result to the subtitle production server (S1), and the subtitle production server (S1) generates a corrected subtitle for the corresponding content and delivers it to the user terminal (S3), so that subtitles accurately synchronized with the speech of the broadcast signal are displayed on the user terminal (S3).

Information is exchanged between the set-top box (S2) and the user terminal (S3) over a short-range network. The short-range network may be, without limitation, Bluetooth; a ZigBee network based on ZigBee Pro, IEEE 802.15.4c/d, or IEEE 802.15 NAN; a low-power, low-rate WPAN based on IEEE 802.15.4, ZigBee, Z-Wave, INSTEON, or Wavenis; or a network based on an RFID/USN integrated platform that uses a sensor network.

FIG. 2 shows the detailed configuration of the subtitle production server shown in FIG. 1, and FIG. 3 shows the detailed configuration of the subtitle correction device shown in FIG. 2. Referring to FIGS. 2 and 3, the subtitle production server (S1) of the broadcast subtitle production system according to the embodiment may be configured to store the original subtitle obtained from the broadcast signal provided by the broadcasting station, convert the speech included in the broadcast signal into text to generate a converted subtitle, correct the character positions of the converted subtitle so that the original subtitle and the converted subtitle are synchronized, and then deliver the corrected subtitle to the user terminal. To this end, the server (S1) may include a broadcast signal receiving device 100, a subtitle DB 200, a subtitle correction device 300, and a translation device 400.

The broadcast signal receiving device 100 generates an original subtitle, produced by a stenographer who converts the speech included in the broadcast signal into text, and provides the generated original subtitle to the subtitle DB 200. The subtitle DB 200 stores each original subtitle in association with the corresponding broadcast content.

The speech of the broadcast signal received by the broadcast signal receiving device 100 is also delivered to the subtitle correction device 300.

The subtitle correction device 300 converts the speech of the broadcast signal into text using a speech recognition algorithm (STT: Speech To Text) to generate a converted subtitle, and the generated converted subtitle is delivered to the user terminal (S3). To this end, as shown in FIG. 3, the subtitle correction device 300 may include a speech receiving unit 310, a text conversion unit 320, a converted subtitle generation unit 330, and a corrected subtitle generation unit 340.

The speech receiving unit 310 receives the speech included in the broadcast signal and passes it to the text conversion unit 320.

The text conversion unit 320 divides each sentence of the received speech into words, converts each divided word into text, and passes the converted words to the converted subtitle generation unit 330.
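The text does not say how the text conversion unit 320 finds word boundaries. One common assumption is a simple energy-based voice activity check that cuts the audio at silent gaps; the Python sketch below does that, with the frame size, energy threshold, and minimum gap as illustrative parameters. Each returned (start, end) sample range would then be handed to the STT engine as one word. Continuous speech often has no pause between words, so a real system might instead take word boundaries from the recognizer's own alignments.

```python
from typing import List, Sequence, Tuple


def split_into_words(
    samples: Sequence[float],
    frame: int,
    threshold: float,
    min_gap_frames: int = 1,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges of voiced runs separated by silence.

    A frame is "voiced" when its mean squared amplitude reaches `threshold`;
    a run ends after `min_gap_frames` consecutive silent frames.
    """
    segments: List[Tuple[int, int]] = []
    start = None    # first sample of the current voiced run, if any
    silent_run = 0  # silent frames seen since the last voiced frame
    n_frames = len(samples) // frame
    for f in range(n_frames):
        chunk = samples[f * frame:(f + 1) * frame]
        energy = sum(x * x for x in chunk) / frame
        if energy >= threshold:
            if start is None:
                start = f * frame
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_gap_frames:
                # Close the run at the end of its last voiced frame.
                segments.append((start, (f - silent_run + 1) * frame))
                start, silent_run = None, 0
    if start is not None:
        segments.append((start, (n_frames - silent_run) * frame))
    return segments
```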

The converted subtitle generation unit 330 derives syllable-level correlation values for the converted text words, corrects errors in the text-converted words based on the correlation value derived for each word to generate a converted subtitle, and passes the generated converted subtitle to the corrected subtitle generation unit 340. Specifically, the converted subtitle generation unit 330 uses the per-word correlation values to find the original subtitle corresponding to the converted subtitle and corrects errors introduced into the converted subtitle during text conversion.
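A minimal Python sketch of the syllable-level correction follows. It assumes the correlation value can be approximated by `difflib.SequenceMatcher.ratio()` over Hangul syllable blocks (the disclosure does not define the measure) and uses an illustrative acceptance threshold of 0.6. Because each precomposed Hangul character is one syllable, comparing strings character by character compares them syllable by syllable.

```python
from difflib import SequenceMatcher
from typing import List


def syllable_correlation(a: str, b: str) -> float:
    """Similarity in [0, 1]; each precomposed Hangul character counts as one syllable."""
    return SequenceMatcher(None, a, b).ratio()


def correct_word(converted: str, original_words: List[str], threshold: float = 0.6) -> str:
    """Replace an STT word with the best-matching original-subtitle word if it is close enough."""
    best = max(original_words, key=lambda w: syllable_correlation(converted, w), default=None)
    if best is not None and syllable_correlation(converted, best) >= threshold:
        return best
    return converted
```

With the FIG. 4 sentence as the original subtitle, a mis-recognized word such as `우리들운` is corrected to `우리들은` (ratio 0.75), while a word with no syllables in common with the original subtitle is left unchanged.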

The corrected subtitle generation unit 340 compares the generated converted subtitle data with the original subtitle recorded in the subtitle DB 200 and, based on the comparison, corrects the position of the converted subtitle so that it is synchronized with the original subtitle, thereby generating a corrected subtitle.

The position of the converted subtitle is corrected in two passes. In the first pass, the time taken to run the speech recognition algorithm on each word is set as that word's delay, the average of these delays is obtained, and the converted subtitle is shifted by that average delay. In the second pass, the position of the first-corrected subtitle is further corrected based on the delay of the original subtitle, whose synchronization with the speech of the broadcast signal is 98% accurate. The corrected subtitle generation unit 340 then outputs a corrected subtitle with the second-corrected subtitle positions. In this embodiment, for convenience of explanation, the first positional correction is described as shifting the converted subtitle by the average of the per-word speech recognition times; however, the first positional correction may be performed in various other ways that take the per-word recognition delays into account, and the embodiment is not limited thereto.
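The first pass is specified above: subtract the mean per-word STT delay. The second pass is only described as being based on the original subtitle's delay, so the `second_pass` function below is a hypothetical interpretation. It assumes each original-subtitle word appears a known delay `td` after it was spoken (the Td of FIG. 4), estimates when each matched word was spoken as its original-subtitle time minus `td`, and shifts all words by the average residual. Times are in seconds; all names are illustrative.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class TimedWord:
    text: str
    emitted_at: float  # when the STT text for this word became available (s)
    stt_delay: float   # time the STT took for this word (s)


def first_pass(words: List[TimedWord]) -> List[float]:
    """Shift every word earlier by the mean per-word STT delay."""
    if not words:
        return []
    d_bar = mean(w.stt_delay for w in words)
    return [w.emitted_at - d_bar for w in words]


def second_pass(times: List[float], original_times: List[Optional[float]], td: float) -> List[float]:
    """HYPOTHETICAL second pass: align to the original subtitle, whose words are
    assumed to appear `td` seconds after being spoken. `original_times[i]` is the
    original-subtitle time of the word matched to word i, or None if unmatched."""
    residuals = [(o - td) - t for t, o in zip(times, original_times) if o is not None]
    if not residuals:
        return list(times)
    shift = mean(residuals)
    return [t + shift for t in times]
```

Applying a single shared shift in both passes keeps the word order and spacing intact; only the whole sentence moves along the timeline.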

Meanwhile, the corrected subtitle generation unit 340 distinguishes speakers by applying a speaker recognition algorithm to the audio and generates a corrected subtitle for each speaker. Those skilled in the art to which this embodiment pertains will understand how a speaker is recognized using the various available speaker recognition algorithms.

Accordingly, when subtitles are produced for the audio in a broadcast signal, the position of the converted subtitle is corrected to match the original subtitle while taking the delay of the speech recognition algorithm into account. This fundamentally improves the accuracy with which the corrected subtitle is synchronized with the broadcast audio.

The process described above is explained in more detail with reference to FIGS. 4 to 6. In this process, a spoken sentence in the broadcast signal is divided into words and each word is converted into text. Errors arising in the per-word text conversion are then corrected based on the correlation values between the converted words and the syllables.

FIG. 4 shows an example of the subtitle correction apparatus 300 shown in FIG. 2 generating an original subtitle. Referring to FIG. 4, a delay time Td elapses while the subtitle correction apparatus 300 generates the original subtitle for the sentence “우리들은 아름다운 서울 중구에 살고 있습니다” (“We live in beautiful Jung-gu, Seoul”) contained in the broadcast signal.

FIG. 5 shows an example of the subtitle correction apparatus 300 shown in FIG. 2 generating a converted subtitle. Referring to FIG. 5, the subtitle correction apparatus 300 divides the sentence into words and converts each word into text using the speech recognition algorithm, which yields a converted subtitle for each word. Four STT modules run the per-word text conversion in parallel and together produce the converted subtitle for one sentence. As a result, the subtitle generation time can be made much shorter than Td. In other words, FIG. 5 shows that running the STT algorithm in parallel shortens the time needed to generate the converted subtitle.
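The parallel per-word conversion can be sketched with a thread pool of four workers, one per STT module. This is a hedged illustration only: `fake_stt` is a hypothetical stand-in that sleeps to simulate recognition time and takes the word text instead of real audio. The sketch shows that results come back in sentence order and that the sentence delay is the slowest word's delay when every word gets its own module.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_stt(word: str, processing_s: float) -> tuple[str, float]:
    """Hypothetical stand-in for one STT module: returns the recognized text
    and the time it took (that word's delay). A real module would take audio."""
    start = time.perf_counter()
    time.sleep(processing_s)  # simulate recognition work
    return word, time.perf_counter() - start


def transcribe_parallel(segments: list[tuple[str, float]], modules: int = 4):
    """Run one STT job per word segment on `modules` concurrent workers.

    Returns per-word (text, delay) results in sentence order, plus the
    sentence delay, taken as the largest per-word delay. That holds when
    every word has its own module, so all jobs start together."""
    with ThreadPoolExecutor(max_workers=modules) as pool:
        results = list(pool.map(lambda seg: fake_stt(*seg), segments))
    sentence_delay = max(delay for _, delay in results)
    return results, sentence_delay


if __name__ == "__main__":
    segs = [("우리들은", 0.01), ("아름다운", 0.02), ("서울", 0.03), ("중구에", 0.04)]
    results, delay = transcribe_parallel(segs)
    print(results, delay)
```

`pool.map` preserves input order, so the per-word outputs can be concatenated directly into the sentence's converted subtitle.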

The delay times of the individual words are Td1 to Td4, as shown in FIG. 5. Since Td1 < Td2 < Td3 < Td4, the delay time of the converted subtitle for the whole sentence is Td4. The time needed to generate the converted subtitle is therefore shorter than the delay time Td for generating the original subtitle.
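Written as a formula with the delays named above, and assuming each word is handled by its own STT module so that all four run concurrently:

```latex
T_{\text{conv}} = \max(T_{d1}, T_{d2}, T_{d3}, T_{d4}) = T_{d4} < T_d,
\qquad \text{given } T_{d1} < T_{d2} < T_{d3} < T_{d4}
```

By contrast, if a single module processed the four words one after another, the sentence delay would accumulate as the sum \(T_{d1} + T_{d2} + T_{d3} + T_{d4}\).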

FIG. 6 illustrates how the subtitle correction apparatus 300 shown in FIG. 2 derives a correlation value for the converted subtitle of each word. As shown in FIG. 6(a), the correlation is computed while the converted subtitle of the word “아름다운” (“beautiful”) is shifted one syllable (character) at a time. When the converted subtitle “아름다운” matches the “아름다운” spoken in the broadcast signal, the correlation value reaches its maximum of 4 at the fifth character position.
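The description does not define the correlation measure. The sketch below assumes the simplest one consistent with FIG. 6: slide the converted word across the original subtitle's syllables (spaces removed) and count how many syllables coincide at each offset. With the example sentence, this yields the values described for FIG. 6, a maximum of 4 for “아름다운” and 3 for “아립다운”, both at the fifth syllable. Function names are hypothetical.

```python
def syllable_correlation(word: str, sentence: str) -> list[int]:
    """Slide `word` across `sentence` one syllable at a time and count,
    at each offset, how many syllables are identical (match-count correlation)."""
    s = sentence.replace(" ", "")  # compare syllables only, ignoring spaces
    n = len(word)
    return [
        sum(a == b for a, b in zip(word, s[i:i + n]))
        for i in range(len(s) - n + 1)
    ]


def best_alignment(word: str, sentence: str) -> tuple[int, int]:
    """Return (1-based syllable position, correlation value) of the best match.
    On ties, the earliest position wins."""
    scores = syllable_correlation(word, sentence)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best + 1, scores[best]


if __name__ == "__main__":
    original = "우리들은 아름다운 서울 중구에 살고 있습니다"
    print(best_alignment("아름다운", original))  # → (5, 4)
    print(best_alignment("아립다운", original))  # → (5, 3)
```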

As shown in FIG. 6(b), the correlation value with each syllable of the original subtitle is computed while the converted subtitle of the word “아립다운” is shifted one syllable at a time. Because the converted subtitle “아립다운” does not match the “아름다운” spoken in the broadcast signal, a correlation value of 3 is obtained at the fifth character position. This value shows that an error occurred in one syllable of the per-word converted subtitle. Based on the character position and the correlation value, the erroneous syllable produced during text conversion is corrected and the position of the converted subtitle is adjusted, which generates the corrected subtitle.
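One way to turn the best alignment into a correction is sketched below, under two assumptions that the description does not state. First, the corrected word is taken to be the aligned span of the original subtitle. Second, the correction is applied only when at least a threshold fraction of syllables agree (`min_ratio`, a hypothetical parameter) so that unrelated words are left alone. The sliding search is repeated here so the example stands alone.

```python
def correct_word(word: str, original: str, min_ratio: float = 0.5) -> str:
    """Replace a converted word with the best-aligned span of the original
    subtitle when enough syllables agree (min_ratio is an assumed threshold)."""
    s = original.replace(" ", "")
    n = len(word)
    best_i, best_score = 0, -1
    for i in range(len(s) - n + 1):
        score = sum(a == b for a, b in zip(word, s[i:i + n]))
        if score > best_score:  # keep the earliest position on ties
            best_i, best_score = i, score
    if best_score == n or best_score < min_ratio * n:
        return word  # already correct, or too dissimilar to trust the match
    return s[best_i:best_i + n]  # syllables taken from the original subtitle


if __name__ == "__main__":
    original = "우리들은 아름다운 서울 중구에 살고 있습니다"
    print(correct_word("아립다운", original))  # → 아름다운
```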

Note that, for convenience of explanation, only the components related to the present invention are described. If necessary, servers that provide other functions, such as user authentication, billing, advertising, and translation, may be included in addition to the servers shown.

Specifically, the translation device 400, which performs the translation function, translates the corrected subtitle output by the subtitle correction apparatus 300 using a predetermined translator, generates a translated subtitle, and delivers it to the user terminal S3. The translation is performed in real time. Before each translated subtitle is delivered to the user terminal S3, the subtitle correction apparatus 300 corrects its position against the original subtitle. Those skilled in the art to which this embodiment pertains will understand how converted subtitles are translated in real time with a translator.
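The description leaves the translator unspecified, so the sketch below takes it as a hypothetical `translate` callable. It also assumes the simplest way to keep a translated subtitle aligned with the original subtitle: reuse the timing of the corrected caption it came from. `Caption` and `translate_captions` are illustrative names.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Caption:
    text: str
    start: float  # seconds, already synchronized to the broadcast audio
    end: float


def translate_captions(
    captions: list[Caption], translate: Callable[[str], str]
) -> list[Caption]:
    """Translate each corrected caption but keep its corrected timing, so the
    translated caption inherits the synchronization (assumed approach)."""
    return [replace(c, text=translate(c.text)) for c in captions]


if __name__ == "__main__":
    toy_translator = {"아름다운": "beautiful"}.get  # hypothetical lookup table
    print(translate_captions([Caption("아름다운", 1.4, 2.0)], toy_translator))
```

A real deployment would plug in an actual machine translation service as `translate`. Separating timing from text is what lets the translation device be a separate server, as the next paragraph allows.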

In this embodiment, the translation device 400 is shown as subordinate to the subtitle correction apparatus 300 for convenience of explanation. However, it may be configured independently of the subtitle correction apparatus 300 for each function, or these functions may be integrated into one or more servers.

As shown in FIG. 7, the speech recognizer according to an embodiment of the present invention can also be used in a system that automatically writes meeting minutes when several people talk at the same time or hold a meeting.

Referring to FIG. 7, an automatic transcript generation system can additionally be configured for this purpose. It uses speaker recognition to tell which person spoke and automatically writes the minutes when several people talk at the same time or hold a meeting. The speaker recognition system is a technology that identifies a specific person from voice data. The speech recognition system performs the speech recognition (STT: Speech To Text) function, which converts voice data into text. The automatic transcript generation unit takes the speech recognition text produced by the STT, separates it by speaker according to the speaker recognition result, and produces the transcript.
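The step in which the automatic transcript generation unit separates STT text by speaker can be sketched as a time-overlap merge. The sketch assumes the speaker recognition system outputs time-stamped speaker turns and the STT outputs time-stamped text segments. Each text segment is attributed to the turn it overlaps most, and segments with no overlapping turn are marked as unknown. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float
    label: str    # recognized text (STT) or speaker id (speaker recognition)


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the time interval shared by two segments."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def build_minutes(stt: list[Segment], speakers: list[Segment]) -> list[str]:
    """Attribute each STT text segment to the speaker turn it overlaps most
    and emit one 'speaker: text' line per segment, in time order."""
    lines = []
    for seg in sorted(stt, key=lambda s: s.start):
        turn = max(speakers, key=lambda sp: overlap(seg, sp), default=None)
        who = turn.label if turn is not None and overlap(seg, turn) > 0 else "unknown"
        lines.append(f"{who}: {seg.label}")
    return lines


if __name__ == "__main__":
    stt = [Segment(0.0, 2.0, "안녕하세요"), Segment(2.5, 4.0, "회의를 시작합니다")]
    turns = [Segment(0.0, 2.2, "A"), Segment(2.2, 5.0, "B")]
    print("\n".join(build_minutes(stt, turns)))
```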

Here, the automatic transcript generation unit can correct text conversion errors in each speaker's text-converted subtitle based on per-syllable correlation values. Based on the average of the per-word text conversion delay times, it can then record that subtitle in synchronization with the speaker's original voice.

Accordingly, when the present invention is applied to a broadcasting system, accurate broadcast subtitle data can be broadcast at the right time with minimal delay. The invention also benefits video clip services that provide short clips of about 1 to 2 minutes edited from previously broadcast signals. For such services, the text data of the existing closed-caption broadcast can be delivered in synchronization with the actual performers' voices.

Furthermore, closed-caption data as currently broadcast can be out of sync with the sound of the actual broadcast by as much as 4 seconds or more. Minimizing this error makes it possible to produce high-quality broadcast subtitle data and to provide a service that does not confuse viewers with disabilities.

According to another aspect of the present invention, a method for producing broadcast subtitles may include the following steps. First, a subtitle production server generates and stores an original subtitle for the audio contained in a broadcast signal received through a broadcasting station. Next, a spoken sentence in the received broadcast signal is divided into words, the divided words are converted into text using a speech recognition algorithm to generate a converted subtitle, and per-syllable correlation values of the converted subtitle are derived to correct its errors. The error-corrected converted subtitle is then synchronized with the original subtitle to generate a corrected subtitle. Finally, the corrected subtitle is matched to the broadcast signal received through a set-top box and displayed on a user terminal. In the step of generating the corrected subtitle, the synchronization with the original subtitle may be based on the average of the delay times of the per-word speech recognition algorithm applied to the converted subtitle. Each of these steps corresponds to a function performed by the subtitle production server S1, the voice receiving unit 310, the text conversion unit 320, the converted subtitle generation unit 330, and the corrected subtitle generation unit 340 described above, so a detailed description is omitted.

Although the present invention has been described and illustrated in connection with preferred embodiments that illustrate its technical idea, it is not limited to the configuration and operation exactly as shown and described. Those skilled in the art will appreciate that numerous changes and modifications can be made to the invention without departing from the scope of its technical idea. Accordingly, all such appropriate changes, modifications, and equivalents should be regarded as falling within the scope of the present invention.

The present invention first generates and stores an original subtitle based on the audio in a broadcast signal. It then divides a spoken sentence in the received broadcast signal into words and converts each word into text with a speech recognition algorithm to generate a converted subtitle. Text conversion errors are corrected based on the correlation values for the syllables of each word of the converted subtitle. Based on the delay of the per-word speech recognition algorithm, the invention generates a corrected subtitle in which the error-corrected converted subtitle is synchronized with the broadcast audio, and it delivers that subtitle to the user terminal. It can therefore generate corrected subtitles accurately synchronized with the broadcast audio and increase viewers' immersion in and interest in the received broadcast content. This represents a significant advance in the operational accuracy, reliability, and performance efficiency of broadcast subtitle production systems and methods. Broadcast content receiving apparatuses using the invention have sufficient prospects of being marketed or sold, and the invention can clearly be put into practice, so it is industrially applicable.

Claims

1. A broadcast subtitle production system, comprising:
a user terminal; and
a subtitle production server that generates an original subtitle based on the voice included in a broadcast signal, divides the voice included in the broadcast signal into a plurality of words, converts each divided word into text using a speech recognition algorithm to generate a converted subtitle, generates, based on the delay time of the speech recognition algorithm executed for each word of the error-corrected converted subtitle, a corrected subtitle in which the converted subtitle is synchronized with the voice of the broadcast signal, and delivers the corrected subtitle to the user terminal.

2. The system according to claim 1, wherein the subtitle production server comprises:
a broadcast signal receiving apparatus that generates an original subtitle based on the voice included in a broadcast signal transmitted from a broadcasting station and transmits the generated original subtitle to a subtitle DB; and
a subtitle correction apparatus that receives the voice of the broadcast signal and divides it into a plurality of words, converts each divided word into text by running a speech recognition algorithm to generate a converted subtitle, derives per-syllable correlation values between each word of the generated converted subtitle and the words of the original subtitle, finds the original subtitle corresponding to the converted subtitle based on the derived correlation values, corrects errors in the converted subtitle generated during text conversion, and generates, based on the delay time of the speech recognition algorithm performed for each word, a corrected subtitle in which the error-corrected converted subtitle is synchronized with the voice of the broadcast signal.

3. The system according to claim 2, wherein the subtitle correction apparatus comprises:
a voice receiving unit that receives the voice of the broadcast signal and divides a spoken sentence into words;
a text conversion unit that converts the divided words into text using a speech recognition algorithm to generate a converted subtitle;
a converted subtitle generation unit that derives per-syllable correlation values between each word of each generated converted subtitle and the words of the original subtitle, finds the original subtitle corresponding to the converted subtitle based on the derived correlation values, and corrects errors in the converted subtitle generated during text conversion; and
a corrected subtitle generation unit that generates, based on the delay time of the speech recognition algorithm performed for each word of the error-corrected converted subtitle, a corrected subtitle in which the converted subtitle is synchronized with the voice of the broadcast signal.

4. The system of claim 3, wherein the corrected subtitle generation unit
distinguishes speakers using a speaker recognition algorithm and generates a corrected subtitle for each speaker.

5. The system of claim 3, wherein the subtitle production server
further comprises a translation device that translates the error-corrected converted subtitle into a requested language to generate a translated subtitle and transmits the generated translated subtitle to the corrected subtitle generation unit.

6. A subtitle production server, comprising:
a broadcast signal receiving apparatus that generates an original subtitle based on the voice included in a broadcast signal transmitted from a broadcasting station and transmits the generated original subtitle to a subtitle DB; and
a subtitle correction apparatus that receives the voice of the broadcast signal and divides it into a plurality of words, converts each divided word into text by running a speech recognition algorithm to generate a converted subtitle, derives per-syllable correlation values between each word of the generated converted subtitle and the words of the original subtitle, corrects, based on the derived correlation values, errors in the converted subtitle generated during text conversion, and generates, based on the delay time of the speech recognition algorithm for each word, a corrected subtitle in which the error-corrected converted subtitle is synchronized with the voice.

7. The subtitle production server according to claim 6, wherein the subtitle correction apparatus comprises:
a voice receiving unit that receives the voice of the broadcast signal and divides a spoken sentence into words;
a text conversion unit that converts the divided words into text using a speech recognition algorithm to generate a converted subtitle;
a converted subtitle generation unit that derives per-syllable correlation values between each word of each generated converted subtitle and the words of the original subtitle and corrects errors in the converted subtitle generated during text conversion; and
a corrected subtitle generation unit that generates, based on the delay time of the speech recognition algorithm performed for each word of the error-corrected converted subtitle, a corrected subtitle in which the converted subtitle is synchronized with the voice of the broadcast signal.

8. The subtitle production server of claim 7, wherein the corrected subtitle generation unit
distinguishes speakers using a speaker recognition algorithm and generates a corrected subtitle for each speaker.

9. The subtitle production server of claim 7,
further comprising a translation device that translates the error-corrected converted subtitle into a requested language to generate a translated subtitle and transmits the generated translated subtitle to the corrected subtitle generation unit.

10. A method for producing broadcast subtitles, comprising:
generating and storing, at a subtitle production server, an original subtitle for the voice included in a broadcast signal received through a broadcasting station;
dividing a spoken sentence included in the received broadcast signal into words, converting the divided words into text using a speech recognition algorithm to generate a converted subtitle, deriving per-syllable correlation values for the generated converted subtitle, and correcting, based on the derived correlation values, errors in the converted subtitle generated during text conversion;
generating a corrected subtitle by synchronizing the error-corrected converted subtitle with the original subtitle; and
displaying the corrected subtitle on a user terminal by matching it to a broadcast signal received through a set-top box.

11. The method of claim 10, wherein generating the corrected subtitle comprises
synchronizing the corrected subtitle with the voice of the broadcast signal based on the delay time of the word-by-word speech recognition algorithm applied to the converted subtitle.

12. The method of claim 11, wherein generating the corrected subtitle comprises
distinguishing speakers using a speaker recognition algorithm and generating a corrected subtitle for each speaker.