KR20180087009A

KR20180087009A - Subtitle providing system, terminal and subtitle server through real-time audio streaming analysis

Info

Publication number: KR20180087009A
Application number: KR1020170011172A
Authority: KR
Inventors: 안문학
Original assignee: 주식회사 소리자바
Priority date: 2017-01-24
Filing date: 2017-01-24
Publication date: 2018-08-01

Abstract

The present invention relates to a subtitle providing system, a terminal, and a subtitle server that use real-time audio streaming analysis to extract subtitles matching video data by means of a fingerprinting technique and overlay them on the video data. The subtitle providing system according to the present invention comprises: a content server that provides content; a terminal that, when video data of the content provided from the content server is executed, captures audio data from the video data, converts the audio data into a hash code by fingerprinting, and transmits set language information together with the hash code; and a subtitle server that receives the hash code and the language information from the terminal, extracts from among stored subtitles those that match the hash code or whose confidence value is equal to or higher than a predetermined level, and transmits to the terminal the subtitle that matches the language information among the extracted subtitles.

Description

Subtitle providing system, terminal, and subtitle server through real-time audio streaming analysis

The present invention relates to a subtitle providing system and, more particularly, to a subtitle providing system, a terminal, and a subtitle server that use real-time audio streaming analysis to extract subtitles matching video data by means of a fingerprinting technique and overlay them on the video data.

Recently, users frequently download videos from the Internet or similar sources to a terminal such as a personal computer or a mobile communication terminal, store them, and play them back later.

When subtitle information exists for a user video to be played on the terminal, the user stores the user video and its related subtitle information together on the terminal. When the user plays the video, the terminal synchronizes the stored subtitle information with the video and plays the two together.

However, when only the user video is stored on the terminal and the corresponding subtitle information is not, the user faces the inconvenience of having to search for the related subtitle information over the Internet or the like.

In addition, when searching for the subtitle information of a user video over the Internet or the like, if the subtitle file found there does not have an easily identifiable file name, such as one related to the name of the user video, the user cannot easily find the subtitle information for the desired video.

Accordingly, an object of the present invention is to provide a subtitle providing system, a terminal, and a subtitle server through real-time audio streaming analysis that can easily provide, for video data that has no subtitle information, subtitles matching that video data.

The subtitle providing system through real-time audio streaming analysis according to the present invention includes: a content server that provides content; a terminal that, when video data of the content provided from the content server is executed, captures audio data from the video data, converts the audio data into a hash code by fingerprinting, and transmits set language information together with the hash code; and a subtitle server that receives the hash code and the language information from the terminal, extracts from among stored subtitles those that match the hash code or whose confidence value is at or above a preset level, and transmits to the terminal the subtitle that matches the language information among the extracted subtitles.

In the subtitle providing system through real-time audio streaming analysis according to the present invention, the terminal captures, through a plug-in, the audio data from the video data of the content provided by the content server, converts the audio data into a hash code, and transmits the hash code to the subtitle server.

In the subtitle providing system through real-time audio streaming analysis according to the present invention, the terminal includes the language information in the header of the hash code and transmits it to the subtitle server.

In the subtitle providing system through real-time audio streaming analysis according to the present invention, the terminal overlays the subtitle received from the subtitle server on the video data.

In the subtitle providing system through real-time audio streaming analysis according to the present invention, the terminal captures a 2 to 4 second section of the audio data and converts it into a hash code.

The terminal according to the present invention includes: a communication unit that communicates with a content server and a subtitle server; and a control unit that, when video data of content provided from the content server through the communication unit is executed, captures audio data from the video data, converts the audio data into a hash code by fingerprinting, transmits set language information together with the hash code to the subtitle server, and receives through the communication unit the subtitle selected by the subtitle server, namely the subtitle that matches the language information among the stored subtitles that match the hash code or whose confidence value is at or above a preset level.

The subtitle server according to the present invention includes: a server communication unit that communicates with a terminal; and a server control unit that receives from the terminal, through the server communication unit, set language information together with a hash code that the terminal generated, when video data of content was executed, by capturing audio data from the video data and converting the audio data by fingerprinting, extracts from among stored subtitles those that match the hash code or whose confidence value is at or above a preset level, and transmits to the terminal, through the server communication unit, the subtitle that matches the language information among the extracted subtitles.

In the subtitle providing system through real-time audio streaming analysis according to the present invention, when the terminal executes video data, the audio data is converted into a hash code by fingerprinting, the hash code is transmitted to the subtitle server, and the subtitle extracted by means of the hash code is transmitted to the terminal and overlaid on the video data. The user can therefore easily receive matching subtitles without performing a separate subtitle search.

In addition, the subtitle providing system through real-time audio streaming analysis according to the present invention transmits the set language information to the subtitle server together with the hash code obtained by fingerprinting the audio data, so that the subtitle server can automatically extract the subtitle that matches the language information and provide it to the terminal.

FIG. 1 is a diagram illustrating the configuration of a subtitle providing system through real-time audio streaming analysis according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the configuration of a terminal of the subtitle providing system through real-time audio streaming analysis according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the configuration of a subtitle server of the subtitle providing system through real-time audio streaming analysis according to an embodiment of the present invention.

In the following description, only the parts necessary for understanding the embodiments of the present invention are described, and the description of other parts is omitted so as not to obscure the gist of the present invention.

The terms and words used in this specification and the claims described below should not be construed as limited to their ordinary or dictionary meanings. They should be interpreted with meanings and concepts consistent with the technical idea of the present invention, on the principle that an inventor may appropriately define the concepts of terms in order to describe his or her invention in the best way. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely preferred embodiments of the present invention and do not represent all of its technical ideas, so it should be understood that various equivalents and modifications that could replace them may exist at the time of filing this application.

Hereinafter, embodiments of the present invention are described in more detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating the configuration of a subtitle providing system through real-time audio streaming analysis according to an embodiment of the present invention.

Referring to FIG. 1, the subtitle providing system 400 through real-time audio streaming analysis according to an embodiment of the present invention includes a content server 100, a terminal 200, and a subtitle server 300.

The content server 100 provides the terminal 200 with various content including video data. For example, the content server 100 may be any of various sharing media that provide video, such as a broadcast medium that provides the broadcasts of each broadcaster, a video sharing medium that provides various video data, or an SNS (Social Networking Service) on which users upload and share videos.

The terminal 200 executes the video data of the content provided from the content server 100 and presents it on the screen.

When different content servers 100 provide the same video, the URLs (uniform resource locators) of the video data differ from one another, but the audio played back in the video is the same.

Accordingly, the terminal 200 can capture audio data from the video data and convert the captured audio data into a hash code by fingerprinting. The resulting hash code serves as the code used to search for the subtitle that matches the audio data.
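The specification does not disclose a particular fingerprinting algorithm. The following is only a minimal sketch of the general idea, under the assumption that the clip is available as a list of mono PCM samples. The function name `fingerprint`, the frame size, and the energy-difference scheme are illustrative choices, not the method of the invention. Because the code depends only on the audio samples, the same audio served from different URLs would yield the same hash code.

```python
def fingerprint(samples, frame_size=1024):
    """Reduce a mono PCM clip to a short hexadecimal hash code.

    Hypothetical scheme: each pair of adjacent frames contributes one bit,
    1 if the frame energy rose and 0 otherwise. The bits are packed into
    a hexadecimal string.
    """
    # Energy of each complete, non-overlapping frame.
    energies = [
        sum(s * s for s in samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    if len(energies) < 2:
        raise ValueError("clip too short to fingerprint")

    # One bit per adjacent frame pair.
    bits = "".join(
        "1" if cur > prev else "0"
        for prev, cur in zip(energies, energies[1:])
    )

    # Zero-padded hex string so that equal-length clips give equal-length codes.
    width = (len(bits) + 3) // 4
    return format(int(bits, 2), "0{}x".format(width))
```

Comparing frame energies rather than raw sample values makes this sketch insensitive to a uniform change in playback volume, which is one reason schemes of this kind are used for audio matching.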

Here, the terminal 200 can use a plug-in of the browser through which it receives the content provided by the content server 100 to capture audio data from the video data of that content, convert the audio data into a hash code, and transmit the hash code to the subtitle server 300.

Here, the terminal 200 transmits to the subtitle server 300 the hash code obtained by fingerprinting the audio data. The terminal 200 can transmit the currently set language information to the subtitle server 300 together with the hash code.

The terminal 200 can also receive from the subtitle server 300 the subtitle that matches the transmitted hash code and language information, and overlay it on the video data.

The terminal 200 is described using, as a representative example, a mobile communication terminal that connects to a communication network, runs applications, and executes the provided video data. However, the terminal is not limited to a mobile communication terminal and may be any of various terminals, including all information and communication devices, multimedia terminals, wired terminals, fixed terminals, and IP (Internet Protocol) terminals. The terminal can be used to advantage when it is a mobile terminal with any of various mobile communication specifications, such as a mobile phone, a PMP (Portable Multimedia Player), a MID (Mobile Internet Device), a smartphone, a desktop, a tablet PC, a notebook, a netbook, or an information and communication device.

The subtitle server 300 can receive from the terminal 200 the language information and the hash code obtained by fingerprinting the audio data.

Using the received hash code, the subtitle server 300 can extract subtitles that match the hash code or whose confidence value is at or above a preset level. From the extracted subtitles, the subtitle server 300 can then extract the subtitle that matches the language information received from the terminal 200 and transmit it to the terminal 200.

The subtitle server 300 may be implemented as a web server, a WAP server, or the like that the terminal 200 can access over a wired or wireless communication network, and can store subtitle information and the hash code of each item of subtitle information. A plurality of subtitle servers 300 may be provided, each storing and managing subtitle information in a database in association with the corresponding hash code.

The configuration of the terminal of the subtitle providing system through real-time audio streaming analysis according to an embodiment of the present invention is described in detail below.

FIG. 2 is a diagram illustrating the configuration of a terminal of the subtitle providing system through real-time audio streaming analysis according to an embodiment of the present invention.

Referring to FIGS. 1 and 2, the terminal 200 includes a communication unit 210, an input unit 220, a display unit 230, a storage unit 240, and a control unit 250.

The communication unit 210 transmits and receives data to and from the content server 100 and the subtitle server 300 over a communication network. The communication unit 210 can receive content from the content server 100 and deliver to the subtitle server 300 the hash code obtained by fingerprinting the audio data together with the language setting information. The communication unit 210 can also receive from the subtitle server 300 the subtitle that matches the hash code. Various types of communication networks may be used, for example a wireless scheme such as wireless LAN (WLAN), Wi-Fi, WiBro, WiMAX, or HSDPA (High Speed Downlink Packet Access), or a wired scheme such as Ethernet, xDSL (ADSL, VDSL), HFC (Hybrid Fiber Coax), FTTC (Fiber To The Curb), or FTTH (Fiber To The Home). The communication unit 210 is not limited to the schemes listed above and may use any other communication scheme that is widely known or developed in the future.

The input unit 220 receives various information such as numeric and character input, and passes to the control unit 250 the signals entered for setting functions and controlling the functions of the terminal 200. The input unit 220 may include at least one of a keypad and a touchpad that generate an input signal in response to the user's touch or operation. The input unit 220 may be combined with the display unit 230 in the form of a single touch panel (or touch screen) so that input and display are performed together. Besides input devices such as a keyboard, keypad, mouse, or joystick, any form of input means developed in the future may be used. In particular, the input unit 220 detects the input signal for executing video data in the content provided from the content server 100 and passes it to the control unit 250.

The display unit 230 displays information on the operating states and results that arise while the terminal 200 performs its functions. The display unit 230 can also display the menus of the terminal 200 and user data entered by the user. The display unit 230 may be a liquid crystal display (LCD), a thin film transistor LCD (TFT-LCD), a light emitting diode (LED) display, an organic LED (OLED) display, an active matrix OLED (AMOLED) display, a Retina display, a flexible display, a three-dimensional (3D) display, or the like. When the display unit 230 is configured as a touch screen, it can perform some or all of the functions of the input unit 220. In particular, the display unit 230 displays on the screen the content menu and video provided from the content server 100, and can display the subtitle received from the subtitle server 300 within the video.

The storage unit 240 stores the application programs required for the functional operation of the terminal 200. When a function is activated in response to a user's request, the storage unit 240 provides that function by having the corresponding application programs executed under the control of the control unit 250. In particular, the storage unit 240 can store a program that executes the content provided from the content server 100, a program that captures audio data from video data and fingerprints the audio data to convert it into a hash code, and the like.

The control unit 250 may be a processing device that runs the operating system (OS) and drives each component. In particular, the control unit 250 executes the content provided from the content server 100. The control unit 250 can also execute the video data provided by the content.

The control unit 250 includes an audio data capture module 251, a fingerprinting module 252, a language information combination module 253, and an overlay module 254.

When the user plays a video, the audio data capture module 251 captures the audio data contained in the video data. The audio data capture module 251 can capture a 2 to 4 second section of the audio data for conversion into a hash code. The module could capture the video data instead, but using the audio data allows more accurate subtitle synchronization and yields a relatively small hash code.
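The capture of a 2 to 4 second section can be sketched as a simple slice over decoded PCM samples. This is an illustration only: the function name `capture_window`, its parameters, and the assumption that the audio is already available as a sample list at a known sample rate are not taken from the specification.

```python
def capture_window(pcm, sample_rate, start_s, duration_s=3.0):
    """Return the samples of a 2 to 4 second section starting at start_s.

    pcm is assumed to be a sequence of mono samples and sample_rate is
    given in samples per second.
    """
    if not 2.0 <= duration_s <= 4.0:
        raise ValueError("duration must be between 2 and 4 seconds")
    start = int(start_s * sample_rate)
    end = start + int(duration_s * sample_rate)
    # Slicing past the end of the buffer simply yields a shorter clip.
    return pcm[start:end]
```

At an assumed sample rate of 8,000 samples per second, a 3 second window holds 24,000 samples, which is small compared with the video frames covering the same interval.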

The fingerprinting module 252 extracts a hash code from the captured audio data. The fingerprinting module 252 can convert a 2 to 4 second section of the audio data into a hash code.

The language information combination module 253 can deliver the hash code extracted by the fingerprinting module 252, together with the set language information, to the subtitle server 300 through the communication unit 210. Here, the language information may be information on the language currently set on the terminal 200. The language information combination module 253 can attach the language information currently set on the terminal 200 to the header of the hash code and deliver the result to the subtitle server 300 through the communication unit 210.
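The specification states only that the language information is attached to the header of the hash code; it does not define a wire format. The sketch below assumes a hypothetical layout: a 1-byte language-code length and a 2-byte hash length in network byte order, followed by the ASCII language code and the raw hash bytes. The names `pack_request` and `unpack_request` are illustrative.

```python
import struct

# Hypothetical header: language-code length (1 byte), hash length (2 bytes).
_HEADER = "!BH"


def pack_request(hash_code: bytes, language: str) -> bytes:
    """Prefix the hash code with a header carrying the language code."""
    lang = language.encode("ascii")
    header = struct.pack(_HEADER, len(lang), len(hash_code))
    return header + lang + hash_code


def unpack_request(message: bytes):
    """Recover (language, hash_code) from a message built by pack_request."""
    lang_len, hash_len = struct.unpack_from(_HEADER, message, 0)
    offset = struct.calcsize(_HEADER)
    language = message[offset:offset + lang_len].decode("ascii")
    hash_start = offset + lang_len
    hash_code = message[hash_start:hash_start + hash_len]
    return language, hash_code
```

Carrying the language in the header lets the server read the requested language before it touches the hash payload, so a single request is enough to select both the subtitle and its language.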

The overlay module 254 can overlay the subtitle received from the subtitle server 300 so that it is output on the video data. The overlay module 254 compares the hash code extracted by the fingerprinting module 252 with the hash codes of the subtitle received from the subtitle server 300, finds the section in which the hash codes match, and outputs the subtitle in sync with that section.
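One way to picture this synchronization step is sketched below, assuming (hypothetically) that the received subtitle carries a list of `(offset_seconds, hash_code)` segments alongside its timed cues given as `(start, end, text)` tuples. Neither data layout nor function name comes from the specification.

```python
def locate_offset(local_hash, segments):
    """Return the time offset of the segment whose hash equals local_hash.

    segments is an assumed list of (offset_seconds, hash_code) pairs that
    accompanies the subtitle; None is returned when nothing matches.
    """
    for offset_s, code in segments:
        if code == local_hash:
            return offset_s
    return None


def active_cue(cues, position_s):
    """Return the text of the cue covering position_s, or None."""
    for start, end, text in cues:
        if start <= position_s < end:
            return text
    return None
```

In this sketch the matching segment gives the playback position, and the cue covering that position is the text to draw over the video.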

The subtitle server of the subtitle providing system through real-time audio streaming analysis according to an embodiment of the present invention is described in detail below.

FIG. 3 is a diagram illustrating the configuration of a subtitle server of the subtitle providing system through real-time audio streaming analysis according to an embodiment of the present invention.

Referring to FIGS. 1 and 3, the subtitle server 300 includes a server communication unit 310, a database 320, and a server control unit 330.

The server communication unit 310 transmits and receives data to and from the terminal 200 over a communication network. The server communication unit 310 can receive from the terminal 200 the hash code of the audio data for which a subtitle is to be extracted. The server communication unit 310 can also deliver to the terminal 200 the subtitle that matches the hash code received from the terminal 200.

The database 320 stores the application programs required for the functional operation of the subtitle server 300. The database 320 can store a program for searching for subtitles that match the hash code received from the terminal 200 through the server communication unit 310, a program for searching the retrieved subtitles for the one that matches the language information, and the like. The database 320 can also store various subtitles and the hash codes corresponding to them.
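A store that associates subtitles with hash codes could look like the following sketch, which uses Python's standard `sqlite3` module with an in-memory database. The table name, column names, and helper functions are assumptions made for illustration; the specification does not describe a schema.

```python
import sqlite3


def build_store():
    """Create an in-memory table linking each subtitle to a hash code."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE subtitles ("
        " hash_code TEXT NOT NULL,"
        " language TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " PRIMARY KEY (hash_code, language))"
    )
    return conn


def add_subtitle(conn, hash_code, language, body):
    """Store one subtitle under its hash code and language."""
    conn.execute(
        "INSERT INTO subtitles (hash_code, language, body) VALUES (?, ?, ?)",
        (hash_code, language, body),
    )


def find_exact(conn, hash_code, language):
    """Return the subtitle body for an exact hash and language, or None."""
    row = conn.execute(
        "SELECT body FROM subtitles WHERE hash_code = ? AND language = ?",
        (hash_code, language),
    ).fetchone()
    return row[0] if row else None
```

Keying on the pair of hash code and language allows several language versions of the same subtitle to share one hash code, which matches the later step of choosing among subtitles by language.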

The server control unit 330 may include a subtitle extraction module 331 and a language information matching module 333.

The subtitle extraction module 331 searches the database 320 for the subtitle corresponding to the video data and extracts it, using the hash code received from the terminal 200 as the search key. That is, the subtitle extraction module 331 can extract, from the subtitles stored in the database 320, those that match the hash code received from the terminal 200 or whose confidence value is at or above a preset level.
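The specification does not define how the confidence value is computed. The sketch below assumes, purely for illustration, that hash codes are equal-length hexadecimal strings and that confidence is the fraction of identical bits between two codes. The names `confidence` and `extract_subtitles`, the record layout, and the default threshold of 0.9 are hypothetical.

```python
def confidence(query: str, stored: str) -> float:
    """Fraction of equal bits between two equal-length hex hash codes."""
    if not query or len(query) != len(stored):
        return 0.0
    differing = bin(int(query, 16) ^ int(stored, 16)).count("1")
    return 1.0 - differing / (4 * len(query))


def extract_subtitles(query, stored, threshold=0.9):
    """Keep subtitles whose hash matches exactly or is similar enough.

    stored is an assumed list of (hash_code, language, text) records.
    """
    results = []
    for code, language, text in stored:
        score = 1.0 if code == query else confidence(query, code)
        # Exact match OR confidence at or above the preset level.
        if code == query or score >= threshold:
            results.append(
                {"confidence": score, "language": language, "text": text}
            )
    return results
```

Allowing near matches in addition to exact ones gives some tolerance when the captured audio differs slightly from the reference, for example because of re-encoding.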

The language information matching module 333 extracts, from the subtitles extracted by the subtitle extraction module 331, the subtitle that matches the language information received from the terminal 200, and delivers it to the terminal 200 through the server communication unit 310.

That is, the search by the subtitle extraction module 331 may return a plurality of subtitles in various languages whose hash codes are similar to the hash code received from the terminal 200. The language information matching module 333 therefore automatically selects, from the retrieved subtitles, the one that matches the language information set on the terminal 200 and delivers it to the terminal 200.
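The language selection step can be sketched as a filter over the candidate subtitles. The candidate record layout and the tie-breaking rule (preferring the highest confidence when several candidates share the requested language) are assumptions of this sketch and are not stated in the specification.

```python
def select_by_language(candidates, language):
    """Pick the candidate in the requested language, or None.

    candidates is an assumed list of dicts with the keys "confidence",
    "language", and "text". When several candidates share the language,
    the one with the highest confidence is returned (an assumption).
    """
    matching = [c for c in candidates if c["language"] == language]
    if not matching:
        return None
    return max(matching, key=lambda c: c["confidence"])
```

Returning `None` when no candidate is available in the requested language leaves the fallback policy, such as showing no subtitle, to the caller.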

As described above, in the subtitle providing system 400 through real-time audio streaming analysis according to an embodiment of the present invention, when the terminal 200 executes video data, the audio data is converted into a hash code by fingerprinting, the hash code is transmitted to the subtitle server 300, and the subtitle extracted by means of the hash code is transmitted to the terminal 200 and overlaid on the video data. The user can therefore easily receive matching subtitles without performing a separate subtitle search.

In addition, the subtitle providing system 400 through real-time audio streaming analysis according to an embodiment of the present invention transmits the set language information to the subtitle server together with the hash code obtained by fingerprinting the audio data, so that the subtitle server can automatically extract the subtitle that matches the language information and provide it to the terminal 200.

Meanwhile, the embodiments disclosed in the drawings merely present specific examples to aid understanding and are not intended to limit the scope of the present invention. It will be apparent to those of ordinary skill in the art to which the present invention pertains that other modifications based on the technical idea of the present invention can be practiced in addition to the embodiments disclosed herein.

100: content server 200: terminal
210: communication unit 220: input unit
230: display unit 240: storage unit
250: control unit 251: audio data capture module
252: fingerprinting module 253: language information combining module
254: overlay module 300: subtitle server
310: server communication unit 320: database
330: server control unit 331: subtitle extraction module
333: language information matching module 400: subtitle providing system

Claims

1. A subtitle providing system through real-time audio streaming analysis, comprising:
a content server for providing content;
a terminal that captures audio data from moving picture data of the content provided from the content server, converts the audio data into a hash code by fingerprinting, and transmits set language information together with the hash code; and
a subtitle server that receives the hash code and the language information from the terminal, extracts, from among stored subtitles, subtitles that match the hash code or whose confidence value is equal to or higher than a predetermined level, and transmits the subtitle matching the language information among the extracted subtitles to the terminal.

2. The subtitle providing system of claim 1,
wherein the terminal captures, through a plug-in, the audio data from the moving picture data of the content provided by the content server, converts the audio data into a hash code, and transmits the hash code to the subtitle server.

3. The subtitle providing system of claim 2,
wherein the terminal includes the language information in a header of the hash code and transmits it to the subtitle server.

4. The subtitle providing system of claim 3,
wherein the terminal overlays the subtitle received from the subtitle server on the moving picture data.

5. The subtitle providing system of claim 1,
wherein the terminal captures the audio data in intervals of 2 to 4 seconds and converts the captured audio data into a hash code.

6. A terminal comprising:
a communication unit for communicating with a content server and a subtitle server; and
a control unit that captures audio data from moving picture data, converts the audio data into a hash code by fingerprinting, transmits set language information together with the hash code to the subtitle server through the communication unit, and receives from the subtitle server, through the communication unit, the subtitle matching the language information among subtitles that the subtitle server has extracted from its stored subtitles as matching the hash code or having a confidence value equal to or higher than a predetermined level.

7. A subtitle server comprising:
a server communication unit for communicating with a terminal; and
a server control unit that receives from the terminal, through the server communication unit, a hash code obtained by converting audio data by fingerprinting together with set language information, extracts, from among stored subtitles, subtitles that match the hash code or whose confidence value is equal to or higher than a predetermined level, and transmits the subtitle matching the language information among the extracted subtitles to the terminal through the server communication unit.