KR20170085237A - Subtitle Providing System - Google Patents

Subtitle Providing System

Info

Publication number
KR20170085237A
KR20170085237A (Application KR1020160004595A)
Authority
KR
South Korea
Prior art keywords
data
subtitle
caption
fingerprint data
voice
Prior art date
Application number
KR1020160004595A
Other languages
Korean (ko)
Inventor
안문학
Original Assignee
주식회사 소리자바
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 주식회사 소리자바 filed Critical 주식회사 소리자바
Priority to KR1020160004595A priority Critical patent/KR20170085237A/en
Publication of KR20170085237A publication Critical patent/KR20170085237A/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B 20/10 Digital recording or reproducing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/439 Processing of audio elementary streams

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present invention relates to a subtitle providing system comprising: a caption information storage unit 120 that stores fingerprint data of voice sections of a broadcast program 100 and caption data corresponding to each fingerprint; a voice section detector 110 that detects voice sections in the broadcast program 100 in real time and generates fingerprint data for each detected section; and a caption inserting unit 130 that, in response to the fingerprint data generated in real time by the voice section detector 110, inserts the caption data provided by the caption information storage unit 120 into the corresponding video data of the broadcast program 100. In the present invention, since only the voice sections in the audio data of the broadcast program are detected, and only the subtitle information for those sections is received and inserted into the broadcast program, the size of the subtitle information database can be drastically reduced and the subtitle insertion processing time can be shortened.

Description

{Subtitle Providing System}

The present invention relates to a subtitle providing service, and more particularly, to a subtitle providing system that provides broadcast subtitles in real time for broadcast programs that are not supplied with subtitles.

A broadcasting station produces various broadcast content and transmits it at a predetermined frequency. Viewers, equipped with a broadcast receiver such as a TV, receive and reproduce the transmitted content. Broadcast content includes video signals relating to various fields such as music, entertainment, sports, movies, and news, together with the audio signals associated with the video. In producing such content, the broadcasting station collects video and audio signals suited to the characteristics of the content, combines and arranges them according to certain rules, and broadcasts the arranged data.

Conventionally, various text information, i.e., caption information, is added to broadcast content in order to maximize the delivery of the content's information to viewers. Such caption information serves viewers who need it, such as hearing-impaired viewers.

However, since not all broadcast content includes subtitle information, there is a problem in that appropriate subtitles cannot be provided to viewers who need them.

Conventionally, in order to provide caption data corresponding to the audio data of broadcast content, audio fingerprint data is generated for the audio data of the entire sound source, and caption data is associated with each audio fingerprint. The audio fingerprint data is then detected when the broadcast content is transmitted, and the corresponding subtitle data is received and added to the content, so that viewers can watch the video together with the audio and matching subtitles.

The audio fingerprint mentioned above generally refers to data that describes the characteristics of audio data. An audio fingerprint is generated by analyzing the audio data with various methods, such as frequency transformation, and is used to determine whether audio data is being used illegally or to search for audio data by its fingerprint.
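As a concrete illustration of the frequency-analysis idea described above, the following is a minimal, hypothetical sketch, not the patent's actual algorithm: each frame of audio samples is transformed to the frequency domain and reduced to a few comparison bits. The function name, frame size, and band count are illustrative assumptions.

```python
import math

def spectral_fingerprint(samples, frame_size=8, num_bands=4):
    """Toy spectral fingerprint: per frame, compare energy in adjacent
    frequency bands and emit one bit per comparison (illustrative only)."""
    bits = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        # Naive DFT magnitude spectrum (stdlib only, for clarity)
        mags = []
        for k in range(frame_size // 2):
            re = sum(s * math.cos(2 * math.pi * k * n / frame_size)
                     for n, s in enumerate(frame))
            im = -sum(s * math.sin(2 * math.pi * k * n / frame_size)
                      for n, s in enumerate(frame))
            mags.append(math.hypot(re, im))
        # One bit per adjacent-band energy comparison
        frame_bits = [1 if mags[b] > mags[b + 1] else 0
                      for b in range(num_bands - 1)]
        bits.append(tuple(frame_bits))
    return bits
```

Because the bits encode only relative band energies, the same audio always yields the same fingerprint, which is the property the matching step depends on.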

Various methods for generating such audio fingerprints have been proposed. However, with conventional audio fingerprint generation methods, when the amount of audio data to be searched is large (roughly 10,000 items or more), the speed of fingerprint comparison drops markedly, so these methods are not suitable for comparing large amounts of audio data.

In addition, Korean Patent Registration No. 10-0456408 discloses an audio fingerprint ("audio gene") extraction method using binary features. In this method, the spectral energy of each frame of each audio item in the database is quantized to 0 or 1 to form a 32-bit pattern, and that value is added to a lookup-table entry (audio signal ID, corresponding frame index); the same 32-bit pattern is then extracted from a few seconds of input audio and matched against the table. However, because the number of (audio signal ID, frame index) pairs in each lookup-table entry is variable, a sufficient search speed cannot be guaranteed; and because the binary feature-vector extraction method is fixed, it is relatively vulnerable to signal degradation.
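The binary-feature idea in the cited patent can be sketched roughly as follows. This is an illustrative reconstruction from the description above, assuming 33 spectral bands producing one 32-bit pattern per frame; the exact quantization rule is an assumption, not the cited patent's specification.

```python
def frame_hash(band_energies, prev_band_energies):
    """Illustrative 32-bit binary feature: bit m is 1 when the energy
    difference between adjacent bands m and m+1 increases relative to
    the previous frame (a common binarization scheme, assumed here)."""
    assert len(band_energies) == 33 and len(prev_band_energies) == 33
    bits = 0
    for m in range(32):
        d = (band_energies[m] - band_energies[m + 1]) \
            - (prev_band_energies[m] - prev_band_energies[m + 1])
        if d > 0:
            bits |= 1 << (31 - m)
    return bits
```

Each 32-bit value can then serve as a key into a lookup table whose entries hold (audio signal ID, frame index) pairs, which is exactly where the variable entry length criticized above arises.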

Korean Patent No. 10-1245155
Korean Patent No. 10-0456408

In order to solve the above-mentioned problems, the present invention aims to provide a subtitle providing system that detects only the speaker's voice sections in the audio data of broadcast content and displays subtitles for the voice in each detected section.

In order to accomplish the above object, according to the present invention, there is provided a subtitle providing system comprising: a caption information storage unit for storing fingerprint data of voice sections of a broadcast program and caption data corresponding to the fingerprint data; a voice section detector for detecting voice sections in the broadcast program in real time and generating fingerprint data; and a caption inserting unit for inserting the caption data provided from the caption information storage unit, corresponding to the fingerprint data generated in real time by the voice section detector, into the corresponding video data of the broadcast program.

Preferably, when fingerprint data matching the fingerprint data transmitted from the voice section detector is retrieved, the caption information storage unit transmits the result to the voice section detector; and when the VAD data transmitted from the voice section detector is compared with the stored VAD data and found to match, the corresponding subtitle data is transmitted to the caption inserting unit.

According to another aspect of the present invention, there is provided a subtitle providing system comprising: a caption information storage unit for storing fingerprint data of voice sections of a broadcast program and caption data corresponding to the fingerprint data; a voice section detector for detecting voice sections in the broadcast program in real time and generating fingerprint data for each section; a subtitle transmission management unit for transmitting the fingerprint data generated by the voice section detector to the caption information storage unit, receiving the subtitle information corresponding to the fingerprint data, and controlling transmission of the received subtitle information; and a caption inserting unit for inserting the subtitle data transmitted from the subtitle transmission management unit, corresponding to the fingerprint data generated in real time by the voice section detector, into the corresponding video data of the broadcast program.

The voice section detector includes a function of separating the video signal from the audio signal, and a VAD (Voice Activity Detector) function that detects the speaker's voice sections in the audio signal, generates fingerprint data, and detects the VAD data corresponding to those sections.
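The VAD function can be illustrated with a toy energy-threshold detector. This is a deliberate simplification: practical VADs use more robust features, and the frame size and threshold here are arbitrary assumptions.

```python
def detect_voice_sections(samples, frame_size=4, threshold=0.1):
    """Return (start, end) sample indices of contiguous runs of frames
    whose mean absolute amplitude exceeds the threshold (toy energy VAD)."""
    sections, current_start = [], None
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        active = sum(abs(s) for s in frame) / len(frame) > threshold
        if active and current_start is None:
            current_start = start            # voice section begins
        elif not active and current_start is not None:
            sections.append((current_start, start))  # section ends
            current_start = None
    if current_start is not None:
        sections.append((current_start, len(samples)))
    return sections
```

In the system described here, the start of each detected section is where fingerprint data would be generated, and the per-section activity pattern plays the role of the VAD data that is later compared against the store.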

The subtitle transmission management unit searches the caption information storage unit for fingerprint data matching the fingerprint data transmitted from the voice section detector and transmits the result to the voice section detector. It then retrieves the time index (TI), VAD data, and caption data (TX) associated with the matching fingerprint from the caption information storage unit, compares the VAD data transmitted from the voice section detector with the retrieved VAD data, and, if they match, transmits the caption data and time index to the caption inserting unit.

The caption information storage unit may be installed in a cloud server.

In the present invention, the constituent blocks are connected over a wired/wireless network, and each constituent block preferably includes a communication module for data transmission and reception.

In the present invention, since only the voice sections in the audio data of the broadcast program are detected, and only the subtitle information for those sections is received and inserted into the broadcast program, the size of the subtitle information database can be drastically reduced and the subtitle insertion processing time can be shortened.

FIG. 1 is a block diagram of an embodiment of a subtitle providing system according to the present invention;
FIG. 2 is an exemplary diagram showing generation of fingerprint data in the present invention;
FIG. 3 illustrates an example of a fingerprint data comparison process in the present invention;
FIG. 4 is an exemplary diagram showing a VAD data comparison operation in the present invention; and
FIG. 5 is a block diagram showing another embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings.

In the following description, well-known functions and constructions are not described in detail, since they would obscure the invention with unnecessary detail. In describing the embodiments, descriptions of parts in common with previously described parts are not repeated, so that unnecessary duplication is avoided.

FIG. 1 is a block diagram showing an embodiment of the subtitle providing system according to the present invention. As shown in FIG. 1, the system comprises: a caption information storage unit 120 that stores fingerprint data of voice sections of a broadcast program 100 and caption data corresponding to each fingerprint; a voice section detector 110 that detects voice sections in the broadcast program 100 in real time and generates fingerprint data for each section; and a caption inserting unit 130 that inserts the caption data provided by the caption information storage unit 120, corresponding to the fingerprint data generated in real time, into the corresponding video data of the broadcast program 100.

The voice section detector 110 incorporates a function for separating the video signal from the audio signal together with a VAD (Voice Activity Detector) function, and is configured to detect the voice sections of the audio signal separated from the broadcast program.

The voice section detector 110, the caption information storage unit 120, and the caption inserting unit 130 each include a built-in communication module for connecting to a wired/wireless network. Since the functions and configurations of wired/wireless networks are variously applied and widely known, a detailed description of their operation is omitted.

An embodiment according to the present invention will be described as follows.

First, before broadcasting, the caption information storage unit 120 stores in advance the fingerprint data of the voice sections of the broadcast program 100 and the subtitle information corresponding to each fingerprint. The caption information storage unit 120 may be deployed on a cloud server.
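The pre-built store can be modeled as a simple mapping from fingerprint keys to caption records. The field names TI, VAD, and TX follow the patent's notation, while the data structure itself, the sample keys, and the sample captions are illustrative assumptions.

```python
from typing import NamedTuple, Optional, Tuple

class CaptionRecord(NamedTuple):
    time_index: float        # TI: where the caption belongs in the program
    vad_data: Tuple[int, ...]  # VAD data for the voice section
    caption_text: str        # TX: the caption itself

# Hypothetical store built before broadcast: fingerprint -> record
caption_store = {
    0xA1B2C3D4: CaptionRecord(12.5, (1, 1, 0, 1), "Good evening, viewers."),
    0x5E6F7081: CaptionRecord(47.0, (1, 0, 1, 1), "Here is today's news."),
}

def lookup(fingerprint: int) -> Optional[CaptionRecord]:
    """Return the caption record for a fingerprint, or None if absent."""
    return caption_store.get(fingerprint)
```

Because only voice sections are fingerprinted, the store holds one record per spoken phrase rather than entries for the entire audio track, which is the source of the database-size reduction claimed above.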

When the broadcasting station starts transmitting the broadcast program 100, the voice section detector 110 separates the video signal and audio signal from the broadcast program 100 in real time, detects the speaker's voice sections in the audio signal, and generates fingerprint data as a hash code when the start of a voice section is detected.

For example, when the waveform of the audio signal separated from the current broadcast program is as shown in FIG. 2(a), the voice section detector 110 generates fingerprint data at the beginning of each voice section, as shown in FIG. 2(b), and transmits it to the caption information storage unit 120; in this example, no subtitle entry is found for the first fingerprint, while the subtitle value and VAD data are found for the second fingerprint. After receiving result information indicating that matching fingerprint data has been found in the caption information storage unit 120, the voice section detector 110 detects the VAD data of the voice section (for example, VAD0, VAD1, and VAD2) and transmits them to the caption information storage unit 120 in order.

As shown in FIG. 3, the caption information storage unit 120 searches for fingerprint data matching the fingerprint data transmitted from the voice section detector 110, transmits the result information to the voice section detector 110, and retrieves the caption data (TX), VAD data, and time index (TI) linked to the matching fingerprint.

Then, when the VAD data is transmitted from the voice section detector 110, the caption information storage unit 120 compares it with the retrieved VAD data, as shown in FIG. 4, and transmits the subtitle data (TX) corresponding to the matching VAD data to the caption inserting unit 130, together with the time index (TI) indicating the subtitle position.
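The two-step match described here, a fingerprint lookup followed by a VAD comparison, can be sketched as follows. The record layout (VAD data, TI, TX) is an illustrative assumption matching the notation of the description.

```python
def match_and_deliver(store, fingerprint, observed_vad):
    """Two-step match: first the fingerprint is looked up, then the
    stored VAD data must agree with the VAD data observed in the live
    voice section; only then are (TX, TI) delivered (sketch)."""
    record = store.get(fingerprint)
    if record is None:
        return None                       # no fingerprint match
    stored_vad, ti, tx = record
    if stored_vad != tuple(observed_vad):
        return None                       # VAD comparison failed
    return (tx, ti)                       # deliver caption + time index
```

The VAD comparison acts as a confirmation pass on top of the fingerprint match, so a hash collision on the fingerprint alone does not cause a wrong caption to be inserted.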

Accordingly, the caption inserting unit 130 checks the time index (TI) of the subtitle information (TX, TI) transmitted from the caption information storage unit 120, inserts the subtitle data (TX) at the corresponding position of the broadcast program 100, and transmits the broadcast program with the caption information inserted.
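The insertion step can be modeled on a toy frame list, where each frame carries a timestamp, a video payload, and an optional caption. The display-window duration is an assumed parameter not specified in the text.

```python
def insert_caption(frames, caption_text, time_index, duration=2.0):
    """Attach caption_text to frames whose timestamps fall within
    [time_index, time_index + duration); a simplified stand-in for
    inserting caption data into the program's video data."""
    return [
        (ts, video,
         caption_text if time_index <= ts < time_index + duration else cap)
        for ts, video, cap in frames
    ]
```

In the real system the insertion point comes from the time index (TI) delivered with the caption, so the subtitle lands on the video frames that accompany the matched voice section.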

Accordingly, when the broadcast program is viewed, a display device such as a TV displays the video and the corresponding subtitles on the screen, and the viewer can read the subtitles of the audio while listening to the speaker's voice.

In this operation, the voice section detector 110 and the caption information storage unit 120 repeatedly perform the fingerprint data search and the VAD data comparison, switching between fingerprint-search mode and VAD-comparison mode.
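The alternation between fingerprint-search mode and VAD-comparison mode can be sketched as a small state machine over a stream of events. This is an illustrative model of the described mode switching; the event encoding and record layout are assumptions.

```python
def run_pipeline(store, events):
    """Alternate between fingerprint-search mode and VAD-compare mode.
    `events` is a list of ("fp", value) / ("vad", value) tuples, and
    `store` maps fingerprint -> (vad_data, time_index, caption_text)."""
    delivered, pending = [], None
    for kind, value in events:
        if pending is None and kind == "fp":
            pending = store.get(value)        # fingerprint-search mode
        elif pending is not None and kind == "vad":
            stored_vad, ti, tx = pending      # VAD-comparison mode
            if stored_vad == value:
                delivered.append((tx, ti))
            pending = None                    # back to search mode
    return delivered
```

A fingerprint with no match, or a VAD mismatch, simply returns the machine to search mode, so the live broadcast is never blocked waiting on a failed lookup.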

As shown in the block diagram of FIG. 5, another embodiment of the present invention comprises: a caption information storage unit 320 that stores the fingerprint data of the voice sections of a broadcast program 300, together with the caption data (TX), time index (TI), and VAD data corresponding to each fingerprint; a voice section detector 310 that detects voice sections in the broadcast program 300 in real time and generates fingerprint data for each section; a subtitle transmission management unit 340 that searches the caption information storage unit 320 for fingerprint data matching the generated fingerprint data, retrieves the corresponding subtitle information (TX, TI, VAD) when a match is found, compares the VAD data transmitted from the voice section detector 310 with the VAD data retrieved from the caption information storage unit 320, and controls the transmission of the subtitle data; and a caption inserting unit 330 that inserts the subtitle data transmitted from the subtitle transmission management unit 340, corresponding to the fingerprint data generated in real time by the voice section detector 310, into the corresponding video data of the broadcast program.

In this embodiment, the configuration and functions of the voice section detector 310 and the caption inserting unit 330 are the same as those of the embodiment of FIG. 1.

The caption information storage unit 320 stores, for the broadcast program, fingerprint data together with the time index (TI), VAD data, and caption data (TX) linked to each fingerprint.

The voice section detector 310 transmits the fingerprint data, generated by detecting the speaker's voice sections in the audio signal of the broadcast program, to the subtitle transmission management unit 340, and the subtitle transmission management unit 340 searches the caption information storage unit 320 for fingerprint data matching the fingerprint data transmitted from the voice section detector 310.

Subsequently, when matching fingerprint data is retrieved from the caption information storage unit 320, the subtitle transmission management unit 340 retrieves the time index (TI), VAD data, and caption data (TX) associated with the fingerprint, compares the retrieved VAD data with the VAD data transmitted from the voice section detector 310, and, if they match, transmits the time index (TI) and caption data (TX) to the caption inserting unit 330.

The caption inserting unit 330 checks the subtitle information (TI, TX) transmitted from the subtitle transmission management unit 340, inserts the subtitle data (TX) at the corresponding position of the broadcast program 300, and transmits the broadcast program with the caption inserted.

Accordingly, when the broadcast program is viewed, a display device such as a TV displays the video and the corresponding subtitles on the screen, and the viewer can watch the video and read the subtitles while listening to the speaker's voice.

As described above, according to the present invention, only the speaker's voice sections are detected in the audio signal of the broadcast program, and caption information is requested only for those sections; therefore, the size of the database that stores the caption information can be dramatically reduced.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the invention.

110, 310: voice section detector    120, 320: caption information storage unit
130, 330: caption inserting unit    340: subtitle transmission management unit

Claims (7)

1. A subtitle providing system comprising:
a caption information storage unit for storing fingerprint data of voice sections of a broadcast program and caption data corresponding to the fingerprint data;
a voice section detector for detecting voice sections in the broadcast program in real time and generating fingerprint data; and
a caption inserting unit for inserting the caption data provided from the caption information storage unit, corresponding to the fingerprint data generated in real time by the voice section detector, into the corresponding video data of the broadcast program.
2. A subtitle providing system comprising:
a caption information storage unit for storing fingerprint data of voice sections of a broadcast program and caption data corresponding to the fingerprint data;
a voice section detector for detecting voice sections in the broadcast program in real time and generating fingerprint data for each section;
a subtitle transmission management unit for transmitting the fingerprint data generated by the voice section detector to the caption information storage unit, receiving the subtitle information corresponding to the fingerprint data, and controlling transmission of the received subtitle information; and
a caption inserting unit for inserting the subtitle data transmitted from the subtitle transmission management unit, corresponding to the fingerprint data generated in real time by the voice section detector, into the corresponding video data of the broadcast program.
3. The system according to claim 1 or 2, wherein the voice section detector includes:
a function of separating a video signal and an audio signal; and
a VAD (Voice Activity Detector) function for detecting the speaker's voice sections in the audio signal to generate fingerprint data and detecting the VAD data corresponding to the fingerprint data.
4. The system of claim 1, wherein the caption information storage unit transmits the result to the voice section detector when fingerprint data matching the fingerprint data transmitted from the voice section detector is retrieved, performs an operation of comparing the VAD data transmitted from the voice section detector with the stored VAD data, and transmits the caption data to the caption inserting unit when the comparison results match.
5. The system according to claim 1 or 2, wherein the caption information storage unit is installed on a cloud server.
6. The system of claim 2, wherein the subtitle transmission management unit retrieves from the caption information storage unit the fingerprint data matching the fingerprint data transmitted from the voice section detector, transmits the result to the voice section detector, retrieves the time index (TI), VAD data, and caption data (TX) associated with the fingerprint from the caption information storage unit, compares the VAD data transmitted from the voice section detector with the retrieved VAD data, and transmits the corresponding caption data to the caption inserting unit when the comparison results match.
7. The system according to claim 1 or 2, wherein the constituent blocks are connected over a wired/wireless network, and each constituent block includes a communication module for transmitting and receiving data.
KR1020160004595A 2016-01-14 2016-01-14 Subtitle Providing System KR20170085237A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020160004595A KR20170085237A (en) 2016-01-14 2016-01-14 Subtitle Providing System

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020160004595A KR20170085237A (en) 2016-01-14 2016-01-14 Subtitle Providing System

Publications (1)

Publication Number Publication Date
KR20170085237A true KR20170085237A (en) 2017-07-24

Family

ID=59429056

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020160004595A KR20170085237A (en) 2016-01-14 2016-01-14 Subtitle Providing System

Country Status (1)

Country Link
KR (1) KR20170085237A (en)

Similar Documents

Publication Publication Date Title
KR101757878B1 (en) Contents processing apparatus, contents processing method thereof, server, information providing method of server and information providing system
KR102166423B1 (en) Display device, server and method of controlling the display device
EP3100459B1 (en) Methods and apparatus to synchronize second screen content with audio/video programming using closed captioning data
JP3953886B2 (en) Subtitle extraction device
US11227620B2 (en) Information processing apparatus and information processing method
KR102454002B1 (en) Signal processing method for investigating audience rating of media, and additional information inserting apparatus, media reproducing apparatus, aduience rating determining apparatus for the same method
CN101631249A (en) Inserting advance content alerts into a media item during playback
KR101358807B1 (en) Method for synchronizing program between multi-device using digital watermark and system for implementing the same
JP2012134980A (en) System and method of providing personalized service
US11164347B2 (en) Information processing apparatus, information processing method, and program
EP2621180A2 (en) Electronic device and audio output method
CN111345045A (en) Display device and control method thereof
JP6212719B2 (en) Video receiving apparatus, information display method, and video receiving system
KR20170085237A (en) Subtitle Providing System
KR20150023492A (en) Synchronized movie summary
KR102263146B1 (en) Video display apparatus and operating method thereof
KR102292552B1 (en) Video synchronization system to improve viewing rights for the disabled
CN113228166B (en) Command control device, control method, and nonvolatile storage medium
CN109040776B (en) Identification method and device for playing content
KR20070025284A (en) Digital multimedia broadcasting(dmb) system and words data proceeding method for proceeding words data in dmb
KR102668559B1 (en) Method and apparatus for presuming contents based on sound wave signal included in contents
KR101245155B1 (en) Caption Supporting Method and System
US20120093259A1 (en) Method and apparatus for transmitting content, method and apparatus for receiving content, and content service system
WO2015110253A1 (en) Method of generating a resource identifier, receiver, communication device and system
KR20180025697A (en) Method and apparatus for presuming contents based on sound wave signal included in contents

Legal Events

Date Code Title Description
A201 Request for examination
A302 Request for accelerated examination
E902 Notification of reason for refusal
E601 Decision to refuse application