KR20170085237A - Subtile Providing System - Google Patents
- Publication number
- KR20170085237A (application KR1020160004595A)
- Authority
- KR
- South Korea
- Prior art keywords
- data
- subtitle
- caption
- fingerprint data
- voice
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
Abstract
More particularly, the present invention relates to a subtitle providing system comprising: a caption information storage unit 120 for storing fingerprint data of the voice sections of a broadcast program 100, together with caption data corresponding to that fingerprint data; a voice section detector 110 for detecting voice sections in the broadcast program 100 in real time and generating fingerprint data for each detected section; and a caption inserting unit 130 for inserting the caption data provided by the caption information storage unit 120 into the corresponding video data of the broadcast program 100, in response to the fingerprint data generated in real time by the voice section detector 110. Since only the voice sections of the broadcast program's audio data are detected, and only the subtitle information for those sections is received and inserted into the broadcast program, both the database volume of the subtitle information and the subtitle insertion processing time can be drastically reduced.
Description
The present invention relates to a subtitle providing service, and more particularly, to a subtitle providing system for providing broadcast subtitles in real time for broadcast programs that are not provided with subtitles.
A broadcasting station produces various broadcast contents and transmits them on a predetermined frequency. Viewers, equipped with a broadcast receiver such as a TV, receive and reproduce the broadcast content. Broadcast content comprises video signals covering various fields such as music, entertainment, sports, movies, and news, together with the audio signals associated with that video. In the course of producing such content, the broadcasting station collects video and audio signals matching the character of the content, combines and arranges them according to a certain rule, and broadcasts the arranged data.
Conventionally, various text information, i.e., caption information, is added to broadcast content in order to maximize how effectively the content's information is conveyed to viewers. Such caption information serves to deliver broadcast information to people who need captions, such as hearing-impaired viewers.
However, since not all broadcast content includes subtitle information, there is a problem that appropriate subtitles cannot be provided to the people who need them.
Conventionally, to provide caption data corresponding to the audio data of broadcast content, audio fingerprint data is generated for the audio data of the entire sound source, and corresponding caption data is prepared for each piece of audio fingerprint data. When the broadcast content is transmitted, the audio fingerprint data is detected, and the subtitle data is received and added to the content, so that viewers can watch the video together with the matching audio and subtitles.
An audio fingerprint generally refers to data that describes the characteristics of audio data. It is generated by analyzing the audio data with various methods, such as frequency conversion, and is used to determine whether audio data is being used illegally or to search for audio data by its fingerprint.
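As a rough illustration of this idea (a sketch only, not the fingerprinting method claimed in this patent), frame-wise spectral band energies can serve as a simple fingerprint feature; the function name and parameters below are assumptions:

```python
import numpy as np

def spectral_fingerprint(audio, frame_len=1024, n_bands=16):
    """Summarize each frame of a mono signal as log band energies,
    a compact feature that characterizes the audio content."""
    n_frames = len(audio) // frame_len
    prints = []
    for i in range(n_frames):
        frame = audio[i * frame_len:(i + 1) * frame_len]
        spectrum = np.abs(np.fft.rfft(frame))      # frequency conversion
        bands = np.array_split(spectrum, n_bands)  # coarse frequency bands
        prints.append([float(np.log1p(np.sum(b ** 2))) for b in bands])
    return np.array(prints)
```

Two clips of the same audio then yield nearly identical rows, so fingerprints can be compared without storing the raw waveform.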
Various methods of generating such audio fingerprints have been proposed. However, with conventional generation methods, when the amount of audio data to be searched is large (roughly 10,000 items or more), the speed at which fingerprints are generated and compared slows markedly, so these methods are not suitable for comparing large amounts of audio data.
In addition, Korean Patent Registration No. 10-0456408 discloses an audio-gene extraction method using binary features. In that method, the spectral energy of each frame of each audio item in the database is quantized to 0 or 1 to form a 32-bit pattern, and the value is added to a search-table entry (audio signal ID, corresponding frame index); the same 32-bit pattern is extracted for a few seconds of input audio and looked up. However, the number of (audio signal ID, frame index) candidates in each search-table entry is variable, so a sufficient search speed cannot be guaranteed. Also, since the binary feature-vector extraction method is fixed, it is relatively vulnerable to signal damage.
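For illustration, a 32-bit binary pattern of the kind described above can be sketched as the sign of band-energy differences across adjacent bands and adjacent frames (a Haitsma-Kalker-style approximation, not the exact method of the cited patent; all names and parameters are assumptions):

```python
import numpy as np

def binary_fingerprint(audio, frame_len=2048, hop=1024, n_bands=33):
    """One 32-bit word per frame step: sign of energy differences
    across adjacent bands and adjacent frames."""
    frames = [audio[i:i + frame_len]
              for i in range(0, len(audio) - frame_len + 1, hop)]
    energies = []
    for frame in frames:
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame_len)))
        bands = np.array_split(spectrum, n_bands)
        energies.append(np.array([np.sum(b ** 2) for b in bands]))
    words = []
    for prev, cur in zip(energies, energies[1:]):
        bits = 0
        for m in range(n_bands - 1):  # 33 bands -> 32 bits
            diff = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            bits = (bits << 1) | (1 if diff > 0 else 0)
        words.append(bits)
    return words
```

Because each frame collapses to a fixed-size integer, the words can be used directly as hash keys in a search table, which is what makes the lookup fast when the table entries are well balanced.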
In order to solve the above-mentioned problems, the present invention aims to provide a subtitle providing system that detects only the speech sections of a speaker within the audio data of broadcast content and displays subtitles for the voice in each corresponding section.
In order to accomplish the above object, the present invention provides a subtitle providing system comprising: a caption information storage unit for storing fingerprint data of the voice sections of a broadcast program and caption data corresponding to the fingerprint data; a voice section detector for detecting voice sections in the broadcast program in real time and generating fingerprint data; and a caption inserting unit for inserting the caption data provided by the caption information storage unit into the corresponding video data of the broadcast program, in accordance with the fingerprint data generated in real time by the voice section detector.
Preferably, when fingerprint data matching the fingerprint data transmitted from the voice section detector is retrieved, the caption information storage unit transmits the result to the voice section detector; the VAD data transmitted from the voice section detector is then compared with the retrieved VAD data, and when the two match, the corresponding subtitle data is transmitted to the caption inserting unit.
According to another aspect of the present invention, there is provided a subtitle providing system comprising: a caption information storage unit for storing fingerprint data of the voice sections of a broadcast program and caption data corresponding to the fingerprint data; a voice section detector for detecting voice sections in the broadcast program in real time and generating fingerprint data for each voice section; a subtitle transmission management unit for transmitting the fingerprint data generated by the voice section detector to the caption information storage unit, receiving the subtitle information corresponding to the fingerprint data, and controlling transmission of the received subtitle information; and a caption inserting unit for inserting the caption data transmitted from the subtitle transmission management unit into the corresponding video data of the broadcast program, in accordance with the fingerprint data generated in real time by the voice section detector.
The voice section detector includes a function for separating the video signal and the audio signal, and a VAD (Voice Activity Detector) function for detecting the speaker's voice sections in the audio signal so as to generate fingerprint data and detect the corresponding VAD data.
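A minimal sketch of an energy-threshold VAD, which returns the start and end times of voice sections (an assumption for illustration, not the detector claimed here; real detectors use more robust features):

```python
import numpy as np

def detect_voice_sections(audio, sample_rate, frame_ms=20, threshold_db=-35):
    """Return (start_sec, end_sec) pairs for runs of frames whose
    RMS level exceeds an energy threshold in dB."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(audio) // frame_len
    active = []
    for i in range(n_frames):
        frame = audio[i * frame_len:(i + 1) * frame_len]
        rms = np.sqrt(np.mean(frame ** 2)) + 1e-12  # avoid log of zero
        active.append(20 * np.log10(rms) > threshold_db)
    # Merge consecutive active frames into (start, end) sections
    sections, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            sections.append((start * frame_ms / 1000, i * frame_ms / 1000))
            start = None
    if start is not None:
        sections.append((start * frame_ms / 1000, n_frames * frame_ms / 1000))
    return sections
```

Only the returned sections would then be fingerprinted, which is the source of the database savings claimed by the invention.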
The subtitle transmission management unit searches the caption information storage unit for fingerprint data corresponding to the fingerprint data transmitted from the voice section detector and transmits the result to the voice section detector. The subtitle transmission management unit then retrieves the time index (TI), the VAD data, and the caption data (TX) from the caption information storage unit, compares the VAD data transmitted from the voice section detector with the retrieved VAD data, and, if they match, transmits the corresponding caption data to the caption inserting unit.
The caption information storage unit may be installed in a cloud server.
In the present invention, each constituent block is connected to a wired/wireless network, and each block preferably includes a communication module for data transmission and reception.
In the present invention, only the voice sections of the broadcast program's audio data are detected, and only the subtitle information for those sections is received and inserted into the broadcast program. Consequently, both the database volume of the subtitle information and the subtitle insertion processing time can be drastically reduced.
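The lookup step described above — matching a fingerprint generated live against stored caption entries — can be sketched as a small Hamming-distance search over 32-bit fingerprint words; the database contents and function names below are hypothetical:

```python
def hamming(a, b):
    """Number of differing bits between two fingerprint words."""
    return bin(a ^ b).count("1")

def match_caption(word, db, max_bits=4):
    """Return the caption entry whose stored fingerprint is nearest to
    the live word, or None if nothing is within the bit-error budget."""
    best, best_d = None, max_bits + 1
    for stored, entry in db.items():
        d = hamming(word, stored)
        if d < best_d:
            best, best_d = entry, d
    return best

# Hypothetical in-memory stand-in for the caption information storage
# unit: fingerprint word -> (time index, caption text).
CAPTION_DB = {
    0x1A2B3C4D: ("00:01:02", "Hello, welcome back."),
    0xFFFF0000: ("00:01:05", "Here is today's news."),
}
```

Allowing a few bit errors makes the match tolerant to broadcast noise, at the cost of scanning candidates; a production system would index the table rather than scan it linearly.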
FIG. 1 is a block diagram of an embodiment of a subtitle providing system according to the present invention;
FIG. 2 is an exemplary diagram showing generation of fingerprint data in the present invention;
FIG. 3 illustrates an example of a fingerprint data comparison process in the present invention;
FIG. 4 is an exemplary diagram showing a VAD data comparison operation in the present invention;
FIG. 5 is a block diagram showing another embodiment of the present invention.
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings.
In the following description, well-known functions or constructions are not described in detail, since they would obscure the invention with unnecessary detail. The embodiments are described with reference to specific examples; in the description of each embodiment, parts in common with previously described parts are not described again, so that unnecessary duplication is avoided.
FIG. 1 is a block diagram showing an embodiment of a caption providing apparatus according to the present invention. As shown in FIG. 1, fingerprint data of a voice section for a broadcast program 100, together with the corresponding caption data, is stored in a caption information storage unit 120.
The
The voice
An embodiment according to the present invention will be described as follows.
First, the subtitle
When the broadcasting station starts transmitting the
For example, when the waveform of the audio signal separated from the current broadcast program is as shown in FIG. 2 (a), the voice
3, the subtitle
When the VAD data is transmitted from the
Accordingly, the
Accordingly, when a broadcast program is viewed, a display device such as a TV displays a video and subtitles corresponding to the video on the screen, and the viewer can watch subtitles of the audio while listening to the voice of the speaker.
In this operation, the
As shown in the block diagram of FIG. 5, another embodiment of the present invention includes the fingerprint data of the voice section for the
In another embodiment of the present invention configured as described above, the configuration and functions of the speech
The caption
The
Subsequently, when the caption
The
Accordingly, when a broadcast program is viewed, a display device such as a TV displays a video and subtitles corresponding to the video on the screen, and the viewer can watch the video while watching the subtitles while listening to the voice of the speaker.
As described above, according to the present invention, when the caption information is requested, only the voice section of the speaker is detected in the audio signal of the broadcast program, and only the caption information is requested. Therefore, the amount of the database for storing the caption information Can be dramatically reduced.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the invention.
110, 310: Voice section detector
130, 330: Subtitle insertion section 340: Subtitle transmission management section
Claims (7)
A voice section detector for detecting a voice section in real time in the broadcast program and generating fingerprint data; And
And a caption inserting unit for inserting the caption data provided from the caption information storage unit into the corresponding video data of the broadcast program, in accordance with the fingerprint data generated in real time by the voice section detector.
A voice section detector for detecting a voice section in real time in the broadcast program and generating fingerprint data for the voice section;
A subtitle transmission management unit for transmitting the fingerprint data generated by the voice section detector to the caption information storage unit, receiving the subtitle information corresponding to the fingerprint data, and controlling transmission of the received subtitle information; And
And a caption inserting unit for inserting the caption data transmitted from the subtitle transmission management unit into the corresponding video data of the broadcast program, in accordance with the fingerprint data generated in real time by the voice section detector.
A function of separating a video signal and an audio signal,
And a VAD (Voice Activity Detector) function for detecting the speaker's voice sections in the audio signal so as to generate fingerprint data and detect VAD data corresponding to the fingerprint data.
When fingerprint data matching the fingerprint data transmitted from the voice section detector is retrieved, the result is transmitted to the voice section detector; the VAD data transmitted from the voice section detector is then compared with the retrieved VAD data, and when the comparison results match, the caption data is transmitted to the caption inserting unit.
A subtitle providing system wherein the caption information storage unit is installed on a cloud server.
The fingerprint data corresponding to the fingerprint data transmitted from the voice section detector is retrieved from the caption information storage unit, and the result is transmitted to the voice section detector; the time index (TI), the VAD data, and the caption data (TX) are retrieved from the caption information storage unit; and the VAD data transmitted from the voice section detector is compared with the retrieved VAD data, the corresponding caption data being transmitted to the caption inserting unit when the comparison results match. A subtitle providing system for transmitting the subtitle in this manner.
And a communication module for transmitting and receiving data is included in each constituent block.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160004595A KR20170085237A (en) | 2016-01-14 | 2016-01-14 | Subtile Providing System |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020160004595A KR20170085237A (en) | 2016-01-14 | 2016-01-14 | Subtile Providing System |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20170085237A (en) | 2017-07-24 |
Family
ID=59429056
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020160004595A KR20170085237A (en) | 2016-01-14 | 2016-01-14 | Subtile Providing System |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20170085237A (en) |
2016
- 2016-01-14: KR application KR1020160004595A published as KR20170085237A; status not active (Application Discontinuation)
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
A302 | Request for accelerated examination | ||
E902 | Notification of reason for refusal | ||
E601 | Decision to refuse application |