JPH05176232A

JPH05176232A - Title superimposing device

Info

Publication number: JPH05176232A
Application number: JP3336838A
Authority: JP
Inventors: Shinichi Uchiyama; 伸一内山
Original assignee: Fujitsu General Ltd
Current assignee: Fujitsu General Ltd
Priority date: 1991-12-19
Filing date: 1991-12-19
Publication date: 1993-07-13

Abstract

PURPOSE: To display a picture with a title superimposed on it by extracting the audio signal from an input video and audio signal, performing voice recognition, and superimposing the resulting character data on the video signal.

CONSTITUTION: A voice recognition section 3 extracts characteristic parameters from an input audio signal 2, compares them with stored standard patterns, performs recognition in units of, e.g., syllables, and converts the result into character codes. A sentence generating section 4 processes the character codes based on stored language information such as syntax, semantics, and context, converts them into sentences made up of proper words and phrases, and sequentially outputs the corresponding character codes. A character generating section 5 stores character pattern data corresponding to each character code, reads out the data according to the character codes from the sentence generating section 4, and stores it in a video memory 6. A title synthesis section 8 reads the character pattern data stored in the video memory 6 based on the horizontal and vertical synchronizing signals from a synchronizing signal detection section 7, superimposes it at a prescribed position on a video signal 1, and outputs the result.

Description

Detailed Description of the Invention

[0001]

[Field of industrial application] The present invention relates to a caption superimposing device, and in particular to a device that converts the audio signal of a combined video/audio signal, such as a television signal, into character data and superimposes it on the video signal.

[0002]

[Prior art] Conventionally, television broadcasting has included programs in which the audio information is added to the video signal as inserted captions alongside the audio signal, broadcasts in which a sign-language screen is inserted in a corner of the main screen, and text information services that supplement the main program via teletext. However, these are limited to a portion of programs, and most programs carry only video and sound. Hearing-impaired viewers strongly desire captions for all television programs and for VTR playback, and captions are also of great value for other purposes, such as language study with bilingual sound-multiplex broadcasts. There has therefore been a demand for a device that converts the audio information in video/audio material into character information and superimposes it on the video information.

[0003]

[Problem to be solved by the invention] The present invention has been made in view of the above, and provides a caption superimposing device that extracts the audio signal from a video/audio signal, performs voice recognition on it, converts the result into character data, and superimposes that data on the video signal.

[0004]

[Means for solving the problem] To solve the above problem, the present invention provides a caption superimposing device comprising: a voice recognition unit that extracts the audio signal from a video/audio signal of a television or the like and converts it into character codes by voice recognition; a character generation unit that outputs character data corresponding to those character codes; a video memory that temporarily stores the character data; and a caption synthesizing unit that superimposes the character data from the video memory at a predetermined position of the video signal.

[0005]

[Operation] With this configuration, the caption superimposing device according to the present invention extracts the audio signal from the input video/audio signal, converts this audio signal into character codes by voice recognition, converts the character codes into character patterns, and superimposes them at a predetermined position of the video signal.

[0006]

[Embodiment] An embodiment of the caption superimposing device according to the present invention is described in detail below with reference to the drawings. In the figure, 1 is a video signal and 2 is an audio signal, both input from a television, VTR, or the like. 3 is a voice recognition unit, which converts the input audio signal into character codes. 4 is a sentence generation unit, which outputs character codes converted into appropriate sentences based on the character codes from the voice recognition unit 3. 5 is a character generation unit, which converts the character codes from the sentence generation unit 4 into character pattern data. 6 is a video memory, which temporarily stores the character pattern data from the character generation unit 5. 7 is a synchronizing signal detection unit, which separates the synchronizing signals from the video signal and outputs them. 8 is a caption synthesizing unit, which combines the character data from the video memory 6 with the video signal. 9 is a control unit, which controls each of the above units. 10 is a video output signal, on which the caption signal has been superimposed and which is output together with an audio output signal 11 to a display device such as a television.
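As a rough software analogy for the path from the character generation unit 5 through the video memory 6 to the caption synthesizing unit 8, the following Python sketch may help. It is only an illustration under stated assumptions: the 3x5 glyph table `FONT`, the function names, the frame representation (a list of scan lines of luminance values), and the caption position are all hypothetical and do not come from the patent, which describes hardware blocks rather than code.

```python
# Hypothetical 3x5 glyphs standing in for the pattern store of unit 5.
FONT = {
    "A": ["010", "101", "111", "101", "101"],
    "B": ["110", "101", "110", "101", "110"],
}
GLYPH_H = 5


def generate_patterns(char_codes):
    """Unit 5 (sketch): turn character codes into a bitmap for the video memory."""
    rows = [[] for _ in range(GLYPH_H)]
    for code in char_codes:
        glyph = FONT[code]
        for y in range(GLYPH_H):
            rows[y].extend(int(bit) for bit in glyph[y])
            rows[y].append(0)  # one blank column between characters
    return rows


def superimpose(frame, video_memory, top, left, white=255):
    """Unit 8 (sketch): overlay the bitmap on a frame at a fixed position.

    The line index and pixel index stand in for the counters that the
    vertical and horizontal synchronizing signals would drive in hardware.
    """
    out = [line[:] for line in frame]  # leave the input frame untouched
    for y, row in enumerate(video_memory):
        for x, bit in enumerate(row):
            if bit:
                out[top + y][left + x] = white
    return out


frame = [[0] * 16 for _ in range(12)]      # blank 16x12 test frame
video_memory = generate_patterns("AB")     # contents of video memory 6
captioned = superimpose(frame, video_memory, top=6, left=2)
```

Keeping the bitmap in a separate buffer mirrors the role of the video memory: the character patterns can be written whenever recognition finishes and read out at the scan position where the caption belongs.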

[0007] Next, the operation of the caption superimposing device according to the present invention is described. The audio signal 2 is prepared as follows: for main/sub audio in sound-multiplex broadcasting or the like, the main audio signal is selected; for stereo, the signal is converted to monaural. The audio signal to be captioned is then input to the voice recognition unit 3. The voice recognition unit 3 extracts feature parameters from the input audio signal after environmental sounds and the like have been removed by an audio filter, compares them with stored standard patterns, performs recognition in units of, for example, syllables, and converts the result into character codes. The sentence generation unit 4 stores linguistic information such as syntax, semantics, and context; based on this information it processes the input character codes, converts them into sentences made up of appropriate words, compound words, and the like, and sequentially outputs the corresponding character codes. The character generation unit 5 stores character pattern data for each character code; the patterns are read out according to the character codes from the sentence generation unit 4 and stored in the video memory 6. The caption synthesizing unit 8 reads out the character pattern data stored in the video memory 6 based on the horizontal and vertical synchronizing signals from the synchronizing signal detection unit 7, superimposes it at a predetermined position of the video signal, and outputs a captioned video signal.
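The front end of this operation, monaural conversion followed by comparison of feature parameters with stored standard patterns, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the two toy features (mean absolute amplitude and zero-crossing rate), the nearest-pattern rule using Euclidean distance, and the table `STANDARD_PATTERNS` are not taken from the patent, and a practical recognizer would use far richer spectral features.

```python
import math


def to_mono(left, right):
    """Average the stereo channels into one monaural channel."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def features(samples):
    """Toy feature parameters: (mean absolute amplitude, zero-crossing rate)."""
    energy = sum(abs(s) for s in samples) / len(samples)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / (len(samples) - 1))


def recognize(samples, standard_patterns):
    """Return the character code whose stored pattern is nearest to the input."""
    f = features(samples)
    return min(standard_patterns, key=lambda code: math.dist(f, standard_patterns[code]))


# Hypothetical stored standard patterns, one feature vector per character code.
STANDARD_PATTERNS = {"a": (0.5, 0.1), "s": (0.2, 0.8)}

voiced = [0.5, 0.5, 0.5, 0.5, -0.5, -0.5, -0.5, -0.5]     # slow oscillation
hissy = [0.2, -0.2, 0.2, -0.2, 0.2, -0.2, 0.2, -0.2]      # rapid oscillation

code_voiced = recognize(to_mono(voiced, voiced), STANDARD_PATTERNS)
code_hissy = recognize(to_mono(hissy, hissy), STANDARD_PATTERNS)
```

With these toy inputs the slow oscillation lands nearest the pattern stored for "a" and the rapid one nearest "s". The sentence generation step that follows recognition is not modelled here.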

[0008] FIG. 2 is a block diagram of the main part of another embodiment of the caption superimposing device according to the present invention, in which a delay means 12 for the video signal and the audio signal is provided. The delay means 12 uses, for example, a delay element such as a BBD (Bucket Brigade Device) and outputs to the caption synthesizing unit 8 a video signal delayed by a predetermined time (the audio signal processing time), while the audio signal is output as is as the audio output signal 11. Alternatively, when the required delay time is large, a write head and a read head are provided: the video signal 1 and the audio signal 2 are first written by the write head onto an endless recording medium, and after a predetermined time has elapsed the video signal is read by the read head and output to the caption synthesizing unit 8, while the audio signal is output as is as the audio output signal 11. In this way, captions synchronized with the video signal can be superimposed.
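The effect of the delay means 12 can be sketched in Python as a fixed-length first-in, first-out buffer, which behaves much like a bucket brigade: each frame pushed in comes out a fixed number of steps later. The class name `DelayLine`, the constant `RECOGNITION_DELAY`, and the frame and caption labels are hypothetical. The sketch also assumes the recognition latency is a constant number of frames, which the patent does not state.

```python
from collections import deque


class DelayLine:
    """Fixed-length FIFO: each pushed item comes out `delay` pushes later."""

    def __init__(self, delay, fill=None):
        self._buf = deque([fill] * delay)

    def push(self, item):
        self._buf.append(item)
        return self._buf.popleft()


RECOGNITION_DELAY = 3  # hypothetical audio processing time, in frames

video_delay = DelayLine(RECOGNITION_DELAY)
output = []
for n in range(6):
    frame = f"frame{n}"
    # At step n, the caption for the audio heard at step n - RECOGNITION_DELAY
    # has just become available.
    caption_index = n - RECOGNITION_DELAY
    delayed = video_delay.push(frame)
    if delayed is not None:  # the first few outputs are just the initial fill
        output.append((delayed, f"caption{caption_index}"))
```

After six steps, `output` pairs `frame0` with `caption0`, `frame1` with `caption1`, and `frame2` with `caption2`, so each delayed frame meets the caption derived from the audio that originally accompanied it.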

[0009]

[Effect of the invention] As described above, the caption superimposing device according to the present invention extracts the audio signal from the input video/audio signal, performs voice recognition, and superimposes the converted character data on the video signal for output. Connecting this output to the video and audio input terminals of a television or the like therefore displays a captioned picture. Because the device can handle any material made up of video and audio, such as television and VTR, it can provide hearing-impaired viewers with pictures to which character information has been added. It can also be used for language study, for example by listening to the foreign-language audio of a bilingual sound-multiplex broadcast while checking it against the captions.

[Brief description of drawings]

[FIG. 1] A block diagram of the main part of one embodiment of the caption superimposing device according to the present invention.

[FIG. 2] A block diagram of the main part of another embodiment of the caption superimposing device according to the present invention.

[Explanation of symbols]

1 Video signal
2 Audio signal
3 Voice recognition unit
4 Sentence generation unit
5 Character generation unit
6 Video memory
7 Synchronizing signal detection unit
8 Caption synthesizing unit
9 Control unit
10 Video output signal
11 Audio output signal
12 Delay means

Claims

[Claims]

1. A caption superimposing device comprising: a voice recognition unit that extracts an audio signal from a video/audio signal of a television or the like and converts the audio signal into character codes by voice recognition; a character generation unit that outputs character data corresponding to the character codes; a video memory that temporarily stores the character data; and a caption synthesizing unit that superimposes the character data from the video memory at a predetermined position of the video signal.

2. The caption superimposing apparatus according to claim 1, wherein the video signal is input to the caption synthesizing unit via a delay unit so as to be synchronized with character data for superimposition.