JP2003345809A

JP2003345809A - Database constructing system, passage retrieving device, database constructing method, and program therefor

Info

Publication number: JP2003345809A
Application number: JP2002156872A
Authority: JP
Inventors: Seiichi Takao; 誠一鷹尾; Yasuo Ariki; 康雄有木
Original assignee: NEC System Technologies Ltd
Current assignee: NEC Solution Innovators Ltd
Priority date: 2002-05-30
Filing date: 2002-05-30
Publication date: 2003-12-05

Abstract

PROBLEM TO BE SOLVED: When building a database from which news videos are retrieved, to register the portion of a news video that is related to a character string appearing in it (such as a telop character string) together with that character string as an index.

SOLUTION: A speech transcription device 2 transcribes the news speech into a character string. A character recognition device 1 detects a character appearance section in which a character string appears in the news video and recognizes the character string. A search device 3 obtains the similarity between words in the speech transcription result corresponding to the character appearance section detected by the character recognition device 1 and, using that similarity, searches the speech transcription result for a passage similar to the character string recognized by the character recognition device 1. A registration device 4 associates the recognition result of the character recognition device 1 with the news video corresponding to the passage found by the search device 3 and registers them in a database 5.

COPYRIGHT: (C)2004,JPO

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a database construction technique for building a database from which news videos are retrieved, and more particularly to a technique for registering in a database a character string appearing in a news video, such as a telop character string, in association with the portion of the news video related to that character string.

[0002]

[Prior Art] In recent years, multi-channel broadcasting and similar developments have begun to generate a large amount of news video information, and viewers have come to want to watch only the news programs that interest them. One conventional technique that addresses this demand is a system in which news videos are registered in a database with indexes attached, so that a user can view only the news videos of interest based on the indexes. When news videos are registered in the database with indexes, the conventional approach extracts a keyword from a character string in the news video, such as a telop character string, obtains the similarity between this keyword and each sentence in the news speech transcription result using the inter-word similarity of a thesaurus, and registers in the database the news video corresponding to each sentence whose similarity exceeds a threshold, with the character string as the index. In addition, both the transcription of the news speech and the construction of the thesaurus have conventionally been done by hand.

[0003]

[Problems to Be Solved by the Invention] The conventional system described above extracts key sentences by matching a single keyword, extracted from a telop character string or the like, against the sentences in the transcription result using a thesaurus, so there is a risk of extracting unrelated sentences as well. As a result, there is a risk that news videos unrelated to the index are registered in the database.

[0004] Furthermore, because the decision whether to register the corresponding news video in the database is made sentence by sentence, the context of a registered news video can be difficult to understand.

[0005] Moreover, because the thesaurus is built by hand and takes a long time to build, a time lag relative to the broadcast news video signal arises easily. Relationships in a thesaurus change over time. For example, Mr. Koizumi is currently the Prime Minister, but there were periods when he was not. In a thesaurus from such a period the similarity between "Koizumi" and "Prime Minister" is low, whereas it is high now that Mr. Koizumi is the Prime Minister. In other words, the relationships in a thesaurus differ from one period to another. Using a thesaurus that has not kept up with such changes adversely affects the system.

[0006] In addition, because manually transcribed news speech is used, a great deal of labor is required.

[0007] Accordingly, the objects of the present invention are: to eliminate the risk that unrelated news videos are registered in the database; to avoid the time-lag problem relative to the news speech by using similarities computed automatically by the inter-word similarity calculation unit of the search device instead of the inter-word similarities of a thesaurus; and to reduce labor by using news speech transcribed by a news speech transcription device instead of manually transcribed news speech.

[0008] Japanese Unexamined Patent Application Publication No. 2002-14973 describes a technique that recognizes character information, such as telop character strings, in video and determines the similarity between two images from the similarity of the recognized character information. It is not, however, a technique for determining the similarity between a character string such as a telop character string and the transcription result of news speech.

[0009]

[Means for Solving the Problems] To achieve the above objects, the database construction system of the present invention comprises: a speech transcription device that transcribes the news speech corresponding to a news video into a character string; a character recognition device that detects a character appearance section in which a character string appears in the news video and recognizes the character string; a search device that obtains the similarity between words in the speech transcription result corresponding to the character appearance section detected by the character recognition device and, using that similarity, searches the speech transcription result for a passage similar to the character string recognized by the character recognition device; and a registration device that registers in a database the recognition result of the character recognition device in association with the news video corresponding to the passage found by the search device.

[0010] More specifically, in the database construction system of the present invention, the search device comprises an inter-word similarity calculation unit that obtains the similarity between words in the speech transcription result corresponding to the character appearance section detected by the character recognition device, and a passage search unit that uses that similarity to search the speech transcription result for a passage similar to the character string recognized by the character recognition device.

[0011]

[Operation] The configuration described above was devised by noting that telop character strings and CG flip character strings appearing in a news video summarize the content of the news program. In this configuration, a passage search is performed on the transcription result of the news speech using all the words in the telop or CG flip character string recognized by the character recognition device. The conventional system, by contrast, used a single word and obtained search results from the transcription sentence by sentence. It was therefore swayed by the thesaurus entries of that one word, with the risk of extracting unrelated sentences and registering unrelated news videos in the database. Because the configuration described above uses all the words in the character string rather than a single word, it removes the risk of extracting unrelated sentences under the influence of one word's thesaurus entries, and consequently removes the risk that unrelated news videos are registered in the database. Furthermore, returning search results in units of passages solves the problem that the context of a news video is difficult to understand.

[0012] In addition, in the configuration described above, the inter-word similarity calculation unit of the search device learns the inter-word similarity from the speech transcription result of the character appearance section detected by the character string appearance section detection unit of the character recognition device. That is, instead of using the inter-word similarity of a manually built thesaurus as in the prior art, the inter-word similarity is computed directly from the data being analyzed, so the time-lag problem is avoided.
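As a rough illustration of learning inter-word similarity directly from the transcript rather than from a thesaurus, the sketch below uses a sentence-level co-occurrence Dice coefficient. This measure is an assumption made purely for illustration: this part of the text does not state how the inter-word similarity calculation unit computes its values, and the function name `cooccurrence_similarity` and the toy transcript are hypothetical.

```python
from collections import defaultdict


def cooccurrence_similarity(sentences):
    """Build sim(a, b) from tokenized sentences (lists of words).

    Assumption (not from the source): similarity is the Dice coefficient
    over the sets of sentences in which each word occurs.
    """
    occurrences = defaultdict(set)
    for index, words in enumerate(sentences):
        for word in words:
            occurrences[word].add(index)

    def sim(a, b):
        sa = occurrences.get(a, set())
        sb = occurrences.get(b, set())
        if not sa or not sb:
            return 0.0  # unseen word: no evidence of similarity
        return 2 * len(sa & sb) / (len(sa) + len(sb))

    return sim


# Toy transcript of one character appearance section (hypothetical data).
transcript = [
    ["koizumi", "prime_minister", "speech"],
    ["prime_minister", "koizumi", "cabinet"],
    ["weather", "rain"],
]
sim = cooccurrence_similarity(transcript)
print(sim("koizumi", "prime_minister"), sim("koizumi", "rain"))
```

Because the scores come from the transcript being indexed, a pair such as "koizumi" and "prime_minister" scores high only while the two words actually co-occur in the news, which is the time-lag argument made above.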

[0013]

[Embodiments of the Invention] Embodiments of the present invention are now described in detail with reference to the drawings.

[0014] FIG. 1 is a block diagram of an embodiment of the database construction system according to the present invention. The system comprises a character recognition device 1, a speech transcription device 2, a search device 3, a registration device 4, and a database 5.

[0015] The character recognition device 1 has, among others, a function of recognizing, based on the news video signal, character strings such as telop character strings and CG flip character strings that appear in the news video, and a function of detecting the character string appearance section in which a character string appears.

[0016] The character recognition device 1 having these functions comprises a character string appearance section detection unit 11, a character string region extraction unit 12, a binarization processing unit 13, and an OCR unit 14.

[0017] The character string appearance section detection unit 11 has, among others, two functions. The first detects, based on the news video signal, the character string appearance section in which a character string such as a telop or CG flip character string appears in the news video, and outputs character string appearance section information indicating that section to the speech transcription device 2. The second detects the still section in which the character string is stationary, and outputs still section information indicating that section to the character string region extraction unit 12.

[0018] The character string region extraction unit 12 has a function of extracting the character region in a frame based on the part of the news video signal indicated by the still section information.

[0019] The binarization processing unit 13 has a function of binarizing the character region extracted by the character string region extraction unit 12 against the other regions.

[0020] The OCR unit 14 has a function of performing character recognition based on the processing result (binarization result) of the binarization processing unit 13.

[0021] The speech transcription device 2 has, among others, a function of transcribing the news speech signal separated from the news video signal into a character string, and a function of outputting the part of the speech transcription result that falls within the character appearance section to the inter-word similarity calculation unit 31 in the search device 3.

[0022] The search device 3 has a function of obtaining the similarity between words in the speech transcription result for the character appearance section and, using this similarity, searching the speech transcription result for a passage similar to the character string recognized by the character recognition device 1.

[0023] The search device 3 having this function comprises an inter-word similarity calculation unit 31 and a passage search unit 32.

[0024] The inter-word similarity calculation unit 31 has a function of obtaining the similarity between words in the speech transcription result for the character appearance section.

[0025] The passage search unit 32 has a function of searching the speech transcription result for a passage similar to the character string recognized by the character recognition device 1, using the inter-word similarity obtained by the inter-word similarity calculation unit 31.
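The passage search can be pictured with the following minimal sketch, which scores every run of consecutive transcript sentences against all the words of the recognized character string. The scoring rule (sum, over the query words, of the best similarity to any word in the window), the fixed window length, and the names `best_passage` and `toy_sim` are illustrative assumptions, not details given in this text.

```python
def best_passage(query_words, sentences, sim, window=2):
    """Return (start, end, score) for the best run of `window` sentences.

    Assumption: a passage is a fixed-length run of consecutive sentences,
    and every query word contributes its best match inside the passage.
    """
    best = (0, 0, float("-inf"))
    for start in range(0, max(1, len(sentences) - window + 1)):
        end = min(len(sentences), start + window)
        passage_words = [w for s in sentences[start:end] for w in s]
        score = sum(
            max((sim(q, w) for w in passage_words), default=0.0)
            for q in query_words
        )
        if score > best[2]:
            best = (start, end, score)
    return best


def toy_sim(a, b):
    """Hypothetical similarity table standing in for unit 31's output."""
    related = {frozenset(("koizumi", "prime_minister")): 0.8}
    return 1.0 if a == b else related.get(frozenset((a, b)), 0.0)


sentences = [
    ["weather", "rain", "tokyo"],
    ["prime_minister", "visits", "summit"],
    ["summit", "talks", "economy"],
    ["baseball", "results"],
]
result = best_passage(["koizumi", "summit", "economy"], sentences, toy_sim)
print(result)
```

Using every word of the caption means that a single loosely related word cannot pull in an unrelated sentence by itself, and returning a run of sentences rather than one sentence keeps the surrounding context.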

[0026] The registration device 4 has a function of registering in the database 5 the news video corresponding to the passage found by the passage search unit 32, in association with the character string recognized by the character recognition device 1.
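The registration step can be sketched as below. The record layout (`IndexEntry` with a caption and a time span in seconds), the in-memory list standing in for database 5, and the `find_passage` callback standing in for the search device are all hypothetical; the text only says that the recognized character string and the matching news video are registered in association with each other.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    caption: str      # character string recognized by device 1 (the index)
    start_sec: float  # assumed: start of the video portion for the passage
    end_sec: float    # assumed: end of that video portion


def build_database(captions, find_passage):
    """Register one entry per caption for which a passage was found.

    captions: recognized character strings.
    find_passage: hypothetical callback, caption -> (start, end) or None.
    """
    database = []
    for caption in captions:
        span = find_passage(caption)
        if span is not None:
            database.append(IndexEntry(caption, *span))
    return database


db = build_database(
    ["summit talks", "unknown"],
    lambda c: (12.0, 47.5) if c == "summit talks" else None,
)
print(db)
```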

[0027]

[Description of the Operation of the Embodiment] The operation of this embodiment is described next.

[0028] First, the operation of the character string appearance section detection unit 11 in the character recognition device 1 is described.

[0029] The ways in which character strings appear in news videos fall broadly into the following three patterns.

[0030]
- The string appears gradually from its head and then remains stationary for a certain time.
- The string appears while moving and then remains stationary for a certain time.
- The string appears all at once, from head to tail, and then remains stationary for a certain time.

[0031] Detecting and recognizing a telop or CG flip character string while it is still appearing yields nothing meaningful as a character string. This embodiment therefore relies on the fact that, in every one of the above patterns, the telop or CG flip character string remains stationary for a certain time, and performs character recognition in the section in which the character string is stationary (the still section). The character string appearance section detection unit 11 is used to detect the still section. As shown in FIG. 2, the still section is the section in which the character string is stationary in a constant state. In FIG. 2, the character appearance frame is the frame in which characters begin to appear, the character disappearance frame is the frame just before the characters disappear, and the character appearance section is the section from when the characters begin to appear until just before they disappear. In this embodiment, the character string appearance section detection unit 11 is also used to detect the character appearance section.

[0032] When detecting the character appearance section and the still section, the character string appearance section detection unit 11 uses several kinds of rectangular areas defined on the frame, as shown in FIGS. 3(a) to 3(d).

[0033] FIG. 3(a) shows the rectangular areas obtained by dividing the frame into eight equal parts. FIG. 3(b) shows six rectangular areas of identical shape, each straddling two of the rectangular areas of FIG. 3(a). FIG. 3(c) shows three rectangular areas of identical shape, each straddling four of the rectangular areas of FIG. 3(a). FIG. 3(d) shows four rectangular areas of identical shape, each straddling two of the rectangular areas of FIG. 3(a).

[0034] The character string appearance section detection unit 11 first determines whether the current frame (the frame of interest) in the news video signal is a candidate for a character appearance frame, based on the distance between the luminance histogram of the current frame and that of the previous frame (FIG. 4, S41). The processing of step S41 is described in detail below. The previous frame, current frame, and subsequent frame are defined in FIG. 5: the previous frame is the frame one frame before the current frame, and the subsequent frame is the frame one frame after it.

[0035] In step S41, the character string appearance section detection unit 11 first performs the following processing using the rectangular area that is first in the order of use among the rectangular areas shown in FIGS. 3(a) to 3(d). Suppose, for example, that the upper-left rectangular area 301 of FIG. 3(a) is first in the order of use. The character string appearance section detection unit 11 then uses equation (1) below to obtain the distance Diff(H₁,H₂) between the luminance histogram of the upper-left rectangular area 301 of the current frame and the luminance histogram of the corresponding upper-left rectangular area 301 of the previous frame (that is, the maximum difference between the frequencies of the current frame's luminance histogram and those of the previous frame's luminance histogram).

[0036] (Equation (1))

[0037] In equation (1), H₁(i) is the frequency of luminance i in the previous frame, H₂(i) is the frequency of luminance i in the current frame, and abs denotes the absolute value.
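The histogram distance can be sketched as follows. Equation (1) itself is not reproduced in this text, so the form used here, Diff(H₁,H₂) = max over i of abs(H₁(i) − H₂(i)), is inferred from the description "the maximum difference between the frequencies" together with the definitions of H₁(i), H₂(i), and abs. The helper names and the 256-level luminance range are assumptions.

```python
def luminance_histogram(pixels, levels=256):
    """Count how many pixels of one rectangular area have each luminance."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    return hist


def hist_distance(h1, h2):
    """Diff(H1, H2): largest absolute per-bin frequency difference (inferred)."""
    return max(abs(a - b) for a, b in zip(h1, h2))


# Luminance values of the same area in the previous and current frame (toy data).
previous_area = [10, 10, 10, 200]
current_area = [10, 255, 255, 255]
distance = hist_distance(
    luminance_histogram(previous_area), luminance_histogram(current_area)
)
print(distance)
```

A bright caption appearing over a dark background moves many pixels into a few high-luminance bins at once, which is why a single large bin difference is a plausible cue for a character appearance frame candidate.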

[0038] The character string appearance section detection unit 11 then determines whether the distance Diff(H₁,H₂) exceeds a predetermined threshold. If it does, the unit determines that the current frame is a candidate for a character appearance frame (YES in S42) and performs the processing of step S44. If it does not, the unit performs the same processing using the rectangular area that is next in the order of use (for example, the upper-right rectangular area 302 of FIG. 3(a)). If the distance Diff(H₁,H₂) does not exceed the threshold even when the rectangular area that is last in the order of use (for example, the bottom rectangular area 303 of FIG. 3(d)) is used, the unit determines that the current frame is not a candidate for a character appearance frame (NO in S42), advances the current frame by one (S43), and performs the processing of step S41 again. That is, it repeats step S41 with the subsequent frame as the new current frame.
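The S41 to S43 loop described above can be sketched as follows: the rectangular areas are tried in their order of use, the frame becomes a candidate as soon as one area exceeds the threshold, and otherwise the scan moves on to the next frame. The data layout (one list of per-area histograms per frame) and the function names are assumptions made for illustration, and the distance is the inferred maximum-difference form.

```python
def is_appearance_candidate(prev_hists, cur_hists, threshold):
    """S41/S42: prev_hists and cur_hists list per-area histograms in order of use."""
    for h1, h2 in zip(prev_hists, cur_hists):
        if max(abs(a - b) for a, b in zip(h1, h2)) > threshold:
            return True   # S42 YES: this area changed enough
    return False          # S42 NO: no area exceeded the threshold


def next_candidate(frame_hists, threshold, start=1):
    """Advance frame by frame (S43) until a candidate is found, else None."""
    for t in range(start, len(frame_hists)):
        if is_appearance_candidate(frame_hists[t - 1], frame_hists[t], threshold):
            return t
    return None


# Three frames, two areas each, two luminance bins per histogram (toy data).
frames = [
    [[4, 0], [4, 0]],
    [[4, 0], [4, 0]],
    [[4, 0], [1, 3]],  # second area changes: caption starts to appear
]
candidate = next_candidate(frames, threshold=2)
print(candidate)
```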

[0039] In step S44, whether the current frame is a character appearance frame is determined based on the edge position coincidence between the current frame and the subsequent frame. The edge position coincidence indicates the degree to which the positions and directions of edges coincide. The processing of step S44 is described in detail below.

[0040] In step S44, the character string appearance section detection unit 11 first focuses on the rectangular area 301 that is first in the order of use in the current frame t, and obtains the edge components E_t(x_s,y_s,n) to E_t(x_e,y_e,n) in direction n for each of the pixels (x_s,y_s) to (x_e,y_e) in this rectangular area 301 by the calculation shown in equation (2) below. Here, pixel (x_s,y_s) is the pixel at coordinates (x_s,y_s), the upper-left pixel of the rectangular area 301, and pixel (x_e,y_e) is the pixel at coordinates (x_e,y_e), the lower-right pixel of the rectangular area 301. Further, x satisfies x_s ≤ x ≤ x_e and y satisfies y_s ≤ y ≤ y_e. The character string appearance section detection unit 11 also uses equation (2) to calculate the edge components of each pixel in the rectangular area 301 that is first in the order of use in the subsequent frame t+1.

[0041] (Equation (2))

[0042] In equation (2), R_t(x,y), G_t(x,y), and B_t(x,y) are the red, green, and blue intensity levels of pixel (x,y), respectively. As shown in FIG. 6, n denotes one of four directions (0, 45, 90, and 135 degrees) at pixel (x,y). That is, for each of the pixels (x_s,y_s) to (x_e,y_e), the character string appearance section detection unit 11 evaluates equation (2) with n = 1, 2, 3, 4 and thereby calculates the edge components for the directions n = 1, 2, 3, 4: E_t(x_s,y_s,1) to E_t(x_e,y_e,1), E_t(x_s,y_s,2) to E_t(x_e,y_e,2), E_t(x_s,y_s,3) to E_t(x_e,y_e,3), and E_t(x_s,y_s,4) to E_t(x_e,y_e,4).

[0043] Next, the character string appearance section detection unit 11 applies equations (3) and (4) below to each of the pixels (x_s,y_s) to (x_e,y_e) in the rectangular area 301 being processed, and determines the values Edge_t(x_s,y_s,n) to Edge_t(x_e,y_e,n) for each direction (n = 1, 2, 3, 4) of each pixel. That is, Edge_t(x,y,n) = 1 if the edge component E_t(x,y,n) of pixel (x,y) is greater than the threshold th₁, and Edge_t(x,y,n) = 0 if it is less than or equal to th₁. For example, if Edge_t(x,y,3) = 1, an edge can be said to exist in direction n = 3 at pixel (x,y). The character string appearance section detection unit 11 also applies equations (3) and (4) to each pixel of the subsequent frame t+1 in the same way.
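The thresholding rule stated above (an edge flag of 1 when the edge component is strictly greater than th₁, otherwise 0) can be sketched like this. The dictionary layout, mapping each pixel to its four directional edge components, and the function name are assumptions; the edge components themselves would come from equation (2), which is not reproduced in this text.

```python
def threshold_edges(edge_components, th1):
    """Turn E_t(x, y, n) into binary flags Edge_t(x, y, n).

    edge_components maps (x, y) to a list of four values for n = 1..4.
    """
    return {
        pixel: [1 if e > th1 else 0 for e in components]
        for pixel, components in edge_components.items()
    }


# Edge components of two pixels (toy values).
components_t = {
    (0, 0): [5, 12, 30, 10],
    (1, 0): [0, 0, 11, 9],
}
edge_t = threshold_edges(components_t, th1=10)
print(edge_t)
```

Note that a component exactly equal to th₁ yields 0, matching the "less than or equal to" branch in the text.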

[0044] (Equations (3) and (4))

[0045] Next, using equation (5) below, the character string appearance section detection unit 11 checks, for each of the pixels (x_s,y_s) to (x_e,y_e) in the rectangular area 301 being processed, whether an edge exists in each direction n (n = 1, 2, 3, 4) in both the current frame t and the subsequent frame t+1. For example, Ecor_t(x,y,4) is 1 for a pixel (x,y) that has an edge in direction n = 4 in both the current frame t and the subsequent frame t+1, and 0 for any other pixel (x,y).
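The per-direction check can be sketched as a logical AND of the binary edge flags of the two frames, which is the behavior the text describes for Ecor_t(x,y,n). Equation (5) itself is not reproduced here, so this form is inferred from that description; the dictionary layout and the function name are illustrative.

```python
def edge_coincidence(edge_t, edge_t1):
    """Ecor_t(x, y, n): 1 only where both frames have an edge in direction n.

    edge_t and edge_t1 map (x, y) to four binary flags for n = 1..4,
    for the current frame t and the subsequent frame t+1 respectively.
    """
    return {
        pixel: [a & b for a, b in zip(edge_t[pixel], edge_t1[pixel])]
        for pixel in edge_t
    }


edge_current = {(0, 0): [1, 0, 0, 1], (1, 0): [0, 1, 0, 0]}
edge_next = {(0, 0): [1, 0, 1, 1], (1, 0): [0, 0, 0, 0]}
ecor = edge_coincidence(edge_current, edge_next)
print(ecor)
```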

[0046] (Equation (5))

[0047] Then, by performing the calculation shown in equation (6) below for each of the pixels (x_s,y_s) to (x_e,y_e) in the rectangular area 301 being processed, the degree Ecor_t(x_s,y_s) to Ecor_t(x_e,y_e) to which the edge directions of the current frame t and the subsequent frame t+1 coincide is calculated for each pixel. For example, Ecor_t(x,y) is 2 for a pixel (x,y) whose edges coincide in the two directions n = 1 and n = 4.
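The per-pixel degree can be sketched as the sum of the four directional coincidence flags, which reproduces the example in the text (coincidence in directions n = 1 and n = 4 gives 2). Equation (6) is not reproduced here, so the summation form is inferred from that example; the names are illustrative.

```python
def coincidence_degree(ecor_by_direction):
    """Ecor_t(x, y): number of directions in which both frames share an edge."""
    return {pixel: sum(flags) for pixel, flags in ecor_by_direction.items()}


# Directional coincidence flags for n = 1..4 (toy values).
ecor_flags = {
    (0, 0): [1, 0, 0, 1],  # edges coincide for n = 1 and n = 4
    (1, 0): [0, 0, 0, 0],
}
degrees = coincidence_degree(ecor_flags)
print(degrees)
```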

[0048] (Equation (6))

[0049] The results of equation (6) for the pixels (x_s,y_s) to (x_e,y_e) in the rectangular area 301 being processed are then summed to obtain the edge position coincidence Ecor(next) between the current frame and the subsequent frame in that rectangular area 301. This edge position coincidence Ecor(next) is compared with a predetermined threshold. If Ecor(next) exceeds the threshold, the current frame is determined to be a character appearance frame (YES in S45), and the processing of step S46 is performed. If Ecor(next) does not exceed the threshold, the same processing is performed using the rectangular area 302 that is next in the order of use. If Ecor(next) does not exceed the threshold even when the rectangular area 303 that is last in the order of use is used, the current frame is determined not to be a character appearance frame (NO in S45), the current frame is advanced by one (S43), and the processing of step S41 is performed again.
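The S44/S45 decision can be sketched as follows: the per-pixel degrees of one rectangular area are summed into Ecor(next), and the areas are tried in their order of use until one exceeds the threshold. The data layout (one dictionary of per-pixel degrees per area) and the function names are assumptions for illustration.

```python
def area_coincidence(degrees):
    """Ecor(next) for one rectangular area: sum of the per-pixel degrees."""
    return sum(degrees.values())


def is_appearance_frame(per_area_degrees, threshold):
    """S45: True as soon as one area, tried in order of use, exceeds the threshold."""
    return any(area_coincidence(d) > threshold for d in per_area_degrees)


# Per-pixel degrees for two areas, listed in order of use (toy values).
areas = [
    {(0, 0): 1, (1, 0): 0},  # few stable edges: Ecor(next) = 1
    {(4, 0): 2, (5, 0): 2},  # stable caption edges: Ecor(next) = 4
]
decision = is_appearance_frame(areas, threshold=3)
print(decision)
```

`any` stops at the first area that exceeds the threshold, mirroring the "try the next area only if this one fails" order in the text.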

[0050] In step S46, whether the current frame is the first frame of the still section is determined based on the edge position coincidence ratio, which is the ratio between the edge position coincidence Ecor(previous) of the current frame and the previous frame in a given rectangular area and the edge position coincidence Ecor(next) of the current frame and the subsequent frame in the same rectangular area. The processing of step S46 is described in detail below.

[0051] In step S46, the character string appearance section detection unit 11 calculates the edge position coincidence ratio by performing the calculation shown in equation (7) below on the rectangular area 301 that is first in the processing order.

[0052] (Equation (7))

[0053] As shown in FIG. 7, when a telop character string or a CG flip character string begins to appear, the edge position coincidence ratio is greater than 1, and its value decreases as the still state is approached. The ratio takes a value near 1 in the still section and decreases sharply at the character disappearance frame. The character string appearance section detection unit 11 therefore determines whether the current frame is the first frame of a still section by checking whether the edge position coincidence ratio is within a predetermined error range of 1 (S46). If the current frame is determined to be the first frame of a still section (S47: YES), the current frame is advanced by one (S49) and the process of step S50 is performed. If, on the other hand, the ratio is not within the error range, the same processing is performed on the rectangular area 302, which is second in the use order. If the ratio does not fall within the error range even when the rectangular area 303, which is last in the use order, is used, the current frame is determined not to be the first frame of a still section (S47: NO), the current frame is advanced by one (S48), and the process of step S46 is performed again.
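The fallback over the rectangular areas in step S46 can be illustrated with a minimal Python sketch. It assumes the edge position coincidence ratio of each area has already been computed; the function name, the input layout, and the tolerance value 0.1 are hypothetical and are not taken from the specification.

```python
def first_area_within_tolerance(ratios_by_area, tol=0.1):
    """Return the first rectangular area whose ratio is within tol of 1.

    ratios_by_area: list of (area_id, ratio) pairs in use order,
    for example areas 301, 302, 303. tol is an assumed error range.
    Returns None when no area qualifies (the S47: NO branch).
    """
    for area_id, ratio in ratios_by_area:
        if abs(ratio - 1.0) <= tol:
            return area_id
    return None
```

For example, with ratios of 1.6, 1.04, and 1.0 for areas 301, 302, and 303, the sketch stops at area 302 and never examines area 303.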

[0054] In step S50, whether the previous frame is the character disappearance frame of the still section is determined based on the edge position coincidence ratio given by equation (7). The processing of step S50 is described in detail below.

[0055] In step S50, the character string appearance section detection unit 11 performs the calculation shown in equation (7) on the rectangular area 301, which is first in the processing order, thereby calculating the edge position coincidence ratio, that is, the ratio between the edge position coincidence Ecor(previous) of the current frame and the previous frame in that rectangular area and the edge position coincidence Ecor(next) of the current frame and the subsequent frame in the same rectangular area.

[0056] As shown in FIG. 7, the edge position coincidence ratio decreases sharply at the character disappearance frame. The character string appearance section detection unit 11 therefore takes the rectangular area for which the ratio was determined in step S46 to be within the error range of 1, and determines whether the previous frame is a character disappearance frame by checking whether the ratio has fallen below 1 by a predetermined value or more (S50). If the frame is determined to be a character disappearance frame (S51: YES), the process of step S52 is performed. If, on the other hand, the ratio is within the error range, the previous frame is determined not to be a character disappearance frame (S51: NO), the current frame is advanced by one (S49), and the process returns to step S50.
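The flow of steps S46 to S51 can be sketched as a scan over a per-frame sequence of edge position coincidence ratios. This is only an illustration under stated assumptions: it uses a single rectangular area, takes the ratios as given, and the parameters `tol` (error range around 1) and `drop` (the "predetermined value" below 1) are hypothetical values.

```python
def find_still_section(ratios, tol=0.1, drop=0.5):
    """Return (first_frame, last_frame) of the first still section, or None.

    ratios[i] is the edge position coincidence ratio at frame i.
    tol and drop are assumed thresholds, not values from the specification.
    """
    start = None
    for i, r in enumerate(ratios):
        if start is None:
            # S46/S47: the still section starts when the ratio is near 1.
            if abs(r - 1.0) <= tol:
                start = i
        elif r <= 1.0 - drop:
            # S50/S51: a sharp drop at frame i marks the previous frame
            # as the last frame before the characters disappear.
            return start, i - 1
    return None
```

With the ratios `[3.0, 1.8, 1.05, 1.0, 0.98, 1.02, 0.3, 1.0]`, the sketch reports the still section as frames 2 to 5.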

[0057] In step S52, character appearance section information indicating the character appearance section is output to the speech transcription device 2, and still section information indicating the still section is output to the character string region extraction unit 12. Frame numbers, such as the m-th frame to the n-th frame, can be used as the character appearance section information and the still section information. Alternatively, the frame numbers can be converted into time information based on the number of frames per unit time, and this time information can be used as the character appearance section information.

[0058] The character string appearance section detection unit 11 repeats the above processing until the news video signal ends (until S53 becomes YES). When it detects in step S53 that the news video signal has ended, the character string appearance section detection unit 11 notifies the speech transcription device 2 and the passage search unit 32 of that fact.

[0059] Next, the operation of the character string region extraction unit 12 will be described. The character string region extraction unit 12 first removes regions other than characters by background removal, and then extracts character string regions based on local line density. The background removal method is described below. Across the frames of a section in which the characters are still, a character region has the following temporal characteristics.

[0060] (1) The edge components of a character region are stable.
(2) The hue, luminance, and saturation of a character region hardly change.

[0061] Background removal using multiple frames is therefore performed based on these characteristics. First, the character string region extraction unit 12 creates a background-removed image based on changes in the edge position coincidence (FIG. 8, S81). The processing of step S81 is described in detail below.

[0062] In step S81, the character string region extraction unit 12 performs the calculation shown in the following equation (8) on each of the pixels (x_sf, y_sf) to (x_ef, y_ef) in one frame t within the still section (N frames), and obtains the edge components E_t(x_sf, y_sf, n) to E_t(x_ef, y_ef, n) of each pixel for each direction n (n = 1, 2, 3, 4). Here, the pixel (x_sf, y_sf) is the upper-left pixel of the frame and the pixel (x_ef, y_ef) is the lower-right pixel of the frame. The still section can be identified from the still section information passed from the character string appearance section detection unit 11.

[0063]

[0064] In equation (8), Rt(x, y), Gt(x, y), and Bt(x, y) represent the red, green, and blue intensity levels of the pixel (x, y), respectively. Here, x satisfies x_sf ≤ x ≤ x_ef and y satisfies y_sf ≤ y ≤ y_ef. As shown in FIG. 6, n denotes the four directions (0 degrees, 45 degrees, 90 degrees, and 135 degrees) at the pixel (x, y).

[0065] Thereafter, the character string appearance section detection unit 11 uses equations (9) and (10) to obtain the edge vectors Edge_t(x_sf, y_sf, n) to Edge_t(x_ef, y_ef, n) of the pixels (x_sf, y_sf) to (x_ef, y_ef). If E_t(x, y, n) is greater than the threshold th₂, Edge_t(x, y, n) = 1; if it is less than or equal to th₂, Edge_t(x, y, n) = 0. For example, a pixel (x, y) for which Edge_t(x, y, 2) = 1 can be said to have an edge in the direction n = 2.

[0066]

[0067] Further, the character string region extraction unit 12 performs the calculation (AND processing) shown in the following equation (11) on each of the pixels (x_sf, y_sf) to (x_ef, y_ef) in each frame of the still section, thereby obtaining the pixels (x, y) that have an edge in the direction n (n = 1, 2, 3, or 4) in every frame of the still section. For example, if the pixel (x, y) has an edge in the direction n = 3 in all frames of the still section, the result of equation (11) is Edge(x, y, 3) = 1; if even one frame lacks an edge in the direction n = 3, Edge(x, y, 3) = 0.

[0068]

[0069] Thereafter, the character string region extraction unit 12 performs the calculation shown in the following equation (12) on each of the pixels (x_sf, y_sf) to (x_ef, y_ef), thereby calculating the edge position coincidences Edge(x_sf, y_sf) to Edge(x_ef, y_ef), each of which is the sum of the elements of the edge vector.

[0070]

[0071] Further, the character string region extraction unit 12 uses the following equations (13) and (14) to determine the values Ecor(x_sf, y_sf) to Ecor(x_ef, y_ef) of the pixels (x_sf, y_sf) to (x_ef, y_ef). That is, if the edge position coincidence Edge(x, y) is greater than the threshold th₃, the pixel (x, y) is regarded as belonging to an edge portion of a character region, and its value Ecor(x, y) is set to 1.
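The chain of operations in step S81 (binarize the directional edge strengths, AND them across frames, sum over directions, threshold the sum) can be sketched for a single pixel as follows. The directional edge strengths of equation (8) are taken as given, the threshold values are hypothetical, and setting Ecor to 0 when the sum does not exceed th₃ is an assumption about equation (14), which is not reproduced in this text.

```python
def stable_edge_flag(strengths_per_frame, th2, th3):
    """Return Ecor (1 or 0) for one pixel.

    strengths_per_frame: one 4-tuple of directional edge strengths
    (n = 1..4) per frame of the still section.
    th2, th3: assumed thresholds.
    """
    # Equations (9)/(10): binarize each directional strength in each frame.
    edge_t = [[1 if e > th2 else 0 for e in frame]
              for frame in strengths_per_frame]
    # Equation (11): a direction survives only if it is an edge in every
    # frame (logical AND, computed here as the minimum of 0/1 values).
    edge = [min(column) for column in zip(*edge_t)]
    # Equation (12): sum of the elements of the edge vector.
    coincidence = sum(edge)
    # Equations (13)/(14): character edge if the sum exceeds th3.
    return 1 if coincidence > th3 else 0
```

A pixel whose edges persist in two directions across all frames passes with `th3 = 1`, while a pixel whose edges vanish in any frame is rejected.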

[0072]

[0073] When the processing of step S81 is completed, the character string region extraction unit 12 performs the processing of step S82. In step S82, a background-removed image based on luminance changes in RGB space is generated. The processing of step S82 is described in detail below.

[0074] In step S82, the character string region extraction unit 12 performs the following processing on the pixels for which Ecor(x, y) = 1 was set in step S81 (the pixels extracted as edge portions).

[0075] First, for each combination of two consecutive frames t and t+1 in the still section (N frames), from the combination of the first and second frames to the combination of the (N−1)-th and N-th frames, the calculation shown in the following equation (15) is performed on each of the processing target pixels (x_ss, y_ss) to (x_es, y_es). This yields the differences (distances) C_t(x_ss, y_ss) to C_t(x_es, y_es) between the RGB values of the pixels (x_ss, y_ss) to (x_es, y_es) in frame t and the RGB values of the corresponding pixels in frame t+1.

[0076]

[0077] Thereafter, the luminance change amount of each of the processing target pixels (x_ss, y_ss) to (x_es, y_es) is calculated by performing the calculation shown in the following equation (16).

[0078]

[0079] Thereafter, equations (17) and (18) are used to determine the values Color(x_ss, y_ss) to Color(x_es, y_es) of the processing target pixels (x_ss, y_ss) to (x_es, y_es). That is, if the luminance change amount is less than or equal to the threshold th₄, the pixel (x, y) is judged to belong to an edge portion of a character region and the value Color(x, y) is set to 1. If the luminance change amount is greater than th₄, Color(x, y) is set to 0.
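Step S82 can be illustrated for a single pixel with the short Python sketch below. Equations (15) and (16) are not reproduced in this text, so the sketch assumes a Euclidean distance between consecutive RGB triples and takes the mean of those distances as the luminance change amount; both choices, the function name, and the threshold value are assumptions.

```python
import math

def color_flag(rgb_per_frame, th4):
    """Return Color (1 or 0) for one pixel over the still section.

    rgb_per_frame: one (R, G, B) triple per frame, at least two frames.
    th4: assumed threshold on the luminance change amount.
    """
    # Assumed form of equation (15): distance between the RGB values
    # of frame t and frame t+1.
    distances = [math.dist(a, b)
                 for a, b in zip(rgb_per_frame, rgb_per_frame[1:])]
    # Assumed form of equation (16): mean of the N-1 distances.
    change = sum(distances) / len(distances)
    # Equations (17)/(18): stable colour means a character pixel.
    return 1 if change <= th4 else 0
```

A pixel that stays near (200, 200, 200) is kept, whereas a pixel that jumps to a very different colour in one frame is discarded as background.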

[0080]

[0081] Thereafter, the character string region extraction unit 12 extracts only the pixels (x, y) that satisfy the conditions of both equation (13) and equation (17) (S83). As a result, backgrounds that change color or move are removed, and a background-removed image in which the edge portions of characters are extracted is generated.

[0082] Thereafter, the character string region extraction unit 12 extracts character string regions based on local line density (S84). The processing of step S84 is described in detail below.

[0083] The background removal processing of steps S81 to S83 extracts character regions whose luminance changes little and whose edges are stable, but the extracted regions may also include non-telop regions that have area, such as figures (for example, an image of a white box). To remove these and extract only character regions, the property that a character is a collection of line segments is used. In the background-removed image, the line density within a fixed region centered on each pixel is calculated. With this method, the line density is higher in character regions than in regions such as figures. This processing is formalized in equation (19).

[0084]

[0085] In equation (19), VLD(i, j) and HLD(i, j) are operations that obtain change points in the vertical and horizontal directions, respectively, and are defined by the following equations (20) and (21). Also, w(i, j|x, y) is the weight for the position of the pixel (i, j) relative to the center (x, y), and is defined by the following equation (22). In equation (22), d(i, j|x, y) represents the distance between the coordinates (i, j) and the coordinates (x, y), and is defined by the following equation (23).

[0086]

[0087] The right-hand side of equation (20) represents the exclusive OR of F(i, j) and F(i, j+1), and the right-hand side of equation (21) represents the exclusive OR of F(i, j) and F(i+1, j). F(i, j) indicates whether an extracted point is present at the coordinates (i, j) of the background-removed image (present: 1, absent: 0).

[0088] As a result, the line density value at the pixel (x, y) is obtained. Finally, if the line density of a pixel is higher than the average line density of a fixed surrounding region and is also greater than or equal to a threshold, that pixel is treated as a character region pixel, and connected regions of such pixels are extracted as character regions.
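The local line density can be sketched in Python as a weighted count of change points around a pixel. The exclusive-OR definitions follow equations (20) and (21) as described in the text; the weight `1 / (1 + d)`, the Euclidean distance, the window radius, and the clipping at the image border are assumptions, because equations (19), (22), and (23) are not reproduced here.

```python
import math

def line_density(F, x, y, radius=2):
    """Weighted count of change points around pixel (x, y).

    F[j][i] is 1 where the background-removed image has an extracted
    point at coordinates (i, j), else 0. radius is an assumed window size.
    """
    height, width = len(F), len(F[0])
    total = 0.0
    for j in range(max(0, y - radius), min(height - 1, y + radius + 1)):
        for i in range(max(0, x - radius), min(width - 1, x + radius + 1)):
            vld = F[j][i] ^ F[j + 1][i]   # equation (20): F(i,j) xor F(i,j+1)
            hld = F[j][i] ^ F[j][i + 1]   # equation (21): F(i,j) xor F(i+1,j)
            d = math.hypot(i - x, j - y)  # assumed form of equation (23)
            w = 1.0 / (1.0 + d)           # assumed weight, not equation (22)
            total += w * (vld + hld)
    return total
```

On a striped pattern the density is positive, while on a filled block (a figure with area) it is zero, which is the contrast the paragraph above relies on.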

[0089] When the character string region extraction unit 12 has extracted a character region, the binarization processing unit 13 operates. The binarization processing unit 13 binarizes the character region extracted by the character string region extraction unit 12 using floating ternarization. First, a moving average image is obtained by computing the average gray level in a rectangular mask region centered on each pixel, and a binary image is obtained by floating ternarization. Two kinds of rectangular mask region are used, one horizontally long (x direction) and one vertically long (y direction), and a binary image is generated with each. Finally, the two binary images are ANDed to obtain the final binary image. Equations (24) to (27) formalize the floating ternarization.

[0090]

[0091] 2X and 2Y represent the side lengths of the rectangular mask region, and I(x, y) represents the luminance value of the pixel at the coordinates (x, y). k is a constant used to set the margin when generating the image B(x, y) of character regions brighter than the background and the image D(x, y) of character regions darker than the background. Equation (24) gives the average luminance in the rectangular mask region centered on the pixel (x, y). Equation (25) gives the difference between the original image and the moving average image at the corresponding pixel (x, y). A difference image can therefore be obtained from this equation.

[0092] As defined by equations (26) and (27), if the gray level of the pixel (x, y) in the difference image is greater than kσ (where σ is the standard deviation of the image I), then B(x, y) = 1 in the image of character regions brighter than the background; if the gray level of the pixel (x, y) in the difference image is less than −kσ, then D(x, y) = 1 in the image of character regions darker than the background. Floating ternarization can be applied to the character string region image by generating the AND image of the brighter-than-background image B and the darker-than-background image D. When images B and D are ANDed, the sign of image D is inverted first.
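The moving-average and thresholding steps of the floating ternarization can be sketched as follows for one mask shape. This is a simplified illustration: the window here spans (2X+1) by (2Y+1) pixels and is clipped at the image border, which are assumptions, and the final combination of the horizontally long and vertically long masks is left out.

```python
from statistics import pstdev

def floating_ternarize(I, X, Y, k):
    """Return (B, D): brighter-than-background and darker-than-background maps.

    I: 2D list of luminance values, indexed I[y][x].
    X, Y: half-widths of the rectangular mask; k: margin constant.
    """
    h, w = len(I), len(I[0])
    sigma = pstdev([v for row in I for v in row])  # std. deviation of image I
    B = [[0] * w for _ in range(h)]
    D = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [I[j][i]
                      for j in range(max(0, y - Y), min(h, y + Y + 1))
                      for i in range(max(0, x - X), min(w, x + X + 1))]
            # Equations (24)/(25): pixel minus local moving average.
            diff = I[y][x] - sum(window) / len(window)
            if diff > k * sigma:       # equation (26)
                B[y][x] = 1
            elif diff < -k * sigma:    # equation (27)
                D[y][x] = 1
    return B, D
```

For a flat image with a single bright pixel, only that pixel is marked in B and nothing is marked in D.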

[0093] When the binarization processing unit 13 has generated a binarized character region image, the OCR unit 14 operates. The OCR unit 14 performs character recognition on the character region image binarized by the binarization processing unit 13 and passes the character recognition result to the passage search unit 32. The OCR unit 14 performs this processing each time the binarization processing unit 13 generates a binarized character region image. The passage search unit 32 stores each character recognition result passed from the OCR unit 14. A commercially available OCR product can be used as the OCR unit 14. The OCR used was Win Reader Pro Ver5.0, and its recognition method is as follows. Characters are segmented and the edge features of the character strokes are extracted. Each character is divided into 7×7 blocks, and a weighted direction histogram over four directions is taken in each block. The histograms are smoothed, and the features are compressed from 196 dimensions to 64 dimensions by canonical discriminant analysis. For recognition, a linear discriminant function that calculates the distance to a standard pattern is used.

[0094] Next, the operation of the speech transcription device 2 will be described. The speech transcription device 2 transcribes the news audio signal separated from the news video signal into character strings, and stores the transcription results internally in association with time information (information indicating how many seconds after the start of the news audio signal the transcribed audio occurs). The news audio is transcribed by large-vocabulary continuous speech recognition using a language model and an acoustic model. The language model used was a back-off bigram with a 20K-word vocabulary, trained on 45 months of articles (January 1991 to September 1994) from the CD-ROM edition of the Mainichi Shimbun. Witten-Bell estimation was used for back-off smoothing, and the cut-off for bigrams was set to 1. The acoustic model is a speaker-independent male HMM, a cross-word triphone model that also takes phoneme context dependency between words into account. It was trained on 21,782 utterances by 137 male speakers from the Acoustical Society of Japan newspaper article read-speech corpus. The acoustic features were 39-dimensional feature parameters (12-dimensional MFCC and power, together with their respective Δ and ΔΔ coefficients).

[0095] The speech transcription device 2 also stores the character appearance section information each time it is passed from the character string appearance section detection unit 11. When the character string appearance section detection unit 11 notifies it that the news video signal has ended, the speech transcription device 2 passes to the inter-word similarity calculation unit 31 those portions of the stored transcription results that fall within the sections indicated by the stored character appearance section information. If the character appearance section information passed from the character string appearance section detection unit 11 is expressed in frame numbers, the frame numbers are converted into time information, the corresponding transcription results are identified using this time information, and they are passed to the inter-word similarity calculation unit 31.
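The frame-to-time conversion and the selection of transcription results can be sketched as below. The frame rate of 30 frames per second, the function name, and the representation of the stored transcription as (seconds, text) pairs are assumptions made for illustration.

```python
def transcripts_in_section(transcripts, first_frame, last_frame, fps=30.0):
    """Return the transcribed texts that fall inside a character appearance section.

    transcripts: list of (seconds_from_start, text) pairs.
    first_frame, last_frame: section boundaries as frame numbers.
    fps: assumed number of frames per second.
    """
    # Convert frame numbers to time information.
    start, end = first_frame / fps, last_frame / fps
    return [text for t, text in transcripts if start <= t <= end]
```

For a section from frame 300 to frame 450 at 30 frames per second, only the transcription results stamped between 10 and 15 seconds are selected.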

[0096] Next, the operation of the search device 3 will be described. As described above, the search device 3 comprises the inter-word similarity calculation unit 31 and the passage search unit 32. The inter-word similarity calculation unit 31 learns inter-word similarities from those transcription results of the speech transcription device 2 that fall within character appearance sections. Because the inter-word similarity is calculated directly from the data being analyzed, rather than taken from a manually constructed thesaurus as in the prior art, the time-difference problem can be avoided.

[0097] When the speech transcription device 2 passes the transcription results corresponding to the character appearance sections detected by the character string appearance section detection unit 11, the inter-word similarity calculation unit 31 obtains the similarity between the words present in those transcription results. This processing is described in detail below.

[0098] The distance (similarity) between the word w_i and the word w_j can be expressed, as in the following equation (28), as the distance in Euclidean space between the TF, IDF, and mutual information that each of the words w_i and w_j has with respect to the transcription result P_m. P_m is the transcription of the news audio signal in the m-th character appearance section detected by the character string appearance section detection unit 11, and is input from the speech transcription device 2. TF is the frequency with which the word w_i appears in P_m. IDF(w_i) is a value related to the number of character appearance sections in which the word w_i appears, and is defined by the following equation (29). The mutual information is the amount of information obtained about P_m by knowing the word w_i, and is given by the following equation (30). It is defined as the difference between the amount of information i(P_m) that P_m originally had and the amount of information i(P_m|w_i) that P_m still has after w_i is known. According to equation (30), the mutual information is small if the occurrence probability P(P_m) of P_m and the occurrence probability P(w_i) of the word w_i are independent, and large if they are dependent.
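A rough Python sketch of this distance is given below. Equations (28) to (30) are not reproduced in this text, so the sketch assumes a conventional IDF of log(N/df), a pointwise mutual information estimated from token counts, and an unweighted Euclidean distance over the concatenated per-section features; all of these forms and the function names are assumptions rather than the specification's definitions.

```python
import math

def word_features(word, sections):
    """Per-section (TF, IDF, MI) features of a word, concatenated.

    sections: list of token lists, one per character appearance section.
    The word is assumed to occur in at least one section.
    """
    n = len(sections)
    df = sum(1 for s in sections if word in s)
    idf = math.log(n / df)                       # assumed form of eq. (29)
    total = sum(len(s) for s in sections)
    p_w = sum(s.count(word) for s in sections) / total
    feats = []
    for s in sections:
        tf = s.count(word)
        p_m = len(s) / total
        # i(P_m) - i(P_m | w) = log P(P_m, w) / (P(P_m) P(w)); 0 if unseen.
        mi = math.log((tf / total) / (p_m * p_w)) if tf else 0.0
        feats.extend([tf, idf, mi])
    return feats

def word_distance(w_i, w_j, sections):
    """Euclidean distance between the feature vectors of two words."""
    return math.dist(word_features(w_i, sections),
                     word_features(w_j, sections))
```

Under these assumptions a word is at distance 0 from itself, and words that occur in different sections with different frequencies end up farther apart.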

[0099]

[0100] When the character string appearance section detection unit 11 notifies it that the news video signal has ended, the passage search unit 32 takes each stored character recognition result (the recognition result of each telop character string or the like) as a search query Qk (where k is an index indicating which character recognition result it is) and performs the following processing. First, analysis sections (for example, one utterance per analysis section) are taken in order from the beginning of the news audio transcription, and, while the analysis section is shifted, the similarity (X_k, X_l) between the analysis section Pl (where l is an index indicating which analysis section it is) and the search query Qk is obtained. This similarity (X_k, X_l) is given by the following equation (31).

[0101]

[0102] Next, the passage search unit 32 detects analysis sections with high similarity as passage candidate sections. It then treats each run of consecutive passage candidate sections as a passage, which is the search result, and outputs the passage and the character string recognized by the OCR unit 14 to the registration device 4. The passage search unit 32 performs this processing for every stored search query.
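The grouping of consecutive passage candidate sections into passages can be sketched as follows. The similarity scores of equation (31) are taken as given, and the use of a fixed threshold to decide that a similarity is "high" is an assumption; the function name is hypothetical.

```python
def find_passages(similarities, threshold):
    """Group consecutive high-similarity analysis sections into passages.

    similarities[l] is the similarity between the query and analysis
    section l. Returns a list of (first_section, last_section) pairs.
    """
    passages, start = [], None
    for l, score in enumerate(similarities):
        if score >= threshold:
            if start is None:
                start = l            # a new run of candidate sections begins
        elif start is not None:
            passages.append((start, l - 1))
            start = None
    if start is not None:            # a run that reaches the last section
        passages.append((start, len(similarities) - 1))
    return passages
```

With scores `[0.1, 0.7, 0.8, 0.2, 0.9, 0.95, 0.9]` and a threshold of 0.6, the sketch yields two passages covering sections 1 to 2 and 4 to 6.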

[0103] The registration device 4 registers the character string recognized by the OCR unit 14 (the search query) and the news video signal corresponding to the passage retrieved by the passage search unit 32 in the database 5 in association with each other. For example, the news video signal is registered in the database 5 with the search query as its index. The news video signal corresponding to a passage can be obtained, for example, as follows. The transcription results stored in the speech transcription device 2 in association with time information are searched to find the portion whose content is identical to the passage. The time information stored in association with that portion is then acquired and converted into frame numbers. The news video signal corresponding to the passage is then acquired based on the frame numbers.
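The registration step can be sketched with an in-memory dictionary standing in for the database 5. The dictionary layout, the function name, the 30 frames per second rate, and the storage of a frame range in place of the actual video signal are all assumptions made for illustration.

```python
def register_passage(database, query, passage_texts, transcripts, fps=30.0):
    """Index the frame range of a passage under its search query.

    database: dict mapping query strings to lists of (first, last) frames.
    passage_texts: transcribed texts that make up the passage.
    transcripts: list of (seconds_from_start, text) pairs.
    """
    # Find the stored transcription results identical to the passage.
    times = [t for t, text in transcripts if text in passage_texts]
    # Convert the associated time information back to frame numbers.
    first_frame = int(min(times) * fps)
    last_frame = int(max(times) * fps)
    database.setdefault(query, []).append((first_frame, last_frame))
    return database
```

Registering a passage whose utterances are stamped 11.5 s and 14.0 s under a given telop string stores the frame range 345 to 420 for that string.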

[0104] FIG. 9 is a block diagram showing an example of the hardware configuration of the database construction system, which comprises a computer 91 that receives a news video signal and a news audio signal as input, a recording medium 92, and a database 93 (corresponding to the database 5 in FIG. 1). The recording medium 92 is a disk, a semiconductor memory, or another recording medium, on which is recorded a program for causing the computer 91 to function as the database construction system. This program is read by the computer 91 and controls its operation, thereby realizing on the computer 91 the character recognition device 1, the speech transcription device 2, the search device 3, and the registration device 4 shown in FIG. 1.

[0105]

【発明の効果】本発明では、文字認識装置によって認識
されたテロップやCGフリップ文字列中の全単語を用い
て、ニュース音声の書き起こしに対してパッセージ検索
を行っている。従って、一つの単語のシソーラスの影響
に引きづられて、関係のない文を抽出してしまう危険性
を軽減する効果がある。従って、データベースに関係の
ないニュース映像が登録される危険性を低減できる。ま
た、検索結果をパッセージ単位としているので、結果の
前後関係が理解しやすいといった効果が得られる。従っ
て、データベースに前後関係が分かりやすい形でニュー
ス映像を登録することが可能になる。According to the present invention, a passage search is performed for a transcription of a news voice using all words in a telop or CG flip character string recognized by a character recognition device. Therefore, there is an effect of reducing the risk of extracting an unrelated sentence due to the influence of the thesaurus of one word. Therefore, it is possible to reduce the risk that news videos unrelated to the database are registered. Further, since the search result is set in passage units, an effect that the context of the result can be easily understood is obtained. Therefore, the news video can be registered in the database in such a manner that the context is easy to understand.
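As an illustration only, the idea of scoring a passage with every word of the recognized character string, rather than with a single word, might be sketched as follows. The scoring rule (sum of each query word's best similarity to any word in the passage), the fixed sentence window, and the function names are assumptions made for this sketch and are not the formula used by the passage search unit 32.

```python
def passage_score(query_words, passage_words, similarity):
    """Sum, over every query word, its best similarity to any passage word."""
    if not passage_words:
        return 0.0
    return sum(
        max(similarity(q, w) for w in passage_words) for q in query_words
    )


def best_passage(query_words, sentences, similarity, window=2):
    """Slide a window of `window` sentences; return (score, start_index).

    `sentences` is a list of word lists. Ties keep the earliest window.
    """
    if not sentences:
        return (0.0, -1)
    best = (float("-inf"), -1)
    last_start = max(1, len(sentences) - window + 1)
    for i in range(last_start):
        words = [w for s in sentences[i:i + window] for w in s]
        score = passage_score(query_words, words, similarity)
        if score > best[0]:
            best = (score, i)
    return best
```

Because every query word contributes to the score, a window that matches only one word well cannot outrank a window that matches most of the character string.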

[0106] The inter-word similarity calculation unit of the search device learns the inter-word similarity from the transcription of the audio signal in the time sections detected by the character string appearance section detection unit of the character recognition device. That is, instead of using the inter-word similarity of a manually constructed thesaurus as in the conventional technique, the present invention calculates the inter-word similarity directly from the data being analyzed, and thus avoids the problem of a difference in time period between the thesaurus and the data.
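As an illustration only, learning an inter-word similarity directly from the transcriptions of the detected sections, with no hand-built thesaurus, might be sketched as follows. The sketch substitutes a plain cosine similarity over per-section word counts for the calculation actually performed by the inter-word similarity calculation unit 31; the function names and data layout are assumptions.

```python
import math
from collections import Counter


def word_vectors(sections):
    """Map each word to {section_index: count} over the given sections.

    `sections` is a list of word lists, one per character appearance
    section (hypothetical layout).
    """
    vectors = {}
    for i, words in enumerate(sections):
        for word, count in Counter(words).items():
            vectors.setdefault(word, {})[i] = count
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v[i] for i, weight in u.items() if i in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity(vectors, a, b):
    """Similarity of two words; 0.0 if either word was never seen."""
    if a not in vectors or b not in vectors:
        return 0.0
    return cosine(vectors[a], vectors[b])
```

Words that tend to occur in the same sections receive a similarity close to 1, and words that never share a section receive 0, so the measure reflects the vocabulary of the news being analyzed rather than that of an older reference work.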

[0107] The present invention uses news audio transcribed by the audio transcription device rather than news audio transcribed by hand, so the labor required is reduced compared with the conventional technique.

[Brief description of the drawings]

FIG. 1 is a block diagram of an embodiment of the present invention.

FIG. 2 is a diagram for explaining a character appearance section and a still section.

FIG. 3 is a diagram for explaining the rectangular areas used by the character string appearance section detection unit 11.

FIG. 4 is a flowchart showing a processing example of the character string appearance section detection unit 11.

FIG. 5 is a diagram for explaining a previous frame, a current frame, and a subsequent frame.

FIG. 6 is a diagram for explaining a direction n with respect to a pixel (x, y).

FIG. 7 is a diagram showing changes in the edge position coincidence ratio.

FIG. 8 is a flowchart showing a processing example of the character string area extraction unit 12.

FIG. 9 is a block diagram showing a hardware configuration for realizing the database construction system by a computer.

[Explanation of symbols]

1 ... Character recognition device
11 ... Character string appearance section detection unit
12 ... Character string area extraction unit
13 ... Binarization processing unit
14 ... OCR unit
2 ... Audio transcription device
3 ... Search device
31 ... Inter-word similarity calculation unit
32 ... Passage search unit
4 ... Registration device
5 ... Database

Continued from the front page
(51) Int.Cl.⁷ Identification code FI Theme code (reference): H04N 7/03 7/035
(72) Inventor: Yasuo Ariki, c/o Ryukoku University, 1-5 Yokotani, Seta Oe-cho, Otsu-shi, Shiga Prefecture
F-terms (reference): 5B075 ND12 NK10 NK13 NK21 NK32 UU40; 5C052 AA01 AC08 CC20 DD04; 5C063 CA23 DA03

Claims


1. A database construction system comprising: an audio transcription device that transcribes news audio corresponding to a news video into a character string; a character recognition device that detects a character appearance section in which a character string appears in the news video and recognizes the character string; a search device that obtains similarities between words in the transcription result corresponding to the character appearance section detected by the character recognition device and, using the similarities, searches the transcription result for a passage similar to the character string recognized by the character recognition device; and a registration device that registers, in a database, the recognition result of the character recognition device in association with the news video corresponding to the passage retrieved by the search device.

2. The database construction system according to claim 1, wherein said character string is a telop character string or a CG flip character string.

3. The database construction system according to claim 1, wherein the search device comprises: an inter-word similarity calculation unit that obtains similarities between words in the transcription result corresponding to the character appearance section detected by the character recognition device; and a passage search unit that, using the similarities calculated by the inter-word similarity calculation unit, searches the transcription result for a passage similar to the character string recognized by the character recognition device.

4. The database construction system according to claim 3, wherein the inter-word similarity calculation unit is configured to calculate the similarity between words based on the appearance frequency of a word in a character appearance section, the number of character appearance sections in which the word appears, and the mutual information between a word and the transcription result corresponding to a character appearance section.

5. The database construction system according to claim 4, wherein the character recognition device is configured to recognize the character string in a still section of the character string.

6. A passage search device comprising means for obtaining similarities between words in a transcription result produced by an audio transcription device that transcribes news audio corresponding to a news video into a character string, the transcription result being that corresponding to a character appearance section detected by a character recognition device that detects a character appearance section in which a character string appears in the news video and recognizes the character string, and for searching, using the similarities, the transcription result for a passage similar to the character string recognized by the character recognition device.

7. The passage search device according to claim 6, wherein the character string is a telop character string or a CG flip character string.

8. A database construction method comprising: an audio transcription step of transcribing news audio corresponding to a news video into a character string; a character recognition step of detecting a character appearance section in which a character string appears in the news video and recognizing the character string; a search step of obtaining similarities between words in the transcription result corresponding to the character appearance section detected in the character recognition step and, using the similarities, searching the transcription result for a passage similar to the character string recognized in the character recognition step; and a registration step of registering, in a database, the recognition result of the character recognition step in association with the news video corresponding to the passage retrieved in the search step.

9. The database construction method according to claim 8, wherein the character string is a telop character string or a CG flip character string.

10. A program for causing a computer to function as a database construction system, the program causing the computer to function as: an audio transcription device that transcribes news audio corresponding to a news video into a character string; a character recognition device that detects a character appearance section in which a character string appears in the news video and recognizes the character string; a search device that obtains similarities between words in the transcription result corresponding to the character appearance section detected by the character recognition device and, using the similarities, searches the transcription result for a passage similar to the character string recognized by the character recognition device; and a registration device that registers, in a database, the recognition result of the character recognition device in association with the news video corresponding to the passage retrieved by the search device.

11. The program according to claim 10, wherein the character string is a telop character string or a CG flip character string.