JPH1139343A

JPH1139343A - Video retrieval device

Info

Publication number: JPH1139343A
Application number: JP9209735A
Authority: JP
Inventors: Haruki Tsuchiya; 治紀槌屋
Original assignee: MEDIA RINKU SYST KK
Current assignee: MEDIA RINKU SYST KK
Priority date: 1997-07-17
Filing date: 1997-07-17
Publication date: 1999-02-12

Abstract

PROBLEM TO BE SOLVED: To provide a video retrieval device suitable for quickly finding the video a viewer wants among a large quantity of distributed video. SOLUTION: A change position detection part 13 detects scene changes. A representative picture extraction part 14 extracts one to several representative pictures from each scene of the video software VS. A character data extraction part 23 extracts character data from the representative pictures. The character data is accumulated in a character data accumulation part 23 together with information on the position from which it was extracted. When a user wants to look something up, for business or personal reasons, the accumulated character data is searched with a keyword describing it. The representative picture VD that is hit and the original video software VS are displayed on a display 26. Data can also be selected from a displayed list.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a video search device, and more particularly to a video search device suitable for quickly finding what a viewer wants among a large volume of distributed video. In this specification, the equivalent of one frame of a television broadcast is called an "image" or "screen", and a sequence of such images is called a "video". A video to which an audio signal, caption codes (subtitle codes), and the like have been added is called "video software" (VS in FIG. 3; the reference sign VS is omitted below). Video software is supplied on motion picture film, videotape, television broadcasts, DVD (digital video disc or digital versatile disc), the Internet, computer storage media, and the like.

【０００２】[0002]

2. Description of the Related Art

We have entered an era in which video software such as movies and sports broadcasts is supplied in large quantities. Video software also exists in a wide variety of other fields, including drama, music, education, and news, and the supply is expected to grow further. As the proverb says, seeing once is better than hearing a hundred times, and this video software is highly useful. Whether for work or for a personal hobby, being able to grasp its content accurately and use it as a basis for decisions or as a library would be of great value.

[0003] Video software unfolds over time, so a person needs time to watch it. Once the supply becomes this enormous, it is no longer feasible to record it on videotape and watch it at leisure later, as was done in the past. One conceivable approach is to attach keywords, search indexes, and the like to the images (frames) or video that matter and to use them later for searching.

【０００４】[0004]

Problems to Be Solved by the Invention

However, this method requires time and effort to enter the keywords, indexes, and so on. Specifically, the content of each piece of video software must be checked, and keywords or indexes must be attached to the whole or to each individual image. Such processing may have been workable in the past, when supply was small, but it is impractical today, in what is called the multimedia age, and even more so in the future as volumes grow further. Deciding which keywords and indexes to attach is also a persistent problem.

[0005] As computers have become faster, character-based information retrieval has come to run a "full-text search" over the body of the text itself, without indexing the text data or registering keywords in advance. It is known that targeting the nouns in the text gives fairly accurate results: when a user enters an everyday word, texts containing that word are retrieved at high speed.

[0006] By analogy, a "full-image search" is conceivable for image retrieval. For images, however, finding a match is itself quite time-consuming, and the match is often meaningless. Some other method is therefore needed.

[0007] An object of the present invention is to provide a video search device suitable for quickly finding what a viewer wants among such a large volume of distributed video.

【０００８】[0008]

Means for Solving the Problems

To achieve the above object, the invention of claim 1 uses: character data extraction means for extracting character data from video software; character data storage means for storing the character data together with information on its extraction position; selection means for selecting desired character data from the stored character data; and retrieval means for retrieving an image related to the selected character data from the video software.

[0009] In the invention of claim 2, the character data extraction means extracts the character data from a representative image extracted from the video software. In the invention of claim 3, the retrieval means of claim 1 or claim 2 retrieves a representative image related to the selected character data.

[0010] The present invention focuses on the characters contained in video software. Characters in video software have the following features.
(1) In movies and the like, characters convey important information such as the title and the cast.
(2) Subtitles for foreign films are written to convey dialogue and story in a minimum of characters.
(3) In TV (television) news and the like, characters are used to outline an incident. Here, characters serve to summarize information compactly.
(4) For sports results, horse racing results, and the like, numerical information is often displayed on the screen.

[0011] Thus, while video software itself conveys subtle emotions and sensory impressions, the characters in it are used to convey information with explicit meaning. The present invention focuses on this character data: with the configuration recited in the claims, the character data is extracted and stored to form, in effect, a database of images and video, for the viewer's convenience.

【００１２】[0012]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention is described in detail below on the basis of the illustrated embodiment. First, an outline of the embodiment (video search device 100 in FIG. 1) is given.
(1) Received original video software, such as a TV (television) broadcast, is supplied through the input interface 11 to the original video storage unit 12 and stored there. (Strictly, the original video storage unit 12 should be called the "original video software storage unit", but "software" is omitted for brevity.) The video software is also supplied to the change position detection unit 13, which detects positions where the state of the video software changes, such as a change of scene (scene, cut). Video software supplied in other forms, for example on videotape or DVD, over the Internet, or on CD-ROM, is likewise taken in through the input interface 11.

[0013] (2) When the change position detection unit 13 detects a state change, for example a scene change, the representative image extraction unit 14 responds by extracting a representative image VD (FIG. 2; the reference sign VD is omitted below). A representative image is a single still image, or a short moving image of about 2 to 5 seconds, taken from near the state change position. The set of representative images is stored as a summary VE (FIG. 2; the reference sign VE is omitted below). The summary holds roughly one-tenth to one-hundredth of the data of the original video software, which simplifies the subsequent processing. It also serves as a basis for deciding whether to watch the original video software.

[0014] Representative images may instead be extracted from the original video software at a fixed interval. Characters in video software stay on screen for a relatively long time, because allowance is made for the time the viewer needs to finish reading the displayed text; in other words, the same characters are displayed throughout that time. Therefore, extracting a still or moving image once per such interval, for example every 5 or 10 seconds, captures almost all the characters contained in the video software. With this method there is no need to extract representative images: images are sampled from the original video at fixed intervals and the characters in them are extracted. The images sampled here may also be collected and used as a summary.

[0015] (3) Characters are extracted from the representative images. Character recognition software extracts character data from each extracted representative image. The extracted character data is stored as keywords. In some cases the keywords may be limited to nouns. When the keywords, the summary, and the original video software are stored, they are given header information that preserves the links among them, for example channel number, broadcast date, hour/minute/second, and frame number.
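To illustrate the header information described in paragraph [0015], a stored keyword record might be sketched as follows. The class name `KeywordRecord`, its field names, and the sample values are assumptions made for this example; the specification only lists channel number, broadcast date, time, and frame number as examples of linking information.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class KeywordRecord:
    """One recognized keyword plus the header linking it to its source (hypothetical layout)."""
    keyword: str             # character data recognized in the image
    channel: int             # channel number
    broadcast_at: datetime   # broadcast date and time, to the second
    frame: int               # frame number within the recording

    def link_key(self):
        # The same (channel, time, frame) triple would also be attached to the
        # summary image and the original video so that all three can be joined.
        return (self.channel, self.broadcast_at, self.frame)


# Hypothetical sample record.
rec = KeywordRecord("typhoon", 4, datetime(1997, 7, 17, 19, 0, 5), 150)
```

Using one shared key for keyword, summary, and original video is only one possible way to keep the three stores linked.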

[0016] The stored keywords, summaries, and original video software are used, for example, as follows.
(1) When a viewer wants to know whether video software has been stored on a topic of interest, whether for work or for personal reasons, the viewer types in, as needed, a word that expresses the matter concisely, such as a PATOLIS free keyword.
(2) The keyword is matched against the stored character data. The viewer sets the search rules: exact match, partial match, and so on. If there is a hit, the corresponding summary is retrieved and the content of the images is checked.
(3) If the summary appears to concern the desired topic, the original video software is retrieved and its content is checked.
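The keyword matching in step (2) can be sketched roughly as below. The function name `search`, the `(keyword, link_key)` index layout, and the sample entries are assumptions for illustration; the specification leaves the matching rules to the viewer and does not prescribe a data layout.

```python
def search(index, query, mode="partial"):
    """Return the link keys of stored keywords that match the query.

    index: list of (keyword, link_key) pairs (assumed layout).
    mode:  "exact" or "partial", the matching rule chosen by the viewer.
    """
    if mode == "exact":
        matches = lambda kw: kw == query
    elif mode == "partial":
        matches = lambda kw: query in kw
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [key for kw, key in index if matches(kw)]


# Hypothetical stored keywords with their link keys.
index = [
    ("election results", "ch1-0001"),
    ("election", "ch3-0042"),
    ("weather", "ch1-0100"),
]
partial_hits = search(index, "election")            # both election entries
exact_hits = search(index, "election", "exact")     # only the exact keyword
```

Each returned link key would then be used to fetch the summary first and, only if the summary looks relevant, the original video software.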

[0017] As noted above, the summary is one-tenth to one-hundredth the size of the original video, so viewing it takes little time. A two-stage approach, in which the summary is first used to judge whether the item is the desired one, therefore reduces the effort of searching.

[0018] Keywords may also be registered in advance so that, whenever matching character data appears, index information and the like for that video software is recorded separately. The viewer then checks what has accumulated, for example once a day. The matching items are displayed on the screen as a list, the viewer selects from it, and the corresponding summary and original video software are displayed. Alternatively, instead of entering a keyword, the stored character data may be displayed, for example in Japanese syllabary (gojūon) order, so that the viewer selects a desired item from it and the corresponding summary and original video software are displayed.

[0019] The video search device 100 of the embodiment is described in detail below with reference to the drawings. In FIG. 1, video software supplied from the TV broadcast receiving unit 16, the DVD player 17, the videotape deck 18, the Internet connection unit 19, the CD-ROM drive 20, and other sources is taken into the video search device 100 through the input interface 11.

[0020] The TV broadcast receiving unit 16 has many receiving channels. It simultaneously receives the analog and digital TV broadcasts that the viewer wants on multiple channels, for example terrestrial TV broadcasts (VHF, UHF), satellite broadcasts, and cable TV. Through the Internet connection unit 19, desired video software on the Internet is taken in periodically or in response to the viewer's operation.

[0021] The video software taken in from these sources is stored in the original video storage unit 12. Storage is preferably digital. Analog material is digitized by the input interface 11 and then stored in the original video storage unit 12. To limit the data volume, the data is preferably compressed using MUSE, MPEG, JPEG, or another method. Digital television broadcasts and video software on the Internet can be stored as they are. The video software is divided into segments of, for example, 10 minutes, each stored as one file in the original video storage unit 12. Index information needed for later retrieval, such as each file's reception date, the reception time of the first data in the file, and the channel number, is recorded in the file allocation table (FAT).
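The segmentation into 10-minute files with index entries can be sketched as follows. The function name `segment_index`, the `(channel, start time)` entry layout, and the sample times are hypothetical; the specification says only that reception date, the reception time of the first data, and the channel number are recorded.

```python
from datetime import datetime, timedelta

SEGMENT = timedelta(minutes=10)  # segment length given as the example in the text


def segment_index(channel, start, end):
    """List one (channel, segment start time) entry per file covering [start, end)."""
    entries = []
    t = start
    while t < end:
        entries.append((channel, t))
        t += SEGMENT
    return entries


# Hypothetical half-hour recording on channel 4.
entries = segment_index(4, datetime(1997, 7, 17, 19, 0), datetime(1997, 7, 17, 19, 30))
```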

[0022] Storage in the original video storage unit 12 is carried out for the designated channels continuously over some length of time, for example 72 hours. Audio information is also recorded. The viewer sets the recording date, time, and channels with the user operation unit 22. Some time slots are known from the outset not to need recording, and some programs do not interest the viewer. Allowing the time slot to be set not just by date but down to the hour and minute therefore makes it easier to check the recorded content later. The procedure for setting the time slot may be the same as on an existing videotape recorder.

[0023] Recording (storage) is preferably cyclic. For example, the original video storage unit 12 is given enough capacity to record 5 channels continuously for 3 days. When time permits, the viewer checks the stored character data, summaries, and original video software and deletes what is not needed. This frees storage space in the original video storage unit 12 and the summary storage unit 21. If broadcasts are recorded continuously for some period, the storage capacity may run out. In that case, the next new video signal overwrites the oldest recorded video.
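The cyclic recording described in paragraph [0023] behaves like a ring buffer. The sketch below uses Python's `collections.deque` with `maxlen` to show the overwrite-oldest rule; the class `CyclicStore` and the segment identifiers are hypothetical and stand in for whatever storage the device actually uses.

```python
from collections import deque


class CyclicStore:
    """Fixed-capacity store that overwrites the oldest segment when full."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item on overflow.
        self._segments = deque(maxlen=capacity)

    def record(self, segment_id):
        self._segments.append(segment_id)

    def delete(self, segment_id):
        # Viewer-initiated deletion of an unneeded segment frees a slot.
        self._segments.remove(segment_id)

    def contents(self):
        return list(self._segments)


store = CyclicStore(capacity=3)
for seg in ["s1", "s2", "s3", "s4"]:
    store.record(seg)   # recording "s4" pushes out "s1"
```

With the 10-minute segments mentioned in paragraph [0021], 5 channels recorded for 3 days would correspond to 5 × 3 × 144 = 2160 slots.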

[0024] With this arrangement there is no need to specify the recording date and time; only the desired channels (the channels to be monitored) need be specified with the user operation unit 22. As noted above, however, some time slots need not be recorded at all. Even with cyclic recording, therefore, allowing no-record time slots to be set reduces the storage consumed.

[0025] Returning to FIG. 1, when a state change occurs in a designated feature of interest in the television broadcast being taken in, or in the video software being played back from a DVD, the change position detection unit 13 notifies the representative image extraction unit 14.

[0026] The present invention extracts the character data contained in video software, stores it, and uses it as a key for video search. As mentioned above, the character data may be extracted directly from the original video software (broken line L in FIG. 1). Here, however, to make the recorded content easier to check and to simplify processing, representative images VD are extracted from the original video software as shown in FIG. 2, and the character data is extracted from these representative images VD (reference signs such as VD are omitted below where appropriate).

[0027] Detecting the state change position serves to trigger this representative image extraction; in response, the representative image extraction unit 14 extracts a representative image. Each representative image may include not only a single still image but also a moving image of several seconds associated with the change position.

[0028] Video software (a broadcast program) contains various detectable features. Audio (sound), for example, contains human voices, music, cheering, impulsive sounds, and many other detectable features. Video likewise contains several detectable features, such as the picture being still, a sudden change in screen brightness, or a sudden change in the color of the whole screen. The representative images here are extracted in order to obtain the character data present in the video software. The feature of interest should therefore be one whose change is associated with the appearance of new characters on the screen.

[0029] In a news program, for example, after the announcer or newscaster finishes reading one news item and the voice pauses, new characters appear on the screen in sync with the start of the next item. This break in the audio can therefore also be used to trigger representative image extraction. Educational programs show the same tendency.

[0030] The relationship between a feature change and the appearance of characters varies with the genre of the video software. Therefore, if the characteristics of the video software to be stored are known from a program guide or the like, the viewer should be able to set which feature to watch when setting the channel to be recorded.

[0031] Some features do not depend on characteristics or genre. When the recording target is not specified, that is, when several channels are recorded continuously and the character data in them is extracted as search keys, it is best to use such universal features. Examples follow.
(1) Extracting the first image
The image at the start of the original video software is simply extracted as a title (representative image). When the broadcast start time of a program is known from a program guide or the like, its first image is extracted. When recording continuously for a long time in segments of, for example, 10 minutes, the first image of each segment is extracted as a title.

[0032] (2) Periodic extraction
Images are extracted from the video software periodically, at fixed time intervals. To generate a summary of M representative images from original video software consisting of N images, every (N/M)-th image of the original video software is taken as a representative image. This is the simplest method. In image compression schemes such as MPEG, JPEG, and MUSE, a complete image is transmitted every several images, and in between only the difference from the preceding image is transmitted. For such image data, the complete image that arrives once every several images may be extracted as the representative image.
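The N/M rule can be sketched in a few lines. The function name `periodic_indices` and the choice to start from frame 0 are assumptions for illustration; the specification only states that every (N/M)-th image is taken.

```python
def periodic_indices(n_frames, n_representatives):
    """Indices of every (N/M)-th frame, starting from frame 0."""
    if not 0 < n_representatives <= n_frames:
        raise ValueError("need 0 < M <= N")
    step = n_frames // n_representatives  # integer N/M
    # Truncate in case N is not an exact multiple of M.
    return list(range(0, n_frames, step))[:n_representatives]


# 10 minutes at 30 frames per second (N = 18000), summarized by M = 100 images.
picked = periodic_indices(18000, 100)
```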

[0033] This periodic extraction is useful when character data is taken directly from the video software without generating a summary. Characters in video software are displayed precisely so that the viewer will read them, and reading takes a person a certain amount of time. With that in mind, on-screen characters are usually displayed for a fairly long time, for example 5 or 10 seconds. For television, therefore, taking out and examining one image every 150 frames if characters are displayed for 5 seconds, or one every 300 frames if for 10 seconds, extracts almost all the characters that appear in the video software.
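The 150-frame and 300-frame figures follow from 30 frames per second (5 s × 30 = 150, 10 s × 30 = 300). The sketch below, with hypothetical helper names `sample_points` and `is_captured`, shows why a caption displayed for at least one full sampling period is always hit by some sampled frame, while a shorter one can be missed.

```python
def sample_points(total_frames, display_seconds, fps=30):
    """Frame indices to examine: one per assumed caption display period."""
    step = display_seconds * fps
    return range(0, total_frames, step)


def is_captured(first_frame, last_frame, points):
    """True if any sampled frame falls while the caption is on screen."""
    return any(first_frame <= p <= last_frame for p in points)


points = sample_points(900, 5)  # 30 s of video, 5 s display time assumed
```

For example, a caption shown on frames 160 to 309 (5 seconds) contains sampled frame 300, whereas one shown only on frames 160 to 200 falls between samples, which is why the text says "almost all" characters are extracted.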

[0034] With this method, therefore, no summary is generated, and characters are extracted directly from the original video software. As noted above, however, the summary also helps the viewer decide whether the original video software is worth watching. Character data may therefore be extracted directly from the original video as described here while, separately, a feature likely to yield the summary that best represents the content of the video software is chosen, and its state changes are used to trigger generation of the summary.

[0035] In this case, examples of features of interest include the following.
Video genre: features of interest
News: cuts with a pattern (flip card)
Drama: cuts with subtitles; cuts with speech
Documentary: cuts with speech
English conversation: cuts with subtitles
Sports: cuts where applause or cheering rises, and their surroundings (audio climax)
Animation: cuts with subtitles; cuts that stay still for a long time; cuts with speech
TV shopping: cuts with subtitles (where information such as the price is visible)
Song program: cuts where music begins (identified from the audio)
Educational program: cuts with a pattern
Variety show: cuts where cheering rises
Orchestra: points where the music starts (periodic extraction)
Weather forecast: still cuts
(Pattern (flip card): a board on which text or pictures are drawn, used by presenters on TV.)
The appropriate features of interest, predetermined positions, and so on thus differ with the characteristics of the program and the broadcast channel. It is therefore advisable to generate and check summaries several times and, from the results, settle on suitable features of interest and positions.

[0036] (3) Extraction per scene
When the scene changes, new characters often appear in the image. It is therefore also useful to detect scene changes, take an appropriate number of representative images from each scene, and check them for character data. The position within the scene from which a representative image is taken is arbitrary; if the scene changes at CS1 and CS2 in FIG. 3, for example, those positions and the scene center PC are candidates.

[0037] Scene changes are identified, for example, as follows.
(1) Pixel aggregation
Video, needless to say, extends in two dimensions. In the NTSC system, for example, an image is a collection of pixels (picture elements, dots) of roughly 250 dots × 525 lines, each pixel has lightness, saturation, and hue, and 30 such images are transmitted per second.

[0038] The number of pixels is, of course, large. To speed up and simplify processing, the pixels are therefore aggregated. For example, taking the sum or the average of each 4 × 4 block of pixels reduces the video data to 1/16, and doing the same for 8 × 8 blocks reduces it to 1/64.
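The block aggregation can be sketched in plain Python as below. The function name `aggregate`, the nested-list image representation, and the decision to drop partial tiles at the edges are assumptions for this example; the specification allows either the sum or the average, and the average is used here.

```python
def aggregate(image, block):
    """Average each block x block tile of a 2-D list of pixel values.

    Rows and columns that do not fill a whole tile are dropped.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - rows % block, block):
        out_row = []
        for c in range(0, cols - cols % block, block):
            tile = [image[r + i][c + j] for i in range(block) for j in range(block)]
            out_row.append(sum(tile) / len(tile))
        out.append(out_row)
    return out


# 8 x 8 test image whose pixel value is its raster index.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
coarse = aggregate(image, 4)  # 64 values reduced to 4, i.e. 1/16
```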

[0039] This aggregation suppresses gradual changes in the image and makes scene changes easier to identify. Identifying a scene change basically means comparing two consecutive images pixel by pixel and determining whether they differ greatly. Specifically, the pixel data of the two images is compared in several regions of the image, for example, and if most of it is the same, the images are taken to belong to a single scene, as when a person moves against an unchanged background.

[0040] However, when the camera zooms in or out or pans slowly, for example, comparing the pixels of consecutive images without this aggregation shows differing data almost everywhere at the pixel level, even though the visible change is slight. Many positions that are actually in the middle of a scene would then be wrongly judged to be scene changes.

【００４１】Viewed at the coarse pixel level after aggregation, the same images show identical data, with no change, up to a certain number of images, because the pixels are coarse. They are therefore correctly judged to belong to the same scene. As preprocessing, the values of these coarse pixels, a(t, x, y), are first extracted, where t is the time, x and y are the coordinates in the image after aggregation, and a is the color value at the point (x, y). For a, the R, G, and B values may be used as they are, or a may be taken as C₁·R + C₂·G + C₃·B (R, G, and B are the values of the three primary colors, and C₁, C₂, and C₃ are weighting coefficients).
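As an illustrative sketch only, the coarse pixel value a(x, y) for one frame could be computed as below. Averaging over square blocks is assumed here as the aggregation, and the function names, the block size, and the default weights C₁, C₂, C₃ (a common luma-style choice) are hypothetical; the text does not fix any of them.

```python
def weighted_value(r, g, b, c=(0.299, 0.587, 0.114)):
    """a = C1*R + C2*G + C3*B; the default weights are an assumed choice."""
    return c[0] * r + c[1] * g + c[2] * b


def coarse_frame(frame, block):
    """Average the weighted values over block x block areas of one RGB frame.

    frame is a list of rows, each row a list of (R, G, B) tuples.
    Edge pixels that do not fill a whole block are ignored.
    Block averaging is an assumption, not the method fixed by the text.
    """
    rows = len(frame) // block
    cols = len(frame[0]) // block
    out = []
    for by in range(rows):
        out_row = []
        for bx in range(cols):
            total = 0.0
            for y in range(by * block, (by + 1) * block):
                for x in range(bx * block, (bx + 1) * block):
                    total += weighted_value(*frame[y][x])
            out_row.append(total / (block * block))
        out.append(out_row)
    return out
```

A larger block makes the result less sensitive to slow zooms and pans, at the cost of missing small but genuine changes.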

【００４２】(2) Aggregation in the time direction. Next, for the video data that has undergone the above aggregation, the difference between each pixel of each image and the corresponding pixel of the preceding image is obtained. It is then judged whether the difference is large enough to be regarded as a scene change. To keep the processing simple, a fixed threshold is set for the magnitude of the difference. If the difference exceeds the threshold for a certain number of the coarse pixels making up the image, the image is judged to have changed there, that is, a scene change is judged to have occurred.
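The simple fixed-threshold test described here can be sketched as follows. The function name and both threshold parameters are hypothetical, and the inputs are assumed to be two coarse frames given as lists of rows of numeric values.

```python
def is_scene_change(prev, curr, diff_threshold, count_threshold):
    """Return True if enough coarse pixels changed by more than diff_threshold.

    prev and curr are coarse frames of equal size (lists of rows of numbers).
    count_threshold is the assumed "certain number" of changed coarse pixels.
    """
    changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for a_prev, a_curr in zip(row_prev, row_curr):
            if abs(a_curr - a_prev) > diff_threshold:
                changed += 1
    return changed >= count_threshold
```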

【００４３】However, even within the same scene, the magnitude of the difference between pixels depends on the content. In a scene whose frames change quickly, the difference between consecutive images is large; in a slow scene it is small. The simple processing above is therefore one option, but here the following processing is added in order to extract representative images more accurately. "Temporal difference comparison of the video": for the data a(t, x, y) of each point making up an image, the temporal difference is obtained, that is, d(t, x, y) = a(t, x, y) − a(t − Δt, x, y), where Δt is a suitable time interval. This represents the pixel-by-pixel difference (amount of change) between two images separated in time by Δt.

【００４４】This difference d(t, x, y) is obtained for each image over a predetermined period, for example one minute. Expressed as a formula, this is Equation 1 (【数１】).

【００４５】FIG. 3 shows an example of the difference Da(t) obtained in this way for, for example, one minute of aggregated video. Positions where the value is large, that is, positions CS1 and CS2 where the video difference is large, indicate some major change in the image there, and a scene change is likely at those positions. The interval between CS1 and CS2 is therefore presumed to be one scene, and a suitable number of representative images are extracted from it. This is the "temporal difference comparison of the video" named in the heading above.
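A minimal sketch of this step is given below. Equation 1 is not reproduced in this text, so the per-frame difference Da(t) is assumed here to be the sum of the absolute values of d(t, x, y) over all coarse pixels; that form, the fixed peak threshold, the choice of the middle frame as the representative image, and all function names are assumptions made for illustration.

```python
def frame_difference(prev, curr):
    """Assumed form of Da(t): sum over coarse pixels of |a(t) - a(t - dt)|."""
    return sum(abs(c - p)
               for row_p, row_c in zip(prev, curr)
               for p, c in zip(row_p, row_c))


def scene_boundaries(frames, threshold):
    """Indices t where Da(t) exceeds threshold (candidate CS1, CS2, ...)."""
    return [t for t in range(1, len(frames))
            if frame_difference(frames[t - 1], frames[t]) > threshold]


def representative_indices(frames, threshold):
    """Pick the middle frame of each presumed scene as its representative."""
    cuts = [0] + scene_boundaries(frames, threshold) + [len(frames)]
    return [(start + end - 1) // 2
            for start, end in zip(cuts, cuts[1:]) if end > start]
```

Taking more than one image per scene would only require sampling further indices between consecutive boundaries.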

【００４６】Alternatively, for each frame of the video software, attention may be paid to the video signal of, for example, a single horizontal scanning line near the center of the image, and scene changes may be detected from the differences in that signal as the picture changes. Specifically, for example, the video signal of this one horizontal scanning line is divided into N sections, and the sum or average is obtained for each section. Then, for each section, the difference from the average of the same section in the preceding picture is obtained. These differences are summed for each frame, and the positions where the sum becomes large, that is, positions corresponding to CS1 or CS2 in FIG. 3, may be taken as scene change positions in the same way as above.
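The scan-line variant can be sketched as follows, assuming the line is available as a list of sample values at least N samples long. Taking the absolute value of each section difference before summing is an assumption, and the function names are hypothetical.

```python
def section_means(line, n):
    """Split one scan line (a list of samples) into n sections and average each.

    Samples left over after n equal sections are ignored.
    """
    size = len(line) // n
    return [sum(line[i * size:(i + 1) * size]) / size for i in range(n)]


def line_difference(prev_line, curr_line, n):
    """Sum over the n sections of the absolute change in section mean."""
    prev_m = section_means(prev_line, n)
    curr_m = section_means(curr_line, n)
    return sum(abs(c - p) for p, c in zip(prev_m, curr_m))
```

Because only one line of N averages is kept per frame, this needs far less work than comparing whole coarse frames.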

【００４７】Returning to FIG. 1, each representative image extracted in this way is supplied to the summary storage unit 21 and the character data extraction unit 23. The summary storage unit 21 holds the supplied set of representative images as the summary corresponding to the video software. So that the relationship between each stored representative image and its original video can be identified, the date of extraction and the hour and minute at which recording of the original video started are recorded in the file. In addition, each representative image is tagged with information on the position from which it was extracted. This can be expressed, for example, as the number of seconds elapsed from the recording start time of the original video, or as a frame number.
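One way to hold this position information is sketched below. The record layout, the field names, and the nominal frame rate of 30 frames per second are all assumptions for illustration; the text only requires that the date, the recording start time, and a position such as elapsed seconds or a frame number be kept.

```python
from dataclasses import dataclass

ASSUMED_FPS = 30.0  # assumption: nominal frame rate, not stated in the text


@dataclass(frozen=True)
class RepresentativeImage:
    recorded_on: str   # date recorded in the summary file, e.g. "1997-07-17"
    start_time: str    # hour and minute at which recording started
    frame_number: int  # frames counted from the recording start

    def offset_seconds(self, fps: float = ASSUMED_FPS) -> float:
        """Seconds elapsed from the recording start of the original video."""
        return self.frame_number / fps
```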

【００４８】The character data extraction unit 23 detects character data from the supplied representative images, or directly from the original video software. Character data is contained in video software in several forms, so detection is performed in a manner suited to the form. Since a single piece of video software may contain character data in more than one form, it is advisable to use more than one detection method.

【００４９】(1) Characters embedded in the picture as an image, that is, as image or dot data. In this case, characters are detected by applying kanji OCR (optical character recognition). Characters are generally displayed toward the bottom or the left or right edge of the image. If somewhat lower accuracy is acceptable, only these regions may be checked for characters, which simplifies and speeds up the processing.
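Restricting the check to the edges of the picture could look like the sketch below, which only computes the rectangles to pass to whatever character recognizer is used. The function name and the 20 percent band width are hypothetical; the text does not say how wide the inspected regions should be.

```python
def caption_regions(width, height, band=0.2):
    """Return (left, top, right, bottom) boxes for the bottom, left and right bands.

    band is the assumed fraction of the picture checked along each edge.
    """
    band_w = round(width * band)
    band_h = round(height * band)
    return {
        "bottom": (0, height - band_h, width, height),
        "left": (0, 0, band_w, height),
        "right": (width - band_w, 0, width, height),
    }
```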

【００５０】(2) Character codes supplied with the video. As with teletext subtitles, character data (subtitle data) may be supplied as character codes in the frame signal during the vertical blanking (retrace) interval, as information carried in the gaps of the picture. This case is simple: the character codes are stored in order in the character data storage unit 28 together with the position information of the video. As a further development, for video software carrying such character codes, representative images need not be extracted, and extraction may rely only on the character codes and the video position information.

【００５１】Returning again to FIG. 1, the user operation unit 22 consists of a keyboard, a mouse, and the like. In response to the viewer's operation of these, the search control unit 24 retrieves the character data stored in the character data storage unit 28, retrieves the summary and original video software corresponding to the selected character data, displays them on the display 26, and so on. Reference numeral 27 denotes a keyword holding unit, which holds keywords entered in advance by the viewer. The search control unit 24 also registers these keywords and collates them against the data stored in the character data storage unit 28.

【００５２】As described above, the original video, the summaries, and the detected character data are each given indexes such as the broadcast date, the hour, minute, and second, and the frame number, and are stored in the original video storage unit 12, the summary storage unit 21, or the character data storage unit 28. Together these constitute a database of video software, and it can be used in many ways. Although this partly repeats the outline of operation given earlier, examples are as follows.

【００５３】(A) The viewer enters keywords and searches whenever needed. This is the most common database search method. Specifically: (1) When a topic arises that the viewer wants to look into, for work or for personal reasons, the viewer thinks of several words likely to express the topic concisely and types them in from the user operation unit 22.

【００５４】(2) In response, the search control unit 24 collates the keywords against the character data stored in the character data storage unit 28 and checks whether matching data exists. Collation is performed in whatever form the viewer wants, such as exact match or partial match. (3) If matching summaries or video software exist, the search control unit 24 lists their table-of-contents information on the display 26. For items held on a medium such as a DVD, the volume name or title name identifying the DVD, the track number, and so on are listed on the display 26.
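The collation in step (2) can be sketched as below, assuming the stored character data is available as a list of (text, position) pairs. The function name, the pair layout, and the mode names are hypothetical.

```python
def search(entries, keyword, mode="partial"):
    """Return the entries whose text matches keyword.

    entries: list of (text, position) pairs from the character data store.
    mode: "exact" for a complete match, "partial" for a substring match.
    """
    if mode == "exact":
        return [e for e in entries if e[0] == keyword]
    if mode == "partial":
        return [e for e in entries if keyword in e[0]]
    raise ValueError(f"unknown mode: {mode}")
```

The positions in the returned pairs are what would let the matching summaries and original video be looked up for the list display in step (3).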

【００５５】(4) The viewer selects the desired item from this list. Table-of-contents information shown in the list, such as broadcast time and DVD title, makes it easier to reach the desired summary. The search control unit 24 retrieves the summary corresponding to the character data selected by the viewer from the summary storage unit 21 and displays it on the display 26.

【００５６】The summary can be displayed in various ways. For example, the representative images may be displayed one at a time each time the viewer presses a key, like turning pages. They may also be displayed one after another automatically at a moderate speed. Video software naturally has a storyline, so viewers can grasp the content of the summary surprisingly accurately even when the images are shown in rapid succession. It is also useful to provide functions such as slow playback and review so that they can be used when needed.

【００５７】(5) After viewing a suitable number of the summary images, or viewing them to the end, if the viewer judges that the video software or a part of it is related, or seems likely to be related, to the desired topic, the viewer instructs playback of the original video software from the position where the representative image was extracted. A summary is one-tenth to one-hundredth the size of the original video software and takes little time to view. This two-stage approach therefore shortens the time needed to reach the desired original video.

【００５８】In response to the viewer's instruction, the search control unit 24 sequentially reads the original video software from the summary extraction position onward out of the original video storage unit 12 and displays it on the display 26. This is done in the same way as playback on a video tape deck or other ordinary video equipment. Here too, fast-forward, review, pause, and other keys make the device easier to use.

【００５９】(B) Keywords are registered in advance and matching items are listed. In this case, keywords representing the matters the viewer wants to keep watch on are stored beforehand in the keyword holding unit 27. When the viewer performs a check operation, for example once a day, the table-of-contents information of the summaries or original video software containing character data that matches the registered keywords is listed on the display 26.

【００６０】The kinds of information we need, or want to keep watch on, for work or personal reasons are limited to some extent, and typing the same keywords every day is rather tedious. Keywords of this kind that are used on a fixed basis are therefore registered in advance in the keyword holding unit 27, and the matching items are listed on the display 26 when the viewer wants them. For example, a viewer who wants to keep track of video software concerning patents registers keywords such as "patent", "industrial property", and "Patent Office" in the keyword holding unit 27 to capture the relevant video software. Registering the names of politicians, celebrities, and movie stars lets the viewer quickly find the video of interest.
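Usage (B) amounts to running every registered keyword against the stored character data on request. A minimal sketch follows, assuming (text, position) pairs as the stored data and case-sensitive substring matching; the function name and the data layout are hypothetical.

```python
def daily_check(registered_keywords, entries):
    """Map each registered keyword to the positions of entries containing it.

    registered_keywords: keywords held in advance (keyword holding unit 27).
    entries: list of (text, position) pairs from the character data store.
    Keywords with no match are omitted from the result.
    """
    hits = {}
    for keyword in registered_keywords:
        matched = [pos for text, pos in entries if keyword in text]
        if matched:
            hits[keyword] = matched
    return hits
```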

【００６１】(C) Listing the stored character data. For example, the stored character data is displayed on the display 26 in Japanese syllabary (gojūon) order. The viewer may select the desired item from the list, and the corresponding summary or original video software is then displayed.

【００６２】After checking summaries and video software by these methods, if there are representative images, summaries, or video software the viewer wants to keep, the viewer operates a "save" key or the like (not shown). In response, the search control unit 24 stores them in the desired software storage unit 29, so that they can be consulted later whenever needed.

【００６３】The audio may be reproduced in any way. When the original video software is played, for example, the audio is simply played at the same speed. A summary, however, cannot be played with the audio as it is. Instead, for example, a few seconds of audio data around each representative image extraction position are taken out and played at the same speed as the original video or slightly faster. Alternatively, the audio may be digitized, thinned out as appropriate, stored, and then played back.

【００６４】Modifications are described next. In summary playback, several representative images, say four or six, may be displayed together on a single screen of the display 26. As noted above, the representative images carry a storyline, so the viewer can readily grasp the content even when several are shown on the display 26 at once. Given that there is a storyline, this arrangement may even be easier to follow.

【００６５】If, for example, X representative images are displayed on the screen at a time, the summary can, put simply, be played in 1/X of the time needed when they are shown one at a time. Because the required time is shorter, each display screen that combines X representative images into one may be shown for longer, which makes the video software still easier to understand.
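The time saving can be checked with a short calculation. The function name and the figures in the usage note are hypothetical; the only relationship taken from the text is that showing X images per screen divides the number of screens by X.

```python
import math


def summary_playback_seconds(num_images, per_screen, seconds_per_screen):
    """Total time to show a summary when per_screen images share one screen."""
    screens = math.ceil(num_images / per_screen)
    return screens * seconds_per_screen
```

With 60 representative images at one second per screen, showing them one at a time takes 60 seconds and six at a time takes 10 seconds; even if each six-image screen is held for three seconds, the total of 30 seconds is still half the original time.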

【００６６】The present invention may be implemented as a dedicated device or as an application program for a personal computer. A dedicated chip may also be produced and built into a personal computer, a DVD player, a video tape deck, a game console, or the like.

【００６７】For a single piece of video software, representative images triggered by changes in different elements may be extracted and stored as separate summaries. When checking the recorded content, the viewer can play each summary and carry out the subsequent checking with whichever is easier to understand.

【００６８】The original video storage unit 12 and the summary storage unit 21 may use household video tape or other magnetic tape. If the recording time is insufficient, multiple decks (drives) are provided. Storing the original video software requires considerable storage capacity, so the cyclic recording scheme described earlier seems suitable for it. The summaries and character codes, on the other hand, need little storage and may be kept for quite a long time.
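Cyclic recording of the original video, with the oldest material overwritten once capacity is reached, can be sketched with a bounded queue. The class name, the notion of a fixed number of equally sized segments, and the use of `collections.deque` are assumptions for illustration and are not the storage mechanism described in the text.

```python
from collections import deque


class CyclicStore:
    """Keeps only the most recent `capacity` segments of original video."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest item when full.
        self._segments = deque(maxlen=capacity)

    def record(self, segment):
        self._segments.append(segment)

    def segments(self):
        return list(self._segments)
```

Summaries and character data would be kept in a separate, unbounded store, since they need far less space.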

【００６９】Because the original video software is lost through overwriting, checking becomes somewhat harder, but even with only the character data and summaries retained, it is possible to grasp to some extent what the video software matching a keyword contained. With these as clues, the viewer can also inquire with the broadcaster or data supplier. Even in this form of implementation, therefore, the device is highly useful as a database.

【００７０】Summaries and character codes can also increase search noise, so those that are not needed are better not stored. To this end, the viewer may inspect what has been accumulated over several days, keep only what the viewer decides to save, and delete the rest on the spot. (Search noise: data that matches the keyword but is irrelevant to the purpose of the search.)

【００７１】[0071]

[Effects of the Invention] As described above, the present invention takes note of the fact that the characters present in video software concisely represent its content, and extracts and stores those characters so that they can be used as a database. Whenever the need arises, the viewer can therefore easily retrieve the desired item from the large volume of video software supplied, using whatever search form is preferred, for example everyday words as keywords, and can put it to use for work or personal purposes.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an example of an embodiment of the present invention.

FIG. 2 is a conceptual diagram showing examples of video software, representative images, and character data.

FIG. 3 is a waveform diagram showing the relationship between scene changes and the difference between the video data at time t and the video data at time t + Δt.

[Explanation of symbols]

100: video search device
11: input interface
12: original video storage unit
13: change position detection unit
14: representative image extraction unit
16: TV broadcast receiving unit
17: DVD player
18: video tape deck
19: Internet connection unit
20: CD-ROM drive
21: summary storage unit
22: user operation unit
23: character data extraction unit
24: search control unit
26: display
27: keyword holding unit
28: character data storage unit
29: desired software storage unit
CS: scene change
Da(t): difference between the video at time t and the video at time t + Δt
PC: scene center
L: direct extraction of character data
VD: representative image
VE: summary
VS: video software

Claims

[Claims]

1. A video search device comprising: character data extraction means for extracting character data from video software; character data storage means for storing the character data together with information on its extraction position; selection means for selecting desired character data from the stored character data; and extraction means for extracting, from the video software, an image related to the selected character data.

2. The video search device according to claim 1, wherein the character data extraction means extracts the character data from a representative image extracted from the video software.

3. The video search device according to claim 1, wherein the extraction means extracts a representative image related to the selected character data.