JP2002171481A

JP2002171481A - Video processing apparatus

Info

Publication number: JP2002171481A
Application number: JP2000369068A
Authority: JP
Inventors: Hiroko Ida; 裕子井田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2000-12-04
Filing date: 2000-12-04
Publication date: 2002-06-14

Abstract

PROBLEM TO BE SOLVED: To make the job of attaching an index to a video more efficient and thereby reduce the time and cost the job requires. SOLUTION: A video processing apparatus that edits and retrieves video is provided with a video input section 101 that receives a structured video, an audio recognition section 102 that recognizes audio in the video, a keyword extraction section 103 that extracts keywords from the recognized audio, and a keyword recording section 104 that records each keyword together with its correspondence to the video.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a video processing apparatus that further edits or searches structured video.

[0002]

[Prior art] In recent years, advances in recording media and communication technology have led to the wide distribution of digital data containing moving images and audio (referred to as "video" in this specification). As video has spread, a need has arisen for systems that do not simply display the video recorded on a medium in chronological order but can extract, browse, and edit the necessary portions of it.

[0003] Some current systems for extracting, browsing, and editing video describe the content expressed by the video as an index, and search, reconstruct, and re-edit the video based on the index information. In this specification, the process of classifying a video into groups to which indexes are attached is referred to as structuring the video. Structuring is often performed by classifying the video hierarchically.

[0004] There are two ways to index a video: a physical description method and a linguistic description method. The physical description method describes content that follows rules, such as scene changes and camera angles. The linguistic description method describes in text, for example, the names of subjects in the video and their actions.

[0005]

[Problem to be solved by the invention] When indexes are attached by the linguistic description method, the simplest approach is to attach them manually. However, manually attaching linguistic-description indexes to a large volume of video involves a heavy workload, with the drawback that the time and cost of the work become enormous. There is therefore a demand for a video processing apparatus that can attach content-reflecting indexes to video more efficiently.

[0006] The present invention has been made in view of the above, and its object is to provide a video processing apparatus that makes the work of indexing video more efficient and reduces the time and cost that work requires.

[0007]

[Means for solving the problem] To solve the above problem and achieve the object, the video processing apparatus according to the invention of claim 1 is a video processing apparatus that edits and searches structured video, comprising: video input means for inputting the structured video; voice recognition means for recognizing the audio of the video input to the video input means; keyword extraction means for extracting keywords from the audio recognized by the voice recognition means; and keyword recording means for recording the keywords together with their correspondence to the structured video.

[0008] According to the invention of claim 1, keywords can be attached to a structured video automatically, based on the audio contained in the video. Compared with attaching keywords manually (an operator watching the video and assigning keywords), this greatly reduces the load on the operator and also shortens the working time.

[0009] The video processing apparatus according to the invention of claim 2 further comprises: title selection means for selecting a title of the structured video from among the keywords extracted by the keyword extraction means; and title recording means for recording the title together with its correspondence to the structured video.

[0010] According to the invention of claim 2, a title can be attached to a structured video automatically, based on the keywords.

[0011] In the video processing apparatus according to the invention of claim 3, the title selection means selects as the title a keyword that appears with high frequency in a predetermined unit among the keywords extracted by the keyword extraction means.

[0012] According to the invention of claim 3, more of the keywords can be reflected in the title of the video.

[0013] The video processing apparatus according to the invention of claim 4 further comprises: candidate registration means for registering keyword candidates or title candidates in advance; and candidate selection means for selecting, when the information contained in the audio of the structured video includes information that matches a keyword or title registered in the candidate registration means, that information as a keyword or title.

[0014] According to the invention of claim 4, keywords and titles can be selected from keyword and title candidates registered in advance.

[0015] In the video processing apparatus according to the invention of claim 5, the registration means registers keyword candidates or title candidates together with a selection priority.

[0016] According to the invention of claim 5, title candidate phrases can be registered in the registration means together with a selection priority.

[0017] In the video processing apparatus according to the invention of claim 6, the keyword recording means records keywords as text data or audio data.

[0018] According to the invention of claim 6, the keyword recording means can record keywords not only as text data (phrases) but also as audio data.

[0019] In the video processing apparatus according to the invention of claim 7, the title recording means records titles as text data or audio data.

[0020] According to the invention of claim 7, the title recording means can record titles not only as text data (phrases) but also as audio data.

[0021]

[Embodiments of the invention] Preferred Embodiments 1 to 4 of the video processing apparatus according to the present invention are described in detail below with reference to the accompanying drawings.

(Embodiment 1) FIG. 1 is a block diagram illustrating the configuration of the video processing apparatus according to Embodiment 1 of the present invention. The illustrated configuration is a video processing apparatus that edits and searches structured video. It comprises a video input unit 101 that inputs the structured video, a voice recognition unit 102 that recognizes the audio of the input video, a keyword extraction unit 103 that extracts keywords from the recognized audio, and a keyword recording unit 104 that records the keywords together with their correspondence to the structured video.

[0022] The structured video is input to the video processing apparatus of this embodiment through the video input unit 101. The structured video is composed of a plurality of segments. The voice recognition unit 102 recognizes the audio contained in the video using voice recognition technology. The keyword extraction unit 103 extracts keywords from the recognized audio for each segment.
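The data flow of Embodiment 1 can be sketched roughly as follows. This is only an illustrative sketch: the `Segment` and `KeywordRecord` types, the `index_video` function, and the idea that the recognizer's output is already available as a list of words per segment are all assumptions made for the example, not details given in the specification. The extraction rule is passed in as a function, because the specification leaves it open.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Segment:
    """One segment of the structured video (hypothetical representation)."""
    segment_id: str
    transcript: list[str]  # stand-in for the voice recognition unit's output


@dataclass
class KeywordRecord:
    """A keyword stored together with the segment it corresponds to."""
    segment_id: str
    keyword: str


def index_video(
    segments: list[Segment],
    extract: Callable[[list[str]], list[str]],
) -> list[KeywordRecord]:
    """Extract keywords per segment and record each with its segment ID."""
    records = []
    for segment in segments:
        for keyword in extract(segment.transcript):
            records.append(KeywordRecord(segment.segment_id, keyword))
    return records


# Hypothetical input: two segments whose audio has already been recognized.
segments = [
    Segment("s1", ["today", "the", "election", "results"]),
    Segment("s2", ["expect", "rain", "tomorrow"]),
]
wanted = {"election", "rain"}  # placeholder extraction rule for the example
records = index_video(segments, lambda words: [w for w in words if w in wanted])
```

Keeping the segment ID in every record is what preserves the "correspondence to the video" that the keyword recording unit 104 is described as storing.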

[0023] Keywords may be extracted, for example, as described later, by recording keyword candidates in the video processing apparatus in advance and, when a phrase contained in the audio matches a recorded keyword candidate, extracting that phrase as a keyword. Alternatively, for example, phrases that are recognized many times in the audio may be used as keywords.
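The two extraction strategies mentioned in this paragraph can be sketched as below. The function names and the `min_count` threshold are assumptions for illustration; the specification does not say how many recognitions count as "many", nor how the recognized audio is split into phrases.

```python
from collections import Counter


def keywords_by_candidates(words, candidates):
    """Strategy 1: keep recognized phrases that match a pre-registered candidate."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return [w for w in dict.fromkeys(words) if w in candidates]


def keywords_by_frequency(words, min_count=2):
    """Strategy 2: keep phrases recognized at least min_count times.

    The threshold is an assumed parameter, not taken from the specification.
    """
    counts = Counter(words)
    return [w for w, n in counts.most_common() if n >= min_count]


words = ["budget", "vote", "budget", "tax", "vote", "budget"]
by_candidates = keywords_by_candidates(words, {"tax", "vote"})
by_frequency = keywords_by_frequency(words)
```

With this input, candidate matching keeps "vote" and "tax", while the frequency rule keeps "budget" and "vote"; the two strategies can therefore give different keywords for the same segment.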

[0024] According to Embodiment 1 described above, keywords can be attached automatically to the segments that make up a structured video, based on the audio contained in the video. Compared with attaching keywords manually (an operator watching the video and assigning keywords), Embodiment 1 greatly reduces the load on the operator and also shortens the working time.

[0025] (Embodiment 2) Next, the video processing apparatus of Embodiment 2 is described. FIG. 2 is a block diagram illustrating the video processing apparatus of Embodiment 2. Components in FIG. 2 that are the same as those in FIG. 1 are given the same reference numerals, and their description is partly omitted.

[0026] The video processing apparatus of Embodiment 2 further comprises a title selection unit 201 that selects a title of the structured video from among the keywords extracted by the keyword extraction unit 103, and a title recording unit 202 that records the selected title together with its correspondence to the structured video.

[0027] In this video processing apparatus, the audio that is input to the video input unit 101 together with the video and recognized by the voice recognition unit 102 has keywords extracted from it by the keyword extraction unit 103. The extracted keywords are input to the title selection unit 201, which selects a title from among them.

[0028] A title can be selected, for example, by choosing a keyword with a high extraction frequency among the extracted keywords. When the title is selected based on keyword extraction frequency, an appearance frequency calculation unit may be added to the configuration of FIG. 2, as shown in FIG. 3, to calculate the extraction frequency of each keyword. Alternatively, as described later, title candidates may be recorded in the video processing apparatus in advance, and if any keyword matches a recorded title candidate, that keyword may be extracted as the title.
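Frequency-based title selection might look like the following sketch. It assumes that the "frequency" is the number of segments in which a keyword was extracted and that ties go to the keyword seen first; both choices, and the `select_title` name, are assumptions for the example rather than something the specification fixes.

```python
from collections import Counter
from typing import Optional


def select_title(segment_keywords: dict[str, list[str]]) -> Optional[str]:
    """Pick the keyword extracted in the largest number of segments.

    segment_keywords maps a segment ID to the keywords extracted for it.
    Ties are broken in favor of the keyword encountered first (assumed rule).
    """
    counts = Counter()
    for keywords in segment_keywords.values():
        # Count each keyword at most once per segment, preserving order.
        for keyword in dict.fromkeys(keywords):
            counts[keyword] += 1
    if not counts:
        return None
    return max(counts, key=counts.get)


title = select_title({
    "s1": ["election", "vote"],
    "s2": ["election", "weather"],
    "s3": ["vote", "election"],
})
```

Here "election" is extracted in all three segments and becomes the title. Counting per segment is one way to favor a title that reflects many segments; counting raw occurrences would be an equally plausible reading.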

[0029] According to Embodiment 2 described above, a title can be attached to a structured video automatically, based on the keywords. Compared with attaching a title manually (an operator watching the video and assigning a title), Embodiment 2 greatly reduces the load on the operator and also shortens the working time.

[0030] Embodiment 2 also allows the keywords of more segments to be reflected in the title of the video. A title that reflects the content of the segments making up the video can therefore be attached to the video.

[0031] (Embodiment 3) Next, the video processing apparatus of Embodiment 3 is described. FIG. 4 is a block diagram illustrating the video processing apparatus of Embodiment 3. Components in FIG. 4 that are the same as those in FIG. 1 are given the same reference numerals, and their description is partly omitted.

[0032] The video processing apparatus of Embodiment 3 further comprises a registration unit 403 in which keyword candidates (keyword candidate phrases) or title candidates (title candidate phrases) are registered in advance, a keyword/title collation unit 401, and a keyword/title recording unit 402. When the phrases contained in the audio of the structured video include a phrase that matches a keyword candidate phrase or a title candidate phrase, the keyword/title collation unit 401 extracts a phrase that matches a keyword candidate phrase as a keyword, and extracts a phrase that matches a title candidate phrase as a title (the title may instead be an extracted keyword that matches a title candidate phrase).

[0033] In this video processing apparatus, the audio that is input to the video input unit 101 together with the video and recognized by the voice recognition unit 102 is collated by the keyword/title collation unit 401 against the keywords registered in the registration unit 403. If the audio contains a phrase that matches a keyword candidate phrase in the registration unit 403, the matching phrase is extracted as a keyword of the segment that contains it.

[0034] In Embodiment 3, the title is extracted from among the extracted keywords. Accordingly, when one of the extracted keywords matches a title candidate phrase, the keyword/title collation unit 401 uses that keyword as the title.

[0035] The keywords and title extracted by the keyword/title collation unit 401 are output to the keyword/title recording unit 402 and recorded there. Note that when the audio recognized by the voice recognition unit 102 contains a phrase that matches a title candidate phrase, the video processing apparatus of the present invention can also use that phrase as the title even if it has not been extracted as a keyword.
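The collation step of Embodiment 3 can be sketched as follows. The `collate` function, its parameters, and the `allow_title_from_speech` flag are hypothetical names introduced for the example; the flag models the variation in which a title candidate found in the recognized audio is used even though it was not extracted as a keyword.

```python
def collate(words, keyword_candidates, title_candidates,
            allow_title_from_speech=False):
    """Match recognized phrases against registered candidates.

    Returns (keywords, title). By default the title is taken from the
    extracted keywords; with allow_title_from_speech=True it may be any
    recognized phrase that matches a title candidate.
    """
    unique_words = list(dict.fromkeys(words))  # de-duplicate, keep order
    keywords = [w for w in unique_words if w in keyword_candidates]
    pool = unique_words if allow_title_from_speech else keywords
    title = next((w for w in pool if w in title_candidates), None)
    return keywords, title


words = ["morning", "news", "election", "tokyo"]
keyword_candidates = {"election", "tokyo"}
title_candidates = {"news", "election"}

default_result = collate(words, keyword_candidates, title_candidates)
speech_result = collate(words, keyword_candidates, title_candidates,
                        allow_title_from_speech=True)
```

In the default mode the title is "election", since it is the only extracted keyword that is also a title candidate. With the flag set, "news" becomes the title although it was never extracted as a keyword.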

[0036] According to Embodiment 3 described above, selecting keywords and titles from candidates registered in advance allows the operator's intent to be reflected in the keywords and titles.

[0037] In Embodiment 3, keyword candidate phrases and title candidate phrases can also be registered in the registration unit 403 together with a selection priority. This allows the operator's intent to be reflected in the keywords and titles even more closely.
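Priority-based selection could be sketched like this. The specification does not say how priorities are encoded, so the example assumes a registry that maps each candidate phrase to a rank where a lower number means a higher priority; `select_by_priority` is a hypothetical name.

```python
def select_by_priority(words, registry):
    """Return the matching candidate with the best (lowest) priority rank.

    registry maps candidate phrase -> rank; rank 1 is assumed to be the
    highest priority. Returns None when no recognized phrase is registered.
    """
    matches = [w for w in dict.fromkeys(words) if w in registry]
    if not matches:
        return None
    return min(matches, key=registry.get)


# Hypothetical registry as an operator might set it up in advance.
registry = {"election": 1, "weather": 2, "sports": 3}
chosen = select_by_priority(["sports", "weather", "sports"], registry)
```

Although "sports" is recognized twice, "weather" is chosen because the operator ranked it higher, which is how the priority lets the operator's intent override raw frequency.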

[0038] Further, in Embodiments 1 to 3, the keyword recording unit 104, the title recording unit 202, and the keyword/title recording unit 402 can record keywords and titles not only as text data (phrases) but also as audio data. In this way, keywords and titles can be attached to the video as audio as well as phrases.

[0039]

[Effects of the invention] As described above, compared with attaching keywords manually (an operator watching the video and assigning keywords), the invention of claim 1 greatly reduces the load on the operator and also shortens the working time. The invention of claim 1 can thus be said to make the work of indexing video more efficient and to reduce the time and cost that work requires.

[0040] Compared with attaching a title manually (an operator watching the video and assigning a title), the invention of claim 2 greatly reduces the load on the operator and also shortens the working time. The invention of claim 2 can thus be said to make the work of indexing video more efficient and to reduce the time and cost that work requires.

[0041] The invention of claim 3 has the effect that a title reflecting the keywords attached to the video can be attached to the video.

[0042] The invention of claim 4 allows the operator's intent to be reflected in the keywords and titles.

[0043] The invention of claim 5 allows the operator's intent to be reflected in the keywords and titles even more closely.

[0044] The invention of claim 6 allows keywords to be attached to the video as audio as well as phrases.

[0045] The invention of claim 7 allows titles to be attached to the video as audio as well as phrases.

[Brief description of the drawings]

FIG. 1 is a block diagram illustrating the configuration of the video processing apparatus according to Embodiment 1 of the present invention.

FIG. 2 is a block diagram illustrating the configuration of the video processing apparatus according to Embodiment 2 of the present invention.

FIG. 3 is a block diagram illustrating the configuration of another video processing apparatus according to Embodiment 2 of the present invention.

FIG. 4 is a block diagram illustrating the configuration of the video processing apparatus according to Embodiment 3 of the present invention.

[Explanation of symbols]

101 video input unit
102 voice recognition unit
103 keyword extraction unit
104 keyword recording unit
201 title selection unit
202 title recording unit
401 keyword/title collation unit
402 keyword/title recording unit
403 registration unit

Continued from the front page: (51) Int. Cl.⁷, identification symbol, FI, theme code (reference): G10L 15/10, G10L 3/00 551G, 15/00, H04N 5/91 N, H04N 5/93, 5/93 Z

Claims

[Claims]

1. A video processing apparatus for editing and retrieving a structured video, comprising: video input means for inputting the structured video; voice recognition means for recognizing audio of the video input to the video input means; keyword extraction means for extracting a keyword from the audio recognized by the voice recognition means; and keyword recording means for recording the keyword together with its correspondence to the structured video.

2. The video processing apparatus according to claim 1, further comprising: title selection means for selecting a title of the structured video from among the keywords extracted by the keyword extraction means; and title recording means for recording the title together with its correspondence to the structured video.

3. The video processing apparatus according to claim 1, wherein the title selection means selects as the title a keyword that appears with high frequency in a predetermined unit among the keywords extracted by the keyword extraction means.

4. The video processing apparatus according to claim 1, further comprising: candidate registration means for registering keyword candidates or title candidates in advance; and candidate selection means for selecting, when information contained in the audio of the structured video includes information that matches a keyword or title registered in the candidate registration means, that information as a keyword or title.

5. The video processing apparatus according to claim 4, wherein said registering means registers keyword candidates or title candidates with selection priorities.

6. The video processing apparatus according to claim 1, wherein the keyword recording means records the keyword as text data or audio data.

7. The video processing apparatus according to claim 2, wherein the title recording means records the title as text data or audio data.