JP3537753B2

JP3537753B2 - Editing processing device and storage medium storing editing processing program

Info

Publication number: JP3537753B2
Application number: JP2000272596A
Authority: JP
Inventors: 敦西土
Original assignee: 株式会社ジャストシステム
Priority date: 2000-09-08
Filing date: 2000-09-08
Publication date: 2004-06-14
Anticipated expiration: 2020-09-08
Also published as: JP2002084492A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an editing processing device and a storage medium storing an editing processing program, and more particularly to a video editing processing device capable of attaching a thumbnail that appropriately represents the content, and to a storage medium storing an editing processing program.

[0002]

[Prior Art] Conventionally, in editing processing performed by an editing processing device, or by the editing processing program of a storage medium storing such a program, a thumbnail such as a still image is attached to the image data of each group of moving images delimited by content or by a predetermined time. Boundaries between the contents of a moving image are detected as scene breaks. A scene break takes one of three forms: a black frame containing no image (blackout) is inserted, as shown in FIG. 7(a); the scene changes abruptly without a blackout, as shown in FIG. 7(b); or the scene changes smoothly, little by little, owing to a special effect or to the imaging device such as a camera, as shown in FIG. 7(c). Conventional editing processing detects these scene breaks and thereby identifies a series of scenes as one group of moving images that is coherent in content.
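The first two kinds of scene break can be illustrated with a minimal Python sketch. It is not taken from the patent: the frame representation (a flat sequence of grayscale values), the function names, and both thresholds are assumptions chosen for illustration. Gradual transitions of the kind in FIG. 7(c) are deliberately not handled, so a slow fade passes undetected.

```python
from typing import List, Sequence

Frame = Sequence[int]  # assumed representation: grayscale pixels, 0 (black) to 255


def mean_level(frame: Frame) -> float:
    """Average brightness of one frame."""
    return sum(frame) / len(frame)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel brightness change between two frames of equal size."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_scene_starts(frames: List[Frame],
                      black_level: float = 16.0,
                      cut_threshold: float = 40.0) -> List[int]:
    """Return the indices of frames that start a new scene.

    black_level and cut_threshold are hypothetical tuning values.
    """
    starts = [0] if frames and mean_level(frames[0]) >= black_level else []
    for i in range(1, len(frames)):
        prev_black = mean_level(frames[i - 1]) < black_level
        cur_black = mean_level(frames[i]) < black_level
        if cur_black:
            continue  # blackout frames do not start a scene
        if prev_black:
            starts.append(i)  # FIG. 7(a): first picture frame after a blackout
        elif mean_abs_diff(frames[i - 1], frames[i]) > cut_threshold:
            starts.append(i)  # FIG. 7(b): abrupt change without a blackout
    return starts
```

Because a fade changes each frame only slightly, the frame-to-frame difference stays under the threshold, which is one way to see why smooth transitions are harder to detect automatically.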

[0003] From the image data of a content-coherent group identified in this way, or of a group delimited by time, the temporally first image data is adopted as the thumbnail image data. The editor refers to these thumbnails when performing operations such as joining the image data of the groups or changing their order.

[0004]

[Problem to Be Solved by the Invention] However, when the first image of a group of moving images is used as the thumbnail as described above, the thumbnail does not always appropriately represent the content of the group. This makes the content hard to identify during editing and can make editing operations less smooth. In a news broadcast, for example, an announcer or reporter is often shown in close-up at the start of each topic, and video or background related to the topic is displayed only after he or she has begun reading the script. Consequently, when each topic is treated as one group, every group receives a thumbnail of the announcer or reporter regardless of its content. Outside news broadcasts, too, the most important part of the content may of course fail to become the thumbnail. Furthermore, with smooth scene transitions such as fade-ins, wipes, and camera zooms or tilts, scene breaks may not be recognized correctly by automatic detection, so that an image in mid-transition, or the last image of the preceding scene, may become the thumbnail.

[0005] One conceivable way of making the thumbnail an image that appropriately represents the content is to display every frame of a group in a list and have the operator select the thumbnail manually. However, each group often contains a large number of frames, and listing them and choosing among them is laborious.

[0006] The present invention has been made to solve the above problems. Its object is to provide a video editing processing device with which a thumbnail appropriately representing the content can be attached easily, and a storage medium storing an editing processing program.

[0007]

[Means for Solving the Problem] To achieve the above object, the present invention provides an editing processing device (first configuration) comprising: image data acquisition means for acquiring the image data of one group of moving images; document data acquisition means for acquiring document data obtained by converting the audio data of the audio associated with the moving images; segmentation means for dividing the document data acquired by the document data acquisition means into segments of a predetermined unit; importance calculation means for performing morphological analysis of the document data acquired by the document data acquisition means and calculating the importance of each segment from the important words contained in that segment; important segment acquisition means for acquiring, as an important segment, the document data of a segment whose importance calculated by the importance calculation means is high; important image data acquisition means for acquiring, as important image data, predetermined image data from the corresponding image data associated with the audio data that corresponds to the document data of the important segment; and association means for associating the important image data acquired by the important image data acquisition means with the group of image data acquired by the image data acquisition means.
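The importance calculation step can be sketched as follows in Python. This is an illustration under stated assumptions, not the patented method: whitespace tokenization with a small stop-word list stands in for morphological analysis, and a word is treated as "important" when it occurs more than once in the group. The passage above does not specify the scoring formula, so the frequency-based score and the names `STOP_WORDS`, `tokenize`, and `segment_importance` are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical stop-word list; a real system would rely on the part-of-speech
# output of a morphological analyzer instead.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "was",
              "this", "that", "with", "at", "for"}


def tokenize(text: str) -> List[str]:
    """Stand-in for morphological analysis: lowercase, drop punctuation, split."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return [w for w in cleaned.split() if w not in STOP_WORDS]


def segment_importance(segments: List[str]) -> List[float]:
    """Score each segment by the summed group-wide frequency of its important words."""
    tokens_per_segment = [tokenize(s) for s in segments]
    frequency: Dict[str, int] = Counter(
        t for tokens in tokens_per_segment for t in tokens)
    # Assumption: a word seen only once in the group is not an important word.
    return [float(sum(frequency[t] for t in tokens if frequency[t] > 1))
            for tokens in tokens_per_segment]
```

With a news-style transcript, the opening greeting scores zero while the sentences that repeat the topic words score highest, which is the behavior the first configuration relies on to move the thumbnail away from the announcer's opening shot.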

[0008] The image data acquisition means may acquire image data captured by an external imaging device, either from that imaging device or from another computer, by communication over a public line such as the Internet, a dedicated line, or a cable connection, or from various storage media via an interface. The image data acquisition means may also include an imaging unit (camera) and acquire the image data by capturing it.

The document data acquisition means may acquire document data into which the audio data accompanying the moving images has already been converted by an external speech recognition device. It obtains this document data from another computer by communication over a public line such as the Internet, a dedicated line, or a cable connection, or via various storage media. Alternatively, the document data acquisition means may itself contain speech recognition means. In that case it acquires the audio data accompanying the moving images from a recording device built into an external imaging device, or from a recording device used together with an external imaging device, by the same kinds of communication or via various storage media, and converts the acquired audio data into document data by speech recognition. It may further include an audio input unit (microphone) in addition to the speech recognition means, pick up the audio accompanying the moving images, and convert the resulting audio data into document data with the speech recognition means.

The "group" in "the image data of one group of moving images" is the unit for which a thumbnail is created. The group may be obtained automatically as a unit that is coherent in content, for example by analyzing the image data to detect scene breaks. It may also be a unit delimited automatically by a predetermined time of day or duration, or a unit delimited manually by the operator.

The unit into which the segmentation means further divides the document data corresponding to one group of image data is, for example, one sentence or a predetermined number of sentences, a phrase, a clause, or a word: a unit large enough that its meaning is not lost and the importance of each segment can be determined. The appropriate size depends on factors such as the size of the group's document before segmentation. However, when only part of the image data corresponding to the important segment, rather than all of it, is extracted as the important image, the unit is preferably as small as possible within the range in which importance can still be determined for each segment. This is because, when part of the image data is extracted from all the image data corresponding to the important segment, a small unit makes it very likely that the extracted image data appropriately represents the document of the important segment.

The important segment acquisition means may acquire the single segment with the highest importance, a predetermined number of segments in descending order of importance, or every segment whose importance is at or above a predetermined level regardless of how many there are.

The important image acquisition means may acquire, as the important image data, the whole of the corresponding image data for the important segment. It may instead acquire moving-image data of a predetermined duration from the corresponding image data, for example the one second of image data at the temporal middle. Rather than image data forming a sequence of moving images, it may also acquire the image data of a single still image. In addition, as with the image data, the audio data corresponding to the important image, and the document data converted from that audio data, may be associated with the group of image data together with the important image data.
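The three ways of choosing important segments and the options for extracting image data from a segment can be sketched in Python as below. The function names, the parameter names `top_n` and `min_score`, and the representation of a segment as a start and end time in seconds are assumptions made for illustration; importance scores are taken as already computed.

```python
from typing import List, Optional, Tuple


def select_important(scores: List[float],
                     top_n: Optional[int] = None,
                     min_score: Optional[float] = None) -> List[int]:
    """Return indices of important segments, most important first.

    With no options, only the highest-scoring segment is returned.
    top_n keeps a fixed number; min_score keeps every segment at or
    above that level, however many there are.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if min_score is not None:
        return [i for i in ranked if scores[i] >= min_score]
    if top_n is not None:
        return ranked[:top_n]
    return ranked[:1]


def middle_clip(start: float, end: float, length: float = 1.0) -> Tuple[float, float]:
    """Time range of `length` seconds centred in the segment [start, end]."""
    if end - start <= length:
        return (start, end)  # segment shorter than the clip: use all of it
    mid = (start + end) / 2
    return (mid - length / 2, mid + length / 2)


def middle_still(start: float, end: float) -> float:
    """Time of a single still frame taken at the middle of the segment."""
    return (start + end) / 2
```

Taking the middle of the segment is only one possible rule for "predetermined image data"; the text gives it as an example, and the sketch follows that example rather than claiming it is the only choice.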

[0009] In the editing processing device of the present invention described above, segments of high importance are determined from the audio corresponding to the images, and the image data of such a segment is associated, as important image data, with the whole group of images. If this important image data is used as the thumbnail of the group, the thumbnail is image data that corresponds to audio important in content and is therefore presumed to be important in content itself. A thumbnail that appropriately represents the content can thus be attached.

[0010] The editing processing device of the present invention may be the device of the first configuration in which the important image data acquisition means is important still image data acquisition means that acquires one item of still image data from the corresponding image data as the important image data (second configuration). With the second configuration, a still image is obtained as the thumbnail.

The editing processing device of the present invention may be the first or second editing processing device in which the important segment acquisition means acquires a plurality of important segments, and the important image data acquisition means comprises: candidate image data acquisition means for acquiring candidate image data from the corresponding image data for each of the plurality of important segments acquired by the important segment acquisition means; candidate image output means for outputting the acquired candidate image data as images; and selection acquisition means for acquiring the one candidate image selected from among those output by the candidate image output means, the candidate image data of the selected candidate image being used as the important image data (third configuration). With the third configuration, the operator selects one of a plurality of important image candidates. Adding the operator's judgment in this way makes it possible to identify more reliably, as the important image, an image that both appropriately represents the content of the group and is easy for the operator to understand. Moreover, because the image data corresponding to the important segments has already been selected automatically as candidates for the image representing the group, the operator can pick out one important image with little effort.

The editing processing device of the present invention may be any of the first to third editing processing devices further comprising image data dividing means for dividing the image data of moving images containing a plurality of scenes into individual scenes, the image data acquisition means acquiring the image data of each scene as the image data of one group of moving images in accordance with the division by the image data dividing means (fourth configuration). With the fourth configuration, the acquired image data is divided into a plurality of groups by the image data dividing means, and important image data is associated with each group. The image data dividing means may include scene break detection means and divide the image data at each scene break. It may instead divide the image data at predetermined time intervals. The image data dividing means may also analyze the content of the document data acquired by the document data acquisition means, divide the document into a plurality of documents such as paragraphs or chapters according to meaning, and divide the image data according to its correspondence with the document data.
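The candidate flow of the third configuration can be sketched in Python as follows. Everything here is illustrative: the `Candidate` type, the use of the segment midpoint as the candidate still, and the representation of the operator's choice as a position in the displayed list are assumptions, and actual image output is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    segment_index: int  # which important segment the still comes from
    time: float         # position of the still frame, in seconds


def build_candidates(important: List[int],
                     spans: List[Tuple[float, float]]) -> List[Candidate]:
    """One candidate still per important segment, taken at the middle of its span.

    spans[i] is the assumed (start, end) time of segment i in seconds.
    """
    return [Candidate(i, (spans[i][0] + spans[i][1]) / 2) for i in important]


def choose(candidates: List[Candidate], selected: int) -> Candidate:
    """Return the candidate the operator picked from the displayed list."""
    if not 0 <= selected < len(candidates):
        raise IndexError("selection is outside the displayed candidates")
    return candidates[selected]
```

The operator only ever sees as many stills as there are important segments, which is where the reduction in effort compared with browsing every frame comes from.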

[0011] The present invention also achieves the above object by providing a storage medium storing a computer-readable editing processing program (fifth configuration) that causes a computer to realize: an image data acquisition function of acquiring the image data of one group of moving images; a document data acquisition function of acquiring document data obtained by converting the audio data of the audio associated with the moving images; a segmentation function of dividing the document data acquired by the document data acquisition function into segments of a predetermined unit; an importance calculation function of performing morphological analysis of the document data acquired by the document data acquisition function and calculating the importance of each segment from the important words contained in that segment; an important segment acquisition function of acquiring, as an important segment, the document data of a segment whose importance calculated by the importance calculation function is high; an important image data acquisition function of acquiring, as important image data, predetermined image data from the corresponding image data associated with the audio data that corresponds to the document data of the important segment; and an association function of associating the important image data acquired by the important image data acquisition function with the group of image data acquired by the image data acquisition function.

The invention may also take the form of an editing processing program, an editing processing program transmission medium, an editing processing program carrier wave, an editing processing program signal, or a program product for realizing the image data acquisition function, the document data acquisition function, the segmentation function, the importance acquisition function, the important segment acquisition function, the important image data acquisition function, and the association function. Here, the program product includes storage media, server system computers, computer systems, and the like that realize these functions by means of the editing processing program.

The image data acquisition function may acquire image data captured by an external imaging device by communication over a public line such as the Internet, a dedicated line, or a cable connection, or via various storage media. The image data acquisition function may also use an imaging unit (camera) and acquire the image data by capturing it.

The document data acquisition function may acquire document data into which the audio data accompanying the moving images has already been converted by an external speech recognition device, by communication over a public line such as the Internet, a dedicated line, or a cable connection, or via various storage media. The document data acquisition function may instead include a speech recognition function and acquire the audio data accompanying the moving images from outside by the same kinds of communication or via various storage media. Further, in addition to the speech recognition function, audio data may be acquired from the audio accompanying the moving images through an audio input unit (microphone) and converted into document data by the speech recognition function.

The "group" in "the image data of one group of moving images" is the unit for which a thumbnail is created. The group may be obtained automatically as a unit that is coherent in content, for example by analyzing the image data to detect scene breaks. It may also be a unit delimited manually by the operator, or a unit delimited automatically by a predetermined time of day or duration.

The unit into which the segmentation function further divides the document data corresponding to one group of image data is, for example, one sentence or a predetermined number of sentences, a phrase, a clause, or a word: a unit large enough that its meaning is not lost and the importance of each segment can be determined. The appropriate size depends on factors such as the size of the group's document before segmentation. However, when only part of the image data corresponding to the important segment, rather than all of it, is extracted as the important image, the unit is preferably as small as possible within the range in which importance can still be determined for each segment. This is because, when part of the image data is extracted from all the image data corresponding to the important segment, a small unit makes it very likely that the extracted image data appropriately represents the document of the important segment.

The important segment acquisition function may acquire the single segment with the highest importance, a predetermined number of segments in descending order of importance, or every segment whose importance is at or above a predetermined level regardless of how many there are.

The important image acquisition function may acquire, as the image data corresponding to the important segment, all the image data within the important segment. It may instead acquire predetermined image data from the image data within the important segment, for example the one second of image data at the temporal middle. Rather than image data forming a sequence of moving images, it may also acquire the image data of a single still image. In addition, as with the image data, the audio data within the important segment, and the document data converted from that audio data, may be associated with the group of image data together with the image data corresponding to the important segment.

With the storage medium storing the editing processing program of the present invention, and with the editing processing program, editing processing program transmission medium, editing processing program carrier wave, editing processing program signal, or program product, the editing processing device of the first configuration can be realized by installing the program on various computers.

[0012] In the storage medium storing the editing processing program of the fifth configuration, the important image acquisition function may include an important still image acquisition function of acquiring one item of still image data from the image data within the important segment as the image data of the important image (sixth configuration).

The storage medium storing the editing processing program of the fifth or sixth configuration may be one in which the important segment acquisition function acquires a plurality of important segments, and the important image data acquisition function comprises: a candidate image data acquisition function of acquiring candidate image data from the corresponding image data for each of the plurality of important segments acquired by the important segment acquisition function; a candidate image output function of outputting the acquired candidate image data as images; and a selection acquisition function of acquiring the one candidate image selected from among those output by the candidate image output function, the candidate image data of the selected candidate image being used as the important image data (seventh configuration). This may also be embodied as the corresponding editing processing program, editing processing program transmission medium, editing processing program carrier wave, editing processing program signal, or program product.

The storage medium storing the editing processing program of any one of the fifth to seventh configurations may further provide an image data dividing function of dividing the image data of moving images containing a plurality of scenes into individual scenes, the image data acquisition function acquiring the image data of each scene as the image data of one group of moving images in accordance with the division by the image data dividing function (eighth configuration). This may likewise be embodied as the corresponding editing processing program, editing processing program transmission medium, editing processing program carrier wave, editing processing program signal, or program product.

[0013]

[Embodiments of the Invention] Preferred embodiments of the editing processing device of the present invention, and of the storage medium storing the editing processing program, are described in detail below with reference to FIGS. 1 to 6. FIG. 1 conceptually shows the configuration of one embodiment of the editing processing device of the present invention, that is, the configuration of a computer into which the editing processing program has been read from the storage medium storing it. As shown in this conceptual diagram, the editing processing device (computer) comprises input means 1, image data acquisition means 2, document data acquisition means 3, segmentation means 4, importance acquisition means 5, important segment acquisition means 6, important image data acquisition means 8, association means 9, and output means 10.

[0014] The input means 1 is used by the user to enter commands for the various processes to be performed by the editing processing device and to select data. It includes a keyboard, a mouse, a microphone, a speech recognition device for audio from the microphone, and the like. The input means 1 is also used to take in the image data to be edited, the audio data accompanying that image data, and document data obtained by converting the audio data by speech recognition, and to specify the image data to be edited. Using communication means, the input means 1 acquires the image data to be edited, together with the accompanying audio data or the document data obtained from that audio data by speech recognition, from a video camera equipped with a microphone, an external computer, an auxiliary storage device such as a CD-ROM or DVD, or another external device, either directly or via a network such as the Internet, over a wired or wireless connection.

[0015] The image data acquisition means 2 performs image data acquisition processing for acquiring the image data of one unit of moving images. In the present embodiment, the image data acquisition means 2 includes an image data division unit 21 that performs image data division processing, in which the image data of a moving image containing a plurality of scenes is divided scene by scene. The image data division unit 21 analyzes the image data designated as the editing target and acquired through the input means 1, and thereby detects blackouts and scene transitions. When a blackout or scene transition is detected, the acquired image data is regarded as containing a plurality of scenes and is divided into scenes at the detected blackouts and transitions, and each scene is treated as the image data of one unit of moving images. The image data acquisition means 2 thus acquires and outputs, one unit (scene) at a time and in order, the image data designated as the editing target by the input means 1. This image data is accompanied by time data from a timer shared with the audio data.
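The division at blackouts described above can be sketched roughly as follows. This is only an illustration under stated assumptions: each frame is reduced to a hypothetical average brightness value, the threshold `black_threshold` is an invented parameter, and gradual or abrupt transitions without black frames are not handled. The patent does not specify the image analysis actually used.

```python
from typing import List, Tuple


def split_scenes(mean_luma: List[float], fps: float,
                 black_threshold: float = 10.0) -> List[Tuple[float, float]]:
    """Split a clip into scenes at runs of near-black frames.

    mean_luma[i] is an assumed per-frame average brightness (0-255).
    Returns (start_seconds, end_seconds) per scene on the shared timer.
    The end time is exclusive, and blackout frames belong to no scene.
    """
    scenes: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the scene being built
    for i, luma in enumerate(mean_luma):
        is_black = luma < black_threshold
        if not is_black and start is None:
            start = i  # a new scene begins at the first non-black frame
        elif is_black and start is not None:
            scenes.append((start / fps, i / fps))  # blackout closes the scene
            start = None
    if start is not None:
        # the clip ended without a trailing blackout
        scenes.append((start / fps, len(mean_luma) / fps))
    return scenes
```

Because each scene carries its start and end times, the same boundaries can later be used to pick out the matching audio or document data.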

[0016] The document data acquisition means 3 performs document data acquisition processing, in which document data obtained by converting the audio data accompanying the moving image through speech recognition is acquired in correspondence with the image data. When the image data acquisition means 2 acquires the image data of a unit, the document data acquisition means 3 refers to the time data attached to that image data and acquires the document data, that is, the speech recognition result, for the audio corresponding to that unit. This document data is accompanied by time data from the timer shared with the image data. The document data acquisition means 3 includes a speech recognition unit 31 that converts audio data into document data by speech recognition. When the input means 1 has acquired audio data corresponding to the image data, the audio data is converted into document data by speech recognition, and the document data corresponding to the unit of image data produced by the image data acquisition means 2 is acquired. When the input means 1 has instead acquired document data that is already a speech recognition result, the document data corresponding to the unit of image data produced by the image data acquisition means 2 is extracted from it. The speech recognition unit 31 collates audio data such as speech waveforms against a speech recognition dictionary and converts the recognition result into document data, or further applies kana-kanji conversion to it. The kana-kanji conversion may incorporate analysis based on natural language processing. Even when the input means 1 has acquired document data that has already undergone speech recognition, the document data acquisition means 3 may reconvert it using its own analysis to obtain new document data.
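As a rough sketch of how the shared timer lets the document data for one scene be picked out, consider the following. The `TimedText` record and the rule of assigning a piece of text to the scene in which it starts are assumptions made for illustration; the patent only states that both data streams carry time data from a common timer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedText:
    """A piece of recognized text with its time range on the shared timer."""
    text: str
    start: float  # seconds
    end: float    # seconds


def document_for_scene(transcript: List[TimedText],
                       scene_start: float,
                       scene_end: float) -> List[TimedText]:
    """Return the recognized text whose start time falls inside the scene.

    Assumption: text is assigned to the scene in which it begins, so text
    straddling a scene boundary is not split.
    """
    return [t for t in transcript if scene_start <= t.start < scene_end]
```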

[0017] The classification means 4 performs classification processing, in which the document data acquired in the document data acquisition processing is divided into sections of a predetermined unit. In the present embodiment the predetermined unit is the sentence, and the classification means 4 divides the document data into sentences for each unit (each scene) acquired by the document data acquisition means 3. The importance acquisition means 5 performs importance acquisition processing, in which an importance is obtained for each section produced by the classification means 4. In the present embodiment the sections are sentences, so the importance acquisition means 5 obtains an importance for each sentence. For each sentence contained in one unit of document data, the importance acquisition means 5 performs morphological analysis and extracts candidate words (phrases), including independent words, noun phrases, and compound noun phrases. It then determines the importance f(x) of each candidate word (phrase) from its frequency of occurrence within the unit and from an evaluation function. The evaluation function may use, for example, a weighting for predetermined important words when such words have been designated in advance, and a weighting by the type of candidate, such as word, noun phrase, or compound noun phrase. The importance F(x) of each sentence is then obtained by summing the importances f(x) of the candidate words (phrases) that appear in that sentence.
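The scoring just described might be sketched as below. This is a minimal sketch under several assumptions: morphological analysis is taken as already done, so each sentence arrives as a list of (term, kind) pairs; the weights in `KIND_WEIGHT` are invented; f(x) is taken to be frequency multiplied by the weights; and each distinct term is counted once per sentence. The patent does not fix the exact form of the evaluation function.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Candidate = Tuple[str, str]  # (term, kind), assumed output of morphological analysis

# Hypothetical weights by candidate type.
KIND_WEIGHT: Dict[str, float] = {
    "word": 1.0,
    "noun_phrase": 1.5,
    "compound_noun_phrase": 2.0,
}


def term_importance(sentences: List[List[Candidate]],
                    keyword_weight: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """f(x): frequency within the unit times type weight times keyword weight."""
    keyword_weight = keyword_weight or {}
    freq = Counter(term for s in sentences for term, _ in s)
    kinds = {term: kind for s in sentences for term, kind in s}
    return {
        term: freq[term]
        * KIND_WEIGHT.get(kinds[term], 1.0)
        * keyword_weight.get(term, 1.0)
        for term in freq
    }


def sentence_importance(sentences: List[List[Candidate]],
                        keyword_weight: Optional[Dict[str, float]] = None) -> List[float]:
    """F(x): sum of f(x) over the distinct candidates appearing in each sentence."""
    f = term_importance(sentences, keyword_weight)
    return [sum(f[term] for term in {t for t, _ in s}) for s in sentences]
```

Passing a `keyword_weight` dictionary corresponds to the case where important words have been designated in advance.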

[0018] The important section acquisition means 6 performs important section acquisition processing, in which the sections of the document data that received a high importance in the importance acquisition processing are acquired as important sections. In the present embodiment an important section is an important sentence, and the important section acquisition means 6 acquires, as important sentences, those sentences with a high importance F(x) among the sentences scored by the importance acquisition means 5. In the present embodiment a predetermined number of sections is acquired in descending order of importance. The important image data acquisition means 8 performs important image data acquisition processing, in which predetermined image data is acquired as important image data from the corresponding image data, that is, the image data corresponding to an important section. In the present embodiment the important image data acquisition means 8 acquires the image data of one still image from the corresponding image data for an important sentence. The important image data acquisition means 8 includes a corresponding image data acquisition unit 7. The corresponding image data acquisition unit 7 performs corresponding image data acquisition processing, in which the image data corresponding to the document data of an important section (the corresponding image data) is acquired from the image data obtained in the image data acquisition processing. In the present embodiment the sections are sentences, and a predetermined portion of the image data acquired by the image data acquisition means 2 is associated with each sentence designated as important by the important section acquisition means 6. Both the document data acquired by the document data acquisition means 3 and the image data acquired by the image data acquisition means 2 carry times recorded by the shared timer. Using the start and end times of each section of the document data, the corresponding image data acquisition unit 7 cuts out the image data having the same start and end times and associates it with that section. For example, if the section of document data "First, today's main items." starts at 0.3 seconds and ends at 1.5 seconds on the shared timer, the image data from 0.3 seconds to 1.5 seconds on the same timer corresponds to it.
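The selection of important sentences and the lookup of their corresponding image data can be illustrated with the following sketch. The `Sentence` record, the list of per-frame timestamps, and the inclusive time comparison are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    text: str
    start: float       # seconds on the shared timer
    end: float         # seconds on the shared timer
    importance: float  # sentence importance F(x)


def top_sentences(sentences: List[Sentence], count: int) -> List[Sentence]:
    """Return the `count` most important sentences, highest importance first."""
    return sorted(sentences, key=lambda s: s.importance, reverse=True)[:count]


def corresponding_frames(frame_times: List[float], sentence: Sentence) -> List[int]:
    """Indices of frames whose timestamp lies within the sentence's time range."""
    return [i for i, t in enumerate(frame_times)
            if sentence.start <= t <= sentence.end]
```

With the example from the text, a sentence running from 0.3 to 1.5 seconds selects exactly the frames stamped between those two times.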

[0019] In the present embodiment, the important image data acquisition means 8 further includes a candidate image data acquisition unit 81, a candidate image output unit 82, and a selection acquisition unit 83. The candidate image data acquisition unit 81 performs candidate image data acquisition processing. For each of the plurality of important sections acquired by the important section acquisition means 6, it obtains the corresponding image data as the result of the corresponding image data acquisition processing, and from the corresponding image data of each important section it takes the image data of one still image as a candidate for the important image. The still image selected is the one at the midpoint between the start time and the end time of the important section. For example, for the image corresponding to the document data "First, today's main items." mentioned above, which starts at 0.3 seconds and ends at 1.5 seconds on the shared timer, the image data at 0.9 seconds on that timer becomes the image data of a candidate for the important image.
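The midpoint rule can be written out as a short sketch. The function names and the choice of the frame nearest to the midpoint (for the case where no frame is stamped exactly at that time) are assumptions, not details given in the patent.

```python
from typing import List


def midpoint(start: float, end: float) -> float:
    """Midpoint of a section's time range on the shared timer."""
    return (start + end) / 2.0


def candidate_frame(frame_times: List[float], start: float, end: float) -> int:
    """Index of the frame whose timestamp is closest to the section midpoint.

    Raises ValueError if frame_times is empty.
    """
    target = midpoint(start, end)
    return min(range(len(frame_times)),
               key=lambda i: abs(frame_times[i] - target))
```

For the section from 0.3 to 1.5 seconds, the midpoint is (0.3 + 1.5) / 2 = 0.9 seconds, matching the example in the text.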

[0020] The candidate image output unit 82 performs candidate image output processing, in which the image data of the candidate images obtained in the candidate image data acquisition processing is output as images from the output means 10. The selection acquisition unit 83 performs selection acquisition processing, in which the selection of one image from among the candidate images output to the output means 10 is acquired. When the candidate images have been output from the output means 10, the operator selects one image as the important image and enters that selection through the input means 1. From this input, the selection acquisition unit 83 determines which image was selected and fixes the selected image as the important image.

[0021] The correspondence means 9 performs correspondence processing, in which the image data of the important image obtained in the important image data acquisition processing is associated with the unit of image data obtained in the image data acquisition processing. In the present embodiment, the correspondence means 9 additionally displays, on the output means 10, the correspondence between the important image and the unit of image data from which it was extracted, in a form that makes the correspondence recognizable. For example, the file name of the unit of image data and the important image are displayed side by side. The still image thereby functions as a thumbnail of the image data of that unit of moving images, and the operator can clearly grasp the content of the unit from the still image.
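As a very small illustration of the side-by-side display, the correspondence could be held as a mapping from clip file name to thumbnail file name and rendered as text lines. The file names and the text layout here are hypothetical; an actual device would show the still image itself next to the file name.

```python
from typing import Dict, List


def thumbnail_listing(thumbnails: Dict[str, str]) -> List[str]:
    """One line per clip: the clip file name beside its thumbnail file name."""
    width = max((len(name) for name in thumbnails), default=0)
    return [f"{clip.ljust(width)}  {thumb}" for clip, thumb in thumbnails.items()]
```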

[0022] FIG. 2 shows a specific system configuration of the editing processing device configured as in FIG. 1. As shown in FIG. 2, the editing processing device is constituted by a personal computer or by a computer system that includes a personal computer. The editing processing device includes a control unit 11 for controlling the entire system. Connected to the control unit 11 via a bus line such as a data bus are a keyboard 12 and a mouse 13 serving as the input means 1, a display device 14 and a printing device 15 serving as output means, a storage device 16, a storage medium drive device 17, a communication control device 18, an input/output I/F 19, and a speech recognition device 20.

[0023] The control unit 11 includes a CPU 111, a ROM 112, and a RAM 113. The CPU 111 controls the various devices and performs computation in accordance with programs. The ROM 112 is a read-only memory in which programs executed when the computer starts up, among others, are stored in advance. The RAM 113 is used as working memory that holds the programs and data the CPU needs for its control and computation.

[0024] The keyboard 12 constitutes the input means 1. It is used to designate the moving image to be edited when the image data of that moving image and the audio data or document data attached to it are acquired, and, as part of the selection acquisition means, to select and designate the important image from among the candidate images. The keyboard 12 is provided with various keys, such as kana keys for entering kana characters, a numeric keypad, function keys for executing various functions, and cursor keys. The mouse 13 is a pointing device and, like the keyboard, constitutes the input means 1. Left-clicking a key, icon, or the like displayed on the display device 14 designates the moving image to be edited or selects the important image. A CRT, a liquid crystal display, or the like is used as the display device 14. The display device shows the moving images available for editing as icons or file names, and the image to be edited is selected with the keyboard 12 or the mouse 13. It also shows the candidates for the important image, from which the important image is selected with the keyboard 12 or the mouse 13. In addition, the important image is displayed as the thumbnail of the unit of moving images from which it was extracted. The printing device 15 constitutes the output means 10 and prints the images and other content displayed on the display device 14. Various printing devices can be used, such as laser printers, dot printers, inkjet printers, page printers, thermal printers, and thermal transfer printers.

[0025] The storage device 16 consists of a readable and writable storage medium and a drive device for reading and writing various information, such as programs and data, on that medium. A hard disk is mainly used as the storage medium of the storage device 16, but any readable and writable medium among those used with the storage medium drive device 17 described later may be used instead. The storage device 16 has a kana-kanji conversion dictionary 161, a program storage unit 162, a data storage unit 163, a speech recognition dictionary 164, and other storage units not shown (for example, a storage unit for backing up the programs and data stored in the storage device 16). The program storage unit 162 stores the following programs for the thumbnail processing of the present embodiment, which assigns a thumbnail to each unit of moving images: an image data acquisition processing program that acquires the image data of one unit of moving images; a document data acquisition processing program that acquires document data obtained by converting the audio data associated with the moving image; a classification processing program that divides the document data acquired by the document data acquisition processing into sections of a predetermined unit; an importance acquisition processing program that obtains the importance of the document data of each section produced by the classification processing; an important section acquisition processing program that acquires, as important sections, the document data of the sections given a high importance by the importance acquisition processing; an important image data acquisition processing program that acquires predetermined image data as important image data from the corresponding image data for the important sections obtained by the important section acquisition processing; and a correspondence processing program that associates the important image data obtained by the important image data acquisition processing with the unit of image data obtained by the image data acquisition processing. The program storage unit 162 also stores various other programs. These include moving image editing programs other than the thumbnail processing, both those that use its result (for example, a program for reordering units of moving images while referring to the thumbnails assigned to them) and those that do not, and a kana-kanji conversion program that uses the kana-kanji conversion dictionary 161 to convert an input kana character string into text containing kanji.

[0026] The data storage unit 163 stores moving image data that was captured or created on another device and read in through the storage medium drive device 17 or the communication control device 18, the audio data acquired together with that moving image data, the document data that is the speech recognition result of that audio data, and the image data of the still images extracted from the moving image data by the editing processing of the present embodiment to serve as thumbnails. The speech recognition dictionary storage unit 164 stores a speech recognition dictionary that maps speech data in units of phonemes, single syllables, words, morphemes, phrases, and the like (hereinafter "morphemes and the like") to words and the like. In the present embodiment, a morpheme dictionary associating the speech pattern of each morpheme with that morpheme is stored. The speech recognition dictionary 164 is used to find the word corresponding to a pattern in the speech data, including speech data created on another device and read in through the storage medium drive device 17 or the communication control device 18, so that the input speech can be recognized as words.

[0027] The storage medium drive device 17 is a drive device through which the CPU 111 reads computer programs, data including documents, and the like from an external storage medium. The computer programs and the like stored on the storage medium include the various processing programs executed by the editing processing device of the present embodiment, such as image information retrieval processing, subjective evaluation expression dictionary calibration processing, and subjective evaluation information dictionary calibration processing, as well as the dictionaries and data they use. Here, a storage medium means any medium on which computer programs, data, and the like are stored. Specifically, it includes magnetic storage media such as floppy disks, hard disks, and magnetic tape; semiconductor storage media such as memory chips and IC cards; optically readable storage media such as CD-ROM, MO, and PD (phase-change rewritable optical disc); storage media using paper, such as paper cards and paper tape (and media with functions equivalent to paper); and storage media on which computer programs and the like are stored by various other methods. The storage media mainly used with the editing processing device of the present embodiment are CD-ROMs, floppy (registered trademark) disks, and the like. Besides reading computer programs from these storage media, the storage medium drive device 17 can write data stored in the RAM 113 or the storage device 16 to a writable storage medium such as a floppy disk.

[0028] When the image data, audio data, and document data of a moving image stored on a floppy disk, memory chip, IC card, or the like are read through the storage medium drive device 17 and the editing processing of the present embodiment is performed on them, the storage medium drive device 17 functions as the image data acquisition means 2 and the document data acquisition means 3.

[0029] In the editing processing device of the present embodiment, the CPU 111 of the control unit 11 reads the computer programs from an external storage medium set in the storage medium drive device 17 and stores (installs) them in the respective units of the storage device 16. To execute one of the processes of the present embodiment, such as the image information retrieval processing, the relevant program is read from the storage device 16 into the RAM 113 and executed. It is also possible to read a program into the RAM 113 directly from an external storage medium through the storage medium drive device 17, rather than from the storage device 16, and execute it. Depending on the editing processing device, the programs for the editing processing of the present embodiment may be stored in the ROM 112 in advance and executed by the CPU 111. The programs and data for the editing processing of the present embodiment may also be downloaded from another storage medium via the communication control device 18 and executed.

[0030] The communication control device 18 is a control device that provides a network connection between the editing processing device and external electronic equipment such as other personal computers. Through the communication control device 18, the image data of the moving image to be edited and the accompanying audio data or document data can be acquired from external electronic equipment. In this case the communication control device 18 constitutes the image data acquisition means 2 and the document data acquisition means 3. The communication control device 18 can also constitute the output means 10 and the selection acquisition unit 83: the image data of the important image candidates extracted by the editing processing program is output to the external electronic equipment via the communication control device 18, and the selection of the important image from among those candidates is acquired from the external electronic equipment via the communication control device 18.

[0031] The input/output I/F 19 is an interface for connecting various equipment such as a video camera. It constitutes the image data acquisition means 2 and the document data acquisition means 3, and through it the image data of the moving image to be edited and the accompanying audio data can be acquired from external electronic equipment. The speech recognition device 20 constitutes the document data acquisition means 3. It takes the audio data designated through the input means 1 and read in from a video camera connected via the input/output I/F 19, from external electronic equipment via the communication control device 18, from a storage medium via the storage medium drive device 17, or from the data storage unit 163 of the storage device 16, recognizes it using the speech recognition dictionary 164, and converts it into document data.

[0032] The editing processing device of the present embodiment need not be constituted by a computer system including a personal computer, word processor, or the like. It can also be constituted by a LAN (local area network) server, a host for computer (PC) communications, a computer system connected to the Internet, and so on. It is further possible to distribute the functions among the devices on a network so that the network as a whole constitutes the editing processing device. FIG. 3 shows the system configuration when the editing processing device is constituted by such a network. As shown in FIG. 3, the editing processing device consists of a host device 30 that performs the thumbnail processing and related processing, a plurality of client PCs 50 that send the image data, audio data, and document data of moving images to the host device 30, and a network 40 that connects the host device 30 with each client PC 50. The network 40 is mainly the Internet, but connection to various other networks, such as a LAN (local area network) or other computer networks, is also possible. Client PCs 50 such as personal computers are connected to the network 40 as needed, so that a plurality of client PCs 50 can access the host device 30 at any given time.

[0033] The client PC 50 is a so-called PC system such as a personal computer. It connects to the network 40 (the Internet) using dial-up software or the like and can view WWW (World Wide Web) data with browser software. The client PC 50 includes a control unit, a display unit, an input unit, an output unit, a communication control unit, a storage unit, and other equipment. The control unit of the client PC 50 processes and controls the entire device in accordance with predetermined programs. It sends the image data, audio data, and document data of a moving image entered from the input unit to the host device 30 via the communication control unit and the network 40. It also receives the thumbnail assigned by the thumbnail processing in the host device 30 and displays it on the display unit together with the moving image data, stores it in the storage unit, or prints it from the output unit.

【００３４】一方、ホスト装置３０は、制御部３１を備
えており、制御部３１にデータバス等のバスラインを介
して入出力部３２、表示部３４、記憶部３６、通信制御
部３８、図示しないその他の機器が接続されている。各
部３１〜３８の基本的構成は、図２に示した編集処理装
置とほぼ同様であり、特に異なる点を中心に説明する
と、制御部３１は、ＷＷＷサーバーとして機能し、図２
に示した編集処理装置の制御部１１や、クライアントＰ
Ｃ５０の制御部１１に比べ高速処理が可能であると共
に、複数のクライアントＰＣ５０からのアクセスに対応
するために並列処理が可能になっている。同様に通信制
御部３８も複数のＩＳＤＮ回線との接続が可能であると
共に、クライアントＰＣ５０のそれよりも高速処理が可
能になっている。そして、ホスト装置３０は入力手段１
及び出力手段１０を構成する通信制御部３８の制御によ
ってクライアントＰＣ５０からネットワーク４０を介し
て動画の画像データや音声データ、音声データの音声認
識結果の文書データを受信取得し、重要画像の候補画像
をクライアントＰＣ５０に出力し、クライアントＰＣ５
０から重要画像の選択を取得する。重要画像とこの重要
画像の抽出元の動画との対応は、動画の画像データや音
声データ、文書データとともにデータ格納部３６３に格
納されるか、または、ネットワーク４０を介してクライ
アントＰＣ５０に送信される。The host device 30, on the other hand, includes a control unit 31, to which an input/output unit 32, a display unit 34, a storage unit 36, a communication control unit 38, and other devices (not shown) are connected via a bus line such as a data bus. The basic configuration of the units 31 to 38 is substantially the same as that of the editing processing device shown in FIG. 2, so the description here focuses on the differences. The control unit 31 functions as a WWW server. It is capable of higher-speed processing than the control unit 11 of the editing processing device shown in FIG. 2 and the control unit 11 of the client PC 50, and it is capable of parallel processing in order to handle access from a plurality of client PCs 50. Similarly, the communication control unit 38 can be connected to a plurality of ISDN lines and is capable of higher-speed processing than that of the client PC 50. Under the control of the communication control unit 38, which constitutes the input means 1 and the output means 10, the host device 30 receives from the client PC 50, via the network 40, the image data and audio data of a moving image and the document data that is the speech recognition result of the audio data; outputs candidate images for the important image to the client PC 50; and obtains the selection of the important image from the client PC 50. The correspondence between the important image and the moving image from which it was extracted is stored in the data storage unit 363 together with the image data, audio data, and document data of the moving image, or is transmitted to the client PC 50 via the network 40.

【００３５】以上のように構成された編集処理装置によ
るサムネイル処理の動作について次に説明する。図４
は、編集処理装置による編集処理において行われるサム
ネイル処理の動作を表したフローチャートであり、図５
及び図６は、サムネイル処理の各工程における処理を概
念的に表した説明図である。編集処理装置によるサムネ
イル処理は、ユーザにより、入力手段１から、編集処理
において所定の動画データをサムネイル表示モードで一
覧表示する命令が入力され、サムネイルの付与されてい
ない動画データが検出された場合に、このサムネイルの
付与されていない動画データについて行われる。Next, the operation of the thumbnail processing performed by the editing processing device configured as described above will be described. FIG. 4 is a flowchart showing the operation of the thumbnail processing performed during editing processing by the editing processing device, and FIGS. 5 and 6 are explanatory diagrams conceptually showing the processing in each step of the thumbnail processing. The thumbnail processing is performed when the user inputs, from the input means 1, a command to list predetermined moving image data in the thumbnail display mode during editing processing, and moving image data to which no thumbnail has been assigned is detected. It is performed on that moving image data.

【００３６】本実施形態によるサムネイル処理において
は、ユーザによりサムネイル表示モードで一覧表示する
動画のうち、サムネイルの付与されていない動画の画像
データＡを、画像データ取得手段２が、サムネイル処理
の対象となる画像データとして取得する（画像データ取
得処理）（ステップ１１）（図５（ａ））。画像データ
取得手段２で取得された画像データＡは、画像データ取
得手段２に具備されるシーンブレイク検出部によってシ
ーンブレイクを検出することによって、画像データに複
数のシーンが含まれているかどうか調べられる（ステッ
プ１３）。そして、画像データに複数のシーンが含まれ
ている場合には、画像データに複数のシーンが含まれて
いるとして、画像データ分割部２１により画像データ分
割処理が行われ、画像データが各シーンごとのまとまり
（画像データａ、画像データｂ、画像データｃ、・・
・）に分割される（画像データ分割処理）（ステップ１
５）（図５（ｂ））。画像データ取得手段２で取得され
た画像データが１シーンである場合（ステップ１３；
Ｎ）及び複数シーンの画像データが各シーン毎に一まと
まりの画像データに分割された後（ステップ１５後）、
文書データ取得手段３が、この画像データの動画ととも
に録音された音声の音声認識結果の文書データ（画像デ
ータａに対応する文書データａ’、画像データｂに対応
する文書データｂ’、画像データｃに対応する文書デー
タｃ’、・・・）を、各画像データの一まとまり毎に対
応させて取得する（文書データ取得処理）（ステップ１
７）（図５（ｃ））。このとき、文書データ取得手段３
は、画像データ取得手段２で取得された画像データに対
応する音声の音声データが音声認識されていない場合に
は、音声データを音声認識部３１によって音声認識し
て、文書データを取得する。既に音声認識結果がある場
合には、この音声認識結果の文書データをそのまま取得
する。各画像データのまとまりに対応する音声データや
文書データは、画像データに付されている時刻データを
参照し、同じ時刻データ分の音声データや文書データを
割り出して取得する。In the thumbnail processing according to the present embodiment, among the moving images that the user has requested to list in the thumbnail display mode, the image data acquisition means 2 acquires the image data A of a moving image to which no thumbnail has been assigned as the image data to be processed (image data acquisition processing) (step 11) (FIG. 5(a)). A scene break detection unit provided in the image data acquisition means 2 then detects scene breaks in the image data A to check whether the image data contains a plurality of scenes (step 13). If it does, the image data division unit 21 divides the image data into one group per scene (image data a, image data b, image data c, ...) (image data division processing) (step 15) (FIG. 5(b)). When the image data acquired by the image data acquisition means 2 consists of a single scene (step 13; N), or after the image data of a plurality of scenes has been divided into one group of image data per scene (after step 15), the document data acquisition means 3 acquires the document data that is the speech recognition result of the audio recorded together with the moving image (document data a' corresponding to image data a, document data b' corresponding to image data b, document data c' corresponding to image data c, ...), associating it with each group of image data (document data acquisition processing) (step 17) (FIG. 5(c)). If the audio data corresponding to the acquired image data has not yet been speech-recognized, the document data acquisition means 3 has the speech recognition unit 31 recognize the audio data to obtain the document data. If a speech recognition result already exists, that document data is acquired as is. The audio data and document data corresponding to each group of image data are obtained by referring to the time data attached to the image data and locating the audio data and document data covering the same time data.
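
The time-data lookup described above can be illustrated with a minimal Python sketch. It assumes that each recognized utterance carries a start and end time in seconds and that each scene is given as a (start, end) pair; the `Segment` type, the function names, and the overlap rule are illustrative assumptions, not details taken from this publication.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized utterance with its time data (hypothetical structure)."""
    start: float  # seconds from the beginning of the moving image
    end: float
    text: str


def segments_for_scene(scene_start, scene_end, segments):
    """Return the utterances whose time range overlaps the scene's time range."""
    return [s for s in segments if s.start < scene_end and s.end > scene_start]


def split_transcript_by_scene(scene_bounds, segments):
    """Build one list of utterances per scene (document data a', b', c', ...)."""
    return [segments_for_scene(a, b, segments) for a, b in scene_bounds]


segments = [
    Segment(0.0, 4.0, "Good evening."),
    Segment(4.0, 9.5, "An earthquake struck this morning."),
    Segment(12.0, 15.0, "Now the weather."),
]
scenes = [(0.0, 10.0), (10.0, 20.0)]  # scene boundaries from scene break detection
per_scene = split_transcript_by_scene(scenes, segments)
```

An utterance that straddles a scene break would be assigned to both scenes under this overlap rule; a real implementation would need to choose a policy for that case.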

【００３７】次に、区分手段４が、最初のまとまりの画
像データ（Ｎ＝１としたときのＮ番目のまとまりの画像
データ、即ち１番目のまとまりの画像データａ）に対応
する文書データ（文書データａ’）を取得し、この文書
データを各文（文１、文２、文３、・・・）に区分する
（区分処理）（ステップ１９〜ステップ２３）（図５
（ｄ））。そして、重要度取得手段５が、各文１、文
２、文３、・・・について形態素解析を行って、自立
語、名詞句、複合名詞句等を含めた重要語（句）を抽出
し、抽出した重要語（句）の、最初の一まとまり中での
出現頻度、評価関数から、各重要語（句）の重要度ｆを
決定する。評価関数は、例えば、重要語（句）に対する
重み付け、単語、名詞句、複合名詞句等の重要語（句）
の種類による重み付け等を表す関数である。重要語
（句）、各重要語（句）の重み付け、重要語（句）の種
類に対する重み付けは、本実施形態においては、所定の
記憶部（記憶装置１６のプログラム格納部１６２やデー
タ格納部１６３等）に予め格納されている。そして、各
文に出現する重要語（句）の重要度ｆを累積することに
よって、各文１、文２、文３、・・・の重要度Ｆを決定
する。（重要度取得処理）（ステップ２５）（図５
（ｅ））。Next, the classification means 4 acquires the document data corresponding to the first group of image data (the N-th group with N = 1, that is, document data a' corresponding to the first group of image data a) and divides this document data into sentences (sentence 1, sentence 2, sentence 3, ...) (classification processing) (steps 19 to 23) (FIG. 5(d)). The importance acquisition means 5 then performs morphological analysis on each of sentence 1, sentence 2, sentence 3, ... to extract important words (phrases), including independent words, noun phrases, and compound noun phrases, and determines the importance f of each important word (phrase) from its frequency of appearance within the first group and from an evaluation function. The evaluation function is, for example, a function expressing a weight for each important word (phrase) and a weight according to the type of important word (phrase), such as word, noun phrase, or compound noun phrase. In the present embodiment, the important words (phrases), the weight of each important word (phrase), and the weight for each type of important word (phrase) are stored in advance in a predetermined storage unit (such as the program storage unit 162 or the data storage unit 163 of the storage device 16). The importance F of each of sentence 1, sentence 2, sentence 3, ... is then determined by accumulating the importance f of the important words (phrases) appearing in that sentence (importance acquisition processing) (step 25) (FIG. 5(e)).
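
As a rough illustration of the importance calculation, the following Python sketch computes f for each important word and F for each sentence. It assumes morphological analysis has already produced (term, type) pairs for each sentence. The weight tables and the choice to multiply frequency by both weights are assumptions made for the example; the text only says that f is determined from the frequency and an evaluation function.

```python
from collections import Counter

# Hypothetical pre-stored weights (the text says such weights are kept in a storage unit).
TERM_WEIGHT = {"earthquake": 2.0, "damage": 1.5}
TYPE_WEIGHT = {"word": 1.0, "noun_phrase": 2.0}


def term_importance(sentences):
    """f(term) = frequency in the group x term weight x type weight (assumed form)."""
    freq = Counter(term for sent in sentences for term, _ in sent)
    kinds = {term: kind for sent in sentences for term, kind in sent}
    return {
        t: n * TERM_WEIGHT.get(t, 1.0) * TYPE_WEIGHT.get(kinds[t], 1.0)
        for t, n in freq.items()
    }


def sentence_importance(sentences):
    """F(sentence) = sum of f over the distinct important terms in the sentence."""
    f = term_importance(sentences)
    return [sum(f[t] for t in {term for term, _ in sent}) for sent in sentences]


# Each sentence is a list of (term, type) pairs, as if output by a morphological analyzer.
sentences = [
    [("news", "word")],
    [("earthquake", "word"), ("damage", "word")],
    [("earthquake", "word"), ("seismic intensity", "noun_phrase")],
]
F = sentence_importance(sentences)  # [1.0, 5.5, 6.0]
```

Here "earthquake" appears twice in the group, so its f is 2 x 2.0 x 1.0 = 4.0, which lifts both sentences that contain it.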

【００３８】各文の重要度Ｆが決定されると、この重要
度Ｆを参照して、重要区分取得手段６が、重要度Ｆの高
い順から、文書データの全部の文の数に対して所定の割
合の数の文を選び、重要文と特定する（重要文１、重要
文２、・・・）（重要区分取得処理）（ステップ２７）
（図５（ｆ））。次いで、対応画像データ取得部７が、
重要文１，重要文２、・・・に対応する画像データ（対
応画像データ１、対応画像データ２、・・・）を取得す
る（対応画像データ取得処理）。重要文に対応する画像
データは、各重要文に付される時刻データを参照し、同
じ時刻データ分の画像データを割り出して取得する（対
応画像データ取得処理）（ステップ２９）（図６
（ｇ））。そして、候補画像データ取得部８１により、
各重要文１，重要文２、・・・に対応する対応画像デー
タ１、対応画像データ２、・・・それぞれについて、そ
の始点と終点の中間の時刻における静止画像の画像デー
タが抽出され、これらの画像データが重要画像候補の画
像データ（候補画像データ１、候補画像データ２、・・
・）として特定される（重要画像候補取得処理）（図６
（ｈ））。候補画像データ１、候補画像データ２、・・
・は、候補画像出力部８２によって出力手段１０から画
像出力される（候補画像出力処理）（ステップ３１）
（図６（ｉ））。操作者は、出力手段１０から出力され
た候補画像データによる候補画像を見て、一まとまりの
画像データの内容のサムネイルとして適当と思うものを
選択し、入力手段１から入力する。When the importance F of each sentence has been determined, the important section acquisition means 6 refers to the importance F and selects, in descending order of importance F, a number of sentences equal to a predetermined proportion of the total number of sentences in the document data, and identifies them as important sentences (important sentence 1, important sentence 2, ...) (important section acquisition processing) (step 27) (FIG. 5(f)). Next, the corresponding image data acquisition unit 7 acquires the image data corresponding to important sentence 1, important sentence 2, ... (corresponding image data 1, corresponding image data 2, ...). The image data corresponding to an important sentence is obtained by referring to the time data attached to the important sentence and locating the image data covering the same time data (corresponding image data acquisition processing) (step 29) (FIG. 6(g)). Then, for each of corresponding image data 1, corresponding image data 2, ..., the candidate image data acquisition unit 81 extracts the image data of the still image at the time midway between its start point and end point, and identifies these as the image data of important image candidates (candidate image data 1, candidate image data 2, ...) (important image candidate acquisition processing) (FIG. 6(h)). The candidate image output unit 82 outputs candidate image data 1, candidate image data 2, ... as images from the output means 10 (candidate image output processing) (step 31) (FIG. 6(i)). The operator looks at the candidate images output from the output means 10, selects the one considered suitable as a thumbnail for the content of the group of image data, and inputs the selection from the input means 1.
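
The selection of important sentences and the choice of the midpoint still image can be sketched in Python as follows. The rounding rule (round up, at least one sentence), the function names, and the use of seconds for time data are assumptions for illustration; the text specifies only "a predetermined proportion" and "the time midway between the start point and end point".

```python
import math


def select_important(scores, ratio):
    """Indices of the top ceil(ratio * n) sentences, highest importance first."""
    if not scores:
        return []
    k = max(1, math.ceil(len(scores) * ratio))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def midpoint_time(start, end):
    """Time of the still image midway between the start and end of a span."""
    return start + (end - start) / 2.0


def candidate_times(scores, spans, ratio):
    """Timestamps of the candidate still images, one per important sentence."""
    return [midpoint_time(*spans[i]) for i in select_important(scores, ratio)]


scores = [1.0, 5.5, 6.0, 0.5]                               # importance F per sentence
spans = [(0.0, 4.0), (4.0, 10.0), (10.0, 13.0), (13.0, 15.0)]  # time data per sentence
times = candidate_times(scores, spans, ratio=0.5)           # [11.5, 7.0]
```

The resulting timestamps would then be used to pull the corresponding frames from the moving image and present them to the operator as candidate images.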

【００３９】操作者による選択は、選択取得部８３が、
入力手段１からの入力結果に基づいて取得し（ステップ
３３）、選択された候補画像を重要画像（サムネイル用
の画像）とする（重要画像取得処理）。そして、重要画
像の画像データは、対応手段９によって、ステップ２１
で取得した文書データに対応する一まとまりの画像デー
タに対する、サムネイル用の画像データとして対応付け
られ（対応処理）（図６（ｊ））、所定の記憶部に記憶
される（ステップ３５）。最初の一まとまりについて画
像データと重要画像の画像データとが対応付けられた後
は、以後２番目のまとまり、３番目のまとまり、・・・
と、ステップ２１からの処理が繰り返され、動画が分割
された全てのまとまりについて、同様に重要画像の画像
データが対応付けられ記憶される。全てのまとまりにつ
いて重要画像の画像データが対応付けられると、サムネ
イル処理が終了される。以後、編集処理において動画デ
ータをサムネイル表示モードで一覧表示する命令が入力
されると、各一まとまりの画像データａ、画像データ
ｂ、画像データｃ、・・・に、上述のサムネイル処理に
おいて対応付けられたサムネイル用の画像データによる
サムネイル画像が付されて表示される。The selection acquisition unit 83 obtains the operator's selection based on the input from the input means 1 (step 33) and sets the selected candidate image as the important image (the image for the thumbnail) (important image acquisition processing). The correspondence means 9 then associates the image data of the important image, as thumbnail image data, with the group of image data corresponding to the document data acquired in step 21 (correspondence processing) (FIG. 6(j)), and the result is stored in a predetermined storage unit (step 35). After the image data of the first group has been associated with the image data of its important image, the processing from step 21 is repeated for the second group, the third group, and so on, so that the image data of an important image is likewise associated and stored for every group into which the moving image was divided. When important image data has been associated with all the groups, the thumbnail processing ends. Thereafter, when a command to list moving image data in the thumbnail display mode is input during editing processing, each group of image data a, image data b, image data c, ... is displayed with a thumbnail image based on the thumbnail image data associated with it in the thumbnail processing described above.

【００４０】このように、本実施形態では、一まとまり
のシーンの画像に対応する音声を音声認識した文書を取
得し、この文書中の重要度の高い文（重要文）を特定す
る。そして、重要文に対応する画像（重要文が音声出力
される場面の画像）は、一まとまりのシーンの内容を良
好に反映したものであるとして、この重要文に対応する
動画に含まれる静止画像の画像データを、サムネイル用
の画像の候補（候補画像）として出力する。そして、出
力した候補画像の中から操作者の選択を取得し、選択さ
れた候補画像をサムネイルと特定し、一まとまりのシー
ン全体に対応づける。As described above, in the present embodiment, a document is obtained by speech recognition of the audio corresponding to the images of a group of scenes, and the sentences of high importance in this document (important sentences) are identified. The image corresponding to an important sentence (the image of the scene in which the important sentence is spoken) is assumed to reflect the content of the group of scenes well, so the image data of a still image contained in the moving image corresponding to the important sentence is output as a candidate for the thumbnail image (candidate image). The operator's selection from among the output candidate images is then obtained, and the selected candidate image is specified as the thumbnail and associated with the entire group of scenes.

【００４１】従って、本実施形態によると、一まとまり
のシーンの内容を考慮し、重要な内容を表示する場面の
画像データが、サムネイル候補となるので、一まとまり
の内容を適切に表したサムネイルを付すことが可能であ
る。本実施形態によると、重要な内容を表示する場面の
静止画像をサムネイルの候補画像として出力し、操作者
により適切な画像を選択させているので、より確実に、
一まとまりのシーンの内容を適切に表し且つ操作者に分
かりやすい画像が、重要画像として特定される。このと
き、重要な内容を表示する場面の画像がサムネイルとし
て自動的に選択されているので、操作者は少ない手間
で、１つのサムネイル用の画像を選び出すことが可能で
ある。本実施形態によると、複数のシーンを含む動画
が、シーンブレイクにより自動的に分割され、シーン毎
にサムネイルが付与されるので、異なる内容に共通の１
つのサムネイルが付与されることがない。Therefore, according to the present embodiment, the content of a group of scenes is taken into account and the image data of scenes presenting important content becomes the thumbnail candidates, so a thumbnail that appropriately represents the content of the group can be attached. Because still images of scenes presenting important content are output as thumbnail candidate images and the operator selects an appropriate one, an image that appropriately represents the content of the group of scenes and is easy for the operator to understand is more reliably specified as the important image. Because the images of scenes presenting important content are automatically pre-selected for the thumbnail, the operator can pick out a single thumbnail image with little effort. In addition, a moving image containing a plurality of scenes is automatically divided at scene breaks and a thumbnail is assigned to each scene, so a single common thumbnail is never assigned to different content.

【００４２】以上、本発明の一実施形態について説明し
たが、本発明は、上述の実施形態に限定されるものでは
なく、請求項に記載された発明の範囲内で種々の変形を
することが可能である。例えば、上述の実施形態では、
編集処理装置としてコンピュータを用いているが、コン
ピュータに限定されるものではなく、編集処理のための
専用機等でもよい。上述の実施形態においては、重要文
に対応する画像データ（対応画像データ）から、重要画
像候補として静止画像データを抽出しているが、所定時
間分の動画データを抽出してもよい。この場合、重要文
に対応する画像データ全体を重要画像候補とすることも
可能である。上述の実施形態においては、重要文に対応
する画像データから、時間的に中間に位置する静止画像
データを重要画像候補としているが、重要画像候補の画
像データは、重要文に対応する画像データから抽出され
ていればよく、重要文の開始時点の画像データや、開始
後所定時間後の画像データ等とすることもできる。上述
の実施形態においては、一まとまりの画像データに対応
する文書データを文単位で区分し重要区分として重要文
を取得しているが、区分する単位は、文単位に限定され
るものではなく、複数の文を１単位としたり、文節を単
位としてもよい。例えば、図５に示す文書データａ’の
場合に、「始めに」「今日の」「ニュースを」「お伝え
します」「今日」「午前３時ごろ」・・・と文節単位で
区分し、各文節に含まれる単語の重要度から重要区分で
ある重要文節「地震が」「震度６の」「被害状況は」を
抽出し、これらの各重要文節に対応する画像を対応画像
としてもよい。区分が文単位以外であっても、複数の重
要区分を抽出可能であることは上述の実施形態と同様で
ある。また、重要区分の単位が文、文節、その他いずれ
であっても、重要度が等しい区分が複数検出された場合
には、それらのうち時刻データが最初のものや中間のも
の等所定の条件から１つを選択したり、いずれについて
も重要区分とすることもできる。While one embodiment of the present invention has been described above, the present invention is not limited to that embodiment, and various modifications are possible within the scope of the invention described in the claims. For example, although a computer is used as the editing processing device in the above embodiment, the device is not limited to a computer and may be a dedicated machine for editing processing. In the above embodiment, still image data is extracted as the important image candidate from the image data corresponding to an important sentence (corresponding image data), but moving image data of a predetermined duration may be extracted instead. In that case, the entire image data corresponding to the important sentence can also be used as the important image candidate. In the above embodiment, the still image data located at the temporal midpoint of the image data corresponding to the important sentence is used as the important image candidate, but the image data of the important image candidate need only be extracted from the image data corresponding to the important sentence; it may be the image data at the start of the important sentence, the image data a predetermined time after the start, or the like. In the above embodiment, the document data corresponding to a group of image data is divided into sentences and important sentences are acquired as the important sections, but the unit of division is not limited to the sentence; a plurality of sentences may form one unit, or a phrase (bunsetsu) may be the unit. For example, the document data a' shown in FIG. 5 may be divided phrase by phrase into "to begin with", "today's", "news", "we bring you", "today", "at around 3 a.m.", ...; the important phrases "the earthquake", "of seismic intensity 6", and "the damage situation" may be extracted as important sections based on the importance of the words contained in each phrase; and the images corresponding to each of these important phrases may be used as the corresponding images. Even when the unit is other than the sentence, a plurality of important sections can be extracted, as in the above embodiment.
In addition, whether the unit of the important section is the sentence, the phrase, or anything else, when a plurality of sections of equal importance are detected, one of them may be selected according to a predetermined condition, such as the one with the earliest time data or the one in the middle, or all of them may be treated as important sections.
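
The handling of sections with equal importance can be sketched as below. This is only an illustration of the three options named in the text (earliest, middle, or all); the function name, the policy strings, and the representation of a section as a (start time, importance) pair are hypothetical.

```python
def resolve_ties(sections, policy="earliest"):
    """Pick among the sections that share the highest importance.

    sections: list of (start_time, importance) pairs.
    policy:   "earliest", "middle", or "all" (names assumed for this sketch).
    """
    top = max(score for _, score in sections)
    # Sort the tied sections by their time data so "earliest" and "middle" are well defined.
    tied = sorted((s for s in sections if s[1] == top), key=lambda s: s[0])
    if policy == "earliest":
        return [tied[0]]
    if policy == "middle":
        return [tied[len(tied) // 2]]
    if policy == "all":
        return tied
    raise ValueError(f"unknown policy: {policy}")


sections = [(30.0, 5.0), (10.0, 5.0), (20.0, 5.0), (40.0, 2.0)]
earliest = resolve_ties(sections, "earliest")  # [(10.0, 5.0)]
middle = resolve_ties(sections, "middle")      # [(20.0, 5.0)]
everything = resolve_ties(sections, "all")     # the three tied sections, by time
```

Exact equality is used to detect ties here; with importance values computed from floating-point weights, a small tolerance may be more appropriate.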

【００４３】上述の実施形態においては、一まとまりの
画像データに対して、複数の重要画像候補を操作者に提
案し、操作者の選択によって重要画像を決定している
が、各まとまりについて１つの重要文のみを選び出し、
この重要文から抽出した静止画像を自動的に重要画像と
してサムネイルに決定するようにしてもよい。上述の実
施形態においては、画像データのシーンブレイクを検出
して、各シーンを一まとまりとしているが、対応する音
声データの音声認識後の文書データに基づいて、シーン
のまとまりを検出するようにしてもよい。文書データに
基づいてシーンのまとまりを検出する場合、例えば特開
平１１−４５２７８号公報記載の技術等の、従来より公
知の技術を用いることができる。即ち文書データの各文
を仮段落に分割し各仮段落について内容を表す文書ベク
トル等の指標を作成し、この指標に基づいて各仮段落文
間の類似度を求めて、所定の類似度以上の仮段落どうし
を１つのまとまりとする。この場合の文書ベクトルとし
ては、形態素解析により文書内に出現するキーワードを
パラメータとして、各キーワードについて仮段落中での
出現頻度や評価関数から各重要度ｆを決定しこの重要度
を各パラメータの値としたものを採用することができ、
この場合の類似度は、文書ベクトル間の角度に依存する
コサインにより求めることができる。すなわち、文書ベ
クトルｂｎとｂｎ＋１間の角度をｑとし、両文書ベクト
ルの内積をｂｎ・ｂｎ＋１とし、両文書ベクトルの大き
さをそれぞれ｜ｂｎ｜、｜ｂｎ＋１｜とした場合、両文
書ベクトルの類似度ｓは次の数式１により求まる。In the above embodiment, a plurality of important image candidates are presented to the operator for each group of image data and the important image is determined by the operator's selection. Alternatively, only one important sentence may be selected for each group, and the still image extracted for that important sentence may be automatically determined to be the important image and used as the thumbnail. In the above embodiment, scene breaks in the image data are detected and each scene is treated as one group, but the groups of scenes may instead be detected based on the document data obtained by speech recognition of the corresponding audio data. In that case, a conventionally known technique can be used, such as the technique described in JP-A-11-45278. That is, the sentences of the document data are divided into provisional paragraphs, an index such as a document vector representing the content is created for each provisional paragraph, the similarity between provisional paragraphs is obtained from this index, and provisional paragraphs whose similarity is at or above a predetermined value are combined into one group. As the document vector, one can use a vector whose parameters are the keywords found in the document by morphological analysis, with the value of each parameter set to the importance f determined for that keyword from its frequency of appearance in the provisional paragraph and an evaluation function. The similarity is then given by the cosine of the angle between the document vectors. That is, letting q be the angle between document vectors b_n and b_{n+1}, b_n · b_{n+1} their inner product, and |b_n| and |b_{n+1}| their magnitudes, the similarity s of the two document vectors is given by Formula 1 below.

【００４４】[0044]

【数式１】[Formula 1]

$$ s = \cos q = \frac{b_n \cdot b_{n+1}}{|b_n| \, |b_{n+1}|} $$

【００４５】この類似度ｓの値は−１≦ｓ≦１までの値
をとり、１に近いほど２つの仮段落の文書ベクトルが互
いに平行に近く、２つの仮段落どうしは似ていると考え
ることができる。The similarity s takes a value in the range −1 ≤ s ≤ 1; the closer it is to 1, the more nearly parallel the document vectors of the two provisional paragraphs are, and the more similar the two provisional paragraphs can be considered to be.
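
The cosine similarity of Formula 1 and the merging of similar provisional paragraphs can be sketched in Python as follows. The document vectors are assumed to be plain lists of keyword importance values in a fixed keyword order. The threshold value, the decision to compare only consecutive paragraphs, and the treatment of an all-zero vector as similarity 0 are assumptions for this sketch, not details from the cited technique.

```python
import math


def cosine_similarity(u, v):
    """s = (u . v) / (|u| |v|), the cosine of the angle between two document vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention: a paragraph with no keywords matches nothing
    return dot / (norm_u * norm_v)


def group_paragraphs(vectors, threshold):
    """Merge consecutive provisional paragraphs whose similarity is >= threshold."""
    if not vectors:
        return []
    groups = [[0]]
    for i in range(1, len(vectors)):
        if cosine_similarity(vectors[i - 1], vectors[i]) >= threshold:
            groups[-1].append(i)  # same topic: extend the current group
        else:
            groups.append([i])    # topic changed: start a new group
    return groups


# Keyword order (hypothetical): ["earthquake", "damage", "weather"]
vectors = [[2.0, 1.0, 0.0], [4.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
groups = group_paragraphs(vectors, threshold=0.8)  # [[0, 1], [2]]
```

The first two vectors point in the same direction, so their similarity is essentially 1 and they are merged; the third shares no keywords with them, giving a similarity of 0 and a new group.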

【００４６】また、各処理の順番についても、適宜変更
可能である。例えば、上述の実施形態においては、画像
データの一まとまり毎（シーン毎）に、対応する文書デ
ータの取得から重要文の特定、重要画像の決定までを行
い、他のまとまりについてもこれを繰り返すようになっ
ているが、各まとまりについての処理を、全てのまとま
りについて行ってから、次の処理を行うようにしてもよ
い。即ち、全てのまとまりについて、まとまりごとの文
書データの取得（文書データの分割）を行った後、全て
のまとまりについて重要文を特定し、その後、各まとま
りについての画像候補の出力と重要画像の選択の取得、
重要画像と各まとまりの画像データとの対応付けを行っ
てもよい。The order of the processing steps can also be changed as appropriate. For example, in the above embodiment, the sequence from acquiring the corresponding document data through identifying the important sentences to determining the important image is carried out for one group of image data (one scene) and then repeated for the other groups. Instead, each processing step may be carried out for all the groups before moving on to the next step. That is, the document data for each group may first be acquired for all the groups (division of the document data), the important sentences may then be identified for all the groups, and after that the output of the image candidates, the acquisition of the selected important image, and the association of the important image with the image data of the group may be carried out for each group.

【００４７】上述の実施形態及び各変形例においては、
入力音声は日本語となっているが、あらゆる言語につい
て、音声データを取得し、編集処理を行うことが可能で
ある。その場合、対象となる言語用の形態素解析アルゴ
リズム等を使用するといった、本発明の構成には影響の
ない部分を変更するだけでよい。In the above embodiment and its modifications, the input speech is Japanese, but audio data in any language can be acquired and subjected to the editing processing. In that case, only parts that do not affect the configuration of the present invention need to be changed, such as using a morphological analysis algorithm for the target language.

【００４８】尚、以上の変形例は、適宜複数を選択し組
み合わせて適用することが可能である。例えば、重要文
に基づいて動画の画像データを重要画像候補として抽出
する変形例に、一まとまりの画像データに対して１つの
重要文のみを決定し重要画像を決定する変形例を組み合
わせて、一まとまりの画像データに対して１つの重要文
のみを決定し、この重要文に基づいて動画の画像データ
を重要画像として決定することができる。The above modifications can be applied in combination by selecting two or more of them as appropriate. For example, the modification in which moving image data is extracted as the important image candidate based on an important sentence can be combined with the modification in which only one important sentence is determined for a group of image data and the important image is determined from it. In the combined form, only one important sentence is determined for a group of image data, and moving image data is determined as the important image based on that important sentence.

【００４９】[0049]

【発明の効果】以上説明したように、本発明によれば、
動画の画像データに、内容を適切に表したサムネイルを
付すことが可能である。As described above, according to the present invention, a thumbnail that appropriately represents the content can be attached to the image data of a moving image.

[Brief description of the drawings]

【図１】本発明の編集処理装置の一実施形態であり、本
発明の編集処理プログラムが記憶された記憶媒体の一実
施形態の該プログラムが読み取られた、コンピュータの
構成を表したブロック図である。FIG. 1 is a block diagram showing the configuration of a computer that is an embodiment of the editing processing device of the present invention and into which the program of an embodiment of the storage medium storing the editing processing program of the present invention has been read.

【図２】同上、編集処理装置（コンピュータ）の具体的
なシステム構成図である。FIG. 2 is a specific system configuration diagram of the editing processing device (computer).

【図３】同上、編集処理装置をネットワークにより構成
した場合のシステム構成図である。FIG. 3 is a system configuration diagram for the case where the editing processing device is configured over a network.

【図４】同上、編集処理装置（コンピュータ）による編
集処理におけるサムネイル処理の流れを表すフローチャ
ートである。FIG. 4 is a flowchart showing a flow of a thumbnail process in an editing process performed by the editing processing device (computer).

【図５】図４のサムネイル処理の各工程における処理を
概念的に表した説明図である。FIG. 5 is an explanatory diagram conceptually showing the processing in each step of the thumbnail processing of FIG. 4.

【図６】図４のサムネイル処理の図５に続く各工程にお
ける処理を概念的に表した説明図である。FIG. 6 is an explanatory diagram conceptually showing the processing in each step of the thumbnail processing of FIG. 4 that follows FIG. 5.

【図７】本発明及び従来技術において、複数のシーンを
含む動画におけるシーンの遷移の形態を表した説明図で
ある。FIG. 7 is an explanatory diagram showing a form of scene transition in a moving image including a plurality of scenes in the present invention and the related art.

[Explanation of symbols]

1 入力手段: input means
2 画像データ取得手段: image data acquisition means
21 画像データ分割部: image data division unit
3 文書データ取得手段: document data acquisition means
31 音声認識部: speech recognition unit
4 区分手段: classification means
5 重要度取得手段: importance acquisition means
6 重要区分取得手段: important section acquisition means
7 対応画像データ取得部: corresponding image data acquisition unit
8 重要画像データ取得手段: important image data acquisition means
81 候補画像データ取得部: candidate image data acquisition unit
82 候補画像出力部: candidate image output unit
83 選択取得部: selection acquisition unit
9 対応手段: correspondence means
10 出力手段: output means

フロントページの続き (51)Int.Cl.⁷ 識別記号ＦＩＨ０４Ｎ 5/781 Ｈ０４Ｎ 5/781 ５１０Ｆ 5/85 5/91 Ｒ (58)調査した分野(Int.Cl.⁷，ＤＢ名) H04N 5/91 - 5/956 G11B 27/02
Continued from the front page (51) Int.Cl.⁷ identification symbol FI H04N 5/781 H04N 5/781 510F 5/85 5/91 R (58) Fields searched (Int.Cl.⁷, DB name) H04N 5/91 - 5/956 G11B 27/02

Claims

(57) [Claims]

1. An editing processing device comprising: image data acquisition means for acquiring a group of moving image data; document data acquisition means for acquiring document data obtained by converting audio data of the audio associated with the moving image; classification means for dividing the document data acquired by the document data acquisition means into sections of a predetermined unit; importance calculation means for performing morphological analysis on the document data acquired by the document data acquisition means and calculating the importance of each section from the important words contained in the section; important section acquisition means for acquiring, as an important section, the document data of a section having a high importance as calculated by the importance calculation means; important image data acquisition means for acquiring, as important image data, predetermined image data from the corresponding image data associated with the audio data corresponding to the document data of the important section; and correspondence means for associating the important image data acquired by the important image data acquisition means with the group of image data acquired by the image data acquisition means.

2. The editing processing device according to claim 1, wherein the important image data acquisition means acquires one item of still image data from the corresponding image data as the important image data.

3. The editing processing device according to claim 1, wherein the important section acquisition means acquires a plurality of important sections, and the important image data acquisition means comprises: candidate image data acquisition means for acquiring candidate image data from the corresponding image data corresponding to each of the plurality of important sections acquired by the important section acquisition means; candidate image output means for outputting images of the candidate image data acquired by the candidate image data acquisition means; and selection acquisition means for acquiring a selection of one of the candidate images output by the candidate image output means, the candidate image data of the selected candidate image acquired by the selection acquisition means being used as the important image data.

4. The editing processing device according to claim 1, further comprising an image data division unit that divides image data of a moving image containing a plurality of scenes into scenes, wherein the image data acquisition means acquires the image data of each scene resulting from the division by the image data division unit as the group of moving image data.

5. A computer-readable storage medium storing an editing processing program for causing a computer to implement: an image data acquisition function of acquiring a group of moving image data; a document data acquisition function of acquiring document data obtained by converting audio data of the audio associated with the moving image; a classification function of dividing the document data acquired by the document data acquisition function into sections of a predetermined unit; an importance calculation function of performing morphological analysis on the document data acquired by the document data acquisition function and calculating the importance of each section from the important words contained in the section; an important section acquisition function of acquiring, as an important section, the document data of a section having a high importance as calculated by the importance calculation function; an important image data acquisition function of acquiring, as important image data, predetermined image data from the corresponding image data associated with the audio data corresponding to the document data of the important section; and a correspondence function of associating the important image data acquired by the important image data acquisition function with the group of image data acquired by the image data acquisition function.