JPH08235198A - Multimedia information management system

Info

Publication number: JPH08235198A
Application number: JP7034986A
Authority: JP
Inventors: Eiji Ohira; 栄二大平; Koichi Kimura; 宏一木村; Hiromichi Fujisawa; 浩道藤澤
Original assignee: GIJUTSU KENKYU KUMIAI SHINJOHO; GIJUTSU KENKYU KUMIAI SHINJOHO SHIYORI KAIHATSU KIKO; Hitachi Ltd
Current assignee: GIJUTSU KENKYU KUMIAI SHINJOHO; GIJUTSU KENKYU KUMIAI SHINJOHO SHIYORI KAIHATSU KIKO; Hitachi Ltd
Priority date: 1995-02-23
Filing date: 1995-02-23
Publication date: 1996-09-13

Abstract

PURPOSE: To record and retrieve multimedia information efficiently by managing information in a way that simulates how humans memorize and how humans recall. CONSTITUTION: The system is provided with a dividing means (episode extraction processing unit 8) that groups multimedia information on the basis of commonality in its physical features, and a summarizing means (memory structure learning processing unit 9) that summarizes the groups produced by the dividing means into higher-level groups on the basis of the relevance among the commonalities of the groups, repeats this summarization into still higher-level groups so that the groups form a hierarchical structure, and associates a semantic network, which links linguistic items on the basis of the meaning of language information, with the commonality information and relevance information of each group in the hierarchy. The commonality information and relevance information of each group, together with the language information, are used as retrieval conditions for the multimedia information.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to techniques for recording and retrieving multimedia information such as moving images and audio, and in particular to a multimedia information management system suited to recording and retrieving multimedia information efficiently by simulating the way humans memorize and the way humans trace their memories.

[0002]

[Prior Art] Much of human activity proceeds on the basis of past experience and knowledge, one's own or other people's. Documents are an excellent medium for conveying such experience and knowledge, but in recent years advances in video cameras, VTRs (video tape recorders), laser disc devices, and the like have made it possible to record and transmit multimedia information including audio and moving images. In addition, computers have become faster and can now have large storage capacities. As a result, an environment has emerged in which stored video is not merely played back sequentially as on a VTR, but desired video can be accessed at random and used. As the amount of stored information becomes enormous, the ability to retrieve desired video quickly becomes important.

[0003] Conventionally, many techniques attach titles, dates and times, and keywords to stored documents and video to make retrieval easier. However, assigning keywords that are easy to recall later is difficult even for the person who assigns them. One prior-art approach to this problem is the document information storage and retrieval system disclosed in Japanese Examined Patent Publication No. 5-84538. To enable retrieval from vague memories, this technique builds a network representing the relationships between concepts as a knowledge base and retrieves documents by inference over that knowledge base. For example, a document recorded with the information "an article about a workstation manufactured by company XX in Saitama Prefecture" can be retrieved with the vague query "an article about a computer made by an electrical manufacturer in the Kanto region".
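The kind of inference described above can be illustrated with a minimal sketch. It is not the method of the cited publication: the concept network `BROADER`, the slot names, and the functions `generalizations` and `matches` are all hypothetical and chosen only to reproduce the Saitama/Kanto example.

```python
# Hypothetical concept network: each concept points to one broader concept.
BROADER = {
    "workstation": "computer",
    "Saitama Prefecture": "Kanto region",
    "Company XX": "electrical manufacturer",
}


def generalizations(concept):
    """Return the concept followed by every broader concept reachable from it."""
    chain = [concept]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


def matches(record, query):
    """A record matches when each query slot names the record's value or a broader concept."""
    return all(query[slot] in generalizations(record[slot]) for slot in query)


# Information recorded with the document, and the vaguer query used to find it.
record = {"product": "workstation", "maker": "Company XX", "place": "Saitama Prefecture"}
query = {"product": "computer", "maker": "electrical manufacturer", "place": "Kanto region"}

print(matches(record, query))
```

Under these assumptions the vague query matches the recorded document because every query term is a generalization of the corresponding recorded term.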

[0004] However, when people try to recall a document they saw in the past, or where it is stored, they often try to recall it not from the document's content as described above but by association with the circumstances in which they saw it. That is, by recalling when, where, and under what circumstances they saw it, they recall the document's content and storage location in more detail. The prior art described above cannot perform such episodic retrieval. Moreover, in that prior art the search key attached to a document is a structured sentence rather than a word, which improves retrieval flexibility, but the search-key sentence is a summary that reflects the registrant's subjective view. Consequently, when the searcher is not the registrant, for example, the viewpoint at registration differs from the viewpoint at retrieval, an inappropriate search key may be entered, and in that case retrieval fails.

[0005] One technique that enables episodic retrieval is the moving image management device disclosed in Japanese Unexamined Patent Publication No. 5-282379. This device divides a moving image according to physical changes and semantic content, groups the segments hierarchically and sequentially along the time axis, and represents the hierarchy as a tree structure. Each node of the tree is given a representative still image and attributes as search keys. Images are retrieved at random using this tree structure and the search keys. This technique can store and retrieve temporal episodes. However, it does not consider summaries that cut across the times at which activities occurred, or the representation of spatial episodes within an activity. In addition, retrieval based on the content of each node is fixed. It is therefore insufficient for retrieval based on information that humans memorize easily or recall easily.

[0006] Japanese Unexamined Patent Publication No. 3-52070 discloses a technique in which pointing at an object on the screen with a mouse or the like calls up related information associated in advance with that object. Japanese Patent Application No. 6-260013 describes an improvement of this technique that lets the user find a desired scene by tracing memories associatively, even for information that has not been associated in advance. With the technique of Japanese Patent Application No. 6-260013, for example, even when subject B, the retrieval target, is not registered in the index, the user can rely on the association that subject B appears together with subject A and, through subject A, which is registered in the index, trace associatively to the specific scene in which subject B appears. Furthermore, this technique is not limited to associations between subjects; retrieval can use associations based on multimedia information in the video, such as the scene itself, spoken words, background music, and subtitles.

[0007] However, the technique described in Japanese Patent Application No. 6-260013 makes associations using the knowledge the user has about the retrieval target, and the user must find the index entry to use for the association. If the user's knowledge is wrong, the association does not work correctly and the desired retrieval result cannot be obtained. That technique gives no consideration to supporting techniques that make such user associations more reliable. In other words, as noted above, it gives no consideration to retrieval based on information that humans memorize easily or recall easily.

[0008] Human memory includes memory of knowledge of the kind learned at school, such as "a penguin is a kind of bird" or "an apple falls to the ground because of gravity", and memory of personal experiences, such as "I went to the zoo yesterday" or "Taro fell off a ladder". The psychologist Tulving named the former "semantic memory" and the latter "episodic memory". Research on memorization techniques has shown that the semantic memories and episodic memories that persist and resist forgetting are those that are associated with one another.

[0009] Thus, in human memory, the more a word or fact is associated with other known information such as words, facts, and concepts, the harder it is to forget and the more readily it stays in memory. Implementing this kind of memory in a machine enables rapid retrieval based on information the user remembers or recalls easily. To this end, the system must be able to manage, in an integrated manner, not only the semantic content of video but also episodic information that is close to imagery and includes time and spatial information such as place. In addition, manual labeling of image information and the like tends to be subjective, so labeling and structuring based on the actual video are required. As described above, however, the prior art gives no consideration to retrieval based on such information that humans memorize easily or recall easily.

[0010]

[Problem to Be Solved by the Invention] The problem to be solved is that the prior art cannot record and retrieve multimedia information on the basis of information that humans memorize easily or recall easily. An object of the present invention is to solve these problems of the prior art and to provide a multimedia information management system that makes the recording and retrieval of multimedia information more efficient.

[0011]

[Means for Solving the Problem] To achieve the above object, the multimedia information management system of the present invention is characterized in that (1) it is provided with a dividing means (episode extraction processing unit 8) that divides multimedia information into groups on the basis of commonality in the physical features of the multimedia information, and a summarizing means (memory structure learning processing unit 9) that summarizes the groups produced by the dividing means into higher-level groups on the basis of the relevance among the commonalities of the groups, repeats the summarization of those higher-level groups into still higher-level groups so as to form the groups into a hierarchical structure, and associates a semantic network, which links linguistic items on the basis of the meaning of language information, with the commonality information and relevance information of each group in the hierarchical structure; and the commonality information and relevance information of each group in the hierarchical structure, together with the language information, are used as retrieval conditions for the multimedia information.

[0012] (2) The multimedia information management system described in (1) above is further characterized by being provided with means (central processing unit 20, interface unit 6, display 3) for displaying the commonality information and relevance information of each group in the hierarchical structure and the language information. (3) The multimedia information management system described in (1) or (2) above is further characterized in that the dividing means (episode extraction processing unit 8) groups the multimedia information on the basis of commonality among the objects in the moving images of the multimedia information.

[0013] (4) The multimedia information management system described in (3) above is further characterized in that the commonality information used by the dividing means (episode extraction processing unit 8) is determined on the basis of the movement of the camera that captured the moving images. (5) The multimedia information management system described in (3) or (4) above is further characterized in that the positional relationships between groups are obtained on the basis of the movement of the camera that captured the moving images, and these positional relationships between groups are used as the relevance information used by the summarizing means (memory structure learning processing unit 9).

[0014] (6) The multimedia information management system described in any of (1) to (5) above is further characterized in that the spatial positional relationships between a plurality of objects in the multimedia information are used as the commonality information of a group. (7) The multimedia information management system described in any of (1) to (6) above is further characterized in that the commonality information and relevance information are formed as patterns, and commonality and relevance are determined by pattern recognition. (8) The multimedia information management system described in any of (1) to (7) above is further characterized in that the commonality information and relevance information are formed in language, including symbols, and commonality and relevance are determined by semantic analysis.

[0015] (9) The multimedia information management system described in any of (1) to (8) above is further characterized in that groups are associated with one another by their order in time, and this association information is used as a retrieval condition for the multimedia information. (10) The multimedia information management system described in any of (1) to (9) above is further characterized in that groups are associated with one another by their spatial positional relationships, and this association information is used as a retrieval condition for the multimedia information. (11) The multimedia information management system described in any of (1) to (10) above is further characterized in that the grouping in the hierarchical structure is corrected on the basis of instructions input by the operator.

[0016]

[Operation] In the present invention, as in human memory, information about meaning is associated with the summarized results of multimedia information such as images and audio, which constitute an individual's experience, and the association is used to retrieve the multimedia information, so information that humans recall easily can be offered as retrieval conditions. Memories of experience in human episodic memory carry information close to temporal and spatial imagery, but they are not exact images like photographs or recordings. In addition, for captured moving image data, for example, the intervals perceived as a semantically coherent unit differ depending on the viewpoint.

[0017] In the present invention, captured moving image data is divided (grouped) on the basis of, for example, the results of extracting features from the image data. This makes it possible to detect intervals that form objectively the same semantic unit, independently of subjective judgment. Each piece of divided moving image data is then treated as one unit (node), and the nodes are summarized (grouped) hierarchically on the basis of the physical features and semantic commonality (relevance) of the moving image data between nodes. As a result, groupings labeled not only with concrete things such as people and objects but also with abstract concepts such as activities (for example, a meeting or a period of movement) can be realized within the same framework, and summaries suited to multiple viewpoints become possible. Furthermore, by giving each node, as additional information, the objects appearing in the images and the spatial positional relationships between them, the node can hold information close to imagery that also includes spatial information. By giving a node symbols and words that characterize it, correspondence with semantic memory can be established. Moreover, by storing for each node the pattern that characterizes it and the recognition procedure, the node can also be identified at the pattern level.

[0018]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing one embodiment of the configuration according to the present invention of the multimedia information management system, and FIG. 2 is a block diagram showing an example of the device configuration used to operate the multimedia information management system of FIG. 1. The devices in FIG. 2 are provided to record and retrieve video of everyday activities in an office or similar setting, and consist of a central processing unit (labeled CPU in the figure) 20; a magnetic disk device (labeled HDD) 10; a memory device (labeled "memory") 30; a control unit (labeled CONT) 40 that A/D-converts moving images and audio from a camera 1 and a microphone 2 into digital form and then sends them to a bus 50; and an interface unit (labeled IF) 6 that controls the input and output of information for a display (labeled CRT) 3, a keyboard (labeled KEY) 4, and a mouse 5.

[0019] The magnetic disk device 10 consists of a semantic memory device (labeled "language" in the figure) 101 that stores semantic information (language information), an episodic memory device (labeled EPS) 102 that stores episodic information, and a multimedia information storage device (labeled DATA) 103 that stores moving images and audio from the camera 1 and the microphone 2. The multimedia information storage device 103 may be an external storage device such as a laser disc device rather than a magnetic disk device. In that case, a configuration is also possible in which moving images and audio from the camera 1 and the microphone 2 are registered directly in the multimedia information storage device 103 without passing through the control unit 40.

[0020] In FIG. 1, 7 is a media information management unit that stores moving images, audio, and the like sent from the camera 1 and the microphone 2 in the multimedia information storage device 103; 8 is an episode extraction processing unit, serving as the dividing means according to the present invention, that divides the moving images sent from the camera 1; 9 is a memory structure learning processing unit, serving as the summarizing means according to the present invention, that summarizes on the basis of commonality between the cuts recorded in the episodic memory device 102; 11 is a language analysis unit that performs semantic analysis of language input from the keyboard 4; 12 is a retrieval unit that retrieves information on the basis of language; and 13 is a browser.

[0021] With this configuration, the multimedia information management system of this embodiment divides the input multimedia information, summarizes it, and manages the resulting information (commonality information and relevance information) in association with information about meaning (language information). This allows the way the machine acquires and stores information to match the characteristics of how human memories are formed and recalled. An outline of these characteristic operations according to the present invention is given below.

[0022] First, captured moving image data is divided on the basis of the results of extracting features from the image data (commonality information), and each piece of divided moving image data is treated as one unit (node); the nodes are summarized hierarchically on the basis of the physical features and semantic commonality (relevance information) of the moving image data between nodes. The summarized results are themselves treated as higher-level nodes and summarized again. Then a pointer to the storage location of the corresponding multimedia information, the objects appearing in the images and the spatial positional relationships between them, symbols and words that characterize the node, the pattern that characterizes the node and its recognition procedure, and so on are attached to each node as additional information, and the summarized results are stored. In addition, the following are stored as relationships between nodes: order in time, positional relationships in space, hierarchical relationships between nodes, and aggregation relationships, that is, part/whole relationships between nodes.
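The node structure outlined in this paragraph can be sketched as a small data type. This is only an illustration under stated assumptions: the class `Node`, its field names, the frame ranges, and the choice of which scene lies "below" the other are hypothetical. The identifiers 1020 to 1028 are borrowed from the FIG. 3 example in this description, and placing scenes 1026 and 1027 under scene 1028 is an assumed reading of that example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)  # compare nodes by identity; parent/child links form cycles
class Node:
    node_id: int
    frames: Optional[Tuple[int, int]] = None           # pointer to stored media (start, end frame)
    labels: List[str] = field(default_factory=list)    # words or symbols characterizing the node
    children: List["Node"] = field(default_factory=list)               # part/whole relation
    parent: Optional["Node"] = field(default=None, repr=False)         # hierarchical relation
    next_in_time: Optional["Node"] = field(default=None, repr=False)   # order in time
    spatial: Dict[str, "Node"] = field(default_factory=dict, repr=False)  # e.g. "below" -> node

    def add_child(self, child: "Node") -> None:
        child.parent = self
        self.children.append(child)

    def leaf_frames(self) -> List[Tuple[int, int]]:
        """Collect the frame ranges of all cuts under this node, in time order."""
        if not self.children:
            return [self.frames] if self.frames else []
        return sorted(r for child in self.children for r in child.leaf_frames())


# Hypothetical frame ranges for cuts 1020 to 1023, linked in time order.
cuts = {
    1020: Node(1020, frames=(0, 89)),
    1021: Node(1021, frames=(90, 149)),
    1022: Node(1022, frames=(150, 239)),
    1023: Node(1023, frames=(240, 299)),
}
for earlier, later in zip([1020, 1021, 1022], [1021, 1022, 1023]):
    cuts[earlier].next_in_time = cuts[later]

display = Node(1026, labels=["display"])
document = Node(1027, labels=["document"])
for cut_id in (1020, 1022):
    display.add_child(cuts[cut_id])
for cut_id in (1021, 1023):
    document.add_child(cuts[cut_id])
display.spatial["below"] = document    # assumed orientation, for illustration only
document.spatial["above"] = display

desk = Node(1028, labels=["desk work"])
desk.add_child(display)
desk.add_child(document)

print(desk.leaf_frames())
```

Keeping the media pointer only on the leaf cuts and deriving it for higher nodes is one possible design; the description leaves open how the pointers are stored at each level.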

[0023] When information is retrieved, part or all of the stored summarized results (commonality information and relevance information), the stored information about meaning (language information), and the associations between the two is displayed. In response to the operator's instruction to display a node, the associated image and audio information is played back. In addition, on the basis of a retrieval query in the operator's language (language information), the desired stored image and audio information is retrieved and displayed. In this way, the operator can manipulate the displayed content interactively and easily find the desired content. For maintenance of the information, the stored summarized results and the information about meaning are displayed, and the summarization processing, the summarized results, and the information about meaning are modified on the basis of the operator's input indicating whether the processing results are correct. Thus, in this embodiment, multimedia information can be recorded and retrieved in a way that simulates how humans memorize and how humans trace (recall) their memories.

[0024] The operation according to the present invention is described below for each processing unit. The programs and data of each processing unit, and the intermediate results of processing, are stored in the memory 30 of FIG. 2, and the programs are executed by the central processing unit 20 of FIG. 1. In this embodiment, the camera 1 and the microphone 2 are assumed to be mounted on the worker's head. Moving images and audio of activities in the office or similar setting, recorded through the camera 1 and the microphone 2, are sent to the media information management unit 7 and the episode extraction processing unit 8. The media information management unit 7 stores the received moving images and audio as they are in the multimedia information storage device 103.

[0025] The episode extraction processing unit 8 divides the received moving images. Each resulting segment is hereinafter called a cut. Moving images are generally input at 30 frames per second, so a cut consists of one or more consecutive frames. The episode extraction processing unit 8 records the correspondence between each extracted cut and the corresponding image frames stored in the multimedia information storage device 103 (for example, the start and end frame numbers). It also recognizes, from the moving images, audio, and other information, the people and objects that appear in the cut, as well as the background, atmosphere, and so on, and records them in the episodic memory device 102. The memory structure learning processing unit 9 summarizes on the basis of commonality between the cuts recorded in the episodic memory device 102 and structures them as shown in FIG. 3. The media information management unit 7 organizes the multimedia information storage device 103 according to the resulting structure, keeping only the necessary information.

[0026] FIG. 3 is an explanatory diagram showing an example of the memory structure according to the present invention. FIG. 3 shows an example representation of an episode in which a worker does desk work at his or her own desk and then moves elsewhere on an errand. The input moving images and audio are stored in the multimedia information storage device 103 and are also divided by the episode extraction processing unit 8 into cuts 1020 to 1023 and so on, and the correspondence between the two stores is recorded. Cuts 1020 to 1023 are scenes in which the worker repeatedly looks back and forth between two different places (for example, the display and a document).

【００２７】図４は、図１におけるエピソード抽出処理
部と記憶構造学習処理部の詳細構成例を示すブロック図
である。本図４は、特に、エピソード抽出処理部８と記
憶構造学習処理部９の処理の流れを示した図であり、特
徴抽出部８１は、画像から輝度やエッジなどの特徴を抽
出する。動きの分類部８２では、例えば、輝度の差分
や、あるいは画像のオプティカルフローを求めることに
より各点の動きのベクトルを計算する。そして、この輝
度の差分値や動きのベクトルから、カメラの動きを求め
る。この動きの特徴から、例えば、カット分割処理部８
３では、動画像を動きの激しいカットと、動きの少ない
カットに分割する。図３では、動きの少ないカットのみ
を表示している。カット分割処理部８３では、動きの激
しいカットに関しては、オプティカルフローなどからそ
の動きの方向を抽出する。以上の各処理は、既存の画像
処理技術で実現可能である。FIG. 4 is a block diagram showing a detailed configuration example of the episode extraction processing unit and the storage structure learning processing unit in FIG. 1. In particular, FIG. 4 shows the flow of processing in the episode extraction processing unit 8 and the storage structure learning processing unit 9. The feature extraction unit 81 extracts features such as luminance and edges from the image. The motion classification unit 82 calculates a motion vector for each point by obtaining, for example, the luminance difference or the optical flow of the image, and then derives the motion of the camera from the luminance difference values and the motion vectors. Based on these motion features, the cut division processing unit 83 divides the moving image into, for example, cuts with vigorous motion and cuts with little motion. FIG. 3 shows only the cuts with little motion. For cuts with vigorous motion, the cut division processing unit 83 extracts the direction of the motion from the optical flow or the like. Each of the above processes can be realized with existing image processing techniques.
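
As an illustration only, the division into cuts with vigorous motion and cuts with little motion can be sketched as follows. The sketch assumes that each frame is a flat list of luminance values and uses the mean absolute luminance difference as the motion measure; the names `motion_score` and `split_into_cuts`, the labels, and the threshold value are hypothetical and do not appear in the specification, which also allows optical flow to be used instead.

```python
from typing import List, Tuple

Frame = List[int]  # hypothetical frame representation: flattened luminance values


def motion_score(prev: Frame, cur: Frame) -> float:
    """Mean absolute luminance difference between two consecutive frames."""
    return sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)


def split_into_cuts(frames: List[Frame], threshold: float) -> List[Tuple[int, int, str]]:
    """Split frames into runs labeled 'still' or 'moving'.

    Returns (start_frame, end_frame, label) tuples with inclusive frame numbers.
    """
    if not frames:
        return []
    # Label each frame by the motion measured on the transition into it.
    labels = [
        "moving" if motion_score(frames[i - 1], frames[i]) > threshold else "still"
        for i in range(1, len(frames))
    ]
    # Frame 0 has no predecessor, so it inherits the label of frame 1 (assumption).
    labels.insert(0, labels[0] if labels else "still")

    cuts = []
    start = 0
    for i in range(1, len(labels)):
        if labels[i] != labels[start]:
            # The label changed, so the run that began at `start` ends at i - 1.
            cuts.append((start, i - 1, labels[start]))
            start = i
    cuts.append((start, len(labels) - 1, labels[start]))
    return cuts
```

The start and end frame numbers in each tuple correspond to the cut-to-frame correspondence that the episode extraction processing unit 8 is described as recording.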

【００２８】記憶構造学習処理部９は、カットを、動画
像や音声の物理的特徴や意味的な共通性に基づいて要約
する。物理的特徴に基づく要約処理部９１では、物理的
特徴として、例えば、カット間の場所の共通性に基づい
て要約する。場所の共通性は、例えば、カットの画像同
士のパターンマッチングにより検出可能である。互いに
よくマッチングした大きな類似度を示すカット同士が同
じ場所のカットと判断できる。この処理により、図３の
カット１０２０と１０２２はノード１０２６に、カット
１０２１と１０２３はノード１０２７にまとめられる。
要約されたノードを、以降シーンと呼ぶ。このシーン１
０２６とシーン１０２７に関しては、エピソード抽出処
理部８で抽出される両カット間の動きの方向の情報に基
づき、図３に示すように、シーンの空間的位置関係（図
中、上，下と記載）や距離（図示しない）を登録する。
物理的特徴に基づく要約処理部９１は、さらに空間の移
動範囲によって、図３に示すように、机の周辺の活動の
シーン１０２８と、用事で移動したシーン１０２９に要
約する。The storage structure learning processing unit 9 summarizes the cuts based on the physical features and semantic commonality of the moving images and sound. The physical-feature-based summarization processing unit 91 summarizes the cuts based on a physical feature such as the commonality of place between cuts. Commonality of place can be detected, for example, by pattern matching between the images of the cuts: cuts that match each other well and show a high degree of similarity can be judged to be cuts of the same place. By this processing, cuts 1020 and 1022 in FIG. 3 are combined into node 1026, and cuts 1021 and 1023 are combined into node 1027. A summarized node is hereinafter called a scene. For scene 1026 and scene 1027, the spatial positional relationship between the scenes (labeled "above" and "below" in FIG. 3) and the distance between them (not shown) are registered, as shown in FIG. 3, based on the information on the direction of motion between the two cuts extracted by the episode extraction processing unit 8. The physical-feature-based summarization processing unit 91 further summarizes the scenes according to the spatial range of movement, as shown in FIG. 3, into scene 1028 of activity around the desk and scene 1029 of moving away on an errand.
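
The grouping of cuts into scenes by commonality of place can be illustrated with the following minimal sketch. It assumes that each cut has already been reduced to a small feature vector and uses a toy similarity measure in place of real image pattern matching; `similarity`, `group_cuts_into_scenes`, the feature values, and the threshold are all hypothetical.

```python
from typing import Dict, List


def similarity(a: List[float], b: List[float]) -> float:
    """Toy similarity in (0, 1]: 1 / (1 + mean absolute difference)."""
    mad = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 / (1.0 + mad)


def group_cuts_into_scenes(cuts: Dict[int, List[float]], threshold: float) -> List[List[int]]:
    """Greedily assign each cut to the first scene whose representative it matches.

    The representative of a scene is its first cut (an assumption of this sketch).
    """
    scenes: List[List[int]] = []
    for cut_id, features in cuts.items():
        for scene in scenes:
            if similarity(cuts[scene[0]], features) >= threshold:
                scene.append(cut_id)
                break
        else:
            # No existing scene is similar enough, so this cut starts a new scene.
            scenes.append([cut_id])
    return scenes


# Hypothetical feature vectors for the four cuts of FIG. 3.
example_cuts = {
    1020: [0.90, 0.10],
    1021: [0.10, 0.80],
    1022: [0.88, 0.12],
    1023: [0.12, 0.79],
}
scenes = group_cuts_into_scenes(example_cuts, threshold=0.9)
```

With these made-up vectors, cuts 1020 and 1022 fall into one group and cuts 1021 and 1023 into another, mirroring nodes 1026 and 1027 in the description.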

【００２９】ここで、要約の途中結果は、一時、短期作
業領域９２に格納され、例えば、図３のシーン１０２８
からシーン１０２９のような、シーンの変化が検出され
たとき、エピソード記憶装置１０２に登録する。また、
時間構造が保持さるシーンは、図３においてシーン１０
２８からシーン１０２９の間に示される「時間」のよう
に、その時間関係を登録する。図３においては、カット
１０２０〜１０２３間も時間構造を保持するが、図では
記載を省略している。シーンとマルチメディア情報記憶
装置１０３に格納された画像との対応は、カットと個別
に記録する。これにより、シーンを特徴付ける動画像を
表示することができる。Here, the intermediate results of summarization are temporarily stored in the short-term work area 92 and are registered in the episode storage device 102 when a scene change is detected, for example the change from scene 1028 to scene 1029 in FIG. 3. For scenes whose temporal structure is retained, the temporal relationship is also registered, as with the "time" link shown between scene 1028 and scene 1029 in FIG. 3. In FIG. 3, the temporal structure is also retained among cuts 1020 to 1023, but it is omitted from the figure. The correspondence between a scene and the images stored in the multimedia information storage device 103 is recorded separately from that of the cuts. This makes it possible to display a moving image that characterizes the scene.
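
The buffering behavior of the short-term work area can be sketched roughly as below. This is only one possible reading: the class name `ShortTermWorkArea`, the scene names, and the cut numbers are hypothetical, and a plain list stands in for the episode storage device 102.

```python
from typing import List, Optional, Tuple


class ShortTermWorkArea:
    """Holds the cuts of the scene in progress until a scene change is seen."""

    def __init__(self) -> None:
        self.current_scene: Optional[str] = None
        self.pending_cuts: List[int] = []
        # Stand-in for the episode storage device: (scene, cuts) in time order.
        self.episode_store: List[Tuple[str, List[int]]] = []

    def add_cut(self, scene: str, cut_id: int) -> None:
        # A different scene label means a scene change: register the finished scene.
        if self.current_scene is not None and scene != self.current_scene:
            self.flush()
        self.current_scene = scene
        self.pending_cuts.append(cut_id)

    def flush(self) -> None:
        """Register the scene in progress, if any, and clear the work area."""
        if self.current_scene is not None and self.pending_cuts:
            self.episode_store.append((self.current_scene, self.pending_cuts))
        self.pending_cuts = []


area = ShortTermWorkArea()
area.add_cut("desk", 1)
area.add_cut("desk", 2)
area.add_cut("errand", 3)  # scene change: the "desk" scene is registered here
```

Because scenes are appended in the order in which they end, the order of entries in `episode_store` also preserves the temporal relationship between consecutive scenes.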

【００３０】本実施例では、図３の構造化の結果を、図
１および図２のディスプレイ３により表示する。そし
て、利用者からの分割結果や要約結果の評価に応じて、
動きの分類学習部９３や場所分類学習部９４は、分類基
準を変更する。例えば、利用者は、分割が不足、あるい
は、し過ぎの指示を図１、２に示すキーボード４などに
より行う。動きの分類学習部９３や場所分類学習部９４
は、この指示に応じて、カット分割における動きの閾値
や場所の類似度の閾値を変更する。In this embodiment, the structuring result of FIG. 3 is displayed on the display 3 of FIGS. 1 and 2. The motion classification learning unit 93 and the place classification learning unit 94 then change their classification criteria according to the user's evaluation of the division and summarization results. For example, the user indicates that the division is insufficient or excessive by using the keyboard 4 shown in FIGS. 1 and 2 or the like. In response to this instruction, the motion classification learning unit 93 and the place classification learning unit 94 change the motion threshold used in cut division and the threshold for place similarity.
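
The specification does not state how the thresholds are changed, so the following is only a sketch of one simple possibility: a multiplicative update of a motion threshold, assuming that a cut boundary is placed wherever the motion measure exceeds the threshold. The function name `adjust_threshold`, the feedback strings, and the step size are hypothetical.

```python
def adjust_threshold(threshold: float, feedback: str, step: float = 0.1) -> float:
    """Return an updated motion threshold from the user's evaluation.

    'insufficient' means too few divisions, so the threshold is lowered and
    more boundaries are detected; 'excessive' raises it for the opposite reason.
    """
    if feedback == "insufficient":
        return threshold * (1.0 - step)
    if feedback == "excessive":
        return threshold * (1.0 + step)
    raise ValueError(f"unknown feedback: {feedback!r}")
```

For a place-similarity threshold, where a higher value splits cuts into more scenes, the direction of the update would be reversed.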

【００３１】図１、２の意味記憶装置１０１では、例え
ば、特公平５−８４５３８号公報記載の概念の意味ネッ
トワークを利用して、図３に示すように、意味的情報を
構築する。ここでは、言語による単語間の関係が表現さ
れる。例えば、図３において、意味的情報１０１１（会
議資料作成）には、意味的情報１０１０（デスクワー
ク）や、図示しない打合せや調査等の活動が含まれる。
カットやシーンに、言語によるラベル付けをすることに
より、意味記憶装置１０１の意味と関係付ける。例え
ば、図３において、シーン１０２８は、デスクワークの
単語でラベル付けできる。また、文や図５の意味ネット
ワーク構成例に示すような動詞を中心としたネットワー
ク表現でも可能である。これにより、言語と実世界との
関係をとることができる。ここで、例えば、登場人物に
関しては、人手によらずに、エピソード抽出処理部８で
の認識結果に基づき、自動的にラベル付け可能である。
認識においては、人物を厳密に特定する必要はなく、
「早口の男性と一緒」のようなあいまいなラベル付けも
可能である。The semantic storage device 101 of FIGS. 1 and 2 constructs semantic information as shown in FIG. 3 by using, for example, the semantic network of concepts described in Japanese Examined Patent Publication No. 5-84538. Here, the relationships between words in language are expressed. For example, in FIG. 3, semantic information 1011 (meeting material creation) includes semantic information 1010 (desk work) as well as activities such as meetings and surveys, which are not shown. Cuts and scenes are related to the meanings in the semantic storage device 101 by labeling them in language. For example, in FIG. 3, scene 1028 can be labeled with the word "desk work". Labeling is also possible with a sentence or with a verb-centered network representation such as that shown in the semantic network configuration example of FIG. 5. This establishes a relationship between language and the real world. For persons appearing in the images, for example, labels can be attached automatically, without manual work, based on the recognition results of the episode extraction processing unit 8. The recognition need not identify the person exactly; a vague label such as "with a fast-talking man" is also possible.

【００３２】エピソード記憶装置１０２内の図３に示す
カットやシーンは、図６に示すデータ構造で実現でき
る。図６は、本発明に係る共通情報および関連情報のデ
ータ構造例を示す説明図である。本図６に示すデータ構
造例は、現在のオブジェックト指向データベースで採ら
れている技法で記述した例である。カットやシーンは、
場と名付けたクラス（以降、場クラスと呼ぶ）で表現す
る。クラスは、フレーム理論などで提案されたフレーム
と同様、複数の項目（属性）と値の対で構成される表形
式の枠組みである。クラスをコピーし、属性に具体的値
を格納したものがインスタンスと呼ばれる。The cuts and scenes of FIG. 3 held in the episode storage device 102 can be realized with the data structure shown in FIG. 6. FIG. 6 is an explanatory diagram showing an example of the data structure of common information and related information according to the present invention. The data structure example of FIG. 6 is described using the techniques adopted in current object-oriented databases. Cuts and scenes are represented by a class named "field" (場; hereinafter, the field class). A class, like the frames proposed in frame theory and elsewhere, is a table-like framework made up of pairs of items (attributes) and values. A copy of a class with concrete values stored in its attributes is called an instance.

【００３３】オブジェックト指向データベースでは、ク
ラスをコピーし、インスタンスを作ることにより、はじ
めて図２のメモリ３０上に実体のオブジェックトが生成
される。インスタンスを生成すると、インスタンスに
は、生成されるインスタンスを識別するための識別子が
付く。例えば、場クラスを１つコピーすると、場「０」
の識別子のインスタンスが作られ、さらにコピーする
と、場「１」の識別子のインスタンスが作られる。イン
スタンスの各属性には、記号や数字あるいはインスタン
スの識別子を複数個格納可能である。特に、複数のイン
スタンスの識別子が格納できることにより、インスタン
ス間の１対多の関係を表現できる。In an object-oriented database, an actual object is created in the memory 30 of FIG. 2 only when a class is copied to make an instance. When an instance is created, it is given an identifier that distinguishes it from other instances. For example, copying the field class once creates an instance with the identifier field "0", and copying it again creates an instance with the identifier field "1". Each attribute of an instance can hold multiple symbols, numbers, or instance identifiers. In particular, because an attribute can hold the identifiers of several instances, one-to-many relationships between instances can be expressed.
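
The instance creation and identifier numbering described above can be sketched as follows. This is a simplified illustration rather than the object-oriented database of the embodiment; `FieldInstance`, `FieldStore`, and the attribute names are hypothetical.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FieldInstance:
    """One instance of the field class, identified by a sequential number."""
    ident: int
    label: str = ""
    # An attribute holding several instance identifiers gives a one-to-many relation.
    part: List[int] = field(default_factory=list)


class FieldStore:
    """Creates instances and hands out identifiers 0, 1, 2, ... in order."""

    def __init__(self) -> None:
        self._next_ident = itertools.count(0)
        self.instances: Dict[int, FieldInstance] = {}

    def create(self, label: str = "") -> FieldInstance:
        instance = FieldInstance(ident=next(self._next_ident), label=label)
        self.instances[instance.ident] = instance
        return instance


store = FieldStore()
scene = store.create("scene")  # identifier 0
cut_a = store.create("cut")    # identifier 1
cut_b = store.create("cut")    # identifier 2
scene.part.extend([cut_a.ident, cut_b.ident])  # one scene refers to two cuts
```

Storing identifiers rather than the instances themselves keeps the relation in the form the text describes, where an attribute value is a list of instance identifiers.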

【００３４】本図６において、菱形の印が付加された部
分属性や、黒丸と線分で表現された登場人物・物属性が
この関係で表現できる。前者は、集約関係を表す関係
で、菱形の印が付いたインスタンスが全体を、そこに線
分で結ばれたインスタンスが部分を示す。会社組織にお
ける、部と課、課と係の関係がこの集約関係である。詳
細属性も同様の関係を表現するが、時間的に連続した下
位概念の要約の構造を表現する。場クラス２０１は、場
のインスタンスの識別子を属性値にもつ時間属性を持
つ。これにより、カットやシーン間の時間関係を表現す
る。In FIG. 6, the part attribute, marked with a diamond, and the character/object attribute, drawn as a black circle and a line segment, can be expressed by this kind of relationship. The former represents an aggregation relationship: the instance marked with the diamond is the whole, and the instances connected to it by line segments are its parts. The relationships between a department and its sections, and between a section and its subsections, in a company organization are aggregation relationships of this kind. The detail attribute expresses a similar relationship, but it represents the summarized structure of temporally consecutive lower-level concepts. The field class 201 has a time attribute whose value is the identifier of a field instance. This expresses the temporal relationship between cuts and scenes.

【００３５】図７は、図６におけるクラス定義に基づく
本発明に係るエピソード記憶構造の表現例を示す説明図
である。場インスタンス２０１０の時間属性に、場イン
スタンス２０１１の識別子を格納することにより、場イ
ンスタンス２０１０が場インスタンス２０１１に時間的
に先行することを表す。位置関係インスタンス２０２０
は、図３のシーン１０２６とシーン１０２７の位置関係
を表現する。また、部分属性は、図３のシーン１０２６
とカット１０２０およびカット１０２２の集約関係を表
現する。それぞれ、場インスタンス２０１０、場インス
タンス２０１３、場インスタンス２０１４に対応する。
さらに、詳細属性には、時間関係が保存された、下位の
先頭の場のインスタンスの識別子を格納する。このよう
に、場の間の時空間の関係を表現可能である。FIG. 7 is an explanatory diagram showing an example representation of the episode storage structure according to the present invention, based on the class definitions of FIG. 6. Storing the identifier of field instance 2011 in the time attribute of field instance 2010 indicates that field instance 2010 temporally precedes field instance 2011. The positional relationship instance 2020 represents the positional relationship between scene 1026 and scene 1027 of FIG. 3. The part attribute represents the aggregation relationship between scene 1026 and cuts 1020 and 1022 of FIG. 3, which correspond to field instances 2010, 2013, and 2014, respectively. Further, the detail attribute stores the identifier of the first of the lower-level field instances whose temporal relationship is preserved. In this way, the spatiotemporal relationships between fields can be expressed.
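
The time and part attributes described for FIG. 7 can be illustrated with plain dictionaries, as in the sketch below. Only the time and part values of field instance 2010 are taken from the text; the remaining attribute values and the helper names `time_chain`, `precedes`, and `whole_of` are assumptions made for the example.

```python
from typing import Dict, List, Optional

# Field instances keyed by identifier. The values for 2010 follow the text;
# the empty values for the other instances are placeholders.
fields: Dict[int, dict] = {
    2010: {"time": 2011, "part": [2013, 2014]},
    2011: {"time": None, "part": []},
    2013: {"time": None, "part": []},
    2014: {"time": None, "part": []},
}


def time_chain(store: Dict[int, dict], start: int) -> List[int]:
    """Follow time attributes from `start`, returning identifiers in time order."""
    chain: List[int] = []
    seen = set()
    current: Optional[int] = start
    while current is not None and current not in seen:  # guard against cycles
        chain.append(current)
        seen.add(current)
        current = store[current]["time"]
    return chain


def precedes(store: Dict[int, dict], a: int, b: int) -> bool:
    """True if instance `a` temporally precedes instance `b`."""
    return b in time_chain(store, a)[1:]


def whole_of(store: Dict[int, dict], part_id: int) -> Optional[int]:
    """Return the instance whose part attribute contains `part_id`, if any."""
    for ident, instance in store.items():
        if part_id in instance["part"]:
            return ident
    return None
```

Here `precedes(fields, 2010, 2011)` reflects the temporal order stated in the text, and `whole_of(fields, 2013)` recovers the aggregating instance from one of its parts.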

【００３６】画像先頭フレームと画像フレーム長属性
は、それぞれ、図１、２におけるマルチメディア情報記
憶装置１０３の画像の先頭ポインタとフレーム長を格納
する。登場人物・物属性には、キャラクタインスタンス
の識別子を格納する。キャラクタインスタンス間の位置
関係も、場インスタンス２０１０と２０１２と同様に位
置関係クラス２０４により表現する。本図７では、両者
は、人であり、互いに右下／左上の位置関係がある。キ
ャラクタインスタンスには、属性に対応する概念のイン
スタンスの識別子を格納することにより、人のみでなく
物や場所なども格納できる。このように、画像に登場す
る人物や物、およびその位置関係をも保存して記憶でき
る。図５の言語表現は、それ自身を、例えば、言語表現
クラスのインスタンスとして、その識別子を図６の説明
文属性に登録する。The image start frame and image frame length attributes store, respectively, the start pointer and the frame length of the image in the multimedia information storage device 103 of FIGS. 1 and 2. The character/object attribute stores the identifiers of character instances. The positional relationship between character instances is also represented by the positional relationship class 204, in the same way as for field instances 2010 and 2012. In FIG. 7, both characters are persons, and they are in a lower-right/upper-left positional relationship to each other. By storing in a character instance the identifier of the concept instance corresponding to an attribute, not only persons but also objects, places, and the like can be stored. In this way, the persons and objects appearing in an image, together with their positional relationships, can be preserved in storage. The linguistic expression of FIG. 5 is itself made, for example, an instance of a linguistic expression class, and its identifier is registered in the description attribute of FIG. 6.

【００３７】オブジェックト指向データベースなどのオ
ブジェックト指向技法の特徴は、データのみでなく、手
続きプログラムもリンクして格納できる。本実施例で
は、シーンやカットを特徴付けるパターンと、入力され
る音声や画像とそのパターンとのマッチングをとる認識
手続きを格納した図６の認識用パターン２０５のインス
タンスの識別子を格納する属性を持つ。これにより、入
力が、そのシーンやカットであるかを言語では言い表せ
ない場合も、パターンレベルで問い合わせ応答可能であ
る。この機能は、図４の物理的特徴に基づく要約処理部
９１でも利用可能である。A feature of object-oriented techniques such as object-oriented databases is that not only data but also procedural programs can be linked and stored. In this embodiment, an attribute is provided that stores the identifier of an instance of the recognition pattern 205 of FIG. 6; that instance holds a pattern characterizing a scene or cut together with a recognition procedure that matches input sound or images against the pattern. As a result, even when it cannot be put into words whether an input belongs to a given scene or cut, the query can be answered at the pattern level. This function can also be used in the physical-feature-based summarization processing unit 91 of FIG. 4.
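
Storing a pattern together with its matching procedure can be sketched as follows. The sketch assumes feature vectors and cosine similarity as the matching procedure; the specification does not prescribe either, and `RecognitionPattern`, `cosine_similarity`, and the threshold are hypothetical names and values.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class RecognitionPattern:
    """A stored pattern linked with the procedure used to match against it."""
    pattern: List[float]
    procedure: Callable[[List[float], List[float]], float]
    threshold: float

    def recognizes(self, observed: List[float]) -> bool:
        # Answer "is this input that scene or cut?" at the pattern level.
        return self.procedure(self.pattern, observed) >= self.threshold


desk_pattern = RecognitionPattern([1.0, 0.0], cosine_similarity, threshold=0.9)
```

A call such as `desk_pattern.recognizes([0.9, 0.1])` then answers the query without any linguistic label being involved.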

【００３８】情報の検索においては、人の想起における
再認と再生機能を支援する。この再認の支援について、
図１を用いて説明する。ブラウザ１３は、図３の要約結
果をディスプレイ３に表示する。シーンやカットは、例
えば、言語によるラベルや、それを特徴づける静止画あ
るいは、対応する動画像の繰り返し再生画像をアイコン
として表示する。利用者は、キーボード４やマウス５を
用いて対話的に検索する。図３の要約結果は、時間や空
間の情報を多く保存した、意味付けされた過去の経験の
要約である。このため、これをインデックスとして検索
することにより、時間や意味のいろいろな単位により、
画像を迅速に検索できる。さらに、この検索から、忘れ
ていたことを思い出すのを支援できる。Information retrieval supports the recognition and recall functions of human remembering. Support for recognition is described with reference to FIG. 1. The browser 13 displays the summarization result of FIG. 3 on the display 3. Scenes and cuts are displayed as icons, for example a label in language, a still image that characterizes the scene or cut, or a looped playback of the corresponding moving image. The user searches interactively using the keyboard 4 and the mouse 5. The summarization result of FIG. 3 is a summary of past experience that has been given meaning and that preserves much temporal and spatial information. Searching with it as an index therefore allows images to be retrieved quickly in various units of time and meaning. Such a search can also help the user remember things that had been forgotten.

【００３９】再生の支援では、言語を基本とした検索が
可能である。単語や文で検索文が入力されると、言語解
析部１１は、検索文を、例えば、図５のように解析す
る。そして、検索部１２は、意味記憶装置１０１の知識
に基づいて、検索を行う。例えば、「Ａさんと会ったと
き」の問いに対して、意味記憶装置１０１に、Ａさんは
男、Ａさんは３０歳、Ａさんは早口などの知識があれ
ば、前述した「早口の男性と一緒」のシーンは候補とし
て検索できる。また、エピソード記憶装置１０２にＡさ
んの人のインスタンスを登録し、その声の特徴と認識プ
ログラムを属性として格納する。そして、意味記憶装置
１０１のＡさんとリンクを登録する。同様に、図７にお
ける人のインスタンスにも声の特徴を登録しておく。検
索時に、この声の特徴を入力として、Ａさんの声の特徴
との話者認識をすることにより、登録時にＡさんを認識
しなくても、人に基づく検索が可能である。声の特徴と
しては、例えば、長時間スペクトラムを登録する。Support for recall allows language-based search. When a search query is entered as a word or a sentence, the language analysis unit 11 analyzes it, for example as shown in FIG. 5. The search unit 12 then performs the search based on the knowledge in the semantic storage device 101. For example, for the query "when I met Mr. A", if the semantic storage device 101 holds knowledge such as that Mr. A is male, Mr. A is 30 years old, and Mr. A talks fast, the aforementioned scene labeled "with a fast-talking man" can be retrieved as a candidate. In addition, a person instance for Mr. A is registered in the episode storage device 102, with the features of his voice and a recognition program stored as attributes, and a link to Mr. A in the semantic storage device 101 is registered. Voice features are likewise registered in the person instances of FIG. 7. At search time, these voice features are used as input for speaker recognition against Mr. A's voice features, so that person-based search is possible even if Mr. A was not recognized at registration time. As the voice feature, a long-term spectrum, for example, is registered.
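
The way a vaguely labeled scene becomes a candidate for a query about a named person can be sketched as below. The sketch assumes that both the knowledge about a person and the scene labels are reduced to attribute and value pairs; the function `candidate_scenes`, the attribute names, and the scene numbers are hypothetical.

```python
from typing import Dict, List

# Knowledge about persons, standing in for the semantic storage device.
knowledge: Dict[str, Dict[str, object]] = {
    "A": {"sex": "male", "age": 30, "speech": "fast"},
}

# Vague labels attached to scenes, e.g. "with a fast-talking man" for scene 1.
scene_labels: Dict[int, Dict[str, object]] = {
    1: {"sex": "male", "speech": "fast"},
    2: {"sex": "female"},
    3: {},  # no person label
}


def candidate_scenes(person: str,
                     knowledge: Dict[str, Dict[str, object]],
                     scene_labels: Dict[int, Dict[str, object]]) -> List[int]:
    """Return scenes whose label attributes all agree with what is known of `person`."""
    facts = knowledge.get(person, {})
    return [
        scene_id
        for scene_id, attrs in scene_labels.items()
        if attrs and all(facts.get(key) == value for key, value in attrs.items())
    ]


candidates = candidate_scenes("A", knowledge, scene_labels)
```

A scene is kept as a candidate only when every attribute in its label is consistent with the stored knowledge, so a label with fewer attributes matches more persons, which is the sense in which the labeling may be vague.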

【００４０】以上、図１〜図７を用いて説明したよう
に、本実施例のマルチメディア情報管理システムでは、
入力されたマルチメディア情報を分割して要約した情報
を、意味に関する情報に対応付けて管理することによ
り、機械における情報の獲得および記憶形態を、人間の
記憶の生成や想起の特性に合致したものとすることがで
きる。このように、マルチメディア情報の記録および検
索を、人間の記憶の仕方や、人間の記憶の辿り方（思い
出し方）を疑似して行なうことにより、後で検索が容易
なように編集され加工された映像のみでなく、日常活動
を撮影した長時間の動画像等に関しても、自動的に効率
良く整理して記憶し、所望のシーンを迅速に検索でき、
かつ、利用者が覚えやすい情報や思い出しやすい情報に
基づいて、正確かつ迅速に検索できる。As described above with reference to FIGS. 1 to 7, the multimedia information management system of this embodiment divides and summarizes the input multimedia information and manages the result in association with information about meaning, so that the way the machine acquires and stores information matches the characteristics of human memory formation and recall. Because multimedia information is recorded and retrieved in a way that simulates how humans memorize and how they trace their memories (how they remember), not only video that has been edited and processed for easy later retrieval but also long moving images of everyday activities can be automatically and efficiently organized and stored, a desired scene can be found quickly, and searches can be made accurately and quickly based on information that is easy for the user to memorize or recall.

【００４１】尚、本発明は、図１〜図７を用いて説明し
た実施例に限定されるものではなく、その要旨を逸脱し
ない範囲において種々変更可能である。例えば、本実施
例では、オブジェックト指向技術を用いてシステム構成
を行なっているが、他のプログラム言語を用いたプログ
ラムによっても本発明に係る動作を行なうシステム構成
は可能である。また、図３における各ノード（カット１
０２０〜１０２３、シーン１０２６〜１０２９等）の表
示に関しては、全てを表示しても、あるいは、「シーン
１０２８」＋「シーン１０２６」＋「シーン１０２７」
の組合せや、「シーン１０２８」＋「カット１０２０」
の組合せ等に限定して表示しても良い。The present invention is not limited to the embodiment described with reference to FIGS. 1 to 7 and can be modified in various ways without departing from its gist. For example, although the system of this embodiment is built using object-oriented technology, a system that performs the operations according to the present invention can also be built with programs written in other programming languages. As for the display of the nodes in FIG. 3 (cuts 1020 to 1023, scenes 1026 to 1029, etc.), all of them may be displayed, or the display may be limited to a combination such as "scene 1028" + "scene 1026" + "scene 1027" or "scene 1028" + "cut 1020".

【００４２】[0042]

【発明の効果】本発明によれば、人間の記憶の仕方や、
人間の記憶の辿り方（思い出し方）を疑似したマルチメ
ディア情報の管理ができ、利用者が覚えやすい情報や思
い出しやすい情報に基づいた正確かつ迅速な検索がで
き、マルチメディア情報の記録および検索を高効率化す
ることが可能である。According to the present invention, multimedia information can be managed in a way that simulates how humans memorize and how they trace their memories (how they remember), accurate and quick searches can be made based on information that is easy for the user to memorize or recall, and the recording and retrieval of multimedia information can be made highly efficient.

[Brief description of drawings]

【図１】本発明のマルチメディア情報管理システムの本
発明に係る構成の一実施例を示すブロック図である。FIG. 1 is a block diagram showing one embodiment of the configuration of the multimedia information management system according to the present invention.

【図２】図１におけるマルチメディア情報管理システム
の動作に用いる各装置構成例を示すブロック図である。FIG. 2 is a block diagram showing a configuration example of the devices used in the operation of the multimedia information management system of FIG. 1.

【図３】本発明に係る記憶構造例を示す説明図である。FIG. 3 is an explanatory diagram showing an example of a memory structure according to the present invention.

【図４】図１におけるエピソード抽出処理部と記憶構造
学習処理部の詳細構成例を示すブロック図である。FIG. 4 is a block diagram showing a detailed configuration example of an episode extraction processing unit and a storage structure learning processing unit in FIG. 1.

【図５】図１の意味記憶装置における意味ネットワーク
構成例を示す説明図である。FIG. 5 is an explanatory diagram showing a configuration example of the semantic network in the semantic storage device of FIG. 1.

【図６】本発明に係る共通情報および関連情報のデータ
構造例を示す説明図である。FIG. 6 is an explanatory diagram showing a data structure example of common information and related information according to the present invention.

【図７】図６におけるクラス定義に基づく本発明に係る
エピソード記憶構造の表現例を示す説明図である。FIG. 7 is an explanatory diagram showing an example representation of the episode storage structure according to the present invention, based on the class definitions of FIG. 6.

[Explanation of symbols]

１：カメラ、２：マイク、３：ディスプレイ、４：キー
ボード、５：マウス、６：インタフェース部、７：メデ
ィア情報管理部、８：エピソード抽出処理部、９：記憶
構造学習処理部、１０：磁気ディスク装置、１１：言語
解析部、１２：検索部、１３：ブラウザ、２０：中央処
理装置、３０：メモリ、４０：制御部、５０：バス、８
１：特徴抽出部、８２：動きの分類部、８３：カット分
割処理部、９１：物理的特徴に基づく要約処理部、９
２：短期作業領域、９３：動きの分類学習部、９４：場
所分類学習部、１０１：意味記憶装置、１０２：エピソ
ード記憶装置、１０３：マルチメディア情報記憶装置、
２０１：場クラス、２０２：位置関係クラス、２０３：
キャラクタクラス、２０４：位置関係クラス、１０１０
〜１０１３：意味的情報、１０２０〜１０２３：カッ
ト、１０２６〜１０２９：シーン、２０１０〜２０１
４：場インスタンス、２０２０：位置関係インスタン
ス、２０３０，２０３１：キャラクタインスタンス1: camera, 2: microphone, 3: display, 4: keyboard, 5: mouse, 6: interface unit, 7: media information management unit, 8: episode extraction processing unit, 9: storage structure learning processing unit, 10: magnetic disk device, 11: language analysis unit, 12: search unit, 13: browser, 20: central processing unit, 30: memory, 40: control unit, 50: bus, 81: feature extraction unit, 82: motion classification unit, 83: cut division processing unit, 91: physical-feature-based summarization processing unit, 92: short-term work area, 93: motion classification learning unit, 94: place classification learning unit, 101: semantic storage device, 102: episode storage device, 103: multimedia information storage device, 201: field class, 202: positional relationship class, 203: character class, 204: positional relationship class, 1010 to 1013: semantic information, 1020 to 1023: cuts, 1026 to 1029: scenes, 2010 to 2014: field instances, 2020: positional relationship instance, 2030, 2031: character instances

フロントページの続き (72)発明者木村宏一東京都国分寺市東恋ケ窪１丁目280番地株式会社日立製作所中央研究所内 (72)発明者藤澤浩道東京都国分寺市東恋ケ窪１丁目280番地株式会社日立製作所中央研究所内Continuation of front page (72) Inventor: Koichi Kimura, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo (72) Inventor: Hiromichi Fujisawa, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo

Claims

[Claims]

1. A multimedia information management system that records multimedia information and, based on an operator's input of a search condition, outputs the multimedia information corresponding to that search condition, the system comprising:
dividing means for dividing the multimedia information into groups based on the commonality of physical features of the multimedia information; and
summarizing means for grouping the groups produced by the dividing means into higher-level groups based on the relevance of the commonality of each group, and for repeating this grouping into still higher-level groups to generate a hierarchical structure of groups,
wherein a semantic network that connects words based on the meaning of language information is associated with the commonality information and relevance information of each group in the hierarchical structure, and the commonality information and relevance information of each group in the hierarchical structure, and the language information, are used as search conditions for the multimedia information.

2. The multimedia information management system according to claim 1, further comprising means for displaying the commonality information, relevance information, and language information of each group of the hierarchical structure.

3. The multimedia information management system according to claim 1 or 2, wherein the dividing means divides the multimedia information into groups based on the commonality of objects in the moving image of the multimedia information.

4. The multimedia information management system according to claim 3, wherein the commonality information used by the dividing means is determined based on the movement of the camera that captured the moving image.

5. The multimedia information management system according to claim 3 or 4, wherein the positional relationship between the groups is obtained based on the movement of the camera that captured the moving image, and this positional relationship is used as the relevance information used by the summarizing means.

6. The multimedia information management system according to claim 1, wherein the spatial positional relationship between a plurality of things in the multimedia information is used as the commonality information of the groups.

7. The multimedia information management system according to claim 1, wherein the commonality information and the relevance information are formed by patterns, and the commonality and the relevance are determined by pattern recognition.

8. The multimedia information management system according to claim 1, wherein the commonality information and the relevance information are formed by a language including symbols, and the commonality and the relevance are determined by semantic analysis.

9. The multimedia information management system according to claim 1, wherein the groups are associated with one another by their temporal order, and this association information is used as a search condition for the multimedia information.

10. The multimedia information management system according to any one of claims 1 to 9, wherein the groups are associated with one another by their spatial positional relationship, and this association information is used as a search condition for the multimedia information.

11. The multimedia information management system according to any one of claims 1 to 10, wherein the grouping of the hierarchical structure is corrected based on an instruction input by the operator.