JP2006107109A - Information management device and information management method

Info

Publication number: JP2006107109A
Application number: JP2004292607A
Authority: JP
Inventors: Kenichiro Nakagawa; 賢一郎中川; Masaaki Yamada; 雅章山田; Hiroki Yamamoto; 寛樹山本
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2004-10-05
Filing date: 2004-10-05
Publication date: 2006-04-20

Abstract

PROBLEM TO BE SOLVED: To enable retrieval based on annotation information even for information that has no annotation.
SOLUTION: Retrieval target information is retrieved on the basis of an inputted retrieval condition and the annotation information related to the retrieval target information; for each piece of retrieved information, related information is extracted from a storage means storing the retrieval target information; and the retrieved information and the related information are associated with each other and presented to a user.
COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to an information management apparatus and an information management method for searching for information.

Digital cameras and similar devices have spread remarkably in recent years. Users generally manage digital images captured with a portable imaging device, such as a digital camera, on a PC or a server. For example, captured images can be organized into folders on a PC or server, and specific images can be printed and incorporated into New Year's cards and the like. When images are managed on a server, some of them can also be made available to other users.

Such tasks require finding the specific image the user has in mind. When the number of candidate images is small, they can be displayed as thumbnails and the desired image found by eye in the list. However, when there are hundreds of candidate images, or when the target images are split across multiple folders, finding an image visually is difficult.

For this reason, voice annotations (spoken notes) are attached to images on the imaging device, and that information is used at search time. For example, the user captures an image of a mountain with a portable imaging device and says “Hakone no yama” (the mountains of Hakone) for that image. The voice data is stored in the imaging device paired with the image data. It is then speech-recognized, either in the imaging device or on the PC to which the image has been uploaded, and converted into the text “hakone no yama”. Once the voice annotation data has been converted into text, it can be handled with ordinary text search techniques, and the image can be found by entering text such as “yama” or “hakone”.

Patent Document 1 is an example of prior art that uses such voice annotations. In Patent Document 1, the user inputs speech that serves as an annotation during or after image capture, and that voice data is used for image search by means of existing speech recognition technology.

In a typical search system that uses annotation information, images without annotations cannot be searched. However, attaching annotation information to every search target places a heavy burden on the user. Approaches have therefore been proposed that attach annotation information automatically, or that estimate the annotation information of an item from the annotation information attached before and after it.

For example, Patent Document 2 describes a system in which annotation information can be attached to an image on the imaging device. It describes an embodiment in which, unless the annotation information to be attached is changed, the same annotation information as was attached to the previous image is attached.

Patent Document 3 proposes copying annotation information from the immediately preceding image only when the time lag from that image is small.

JP 2003-219327 A
JP 2003-224750 A
Japanese Patent Laid-Open No. 7-121561

With these proposals, a copy of annotation information, or a link to it, is also given to items that were not annotated. This is effective in reducing the user's burden. However, because this processing is performed in the imaging device at capture time, copies of the annotation information or of its link information are held in the memory of the imaging device, which lowers memory efficiency. A further problem is that images captured by different imaging devices cannot share annotation information.

An information management apparatus according to the present invention comprises: search means for searching for a search target image based on an input search condition and annotation information associated with search target information; extraction means for extracting, for each piece of search target information found by the search means, a related image from storage means in which the search target information is stored; and presentation control means for performing control so that the image found by the search means and its related image are presented to the user in association with each other.

According to the present invention, a search based on annotation information can be performed even for information without annotations. To achieve this, it has been common to copy to an unannotated image the annotation information attached to the preceding image. The present invention does not need this processing, and therefore prevents an increase in memory consumption within the imaging apparatus.

An embodiment of the information management apparatus according to the present invention is described below with reference to the drawings, taking the management of images as an example.

(Example 1)
FIG. 1 is a functional configuration diagram of the information management apparatus of this example. The information management apparatus 101 of the present invention is connected to a control command issuing unit 102, a search character string input unit 103, and an image output unit 104. The control command issuing unit 102 sends the information management apparatus events for speech-recognizing voice annotation data and commands for executing an image search. The control command issuing unit 102 may be, for example, a GUI button or a physical button. Alternatively, the insertion of a specific memory device (a CompactFlash (registered trademark) card) into a slot may be treated as the issuing of a specific command. The search character string input unit 103 is a device for entering a search query for image search, such as a keyboard or a microphone for voice input. The image output unit 104 presents the candidate images found by the search to the user, and is assumed to be a display or the like.

The apparatus contains an image/audio database 111. This database stores images imported from the imaging device and the voice annotation data attached to those images.

FIG. 10 shows an example of the image/audio database. Its elements are the image data captured by the imaging device, the voice annotation data associated with the image data, and the speech recognition result storage files that hold the results of speech recognition performed on the voice annotation data by the information management apparatus. These three elements are not always all present. For example, for an image without voice annotation data, the voice annotation data slot is empty. For voice annotation data that has not yet been speech-recognized, the speech recognition result storage file slot is empty. The correspondence between data elements may be held separately as correspondence information, or, as in the example of FIG. 10, files that share the same file name and differ only in extension may be regarded as corresponding to each other.
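The file-name convention described for FIG. 10 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `group_by_stem` and the `.txt` extension for speech recognition result storage files are assumptions (the text only shows `.jpg` and `.wav` names).

```python
from collections import defaultdict
from pathlib import PurePath

# Assumed mapping from extension to database slot.
# ".txt" for recognition result files is hypothetical.
SLOT_BY_EXT = {".jpg": "image", ".wav": "annotation", ".txt": "recognition"}


def group_by_stem(filenames):
    """Group files that share a base name into one database record.

    A slot stays None when the corresponding file is missing, which
    models the "empty slot" cases described for FIG. 10.
    """
    records = defaultdict(
        lambda: {"image": None, "annotation": None, "recognition": None}
    )
    for name in filenames:
        path = PurePath(name)
        slot = SLOT_BY_EXT.get(path.suffix.lower())
        if slot is not None:
            records[path.stem][slot] = name
    return dict(records)
```

With such a grouping, a record whose `annotation` slot is filled and whose `recognition` slot is `None` is one that still awaits speech recognition.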

The control unit 107 in the information management apparatus receives control commands from outside the apparatus and executes various commands. These commands include at least a command for speech-recognizing voice annotation data (the speech recognition command) and a search command for searching for images using a search query.

When the speech recognition command is input, the control unit accesses the voice annotation data recognition unit 106. The voice annotation data in the image/audio database 111 is then speech-recognized using speech recognition data 105, which consists of previously collected features of human speech (an acoustic model) and language constraints (a language model). At this point, all voice annotation data in the database may be recognized, or only the voice annotation data that has not yet been recognized. In the example of FIG. 10, there are no speech recognition result storage files corresponding to 104.wav and 106.wav, so these two pieces of voice annotation data are recognized.

For example, when the voice annotation data is the utterance “Hakone no yama”, the phoneme string “hakonenoyama” is ideally output as the speech recognition result. The recognition results are sent to the speech recognition result storage unit 110 and added to the image/audio database in the form of a speech recognition result storage file. FIG. 7 shows an example of a speech recognition result storage file. Here, five candidates are output as recognition results for one piece of voice annotation data.

When a search command is input from the control command issuing unit 102, the control unit 107 invokes the search unit 108. An example of searching for images with a text query is described here. When search processing starts, the search unit takes in text from the search character string input unit 103 outside the apparatus. For example, when the character string “Hakone no yama” (written in Japanese script) is taken in, the search unit automatically assigns a reading to the input string and converts it into a phoneme string (utterance information) such as “hakonenoyama”. It then calculates the similarity between this utterance information and the phoneme strings of the speech recognition results written in the speech recognition result storage files in the image/audio database, and selects the N images with the highest similarity. These images are called search result images.

The related image extraction unit 112 extracts the group of images related to each search result image. The extraction method is described in detail later. The search result images and related images are sent to the search result presentation unit 109 and output to the image output unit 104 outside the apparatus.

FIGS. 2 and 3 are flowcharts of the information management apparatus. Based on these flows, the specific sequence from capturing images to searching is described below.

First, the user needs to capture images with the portable image capturing device 401 of FIG. 4. This device has an image confirmation screen 403. Switching the operation mode switch 405 toggles between capture mode and captured-image confirmation mode. In captured-image confirmation mode, the images captured so far can be reviewed on the image confirmation screen.

In captured-image confirmation mode, the user can attach a voice annotation to a specific image. For example, the user may display the image to be annotated on the image confirmation screen and press the voice annotation button 402 on the device to attach voice annotation data to that image. Specifically, pressing this button causes a fixed duration of audio to be captured from the microphone 404, and the audio data is stored in the memory of the imaging device in association with the image.

However, users cannot be expected to attach voice annotations to every image, because the burden is too great. Instead, as in FIG. 5, the user is asked to attach a voice annotation at each scene break.

FIG. 5 shows that the user first attached the voice annotation “Hakone no yama” to the first image taken (502), and then captured two images without voice annotations (503). After capturing the next image, the user attached the voice annotation “Ashinoko sansaku” (a stroll around Lake Ashi) to it.

When the user returns from the trip and connects the portable image capturing device to a PC, a dialog window such as that in FIG. 6 opens and prompts the user to upload the data in the device to the PC. The user selects the images to upload and presses the upload instruction button 605. The selected images and the voice annotation data attached to them are then uploaded to the image/audio database on the PC.

When this upload completes, the flow shown in FIG. 2 starts. The flow searches the image/audio database in the apparatus (S201) and determines whether there is any voice annotation data that has not yet been speech-recognized (S202). If all voice annotation data has already been recognized, the flow exits.

If there is a voice annotation that has not yet been recognized, its voice annotation data is acquired (S203) and speech-recognized (S204). The recognition result is stored in the image/audio database as a speech recognition result storage file (S205).
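The loop over S201 to S205 can be sketched as below. This is only an illustration under stated assumptions: the database is modelled as an in-memory dictionary, and `recognize` is a hypothetical callable standing in for the voice annotation data recognition unit; no real speech recognition API is implied.

```python
def recognize_pending(records, recognize):
    """Run recognition only on annotations that have no stored result yet.

    records   : dict mapping a record name to a dict with the keys
                "annotation" (voice data or None) and
                "recognition" (list of candidates or None).
    recognize : hypothetical callable taking voice data and returning
                a list of candidate phoneme strings.
    Returns the names of the records that were processed.
    """
    processed = []
    for name in sorted(records):                      # S201: scan the database
        rec = records[name]
        if rec["annotation"] is None:                 # no annotation attached
            continue
        if rec["recognition"] is not None:            # S202: already recognized
            continue
        voice = rec["annotation"]                     # S203: acquire the data
        rec["recognition"] = recognize(voice)         # S204 and S205: recognize, store
        processed.append(name)
    return processed
```

Calling the function a second time on the same records processes nothing, which matches the flow exiting when all voice annotation data has already been recognized.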

Suppose that one day the user wants to use the uploaded images. The user launches an image search program such as that in FIG. 8, enters search text in the search character string input field 802, and presses the search start button 803.

When the search start button is pressed, the flow shown in FIG. 3 starts. First, the search character string is taken in from the search character string input field 103 (S301). Next, the search character string is converted into a phoneme string using language processing technology (S302). The converted phoneme string is stored in variable A.

Next, the image/speech recognition result database in the apparatus is accessed and one speech recognition result storage file is acquired from it. The acquired file is denoted speech recognition result storage file α (S303). The variable C_max is then cleared to 0 (S304).

Next, one speech recognition candidate is acquired from the speech recognition result storage file α acquired in S303. As shown in FIG. 7, each speech recognition candidate corresponds to one line of the speech recognition result storage file. The recognized phoneme string that follows “string=” in the acquired candidate is stored in variable B (S305).

Next, the similarity between the phoneme strings stored in variables A and B is calculated (S306). Dynamic programming, an existing technique, can be used for this. The calculated similarity is stored in variable C. C is then compared with C_max (S307). If C is greater than C_max, C_max is updated with C (S308).

These steps are performed for all speech recognition candidates in the speech recognition result storage file α. When all candidates have been processed (S309), C_max is taken as the score of the speech recognition result storage file α (S310).

The above operations are performed for all speech recognition result storage files in the image/speech recognition result database. When all files have been processed (S311), the files are sorted by their calculated scores, and the images corresponding to the top N are taken as the search result images. Images related to the search result images are then extracted from the image/audio database and presented to the user (S312).
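The scoring loop of S303 to S311 can be sketched as follows. The text only says that dynamic programming can be used for the similarity; the particular choice here, a Levenshtein edit distance normalized to the range 0 to 1, is an assumption, as is treating each romanized character as one phoneme symbol. All function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed by dynamic programming, row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed similarity: 1.0 for identical strings, lower as they differ."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def score_file(query, candidates):
    """S304 to S310: keep the best similarity over all candidates."""
    c_max = 0.0                        # S304: clear C_max
    for b in candidates:               # S305: next candidate into B
        c = similarity(query, b)       # S306: similarity into C
        if c > c_max:                  # S307: compare C with C_max
            c_max = c                  # S308: update C_max
    return c_max                       # S310: score of this file


def top_n(query, results, n):
    """S311: sort the files by score and return the names of the best n.

    results maps a record name to its list of candidate phoneme strings.
    Ties are broken by name so that the order is deterministic.
    """
    scored = [(score_file(query, cands), name) for name, cands in results.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:n]]
```

Taking the maximum over candidates means that a record is ranked by its best-matching recognition hypothesis, so a correct reading that is not the recognizer's first choice can still surface the image.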

The related images are extracted as follows. First, the image/audio database is ordered by the capture date and time of the image data. The images from the search result image up to the next image that has voice annotation data are then taken as its related images. For example, if the image/audio database shown in FIG. 10 is ordered by capture date and time, 102.jpg and 103.jpg are the related images of 101.jpg. Similarly, 105.jpg is the related image of 104.jpg.
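The extraction rule above can be sketched in a few lines. The record layout, the function name `related_images`, and the capture times in `SAMPLE` are illustrative assumptions; only the file names and which images carry annotations follow the FIG. 10 example in the text.

```python
from datetime import datetime


def related_images(records, hit_name):
    """Return the unannotated images that follow a search result image.

    records  : list of dicts with the keys "name", "taken" (capture time)
               and "annotated" (True if a voice annotation is attached).
    hit_name : name of the search result image.
    Images are collected in capture order until the next annotated image.
    """
    ordered = sorted(records, key=lambda r: r["taken"])
    names = [r["name"] for r in ordered]
    start = names.index(hit_name)
    related = []
    for rec in ordered[start + 1:]:
        if rec["annotated"]:           # next annotated image ends the group
            break
        related.append(rec["name"])
    return related


# Hypothetical capture times; the annotation pattern mirrors the example.
SAMPLE = [
    {"name": "101.jpg", "taken": datetime(2004, 10, 5, 9, 0), "annotated": True},
    {"name": "102.jpg", "taken": datetime(2004, 10, 5, 9, 5), "annotated": False},
    {"name": "103.jpg", "taken": datetime(2004, 10, 5, 9, 9), "annotated": False},
    {"name": "104.jpg", "taken": datetime(2004, 10, 5, 11, 0), "annotated": True},
    {"name": "105.jpg", "taken": datetime(2004, 10, 5, 11, 3), "annotated": False},
    {"name": "106.jpg", "taken": datetime(2004, 10, 5, 13, 0), "annotated": True},
]
```

Because the grouping is computed at search time from capture order, nothing has to be copied or linked on the imaging device.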

To illustrate, return to the example of FIG. 5. After capturing the first photograph, the user attached the voice annotation “Hakone no yama” to it. The user then captured two more photographs without attaching voice annotations. In this case, when a search is performed, the second and third photographs are presented together with the first (or expanded from the first). From the user's point of view, this behavior is considered relatively natural.

The search result images and the related images extracted as described above are output using a presentation method such as that in FIG. 9. Search result presentation window A 901 presents the retrieved images with voice annotations in order of search score. When the user places the mouse cursor 903 over a search result image, the related images of that image pop up.

Search result presentation window B 904 is an example in which related images 905 are presented. The user can also select the desired image from these pop-up related images.

The advantage of this example is that even images without voice annotation data can be searched. In addition, because the imaging device does not copy the preceding annotation data (or attach link information) to images without voice annotation data, an increase in memory usage in the imaging device is prevented.

(Example 2)
The above example assumed a single portable image capturing device, but there may be more than one. For example, even when one person captures images and attaches voice annotations using several portable image capturing devices, the data can be uploaded to a common image/audio database. In that case, images captured by a device other than the one on which the voice annotation was attached can also be presented as related images.

Conversely, by holding information on which portable image capturing device captured each image, either in the image or in the image/audio database, images captured by other cameras can be excluded from the related images.

(Example 3)
In the above examples, the fact that an image is a related image of a particular image was not recorded in the image/audio database. However, which image is related to which can also be held explicitly in the image/audio database. FIG. 11 shows an example of an image/audio database that holds this information. Compared with FIG. 10, it has an additional element called related information, which holds the names of the related images.

This information can be created, for example, when images are uploaded from the portable image capturing device to the PC. Alternatively, it may be registered when speech recognition processing is executed.

(Example 4)
In the above examples, the difference in capture time was not considered when deciding whether one image is a related image of another. However, treating images separated by a very large time difference as related is unnatural. Processing can therefore be added so that, even among the images captured between the time an image A with a voice annotation was captured and the time the next image with a voice annotation was captured, those whose capture time is a fixed time or more after the capture time of image A are not treated as related images of image A.
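The time limit of this example can be sketched as a small variation of the extraction rule. The function name `related_within`, the record layout, and the `max_gap` parameter are illustrative assumptions; the text does not specify how long the fixed time is.

```python
from datetime import datetime, timedelta


def related_within(records, hit_name, max_gap):
    """Related images of an annotated image, limited by elapsed time.

    records  : list of dicts with the keys "name", "taken" (datetime)
               and "annotated" (bool).
    max_gap  : timedelta; images captured max_gap or more after the
               search result image are not treated as related.
    """
    ordered = sorted(records, key=lambda r: r["taken"])
    start = next(i for i, r in enumerate(ordered) if r["name"] == hit_name)
    hit_time = ordered[start]["taken"]
    related = []
    for rec in ordered[start + 1:]:
        if rec["annotated"]:                    # next annotated image ends the group
            break
        if rec["taken"] - hit_time < max_gap:   # keep only images close in time
            related.append(rec["name"])
    return related
```

For instance, with `max_gap=timedelta(hours=1)`, an unannotated image taken three hours after image A is skipped even though no other annotated image lies between them.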

(Example 5)
The above examples did not address what happens when an image with a voice annotation is deleted from the image/audio database. However, even when such an image is deleted, the related information in the image/audio database can be relinked automatically. For example, if the deleted image had related information, that related information can be added to the related information of the immediately preceding item with a voice annotation.

For example, consider the case where the image 104.jpg (an image with a voice annotation) in FIG. 11 is deleted. In that case, its related information, “105.jpg”, is added to the related information of 101.jpg, the immediately preceding item with a voice annotation. In this example, the related information of 101.jpg becomes “102.jpg, 103.jpg, 105.jpg”.

If the deleted image with a voice annotation is the first one in the database, its related information may be deleted.
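The relinking rule of this example can be sketched as below. The dictionary layout and the function name `delete_annotated_image` are illustrative assumptions: the related information of FIG. 11 is modelled as a mapping from each annotated image to the list of its related image names, kept in capture order.

```python
def delete_annotated_image(related, name):
    """Delete an annotated image and relink its related information.

    related : dict mapping each annotated image name to the list of its
              related image names; insertion order is capture order.
    name    : annotated image to delete.
    Returns a new dict; the input is left unchanged.
    """
    names = list(related)
    idx = names.index(name)
    # Copy every entry except the deleted one.
    updated = {k: list(v) for k, v in related.items() if k != name}
    if idx > 0:
        # Hand the orphaned related images to the preceding annotated image.
        updated[names[idx - 1]].extend(related[name])
    # If the deleted image was the first one, its related information is dropped.
    return updated
```

Returning a new mapping rather than editing in place keeps the sketch easy to test; a real database would more likely update the affected record directly.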

(Example 6)
The above examples described attaching voice annotations to image information, but the present invention is not limited to this and can be applied to various kinds of information, such as document information and audio information. It is also applicable when annotations such as text, rather than voice annotations, are attached.

(Example 7)
Needless to say, the object of the present invention is also achieved by supplying a system or apparatus with a storage medium on which the program code of software that implements the functions of the above examples is recorded, and having the computer (or CPU or MPU) of that system or apparatus read and execute the program code stored on the storage medium.

In this case, the program code read from the storage medium itself implements the functions of the above embodiments, and the storage medium storing that program code constitutes the present invention.

Storage media that can be used to supply the program code include, for example, flexible disks, hard disks, optical disks, magneto-optical disks, CD-ROMs, CD-Rs, magnetic tape, non-volatile memory cards, and ROMs.

Needless to say, this includes not only the case where the functions of the above embodiments are implemented by the computer executing the program code it has read, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the above embodiments are implemented by that processing.

Needless to say, it also includes the case where the program code read from the storage medium is written to a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on that board or unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above embodiments are implemented by that processing.

FIG. 1 is a functional configuration diagram of the information management apparatus of the example.
FIG. 2 is a flowchart of the speech recognition processing of the information management apparatus of the example.
FIG. 3 is a flowchart of the image search processing of the information management apparatus of the example.
FIG. 4 shows an example of the portable image capturing device of the example.
FIG. 5 is an explanatory diagram showing the relationship between captured photographs and their related images.
FIG. 6 shows an example of the operation UI screen used when uploading images in the example.
FIG. 7 shows an example of the speech recognition result storage file of the example.
FIG. 8 shows an example of the image search screen of the example.
FIG. 9 shows an example of the presentation of image search results in the example.
FIG. 10 shows an example of the image/audio database of the information management apparatus of the example.
FIG. 11 shows an example of the image/audio database of the information management apparatus of Examples 3 to 5.

Claims

An information management apparatus comprising:
search means for searching for search target information based on an input search condition and annotation information associated with the search target information;
extraction means for extracting, for each piece of search target information found by the search means, related information from storage means in which the search target information is stored, based on time information associated with that search target information; and
display control means for performing control so that the search target information found by the search means and the related information extracted by the extraction means are associated with each other and presented to a user.
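
As an informal illustration only (not part of the claims), the search, extraction, and presentation roles recited above might be sketched as follows. The `Item` record, the substring match, and the `same_day` rule are all hypothetical choices made for this sketch; the claim itself only requires that extraction be based on time information.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass(frozen=True)
class Item:
    """One stored piece of information (hypothetical record layout)."""
    name: str
    taken_at: datetime                # time information
    annotation: Optional[str] = None  # annotation text, if any


def search(store: list[Item], condition: str) -> list[Item]:
    """Search role: keep items whose annotation contains the condition."""
    return [i for i in store if i.annotation and condition in i.annotation]


def same_day(store: list[Item], hit: Item) -> list[Item]:
    """Extraction role (placeholder rule): other items from the same date."""
    return [i for i in store
            if i is not hit and i.taken_at.date() == hit.taken_at.date()]


def retrieve(store: list[Item], condition: str,
             extract: Callable[[list[Item], Item], list[Item]] = same_day):
    """Pair each search hit with its related items, ready for presentation."""
    return [(hit, extract(store, hit)) for hit in search(store, condition)]


store = [
    Item("a.jpg", datetime(2004, 10, 5, 9, 0), "zoo"),
    Item("b.jpg", datetime(2004, 10, 5, 9, 5)),
    Item("c.jpg", datetime(2004, 10, 6, 8, 0), "beach"),
]
for hit, related in retrieve(store, "zoo"):
    print(hit.name, [r.name for r in related])
```

Here the unannotated `b.jpg` is returned alongside the annotated hit `a.jpg`, which is the effect the abstract describes: information without an annotation becomes reachable through an annotated neighbor.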

The information management apparatus according to claim 1, wherein the information is an image, and
the extraction means extracts, as a related image of the search target image, an image that is stored in the storage means, is not associated with annotation information, and has imaging date and time information falling between that of the search target image and that of the image which, among the images stored in the storage means that are associated with annotation information, has the imaging date and time information immediately following that of the search target image.
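
A minimal sketch of this extraction rule, for illustration only: unannotated images taken after the search target image and before the next annotated image are treated as related. The `Image` record and function name are hypothetical, and the behavior when no later annotated image exists (treating the range as open-ended) is an assumption of the sketch, since the claim does not address that case.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Image(NamedTuple):
    """Hypothetical record: file name, imaging time, optional annotation."""
    name: str
    taken_at: datetime
    annotation: Optional[str] = None


def related_until_next_annotated(images: list[Image], target: Image) -> list[Image]:
    """Unannotated images taken after `target` and before the next annotated image."""
    ordered = sorted(images, key=lambda im: im.taken_at)
    # Imaging times of annotated images taken after the target, earliest first.
    later_annotated = [im.taken_at for im in ordered
                       if im.annotation is not None and im.taken_at > target.taken_at]
    # Assumption: with no later annotated image, the range is open-ended.
    upper = later_annotated[0] if later_annotated else None
    return [im for im in ordered
            if im.annotation is None
            and im.taken_at > target.taken_at
            and (upper is None or im.taken_at < upper)]


images = [
    Image("p1.jpg", datetime(2004, 10, 5, 9, 0), "zoo"),
    Image("p2.jpg", datetime(2004, 10, 5, 9, 1)),
    Image("p3.jpg", datetime(2004, 10, 5, 9, 2)),
    Image("p4.jpg", datetime(2004, 10, 5, 9, 30), "lunch"),
    Image("p5.jpg", datetime(2004, 10, 5, 9, 31)),
]
print([im.name for im in related_until_next_annotated(images, images[0])])
```

With `p1.jpg` as the search target, the sketch returns `p2.jpg` and `p3.jpg`; `p5.jpg` is excluded because it follows the next annotated image, `p4.jpg`.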

The information management apparatus according to claim 1, wherein the information is an image, and
the extraction means extracts, as a related image, from the images stored in the storage means, an image whose imaging date and time differs from that of the search target image by no more than a predetermined time.
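
The time-window rule can be illustrated in the same informal way. In this sketch the `Image` record, the function name, the ten-minute window, and the inclusive comparison are all assumptions chosen for the example; the claim only says "a predetermined time".

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Image(NamedTuple):
    """Hypothetical record: file name and imaging time."""
    name: str
    taken_at: datetime


def related_within(images: list[Image], target: Image,
                   window: timedelta) -> list[Image]:
    """Images whose imaging time is within `window` of the target's, either side."""
    return [im for im in images
            if im is not target
            and abs(im.taken_at - target.taken_at) <= window]


images = [
    Image("q1.jpg", datetime(2004, 10, 5, 9, 0)),
    Image("q2.jpg", datetime(2004, 10, 5, 9, 4)),
    Image("q3.jpg", datetime(2004, 10, 5, 9, 20)),
]
print([im.name for im in related_within(images, images[0], timedelta(minutes=10))])
```

Unlike the rule based on the next annotated image, this variant looks both before and after the search target image and does not depend on whether the neighboring images carry annotations.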

The information management apparatus according to claim 1, wherein the extraction means operates after the operation of the search means.

The information management apparatus according to claim 1, further comprising registration means for registering the search target information and the annotation information in the storage means,
wherein the extraction means operates after the operation of the registration means.

The information management apparatus according to claim 1, wherein the annotation information includes voice annotation information,
the apparatus further comprising speech recognition means for performing speech recognition on the voice annotation information and converting the voice annotation information into text information.

An information management method comprising:
a search step of searching for search target information based on an input search condition and annotation information associated with the search target information;
an extraction step of extracting, for each piece of search target information found in the search step, related information from storage means in which the search target information is stored, based on time information associated with that search target information; and
a search control step of performing control so that the search target information found in the search step and the related information extracted in the extraction step are associated with each other and presented to a user.

The information management method according to claim 7, wherein the information is an image, and
in the extraction step, an image that is stored in the storage means, is not associated with annotation information, and has imaging date and time information falling between that of the search target image and that of the image which, among the images stored in the storage means that are associated with annotation information, has the imaging date and time information immediately following that of the search target image, is extracted as a related image of the search target image.

The information management method according to claim 7, wherein, in the extraction step, an image whose imaging date and time differs from that of the search target image by no more than a predetermined time is extracted, as a related image, from the images stored in the storage means.

The information management method according to claim 7, wherein the extraction step is performed after the search step.

The information management method according to claim 7, further comprising a registration step of registering the search target information and the annotation information in the storage means,
wherein the extraction step is performed after the registration step.

The information management method according to claim 7, wherein the annotation information includes voice annotation information,
the method further comprising a speech recognition step of performing speech recognition on the voice annotation information and converting the voice annotation information into text information.

A control program for causing a computer to execute the information processing method according to claim 7.

A computer-readable storage medium storing the control program according to claim 13.