JP2008085582A

JP2008085582A - System for controlling image, image taking apparatus, image control server and method for controlling image

Info

Publication number: JP2008085582A
Application number: JP2006262550A
Authority: JP
Inventors: Satoshi Nakamura (中村 敏)
Original assignee: Fujifilm Corp
Current assignee: Fujifilm Corp
Priority date: 2006-09-27
Filing date: 2006-09-27
Publication date: 2008-04-10

Abstract

PROBLEM TO BE SOLVED: To provide a system for controlling an image, an image taking apparatus, an image control server, and a method for controlling an image that are capable of simply generating information to be added to an image.

SOLUTION: The system 1000 for controlling an image has a digital camera 1 and a personal computer 200 that communicates with the digital camera 1 through a USB cable 30 and serves as an image control server, receiving images from the digital camera 1 and managing them. The digital camera 1 transmits an image to the personal computer 200 and also transmits a voice together with information matching it to the image. The personal computer 200 stores the received image, recognizes the received voice and converts it into text, and generates information to be added to the image on the basis of the text. The generated information is stored in association with the image that was matched to the voice used as the basis for generating it.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an image management system comprising a photographing apparatus that generates an image by forming an image of a subject on an image sensor and an image management server that receives images from the photographing apparatus and manages them, and also relates to the photographing apparatus, the image management server, and an image management method in the image management server.

With advances in wired communication technologies typified by the USB (Universal Serial Bus) standard, and in wireless communication technologies such as radio communication compliant with the Bluetooth standard and infrared communication compliant with IrDA (Infrared Data Association), it has become possible to transmit large volumes of data representing images and the like at high speed. As a result, for example, a large amount of image data captured with a digital camera can be transferred at high speed, at any time, to an image management server that manages images, so that the desired number of photographs can be taken without being constrained by the storage capacity of the digital camera.

In general, image data captured with a digital camera includes additional information such as date information and shooting information, and an image management server stores the image data linked to this additional information.

For example, an image storage system has been proposed that receives image data captured with a digital camera over a communication line and stores it, creating a directory on the basis of additional information such as voice data converted into text data, and generating a file name on the basis of GPS (Global Positioning System) information and shooting date and time information (see Patent Document 1).

An image management system has also been proposed that estimates the context of image data from attribute information attached to the image, such as date and time, position information, and exposure information, together with pre-registered user profile information, map information, seasonal event information, event information, and the like, and that can classify and search image data on the basis of the estimated result (see Patent Document 2).

Furthermore, an apparatus and method have been proposed that create or update information about a subject person, which is additional data of a captured image, on the basis of personal information received from outside (see Patent Document 3).

A digital camera and method have also been proposed that identify a person in an image and add the person's name to the additional data of the image file (see Patent Document 4).
Patent Documents 1 to 4, as listed: JP 2005-203994 A; JP 2002-10178 A; JP 2004-133536 A; JP 2004-62868 A

In recent years, as communication technology has continued to advance, there has been a growing trend to transfer large amounts of image data captured with a photographing apparatus such as a digital camera to an image management server at any time and to manage the images collectively on that server. The image management server manages the images collectively on the basis of the additional information attached to them. It is therefore important to generate the additional information attached to an image efficiently and accurately.

However, in the technique proposed in Patent Document 1, generating additional information such as voice data converted into text data requires the speaker's voice to be converted on the photographing apparatus side. The speech recognition processing needed for this conversion imposes a heavy load, so performing it on a photographing apparatus with limited resources (storage capacity and the like) places a heavy burden on the apparatus.

In the technique proposed in Patent Document 2, it is difficult for the camera user to input commands or additional information for inference to the inference engine that estimates the context, so when there are multiple context candidates the inference accuracy of the inference engine may decrease.

Furthermore, in the technique proposed in Patent Document 3, generating information about the subject person requires personal information to be created separately and received by the apparatus, which is time-consuming.

In the technique proposed in Patent Document 4, the person is identified inside the camera and the information about that person is generated there, which consumes much of the memory capacity provided in the camera. As a result, it is difficult to store a sufficient amount of image data in the camera.

In view of the above circumstances, an object of the present invention is to provide an image management system, a photographing apparatus, an image management server, and an image management method that can easily generate additional information to be added to an image.

An image management system of the present invention that achieves the above object comprises a photographing apparatus that generates an image by forming an image of a subject on an image sensor, and an image management server that communicates with the photographing apparatus and receives images from it and manages them, wherein
the photographing apparatus comprises:
voice acquisition means for acquiring voice;
association means for associating the voice acquired by the voice acquisition means with an image; and
transmission means for transmitting an image to the image management server and also transmitting, to the image management server, the voice acquired by the voice acquisition means together with association information identifying the image with which it was associated by the association means,
the image management server comprises:
receiving means for receiving the image transmitted from the photographing apparatus and the voice transmitted from the photographing apparatus;
storage means for storing the image received by the receiving means;
voice recognition means for recognizing the voice received by the receiving means and converting it into text; and
additional information generation means for generating additional information to be added to an image on the basis of the text obtained by the voice recognition means,
and the storage means stores the image and also stores the additional information generated by the additional information generation means in association with the image associated with the voice from which the additional information was generated.

In the image management system of the present invention, the photographing apparatus transmits an image to the image management server and also transmits voice together with information associating it with an image. The image management server stores the received image, recognizes the received voice and converts it into text, generates additional information to be added to the image on the basis of that text, and stores the additional information in association with the image associated with the voice from which it was generated. The voice recognition processing and the generation of the additional information therefore take place on the image management server. Accordingly, the photographing apparatus does not need to perform voice recognition processing, and additional information to be added to an image can be generated easily by voice while the processing burden on the photographing apparatus is kept low.
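The division of labor described above can be illustrated with a minimal Python sketch. It is not taken from the publication: the class and field names (`VoiceMessage`, `ImageManagementServer`, `image_id`) are hypothetical, the recognizer is a stand-in for a real speech recognition engine, and "additional information" is reduced to a list of keywords.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class VoiceMessage:
    """Voice sent by the camera, carrying the association information."""
    image_id: str   # hypothetical key linking the voice to an image
    samples: bytes  # voice data; recognition happens on the server only


@dataclass
class ImageManagementServer:
    # Stand-in for a real speech recognition engine (assumption).
    recognizer: Callable[[bytes], str]
    images: Dict[str, bytes] = field(default_factory=dict)
    additional_info: Dict[str, List[str]] = field(default_factory=dict)

    def receive_image(self, image_id: str, data: bytes) -> None:
        """Storage means: keep the received image."""
        self.images[image_id] = data

    def receive_voice(self, msg: VoiceMessage) -> None:
        """Recognize the voice, derive additional information, and store it
        in association with the image the voice was associated with."""
        text = self.recognizer(msg.samples)
        keywords = text.split()  # toy additional information generation
        self.additional_info.setdefault(msg.image_id, []).extend(keywords)


def fake_recognizer(samples: bytes) -> str:
    # Placeholder: pretends the voice bytes are already UTF-8 text.
    return samples.decode("utf-8")


server = ImageManagementServer(recognizer=fake_recognizer)
server.receive_image("IMG_0001", b"\xff\xd8...jpeg bytes...")
server.receive_voice(VoiceMessage("IMG_0001", b"birthday party Tokyo"))
print(server.additional_info["IMG_0001"])  # ['birthday', 'party', 'Tokyo']
```

The camera side only has to attach an image identifier to the voice. Everything that is computationally heavy sits behind `receive_voice` on the server.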

Here, it is preferable that the photographing apparatus in the image management system of the present invention comprises audio file creation means for creating an audio file containing the voice acquired by the voice acquisition means, and
that the transmission means, when transmitting voice to the image management server, transmits the audio file created by the audio file creation means.

When the photographing apparatus creates an audio file containing the voice and transmits that file to the image management server, the image management server recognizes the voice on the basis of the audio file and generates the additional information to be added to the image, so the additional information can be generated by voice even more easily.

In another preferable aspect, the transmission means in the image management system of the present invention, when transmitting voice to the image management server, transmits the voice acquired by the voice acquisition means as it is, and
the image management server comprises audio file creation means for creating an audio file containing the voice received by the receiving means.

In this way, the processing burden on the photographing apparatus side can be reduced further.
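Server-side audio file creation of this kind might look like the following sketch, which wraps raw voice samples in a WAV container using Python's standard `wave` module. The publication does not name a file format. WAV, 16-bit mono samples, the 8 kHz sample rate, and the function name `make_audio_file` are all assumptions made for illustration.

```python
import io
import wave


def make_audio_file(pcm: bytes, sample_rate: int = 8000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container (server side)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # mono (assumed)
        wav.setsampwidth(2)            # 16-bit samples (assumed)
        wav.setframerate(sample_rate)  # sample rate (assumed)
        wav.writeframes(pcm)
    return buf.getvalue()


raw_voice = b"\x00\x00" * 8000  # one second of silence as stand-in voice data
audio_file = make_audio_file(raw_voice)

# Read the file back to confirm that the container is well formed.
with wave.open(io.BytesIO(audio_file), "rb") as wav:
    frames, rate = wav.getnframes(), wav.getframerate()
print(frames, rate)  # 8000 8000
```

With this arrangement the camera only streams samples, and the cost of building the container falls on the server.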

Furthermore, it is also preferable that the photographing apparatus in the image management system of the present invention comprises:
position information acquisition means for acquiring geographical position information of the photographing apparatus;
date and time information acquisition means for acquiring date and time information; and
schedule information acquisition means for acquiring schedule information of the user of the photographing apparatus,
that the transmission means further transmits, to the image management server, the position information, date and time information, and schedule information acquired respectively by the position information acquisition means, the date and time information acquisition means, and the schedule information acquisition means,
that the receiving means further receives the position information, date and time information, and schedule information transmitted from the photographing apparatus, and
that the additional information generation means generates the additional information on the basis of the text obtained by the voice recognition means and also on the basis of the position information, date and time information, and schedule information received by the receiving means.

In this way, the geographical position information of the photographing apparatus, the date and time information, and the user's schedule information can be provided to the image management server. The image management server can therefore easily generate additional information to be added to the image by voice, and can also easily generate the position information, date and time information, and schedule information to be added to that image.
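One way to picture how the recognized text could be combined with position, date and time, and schedule information is the sketch below. The record layout, the `ScheduleEntry` type, and the rule of picking the schedule entry whose time span contains the shooting time are assumptions, not details from the publication.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class ScheduleEntry:
    start: datetime
    end: datetime
    title: str


def generate_additional_info(text: str,
                             position: Tuple[float, float],
                             taken_at: datetime,
                             schedule: List[ScheduleEntry]) -> dict:
    """Merge recognized text with position, date/time, and schedule data."""
    # Assumed rule: use the schedule entry that covers the shooting time.
    event: Optional[str] = next(
        (e.title for e in schedule if e.start <= taken_at <= e.end), None)
    return {
        "keywords": text.split(),
        "latitude": position[0],
        "longitude": position[1],
        "taken_at": taken_at.isoformat(),
        "event": event,
    }


schedule = [ScheduleEntry(datetime(2006, 9, 27, 10, 0),
                          datetime(2006, 9, 27, 12, 0),
                          "School sports day")]
info = generate_additional_info("relay race final", (35.68, 139.77),
                                datetime(2006, 9, 27, 11, 30), schedule)
print(info["event"], info["keywords"])
```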

In another preferable aspect, the image management server in the image management system of the present invention further comprises a map database for managing maps, and
the additional information generation means generates the additional information also on the basis of a map in the map database.

In this way, the image management server can easily generate additional information to be added to the image by voice, and can also easily generate map information to be added to that image.
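As a rough illustration of how a map database could contribute to the additional information, the following sketch looks up the registered place nearest to the position reported by the photographing apparatus. The publication does not describe the lookup method. The in-memory list `MAP_DATABASE`, its approximate coordinates, and the use of the haversine great-circle distance are assumptions.

```python
import math
from typing import List, Tuple

Place = Tuple[str, float, float]  # (name, latitude, longitude)

# Hypothetical stand-in for the map database (approximate coordinates).
MAP_DATABASE: List[Place] = [
    ("Tokyo Tower", 35.6586, 139.7454),
    ("Osaka Castle", 34.6873, 135.5262),
]


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points (haversine formula)."""
    r = 6371.0  # mean Earth radius in kilometres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_place(db: List[Place], lat: float, lon: float) -> str:
    """Return the name of the registered place closest to (lat, lon)."""
    return min(db, key=lambda p: distance_km(lat, lon, p[1], p[2]))[0]


place = nearest_place(MAP_DATABASE, 35.66, 139.75)
print(place)  # Tokyo Tower
```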

Furthermore, it is also preferable that the image management server in the image management system of the present invention comprises face recognition means for recognizing, from an image received by the receiving means, a face appearing in that image, and
that the additional information generation means generates the additional information on the basis of the text obtained by the voice recognition means and also on the basis of the face in the image recognized by the face recognition means.

When the additional information is generated in this way, on the basis of both the text obtained by recognizing the voice and the face in the image, information about the subject person can be supplied by voice as additional information.

In another preferable aspect, the image management server in the image management system of the present invention further comprises a face information database for managing face information, and
the additional information generation means generates the additional information also on the basis of the face information in the face information database.

When the additional information is generated in this way, also on the basis of the face information in the face information database, the processing for generating additional information from a face in an image can be performed quickly.

Furthermore, it is also preferable that the image management system of the present invention comprises face information addition means for generating new face information on the basis of the text obtained by the voice recognition means and the face in the image recognized by the face recognition means, and adding the new face information to the face information database.

When the image management server generates new face information in this way, from the text obtained by recognizing the voice and the recognized face in the image, and adds it to the face information database, the face information of a subject person in the image can be registered in the face information database.
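The registration flow described above can be sketched as a small in-memory database. The representation of a face as a short feature tuple, the Euclidean distance threshold, and the class name `FaceInfoDatabase` are illustrative assumptions. A real face recognizer would supply the feature vectors.

```python
from typing import Dict, Optional, Tuple

FaceFeature = Tuple[float, ...]  # hypothetical face feature vector


class FaceInfoDatabase:
    """Toy face information database keyed by person name."""

    def __init__(self, threshold: float = 0.5) -> None:
        self._entries: Dict[str, FaceFeature] = {}
        self._threshold = threshold  # maximum distance counted as a match

    def add(self, name: str, feature: FaceFeature) -> None:
        """Face information addition: name from voice, feature from image."""
        self._entries[name] = feature

    def lookup(self, feature: FaceFeature) -> Optional[str]:
        """Return the closest registered name, or None if nothing is near."""
        best_name, best_dist = None, self._threshold
        for name, known in self._entries.items():
            dist = sum((a - b) ** 2 for a, b in zip(known, feature)) ** 0.5
            if dist < best_dist:
                best_name, best_dist = name, dist
        return best_name


db = FaceInfoDatabase()
before = db.lookup((0.10, 0.90))  # unknown face: no entry yet
db.add("Hanako", (0.10, 0.90))    # name spoken by the user, face from image
after = db.lookup((0.12, 0.88))   # a later image of the same person
print(before, after)  # None Hanako
```

Once a face has been registered through a spoken name, later images of the same person can be labeled without the user speaking again.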

In another preferable aspect, the additional information generation means in the image management system of the present invention includes command recognition means for recognizing a command relating to additional information generation on the basis of the text obtained by the voice recognition means, and generates the additional information in accordance with the command recognized by the command recognition means.

In this way, the user can give instructions to the additional information generation means.
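A very small sketch of command recognition on the recognized text follows. The command vocabulary (`title`, `place`, `person`), the rule that a command is the first spoken word, and the fallback to a free-form comment are all assumptions for illustration. The publication does not specify a command grammar.

```python
from typing import Dict, Tuple

# Hypothetical spoken commands mapped to additional-information fields.
COMMANDS: Dict[str, str] = {
    "title": "title",
    "place": "location",
    "person": "subject",
}


def recognize_command(text: str) -> Tuple[str, str]:
    """Split recognized text into (field, value) using a leading command."""
    words = text.strip().split(maxsplit=1)
    if words and words[0].lower() in COMMANDS:
        field = COMMANDS[words[0].lower()]
        value = words[1] if len(words) > 1 else ""
        return field, value
    # No known command: treat the whole utterance as a comment.
    return "comment", text.strip()


print(recognize_command("Place Kyoto station"))  # ('location', 'Kyoto station')
print(recognize_command("nice sunset"))          # ('comment', 'nice sunset')
```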

A photographing apparatus of the present invention that achieves the above object generates an image by forming an image of a subject on an image sensor, communicates with an image management server that receives and manages images, and transmits images to the image management server, the photographing apparatus comprising:
voice acquisition means for acquiring voice;
association means for associating the voice acquired by the voice acquisition means with an image; and
transmission means for transmitting an image to the image management server and also transmitting, to the image management server, the voice acquired by the voice acquisition means together with association information identifying the image with which it was associated by the association means.

The photographing apparatus of the present invention transmits an image to the image management server and also transmits voice together with information associating it with an image. The voice recognition processing and the generation of the additional information to be added to the image therefore take place on the image management server. Accordingly, the photographing apparatus does not need to perform voice recognition processing, and its processing burden is reduced. The image management server, in turn, can easily generate the additional information to be added to the image by voice.
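On the camera side, the association and transmission described above might be sketched as follows. The `Transmission` record, the `.voice` naming, and the rule that a voice recording is associated with the most recently shot image are assumptions. The publication only requires that the voice be sent together with association information.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transmission:
    kind: str                               # "image" or "voice"
    name: str
    payload: bytes
    associated_image: Optional[str] = None  # association information


class Camera:
    """Toy photographing apparatus: no speech recognition is done here."""

    def __init__(self) -> None:
        self._outbox: List[Transmission] = []
        self._last_image: Optional[str] = None

    def shoot(self, name: str, data: bytes) -> None:
        self._last_image = name
        self._outbox.append(Transmission("image", name, data))

    def record_voice(self, samples: bytes) -> None:
        # Assumed association rule: attach the voice to the latest image.
        if self._last_image is None:
            raise RuntimeError("no image to associate the voice with")
        self._outbox.append(Transmission(
            "voice", self._last_image + ".voice", samples, self._last_image))

    def flush(self) -> List[Transmission]:
        """Hand over everything queued for the image management server."""
        sent, self._outbox = self._outbox, []
        return sent


camera = Camera()
camera.shoot("IMG_0001.JPG", b"jpeg bytes")
camera.record_voice(b"raw voice samples")
sent = camera.flush()
print([(t.kind, t.associated_image) for t in sent])
```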

Here, it is preferable that the photographing apparatus of the present invention comprises audio file creation means for creating an audio file containing the voice acquired by the voice acquisition means, and
that the transmission means, when transmitting voice to the image management server, transmits the audio file created by the audio file creation means.

In this way, the image management server recognizes the voice on the basis of the audio file and generates the additional information to be added to the image, so the additional information can be generated by voice even more easily.

In another preferable aspect, the transmission means in the photographing apparatus of the present invention, when transmitting voice to the image management server, transmits the voice acquired by the voice acquisition means as it is.

In this way, the processing burden on the photographing apparatus can be reduced further.

Furthermore, it is also preferable that the photographing apparatus of the present invention comprises position information acquisition means for acquiring geographical position information of the photographing apparatus, date and time information acquisition means for acquiring date and time information, and schedule information acquisition means for acquiring schedule information of the user of the photographing apparatus, and
that the transmission means further transmits, to the image management server, the position information, date and time information, and schedule information acquired respectively by the position information acquisition means, the date and time information acquisition means, and the schedule information acquisition means.

In this way, the geographical position information of the photographing apparatus, the date and time information, and the user's schedule information can be provided to the image management server, and the image management server can generate the additional information also on the basis of this position information, date and time information, and schedule information.

Furthermore, an image management server of the present invention that achieves the above object communicates with a photographing apparatus that generates an image by forming an image of a subject on an image sensor, and receives images from the photographing apparatus and manages them, the image management server comprising:
receiving means for receiving the image transmitted from the photographing apparatus and the voice transmitted from the photographing apparatus;
storage means for storing the image received by the receiving means;
voice recognition means for recognizing the voice received by the receiving means and converting it into text; and
additional information generation means for generating additional information to be added to an image on the basis of the text obtained by the voice recognition means,
wherein the storage means stores the image and also stores the additional information generated by the additional information generation means in association with the image associated with the voice from which the additional information was generated.

The image management server of the present invention stores the received image, recognizes the received voice and converts it into text, generates additional information to be added to the image on the basis of that text, and stores the additional information in association with the image associated with the voice from which it was generated. The voice recognition processing and the generation of the additional information therefore take place on this image management server. Accordingly, the photographing apparatus does not need to perform voice recognition processing, and additional information to be added to an image can be generated easily by voice while the processing burden on the photographing apparatus is kept low.

Here, it is preferable that the receiving means in the image management server of the present invention, when receiving voice, receives an audio file created by the photographing apparatus and transmitted from the photographing apparatus.

In this way, the image management server can recognize the voice on the basis of the audio file and generate the additional information to be added to the image, so the additional information can be generated by voice even more easily.

In another preferable aspect, the receiving means in the image management server of the present invention, when receiving voice, receives the voice acquired by the photographing apparatus and transmitted as it is, and
the image management server comprises audio file creation means for creating an audio file containing the voice received by the receiving means.

Furthermore, it is also preferable that the receiving means in the image management server of the present invention further receives position information, date and time information, and schedule information transmitted from the photographing apparatus, and
that the additional information generation means generates the additional information on the basis of the text obtained by the voice recognition means and also on the basis of the position information, date and time information, and schedule information received by the receiving means.

In this way, additional information to be added to an image can be generated easily by voice, and the position information, date and time information, and schedule information to be added to that image can also be generated easily.

In another preferable aspect, the image management server of the present invention further comprises a map database for managing maps, and
the additional information generation means generates the additional information also on the basis of a map in the map database.

It is also preferable that the image management server of the present invention comprises face recognition means for recognizing, from an image received by the receiving means, a face appearing in that image, and
that the additional information generation means generates the additional information on the basis of the text obtained by the voice recognition means and also on the basis of the face in the image recognized by the face recognition means.

In this way, information about the subject person can be supplied by voice as additional information.

In another preferable aspect, the image management server of the present invention further comprises a face information database for managing face information, and
the additional information generation means generates the additional information also on the basis of the face information in the face information database.

In this way, the processing for generating additional information from a face in an image can be performed quickly.

It is also preferable that the image management server of the present invention comprises face information addition means for generating new face information on the basis of the text obtained by the voice recognition means and the face in the image recognized by the face recognition means, and adding the new face information to the face information database.

In this way, the face information of a subject person in an image can be registered in the face information database.

In another preferable aspect, the additional information generation means in the image management server of the present invention includes command recognition means for recognizing a command relating to additional information generation on the basis of the text obtained by the voice recognition means, and generates the additional information in accordance with the command recognized by the command recognition means.

Furthermore, the image management method of the present invention that achieves the above object is an image management method in an image management server that communicates with a photographing apparatus, which generates an image by forming an image of a subject on an image sensor, and that receives images from the photographing apparatus and manages them, the method comprising:
a receiving step of receiving an image transmitted from the photographing apparatus and receiving voice transmitted from the photographing apparatus;
a storing step of storing the image received in the receiving step;
a voice recognition step of recognizing the voice received in the receiving step and converting it into text; and
an additional information generation step of generating additional information to be added to the image based on the text obtained in the voice recognition step,
wherein the storing step stores the image and also stores the additional information generated in the additional information generation step in association with the image that is associated with the voice from which the additional information was generated.
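As an illustration only, the sequence of steps in this method could be sketched in Python as follows. The names ImageStore, recognize_speech, generate_additional_info, and handle_upload are hypothetical and do not appear in the specification, and the recognizer is a placeholder that treats the received bytes as UTF-8 text rather than performing real voice recognition.

```python
from dataclasses import dataclass, field


@dataclass
class ImageStore:
    """Hypothetical stand-in for the image database: images and metadata by image ID."""
    images: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)

    def save_image(self, image_id, data):
        self.images[image_id] = data

    def save_metadata(self, image_id, info):
        self.metadata.setdefault(image_id, []).append(info)


def recognize_speech(audio: bytes) -> str:
    # Placeholder (assumption): the "audio" is already UTF-8 text.
    return audio.decode("utf-8")


def generate_additional_info(text: str) -> dict:
    # Simplest possible additional information: keep the text as a caption.
    return {"caption": text.strip()}


def handle_upload(store, image_id, image, audio, audio_image_id):
    """Run the storing, voice recognition, and generation steps for one upload."""
    store.save_image(image_id, image)            # storing step
    text = recognize_speech(audio)               # voice recognition step
    info = generate_additional_info(text)        # additional information generation step
    store.save_metadata(audio_image_id, info)    # store against the associated image
    return info
```

Here audio_image_id plays the role of the association information that ties the voice to a particular image; in the simplest case it is the ID of the image received in the same upload.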

Because the image management method of the present invention is the image management method performed in the image management server of the present invention described above, the image management server can perform voice recognition processing and generate the additional information to be added to an image. The photographing apparatus therefore does not need to perform voice recognition processing, and the additional information to be added to an image can be generated easily by voice while the processing load on the photographing apparatus is kept low.

Here, the receiving step in the image management method of the present invention is preferably a step that, in receiving voice, receives an audio file created by the photographing apparatus and transmitted from it.

With such a step, the image management server recognizes the voice from the audio file and generates the additional information to be added to the image, so the additional information can be generated by voice even more easily.

In another preferred aspect, the receiving step in the image management method of the present invention is a step that, in receiving voice, receives voice acquired by the photographing apparatus and transmitted as is, and the method further has an audio file creation step of creating an audio file that stores the voice received in the receiving step.

With such steps, the processing load on the photographing apparatus can be reduced further.
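As a minimal sketch of a server-side audio file creation step, the following Python code wraps raw voice samples in a WAV container using the standard wave module. The specification does not name an audio file format; WAV, the 8000 Hz sample rate, mono, and 16-bit samples are all assumptions made for illustration, as is the function name make_wav.

```python
import io
import wave


def make_wav(pcm: bytes, sample_rate: int = 8000,
             channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples, as received from the camera, in a WAV file image."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)       # mono by default (assumption)
        w.setsampwidth(sample_width)   # bytes per sample: 2 means 16-bit
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()
```

The returned bytes can be written to disk or passed directly to a voice recognizer that accepts WAV input.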

Furthermore, it is also preferable that the receiving step in the image management method of the present invention further receives position information, date and time information, and schedule information transmitted from the photographing apparatus, and that the additional information generation step generates the additional information based on the text obtained in the voice recognition step and also on the position information, date and time information, and schedule information received in the receiving step.

With such steps, the image management server can easily generate by voice the additional information to be added to an image, and can also easily generate position information, date and time information, and schedule information to be added to that image.

In another preferred aspect of the image management method of the present invention, the image management server includes a map database that manages maps, and the additional information generation step further generates the additional information based also on a map in the map database.

With such a step, the image management server can easily generate by voice the additional information to be added to an image, and can also easily generate map information to be added to that image.

In a further preferred aspect, the image management method of the present invention has a face recognition step of recognizing, from the image received in the receiving step, a face appearing in that image, and the additional information generation step generates the additional information based on the text obtained in the voice recognition step and also on the face in the image recognized in the face recognition step.

With such steps, information on the person photographed can be made additional information by voice.

It is also preferable that, in the image management method of the present invention, the image management server further includes a face information database that manages face information, and that the additional information generation step further generates the additional information based also on the face information in the face information database.

In this case, the processing for generating additional information based on a face in the image can be performed quickly.

In a further preferred aspect, the image management method of the present invention has a face information adding step of generating new face information based on the text obtained in the voice recognition step and on the face in the image recognized in the face recognition step, and adding that face information to the face information database.

With such a step, the face information of a person photographed in the image can be registered in the face information database.

It is also preferable that the additional information generation step in the image management method of the present invention includes a command recognition step of recognizing a command relating to additional information generation based on the text obtained in the voice recognition step, and generates the additional information in accordance with the command recognized in the command recognition step.

With such a step, the user can give instructions to the additional information generation means.
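The specification does not define a command vocabulary at this point, so the following Python sketch of command recognition is purely hypothetical. It assumes that the first word of the recognized text may be one of a few command words (title, comment, person) and that the rest of the text is the value; text without a command word is treated as a free comment.

```python
COMMANDS = ("title", "comment", "person")  # hypothetical command words


def parse_command(text):
    """Split recognized text into (command, value)."""
    words = text.strip().split(maxsplit=1)
    if len(words) == 2 and words[0].lower() in COMMANDS:
        return words[0].lower(), words[1]
    # No command word recognized: treat the whole text as a comment.
    return "comment", text.strip()


def apply_command(metadata, text):
    """Generate additional information in accordance with the recognized command."""
    name, value = parse_command(text)
    metadata[name] = value
    return metadata
```

A real system would need a vocabulary suited to the recognizer and the language spoken; the point of the sketch is only that the recognized text selects which item of additional information is generated.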

According to the present invention, it is possible to provide an image management system, a photographing apparatus, an image management server, and an image management method that can easily generate additional information to be added to an image.

Embodiments of the present invention will be described below.

FIG. 1 is a diagram showing an image management system according to the first embodiment of the present invention.

The image management system 1000 shown in FIG. 1 includes a digital camera 1 (corresponding to the first embodiment of the photographing apparatus of the present invention), which generates an image by forming an image of a subject on an image sensor, and a personal computer 200, which communicates with the digital camera 1 and receives and manages images from it, thereby realizing the first embodiment of the image management server of the present invention.

Details of the digital camera 1 will be described later; the digital camera 1 has a USB connection unit 170 to which the USB connector 31 of a USB cable 30 is connected.

FIG. 1 also shows the personal computer 200, which has a USB connection unit 214 to which the USB connector 32 of the USB cable 30 is connected and which, when the digital camera 1 is connected to the connection unit 214 via the USB connector 32, takes in image data from the digital camera 1 in response to a predetermined operation. The personal computer 200 has a main body 210, an image display device 220 with a display screen 220a, a keyboard 230, and a mouse 240. As described in detail later, the main body 210 contains a CPU (central processing unit), a main memory (random access memory), a hard disk, a communication I/F, and the like. The main body 210 also has a CD-ROM loading slot 211 into which a CD-ROM is loaded and a flexible disk loading slot 212 into which a flexible disk is loaded, and contains a CD-ROM drive and a flexible disk drive that drive and access the CD-ROM and the flexible disk loaded through the loading slots 211 and 212. The main body 210 further has a power switch 213.

FIG. 2 shows a top view of the digital camera shown in FIG. 1 as seen from above (FIG. 2(a)), a front view as seen from the front (FIG. 2(b)), and a rear view as seen from the back (FIG. 2(c)).

As shown in FIGS. 2(a) and 2(b), the digital camera 1 has a lens barrel 10 at the center of the camera body 1a. A power button 11 is provided on the top surface of the camera body 1a; when the power button 11 is operated, the lens barrel 10 is extended as shown in FIGS. 2(a) and 2(b) and the camera is made ready for shooting.

In addition to the power button 11 for turning on the power, the top surface of the camera body 1a has a release button 12 and, around the release button 12, a shooting mode dial 12_1. When the release button 12 is pressed to shoot while the shooting mode dial 12_1 is set to AUTO, shooting conditions are set automatically inside the digital camera 1 and shooting is performed. When the shooting mode dial 12_1 is switched to the movie shooting mode (symbol M), movie shooting is performed; when it is switched to the scene position (symbol SP) side, shooting conditions suited to the scene are set automatically inside the digital camera 1 and shooting is performed.

As shown in FIG. 2(c), an image display device 13 is provided on the back of the camera body 1a. In shooting mode, the subject or a menu is displayed on the image display device 13. When the playback button 14 beside the image display device 13 is pressed once in shooting mode, the camera switches to playback mode and previously shot images are displayed on the image display device 13; when the playback button 14 is pressed again, the camera switches back to shooting mode and a through image is displayed on the image display device 13. Next to the playback button 14 is an F (photo mode) button 15, which makes it easy to switch frequently used modes such as the pixel setting mode and the sensitivity setting mode.

A cross key 16 and an OK/menu button 17 are provided below the F button 15, and a zoom switch 18 is provided above it. By operating the cross key 16 and the OK/menu button 17, the user can switch to a setup menu to set the date and time, choose whether or not to display images, and so on, or switch to a shooting menu to select continuous shooting, the self-timer, and the like.

Below the cross key 16 is a DSP/BACK button 21, which is used, for example, to switch the display of the image display device 13 or to cancel an operation partway through.

Furthermore, a photometric sensor 190 and a flash emission window 191 are provided on the front of the camera body 1a shown in FIG. 2(b); when a flash is needed, the flash is emitted toward the subject from the flash emission window 191. A microphone 150 for recording voice is provided at the lower part of the camera body 1a shown in FIG. 2(b), and the USB connection unit 170 described above is provided on a side surface of the camera body 1a. A speaker 180 is provided on the bottom surface of the camera body 1a shown in FIG. 2(c).

FIG. 3 is a diagram showing the internal configuration of the digital camera shown in FIG. 1.

The digital camera 1 includes a zoom lens 10_1a, an iris 10_1b, and a focus lens 10_1c, which make up the photographing optical system, and motor drivers 111, 112, and 113, which drive the zoom lens 10_1a, the iris 10_1b, and the focus lens 10_1c to perform zooming, exposure adjustment, and focusing.

The digital camera 1 also includes a CCD 114, a timing generator 115, and a CPU 116.

The CCD 114 is a solid-state image sensor that captures the subject light that has passed through the zoom lens 10_1a, the iris 10_1b, and the focus lens 10_1c. The CCD 114 has a large number of photoelectric conversion elements, such as photodiodes, that convert the incident subject light into an image signal, which is an electrical signal.

The timing generator 115 drives the CCD 114 at predetermined timing in response to instructions from the CPU 116. As a result, the subject light incident on the CCD 114 is photoelectrically converted at a predetermined frame rate and output from the CCD 114 as an analog image signal.

The CPU 116 controls the digital camera 1 as a whole. Specifically, the CPU 116 has a built-in ROM, and the operation of the entire digital camera 1 is controlled in accordance with the procedures of the program in that ROM.

The digital camera 1 further includes a CDSAMP 117 that performs processing such as reducing the noise in the analog image signal output from the CCD 114, an A/D conversion unit 118 that converts the processed analog image signal into a digital image signal, and an image input controller 119 that transfers the RGB image data converted into a digital image signal by the A/D conversion unit 118 to an SDRAM 125, which is a memory, via a data bus.

The digital camera 1 also includes an image signal processing circuit 120 that reads the image data from the SDRAM 125 and executes an image processing process that converts it into a YC signal, a compression processing circuit 121 that executes a JPEG process to compress the image data once the image processing process has finished, and a video encoder 122 that converts image data into a video signal and supplies it to the image display device 13 described above.

The digital camera 1 further includes an AF detection circuit 123 that detects focus information of the image, an AE & AWB detection unit 124 that detects luminance information and white balance information of the image, the SDRAM 125, which is used as working memory and also stores the audio file and schedule information file described later, a VRAM 126 serving as a display buffer with an A area and a B area, and a media controller 127 that performs control for storing image data on a recording medium 100.

FIG. 3 also shows an operation unit 10_100 consisting of the power button 11, release button 12, shooting mode dial 12_1, playback button 14, F button 15, cross key 16, OK/menu button 17, zoom switch 18, and DSP/BACK button 21 described above.

The digital camera 1 further includes the microphone 150 and an A/D conversion circuit 151 that converts the analog audio signal from the microphone 150 into a digital audio signal. The microphone 150 and the A/D conversion circuit 151 correspond to an example of the voice acquisition means for acquiring voice referred to in the present invention.

The digital camera 1 also includes association means 152 that associates the voice acquired by the microphone 150 and the A/D conversion circuit 151 with an image.

The digital camera 1 further includes a communication I/F 153 (corresponding to an example of the transmission means referred to in the present invention) that transmits images to the personal computer 200 shown in FIG. 1 and also transmits to the personal computer 200 the voice acquired by the microphone 150 and the A/D conversion circuit 151, together with association information identifying the image with which the association means 152 has associated that voice. The communication I/F 153 has the USB connection unit 170 described above.
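The specification does not state how the association information accompanies the voice on the wire. As one hypothetical possibility, sketched here in Python, the camera could prefix the audio bytes with a small JSON header naming the associated image file. The header layout, the field names image and audio_len, and the function names are all assumptions made for illustration.

```python
import json


def pack_audio_message(image_name: str, audio: bytes) -> bytes:
    """Camera side: prepend a length-prefixed JSON header to the audio bytes."""
    header = json.dumps({"image": image_name, "audio_len": len(audio)}).encode("utf-8")
    return len(header).to_bytes(4, "big") + header + audio


def unpack_audio_message(message: bytes):
    """Server side: recover the associated image name and the audio bytes."""
    header_len = int.from_bytes(message[:4], "big")
    header = json.loads(message[4:4 + header_len].decode("utf-8"))
    start = 4 + header_len
    audio = message[start:start + header["audio_len"]]
    return header["image"], audio
```

With a framing of this kind, the server can store the generated additional information against the named image without having to infer the association from arrival order.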

The digital camera 1 also includes audio file creation means 160 that creates an audio file storing the voice acquired by the microphone 150 and the A/D conversion circuit 151. When transmitting voice to the personal computer 200, the communication I/F 153 transmits the audio file created by the audio file creation means 160.

The digital camera 1 further includes GPS position information acquisition means 154 (corresponding to an example of the position information acquisition means referred to in the present invention) that acquires the geographical position information of the digital camera 1. Specifically, the GPS position information acquisition means 154 receives GPS signals from GPS (Global Positioning System) satellites and acquires GPS position (current position) data.

The digital camera 1 also includes date and time information acquisition means 155 that acquires date and time information, and schedule information acquisition means 156 that acquires schedule information of the user of the digital camera 1. The communication I/F 153 additionally transmits to the personal computer 200 the position information, date and time information, and schedule information acquired by the GPS position information acquisition means 154, the date and time information acquisition means 155, and the schedule information acquisition means 156, respectively.

The digital camera 1 further includes an audio reproduction circuit 157 that generates an audio signal for reproducing voice, and the speaker 180, which receives the audio signal from the audio reproduction circuit 157 and produces sound.

The operation of the digital camera 1 will now be briefly described. The operation of the digital camera 1 is controlled overall by the CPU 116. The CPU 116 has a built-in ROM in which a program is stored, and the CPU 116 controls the operation of the entire digital camera 1 in accordance with the procedures of this program. While the CPU 116 is controlling the digital camera 1 in this way, image data being processed is temporarily stored in the SDRAM 125.

In the VRAM 126, which serves as a display buffer, image data representing a through image is stored in the A area and the B area in turn. The image data in these areas is supplied alternately to the video encoder 122, and the images based on that data are switched in sequence and displayed on the image display device 13 as a through image.
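The alternating use of the A area and the B area is a form of double buffering. A minimal Python sketch of the idea follows; the class and method names are hypothetical, and the real VRAM 126 and video encoder 122 are hardware rather than Python objects.

```python
class DoubleBuffer:
    """Two display areas, A and B, written alternately."""

    def __init__(self):
        self.areas = {"A": None, "B": None}
        self.write_area = "A"   # area that receives the next frame
        self.read_area = None   # area last filled; None until a frame arrives

    def store(self, frame):
        """Write a frame into the current write area, then swap roles."""
        self.areas[self.write_area] = frame
        self.read_area = self.write_area
        self.write_area = "B" if self.write_area == "A" else "A"

    def display(self):
        """Return the most recently completed frame, as would go to the encoder."""
        if self.read_area is None:
            return None
        return self.areas[self.read_area]
```

Because a new frame is always written to the area that is not being displayed, the display never reads a partially written frame.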

Next, the flow of image data in the digital camera 1 will be briefly described. The CPU 116 has the CCD 114 capture the subject light that has passed through the photographing optical system, has the image data generated by the CCD 114 output to the CDSAMP 117 at predetermined intervals, and has the A/D conversion unit 118, the image input controller 119, and the image signal processing circuit 120 convert it into displayable image data, so that a through image is displayed on the screen of the image display device 13. Framing is performed freely based on this through image, while the AF detection circuit 123 continually moves the focus lens 10_1c back and forth to detect the in-focus point, so that an image focused on the center of the framing is formed on the CCD 114. At this time, the AE & AWB detection circuit 124 detects the field luminance; based on the detection result, the motor driver 112 sets a small aperture or the open aperture, and the gains of the R, G, and B color signals are adjusted for white balance. Image data representing a focused image whose exposure has been adjusted to the field luminance in this way is output to the CDSAMP 117 at predetermined intervals in accordance with the timing signal from the timing generator 115, and the through image is obtained in the subsequent stages. The photographer performs framing while viewing the through image and performs the release operation at the desired moment.

When the release operation is performed, the CPU 116 has the timing generator 115 supply a timing signal so that the image at the time of the release operation is formed on the CCD 114. This timing signal tells the CCD 114 when to start and end the exposure, and corresponds to the so-called shutter speed. At the end of the exposure, the CPU 116 has the CCD 114 output image data (image data consisting of the three primary colors of light, R, G, and B) and supply it to the CDSAMP 117 in the next stage. The CDSAMP 117 reduces the noise in the image data output from the CCD 114, and the A/D conversion unit 118 converts the noise-reduced image data into a digital signal. The RGB image data converted into a digital signal by the A/D conversion unit 118 is supplied to the SDRAM 125 via the data bus by the image input controller 119 in the next stage, and the image data of all the pixels of the CCD 114 is stored in the SDRAM 125. When the image data for all the pixels has been stored in the SDRAM 125, the CPU 116 starts the image processing process of the image signal processing circuit 120, which reads the image data from the SDRAM 125 and converts it into a YC signal. When the CPU 116 detects that the image processing process of the image signal processing circuit 120 has finished, it starts the JPEG process of the compression processing circuit 121, which compresses the image data. When the CPU 116 detects that compression by the compression processing circuit 121 is complete, it then starts a recording process, which records the JPEG-compressed image data on the recording medium 100 via the media controller 127. This is the flow of image data during shooting.
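The specification does not give the formula used for the conversion to a YC signal. As an illustration, the full-range conversion commonly used with JPEG (JFIF), based on the ITU-R BT.601 coefficients, can be written in Python as below. Whether the image signal processing circuit 120 uses exactly these coefficients is an assumption, and rgb_to_ycbcr is a hypothetical name.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit Y, Cb, Cr (JFIF full-range, BT.601)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(v):
        # Round to the nearest integer and keep the result in the 0..255 range.
        return max(0, min(255, round(v)))

    return clamp(y), clamp(cb), clamp(cr)
```

Y carries the luminance and Cb and Cr carry the color differences; separating them in this way is what allows JPEG compression to treat luminance and color information differently.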

次に、図１に示すパーソナルコンピュータ２００について説明する。 Next, the personal computer 200 shown in FIG. 1 will be described.

図４は、図１に示すパーソナルコンピュータの概略回路ブロックを示す図である。 FIG. 4 is a diagram showing a schematic circuit block of the personal computer shown in FIG.

図４に示すパーソナルコンピュータ２００には、図１に示す本体部２１０を構成する、各種プログラムを実行するＣＰＵ２０１、ハードディスク装置２０３に格納されたプログラムが読み出されＣＰＵ２０１での実行のために展開される主メモリ２０２、各種プログラムや画像データ等が保存されたハードディスク装置２０３、フレキシブルディスク２０４＿１が装填され、その装填されたフレキシブルディスク２０４＿１をアクセスするフレキシブルディスクドライブ２０４、ＣＤ−ＲＯＭ２０５＿１をアクセスするＣＤ−ＲＯＭドライブ２０５、デジタルカメラ１と接続され、図１に示すＵＳＢ接続部２１４を有する通信Ｉ／Ｆ２０６、および図１に示す表示画面２２０ａを備えた画像表示装置２２０，キーボード２３０，マウス２４０が備えられている。これらはバス２１５を介して相互に接続されている。 In the personal computer 200 shown in FIG. 4, the CPU 201 that executes various programs and the program stored in the hard disk device 203 constituting the main body 210 shown in FIG. 1 are read and expanded for execution by the CPU 201. A main memory 202, a hard disk device 203 in which various programs, image data, and the like are stored, a flexible disk 204_1 are loaded, a flexible disk drive 204 that accesses the loaded flexible disk 204_1, and a CD-ROM drive that accesses the CD-ROM 205_1 205, a communication I / F 206 connected to the digital camera 1 and having the USB connection unit 214 shown in FIG. 1, and an image display device 220 including a display screen 220a shown in FIG. 1, a keyboard 230, and a mouse 240 It is provided. These are connected to each other via a bus 215.

図５は、図１に示すパーソナルコンピュータで実現される画像管理サーバの構成を示す図である。 FIG. 5 is a diagram showing a configuration of an image management server realized by the personal computer shown in FIG.

図５に示す画像管理サーバ３００には、デジタルカメラ１から送信されてきた画像を受信するとともにそのデジタルカメラ１から送信されてきた音声を受信する図４にも示す通信Ｉ／Ｆ２０６（本発明にいう受信手段の一例）が備えられている。 The image management server 300 shown in FIG. 5 receives the image transmitted from the digital camera 1 and also receives the sound transmitted from the digital camera 1. The communication I / F 206 shown in FIG. An example of receiving means) is provided.

また、この画像管理サーバ３００には、通信Ｉ／Ｆ２０６で受信した画像を画像データベース３０１に保管する保管手段３０２と、通信Ｉ／Ｆ２０６で受信した音声を認識してテキストに変換する音声認識手段３０３と、音声認識手段３０３で音声が認識されてなるテキストに基づいて、画像に付加される付加情報（メタデータ）を生成する付加情報生成手段３０４とが備えられている。ここで、保管手段３０２は、ＥｘｉｆファイルやＪＰＥＧファイル等の画像を画像データベース３０１に保管するとともに、付加情報生成手段３０４で生成された付加情報を、その付加情報生成の基になった音声に対応づけられた画像に対応づけて画像データベース３０１に保管する。 The image management server 300 includes a storage unit 302 that stores an image received by the communication I / F 206 in the image database 301, and a voice recognition unit 303 that recognizes the voice received by the communication I / F 206 and converts it into text. And additional information generating means 304 for generating additional information (metadata) to be added to the image based on the text recognized by the voice recognition means 303. Here, the storage unit 302 stores an image such as an Exif file or a JPEG file in the image database 301, and corresponds to the additional information generated by the additional information generation unit 304 to the sound that is the basis of the additional information generation. The image is stored in the image database 301 in association with the attached image.

The communication I/F 206 further receives position information, date and time information, and schedule information transmitted from the digital camera 1. The additional information generation means 304 generates the additional information based not only on the text produced by the voice recognition means 303 but also on the position information, date and time information, and schedule information received by the communication I/F 206.

The image management server 300 further includes a map database 305 that manages maps, and the additional information generation means 304 also generates the additional information based on the maps in the map database 305. The map information managed by the map database 305 includes terrain information, place names, facility names, latitude, longitude, and the like.

The image management server 300 also includes face recognition means 306 that recognizes faces appearing in an image received by the communication I/F 206. The additional information generation means 304 generates the additional information based on the text produced by the voice recognition means 303 and also on the faces in the image recognized by the face recognition means 306.

The image management server 300 further includes a face information database 307 that manages face information, and the additional information generation means 304 also generates the additional information based on the face information in the face information database 307.

The image management server 300 also includes face information addition means 308 that generates new face information based on the text produced by the voice recognition means 303 and the faces in the image recognized by the face recognition means 306, and adds it to the face information database 307.

The additional information generation means 304 includes command recognition means 304a that recognizes commands concerning additional information generation from the text produced by the voice recognition means 303, and generates the additional information according to the command recognized by the command recognition means 304a.

In receiving audio, the communication I/F 206 receives an audio file created by the digital camera 1 and transmitted from that digital camera 1.

The image management server 300 further includes an additional information analysis unit 309 that analyzes the additional information (including GPS position information) of, for example, an Exif file.

FIG. 6 is a diagram showing the flow of the context estimation algorithm in the additional information generation means shown in FIG. 5.

First, in step S1, the result of the additional information analysis is acquired. Specifically, the shooting date and time (referred to as date and time A) and the latitude and longitude of the shooting location are acquired from the Exif information in the image file. Next, in step S2, it is determined whether there is a schedule information file associated with the image. If no schedule information file is found, the flow proceeds to step S7, described later. If a schedule information file is found, the flow proceeds to step S3.

In step S3, schedule information (date and time, place names, facility names, their respective lists, and so on) is acquired from the schedule information file. Then, in step S4, the schedule information is searched for a schedule entry that corresponds to date and time A.

Next, in step S5, it is determined whether there is a schedule entry corresponding to date and time A. If there is none, the flow proceeds to step S7. If there is one, the place names and facility names in that schedule entry are extracted as keyword A in step S6.

In step S7, the place names and facility names recognized by the voice recognition means are acquired as keyword B.

Then, in step S8, the map database is searched using keywords A and B as search terms. The search is limited to the vicinity of the latitude and longitude of the shooting location, and the specific range is set freely by the system (for example, 500 m). The search result is the estimated context.

Next, in step S9, the context and keywords A and B are converted into metadata (additional information) and output, and the flow ends.
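The flow of steps S1 to S9 can be illustrated with a minimal Python sketch. It is not the implementation described in the specification: the names `ScheduleEntry`, `MapEntry`, `distance_m`, and `estimate_context`, the in-memory list standing in for the map database, the exact-name matching, and the use of the haversine formula for the "vicinity" test are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt
from typing import Dict, List, Optional


@dataclass
class ScheduleEntry:          # hypothetical shape of one schedule entry
    start: datetime
    end: datetime
    places: List[str]         # place names and facility names


@dataclass
class MapEntry:               # hypothetical record in the map database
    name: str
    lat: float
    lon: float


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres (haversine formula)."""
    earth_radius_m = 6_371_000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


def estimate_context(
    shot_time: datetime,                      # S1: date and time A from Exif
    lat: float,                               # S1: latitude from Exif
    lon: float,                               # S1: longitude from Exif
    schedule: Optional[List[ScheduleEntry]],  # None means "no schedule file"
    spoken_places: List[str],                 # names found by voice recognition
    map_db: List[MapEntry],
    radius_m: float = 500.0,                  # example range from the text
) -> Dict[str, List[str]]:
    keyword_a: List[str] = []
    if schedule is not None:                              # S2
        for entry in schedule:                            # S3, S4
            if entry.start <= shot_time <= entry.end:     # S5
                keyword_a = list(entry.places)            # S6
                break
    keyword_b = list(spoken_places)                       # S7
    terms = set(keyword_a) | set(keyword_b)
    context = [                                           # S8
        m.name
        for m in map_db
        if m.name in terms and distance_m(lat, lon, m.lat, m.lon) <= radius_m
    ]
    # S9: return everything that would be converted into metadata
    return {"context": context, "keyword_a": keyword_a, "keyword_b": keyword_b}
```

A real map search would likely use fuzzy or partial name matching and a spatial index; exact matching over a list is used here only to keep the sketch short.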

FIG. 7 is a diagram showing the storage structure of the image files and audio files created by the digital camera.

FIG. 7 shows a main folder 400 (folder name DCIM) representing the entire contents of the recording medium 100 of the digital camera 1. The main folder 400 contains an image data folder 410 (folder name 100_ABC). The image data folder 410 contains a JPEG image file 411 (file name DSCF0007.JPG), an audio file 412 (file name DSCF0007.WAV), ..., and an audio file 41n (file name DSCF000n.WAV).
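In the example of FIG. 7, the image file DSCF0007.JPG and the audio file DSCF0007.WAV share the same base name. Assuming that a shared base name within one folder is what links an audio file to an image (the specification does not state the matching rule in this passage), the pairing could be sketched as follows. The function name `pair_audio_with_images` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Optional


def pair_audio_with_images(paths: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each .JPG path to the .WAV path with the same folder and base name.

    Assumption for illustration: association is expressed only by the
    shared base name. Images without a matching audio file map to None.
    """
    images: Dict[tuple, str] = {}
    audio: Dict[tuple, str] = {}
    for path in map(PurePosixPath, paths):
        key = (str(path.parent), path.stem.upper())
        suffix = path.suffix.upper()
        if suffix == ".JPG":
            images[key] = str(path)
        elif suffix == ".WAV":
            audio[key] = str(path)
    return {images[key]: audio.get(key) for key in sorted(images)}
```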

FIG. 8 is a diagram showing an example of a schedule information file stored in the memory (SDRAM) of the digital camera.

To store a schedule information file in the memory (SDRAM) 125 of the digital camera 1, the user, for example, starts scheduling software on the personal computer 200, creates or edits schedule information to produce a schedule information file, and transfers it to the SDRAM 125 of the digital camera 1 via the communication I/Fs 206 and 153.

FIG. 8 shows a schedule information file 500 stored in the SDRAM 125 of the digital camera 1. The schedule information file 500 consists of a title part 510 that stores the title of the overall schedule, a first schedule part 520 that stores information such as the type and dates of a first schedule, and a second schedule part 530 that stores information such as the type and dates of a second schedule.

FIG. 9 is a diagram showing the flow of the audio acquisition routine by which the digital camera acquires audio.

This audio acquisition routine starts when the release button 12 of the digital camera 1 has been pressed and shooting is complete. Although the example described here uses completion of shooting as the trigger, the digital camera 1 may instead be provided with a recording start button that the user presses explicitly, or the user may select Start recording → OK from a menu.

First, in step S11, recording for acquiring audio is initiated. Next, in step S12, it is determined whether this is an automatic start, in which recording begins automatically. In this example it is determined to be an automatic start, and the flow proceeds to step S14. If, for example, the user selected the start of recording from the menu, it is determined not to be an automatic start, and the flow proceeds to step S13. In step S13, the recording target is selected. The operation in step S13 is described below with reference to FIG. 10.

FIG. 10 is a diagram showing the image file structure, displayed on the image display device of the digital camera, that is used to select the recording target in step S13.

The image display device 13 shown in FIG. 10 displays the main folder 400 (folder name DCIM) representing the entire contents of the recording medium 100. The main folder 400 contains an image data folder 410 (folder name 100_ABC) and an image data folder 420 (folder name 101_ABC). The image data folder 410 contains a JPEG image file 410_1 (file name DSCF0001.JPG), a JPEG image file 410_2 (file name DSCF0002.JPG), and so on.

When selecting the recording target, if the user selects the main folder 400 by operating the cross key 16 or the like, an audio file is created with the audio associated in common with all images in the camera. The created audio file is placed in the main folder 400.

If the image data folder 410 is selected, an audio file is created with the audio associated in common with all images in that image data folder 410. The created audio file is placed in the image data folder 410 under, for example, the file name 410_ABC.WAV.

If an individual image is selected, an audio file is created with the audio associated with that image only. The created audio file is placed, for example, as shown in FIG. 7 described above. An individual image may also be selected from thumbnail images, as shown in FIG. 11.
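The three selection scopes described above (main folder, image data folder, individual image) amount to a simple rule for deciding which images a recording applies to. The following Python sketch shows that rule under the assumption that targets and images are identified by slash-separated paths; the function name `images_for_target` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import List


def images_for_target(target: str, image_paths: List[str]) -> List[str]:
    """Return the images that a recording made for `target` is associated with.

    - target is an individual image: only that image
    - target is a folder (main folder or image data folder):
      every image located anywhere below that folder
    """
    target_path = PurePosixPath(target)
    images = [PurePosixPath(p) for p in image_paths]
    if target_path in images:
        return [str(target_path)]
    return [str(p) for p in images if target_path in p.parents]
```

With the folder layout of FIG. 10, selecting `DCIM` yields every image on the medium, selecting `DCIM/100_ABC` yields only the images in that folder, and selecting a single file yields that file alone.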

FIG. 11 is a diagram showing thumbnail images displayed on the image display device of the digital camera.

Nine thumbnail images 431, ..., 439 are displayed on the image display device 13 shown in FIG. 11. The figure shows the state in which thumbnail image 436 is selected from among the nine. In this way, multiple thumbnail images may be displayed and an individual image selected from among them.

Returning to FIG. 9, the description continues. In step S14, recording is performed. In the case of an automatic start, recording begins immediately. If a recording target was selected in step S13, recording begins when, for example, the recording start button is pressed.

Then, in step S15, recording ends and the flow ends. Recording may be ended automatically by a timer, or by pressing the recording start button again.

FIG. 12 is a diagram showing the outline steps of the image management method in the image management server shown in FIG. 5.

First, in reception step S21, the image transmitted from the digital camera 1 is received, together with the audio transmitted from the digital camera 1. In receiving the audio, reception step S21 receives the audio file created by and transmitted from the digital camera 1. Reception step S21 also receives the position information, date and time information, and schedule information transmitted from the digital camera 1.

Next, in face recognition step S22, faces appearing in the image received in reception step S21 are recognized.

Then, in first storage step S23, the image received in reception step S21 is stored.

Next, in voice recognition step S24, the audio received in reception step S21 is recognized and converted into text.

Then, in additional information generation step S25, additional information to be added to the image is generated based on the text produced in voice recognition step S24. More specifically, additional information generation step S25 generates the additional information based on that text and also on the position information, date and time information, and schedule information received in reception step S21. It further generates the additional information based on the maps in the map database.

Additional information generation step S25 also generates the additional information based on the text produced in voice recognition step S24 together with the faces in the image recognized in face recognition step S22, and based on the face information in the face information database. In addition, it executes a face information addition step that generates new face information from the text produced in voice recognition step S24 and the faces in the image recognized in face recognition step S22, and adds it to the face information database.

Additional information generation step S25 also includes a command recognition step that recognizes commands concerning additional information generation from the text produced in voice recognition step S24, and generates the additional information according to the command recognized in that command recognition step.

Next, in second storage step S26, the image is stored, and the additional information generated in additional information generation step S25 is stored in association with the image that is associated with the audio from which that additional information was generated. The flow then ends.
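The sequence of steps S21 to S26 can be summarized as a small pipeline. The Python sketch below is only an outline under stated assumptions: the recognizers are passed in as plain callables because the specification does not name any particular recognition engine, and `Received`, `ImageStore`, and `manage_image` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Received:                     # what reception step S21 delivers
    image_name: str
    image_bytes: bytes
    audio_bytes: bytes


@dataclass
class ImageStore:                   # stand-in for the image database
    images: Dict[str, bytes] = field(default_factory=dict)
    metadata: Dict[str, Dict[str, List[str]]] = field(default_factory=dict)


def manage_image(
    rx: Received,
    store: ImageStore,
    recognize_faces: Callable[[bytes], List[str]],
    recognize_speech: Callable[[bytes], str],
    generate_info: Callable[[str, List[str]], Dict[str, List[str]]],
) -> Dict[str, List[str]]:
    faces = recognize_faces(rx.image_bytes)            # S22: face recognition
    store.images[rx.image_name] = rx.image_bytes       # S23: first storage
    text = recognize_speech(rx.audio_bytes)            # S24: voice recognition
    info = generate_info(text, faces)                  # S25: additional information
    store.metadata[rx.image_name] = info               # S26: store with the image
    return info
```

Passing the recognizers as parameters keeps the outline independent of any specific speech or face recognition library.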

FIG. 13 is a diagram showing the flow of the routine by which the image management server shown in FIG. 5 generates and registers additional information based on the text produced by voice recognition and also on the faces in the image.

For example, when the audio file (WAV) and image file (JPG) created by the digital camera 1 and shown in FIG. 7 are processed, steps S31 to S34 are performed on the audio file (WAV) and steps S35 to S37 are performed on the image file (JPG).

First, in step S31, the audio information related to the audio file (WAV) is selected. Next, in step S32, the audio is recognized. In step S33, a command concerning additional information generation is detected from the text produced by the recognition, and in step S34 the command is analyzed. Steps S33 and S34 correspond to the command recognition step of the present invention. In step S38, metadata (additional information) is generated according to this command.

Face analysis is performed in steps S35, S36, and S37. First, in step S35, a face is extracted from the image file (JPG). Next, in step S36, the extracted face is recognized, and the flow proceeds to step S37. In step S37, the recognized face data is collated. Then, in step S38, metadata (additional information) according to this face data is generated. In step S39, the metadata generated in step S38 is added to the image, and in step S40 (corresponding to the face information addition step of the present invention) new face information is generated and additionally registered in the face information database. The flow then ends.
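The specification does not define the command vocabulary, so the following Python sketch invents one purely for illustration: a recognized utterance that begins with `place`, `person`, or `event` is treated as a command whose remainder is the value. The table `COMMANDS` and the functions `recognize_command` and `generate_metadata` are hypothetical, and the handling of non-command text as a free comment is likewise an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical command words mapped to metadata field names.
COMMANDS = {"place": "location", "person": "people", "event": "event"}


def recognize_command(text: str) -> Optional[Tuple[str, str]]:
    """S33/S34: detect and analyze a command in the recognized text."""
    words = text.strip().split(maxsplit=1)
    if len(words) == 2 and words[0].lower() in COMMANDS:
        return COMMANDS[words[0].lower()], words[1]
    return None


def generate_metadata(text: str, matched_faces: List[str]) -> Dict[str, List[str]]:
    """S38: build metadata from the command and from the collated faces."""
    metadata: Dict[str, List[str]] = {}
    command = recognize_command(text)
    if command is not None:
        field_name, value = command
        metadata[field_name] = [value]
    elif text.strip():
        metadata["comment"] = [text.strip()]   # assumption: keep free text
    if matched_faces:
        metadata.setdefault("people", []).extend(matched_faces)
    return metadata
```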

FIG. 14 is a diagram showing the flow of the routine by which the image management server shown in FIG. 5 generates additional information based on the text produced by voice recognition and in accordance with the schedule information.

Steps S31 to S34 shown in FIG. 14 are the same as steps S31 to S34 shown in FIG. 13 described above, so their description is omitted. Here, schedule information analysis is performed in steps S51 and S52. First, in step S51, the schedule information related to the schedule information file (ICS) is read. Next, in step S52, the schedule information is extracted. The flow then proceeds to step S53.

In step S53, metadata (additional information) according to this schedule information is generated. Then, in step S54, the metadata generated in step S53 is added to the image, and the flow ends.
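Since the schedule information file is identified as an ICS file, steps S51 to S53 could look roughly like the Python sketch below. It is a deliberately reduced reader, not a complete iCalendar parser: it assumes unfolded lines, `DTSTART`/`DTEND` values in the `YYYYMMDDTHHMMSS` form, and ignores time zones. The function names `parse_events` and `schedule_metadata` and the choice of `SUMMARY` and `LOCATION` as the metadata source are assumptions.

```python
from datetime import datetime
from typing import Dict, List


def parse_events(ics_text: str) -> List[Dict[str, object]]:
    """S51/S52: read VEVENT blocks and extract a few properties."""
    events: List[Dict[str, object]] = []
    current = None
    for raw in ics_text.splitlines():
        line = raw.strip()
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT":
            if current is not None:
                events.append(current)
            current = None
        elif current is not None and ":" in line:
            name, value = line.split(":", 1)
            name = name.split(";", 1)[0].upper()   # drop parameters such as TZID
            if name in ("DTSTART", "DTEND"):
                # keep only the YYYYMMDDTHHMMSS part; a trailing "Z" is ignored
                current[name] = datetime.strptime(value[:15], "%Y%m%dT%H%M%S")
            elif name in ("SUMMARY", "LOCATION"):
                current[name] = value
    return events


def schedule_metadata(ics_text: str, shot_time: datetime) -> Dict[str, str]:
    """S53: metadata from the event that covers the shooting time, if any."""
    for event in parse_events(ics_text):
        if "DTSTART" in event and "DTEND" in event:
            if event["DTSTART"] <= shot_time <= event["DTEND"]:
                return {
                    key.lower(): str(event[key])
                    for key in ("SUMMARY", "LOCATION")
                    if key in event
                }
    return {}
```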

FIG. 15 is a diagram showing the internal configuration of the digital camera constituting the image management system of the second embodiment of the present invention, and FIG. 16 is a diagram showing the configuration of the image management server constituting the image management system of the second embodiment of the present invention.

The digital camera 2 shown in FIG. 15 corresponds to the second embodiment of the photographing apparatus of the present invention. It differs from the digital camera 1 shown in FIG. 3 in that the audio file creation means 160 is omitted, and in that the communication I/F 153, when transmitting audio to the image management server 600 shown in FIG. 16, transmits the audio acquired by the microphone 150 and the A/D conversion circuit 151 as it is.

The image management server 600 shown in FIG. 16 corresponds to the second embodiment of the image management server of the present invention. It differs from the image management server 300 shown in FIG. 5 in that audio file creation means 601, which creates an audio file storing the audio received by the communication I/F 206, is added, and in that the communication I/F 206, in receiving audio, receives the audio acquired by the digital camera 2 and transmitted as it is.

FIG. 17 is a diagram showing the steps of the image management method in the image management server shown in FIG. 16.

The steps of the image management method shown in FIG. 17 differ from those of the image management method shown in FIG. 12 in that the role of reception step S21 is different and in that an audio file creation step S61 is added.

In reception step S21 shown in FIG. 17, the audio acquired by the digital camera 2 and transmitted as it is is received. In audio file creation step S61, an audio file storing the audio received in reception step S21 is created. This further reduces the audio processing load on the digital camera 2.
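Audio file creation step S61 on the server side can be illustrated with Python's standard `wave` module. The sketch assumes that the camera sends uncompressed PCM samples and that the sample rate, channel count, and sample width are known to the server; the specification does not give these values, so the defaults below (8000 Hz, mono, 16-bit) and the function name `create_wav` are purely illustrative.

```python
import io
import wave


def create_wav(
    pcm_bytes: bytes,
    sample_rate: int = 8000,    # assumed; not specified in the source
    channels: int = 1,          # assumed mono
    sample_width: int = 2,      # assumed 16-bit samples
) -> bytes:
    """Wrap raw PCM audio received from the camera in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```

The returned bytes could then be written under a name such as the one the camera would otherwise have used, for example next to the associated image.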

Although the present embodiment has been described using wired communication in which the digital camera and the personal computer are connected by a USB cable, the invention is not limited to this. For example, a photographing apparatus such as a digital camera or a mobile phone having digital camera functions may communicate with an image management server such as a personal computer by wireless communication, such as radio communication conforming to the Bluetooth standard or infrared communication conforming to IrDA. Nor is the invention limited to these: it can be applied to any arrangement in which a photographing apparatus that generates an image by forming an image of a subject on an image sensor communicates with an image management server that receives and manages images from the photographing apparatus.

FIG. 1 is a diagram showing the image management system of the first embodiment of the present invention.
FIG. 2 shows a top view, a front view, and a rear view of the digital camera shown in FIG. 1.
FIG. 3 is a diagram showing the internal configuration of the digital camera shown in FIG. 1.
FIG. 4 is a schematic circuit block diagram of the personal computer shown in FIG. 1.
FIG. 5 is a diagram showing the configuration of the image management server realized by the personal computer shown in FIG. 1.
FIG. 6 is a diagram showing the flow of the context estimation algorithm in the additional information generation means shown in FIG. 5.
FIG. 7 is a diagram showing the storage structure of the image files and audio files created by the digital camera.
FIG. 8 is a diagram showing an example of a schedule information file stored in the memory (SDRAM) of the digital camera.
FIG. 9 is a diagram showing the flow of the audio acquisition routine by which the digital camera acquires audio.
FIG. 10 is a diagram showing the image file structure, displayed on the image display device of the digital camera, that is used to select the recording target in step S13.
FIG. 11 is a diagram showing thumbnail images displayed on the image display device of the digital camera.
FIG. 12 is a diagram showing the outline steps of the image management method in the image management server shown in FIG. 5.
FIG. 13 is a diagram showing the flow of the routine by which the image management server shown in FIG. 5 generates and registers additional information based on the text produced by voice recognition and also on the faces in the image.
FIG. 14 is a diagram showing the flow of the routine by which the image management server shown in FIG. 5 generates additional information based on the text produced by voice recognition and in accordance with the schedule information.
FIG. 15 is a diagram showing the internal configuration of the digital camera constituting the image management system of the second embodiment of the present invention.
FIG. 16 is a diagram showing the configuration of the image management server constituting the image management system of the second embodiment of the present invention.
FIG. 17 is a diagram showing the steps of the image management method in the image management server shown in FIG. 16.

Explanation of symbols

1, 2 Digital camera
1a Camera body
10 Lens barrel
10_1a Zoom lens
10_1b Iris
10_1c Focus lens
10_100 Operating member group
11 Power button
12 Release button
12_1 Shooting mode dial
13 Image display device
14 Playback button
15 F button
16 Cross key
17 OK/Menu button
18 Zoom switch
21 DSP/BACK button
30 USB cable
31, 32 USB connector
100 Recording medium
111, 112, 113 Motor driver
114 CCD
115 Timing generator
116, 201 CPU
117 CDSAMP
118 A/D conversion unit
119 Image input controller
120 Image signal processing circuit
121 Compression processing circuit
122 Video encoder
123 AF detection circuit
124 AE & AWB detection circuit
125 Memory (SDRAM)
126 VRAM
127 Media controller
150 Microphone
151 A/D conversion circuit
152 Association means
153, 206 Communication I/F
154 GPS position information acquisition means
155 Date and time information acquisition means
156 Schedule information acquisition means
157 Audio reproduction circuit
160, 601 Audio file creation means
170, 214 USB connection unit
180 Speaker
190 Photometric sensor
191 Flash emission window
200 Personal computer
202 Main memory
203 Hard disk device
204 Flexible disk drive
204_1 Flexible disk
205 CD-ROM drive
205_1 CD-ROM
210 Main body
211 CD-ROM loading slot
212 Flexible disk loading slot
213 Power switch
215 Bus
220 Image display device
220a Display screen
230 Keyboard
240 Mouse
300, 600 Image management server
301 Image database
302 Storage means
303 Voice recognition means
304 Additional information generation means
304a Command recognition means
305 Map database
306 Face recognition means
307 Face information database
308 Face information addition means
309 Additional information analysis unit
400 Main folder
410, 420 Image data folder
410_1, 410_2, 411 Image file
412, ..., 41n Audio file
431, ..., 439 Thumbnail image
500 Schedule information file
510 Title part
520, 530 Schedule part
1000 Image management system

Claims

1. An image management system comprising: a photographing apparatus that generates an image by forming an image of a subject on an image sensor; and an image management server that communicates with the photographing apparatus and receives and manages images from the photographing apparatus, wherein
the photographing apparatus comprises:
audio acquisition means for acquiring audio;
association means for associating the audio acquired by the audio acquisition means with an image; and
transmission means for transmitting the image to the image management server and for transmitting the audio acquired by the audio acquisition means to the image management server together with association information identifying the image with which the association means has associated the audio,
the image management server comprises:
receiving means for receiving the image transmitted from the photographing apparatus and for receiving the audio transmitted from the photographing apparatus;
storage means for storing the image received by the receiving means;
voice recognition means for recognizing the audio received by the receiving means and converting it into text; and
additional information generation means for generating additional information to be added to an image on the basis of the text obtained by the voice recognition means, and
the storage means stores the image and also stores the additional information generated by the additional information generation means in association with the image that is associated with the audio from which the additional information was generated.
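
By way of non-limiting illustration only, the server-side flow recited above (receive, store, recognize, generate, store in association) might be sketched as follows. The class name `ImageManagementServer`, the `image_id` key standing in for the association information, the rule that every recognized word becomes a keyword, and the stand-in recognizer are all assumptions made for this sketch and do not appear in the specification. The sketch also assumes that the image arrives before its audio.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StoredImage:
    """An image held by the storage means, plus its additional information."""
    data: bytes
    additional_info: List[str] = field(default_factory=list)


class ImageManagementServer:
    """Hypothetical server: receive, recognize, generate, store."""

    def __init__(self, recognize: Callable[[bytes], str]) -> None:
        # The recognizer is injected; no particular speech engine is assumed.
        self._recognize = recognize
        self._store: Dict[str, StoredImage] = {}

    def receive_image(self, image_id: str, data: bytes) -> None:
        # Storage means: keep the received image under its identifier.
        self._store[image_id] = StoredImage(data)

    def receive_audio(self, image_id: str, audio: bytes) -> None:
        # image_id plays the role of the association information.
        text = self._recognize(audio)
        info = self.generate_additional_info(text)
        self._store[image_id].additional_info.extend(info)

    @staticmethod
    def generate_additional_info(text: str) -> List[str]:
        # Simplest possible rule (assumed): each recognized word is a keyword.
        words = (word.strip(".,").lower() for word in text.split())
        return [word for word in words if word]

    def additional_info(self, image_id: str) -> List[str]:
        return list(self._store[image_id].additional_info)


def fake_recognizer(audio: bytes) -> str:
    """Stand-in for the voice recognition means: treats the bytes as UTF-8 text."""
    return audio.decode("utf-8")


server = ImageManagementServer(fake_recognizer)
server.receive_image("IMG_0001", b"\xff\xd8...jpeg bytes...")
server.receive_audio("IMG_0001", b"Birthday party, Tokyo")
print(server.additional_info("IMG_0001"))  # ['birthday', 'party', 'tokyo']
```

Injecting the recognizer keeps the sketch independent of any particular speech recognition engine, which the claim leaves open.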

2. The image management system according to claim 1, wherein
the photographing apparatus comprises audio file creation means for creating an audio file that stores the audio acquired by the audio acquisition means, and
the transmission means, when transmitting the audio to the image management server, transmits the audio file created by the audio file creation means.

3. The image management system according to claim 1, wherein
the transmission means, when transmitting the audio to the image management server, transmits the audio acquired by the audio acquisition means as it is, and
the image management server comprises audio file creation means for creating an audio file that stores the audio received by the receiving means.
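
As a non-limiting sketch of the arrangement in which the server, rather than the photographing apparatus, creates the audio file, raw audio samples received as they are could be wrapped in a file container on the server side. The example below uses the WAV container via Python's standard `wave` module. The choice of WAV, the function name `create_audio_file`, and the sample format (16-bit mono PCM at 8 kHz) are assumptions for illustration; the specification does not fix a file format.

```python
import io
import wave


def create_audio_file(pcm: bytes, sample_rate: int = 8000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # mono (assumed)
        wav.setsampwidth(2)            # 16-bit samples (assumed)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


raw = b"\x00\x00" * 8000               # one second of silence at 8 kHz
wav_bytes = create_audio_file(raw)

# Read the file back to confirm that the container is well formed.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    print(wav.getnframes(), wav.getframerate())  # 8000 8000
```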

4. The image management system according to claim 1, wherein
the photographing apparatus comprises:
position information acquisition means for acquiring geographical position information of the photographing apparatus;
date and time information acquisition means for acquiring date and time information; and
schedule information acquisition means for acquiring schedule information of a user of the photographing apparatus,
the transmission means further transmits, to the image management server, the position information, the date and time information, and the schedule information acquired by the position information acquisition means, the date and time information acquisition means, and the schedule information acquisition means, respectively,
the receiving means further receives the position information, the date and time information, and the schedule information transmitted from the photographing apparatus, and
the additional information generation means generates the additional information on the basis of the text obtained by the voice recognition means and also on the basis of the position information, the date and time information, and the schedule information received by the receiving means.
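
One possible way of combining the recognized text with the position, date and time, and schedule information is sketched below, purely as an illustration. The `ScheduleEntry` type, the dictionary keys, and the rule of picking the schedule entry whose time span contains the capture time are assumptions of this sketch, not features taken from the specification.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class ScheduleEntry:
    """One entry of the user's schedule (hypothetical layout)."""
    start: datetime
    end: datetime
    title: str


def generate_additional_info(
    text: str,
    position: Optional[Tuple[float, float]],
    taken_at: datetime,
    schedule: List[ScheduleEntry],
) -> Dict[str, object]:
    """Merge recognized text with position, date/time and schedule information."""
    info: Dict[str, object] = {
        "keywords": text.split(),
        "date": taken_at.date().isoformat(),
    }
    if position is not None:
        info["latitude"], info["longitude"] = position
    # Use the first schedule entry whose time span contains the capture time.
    for entry in schedule:
        if entry.start <= taken_at <= entry.end:
            info["event"] = entry.title
            break
    return info


schedule = [
    ScheduleEntry(datetime(2006, 9, 27, 9, 0), datetime(2006, 9, 27, 18, 0), "Company trip"),
]
info = generate_additional_info(
    "Mount Fuji", (35.36, 138.73), datetime(2006, 9, 27, 10, 30), schedule
)
print(info["event"], info["date"])  # Company trip 2006-09-27
```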

5. The image management system according to claim 4, wherein
the image management server further comprises a map database for managing maps, and
the additional information generation means generates the additional information further on the basis of a map in the map database.
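
For illustration only, a map database lookup could turn the received position information into a place name by selecting the nearest registered landmark. The in-memory table `MAP_DATABASE`, its two entries (with approximate coordinates), and the nearest-neighbour rule using the haversine great-circle distance are assumptions of this sketch; the specification does not prescribe how the map is consulted.

```python
import math
from typing import Dict, Tuple

Position = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical map database: place name -> approximate coordinates.
MAP_DATABASE: Dict[str, Position] = {
    "Tokyo Tower": (35.6586, 139.7454),
    "Osaka Castle": (34.6873, 135.5262),
}


def haversine_km(a: Position, b: Position) -> float:
    """Great-circle distance between two positions in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_place(position: Position) -> str:
    """Return the name of the registered place closest to the given position."""
    return min(MAP_DATABASE, key=lambda name: haversine_km(position, MAP_DATABASE[name]))


print(nearest_place((35.66, 139.75)))  # Tokyo Tower
```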

6. The image management system according to claim 1, wherein
the image management server comprises face recognition means for recognizing a face appearing in the image received by the receiving means, and
the additional information generation means generates the additional information on the basis of the text obtained by the voice recognition means and also on the basis of the face in the image recognized by the face recognition means.

7. The image management system according to claim 6, wherein
the image management server further comprises a face information database for managing face information, and
the additional information generation means generates the additional information further on the basis of face information in the face information database.

8. The image management system according to claim 7, further comprising face information addition means for generating new face information on the basis of the text obtained by the voice recognition means and the face in the image recognized by the face recognition means, and for adding the new face information to the face information database.
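
The interplay between the face information database and the face information addition means might, as one non-limiting possibility, look like the sketch below. Faces are represented here as short feature vectors compared by Euclidean distance against a threshold; that representation, the threshold value, and the function names are assumptions of this sketch. It requires Python 3.8 or later for `math.dist`.

```python
import math
from typing import Dict, Optional, Sequence

FaceDB = Dict[str, Sequence[float]]  # person name -> reference feature vector


def identify(face: Sequence[float], db: FaceDB, threshold: float = 0.5) -> Optional[str]:
    """Return the closest registered name within the threshold, or None."""
    best_name: Optional[str] = None
    best_dist = threshold
    for name, reference in db.items():
        distance = math.dist(face, reference)
        if distance <= best_dist:
            best_name, best_dist = name, distance
    return best_name


def add_face_if_unknown(face: Sequence[float], spoken_name: str, db: FaceDB) -> str:
    """Register the face under the spoken name unless it is already known."""
    known = identify(face, db)
    if known is not None:
        return known
    db[spoken_name] = tuple(face)  # new face information from text plus face
    return spoken_name


face_db: FaceDB = {"Hanako": (0.1, 0.9)}
print(add_face_if_unknown((0.8, 0.2), "Taro", face_db))  # Taro
print(identify((0.79, 0.21), face_db))                   # Taro
```

Once a face has been registered under a spoken name, later images showing the same face can be labelled from the database alone.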

9. The image management system according to claim 1, wherein the additional information generation means comprises command recognition means for recognizing, on the basis of the text obtained by the voice recognition means, a command relating to generation of the additional information, and generates the additional information in accordance with the command recognized by the command recognition means.
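
Command recognition on the recognized text could, for example, treat certain spoken words as commands that select which field of the additional information the following words fill. The command words `title`, `place`, and `person`, the default field `keyword`, and the function name are hypothetical and chosen only for this sketch.

```python
from typing import Dict, List

# Hypothetical command words; the specification does not list concrete commands.
COMMAND_WORDS = ("title", "place", "person")


def recognize_commands(text: str) -> Dict[str, str]:
    """Split recognized text into fields, switching field at each command word."""
    fields: Dict[str, List[str]] = {}
    current = "keyword"  # field used for words spoken before any command
    for word in text.split():
        if word.lower() in COMMAND_WORDS:
            current = word.lower()
            fields.setdefault(current, [])
        else:
            fields.setdefault(current, []).append(word)
    return {name: " ".join(words) for name, words in fields.items()}


print(recognize_commands("Title summer festival place Kyoto"))
# {'title': 'summer festival', 'place': 'Kyoto'}
```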

10. A photographing apparatus that generates an image by forming an image of a subject on an image sensor, communicates with an image management server that receives and manages the image, and transmits the image to the image management server, the photographing apparatus comprising:
audio acquisition means for acquiring audio;
association means for associating the audio acquired by the audio acquisition means with an image; and
transmission means for transmitting the image to the image management server and for transmitting the audio acquired by the audio acquisition means to the image management server together with association information identifying the image with which the association means has associated the audio.
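
On the photographing apparatus side, transmitting the audio together with its association information could be sketched as below, for illustration only. The message layout (a 4-byte big-endian header length, a JSON header, then the payload), the header field `for_image`, and the file name are assumptions of this sketch; they are not the USB transfer format of the embodiment.

```python
import json
from typing import List, Optional, Tuple


def frame(header: dict, payload: bytes) -> bytes:
    """Prefix the payload with a length-delimited JSON header."""
    head = json.dumps(header).encode("utf-8")
    return len(head).to_bytes(4, "big") + head + payload


def build_messages(image_name: str, image: bytes, audio: Optional[bytes]) -> List[bytes]:
    """Build the image message and, if audio exists, an audio message tied to it."""
    messages = [frame({"type": "image", "name": image_name}, image)]
    if audio is not None:
        # "for_image" carries the association information to the server.
        messages.append(frame({"type": "audio", "for_image": image_name}, audio))
    return messages


def parse_message(message: bytes) -> Tuple[dict, bytes]:
    """Inverse of frame(): recover the header and the payload."""
    head_len = int.from_bytes(message[:4], "big")
    header = json.loads(message[4:4 + head_len].decode("utf-8"))
    return header, message[4 + head_len:]


messages = build_messages("DSCF0001.JPG", b"image-bytes", b"pcm-bytes")
header, payload = parse_message(messages[1])
print(header["for_image"], payload)  # DSCF0001.JPG b'pcm-bytes'
```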

11. The photographing apparatus according to claim 10, further comprising audio file creation means for creating an audio file that stores the audio acquired by the audio acquisition means, wherein
the transmission means, when transmitting the audio to the image management server, transmits the audio file created by the audio file creation means.

12. The photographing apparatus according to claim 10, wherein the transmission means, when transmitting the audio to the image management server, transmits the audio acquired by the audio acquisition means as it is.

13. The photographing apparatus according to claim 10, further comprising:
position information acquisition means for acquiring geographical position information of the photographing apparatus;
date and time information acquisition means for acquiring date and time information; and
schedule information acquisition means for acquiring schedule information of a user of the photographing apparatus, wherein
the transmission means further transmits, to the image management server, the position information, the date and time information, and the schedule information acquired by the position information acquisition means, the date and time information acquisition means, and the schedule information acquisition means, respectively.

14. An image management server that communicates with a photographing apparatus that generates an image by forming an image of a subject on an image sensor, and that receives and manages the image from the photographing apparatus, the image management server comprising:
receiving means for receiving the image transmitted from the photographing apparatus and for receiving audio transmitted from the photographing apparatus;
storage means for storing the image received by the receiving means;
voice recognition means for recognizing the audio received by the receiving means and converting it into text; and
additional information generation means for generating additional information to be added to an image on the basis of the text obtained by the voice recognition means, wherein
the storage means stores the image and also stores the additional information generated by the additional information generation means in association with the image that is associated with the audio from which the additional information was generated.

15. The image management server according to claim 14, wherein the receiving means, when receiving the audio, receives an audio file created by the photographing apparatus and transmitted from the photographing apparatus.

16. The image management server according to claim 14, wherein
the receiving means, when receiving the audio, receives audio that has been acquired by the photographing apparatus and transmitted as it is, and
the image management server further comprises audio file creation means for creating an audio file that stores the audio received by the receiving means.

17. The image management server according to claim 14, wherein
the receiving means further receives position information, date and time information, and schedule information transmitted from the photographing apparatus, and
the additional information generation means generates the additional information on the basis of the text obtained by the voice recognition means and also on the basis of the position information, the date and time information, and the schedule information received by the receiving means.

18. The image management server according to claim 14, further comprising a map database for managing maps, wherein
the additional information generation means generates the additional information further on the basis of a map in the map database.

19. The image management server according to claim 14, further comprising face recognition means for recognizing a face appearing in the image received by the receiving means, wherein
the additional information generation means generates the additional information on the basis of the text obtained by the voice recognition means and also on the basis of the face in the image recognized by the face recognition means.

20. The image management server according to claim 19, further comprising a face information database for managing face information, wherein
the additional information generation means generates the additional information further on the basis of face information in the face information database.

21. The image management server according to claim 20, further comprising face information addition means for generating new face information on the basis of the text obtained by the voice recognition means and the face in the image recognized by the face recognition means, and for adding the new face information to the face information database.

22. The image management server according to claim 14, wherein the additional information generation means comprises command recognition means for recognizing, on the basis of the text obtained by the voice recognition means, a command relating to generation of the additional information, and generates the additional information in accordance with the command recognized by the command recognition means.

23. An image management method in an image management server that communicates with a photographing apparatus that generates an image by forming an image of a subject on an image sensor, and that receives and manages the image from the photographing apparatus, the method comprising:
a receiving step of receiving the image transmitted from the photographing apparatus and receiving audio transmitted from the photographing apparatus;
a storage step of storing the image received in the receiving step;
a voice recognition step of recognizing the audio received in the receiving step and converting it into text; and
an additional information generation step of generating additional information to be added to the image on the basis of the text obtained in the voice recognition step, wherein
the storage step is a step of storing the image and also storing the additional information generated in the additional information generation step in association with the image that is associated with the audio from which the additional information was generated.

24. The image management method according to claim 23, wherein the receiving step is a step of receiving, as the audio, an audio file created by the photographing apparatus and transmitted from the photographing apparatus.

25. The image management method according to claim 23, wherein the receiving step is a step of receiving, as the audio, audio that has been acquired by the photographing apparatus and transmitted as it is, the method further comprising an audio file creation step of creating an audio file that stores the audio received in the receiving step.

26. The image management method according to claim 23, wherein
the receiving step is a step of further receiving position information, date and time information, and schedule information transmitted from the photographing apparatus, and
the additional information generation step is a step of generating the additional information on the basis of the text obtained in the voice recognition step and also on the basis of the position information, the date and time information, and the schedule information received in the receiving step.

27. The image management method according to claim 26, wherein
the image management server comprises a map database for managing maps, and
the additional information generation step is a step of generating the additional information further on the basis of a map in the map database.

28. The image management method according to claim 23, further comprising a face recognition step of recognizing a face appearing in the image received in the receiving step, wherein
the additional information generation step is a step of generating the additional information on the basis of the text obtained in the voice recognition step and also on the basis of the face in the image recognized in the face recognition step.

29. The image management method according to claim 28, wherein
the image management server further comprises a face information database for managing face information, and
the additional information generation step is a step of generating the additional information further on the basis of face information in the face information database.

30. The image management method according to claim 29, further comprising a face information addition step of generating new face information on the basis of the text obtained in the voice recognition step and the face in the image recognized in the face recognition step, and adding the new face information to the face information database.

31. The image management method according to claim 23, wherein the additional information generation step includes a command recognition step of recognizing, on the basis of the text obtained in the voice recognition step, a command relating to generation of the additional information, and is a step of generating the additional information in accordance with the command recognized in the command recognition step.