JP4866396B2 - Tag information adding device, tag information adding method, and computer program

Info

Publication number: JP4866396B2
Application number: JP2008178092A
Authority: JP
Inventors: 宏佐々木; 弘利岩崎
Original assignee: Denso IT Laboratory Inc
Current assignee: Denso IT Laboratory Inc
Priority date: 2008-07-08
Filing date: 2008-07-08
Publication date: 2012-02-01
Anticipated expiration: 2028-07-08
Also published as: JP2010021638A

Description

The present invention relates to a tag information adding device, a tag information adding method, and a computer program that add, to a plurality of image frames, tag information for searching for an image frame desired by a user.
More particularly, the present invention relates to a tag information adding device, a tag information adding method, and a computer program that automatically generate such tag information and add it to a plurality of image frames constituting a moving image file shot by a video camera or a still image file shot by a still camera or the like.

As prior art, Patent Document 1 below describes a method in which an operator performs an input operation to add a tag to a desired frame of a moving image being played back so that the frame is highlighted. As further prior art, Patent Document 2 below describes a method in which an operator performs an input operation to add a key image to a desired partial region within an image frame.
Patent Document 1: Japanese Patent Laying-Open No. 2005-181599 (FIG. 13)
Patent Document 2: Japanese Patent Laying-Open No. 2007-19768 (FIG. 10)

However, in the prior art described above, the operator must perform an input operation to add each tag, so the input work takes an enormous amount of time.

In view of the above problem of the prior art, an object of the present invention is to provide a tag information adding device, a tag information adding method, and a computer program that can automatically generate tag information for searching for an image frame desired by a user and add it to a plurality of image frames constituting a moving image file or a still image file.

To achieve the above object, the tag information adding device of the present invention automatically generates tag information for searching for an image frame desired by a user and adds it to a plurality of image frames captured and generated by an imaging means, the device comprising: a current position detecting means for detecting the current position of the imaging means during imaging by the imaging means; a peripheral keyword extracting means for extracting, based on the current position detected by the current position detecting means and using map data containing geographical names, a geographical name near the current position as a peripheral keyword; a voice recognition means for recognizing the user's voice during imaging by the imaging means; a first tag information adding means for adding, when no voice is recognized by the voice recognition means, the peripheral keyword extracted by the peripheral keyword extracting means as tag information to the image frame at the time the current position was detected; and a second tag information adding means for, when a voice is recognized by the voice recognition means, extracting a noun from the recognized voice, comparing the extracted noun with the peripheral keyword extracted by the peripheral keyword extracting means, adding a matching noun as tag information to the image frame at the time the voice was recognized, and, when there is no match, adding the noun or the peripheral keyword as tag information to the image frame at the time the voice was recognized.

To achieve the above object, the tag information adding method of the present invention automatically generates tag information for searching for an image frame desired by a user and adds it to a plurality of image frames captured and generated by an imaging means, the method comprising: a current position detecting step of detecting the current position of the imaging means during imaging by the imaging means; a peripheral keyword extracting step of extracting, based on the current position detected in the current position detecting step and on map data, a geographical name near the current position as a peripheral keyword; a voice recognition step of recognizing the user's voice during imaging by the imaging means; a first tag information adding step of adding, when no voice is recognized in the voice recognition step, the peripheral keyword extracted in the peripheral keyword extracting step as tag information to the image frame at the time the current position was detected; and a second tag information adding step of, when a voice is recognized in the voice recognition step, extracting a noun from the recognized voice, comparing the extracted noun with the peripheral keyword extracted in the peripheral keyword extracting step, adding a matching noun as tag information to the image frame at the time the voice was recognized, and, when there is no match, adding the noun or the peripheral keyword as tag information to the image frame at the time the voice was recognized.

To achieve the above object, the computer program of the present invention causes a computer to automatically generate tag information for searching for an image frame desired by a user and add it to a plurality of image frames captured and generated by an imaging means, the program comprising: a current position detecting step of detecting the current position of the imaging means during imaging by the imaging means; a peripheral keyword extracting step of extracting, based on the current position detected in the current position detecting step and on map data, a geographical name near the current position as a peripheral keyword; a voice recognition step of recognizing the user's voice during imaging by the imaging means; a first tag information adding step of adding, when no voice is recognized in the voice recognition step, the peripheral keyword extracted in the peripheral keyword extracting step as tag information to the image frame at the time the current position was detected; and a second tag information adding step of, when a voice is recognized in the voice recognition step, extracting a noun from the recognized voice, comparing the extracted noun with the peripheral keyword extracted in the peripheral keyword extracting step, adding a matching noun as tag information to the image frame at the time the voice was recognized, and, when there is no match, adding the noun or the peripheral keyword as tag information to the image frame at the time the voice was recognized.

With this configuration, tag information for searching for an image frame desired by a user can be automatically generated and added to a plurality of image frames constituting a moving image file or a still image file.

According to the present invention, tag information for searching for an image frame desired by a user can be automatically generated and added to a plurality of image frames constituting a moving image file or a still image file captured and generated by an imaging means.

Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of the tag information adding device according to the present invention, and FIG. 2 is a flowchart for explaining the operation of the moving image processing device of FIG. 1.

In FIG. 1, an outside camera 11, which is an imaging means, is arranged to shoot moving images of the surroundings of a vehicle (not shown), for example the view ahead, and an in-vehicle camera 12, which is another imaging means, is arranged to shoot moving images of the vehicle interior, for example of the driver. An in-vehicle microphone 13 records the voices of the driver and other occupants, and a GPS (Global Positioning System) unit 14 acquires the current position (latitude and longitude) of the vehicle. A voice recognition device 15 recognizes the voice recorded by the in-vehicle microphone 13. A moving image processing device 16 automatically generates a search tag and the like for an arbitrary frame among the frames constituting each moving image shot by the outside camera 11 and the in-vehicle camera 12, based on the voice data obtained through the in-vehicle microphone 13 and the voice recognition device 15 and on the current position acquired by the GPS unit 14, adds the tag to that frame, and records the result in a moving image recording device 17 as tagged moving image data. The moving image processing device 16 stores map data in advance and can obtain geographical names near the current position (place names and the names of mountains, rivers, bridges, buildings, and the like) from the map data as peripheral keywords. It is desirable to obtain a plurality of peripheral keywords, for example about 20.
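The description does not say how the moving image processing device 16 selects names "near" the current position. The following is a minimal sketch of one possible approach, assuming the map data is a list of named points and that nearness is measured by great-circle distance. The `Place` type, the function names, the 30 km radius, and the approximate coordinates are all illustrative assumptions and do not come from the patent; only the limit of about 20 keywords is taken from the text.

```python
import math
from typing import List, NamedTuple


class Place(NamedTuple):
    """One named entry in the (hypothetical) map data."""
    name: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def peripheral_keywords(lat: float, lon: float, map_data: List[Place],
                        radius_km: float = 30.0, limit: int = 20) -> List[str]:
    """Return up to `limit` place names within `radius_km`, nearest first."""
    nearby = [(haversine_km(lat, lon, p.lat, p.lon), p.name) for p in map_data]
    nearby = [(d, name) for d, name in nearby if d <= radius_km]
    nearby.sort()
    return [name for _, name in nearby[:limit]]


# Illustrative map data with approximate coordinates (assumed values).
MAP_DATA = [
    Place("Mt. Fuji", 35.3606, 138.7274),
    Place("Gotemba", 35.3087, 138.9346),
    Place("Ashigara", 35.318, 139.0),
    Place("Tokyo", 35.6812, 139.7671),
]

keywords = peripheral_keywords(35.30, 138.93, MAP_DATA)
print(keywords)
```

A real map database would more likely use a spatial index than a linear scan; the scan is used here only to keep the sketch short.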

The operation of the moving image processing device 16 will be described with reference to FIG. 2. The moving image information of the moving image data comprises a moving image data name (for example, a number indicating the shooting order), a keyword serving as a tag for searching for an image frame desired by the user, the start frame number and end frame number of the frames to which the same tag is assigned, position data (latitude and longitude), and the like. So that results are easy to view when searched and displayed, the number of frames to which the same tag is assigned is fixed at 100 frames (= 1 block).
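The moving image information described above could be represented roughly as follows. This is only a sketch: the class and field names are hypothetical, and the description does not specify a storage format. The end frame is computed as start + 99 so that a block spans exactly 100 frames, which matches the frame ranges 1 to 100 and 90 to 189 given for the figures.

```python
from dataclasses import dataclass, field
from typing import List

BLOCK_FRAMES = 100  # frames sharing one tag (= 1 block), per the description


@dataclass
class TagBlock:
    """One tagged block of frames (field names are illustrative)."""
    keywords: List[str]
    start_frame: int
    end_frame: int
    latitude: float
    longitude: float


@dataclass
class MovieInfo:
    """Moving image information for one moving image data item."""
    name: str  # e.g. a number indicating the shooting order
    blocks: List[TagBlock] = field(default_factory=list)


def make_block(keywords: List[str], start_frame: int,
               latitude: float, longitude: float) -> TagBlock:
    """Build a block of BLOCK_FRAMES frames beginning at start_frame."""
    end_frame = start_frame + BLOCK_FRAMES - 1
    return TagBlock(list(keywords), start_frame, end_frame, latitude, longitude)


# Placeholder coordinates; the patent text gives no numeric values.
info = MovieInfo(name="0001")
info.blocks.append(make_block(["Gotemba"], 1, 35.30, 138.93))
info.blocks.append(make_block(["Mt. Fuji"], 90, 35.31, 138.90))
print(info)
```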

First, it is checked whether moving image data from the outside camera 11 or the in-vehicle camera 12 exists (step S1); if not, processing ends. If moving image data exists in step S1, the process proceeds to step S2, where the moving image data name is written to the moving image information. Next, it is checked whether position data from the GPS unit 14 exists (step S3); if not, the process proceeds to step S11, where the moving image information is recorded in the moving image recording device 17. If position data exists in step S3, the process proceeds to step S4, where the position data is written to the moving image information. Then, based on the position data and the map data, one or more geographical names near the current position are extracted as peripheral keywords (step S5).

Next, it is checked whether voice data from the voice recognition device 15 exists (step S6). If no voice data exists, the process proceeds to step S12, where the moving image frame number at the time the position data was acquired is set as the start frame number and the frame 100 frames later is set as the end frame number, and the one or more peripheral keywords extracted from the position data in step S5 are written to the tag of the moving image information. The process then proceeds to step S11, where the moving image information is recorded in the moving image recording device 17.

If voice data exists in step S6, the process proceeds to step S7, where nouns are extracted from the voice data, and the extracted nouns are then matched against the one or more peripheral keywords extracted from the position data in step S5 (step S8). If there is no match, the process branches to step S9, where the one or more peripheral keywords extracted from the position data are written to the tag of the moving image information; if there is a match, the process proceeds to step S10, where the matching noun is written to the tag of the moving image information. In steps S9 and S10, the moving image frame number at the time the voice data was acquired is set as the start frame number and the frame 100 frames later is set as the end frame number. The process then proceeds to step S11, where the moving image information is recorded in the moving image recording device 17.
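The branching in steps S3 to S12 can be summarized in a short sketch. It assumes that noun extraction (step S7) and keyword extraction (step S5) have already been done elsewhere and are passed in as plain lists; the function name, parameter names, and dictionary layout are hypothetical. The end frame is taken as start + 99, following the 100-frame ranges shown in the figures.

```python
from typing import Dict, List, Optional

BLOCK_FRAMES = 100  # frames per tagged block, per the description


def tag_block(peripheral: Optional[List[str]], position_frame: int,
              spoken_nouns: Optional[List[str]] = None,
              speech_frame: Optional[int] = None) -> Optional[Dict]:
    """Decide the tag for one block, following steps S3 to S12.

    peripheral     peripheral keywords from step S5, or None if no position data
    position_frame frame number at which the position data was acquired
    spoken_nouns   nouns from step S7, or None if no voice data exists
    speech_frame   frame number at which the voice data was acquired
    """
    if peripheral is None:
        # S3 "no": nothing to tag; the caller records the info as-is (S11).
        return None
    if spoken_nouns is None or speech_frame is None:
        # S6 "no" -> S12: tag with all peripheral keywords from the position frame.
        keywords, start = list(peripheral), position_frame
    else:
        # S8: match the extracted nouns against the peripheral keywords.
        matched = [noun for noun in spoken_nouns if noun in peripheral]
        # S10 on a match, S9 otherwise; both start at the speech frame.
        keywords = matched if matched else list(peripheral)
        start = speech_frame
    return {"keywords": keywords, "start_frame": start,
            "end_frame": start + BLOCK_FRAMES - 1}


nearby = ["Gotemba", "Mt. Fuji", "Ashigara"]
print(tag_block(nearby, 1))
print(tag_block(nearby, 1, ["today", "Mt. Fuji"], 90))
```

Note that the summary of the invention allows either the noun or the peripheral keyword to be used when there is no match; the sketch follows the embodiment, which writes the peripheral keywords in step S9.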

FIG. 3 shows an example of moving image information assigned to moving image data shot while driving near Mt. Fuji. In FIG. 3, latitude data (latitude), longitude data (Longitude), and the keyword "Gotemba" are assigned to one block of the Mt. Fuji moving image data running from start frame number 1 to end frame number 100, and latitude data (latitude), longitude data (Longitude), and the keyword "Mt. Fuji" are assigned to one block running from start frame number 90 to end frame number 189.

FIG. 4 shows an example of moving image information assigned to moving image data of Mt. Fuji for which no voice data exists. In FIG. 4, latitude and longitude data and the peripheral keywords "Gotemba", "Mt. Fuji", and "Ashigara", extracted from the position data in step S5 of FIG. 2, are assigned to one block running from start frame number 1 to end frame number 100.

FIG. 5 shows an example of moving image information assigned to moving image data of Mt. Fuji in which the voice data "Mt. Fuji is beautiful today, isn't it" is present from partway through the moving image data (frame number 90). In FIG. 5, latitude and longitude data and the peripheral keyword "Gotemba", extracted from the position data in step S5, are assigned to one block running from start frame number 1 to end frame number 100, and latitude and longitude data and the noun "Mt. Fuji", extracted by voice recognition, are assigned to one block running from start frame number 90 to end frame number 189.
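Because the purpose of the tags is to let a user find desired frames, a lookup over the records described for FIG. 3 and FIG. 5 might look like the sketch below. The record layout and function names are hypothetical, and the latitude and longitude fields are omitted because the text gives no numeric values for them.

```python
from typing import Dict, List, Tuple

# The two blocks described for FIG. 3 and FIG. 5 (position data omitted).
BLOCKS: List[Dict] = [
    {"keyword": "Gotemba", "start_frame": 1, "end_frame": 100},
    {"keyword": "Mt. Fuji", "start_frame": 90, "end_frame": 189},
]


def find_frames(blocks: List[Dict], keyword: str) -> List[Tuple[int, int]]:
    """Return the (start, end) frame ranges tagged with the given keyword."""
    return [(b["start_frame"], b["end_frame"])
            for b in blocks if b["keyword"] == keyword]


def tags_at_frame(blocks: List[Dict], frame: int) -> List[str]:
    """Return every keyword whose block covers the given frame number."""
    return [b["keyword"] for b in blocks
            if b["start_frame"] <= frame <= b["end_frame"]]


print(find_frames(BLOCKS, "Mt. Fuji"))
print(tags_at_frame(BLOCKS, 95))
```

Since the two blocks overlap in frames 90 to 100, a frame in that range carries both keywords.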

<Application example of the present invention>
FIG. 6 shows a system to which the tag information adding device 10 according to the present invention is applied. In the system shown in FIG. 6, the tag information adding device 10 is mounted on a vehicle V and used as a moving image database (DB) 10a. For an arbitrary frame among the frames constituting the surrounding moving images and in-vehicle moving images shot while the vehicle V is traveling, the device automatically generates a search tag based on the voice data recorded while the vehicle V is traveling and on the current position, adds the tag to that frame, and records the result as tagged moving image data. The moving image data recorded in the tag information adding device 10 (moving image DB 10a) is uploaded to a server 1 and downloaded from the server 1 to a mobile phone 2 of a user Y, a PC (personal computer) 3, or an in-vehicle information terminal 4.

The imaging means of the present invention is not limited to a video camera; the invention can also be applied to an ordinary portable video camera, a digital still camera, and a camera built into a mobile phone. Likewise, the images to be tagged are not limited to moving images; the invention can also be applied to an image file composed of a plurality of still images.

The present invention has the effect that tag information for searching for an image frame desired by a user can be automatically generated and added to a plurality of image frames constituting a moving image file or a still image file captured and generated by an imaging means, and it can be used in video cameras, still cameras, mobile phones, and the like.

FIG. 1 is a block diagram showing an embodiment of the tag information adding device according to the present invention.
FIG. 2 is a flowchart for explaining the operation of the moving image processing device of FIG. 1.
FIG. 3 is an explanatory diagram showing moving image information.
FIG. 4 is an explanatory diagram showing moving image information when there is no voice data.
FIG. 5 is an explanatory diagram showing moving image information when there is voice data.
FIG. 6 is an explanatory diagram showing a system to which the tag information adding device according to the present invention is applied.

Explanation of symbols

11 Outside camera
12 In-vehicle camera
13 In-vehicle microphone
14 GPS unit
15 Voice recognition device
16 Moving image processing device
17 Moving image recording device

Claims

A tag information adding device that automatically generates tag information for searching for an image frame desired by a user and adds it to a plurality of image frames captured and generated by an imaging means, the device comprising:
a current position detecting means for detecting the current position of the imaging means during imaging by the imaging means;
a peripheral keyword extracting means for extracting, based on the current position detected by the current position detecting means and using map data containing geographical names, a geographical name near the current position as a peripheral keyword;
a voice recognition means for recognizing the user's voice during imaging by the imaging means;
a first tag information adding means for adding, when no voice is recognized by the voice recognition means, the peripheral keyword extracted by the peripheral keyword extracting means as tag information to the image frame at the time the current position was detected; and
a second tag information adding means for, when a voice is recognized by the voice recognition means, extracting a noun from the recognized voice, comparing the extracted noun with the peripheral keyword extracted by the peripheral keyword extracting means, adding a matching noun as tag information to the image frame at the time the voice was recognized, and, when there is no match, adding the noun or the peripheral keyword as tag information to the image frame at the time the voice was recognized.

The tag information adding device according to claim 1, wherein the first and second tag information adding means add the same tag information to a predetermined number of image frames starting from the image frame at the time the current position was detected and from the image frame at the time the voice was recognized, respectively.

The tag information adding device according to claim 1 or 2, wherein the plurality of image frames are moving image data shot by a video camera used as the imaging means.

The tag information adding device according to claim 1, wherein the plurality of image frames are still image files shot by a still camera or a camera built into a mobile phone used as the imaging means.

A tag information adding method for automatically generating tag information for searching for an image frame desired by a user and adding it to a plurality of image frames captured and generated by an imaging means, the method comprising:
a current position detecting step of detecting the current position of the imaging means during imaging by the imaging means;
a peripheral keyword extracting step of extracting, based on the current position detected in the current position detecting step and on map data, a geographical name near the current position as a peripheral keyword;
a voice recognition step of recognizing the user's voice during imaging by the imaging means;
a first tag information adding step of adding, when no voice is recognized in the voice recognition step, the peripheral keyword extracted in the peripheral keyword extracting step as tag information to the image frame at the time the current position was detected; and
a second tag information adding step of, when a voice is recognized in the voice recognition step, extracting a noun from the recognized voice, comparing the extracted noun with the peripheral keyword extracted in the peripheral keyword extracting step, adding a matching noun as tag information to the image frame at the time the voice was recognized, and, when there is no match, adding the noun or the peripheral keyword as tag information to the image frame at the time the voice was recognized.

A computer program for causing a computer to automatically generate tag information for searching for an image frame desired by a user and add it to a plurality of image frames captured and generated by an imaging means, the program comprising:
a current position detecting step of detecting the current position of the imaging means during imaging by the imaging means;
a peripheral keyword extracting step of extracting, based on the current position detected in the current position detecting step and on map data, a geographical name near the current position as a peripheral keyword;
a voice recognition step of recognizing the user's voice during imaging by the imaging means;
a first tag information adding step of adding, when no voice is recognized in the voice recognition step, the peripheral keyword extracted in the peripheral keyword extracting step as tag information to the image frame at the time the current position was detected; and
a second tag information adding step of, when a voice is recognized in the voice recognition step, extracting a noun from the recognized voice, comparing the extracted noun with the peripheral keyword extracted in the peripheral keyword extracting step, adding a matching noun as tag information to the image frame at the time the voice was recognized, and, when there is no match, adding the noun or the peripheral keyword as tag information to the image frame at the time the voice was recognized.