JP5320913B2

JP5320913B2 - Imaging apparatus and keyword creation program

Info

Publication number: JP5320913B2
Application number: JP2008226871A
Authority: JP
Inventors: 律子冬木
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2008-09-04
Filing date: 2008-09-04
Publication date: 2013-10-23
Anticipated expiration: 2028-09-04
Also published as: JP2010061426A

Abstract

PROBLEM TO BE SOLVED: To automatically associate a proper keyword with an image file.

SOLUTION: An object is imaged to generate image data, and sound data at the time of imaging is acquired. The color distribution or luminance distribution of the image data is examined using an image analysis function, and scene candidates are extracted from a table in which color distribution or luminance distribution is associated with scenes. Likewise, the frequency or waveform of the sound data is examined using a sound analysis function, and scene candidates are extracted from a table in which frequency and waveform are associated with scenes. The two sets of extracted scene candidates are compared, and the candidate common to both is used as the keyword.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to an imaging apparatus and a keyword creation program that generate keywords for image data using sound data.

Some apparatuses acquire image data by imaging, acquire ambient sound data at the time of imaging, determine the content of the sound from the sound data, and assign that content to the image data as a keyword. For example, when the sound of waves is captured at the time of imaging, "sound of waves" is written as a keyword in the metadata area of the image file (see, for example, Patent Document 1).
The assigned keyword is used, for example, as a tag in image search and also for classifying image files.

JP 2005-346440 A (paragraph 0053, etc.)

However, it is difficult to identify the content of a sound accurately from sound data alone, and the method of Patent Document 1 may therefore acquire erroneous information as a keyword. For example, the sound of waves closely resembles crowd noise, so "sound of waves" could be assigned to image data captured in a city far from the sea.

An imaging apparatus according to the present invention comprises: imaging means for imaging a subject to generate image data; sound acquisition means for acquiring sound data at the time of imaging; storage means for storing a first table that associates results of analyzing the color distribution of image data with keywords, and a second table that associates results of analyzing the frequency of sound data with keywords; determination means for analyzing the color distribution of the generated image data to extract corresponding keywords from the first table, analyzing the frequency of the acquired sound data to extract corresponding keywords from the second table, and determining a keyword common to the extracted keywords; and recording means for recording the determined keyword in association with the image data.
A keyword creation program according to the present invention causes a computer to function as: means for imaging a subject to generate image data; means for acquiring sound data at the time of imaging; means for storing a first table that associates results of analyzing the color distribution of image data with keywords, and a second table that associates results of analyzing the frequency of sound data with keywords; means for analyzing the color distribution of the generated image data to extract corresponding keywords from the first table, analyzing the frequency of the acquired sound data to extract corresponding keywords from the second table, and determining a keyword common to the extracted keywords; and means for recording the determined keyword in association with the image data.

According to the present invention, an accurate keyword can be automatically associated with an image file.

An embodiment of the present invention will be described with reference to FIGS. 1 to 6.
FIG. 1 is a control block diagram of the digital camera of the present embodiment. The subject light flux that has passed through the photographic lens 1 is captured by the image sensor 2, and the resulting imaging signal is input to the image processing unit 3. The image processing circuit 3a, which is part of the image processing unit 3, applies various kinds of processing to the input imaging signal to generate image data. After processing by the display circuit 3b, the image data is displayed on a liquid crystal monitor 4 provided, for example, on the back of the camera. While the shooting mode is set, this imaging and image display are repeated, producing a so-called live view display (through-image display).

When an imaging instruction is issued, imaging is performed anew, and the generated image data is recorded as an image file on a recording medium 5, such as a memory card, via the recording/reproducing circuit 3c. In general, an image file is created by combining the data that constitutes the image itself (the so-called image data) with metadata as additional information.

The camera is also provided with a microphone 7 and a speaker 8. Ambient sound input from the microphone 7 at the time of imaging is acquired as sound data, and a sound file created from the sound data can be recorded in association with the image file. Hereinafter, an image file with an associated sound file is referred to as an "image file with sound".

When the camera is set to the playback mode, an image file recorded on the recording medium 5 can be read out by the recording/reproducing circuit 3c and displayed as an image on the liquid crystal monitor 4 after processing by the image processing circuit 3a and the display circuit 3b. For an image file with sound, the corresponding sound data can be played back from the speaker 8 in synchronization with image playback.

The CPU 6 controls the image processing unit 3 and circuits not shown in the figure in response to input from the operation unit 9. The operation unit 9 includes a power button, a release button, and various operation members used for playback operations, information input, and the like.

To support searching and classifying image files, keywords (tags) are stored in association with each image file. The association is made, for example, by writing the keyword as text information in the metadata area of the image file (this is generally referred to as embedding the keyword in the image file). The camera of the present embodiment can analyze the image data and sound data of an image file with sound to generate a keyword automatically, and can embed that keyword in the image file.

To realize this automatic keyword generation, the camera uses an image analysis function to examine, for example, the color distribution and luminance distribution of the image data, and extracts scene candidates from a table that associates these with scenes. Likewise, it uses a sound analysis function to examine the frequency, waveform, and the like of the sound data, and extracts scene candidates from a table that associates these with scenes. The two sets of extracted scene candidates are compared, and the candidate common to both is used as the keyword.

As an example, consider an image file with sound captured while sea bathing. In the image analysis, blue tones occupy a large proportion of the image, so "blue sky", "sea", and the like are extracted as scene candidates. In the sound analysis, "crowd", "sea" (sound of waves), and the like are extracted as scene candidates from the frequency and waveform of the sound. Because "sea" is common to both sets of candidates, the final keyword is "sea".

As another example, consider an image with sound that captures launched fireworks. In the image analysis, the image is dark overall with multiple bright points, so "night view" and "fireworks" are extracted as scene candidates. In the sound analysis, "explosion", "fireworks", and the like are extracted from the booming sound. The keyword is therefore "fireworks".
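The candidate comparison in the two examples above can be sketched in Python as follows. This is a minimal illustration only: the feature labels (`mostly_blue`, `loud_bang`, and so on), the table contents, and the function name `common_keywords` are assumptions made for the sketch, and the specification does not say how analysis results are encoded in the tables.

```python
# Hypothetical lookup tables; feature labels and entries are illustrative only.
IMAGE_TABLE = {
    "mostly_blue": ["blue sky", "sea"],
    "dark_with_bright_points": ["night view", "fireworks"],
}
SOUND_TABLE = {
    "broadband_noise": ["crowd", "sea"],
    "loud_bang": ["explosion", "fireworks"],
}


def common_keywords(image_feature, sound_feature):
    """Return the scene candidates found in both tables, in image-table order."""
    image_candidates = IMAGE_TABLE.get(image_feature, [])
    sound_candidates = set(SOUND_TABLE.get(sound_feature, []))
    return [c for c in image_candidates if c in sound_candidates]


print(common_keywords("mostly_blue", "broadband_noise"))        # ['sea']
print(common_keywords("dark_with_bright_points", "loud_bang"))  # ['fireworks']
```

An empty list is returned when the two analyses share no candidate; how the camera should behave in that case is not described in the specification.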

Because the keyword is created from both the image data and the sound data in this way, a more accurate keyword is obtained than when it is created from the sound data alone.

To handle many scenes, the tables described above need to contain a rich set of information. It is desirable, for example, for the camera manufacturer to publish new information on a website or the like as needed, and for users to be able to download it and add it to or modify the tables.
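One way such a table update might work is sketched below in Python. The table layout (a mapping from a feature label to a list of scene names) and the function name `merge_table` are assumptions for illustration; the sketch covers only the "add" case and does not model replacing or deleting existing entries.

```python
def merge_table(table, update):
    """Return a new table with the entries of `update` added to `table`.

    Existing features gain any scene names they lack; new features are
    added whole. The input table is left unchanged.
    """
    merged = {feature: list(scenes) for feature, scenes in table.items()}
    for feature, scenes in update.items():
        existing = merged.setdefault(feature, [])
        for scene in scenes:
            if scene not in existing:
                existing.append(scene)
    return merged


# Hypothetical downloaded update adding one scene and one new feature.
current = {"mostly_blue": ["blue sky", "sea"]}
downloaded = {"mostly_blue": ["sea", "pool"], "mostly_green": ["forest"]}
print(merge_table(current, downloaded))
# {'mostly_blue': ['blue sky', 'sea', 'pool'], 'mostly_green': ['forest']}
```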

Keyword generation and embedding in the image file may be performed automatically at the time of shooting, or may also be performed on an image file with sound that has already been captured. In the former case, an "automatic keyword assignment mode" may be provided as an imaging mode.

FIG. 2 shows an example of the processing procedure when the automatic keyword assignment mode is set. When the release button is fully pressed, the CPU 6 starts this program. In step S1, imaging and sound recording are performed to acquire image data and sound data. Image analysis and sound analysis are performed in steps S2 and S3, respectively, and a sound file is generated from the sound data in step S4. In step S5, a keyword is extracted as described above based on the analysis results of steps S2 and S3. In step S6, the extracted keyword is written in the metadata area, and the metadata and image data are combined to generate an image file. In step S7, the image file and the sound file are recorded on the recording medium 5 in association with each other.
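The sequence of steps S1 to S7 can be outlined in Python as follows. This is a sketch under stated assumptions, not the camera firmware: the `ImageFile` class, the function name `auto_keyword_capture`, and the two analysis callables are hypothetical, sound-file encoding is replaced by a plain byte copy, and writing to the recording medium is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ImageFile:
    """Hypothetical in-memory stand-in for an image file with metadata."""
    image_data: bytes
    metadata: dict = field(default_factory=dict)
    sound_file: Optional[bytes] = None


def auto_keyword_capture(
    image_data: bytes,
    sound_data: bytes,
    analyze_image: Callable[[bytes], List[str]],
    analyze_sound: Callable[[bytes], List[str]],
) -> ImageFile:
    # S1 (imaging and recording) is assumed to have produced the two inputs.
    image_candidates = analyze_image(image_data)   # S2: image analysis
    sound_candidates = analyze_sound(sound_data)   # S3: sound analysis
    sound_file = bytes(sound_data)                 # S4: stand-in for sound-file encoding
    # S5: keep the candidates common to both analyses.
    keywords = [c for c in image_candidates if c in sound_candidates]
    # S6: write the keywords to metadata and combine with the image data.
    image_file = ImageFile(image_data, {"keywords": keywords})
    # S7: associate the sound file (writing to the medium is omitted here).
    image_file.sound_file = sound_file
    return image_file


result = auto_keyword_capture(
    b"img", b"snd",
    analyze_image=lambda _: ["blue sky", "sea"],
    analyze_sound=lambda _: ["crowd", "sea"],
)
print(result.metadata)  # {'keywords': ['sea']}
```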

When the image file and the sound file are recorded, they may be stored automatically in a folder named after the keyword. For example, if the extracted keyword is "sea" and no folder named "sea" exists on the recording medium 5, the folder is created and the files are stored in it. Thereafter, files for which "sea" is extracted as the keyword are stored in the "sea" folder. File classification can thus be automated.
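The folder behavior described above (create the folder on first use, reuse it afterwards) might look like the following Python sketch. The function name `store_in_keyword_folder` and the file names are hypothetical, and an ordinary file system directory stands in for the recording medium.

```python
from pathlib import Path


def store_in_keyword_folder(root, keyword, files):
    """Write each (name, data) pair into root/keyword.

    The keyword folder is created if it does not exist yet and is
    reused on later calls with the same keyword.
    """
    folder = Path(root) / keyword
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, data in files:
        path = folder / name
        path.write_bytes(data)
        paths.append(path)
    return paths
```

A call such as `store_in_keyword_folder("/media/card", "sea", [("0001.jpg", image_bytes), ("0001.wav", sound_bytes)])` would place both files of one capture in the same "sea" folder; the path and file names here are placeholders.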

In the automatic keyword assignment mode described above, keywords are generated from image data and sound data. When photographing a person, however, the person who is the subject can be asked to say some words, and keywords can be extracted from those words and assigned to the file. This yields accurate keywords. Hereinafter, the mode that enables this is referred to as the "voice keyword assignment mode".

The camera has a voice recognition function, a speaker recognition function, and a face recognition function. The voice recognition function analyzes voice data uttered by a person and recognizes the content of the utterance. The speaker recognition function extracts voice features (such as a voiceprint) from the voice data and searches a database for the person who produced the voice. The camera is provided with such a database, in which person information and voice features for multiple people can be registered in association with each other. The face recognition function detects the position and size of a person's face in the image data, and is generally used to adjust focus and exposure to the face.

FIG. 3 shows the processing procedure when the voice keyword assignment mode is set. When the release button is fully pressed, the CPU 6 starts this program. In step S11, imaging and sound recording are performed to acquire image data and voice data. Before imaging, the photographer asks the person who is the subject to say keywords aloud. Alternatively, the camera may make this request by voice. The subject may say, for example, the date, the shooting location, or what he or she is currently doing.

In step S12, face recognition is performed, and in step S13 it is determined whether a face has been recognized. If so, the process proceeds to step S14, where "portrait" is set as one of the keywords. In this mode a face should normally be recognized, but if none is recognized, the process proceeds to step S21.

In step S15, voice recognition is performed. If step S16 determines that recognition succeeded, keywords are extracted in step S17 based on the voice recognition result. In step S18, speaker recognition is performed to identify the person who produced the voice, in other words, to identify who the subject is. If step S19 determines that identification succeeded, the person information is used as a keyword in step S20. The person information is what is registered in the database described above, for example the person's name.

In step S21, a voice file is generated from the voice data. In step S22, the keywords acquired in steps S14, S17, S20, and so on are written in the metadata area, and the metadata and image data are combined to generate an image file. If no face is recognized in the image data, no keyword can be acquired by this procedure; in that case, a keyword may be acquired by the method shown in FIG. 2.
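The branching in steps S12 to S20 can be summarized by the Python sketch below. The function name and the two recognizer callables are hypothetical stand-ins for the camera's recognition functions. The text does not state where the flow goes when step S16 or S19 is answered negatively; the sketch assumes it simply continues with the next step.

```python
from typing import Callable, List, Optional


def voice_keyword_flow(
    face_found: bool,
    recognize_speech: Callable[[], Optional[List[str]]],
    identify_speaker: Callable[[], Optional[str]],
) -> List[str]:
    """Collect the keywords that step S22 would write to the metadata."""
    keywords: List[str] = []
    if not face_found:             # S13 negative: go straight to S21
        return keywords
    keywords.append("portrait")    # S14
    words = recognize_speech()     # S15: None means recognition failed
    if words is not None:          # S16
        keywords.extend(words)     # S17
    name = identify_speaker()      # S18: None means no match in the database
    if name is not None:           # S19
        keywords.append(name)      # S20
    return keywords


print(voice_keyword_flow(True, lambda: ["June 10", "strawberry harvest"], lambda: "A"))
# ['portrait', 'June 10', 'strawberry harvest', 'A']
```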

As described above, keywords embedded in an image file can be used as search tags and for classifying image files; they can also be displayed as text, as a comment or title, during image playback. FIG. 4 shows display examples. FIG. 4(a) shows an example in which text is displayed inside a so-called speech balloon. The keywords "June 10" and "strawberry harvest" are both words spoken by the person in the image at the time of imaging. FIG. 4(b) shows an example in which text is displayed as a title. It is desirable for the user to be able to select the number of characters, font, text color, and background color of the text display according to preference.

FIGS. 5 and 6 show image display examples for a case in which multiple people in conversation are imaged and recorded. In such a case, the voice recognition and speaker recognition described above identify the correspondence between speakers and utterances, that is, which person said which words, and the time is obtained from the camera's built-in clock, so the camera can automatically determine who said what and when. This information is recorded in association with the image file, and when the image file is played back, the conversation can be displayed in the style of meeting minutes. FIG. 5 shows an example in which the conversation is arranged in chronological order, and FIG. 6 shows an example in which it is grouped by person. It is desirable for the two display formats to be switchable by a simple operation.
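The two display formats (chronological as in FIG. 5, grouped by person as in FIG. 6) can both be derived from one recorded list of utterances, as the Python sketch below suggests. The `Utterance` record, the function names, and the sample conversation are assumptions for illustration; times are assumed to be zero-padded "HH:MM:SS" strings so that string sorting matches time order.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Utterance(NamedTuple):
    time: str     # zero-padded "HH:MM:SS" from the camera clock (assumed format)
    speaker: str  # result of speaker recognition
    text: str     # result of voice recognition


def chronological(log: List[Utterance]) -> List[str]:
    """Minutes-style lines ordered by time."""
    return [f"{u.time} {u.speaker}: {u.text}" for u in sorted(log, key=lambda u: u.time)]


def by_person(log: List[Utterance]) -> Dict[str, List[str]]:
    """The same utterances grouped by speaker, each group in time order."""
    groups = defaultdict(list)
    for u in sorted(log, key=lambda u: u.time):
        groups[u.speaker].append(f"{u.time} {u.text}")
    return dict(groups)


log = [
    Utterance("10:00:05", "B", "Good morning"),
    Utterance("10:00:01", "A", "Hello"),
    Utterance("10:00:09", "A", "Shall we start?"),
]
print(chronological(log))
print(by_person(log))
```

Keeping a single record list and computing each view on demand is one simple way to make the two formats switchable.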

In addition to this association between people and utterances, if it is known who each person in the image is, it is possible to identify which person in the image said which words. People in the image can be identified by extracting facial feature information for each person through face recognition and searching a database based on it. The camera may be provided with such a database, in which person information and facial feature information for multiple people can be registered in association with each other.

Suppose, for example, that the leftmost person in the image is found to be "A", and that the voice recognition and speaker recognition described above show that "A" is the one who said "XXXXXX". It then follows that person A, at the far left of the image, said "XXXXXX". If this correspondence is recorded in association with the image file, then when an image is displayed with speech balloons as in FIG. 4(a), the position and orientation of each balloon can be set so that it is clear that the leftmost person said "XXXXXX" and the person in the center said "YYY". The text in a balloon may also take a form such as "A: XXXXXX" or "B: YYY".
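Joining the face recognition result with the speaker recognition result to place the balloons might be sketched in Python as follows. The face tuple layout `(name, x, y, width, height)`, the choice of the top center of the face box as the balloon anchor, and the function name `assign_balloons` are all assumptions for illustration; the specification only says that balloon position and orientation can be set from the correspondence.

```python
def assign_balloons(faces, utterances):
    """Match identified faces to utterances and return balloon placements.

    faces:      list of (name, x, y, width, height) from face recognition
    utterances: dict mapping a speaker name to the recognized text
    Balloons are returned left to right; faces with no utterance are skipped.
    """
    balloons = []
    for name, x, y, w, h in sorted(faces, key=lambda f: f[1]):
        text = utterances.get(name)
        if text is None:
            continue
        balloons.append({
            "text": f"{name}: {text}",
            "anchor": (x + w // 2, y),  # top center of the face box
        })
    return balloons


faces = [("B", 200, 50, 40, 40), ("A", 20, 60, 40, 40)]
utterances = {"A": "XXXXXX", "B": "YYY"}
print(assign_balloons(faces, utterances))
# [{'text': 'A: XXXXXX', 'anchor': (40, 60)}, {'text': 'B: YYY', 'anchor': (220, 50)}]
```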

Displays such as those in FIGS. 4 to 6 may be realized by a preinstalled program when, for example, the image file is loaded into an external device (for example, a personal computer). The keywords described above may also be created on the external device. In this case, a program installed on the external device receives an image file with sound or an image file with voice from, for example, the camera, and creates keywords in the same manner as described above based on information read from the metadata section.

The above description uses still image files, but moving image files may also be used.

FIG. 1 is a control block diagram of a camera according to an embodiment of the present invention.
FIG. 2 is a flowchart showing an example of the processing procedure when the automatic keyword assignment mode is set.
FIG. 3 is a flowchart showing an example of the processing procedure when the voice keyword assignment mode is set.
FIG. 4 is a diagram showing a display example of an image captured in the voice keyword assignment mode.
FIG. 5 is a diagram showing another display example of an image captured in the voice keyword assignment mode.
FIG. 6 is a diagram showing another display example of an image captured in the voice keyword assignment mode.

Explanation of symbols

2 Image sensor
3 Image processing unit
4 Liquid crystal monitor
6 CPU
7 Microphone
8 Speaker

Claims

An imaging apparatus comprising:
imaging means for imaging a subject and generating image data;
sound acquisition means for acquiring sound data at the time of imaging;
storage means for storing a first table in which a result of analyzing the color distribution of image data is associated with a keyword, and a second table in which a result of analyzing the frequency of sound data is associated with a keyword;
determination means for analyzing the color distribution of the generated image data to extract the corresponding keywords from the first table, analyzing the frequency of the acquired sound data to extract the corresponding keywords from the second table, and determining a keyword common to the extracted keywords; and
recording means for recording the determined keyword in association with the image data.

The imaging apparatus according to claim 1,
wherein the first table stores a result of analyzing a luminance distribution of the image data and a keyword in association with each other.

The imaging apparatus according to claim 1 or 2,
wherein the recording means records the image data in a folder corresponding to the determined keyword.

A keyword creation program for causing a computer to function as:
means for imaging a subject and generating image data;
means for acquiring sound data at the time of imaging;
means for storing a first table in which a result of analyzing the color distribution of image data is associated with a keyword, and a second table in which a result of analyzing the frequency of sound data is associated with a keyword;
means for analyzing the color distribution of the generated image data to extract the corresponding keywords from the first table, analyzing the frequency of the acquired sound data to extract the corresponding keywords from the second table, and determining a keyword common to the extracted keywords; and
means for recording the determined keyword in association with the image data.

The keyword creation program according to claim 4,
wherein the recording means records the image data in a folder corresponding to the determined keyword.