JP2011171921A - Digital camera

Info

Publication number: JP2011171921A
Application number: JP2010032617A
Authority: JP
Inventors: Yosuke Kono; 洋介河野; Tetsuo In; 哲生因; Masanaga Nakamura; 正永中村; Takuya Aihara; 卓也相原
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2010-02-17
Filing date: 2010-02-17
Publication date: 2011-09-01

Abstract

PROBLEM TO BE SOLVED: To make the voice played to a subject person suit that person.
SOLUTION: A facial expression detection unit 22 acquires subject person information, for example facial expression information, for the face detected by a face detection unit 21. Based on the facial expression information, a voice data selection unit 25 selects suitable voice data from a plurality of pieces of voice data stored in a voice data storage unit 35, and sends the selected voice data to a speaker 43 when the release switch of an operation unit 41 is operated. Voice data such as "Smile more!" or "Don't cry, say cheese" is then reproduced from the speaker 43. Other kinds of subject person information include age/gender information and person registration information.
COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a digital camera, and more particularly to a digital camera having a function of detecting a subject person in an image.

A person photographing apparatus is known that, when a smile of the subject person is to be photographed, guides the subject person to utter a predetermined word, for example the relatively popular "Say cheese", recognizes the utterance, and then shoots at a preset timing (see, for example, Patent Document 1).
This person photographing apparatus is well suited to, for example, ID photo systems, and explains the photographing procedure and the like to the subject person using fixed, predetermined voice messages.

JP 2008-26604 A

A general digital camera may be used to photograph various facial expressions of a subject person, such as a laughing face, a crying face, an angry face, or a surprised face. Always playing the same fixed voice to the subject person regardless of the person's facial expression is meaningless and may even be undesirable.

The digital camera according to the invention of claim 1 comprises: imaging means for imaging a subject person and creating image data; photographing means for starting a photographing operation; person information acquisition means for detecting a subject person image in the image data captured by the imaging means in a photographing preparation stage and acquiring subject person information; voice data storage means for storing a plurality of pieces of voice data; voice data selection means for selecting predetermined voice data from the plurality of pieces of voice data stored in the voice data storage means, based on the subject person information acquired by the person information acquisition means; and voice reproduction means for reproducing the selected voice data prior to the photographing operation by the photographing means.

According to the digital camera of the present invention, a subject person is detected, voice data corresponding to the detected subject person information is selected, and a voice suited to the subject person information, such as the facial expression of the subject person before shooting, can be played.

FIG. 1 is a block diagram showing the configuration of a digital camera according to an embodiment of the present invention.

A digital camera according to an embodiment of the present invention will now be described with reference to the drawings.
As shown in FIG. 1, the digital camera includes a photographing lens 11, a diaphragm 12, an image sensor 13, a buffer memory 14, and an image processing unit 15. The digital camera also includes an image recording unit 16, a CPU (Central Processing Unit) 17, a ROM (Read Only Memory) 18, a bus 19, an operation unit 41, a display 42, a speaker 43, and a communication unit 44.

The image processing unit 15, the image recording unit 16, the CPU 17, the ROM 18, the operation unit 41, the display 42, the speaker 43, and the communication unit 44 are connected to one another via the bus 19.
The CPU 17 functions as a face detection unit 21, a facial expression detection unit 22, an age/gender inference unit 23, a face recognition unit 24, a voice data selection unit 25, an effect evaluation unit 26, and a ranking unit 27.
The CPU 17 also controls the photographing means, namely the photographing lens 11, the diaphragm 12, the image sensor 13, and so on.
The ROM 18 functions as a face data storage unit 31, a facial expression data storage unit 32, an age/gender data storage unit 33, a face registration data storage unit 34, and a voice data storage unit 35.

The photographing lens 11 is composed of a plurality of lenses including a zoom lens and a focus lens, and forms a subject image on the image sensor 13. For simplicity, FIG. 1 shows the photographing lens 11 as a single lens. The image sensor 13 generates an image signal by photoelectrically converting the subject light L1 from the photographing lens 11.

The image signal output from the image sensor 13 is sent to the image processing unit 15 via the buffer memory 14, where various predetermined image processing is applied.
In the stage before shooting starts, that is, the shooting preparation stage, the image signal from the image sensor 13 passes through the buffer memory 14 and the image processing unit 15, is sent over the bus 19 to the display 42, and is displayed as a through image.
In the shooting stage, the image signal from the image sensor 13 passes through the buffer memory 14 and the image processing unit 15 and is recorded over the bus 19 in a nonvolatile memory (storage medium) 16a of the image recording unit 16.

The face detection unit 21 detects the face of the subject person in the image by comparing the subject image data for the through image with face data stored in advance in the face data storage unit 31. The face data storage unit 31 stores, for example, feature point data relating to the shapes of the eyebrows, eyes, nose, and lips.

For face detection, the detection method disclosed in JP 2001-16573 A can be used, for example. This method extracts feature points from the input image to detect the subject's face area, face size, and so on. The feature points include the end points of the eyebrows, eyes, nose, and lips, and contour points of the face such as the top of the head and the bottom of the chin.

As another face detection method, the one disclosed in JP 2005-157679 A can be used, for example. In this method, the luminance difference between two pixels in an input image is first learned as a feature amount. An estimated value indicating whether a face exists in a predetermined region of the input image is then calculated from that feature amount, and a face is judged to exist in the region when the estimated value is 1 or more.

The facial expression detection unit 22 detects the type of facial expression of the subject person by comparing the facial expression in the face-area image detected by the face detection unit 21 with a plurality of pieces of facial expression data stored in advance in the facial expression data storage unit 32.
Facial expressions come in various types, such as smiling, crying, angry, and surprised faces, and the facial expression data storage unit 32 stores these expressions as data.

When the facial expression detection unit 22 detects the facial expression in the face-area image, the detection method disclosed in JP 2008-42319 A can be used, for example. To detect a smile, for instance, this method judges the type of expression according to which of two expressions, a smile or a normal (neutral) face, the detected face is closer to. The facial expression detection unit 22 outputs a detection signal corresponding to the type of facial expression of the subject person.
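
The "closer to which of two expressions" decision described above can be sketched as a nearest-reference comparison. This is only an illustration under stated assumptions, not the method of JP 2008-42319 A: the two-element feature vectors, the reference values, the function name `classify_expression`, and the use of Euclidean distance are all hypothetical.

```python
import math

# Hypothetical reference feature vectors (for example mouth-corner lift and
# eye narrowing). Real values would come from the facial expression data
# storage unit 32.
REFERENCE_EXPRESSIONS = {
    "smile": (0.9, 0.7),
    "neutral": (0.2, 0.1),
}


def classify_expression(features, references=REFERENCE_EXPRESSIONS):
    """Return the label of the reference expression closest to `features`."""
    return min(
        references,
        key=lambda label: math.dist(features, references[label]),
    )
```

Adding further entries to the reference table (crying, angry, surprised) would extend the same comparison to more expression types.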

The age/gender inference unit 23 infers the age group and gender of the subject person from the face-area image detected by the face detection unit 21. Specifically, it compares the feature amounts, color data, and so on of each feature point in the subject person's face image with those of the facial feature points stored in advance in the age/gender data storage unit 33. It then outputs inference data as the result, for example that the subject person is an infant boy, or a woman in her twenties.
The age/gender data storage unit 33 stores, for each gender and age group, feature amount data (dimensions, spacing) and color data for feature points such as the eyebrows, eyes, nose, and lips.

The age group and gender can be inferred using, for example, the method disclosed in JP 2004-222118 A, in which the above feature amounts are input to a support vector machine.

The face recognition unit 24 has two functions. The person registration function records feature information from a face image of a subject person photographed in advance in the face registration data storage unit 34 as face registration data. The person recognition function identifies whether the face image of a subject person about to be photographed corresponds to face registration data recorded in the face registration data storage unit 34.
In more detail, the face recognition unit 24 analyzes the features of the face image of a person photographed in advance and records them as face registration data in the face registration data storage unit 34. At this time, the person's name, age, and gender information is also recorded in the face registration data storage unit 34 in association with the face registration data. The subject person is registered in this way.
The name, age, and gender information is entered by the photographer through an information input unit (not shown).

The face recognition unit 24 also compares the face image of the subject person detected by the face detection unit 21 with the face registration data recorded in the face registration data storage unit 34 to determine whether the subject person has already been registered, that is, whether the subject is a registered person.
The face registration data stored in the face registration data storage unit 34 is facial information from the registered person's face image, for example information on feature points such as the eyebrows, eyes, nose, and lips.

The voice data selection unit 25 selects voice data recorded in the voice data storage unit 35 based on information about the subject person, and has the speaker 43 reproduce the selected voice data as sound. Specifically, the voice data selection unit 25 selects the voice data to be reproduced according to the type of facial expression detected by the facial expression detection unit 22, the age/gender information inferred by the age/gender inference unit 23, the registered-person information recognized by the face recognition unit 24, and so on.

The voice data selection unit 25 sends the selected voice data to the speaker 43 for reproduction shortly before the shooting operation, for example one second before it. Specifically, when the release switch of the operation unit 41 (described later) is operated, the voice data selection unit 25 sends the already selected voice data to the speaker 43 for reproduction, and the shooting operation starts a predetermined time after that reproduction.
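
The release sequence described above (play the selected voice data, wait a predetermined time, then shoot) can be sketched as follows. The function name `on_release` and the `play`, `capture`, and `sleep` callables are hypothetical stand-ins for the speaker 43, the photographing means, and a timer. The one-second default follows the example in the text.

```python
import time


def on_release(selected_voice, play, capture, delay_s=1.0, sleep=time.sleep):
    """Handle a release-switch press.

    Plays the already selected voice data, waits `delay_s` seconds,
    then starts the shooting operation and returns its result.
    """
    play(selected_voice)   # stands in for sending voice data to the speaker
    sleep(delay_s)         # predetermined time between playback and shooting
    return capture()       # stands in for the shooting operation
```

Passing `sleep` as a parameter is a design choice made here so the timing can be replaced in tests; a real camera would use its own timer.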

The voice data storage unit 35 holds various voice data to be played at the time of shooting. Examples include "OK, smile!", "Smile more!", "Smile!", "Give us a little smile", "Don't be so angry", "Don't cry, say cheese", "Big smile!", "Laugh out loud!", "Let's laugh", "A-chan, big smile!", "B-kun, laugh out loud!", "C-san, let's laugh", "D-san, E-san, say cheese", and "Everyone, let's all laugh together".
In the voice data "A-chan, big smile!", "B-kun, laugh out loud!", "C-san, let's laugh", and "D-san, E-san, say cheese", the names "A-chan", "B-kun", "C-san", "D-san", and "E-san" are the names of registered persons stored in the face registration data storage unit 34. Voice data containing these names is created automatically by a voice data creation unit (not shown) when the registered person's name, age, and gender information is stored in the face registration data storage unit 34, and is then stored in the voice data storage unit 35.
The voice data may also be a singing voice, such as an anime song, and need not be a human voice at all; it may be an animal sound such as a dog's bark or a cat's meow.

Voice data reaches the voice data storage unit 35 in three ways. A large number of pieces of voice data are stored there when the digital camera of this embodiment is manufactured. Voice data containing a registered person's name is added when that subject person is registered in the face registration data storage unit 34. Voice data input from outside through the communication unit 44 (described later) is also added.

For example, when the facial expression detection unit 22 detects a smile on the subject person and acquires smile information, the voice data selection unit 25 selects voice data such as "Smile more!" to draw out an even better smile. When the facial expression detection unit 22 detects that the subject person is not smiling, the voice data selection unit 25 selects voice data such as "OK, smile!" to prompt a smile.

When the age/gender inference unit 23 infers that the subject is, for example, an infant, the voice data selection unit 25 selects voice data such as the toddler-talk phrase "Big smile!" or a dog's bark, "Woof, woof". When the age/gender inference unit 23 infers that the subject is, for example, a young man or a young woman, the voice data selection unit 25 selects "Laugh out loud!" for the former, or "Let's laugh" or "Give us a little smile" for the latter.

When the face recognition unit 24 recognizes the subject person as a registered person in the face registration data storage unit 34, the voice data selection unit 25 selects voice data containing that person's name, such as "A-chan, big smile!", "B-kun, laugh out loud!", "C-san, let's laugh", or "D-san, E-san, say cheese".
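
The three selection cases above (facial expression, age/gender, registered person) can be sketched as table lookups. This is a minimal sketch under stated assumptions: the dictionary names, the keys, the fallback prompt, and the priority order (registered name first, then expression, then age/gender) are hypothetical. The text itself selects according to whichever mode the operator has activated, and the prompt strings are English renderings of the stored phrases.

```python
# Illustrative lookup tables; real voice data would be audio held in the
# voice data storage unit 35.
NAMED_PROMPTS = {
    "A-chan": "A-chan, big smile!",
    "B-kun": "B-kun, laugh out loud!",
}
EXPRESSION_PROMPTS = {
    "smile": "Smile more!",
    "non_smile": "OK, smile!",
}
AGE_GENDER_PROMPTS = {
    "infant": "Big smile!",
    "young_man": "Laugh out loud!",
    "young_woman": "Let's laugh",
}
DEFAULT_PROMPT = "Smile!"  # assumed fallback, not stated in the text


def select_voice_data(subject):
    """Pick a prompt from whatever subject person information is available.

    `subject` is a dict that may contain "registered_name", "expression",
    and "age_gender". The priority order used here is an assumption.
    """
    name = subject.get("registered_name")
    if name in NAMED_PROMPTS:
        return NAMED_PROMPTS[name]
    expression = subject.get("expression")
    if expression in EXPRESSION_PROMPTS:
        return EXPRESSION_PROMPTS[expression]
    age_gender = subject.get("age_gender")
    if age_gender in AGE_GENDER_PROMPTS:
        return AGE_GENDER_PROMPTS[age_gender]
    return DEFAULT_PROMPT
```

For instance, `select_voice_data({"expression": "non_smile"})` picks the prompt intended to draw out a smile.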

The effect evaluation unit 26 evaluates whether the facial expression of the subject person in the captured image data reflects the content of the voice data reproduced at the time of shooting. Specifically, once the facial expression detection unit 22 has detected the facial expression of the subject person in the captured image data, the effect evaluation unit 26 evaluates whether that expression reflects the content of the reproduced voice data. For example, if the voice data reproduced before shooting was "Big smile!", it judges whether the facial expression in the captured image is a laughing face; if so, it rates the voice reproduction as effective and gives that voice data a score of 1.

The ranking unit 27 ranks the plurality of pieces of voice data recorded in the voice data storage unit 35 based on the evaluation results of the effect evaluation unit 26. Specifically, the ranking unit 27 totals, for each piece of voice data, the scores given by the effect evaluation unit 26 and determines the ranking of the voice data from those totals.
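
The scoring and ranking steps can be sketched as follows. The function names `award_score` and `rank_voice_data`, the expression labels, and the tuple layout of each shot record are hypothetical. The one-point-per-effective-shot rule and the per-voice totals follow the description above.

```python
from collections import Counter


def award_score(expected_expression, captured_expression):
    """Return 1 if the captured expression matches the one the voice aimed for."""
    return 1 if captured_expression == expected_expression else 0


def rank_voice_data(shots):
    """Total the scores per voice and return (voice, total) pairs, best first.

    `shots` is an iterable of (voice, expected_expression, captured_expression).
    """
    totals = Counter()
    for voice, expected, captured in shots:
        totals[voice] += award_score(expected, captured)
    return totals.most_common()
```

A voice that never produced the intended expression still appears in the result, with a total of 0.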

The operation unit 41 includes a power switch for the digital camera, a release switch that starts the shooting operation, a playback switch for reproducing captured images, and selection/setting switches for selecting and setting shooting condition modes and the like. When the photographer operates a switch, an operation signal corresponding to that operation is output to the CPU 17. For example, the selection function of the operation unit 41 is used to select and activate the facial expression detection unit 22, the age/gender inference unit 23, the face recognition unit 24, or the like.

The display 42 displays the through image, displays reproduced images based on the image data stored in the memory 16a, and displays operation menus and the like.
The speaker 43 reproduces the voice data recorded in the voice data storage unit 35 as sound.
The communication unit 44 is connected to an external server 100 via a communication network and downloads voice data from the server 100 as needed; the downloaded voice data is stored in the voice data storage unit 35.

The operation of the digital camera configured as described above will now be described.
-Facial expression detection mode-
First, the case of selecting voice data suited to a facial expression is described.
When the operator selects the facial expression detection mode with the operation unit 41, the face detection unit 21 and the facial expression detection unit 22 are activated.
The face detection unit 21 compares the subject person image in the through image data with the face data in the face data storage unit 31 to detect the face portion of the subject person. The facial expression detection unit 22 compares the facial expression of the subject person detected by the face detection unit 21 with the facial expression data in the facial expression data storage unit 32, detects whether the expression is a "smile", a "crying face", an "angry face", or the like, and acquires the corresponding laughing-face, crying-face, or angry-face information.

Based on the facial expression information from the facial expression detection unit 22, the voice data selection unit 25 selects, from the plurality of pieces of voice data stored in the voice data storage unit 35, voice data suited to that facial expression information. For example, when the facial expression information represents a faint smile or a crying face, it selects the voice data "Smile more!" or "Don't cry, say cheese", respectively.

When the release switch of the operation unit 41 is subsequently operated, the voice data selection unit 25 sends the selected voice data to the speaker 43 in response, so that voice data such as "Smile more!" or "Don't cry, say cheese" is reproduced.
A predetermined time after this reproduction, the shooting operation starts; the imaging signal from the image sensor 13 is processed by the image processing unit 15 and then stored in the memory 16a by the image recording unit 16.

The facial expression detection unit 22 detects the facial expression in the captured subject image. The effect evaluation unit 26 then determines whether that expression has become a "laughing face" as a result of the reproduced voice data, that is, whether the "faint smile" or "crying face" before shooting changed to a "laughing face" at the time of shooting, and thereby evaluates the effect of the voice data.
If the effect evaluation unit 26 judges that the expression changed to a "laughing face", it gives the voice data a score of 1; if it judges that the expression did not become a "laughing face", it gives no score.

The ranking unit 27 totals, for each piece of voice data, the scores given by the effect evaluation unit 26 and determines the ranking of the voice data.
This ranking is shown on the display 42 when the selection switch of the operation unit 41 is operated. It is displayed, for example, as: 1st place "Smile more!" 15 points, 2nd place "Don't cry, say cheese" 14 points, 3rd place "Big smile!" 10 points.
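
The ranking display in the example above could be produced along these lines. The function name `format_ranking` and the output layout are assumptions. Only the three phrases (in English rendering) and their scores come from the text.

```python
def format_ranking(scores):
    """Return display lines for per-voice score totals, highest score first.

    `scores` maps each voice phrase to its accumulated score.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        f'{rank}. "{voice}" {points} points'
        for rank, (voice, points) in enumerate(ordered, start=1)
    ]


example_scores = {
    "Big smile!": 10,
    "Smile more!": 15,
    "Don't cry, say cheese": 14,
}
for line in format_ranking(example_scores):
    print(line)
```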

-Age/gender inference mode-
Next, the case of selecting voice data suited to the inferred age group and gender is described.
When the operator selects the age/gender inference mode with the operation unit 41, the face detection unit 21 and the age/gender inference unit 23 are activated.
When the face detection unit 21 detects the face image of the subject person in the through image data, the age/gender inference unit 23 infers the age group and gender of the subject person from the detected face-area image and acquires information on the subject person's age group and gender.
For example, the age/gender inference unit 23 infers that the subject person is an "infant" or a "woman in her 20s" and acquires "infant" information or "woman in her 20s" information.

Based on the age and gender information from the age/gender inference unit 23, the voice data selection unit 25 selects, from the plurality of pieces of voice data stored in the voice data storage unit 35, voice data suited to that information.
For example, when the age and gender information represents an "infant" or a "woman in her 20s", it selects the voice data "Big smile!" or "Give us a little smile", respectively. For an "infant", a dog's bark or a cat's meow may be selected instead.

When the release switch of the operation unit 41 is operated, the voice data selection unit 25 sends the selected voice data to the speaker 43 in response, so that "Big smile!", "Give us a little smile", or the like is reproduced.
A predetermined time after this reproduction, the shooting operation starts; the imaging signal from the image sensor 13 is processed by the image processing unit 15 and then stored in the memory 16a by the image recording unit 16.

The facial expression detection unit 22 detects the facial expression in the captured subject image. The effect evaluation unit 26 then determines whether, as a result of the reproduced voice data, the infant is now showing a "smile" or the "woman in her 20s" a "faint smile", and thereby evaluates the effect of the voice data.
If the effect evaluation unit 26 judges that the voice data was effective, it gives the voice data a score of 1; if it judges that there was no effect, it gives no score.
The ranking unit 27 totals, for each piece of voice data, the scores given by the effect evaluation unit 26 and determines the ranking of the voice data.

The facial expression detection mode and the age/gender inference mode described above can also be combined. In this combined mode, the voice data selection unit 25 selects voice data from the voice data storage unit 35 based on both the facial expression detected by the facial expression detection unit 22 and the age group and gender inferred by the age/gender inference unit 23.

-Person registration + facial expression detection mode-
When the operator selects the person registration + facial expression detection mode with the operation unit 41, the face detection unit 21, the facial expression detection unit 22, and the face recognition unit 24 are activated.
First, the creation of face registration data is described.
The subject person to be registered is photographed to create face image data for that person. The face recognition unit 24 analyzes the features of the face image photographed in advance in this way and records them as face registration data in the face registration data storage unit 34. At this time, the person's name, age, gender, and similar information is entered through the information input unit (not shown) and recorded in the face registration data storage unit 34 in association with the face registration data.

When the face registration data and the name, age, and gender information are recorded in the face registration data storage unit 34, audio data that includes the name information related to that face registration data is created and stored in the audio data storage unit 35 in association with that face registration data.
By repeating the above operation, the face registration data and name information of a plurality of persons are recorded in the face registration data storage unit 34, and the audio data related to that face registration data is stored in the audio data storage unit 35.
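The registration step can be pictured as two stores keyed by the same person identifier. The following is only a minimal sketch under stated assumptions: the names `FaceRegistration`, `register_person`, and the phrase templates are hypothetical, the face features are a placeholder tuple, and audio data is represented as text rather than recorded or synthesized sound.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class FaceRegistration:
    """Hypothetical record for the face registration data storage unit 34."""
    features: Tuple[float, ...]  # placeholder for the analyzed face features
    name: str
    age_group: str
    gender: str


def register_person(
    face_store: Dict[str, FaceRegistration],
    audio_store: Dict[str, List[str]],
    person_id: str,
    registration: FaceRegistration,
    templates: Sequence[str] = ("{name}, smile", "{name}, let's laugh"),
) -> None:
    """Record the registration and create name-bearing audio entries for it."""
    face_store[person_id] = registration
    # Assumption: audio data is modeled as text phrases containing the name.
    audio_store[person_id] = [t.format(name=registration.name) for t in templates]


face_store: Dict[str, FaceRegistration] = {}
audio_store: Dict[str, List[str]] = {}
register_person(
    face_store,
    audio_store,
    "p1",
    FaceRegistration((0.1, 0.2), "A-chan", "child", "female"),
)
```

Calling `register_person` once per person mirrors the repeated registration described above; both stores share `person_id`, which is how the audio data stays associated with its face registration data in this sketch.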

Thereafter, in the shooting preparation stage, the face detection unit 21 compares the image of the subject person in the through image data with the face data in the face data storage unit 31 to detect the face portion of the subject person. In this description, it is assumed that there is only one subject person.
The facial expression detection unit 22 compares the facial expression of the subject person detected by the face detection unit 21 with the facial expression data in the facial expression data storage unit 32, detects whether the facial expression is a “smiling face”, a “crying face”, an “angry face”, or the like, and acquires facial expression information such as laughing face information, crying face information, or angry face information.
At the same time, the face recognition unit 24 compares the face image of the subject person detected by the face detection unit 21 with the face registration data recorded in the face registration data storage unit 34 to determine which registered person the subject person corresponds to.

The audio data selection unit 25 selects, from the plurality of audio data stored in the audio data storage unit 35, audio data suited to the facial expression information, based on the name, age, and gender information of the registered person identified by the face recognition unit 24 and the facial expression information acquired by the facial expression detection unit 22. The audio data selected at this time includes the name information; for example, audio data such as “A-chan, smile”, “B-kun, big laugh”, or “C-san, let's laugh” is selected.
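As an illustration of this selection step, the sketch below filters stored clips by the detected expression and prefers one tied to the recognized person. It is a sketch under assumptions, not the patented implementation: `AudioClip`, `select_clip`, the expression labels, and the fallback to a generic clip when no name-bearing clip exists are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AudioClip:
    """Hypothetical entry in the audio data storage unit 35."""
    expression: str                  # detected expression this clip responds to
    text: str                        # phrase that would be played back
    person_id: Optional[str] = None  # None marks a clip without a name


def select_clip(
    clips: Iterable[AudioClip],
    person_id: Optional[str],
    expression: str,
) -> Optional[AudioClip]:
    """Pick a clip for the expression, preferring one naming the recognized person."""
    candidates = [c for c in clips if c.expression == expression]
    if person_id is not None:
        for clip in candidates:
            if clip.person_id == person_id:
                return clip
    # Assumed fallback: use a clip without a name if no personal clip matches.
    for clip in candidates:
        if clip.person_id is None:
            return clip
    return None


CLIPS = [
    AudioClip("crying", "Don't cry, say cheese"),
    AudioClip("crying", "A-chan, smile", person_id="a"),
    AudioClip("angry", "C-san, let's laugh", person_id="c"),
]
chosen = select_clip(CLIPS, "a", "crying")
```

Here `chosen` is the clip naming A-chan, because a personal clip for the detected expression exists; an unrecognized subject with the same expression would receive the generic clip instead.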

Next, when the release switch of the operation unit 41 is operated, the audio data selection unit 25 sends the selected audio data to the speaker 43 in response to this operation, so that “A-chan, smile” or the like is played back.
A predetermined time after the audio data has been played back, the photographing operation starts; the imaging signal from the image sensor 13 is processed by the image processing unit 15 and then stored in the memory 16a by the image recording unit 16.

The facial expression detection unit 22 detects the facial expression in the photographed subject image, and the effect evaluation unit 26 determines whether that facial expression has become a “laughing face” as a result of playing back the audio data.
If the facial expression detection unit 22 detects a “laughing face”, the effect evaluation unit 26 assigns a score of 1 to the audio data; if it determines that the expression is not a “laughing face”, it assigns no score.
The ranking unit 27 sums, for each piece of audio data, the scores assigned by the effect evaluation unit 26 and determines the ranking of the audio data.
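The scoring and ranking described here amounts to a per-clip counter that is incremented whenever the photographed expression is a laughing face. The sketch below illustrates that idea; the function names, the `"laughing"` label, and the alphabetical tie-break are assumptions that the text does not specify.

```python
from collections import Counter
from typing import List


def record_shot(scores: Counter, clip_id: str, expression_after_shot: str) -> None:
    """Add 1 to the clip's score if the photographed face is laughing, else add 0."""
    scores[clip_id] += 1 if expression_after_shot == "laughing" else 0


def rank_clips(scores: Counter) -> List[str]:
    """Order clip ids by total score, highest first (ties broken by id, an assumption)."""
    return sorted(scores, key=lambda clip_id: (-scores[clip_id], clip_id))


scores: Counter = Counter()
record_shot(scores, "a_smile", "laughing")
record_shot(scores, "b_laugh", "crying")
record_shot(scores, "a_smile", "laughing")
record_shot(scores, "c_cheese", "laughing")
ranking = rank_clips(scores)
```

Adding 0 for a non-laughing result keeps the clip in the counter, so clips that never produced a laughing face still appear at the bottom of the ranking.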

Next, the case where the face detection unit 21 detects face images of a plurality of subject persons in the through image data will be described.
When the face recognition unit 24 determines that only one of the plurality of subject persons detected by the face detection unit 21 is a registered person, the audio data selection unit 25 selects audio data that addresses several people, such as “everyone”, or audio data that does not include name information. This avoids a situation in which, when a plurality of subject persons are photographed, playing back audio data containing the name of the single registered person, for example “C-san, let's laugh”, causes only that registered person to smile while the other subject persons remain unsmiling.

Likewise, when the face recognition unit 24 determines that a predetermined number or more of the plurality of subject persons detected by the face detection unit 21, for example three or more, are registered persons, the audio data selection unit 25 selects audio data that addresses several people, such as “everyone”, or audio data that does not include name information.
When the face detection unit 21 detects two subject persons and both are registered persons, the audio data selection unit 25 selects audio data that includes the name information of both, for example “A-san, B-san, smile”.
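The multi-person rules above can be summarized as a small decision function. The following is a hedged sketch only: `build_prompt` and the phrase wording are hypothetical, the threshold of 3 is the example value given in the text, and the handling of cases the description does not spell out (for instance two registered persons among three faces, treated here by naming both) is an assumption.

```python
from typing import Sequence

GROUP_THRESHOLD = 3  # the "predetermined number"; 3 is the example in the text


def build_prompt(
    face_count: int,
    registered_names: Sequence[str],
    threshold: int = GROUP_THRESHOLD,
) -> str:
    """Return the phrase to play, given detected faces and recognized names."""
    n = len(registered_names)
    if n > face_count:
        raise ValueError("more registered persons than detected faces")
    if face_count == 1:
        # Single subject: use the name if the person is registered.
        return f"{registered_names[0]}, smile" if n == 1 else "Smile"
    if n <= 1 or n >= threshold:
        # One (or no) registered person in a group, or too many to name.
        return "Everyone, smile"
    # Assumed rule for the remaining cases: name every registered person.
    return ", ".join(registered_names) + ", smile"


two_registered = build_prompt(2, ["A-san", "B-san"])
one_in_group = build_prompt(3, ["C-san"])
```

With these inputs, `two_registered` names both people while `one_in_group` falls back to the group address, matching the two situations described above.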

The embodiment described above has the following effects.
(1) The facial expression of the subject person can be detected, audio data can be selected based on that facial expression, and the selected audio data can be played back immediately before the photographing operation.
(2) The age and gender of the subject person can be inferred, audio data can be selected based on the inferred age and gender, and the selected audio data can be played back immediately before the photographing operation.
(3) The facial expression of the subject person can be detected and the subject person can be recognized as a registered person; audio data can then be selected based on the facial expression and the recognized registered person information, and the selected audio data can be played back immediately before the photographing operation.
(4) When the subject person is recognized as a previously registered person, audio data including that person's name information can be selected and played back immediately before the photographing operation.
(5) When there are a plurality of subject persons and only one of them is a registered person, audio data that does not include the name of that registered person can be selected.
(6) When there are a plurality of subject persons, audio data that addresses several people, such as “everyone”, can be selected.

In the embodiment described above, the facial expression detection unit 22 simply detects the facial expression of the subject person; however, the degree of the facial expression, that is, the facial expression level, may also be detected, and audio data may be selected according to that level.
For example, a facial expression level detection unit capable of distinguishing facial expression levels such as “slight smile”, “moderate laughter”, and “big laughter” may be provided in the CPU 17, and audio data may be selected according to the facial expression level it detects.
In this case, when the facial expression level detection unit detects the “moderate laughter” level, audio data such as “laugh more” is selected.
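Level-dependent selection can be sketched as a lookup from a detected level to a phrase. Only the “moderate laughter” to “laugh more” pairing comes from the text; the enum `SmileLevel`, the function `prompt_for_level`, and the phrases for the other two levels are hypothetical placeholders.

```python
from enum import IntEnum


class SmileLevel(IntEnum):
    """Hypothetical encoding of the three example levels."""
    SLIGHT_SMILE = 1
    MODERATE_LAUGHTER = 2
    BIG_LAUGHTER = 3


PROMPTS = {
    SmileLevel.MODERATE_LAUGHTER: "Laugh more",       # example from the text
    SmileLevel.SLIGHT_SMILE: "A bigger smile, please",  # assumed placeholder
    SmileLevel.BIG_LAUGHTER: "Great, hold that",        # assumed placeholder
}


def prompt_for_level(level: SmileLevel) -> str:
    """Return the phrase associated with the detected facial expression level."""
    return PROMPTS[level]


phrase = prompt_for_level(SmileLevel.MODERATE_LAUGHTER)
```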

13: Image sensor
17: CPU
18: ROM
19: Bus
22: Facial expression detection unit
23: Age/gender inference unit
24: Face recognition unit
25: Audio data selection unit
26: Effect evaluation unit
27: Ranking unit
32: Facial expression data storage unit
33: Age/gender data storage unit
34: Face registration data storage unit
35: Audio data storage unit
41: Operation unit
42: Display
43: Speaker

Claims

A digital camera comprising:
imaging means for imaging a person and creating image data;
photographing means for starting a photographing operation;
person information acquisition means for detecting a subject person image in the image data captured by the imaging means in a shooting preparation stage and acquiring subject person information;
audio data storage means for storing a plurality of audio data;
audio data selection means for selecting predetermined audio data from the plurality of audio data stored in the audio data storage means based on the subject person information acquired by the person information acquisition means; and
audio reproduction means for reproducing the selected audio data prior to an imaging operation by the imaging means.

The digital camera according to claim 1, wherein
the person information acquisition means includes facial expression detection means for detecting that the face image of the subject person in the image data shows a predetermined facial expression and acquiring facial expression information, and
the audio data selection means selects the predetermined audio data based on the facial expression information acquired by the facial expression detection means.

The digital camera according to claim 1, wherein
the person information acquisition means includes inference means for inferring, as the subject person information, at least one of the age and gender of the person based on the subject person image in the image data, and
the audio data selection means selects the predetermined audio data based on at least one of the age and gender inferred by the inference means.

The digital camera according to any one of claims 1 to 3, further comprising:
face registration data storage means for storing face information of a predetermined person as face registration data; and
face recognition means for detecting that a face image of a subject person in the image data corresponds to predetermined face registration data stored in the face registration data storage means, and recognizing that the subject person is a registered person, wherein
the audio data storage means stores audio data including the name of the registered person, and
the audio data selection means selects, from the plurality of audio data stored in the audio data storage means, audio data including the name of the registered person according to the registered person recognized by the face recognition means.

The digital camera according to claim 4, wherein,
when there are a plurality of subject persons in the image data and a predetermined number or more of registered persons are included among the plurality of subject persons, the audio data selection means selects audio data that does not include the names of the registered persons.

The digital camera according to claim 4, wherein,
when there are a plurality of subject persons in the image data and the number of registered persons among the plurality of subject persons is less than the predetermined number, the audio data selection means selects audio data that includes the names of all of the registered persons.

The digital camera according to claim 4, wherein,
when there are a plurality of subject persons in the image data and only one of the plurality of subject persons is a registered person, the audio data selection means selects audio data that does not include the name of the registered person.

The digital camera according to claim 2, further comprising:
evaluation means for evaluating whether the effect of the audio data reproduced by the audio reproduction means is reflected in the face image of the subject person in the image data captured by the imaging means at the photographing stage;
ranking means for assigning rankings to the plurality of audio data stored in the audio data storage means based on the evaluation result of the evaluation means; and
display means for displaying the ranking result given by the ranking means.

The digital camera according to any one of claims 1 to 8, wherein
the plurality of audio data stored in the audio data storage means is downloaded from a server.