JP2010093603A

JP2010093603A - Camera, reproducing device, and reproducing method

Info

Publication number: JP2010093603A
Application number: JP2008262448A
Authority: JP
Inventors: Hiroshi Iizuka; Kosuke Matsubara; Osamu Nonaka
Original assignee: Olympus Imaging Corp
Current assignee: Olympus Imaging Corp
Priority date: 2008-10-09
Filing date: 2008-10-09
Publication date: 2010-04-22
Anticipated expiration: 2028-10-09
Also published as: JP5214394B2

Abstract

PROBLEM TO BE SOLVED: To provide a camera, a reproducing device, and a reproducing method that produce atmospheric sound by taking into account the difference between the range the photographer sees and the range the photographer hears. SOLUTION: The camera has an imaging unit 2 that images a subject and outputs image data, a sound pickup unit 7 that can change the pickup range for sound arriving from the subject direction, and a face detection unit 3 that determines whether a person is present in the image by detecting a person's face in the image data obtained by the imaging unit 2. The sound pickup ranges 33a and 33b are narrowed when the face detection unit 3 determines that a person has been present in the image for a prescribed time, and widened when it determines that no person has been present in the image for the prescribed time. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a camera and a playback device, and more particularly to a camera capable of recording ambient sound during shooting, and to a playback device and playback method for images captured with that camera.

In recent years, large-screen televisions have become widespread, and people enjoy displaying their captured images on them. Because television picture quality has improved and power consumption has fallen, captured images are also displayed like posters and enjoyed as part of the interior decor. Digital photo frames for displaying digital images have also become widespread. In this way, captured images have recently come to brighten everyday life.

When images are displayed as part of the interior, they should not be intrusive; soothing subjects such as majestic landscapes and the beauties of nature are preferred. This calls for shooting and display methods that differ from those used for conventional video.

In addition, when a captured image is displayed as part of the interior, playing back the sound recorded at the time of shooting makes the experience even more soothing. Various techniques for recording sound during shooting have been proposed. For example, Patent Document 1 discloses a recording apparatus for a video camera that, to enhance the sense of presence during zoom shooting, focuses the directivity of the microphone on the subject in synchronization with focusing of the zoom lens. Patent Document 2 discloses an apparatus and method that use information in an image knowledge database to analyze, from segmented images, the objects in an image, their movements (positions), camera operations, and the like; separate from the sound information the sound sources thought to be emitted by those objects; and rearrange the separated sources in a sound field suited to the video.
Patent Document 1: Japanese Patent Laid-Open No. H5-308553
Patent Document 2: Japanese Patent Laid-Open No. 2000-295700

Conventional cameras assume uses such as recording a child's voice at a sports day or a school play. The recording apparatus for a video camera disclosed in Patent Document 1 has a microphone with controllable directivity and changes that directivity in synchronization with focusing on the subject, so that the sound follows the subject. The apparatus disclosed in Patent Document 2 reproduces sound to enhance the sense of presence; it does not record or reproduce soothing ambient sound.

For soothing, nostalgic image and sound playback, the image and the sound source do not necessarily have to coincide. People often listen to ambient sound from a wide area even while looking at something close by. For example, when you listen to the surf while gazing at the sea, the sound need not follow the orientation of your face even if your viewpoint shifts and the view changes. Likewise, when you look at seashells on the beach, you do not want to hear sound from the direction of the shells; the main thing is still the sound of the surf. A user may turn their head to take in a wide view, but hearing is far less directional than sight, so unless something unusual happens there is no need to look toward each sound source.

The present invention has been made in view of these circumstances, and aims to provide a camera, a playback device, and a playback method that deliver atmospheric sound by taking into account the difference between the range the photographer sees and the range the photographer hears.

To achieve the above object, a camera according to a first aspect of the invention includes: an imaging unit that images a subject and outputs image data; a sound pickup changing unit that can change the pickup range of sound from the subject direction; a face detection unit that determines whether a person is present in the image by detecting a person's face in the image data obtained by the imaging unit; and a control unit that causes the sound pickup changing unit to narrow the pickup range when the face detection unit has determined that a person is present in the image for a predetermined time, and to widen the pickup range when it has determined that no person is present in the image for a predetermined time.
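The first aspect switches the pickup range only after the face detection result has held for a predetermined time, which keeps the sound from flipping back and forth. A minimal Python sketch of that rule follows; the class name, the use of a consecutive-frame count in place of the unspecified "predetermined time", and the default of 30 frames are assumptions for illustration, not values from the source.

```python
from dataclasses import dataclass, field

NARROW = "narrow"
WIDE = "wide"


@dataclass
class PickupRangeController:
    """Hysteresis rule of the first aspect (hypothetical sketch).

    hold_frames stands in for the unspecified "predetermined time",
    counted in consecutive face-detection results.
    """
    hold_frames: int = 30
    state: str = WIDE  # start with wide, ambient-sound pickup
    _last_result: bool = field(default=False, init=False)
    _run_length: int = field(default=0, init=False)

    def update(self, face_present: bool) -> str:
        # Track how many consecutive frames have returned the same result.
        if face_present == self._last_result:
            self._run_length += 1
        else:
            self._last_result = face_present
            self._run_length = 1
        # Change the pickup range only once the result has persisted.
        if self._run_length >= self.hold_frames:
            self.state = NARROW if face_present else WIDE
        return self.state
```

With hold_frames=3, the state becomes narrow only on the third consecutive detection, and a single missed detection afterwards does not widen it again.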

In a camera according to a second aspect, based on the first aspect, the face detection unit further detects the person's facial expression in the image, and the control unit changes the pickup range of the sound pickup changing unit according to changes in the facial expression detected by the face detection unit.

A camera according to a third aspect includes: an imaging unit that images a subject and outputs image data; a sound pickup changing unit that can change the pickup range of sound from the subject direction; a face detection unit that determines whether a person is present in the image by detecting a person's face in the image data obtained by the imaging unit; and a control unit that causes the sound pickup changing unit to narrow the pickup range when the face detection unit determines that a person is present in the image before shooting, and to widen the pickup range when it determines that no person is present in the image before shooting.

A camera according to a fourth aspect includes: an imaging unit that images a subject and outputs image data; a sound pickup changing unit that can change the pickup range of sound from the subject direction; a second imaging unit that images the photographer and outputs second image data; a face detection unit that detects a person's face in the second image data obtained by the second imaging unit; and a control unit that changes the pickup range of the sound pickup changing unit according to the image of the face detected by the face detection unit.

In a camera according to a fifth aspect, based on the fourth aspect, the face detection unit further detects the orientation of the person's face, and the control unit narrows the pickup range of the sound pickup changing unit while the face orientation remains unchanged.
In a camera according to a sixth aspect, based on the fourth aspect, when the face detection unit indicates that the camera photographer's viewpoint has moved from a first portion to a second portion, the control unit first widens the pickup range of the sound pickup changing unit and then narrows it.

A camera according to a seventh aspect includes: an imaging unit that images a subject and outputs image data; a sound pickup unit that picks up sound from the subject direction with a plurality of microphones; a playback unit that reproduces the sound picked up by the sound pickup unit; a second imaging unit that images the photographer and outputs second image data; a face detection unit that detects a person's face in the second image data obtained by the second imaging unit; and a control unit that, when the playback unit reproduces the sound, changes the position of the sound source according to the image of the face detected by the face detection unit.

In a camera according to an eighth aspect, based on the seventh aspect, the face detection unit further detects the orientation of the person's face, and the control unit changes the balance among the plurality of sounds reproduced by the playback unit while the face orientation remains unchanged.
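The eighth aspect does not say how the balance is changed. As a hypothetical sketch only, a conventional stereo balance control could be applied while the face orientation is steady; the function names, the [-1, 1] balance scale, and the assumption that some other part of the controller supplies the balance value are all illustrative, not taken from the source.

```python
from typing import List, Sequence, Tuple


def balance_gains(balance: float) -> Tuple[float, float]:
    """Map a balance value in [-1, 1] to (left_gain, right_gain).

    0 keeps both channels at unity; negative values attenuate the right
    channel (favouring the left), positive values attenuate the left.
    """
    b = max(-1.0, min(1.0, balance))  # clamp out-of-range input
    return min(1.0, 1.0 - b), min(1.0, 1.0 + b)


def apply_balance(left: Sequence[float], right: Sequence[float],
                  balance: float, face_steady: bool
                  ) -> Tuple[List[float], List[float]]:
    """Rebalance a stereo pair only while the face orientation is steady."""
    if not face_steady:
        # Orientation is changing: reproduce the recording unchanged.
        return list(left), list(right)
    gl, gr = balance_gains(balance)
    return [s * gl for s in left], [s * gr for s in right]
```

This "unity at centre" balance law keeps the recording at its original level when no rebalancing is requested; a constant-power pan law would be an equally plausible choice.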

A camera according to a ninth aspect includes: an imaging unit that images a subject and outputs image data; a sound pickup changing unit that can change the pickup range of sound from the subject direction; a face detection unit that detects a person's face in the image data obtained by the imaging unit; and a control unit that changes the pickup range of the sound pickup changing unit according to the direction in which the face detected by the face detection unit is looking.

A camera according to a tenth aspect includes: an imaging unit that outputs image data; a sound pickup changing unit that can change the pickup range of sound from the subject direction; a face detection unit that detects a person's face in the image data obtained by the imaging unit; and control means that, in the normal state, sets the sound pickup range wider than the shooting range and narrows the pickup range according to the detection result of the face detection unit.

In a camera according to an eleventh aspect, based on the tenth aspect, the imaging unit outputs the image data based on a subject image formed by a photographing optical system; the face detection unit can detect the position and/or the number of faces in the subject image; and the control means narrows the pickup range when a face is at the center and/or when the number of faces is equal to or greater than a predetermined value.

In a camera according to a twelfth aspect, based on the tenth aspect, the face detection unit can determine changes in the person's facial expression, and when the face detection unit detects such a change, the control unit quickly narrows the pickup range.
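The twelfth aspect narrows the pickup range quickly when an expression change is detected, while the embodiment otherwise avoids abrupt switching of the sound. One way to express both behaviours is a first-order smoothing step whose rate depends on the trigger. The sketch below assumes a normalized width (1.0 widest, 0.0 narrowest) and hypothetical rate values; the source gives no numbers.

```python
def step_width(current: float, target: float, expression_changed: bool,
               slow_rate: float = 0.05, fast_rate: float = 0.5) -> float:
    """Move the normalized pickup width one control step toward target.

    Each step covers a fixed fraction of the remaining gap: slow_rate
    normally, so the sound never switches abruptly, and fast_rate when an
    expression change calls for quick narrowing.
    """
    rate = fast_rate if expression_changed else slow_rate
    return current + (target - current) * rate


def steps_to_reach(target: float, expression_changed: bool,
                   start: float = 1.0, tolerance: float = 0.05) -> int:
    """Count control steps until the width is within tolerance of target."""
    width, steps = start, 0
    while abs(width - target) > tolerance:
        width = step_width(width, target, expression_changed)
        steps += 1
    return steps
```

With these assumed rates, an expression change brings the width to within 0.05 of fully narrow in 5 steps, versus roughly 60 steps otherwise.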

In a camera according to a thirteenth aspect, based on the tenth aspect, the imaging unit outputs image data of the photographer's face, the face detection unit detects the photographer's gaze direction based on that image data, and the control unit controls the pickup range based on the gaze direction.
In a camera according to a fourteenth aspect, based on the thirteenth aspect, when the face detection unit detects the camera's display unit as the gaze direction, the control unit narrows the pickup range.

In a camera according to a fifteenth aspect, based on the tenth aspect, the imaging unit outputs the image data based on a subject image formed by a photographing optical system, the face detection unit further detects the photographer's gaze direction based on the image data, and the control unit controls the pickup range based on the gaze direction.
A camera according to a sixteenth aspect, based on the tenth aspect, further includes a playback display unit that displays the subject image based on the image data and reproduces the audio data recorded together with the image data.

A playback device according to a seventeenth aspect reproduces subject image data together with audio data recorded in stereo at the same time, and includes: a determination unit that determines the photographer's gaze direction based on data recorded in the audio data; and a playback unit that reproduces the sound based on the determined gaze direction.

In a playback device according to an eighteenth aspect, based on the seventeenth aspect, when the determination unit cannot detect a gaze direction, sound is reproduced over a wide range; when the determination unit does detect a gaze direction, sound from the detected gaze direction is emphasized during playback.
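The seventeenth and eighteenth aspects only require that a gaze direction be recovered from data recorded with the audio, and that playback fall back to wide-range sound when none is found. The sketch below assumes, hypothetically, that the recording carries per-frame gaze angles (None where no gaze was detected) and that a gaze counts as detected when at least half the frames have one; the data format, the threshold, and the function names are illustrative only.

```python
from statistics import median
from typing import Optional, Sequence, Tuple


def determine_gaze(samples: Sequence[Optional[float]],
                   min_fraction: float = 0.5) -> Optional[float]:
    """Estimate a single gaze direction from per-frame gaze metadata.

    samples holds hypothetical gaze angles in degrees (0 = straight
    ahead), with None for frames where no gaze was detected. Returns None
    when too few frames carry a gaze estimate.
    """
    found = [s for s in samples if s is not None]
    if not samples or len(found) / len(samples) < min_fraction:
        return None
    return median(found)


def choose_playback(samples: Sequence[Optional[float]],
                    min_fraction: float = 0.5) -> Tuple[str, Optional[float]]:
    """Return ("wide", None) or ("emphasis", gaze_angle)."""
    gaze = determine_gaze(samples, min_fraction)
    if gaze is None:
        return "wide", None  # no gaze found: wide-range playback
    return "emphasis", gaze  # emphasize sound from the gaze direction
```

The median is used so that a few stray per-frame estimates do not pull the emphasized direction away from where the photographer mostly looked; the check uses `is None` so that a straight-ahead gaze of 0.0 is not mistaken for "no gaze".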

A playback method according to a nineteenth aspect reproduces subject image data together with audio data recorded in stereo at the same time; it determines the photographer's gaze direction based on data recorded in the audio data and reproduces the sound based on the determined gaze direction.

A program according to a twentieth aspect reproduces subject image data together with audio data recorded in stereo at the same time, and causes a computer to determine the photographer's gaze direction based on data recorded in the audio data and to reproduce the sound based on the determined gaze direction.
A camera according to a twenty-first aspect includes: an imaging unit that images a subject and outputs image data; a sound pickup changing unit that can change the pickup range of sound from the subject direction; a gaze direction determination unit that determines the photographer's gaze direction; and a control unit that controls the pickup range of the sound pickup changing unit according to the photographer's gaze direction determined by the gaze direction determination unit and to the captured image based on the image data.

According to the present invention, it is possible to provide a camera, a playback device, and a playback method that deliver atmospheric sound by taking into account the difference between the range the photographer sees and the range the photographer hears.

Preferred embodiments will now be described with reference to the drawings, using a digital camera to which the present invention is applied. The digital camera of this embodiment records, in addition to images, sound with a rich sense of atmosphere. As described above, camera users do not always listen to sound from the range they are looking at. When recreating memories, sound pickup that invites reminiscence and naturally recreates the sound as it was remembered is more desirable than exact reproduction of each sound's direction. In this embodiment, sound is therefore picked up in the way best suited to each shooting scene, so that the ambient sound the photographer heard and remembers is recorded and reproduced. The sound also does not switch abruptly, so viewers can calmly revisit their memories, and soothing image and sound playback becomes possible. Furthermore, to convey the atmosphere of the shooting environment, the difference in directivity between the photographer's eyes and ears is taken into account so that the scene can be recalled both visually and aurally.

FIG. 1 is a block diagram showing the configuration of a camera 10 and an external device 20 according to a first embodiment of the present invention. The camera 10 is a digital camera and includes a signal processing and control unit 1, an imaging unit 2, a face detection unit 3, a recording unit 4, an operation determination unit 6, an audio recording unit 7, a display unit 8, a clock unit 9, and a communication unit 12.

The signal processing and control unit 1 of the camera 10 consists of a signal processing LSI dedicated to the camera 10 and related circuitry; it controls the entire camera 10 and performs image processing on the image data output from the imaging unit 2. The imaging unit 2 consists of a photographing lens 2a (see FIG. 2(a)), an image sensor that converts the subject image formed by the photographing lens 2a into image data, and related components.

The recording unit 4 records the image data output from the imaging unit 2 after the signal processing and control unit 1 has applied image processing and compression. The face detection unit 3 uses the image data output from the imaging unit 2 to determine whether the image contains a human face. When a face is found, the face detection unit 3 can also judge the person's facial expression by detecting changes in the shading pattern of the face.

The audio recording unit 7 has a stereo microphone 7a and records sound from the area in front of the camera. It can also process the audio signal from the stereo microphone to change the sound recording range. The audio data output from the audio recording unit 7 is processed by the signal processing and control unit 1 and then recorded in the recording unit 4 together with the image data.
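The text says only that the stereo microphone signal is processed to change the recording range; it does not name a method. A simple stand-in, assumed here purely for illustration, is mid/side width control: scaling the side (L-R) component toward zero collapses the stereo image toward the mid (L+R) signal, while leaving it at full scale keeps the wide image. A real camera might instead use microphone beamforming. The function name and the width scale are hypothetical.

```python
from typing import List, Sequence, Tuple


def set_stereo_width(left: Sequence[float], right: Sequence[float],
                     width: float) -> Tuple[List[float], List[float]]:
    """Scale the side (L-R) component of a stereo recording.

    width = 1.0 returns the signal as captured (wide pickup); width = 0.0
    collapses it to the mid (L+R)/2 signal in both channels (narrow);
    values in between narrow the stereo image proportionally.
    """
    out_l: List[float] = []
    out_r: List[float] = []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)
        side = 0.5 * (l - r) * width
        out_l.append(mid + side)
        out_r.append(mid - side)
    return out_l, out_r
```

Because mid + side and mid - side reconstruct L and R exactly at width 1.0, the "wide" setting is lossless, which matches the embodiment's preference for leaving ambient sound untouched unless there is a reason to narrow.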

The operation determination unit 6 includes operation members such as a release button and switches linked to them. The operation state it determines is sent to the signal processing and control unit 1, which executes processing according to that state. The clock unit 9 has calendar and timekeeping functions and outputs information such as the shooting date and time, which is recorded in the recording unit 4 together with the image data.

The display unit 8 shows a live view of the subject image for framing, based on the image data output from the imaging unit 2, and also plays back image data recorded in the recording unit 4. The communication unit 12 transmits data to and receives data from an external device 20 such as a television. Communication may use wireless LAN, short-range wireless communication, infrared communication, wired communication over a USB cable, or the like, and image data and audio data captured by the camera 10 can be transmitted. In recent years, HDMI has also come into use for sending images and sound to high-definition displays, so the communication unit 12 may instead have an HDMI terminal for wired communication.

The external device 20, such as a television or a photo stand, includes a signal processing and control unit 21, a communication unit 22, a display/playback unit 23, a display priority unit 24, and a remote control receiving unit 25. Like the signal processing and control unit 1 of the camera 10, the signal processing and control unit 21 consists of a signal processing LSI dedicated to the external device 20 and related circuitry; it controls the entire external device 20 and controls playback and display of the image data and audio data received via the communication unit 22.

The communication unit 22 communicates with the camera 10 and receives image data and audio data from it. Like the communication unit 12 of the camera 10, it supports wireless LAN, short-range wireless communication, infrared communication, and wired communication over a USB cable, an HDMI cable, or the like. The display priority unit 24 determines the priority of each image, that is, whether the image is a priority image to be displayed first on the display unit 8 built into the camera 10.

The display/playback unit 23 has a thin large-screen monitor and speakers and plays back the image data and audio data received from the camera 10. During playback, the signal processing and control unit 21 controls playback according to the display priority unit 24's determination of whether each image is a priority image. When the external device 20 is a television, it also displays ordinary television broadcasts.

The remote control receiving unit 25 receives instruction signals from a remote control device by infrared communication. With the remote control, the user can, for example, fetch a specified image or sound from the camera 10, play it back, or interrupt playback.

Next, how the camera 10 is used will be described with reference to FIG. 2. As shown in FIG. 2(a), the user 15 holds the camera 10 and captures the subject image through the photographing lens 2a, while the stereo microphone 7a can also record sound from the front.

As shown in FIG. 2(b), the images and sound captured in this way are sent to the external device 20 via the communication unit 12 of the camera 10 and the communication unit 22 of the external device 20. The external device 20 plays back the received images and sound on the display/playback unit 23. These images are not viewed the way one browses a conventional album; as shown in FIG. 2(c), they are displayed as if they were part of the interior.

Next, shooting and sound recording with the camera 10 of this embodiment will be described with reference to FIG. 3. FIG. 3(a) shows the camera 10 shooting and recording sound. The user 15 starts shooting with the camera at position 10b and moves it toward position 10a. An image taken at position 10a looks like FIG. 3(b), and an image taken at position 10b looks like FIG. 3(c).

When sound is recorded while shooting a wide seaside scene like those in FIGS. 3(b) and 3(c), by continuous shooting or as a movie, the sound in the pickup ranges 33a and 33b in front of the camera 10 is recorded with emphasis. Within this range, however, a person can take in the scene by moving only the eyes 15a and 15b, without turning the face. In other words, although the camera 10 picks up sound from the front as the framing changes, the photographer's ears 15c may be listening to sound across the wider audible range 35.

If sound recorded under these conditions changes restlessly as the camera 10 moves, it is unsuitable for enjoying images and sound as part of the interior as shown in FIG. 2(c). In this embodiment, therefore, the directivity of sound pickup is kept as wide as possible unless there is a specific target to pick up, so that the camera emphasizes ambient sound. Only in cases such as the position of camera 10a in FIG. 3(a), when the person in FIG. 3(b) is saying something, is the pickup range narrowed so that the voice is recorded.

Next, the operation of this embodiment will be described with reference to the flowchart in FIG. 4. The processing in this flowchart is executed by the signal processing and control unit 1 of the camera 10.

When the camera control flow of FIG. 4 starts, it is first determined whether the camera is in shooting mode (S101); the camera 10 has a shooting mode and a playback mode. If it is in shooting mode, an image is captured and face detection is performed (S102). In this step, the image data output from the imaging unit 2 for live view display is acquired, and the face detection unit 3 performs face detection on it. Next, the image is displayed (S103): the subject image is shown on the display unit 8 based on the image data acquired in step S102, so the photographer can frame the shot while viewing it.

After the image is displayed, it is determined whether or not a face has been detected (S104). In other words, this step checks whether the face detection performed in step S102 found a face in the image. If a face has been detected, the position and expression of the face are determined (S105). The detected face position is used for focusing and exposure control. The initial sound collection range set in steps S111 to S113 may also be chosen according to this face position.

After the face position and expression have been determined, or if step S104 determines that no face is present, it is next determined whether or not to start recording (S106). Here, the operation state of the release button is detected to determine whether to start movie shooting, panoramic shooting, or the like. If recording is not to be started, processing returns to step S101 and the operations described above are repeated.

If step S106 determines that recording is to start, it is determined whether the center of the screen contains a face of at least a predetermined size, or at least a predetermined number of faces (S111). In the present embodiment, a face of the predetermined size or larger is one whose size is at least 1/5 of the screen width, but other sizes may be used. The sound collection range is switched according to the size and number of faces. The threshold may therefore be any value that makes it possible to decide whether the face belongs to the main subject.

If step S111 finds that the center of the screen contains a face of at least the predetermined size, or at least the predetermined number of faces, the sound collection range is narrowed (S112). In this case a person is the subject of the image, so the range is narrowed to record anything the person says. If the result of step S111 is No, the sound collection range is widened (S113), and sound is collected with emphasis on the surrounding ambient sound.
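The S111 decision can be sketched as follows. This is a minimal illustration, not the camera's firmware. The `Face` record, the normalized coordinates, the size of the "center" region and the face-count threshold are all assumptions. Only the 1/5-of-screen-width size threshold comes from the text.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Face:
    cx: float     # face centre x, as a fraction of screen width (0..1)
    cy: float     # face centre y, as a fraction of screen height (0..1)
    width: float  # face width, as a fraction of screen width

MIN_FACE_WIDTH = 1 / 5   # from the text: at least 1/5 of the screen width
MIN_FACE_COUNT = 3       # hypothetical "predetermined number" of faces
CENTER_HALF = 0.25       # hypothetical half-size of the central region

def in_center(face: Face) -> bool:
    """True if the face centre lies inside the assumed central region."""
    return abs(face.cx - 0.5) <= CENTER_HALF and abs(face.cy - 0.5) <= CENTER_HALF

def initial_range(faces: List[Face]) -> str:
    """S111-S113: choose the initial sound collection range at recording start."""
    central = [f for f in faces if in_center(f)]
    has_large_face = any(f.width >= MIN_FACE_WIDTH for f in central)
    if has_large_face or len(central) >= MIN_FACE_COUNT:
        return "narrow"  # S112: a person is the subject
    return "wide"        # S113: emphasize ambient sound

print(initial_range([Face(0.5, 0.5, 0.25)]))  # narrow
print(initial_range([Face(0.1, 0.5, 0.30)]))  # wide (face is off-center)
```

In a real device the face list would come from the face detection unit 3 for the current live-view frame.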

After the initial sound collection setting in steps S111 to S113, shooting and sound recording are performed (S114). In this subroutine, image and sound are recorded continuously, face detection and the like are performed as needed, and the sound collection range is changed according to the face detection result. Shooting and sound collection continue until this subroutine makes an end determination. The shooting/sound recording subroutine will be described later using the flow shown in FIG. 5.

If step S101 determines that shooting mode has not been set, it is determined whether or not playback mode has been set (S121). If playback mode has not been set, processing returns to step S101. If playback mode has been set, playback is performed (S122). In this step, captured images recorded in the recording unit 4 are read out and displayed on the display unit 8 as thumbnails. When an image is selected, it is displayed enlarged. If audio data was recorded together with the image, it is played back along with the image. If the camera 10 has no speaker, only the image is played back and no audio is played.

After playback, it is next determined whether or not to perform transmission (S141). Here, the camera checks whether the operation member for instructing transmission has been operated to send an image to an external device 20 such as a television. If transmission is instructed, the displayed image is transmitted (S142). In this step, the image being displayed in step S122 is sent to the external device 20. When a plurality of images have been selected, they may be transmitted together. The camera control flow ends in any of three cases: after the displayed image has been transmitted, when step S141 determines that no transmission is requested, or when the shooting/sound recording of step S114 ends. If the power remains on, processing returns to step S101 and the operations described above are repeated.

Next, the shooting/sound recording subroutine of step S114 will be described with reference to the flowchart shown in FIG. 5.

When this flow is entered, it is first determined whether or not a face is present in the center of the screen (S1). If so, it is determined whether a predetermined time has elapsed since the face in the center of the screen was recognized (S2). Together, steps S1 and S2 determine whether a person's face has been in the center of the screen for the predetermined time. If both conditions are satisfied, the person is considered to be the main subject, and processing from step S3 onward narrows the sound collection range.

If step S2 determines that the predetermined time has elapsed, it is determined whether or not the current sound collection range is wide (S3). If it is wide, it is next determined whether or not the sound collection range has reached the limit on the narrow side (S4). If the limit has not been reached, it is determined whether or not the facial expression has changed (S5). The face detection unit 3 also detects changes in facial expression, so this step determines whether the expression of the person in the center of the screen has changed.

If step S5 detects a change in facial expression, the sound collection range is narrowed quickly (S7). If there is no change in facial expression, the sound collection range is narrowed gradually (S6). A change in expression such as opening the mouth suggests that the main subject may be about to speak, so the range is narrowed quickly. When there is no change in expression, the person may or may not speak, so the range is narrowed gradually.

If step S1 determines that there is no human face in the center of the screen, it is determined whether or not a predetermined time has elapsed (S11). This handles a scene in which no face is present in the center of the screen but the photographer has shifted attention to another person away from the center. The sound collection range is therefore changed from the audible range to the screen range only after the predetermined time of step S11 has elapsed.

If step S11 determines that the predetermined time has elapsed, it is next determined whether or not the current sound collection range is narrow (S12). If it is narrow, it is determined whether or not the range has reached the limit on the wide side (S13). If the limit has not been reached, the sound collection range is gradually widened (S14). The range is widened gradually to suppress abrupt changes in sound and to give audio playback suitable for displaying the image as interior decor.

Processing next reaches step S10 in any of three cases: when the determination in step S2, S3, S11, or S12 is No, when the determination in step S4 or S13 is Yes, or after the processing in step S6, S7, or S14. Step S10 determines whether or not to end shooting/sound recording. As described above, shooting was started in step S106 by operation of the release button. This step therefore determines whether the release button operation has ended.

If step S10 determines not to end, processing returns to step S1 and the operations described above are repeated. If it determines to end, processing returns to the original flow.
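One pass of the FIG. 5 loop (S1 to S14) can be sketched as a small controller. This is a hypothetical illustration built on several assumptions:

- The sound collection range is modeled as a number from 0.0 (narrow limit, roughly the screen range) to 1.0 (wide limit, the audible range).
- The S3/S4 and S12/S13 checks are folded into "not yet at the limit".
- The hold time and step sizes are invented values.
- The S11 timer is taken to start when the central face disappears.

```python
from dataclasses import dataclass
from typing import Optional

NARROW_LIMIT = 0.0  # fully narrowed (about the screen range); assumed scale
WIDE_LIMIT = 1.0    # fully widened (the audible range)

@dataclass
class PickupController:
    hold_time: float = 2.0   # the "predetermined time" in seconds (assumed)
    slow_step: float = 0.05  # gradual change per update, S6 and S14 (assumed)
    fast_step: float = 0.5   # quick narrowing per update, S7 (assumed)
    pickup_range: float = WIDE_LIMIT
    face_since: Optional[float] = None     # when a central face was first seen
    no_face_since: Optional[float] = None  # when the central face was lost

    def update(self, now: float, face_at_center: bool, expression_changed: bool) -> float:
        """Run S1-S14 once and return the new sound collection range."""
        if face_at_center:                                   # S1: Yes
            self.no_face_since = None
            if self.face_since is None:
                self.face_since = now
            if now - self.face_since >= self.hold_time:      # S2
                if self.pickup_range > NARROW_LIMIT:         # S3/S4
                    step = self.fast_step if expression_changed else self.slow_step  # S5
                    self.pickup_range = max(NARROW_LIMIT, self.pickup_range - step)  # S7/S6
        else:                                                # S1: No
            self.face_since = None
            if self.no_face_since is None:
                self.no_face_since = now
            if now - self.no_face_since >= self.hold_time:   # S11
                if self.pickup_range < WIDE_LIMIT:           # S12/S13
                    self.pickup_range = min(WIDE_LIMIT, self.pickup_range + self.slow_step)  # S14
        return self.pickup_range
```

A caller would invoke `update` once per live-view frame until the S10 end condition (release of the release button) is met. It would then map `pickup_range` onto the multiplier gain used by the audio recording unit 7.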

As described above, the camera 10 according to the present embodiment emphasizes the collection of ambient sound over an audible range wider than the shooting screen range. By suppressing large changes in sound, it can capture image and sound content that bears repeated viewing. When a person's face is present on the screen, the sound pickup directivity is narrowed from ambient-sound pickup so that what the person says can be recorded. The facial expression is also evaluated. When it changes, the directivity is narrowed more quickly, so that speech can be recorded even if the person suddenly starts talking.

In the present embodiment, the person's face is given priority because, statistically, a person who appears as a subject is often the main subject. However, the invention is not limited to this. It is of course also possible to detect a pet's face, a bird singing, or the like, and to restrict the sound pickup directivity in such cases.

Ambient-sound-emphasis pickup may also be switchable on and off. Consider, for example, a scene in which a train passes by and leaves a lingering sound. Turning the ambient-sound emphasis off narrows the microphone directivity, so the train can be recorded as it passes with an emphasized stereo effect.

Next, the configuration and operation of the audio recording unit 7 for changing the sound collection range will be described. As shown in FIG. 6, the audio recording unit 7 includes a stereo microphone 7a, an AD converter 42, and an adder/multiplier 43.

The stereo microphone 7a consists of a right microphone 41a and a left microphone 41b and is disposed on the front of the camera body 10. The stereo microphone 7a is connected to the AD converter 42, which digitizes the audio signals. That is, the right microphone 41a is connected to an AD converter 42a and the left microphone 41b to an AD converter 42b. Each converter outputs digital audio data.

The output of the AD converter 42 is connected to the adder/multiplier 43, which calculates the difference between the left and right sounds. That is, the AD converter 42a, which outputs the audio data of the right microphone 41a, is connected to the minus input of an adder 43a and the plus input of an adder 43d. The AD converter 42b, which outputs the audio data of the left microphone 41b, is connected to the plus input of the adder 43a and the minus input of the adder 43d.

The output of the adder 43a is connected to the input of a multiplier 43b, and the output of the adder 43d to the input of a multiplier 43e. The control inputs of the multipliers 43b and 43e are connected to the signal processing and control unit 1, which supplies their gains. The inputs of an adder 43c are connected to the output of the AD converter 42a and the output of the multiplier 43b. The inputs of an adder 43f are connected to the output of the AD converter 42b and the output of the multiplier 43e.

The outputs of the adder/multiplier 43 serve as the outputs of the audio recording unit 7 and are connected to the recording unit 4. That is, the outputs of the adders 43c and 43f deliver the right and left audio data, respectively, and these data are recorded in the recording unit 4.
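The wiring just described can be modeled sample by sample. The sketch below is a hypothetical software model of the FIG. 6 network; the function name `mix_fig6` and the list-based sample format are assumptions. `gain` stands for the value that the signal processing and control unit 1 writes to the multipliers 43b and 43e.

```python
from typing import List, Tuple

def mix_fig6(left: List[float], right: List[float], gain: float) -> Tuple[List[float], List[float]]:
    """Model of adders 43a/43d, multipliers 43b/43e and adders 43c/43f."""
    out_left, out_right = [], []
    for l, r in zip(left, right):
        diff_a = l - r                       # adder 43a: (+) left, (-) right
        diff_d = r - l                       # adder 43d: (+) right, (-) left
        out_right.append(r + gain * diff_a)  # multiplier 43b -> adder 43c
        out_left.append(l + gain * diff_d)   # multiplier 43e -> adder 43f
    return out_left, out_right

print(mix_fig6([1.0], [0.0], 0.0))  # ([1.0], [0.0])  unchanged stereo
print(mix_fig6([1.0], [0.0], 0.5))  # ([0.5], [0.5])  no left/right spread
```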

With this configuration, the audio recording unit 7 can control the left/right balance of the stereo audio data and so narrow or widen the directivity of the sound. The AD converters 42a and 42b convert the audio signals picked up by the microphones 41a and 41b into digital audio data. The adder 43a then computes (left audio data) − (right audio data), and the adder 43d computes (right audio data) − (left audio data). In other words, the adders 43a and 43d calculate the difference between the left and right audio data. This difference is the part in which the left and right sounds differ, and reducing it emphasizes the sound from the center. These difference calculations are the preprocessing for that emphasis.

The multipliers 43b and 43e multiply the differences from the adders 43a and 43d by the gain supplied from the signal processing and control unit 1. The adders 43c and 43f then add these products to the right and left audio data, respectively. The channel being corrected enters each difference with a minus sign (for example, L − R is added to the right channel), so this addition effectively acts as a subtraction. As a result, the left and right audio data output from the adders 43c and 43f have a reduced left/right spread. Increasing the gain of the multipliers 43b and 43e removes the sense of spread, and decreasing it increases the sense of spread. The signal processing and control unit 1 changes the sense of spread by controlling the gain of the multipliers 43b and 43e at steps S6, S7, and S14.
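Let $L$ and $R$ be the left and right samples from the AD converters 42b and 42a, and let $g$ be the multiplier gain. The outputs of the adders 43f and 43c can then be worked out as below. The range $0 \le g \le 1/2$ is an assumption; the text does not state the gain range.

$$R' = R + g\,(L - R) = (1-g)\,R + g\,L$$
$$L' = L + g\,(R - L) = (1-g)\,L + g\,R$$
$$L' + R' = L + R, \qquad L' - R' = (1-2g)\,(L - R)$$

The sum (the center component) is unchanged, while the difference (the left/right spread) is scaled by $1 - 2g$. With $g = 0$ the stereo image stays as recorded, and $g = 1/2$ gives $L' = R' = (L + R)/2$, i.e. no spread. This matches the statement that raising the gain removes the sense of spread. Above $g = 1/2$ the difference changes sign and the channels begin to swap.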

Thus, in the present embodiment, a pair of microphones with identical performance can be used to widen or narrow the sound collection range. When the directivity is wide, a rich variety of atmospheric ambient sound can be captured. When the directivity is narrow, sound focused on a specific subject can be recorded.

Instead of a pair of microphones with identical performance, one microphone with wide directivity and one with narrow directivity may be provided, for example. Playback need not be in stereo either. It may simply switch between emphasizing and not emphasizing the sound from the center of the screen. Recording is also not limited to left/right two-channel stereo; 5.1-channel or other formats may be recorded. Microphones for the vertical direction may also be provided in addition to those for left and right.

As described above, the first embodiment of the present invention provides a camera that suppresses unnecessary changes in sound caused by changes in the field of view. The camera can therefore capture image and sound content that plays back calmly. To convey the atmosphere of the environment at the time of shooting, the camera also takes into account the difference between the directivity of the photographer's eyes and that of the ears. The scene can thus be recalled later both visually and aurally.

Next, a second embodiment of the present invention will be described with reference to FIGS. 7 to 9. In the first embodiment, faces in the subject are detected from the image of the camera 10. The sound is then either collected over a wide range with emphasis on ambient sound, or the directivity is narrowed to record the voice of the main subject. In the second embodiment, a sub camera (rear camera) that images the photographer is also provided on the back of the camera, and the directivity is switched according to the photographer's gaze direction. The configuration of this embodiment largely overlaps with that of the first embodiment, so the description focuses on the differences. Identical components are given the same reference numerals and their description is omitted.

FIG. 7 is a block diagram showing the configuration of the camera 10 according to the second embodiment. It differs from the configuration of the first embodiment shown in FIG. 1 in that it has an imaging unit 2a (rear camera). The imaging unit 2a includes an optical system and an image sensor and, as shown in FIG. 8, is disposed on the back of the camera 10. The imaging unit 2a captures an image of the face of the photographer 16 and outputs it to the signal processing and control unit 1 and the face detection unit 3. The face detection unit 3 receives the image data from the imaging unit 2a and detects the gaze direction of the photographer 16.

In the second embodiment, as shown in FIG. 8, a bird may be heard singing in a direction 38 that differs from the direction 37 in which the camera 10 is pointed. In that case, pickup from the center of the screen can be weakened and ambient sound emphasized, so that the bird's song is recorded effectively. That is, the imaging unit (rear camera) 2a and the face detection unit 3 detect the face of the photographer 16, and the audio recording unit 7 controls the sound collection range accordingly. Ambient sound is emphasized when the photographer 16 is looking in another direction 38, and the pickup directivity is narrowed when the photographer 16 is looking in the direction 39 of the display unit 8.

Next, the operation of this embodiment will be described using the flowchart shown in FIG. 9. The camera control flow of FIG. 4 from the first embodiment is shared; only the shooting/sound recording subroutine of step S114 is replaced by the flow of FIG. 9. In the flow of FIG. 9, the imaging unit (rear camera) 2a is used to determine where the interest of the photographer 16 lies, and the sound collection range and direction are switched accordingly.

When the shooting/sound recording flow shown in FIG. 9 is entered, it is first determined whether or not the photographer is gazing forward (S1a). In this step, the face detection unit 3 detects the gaze direction of the photographer 16 from the image data of the imaging unit 2a, and the determination is based on that result. If the photographer 16 is gazing forward, that is, in the direction 39 of the display unit 8, it is next determined whether or not a predetermined time has elapsed (S2). This step checks whether the predetermined time has passed since the photographer 16 began gazing at the display unit 8 of the camera 10. The photographer 16 may only glance at the display unit 8 momentarily, so the gaze must be held for the predetermined time.
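The "predetermined time" check in S2 (and, by the same pattern, S11 and S21) amounts to debouncing a detected state. Below is a minimal sketch. It assumes the gaze direction arrives each frame as a label such as `"front"` or `"left"`; the class name `DwellTimer` and the labels are hypothetical.

```python
from collections.abc import Hashable
from typing import Optional

class DwellTimer:
    """Reports True only after the same state has persisted for `hold` seconds."""

    def __init__(self, hold: float) -> None:
        self.hold = hold
        self._state: Optional[Hashable] = None
        self._since = 0.0

    def update(self, now: float, state: Hashable) -> bool:
        if state != self._state:  # state changed: restart the timer
            self._state = state
            self._since = now
        return now - self._since >= self.hold

gaze = DwellTimer(hold=1.0)
print(gaze.update(0.0, "front"))  # False: just started looking at the display
print(gaze.update(1.2, "front"))  # True: held for at least 1 s
```

A momentary glance at the display resets nothing downstream: the timer only reports `True` once the same gaze label has been seen continuously for the hold time.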

If step S2 determines that the predetermined time has elapsed, the photographer 16 has been gazing at the display unit 8 of the camera 10 for that time. The photographer is therefore assumed to be interested only in what is on the screen, and the sound collection range is narrowed to match the field of view. First, it is determined whether or not the current sound collection range is wide (S3). If it is wide, it is next determined whether or not the sound collection range has reached the limit on the wide side (S4). If the limit has not been reached, the sound collection range is gradually narrowed toward the center of the screen (S6a).

If step S3 determines that the sound collection range is not wide, it is determined whether or not the sound collection range is directed toward the periphery (S8). For example, suppose the photographer was watching a bird and the bird flies out of sight. The photographer then looks at the screen monitor of the display unit 8 again, and at that point step S3 determines that the sound collection range is narrow. Step S8 then finds that the sound collection range is directed toward the surroundings, and processing proceeds to step S13, described later. In step S14 the sound collection range is gradually widened, and it is then gradually narrowed. If step S8 determines that the sound collection range is not directed toward the periphery, processing proceeds to step S10.

If step S1a determines that the photographer 16 is not gazing forward, that is, not looking at the display unit 8, it is next determined whether or not the sound collection range is narrow (S12). Not gazing forward indicates that the photographer's interest also extends beyond the screen. From step S13 onward, the sound collection range is therefore widened and recording emphasizes ambient sound. If step S12 determines that the sound collection range is narrow, it is determined whether or not the wide-side limit has been reached (S13). If the limit has not been reached, the sound collection range is gradually widened (S14).

If step S13 determines that the limit has been reached, it is determined whether or not the gaze direction is constant (S21). Here, the imaging unit 2a monitors the face of the photographer 16, and the detection result of the face detection unit 3 shows whether the photographer keeps looking in the same direction. If the gaze direction is constant, sound is picked up from the gaze direction (S22). In this step, pickup in the gaze direction is emphasized. If the gaze is to the right, pickup by the right microphone 41a is emphasized; if it is to the left, pickup by the left microphone 41b is emphasized.
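Step S22 can be illustrated as a simple per-channel weighting. This sketch assumes that "emphasize" means attenuating the opposite channel, so the gazed side becomes relatively louder without any risk of clipping. The function name and the `other_gain` value are hypothetical.

```python
from typing import List, Tuple

def steer_to_gaze(left: List[float], right: List[float], gaze: str,
                  other_gain: float = 0.5) -> Tuple[List[float], List[float]]:
    """S22 sketch: favor the microphone on the side the photographer is looking at.

    gaze is "left", "right", or anything else (no change).
    """
    if gaze == "right":  # emphasize right microphone 41a by attenuating the left
        return [other_gain * l for l in left], list(right)
    if gaze == "left":   # emphasize left microphone 41b by attenuating the right
        return list(left), [other_gain * r for r in right]
    return list(left), list(right)
```

Boosting the gazed channel instead would also be a valid reading of the text, but it would need headroom management to avoid clipping.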

Processing next reaches step S10 when the determination in step S2, S4, S8, S12, or S21 is No, or after the processing in step S13 or S22 has been executed. Step S10 determines whether or not to end. As in the first embodiment, the decision is based on the detected operation state of the release button. If the result is not to end, processing returns to step S1a and the operations described above are repeated. If the result is to end, processing returns to the original flow.

Thus, in this embodiment, image and sound are handled separately. The image the photographer wanted to capture and the sound the photographer wanted to hear can both be recorded. Even in scenes where what the photographer is looking at differs from what the photographer is listening to, shooting can therefore follow the photographer's intention. For example, people often film leaves swaying in the breeze while listening to a bird singing in a different tree, and such a situation can be recorded accurately. This embodiment could also be combined with the first embodiment. For example, when there is a person in the center of the screen, sound collection could emphasize the center even if the photographer is looking elsewhere.

Next, a third embodiment of the present invention will be described with reference to FIGS. 10 and 11. In the first and second embodiments, sound collection at the time of shooting was switched between two modes. One was wide-range pickup including surrounding sound (ambient-sound emphasis); the other was pickup with directivity narrowed toward the subject. In the third embodiment, sound is recorded in stereo at the time of shooting. At playback, the scene on the screen and the user's gestures determine whether the sound is reproduced with ambient-sound emphasis or with directivity narrowed toward the subject.

FIG. 10 is a block diagram showing the configuration of the audio recording unit 7. During playback, this audio recording unit 7 adjusts the balance between the left and right audio outputs. It differs from the first-embodiment configuration shown in FIG. 6 in that the recording unit 4 is connected between the AD converters 42 and the adder/multiplier 43. The internal configuration of each circuit is the same as in the audio recording unit 7 of the first embodiment.

That is, the output of the AD converter 42a, which AD-converts the audio signal from the right microphone 41a, is connected to the recording unit 4. The audio data converted by the AD converter 42a is output to the adders 43a, 43c, and 43d. Likewise, the output of the AD converter 42b, which AD-converts the audio signal from the left microphone 41b, is connected to the recording unit 4. The audio data converted by the AD converter 42b is output to the adders 43a, 43d, and 43f.

In the first and second embodiments described above, the audio recording unit 7 changed the sound collection range during shooting. In this embodiment, by contrast, the AD converters digitize the audio signals from the stereo microphone 7a during shooting. This audio data is recorded in the recording unit 4 as is, without any change to the sound collection range. During playback, the adder/multiplier 43 then controls the balance of audio reproduction based on the audio data read from the recording unit 4.
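The split described above can be sketched as follows: the stereo signal is stored untouched while shooting, and the balance is decided only at playback. This is a minimal illustration, not the camera's implementation. `RawStereoRecording` and `play_with_balance` are hypothetical names, samples are assumed to be plain floats, and the balance is modeled as one linear gain per channel.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RawStereoRecording:
    """Hypothetical stand-in for recording unit 4: stores unprocessed L/R samples."""
    left: List[float] = field(default_factory=list)
    right: List[float] = field(default_factory=list)

    def record(self, left_sample: float, right_sample: float) -> None:
        # Shooting time: digitized samples are stored as-is, no range change applied.
        self.left.append(left_sample)
        self.right.append(right_sample)


def play_with_balance(
    rec: RawStereoRecording, left_gain: float, right_gain: float
) -> Tuple[List[float], List[float]]:
    """Playback time: apply a left/right balance to copies of the stored samples.

    The stored recording itself is never modified, so a different balance
    can be chosen on every playback.
    """
    out_left = [left_gain * s for s in rec.left]
    out_right = [right_gain * s for s in rec.right]
    return out_left, out_right
```

The stored samples are never modified, so the same recording can later be played back with a different balance. This is what allows the adjustment to be made after shooting.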

Next, the operation of this embodiment will be described with reference to the flowchart shown in FIG. 11. Steps S101 to S104 of this flow are the same as in the camera control flow of FIG. 4, so a detailed description is omitted. In brief, in the shooting mode, image data is captured (S102), a live view is displayed using this image data (S103), and face determination is performed (S104).

If the face determination in step S104 finds a face, the position of the face is determined (S105b). Once the face position has been determined, or if no face was found, the operating state of the release button is checked to decide whether to start recording (S106), as in the first embodiment. If recording is not to start, the process returns to step S101 and repeats the operations described above.

If step S106 determines that recording is to start, front shooting with stereo recording begins (S115). Front shooting captures the scene in front of the camera 10. Image data obtained by continuous shooting, such as movie shooting or panoramic shooting, is recorded in the recording unit 4. Together with the image, the left and right sound picked up by the stereo microphone 7a is recorded separately in the recording unit 4.

Along with front shooting and stereo recording, rear-image feature recording is also performed (S116). The imaging unit 2a outputs image data of the photographer on the rear side of the camera 100. From this data, features of changes in the photographer, such as where the photographer is looking, are detected and recorded in the recording unit 4.

Next, it is determined whether recording has ended (S119). In this step, the operating state of the release button is detected and the determination is made on that basis. If recording has not ended, the process returns to step S115 and shooting continues. If recording has ended, the camera control flow ends and is executed again from step S101.
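The recording loop of steps S115 to S119 can be sketched as appending one record per step until the release button ends recording. The following are all assumptions for illustration:

- the record layout (`FrameRecord`);
- the callbacks `detect_gaze` and `release_pressed`;
- checking the release button after each step.

The text specifies only what is recorded, not how.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

Gaze = Optional[Tuple[float, float]]  # hypothetical gaze point, or None if not found


@dataclass
class FrameRecord:
    """One recording step: front image, unprocessed stereo pair, rear-image feature."""
    frame_index: int
    front_image: object          # placeholder for the front image data
    stereo: Tuple[float, float]  # (left, right) sample, stored as-is
    gaze_point: Gaze             # where the photographer was looking, or None


def record_until_release(
    frames: Iterable[Tuple[object, Tuple[float, float], object]],
    detect_gaze: Callable[[object], Gaze],
    release_pressed: Callable[[int], bool],
) -> List[FrameRecord]:
    records: List[FrameRecord] = []
    for i, (front, stereo, rear) in enumerate(frames):
        # S115: front shooting with stereo recording, no range change applied.
        # S116: rear-image feature recording (where the photographer is looking).
        records.append(FrameRecord(i, front, stereo, detect_gaze(rear)))
        # S119: the end of recording is judged from the release button state.
        if release_pressed(i):
            break
    return records
```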

If step S101 determines that the shooting mode is not set, it is determined whether the playback mode is set (S121). If the playback mode is not set, the process returns to step S101. If the playback mode is set, it is determined whether the photographer was gazing (S131). The photographer's gaze direction was recorded from the rear image in step S116. This step therefore uses the recorded gaze direction to determine whether the photographer was gazing at the monitor screen of the display unit 8.
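Step S131 needs only a yes/no answer from the gaze data recorded in S116. A minimal sketch follows. It assumes that the gaze is stored as a point in normalized coordinates and that the monitor occupies a known rectangle in those coordinates. Both are assumptions; the patent does not say how the gaze is encoded, and `MONITOR_RECT` is a hypothetical constant.

```python
from typing import Optional, Tuple

# Hypothetical monitor rectangle in normalized gaze coordinates:
# (x_min, y_min, x_max, y_max).
MONITOR_RECT: Tuple[float, float, float, float] = (0.0, 0.0, 1.0, 1.0)


def was_gazing_at_monitor(
    gaze_point: Optional[Tuple[float, float]],
    monitor_rect: Tuple[float, float, float, float] = MONITOR_RECT,
) -> bool:
    """Step S131 sketch: True if a recorded gaze point falls on the monitor screen."""
    if gaze_point is None:  # no gaze was recorded for this frame
        return False
    x, y = gaze_point
    x_min, y_min, x_max, y_max = monitor_rect
    return x_min <= x <= x_max and y_min <= y <= y_max
```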

If step S131 finds that the photographer was not gazing, a wide range of sound is reproduced along with the image (S133). This playback emphasizes environmental sound: the full, wide range picked up by the stereo microphone 7a is reproduced. If the photographer was gazing, sound that emphasizes the gaze direction is reproduced along with the image (S132). That is, the audio recording unit controls the left/right volume balance so that the sound source appears to lie in the part of the screen the photographer was gazing at. Because the sound is adjusted at playback time, this method avoids one problem of adjusting during shooting. When a person in the center starts to speak, the change in sound collection would not keep up, and the start of the speech would be lost.
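One way to turn a gaze position into a left/right volume balance is a constant-power pan law, sketched below. The pan law, the normalized 0-to-1 horizontal coordinate and the name `balance_for_gaze` are choices made here for illustration. The text states only that the left/right volume balance is controlled so that the source seems to lie where the photographer was looking.

```python
import math
from typing import Tuple


def balance_for_gaze(gaze_x: float) -> Tuple[float, float]:
    """Return (left_gain, right_gain) that place the sound toward gaze_x.

    gaze_x is the horizontal gaze position on the screen, normalized so that
    0.0 is the left edge and 1.0 is the right edge (an assumed convention).
    With a constant-power pan law, left_gain**2 + right_gain**2 == 1 for
    every position, so overall loudness stays even as the source moves.
    """
    x = min(max(gaze_x, 0.0), 1.0)   # clamp to the screen
    angle = x * math.pi / 2          # 0 -> fully left, pi/2 -> fully right
    return math.cos(angle), math.sin(angle)
```

For example, a gaze at the screen center gives equal gains of about 0.707 on each channel. A gaze at the right edge sends essentially all of the level to the right channel.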

After audio playback in step S132 or S133, it is determined whether playback has ended (S134). If playback has not ended, the process returns to step S131 and playback continues. If playback has ended, it is next determined whether to transmit (S141). This step decides whether the image played back and displayed from step S121 onward should be sent to the external device 20 for playback and display there.

If step S141 determines that the image is to be transmitted, the display image is transmitted (S142). Here, the selected display image is sent via the communication unit 12 to an external device 20 such as a television or a photo stand. In the situation shown in FIG. 2(c), the image and sound then become content that decorates the room as part of the interior. Because this content also reproduces the sound, it is well suited to reminiscing.

Before the image is transmitted, the user may also be allowed to adjust the audio reproduction. For example, the user could choose between environmental sound and narrowed directivity, or adjust the left/right balance. After the display image is transmitted, or if step S141 determines that no transmission is to be made, the camera control flow ends and is executed again from step S101.

As described above, in the third embodiment the sound is recorded in stereo as is during shooting, and the left/right volume balance and the position of the sound source are controlled during playback. Therefore, even if the situation at the shooting location changes suddenly, the sound can be adjusted appropriately after shooting.

Next, a fourth embodiment of the present invention will be described with reference to FIGS. 12 to 14. The first to third embodiments controlled the audio (for example, the sound collection range or the sound source position during playback) based on one of two inputs:

- whether the subject image contains a face;
- the position the photographer is gazing at.

In this embodiment, the direction in which the subject is looking is detected, and the audio is controlled based on that result.

Consider, for example, the scene shown in FIG. 12(b). In this scene, the subject 17 is looking at the sea and listening to the sound of breaking waves. Given the angle of view 31c, the image taken from the position of the camera 10a is the captured image 32. In such a situation, it is desirable to record the sound the subject 17 is listening to, in this example the sound of the breaking waves.

Therefore, this embodiment records the sound that both the photographer and the subject 17 are listening to. Suppose the camera 10a, the ear 15c of the user (photographer), the subject 17 and the waves 18 the subject is listening to are positioned as shown in FIG. 12. If the sound collection range 33c is as illustrated, it is the range that the user (photographer) 15 and the subject 17 are both listening to. In the example of FIG. 12, either of two approaches suffices:

- emphasize the sound on the right side (the direction the subject 17 is looking) when recording in stereo;
- emphasize the right-side sound during playback.
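Choosing which side to emphasize from the subject's face orientation can be sketched using the sideways component of the facing direction. The yaw convention (0 degrees when facing the camera, positive when turned toward the right of the frame), the threshold and the function name are hypothetical. The text says only that the side the subject 17 is looking toward (the right, in FIG. 12) is emphasized.

```python
import math


def side_to_emphasize(face_yaw_deg: float, threshold: float = 0.25) -> str:
    """Return "right", "left" or "none" from a hypothetical face yaw angle.

    face_yaw_deg: 0 when the subject faces the camera, positive when turned
    toward the right of the frame (assumed convention).
    The sine of the yaw is the sideways component of the facing direction:
    +1 facing frame-right, -1 facing frame-left, 0 facing toward or away.
    """
    lateral = math.sin(math.radians(face_yaw_deg))
    if lateral > threshold:
        return "right"
    if lateral < -threshold:
        return "left"
    return "none"
```

Because the sine is used, a subject facing directly away from the camera (180 degrees) gets no side emphasis. A subject turned three-quarters to the right (135 degrees) still selects the right channel.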

This embodiment can use the same configuration as the camera 10 shown in FIG. 1, except that the face detection unit 3 also determines the orientation of the face of the subject 17. The rest of the configuration is the same as in FIG. 1, so a detailed description is omitted.

Next, the operation of this embodiment will be described with reference to the camera control flowchart of FIG. 13. On entering this flow, it is first determined whether the shooting mode is set (S101), as in the first embodiment. If it is, an image is captured and face detection is performed (S102). A live view is then displayed on the display unit 8 using the captured image (S103), and face determination is performed (S104a).

If the face determination in step S104a finds a face, the face position is determined (S105b). This face detection before shooting locates the face of the subject 17 in advance, which gives two benefits once shooting starts. It becomes easy to determine immediately which way the subject's face is pointing, and the camera can decide at once where to set exposure and focus. For example, if the subject 17 is looking at the camera 10 before shooting, the face is easy to detect. If the subject then turns away from the camera 10 after shooting starts, the face becomes difficult to detect. The determination is therefore made before recording starts, so that the face orientation can still be recorded after recording has begun.
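The reason for locating the face before recording starts can be illustrated with a tiny tracker that keeps the last detected face position. `SubjectTracker` and the `(x, y, width, height)` box format are hypothetical. The patent says only that the stored position makes the face orientation easier to determine after the subject turns away; it does not describe how the position is used beyond that.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x, y, width, height) of a face


class SubjectTracker:
    """Sketch of S104a/S105b: remember where the face was before recording.

    If the subject turns away after recording starts and the detector loses
    the face, the remembered position still indicates which image region to
    examine for head orientation and where to keep exposure and focus.
    """

    def __init__(self) -> None:
        self.last_face: Optional[Box] = None

    def update(self, detected_face: Optional[Box]) -> Optional[Box]:
        if detected_face is not None:
            self.last_face = detected_face  # S105b: store the face position
        return self.last_face               # otherwise fall back to the stored one
```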

Once the face position has been determined, or if step S104a found no face, it is determined whether to start recording (S106), as in the first embodiment. If recording is not to start, the process returns to step S101 and repeats the operations described above.

If step S106 determines that recording is to start, shooting begins, stereo recording is performed, and the subject's gaze direction is recorded (S117). Next, emphasized audio recording is performed based on the subject's gaze direction detected by the face detection unit 3 (S118). Stereo recording alone would suffice. In this embodiment, however, a second version of the sound is recorded at the same time, with its collection direction changed to follow the subject's gaze. Either audio track can then be selected for playback.
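Recording both a plain stereo track and a gaze-emphasized track (S117 and S118) can be sketched as below. The clip layout, the key names and the `emphasize` callback are assumptions for illustration. The emphasis itself would be done by the audio recording unit 7, which this sketch leaves abstract.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Stereo = Tuple[List[float], List[float]]  # (left samples, right samples)


def record_clip(
    left: Sequence[float],
    right: Sequence[float],
    gaze_direction: Optional[str],
    emphasize: Callable[[Sequence[float], Sequence[float], str], Stereo],
) -> Dict[str, object]:
    """Sketch of S117/S118: keep the raw stereo and a gaze-emphasized version."""
    clip: Dict[str, object] = {
        "stereo": (list(left), list(right)),  # S117: plain stereo recording
        "gaze_direction": gaze_direction,     # S117: subject's gaze, may be None
    }
    if gaze_direction is not None:
        # S118: second track with collection biased toward the gaze direction.
        clip["emphasized"] = emphasize(left, right, gaze_direction)
    return clip
```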

Next, it is determined whether recording has ended (S119). Recording is started by operating the release button, and the end of recording is likewise determined from the operating state of the release button. If recording has not ended, shooting (that is, recording of images and sound) continues. If recording has ended, shooting ends and the process returns to step S101 to repeat the operations described above.

If step S101 determines that the shooting mode is not set, it is next determined whether the playback mode is set (S111). If it is not, the process returns to step S101 and the mode determination is repeated. If the playback mode is set, it is next determined whether gaze direction data exists (S131). In this step, playback of the image data starts. The camera also checks whether a gaze direction was recorded together with the image in step S117.

If step S131 finds no gaze direction data, audio is played back over a wide range, that is, with emphasis on environmental sound (S133). If gaze direction data exists, playback emphasizes the sound from that gaze direction (S132). Later, when the person who was the subject looks back on the occasion, the sound that person was listening to is reproduced, which makes it easy to relive the memory.
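Suppose a clip is stored as a plain stereo track plus an optional gaze-emphasized track. The playback choice in S131 to S133 then reduces to a lookup. The dictionary keys below are hypothetical and do not correspond to any documented file format.

```python
from typing import Dict, Tuple


def select_playback_track(clip: Dict[str, object]) -> Tuple[str, object]:
    """Sketch of S131-S133: pick the track to play for one clip."""
    if clip.get("gaze_direction") is None or "emphasized" not in clip:
        return "wide", clip["stereo"]      # S133: environmental-sound playback
    return "gaze", clip["emphasized"]      # S132: gaze-direction emphasis
```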

After audio playback, it is determined whether playback has ended (S134). If playback has not ended, the process returns to step S131 and playback of images and sound continues. If playback has ended, it is next determined whether to transmit the image to the external device 20 (S141). If transmission is selected, the display image is transmitted (S142). As described for the flow of FIG. 9 in the second embodiment, this lets the captured image be enjoyed as part of the interior on an external device 20 such as a television. After the display image is transmitted, or if step S141 determines that no transmission is to be made, the camera control flow ends and is executed again from step S101.

Next, the configuration of the audio recording unit 7 in this embodiment will be described with reference to the block diagram of FIG. 14. It is the same as the first-embodiment audio recording unit 7 of FIG. 6, except that the polarities of the adders 43a and 43d are reversed. The description therefore focuses on this difference. The AD converter 42a AD-converts the audio signal from the right microphone 41a. Its output is connected to the plus input of the adder 43a and to the minus input of the adder 43d. The AD converter 42b AD-converts the audio signal from the left microphone 41b. Its output is connected to the minus input of the adder 43a and to the plus input of the adder 43d. The rest of the configuration is the same as in FIG. 6.

With this configuration, the adders 43a and 43d output the difference between the channels: right minus left and left minus right, respectively. The larger the gain applied by the multipliers 43b and 43e, the greater the degree of emphasis. In other words, the output emphasizes the spread of the sound toward the left or the right. Increasing the gain of the multiplier 43b or the multiplier 43e therefore strengthens sound collection on the left or right side.
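Reading the connections in the text as a signal flow gives the small model below. It assumes two pairings, inferred from which adders receive the raw right and left data (FIG. 14 itself is not reproduced here):

- multiplier 43b feeds output adder 43c (right output);
- multiplier 43e feeds output adder 43f (left output).

The function name and sample format are hypothetical.

```python
from typing import List, Sequence, Tuple


def fourth_embodiment_mix(
    right: Sequence[float],
    left: Sequence[float],
    gain_43b: float,
    gain_43e: float,
) -> Tuple[List[float], List[float]]:
    """Model of the FIG. 14 signal flow under the pairing assumed above.

    right_out = R + gain_43b * (R - L)
    left_out  = L + gain_43e * (L - R)
    """
    out_right: List[float] = []
    out_left: List[float] = []
    for r, l in zip(right, left):
        diff_43a = r - l                           # adder 43a: + right, - left
        diff_43d = l - r                           # adder 43d: + left, - right
        out_right.append(r + gain_43b * diff_43a)  # multiplier 43b -> adder 43c
        out_left.append(l + gain_43e * diff_43d)   # multiplier 43e -> adder 43f
    return out_right, out_left
```

A source panned hard right (R = 1, L = 0) is boosted on the right output. A centered source (R = L) passes through unchanged, because both difference signals are zero. This is why raising only one gain biases the collection toward one side without affecting sound from straight ahead.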

In step S118 of the camera control flow of FIG. 13, the gain of the multiplier 43b or the multiplier 43e is changed according to the gaze direction, so that sound is collected in line with that direction. Thus this embodiment uses only a pair of microphones with identical performance, yet the sound collection range can easily be biased to the left or right.
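Step S118's gain control can be sketched as a mapping from a normalized gaze direction to the two multiplier gains. The following are all illustrative assumptions:

- the direction scale;
- the linear mapping;
- the `max_gain` ceiling;
- multiplier 43b driving the right-side emphasis and 43e the left.

The text says only that the gain of 43b or 43e is changed according to the gaze direction.

```python
from typing import Tuple


def gains_for_gaze(direction: float, max_gain: float = 1.0) -> Tuple[float, float]:
    """Map a gaze direction to (gain_43b, gain_43e).

    direction: hypothetical normalized horizontal gaze, -1.0 = fully left,
    0.0 = straight ahead, +1.0 = fully right (values outside are clamped).
    Only the multiplier on the side being looked at gets a non-zero gain.
    """
    d = min(max(direction, -1.0), 1.0)
    gain_43b = max_gain * max(0.0, d)    # right-side emphasis (assumed pairing)
    gain_43e = max_gain * max(0.0, -d)   # left-side emphasis (assumed pairing)
    return gain_43b, gain_43e
```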

In this embodiment, the sound the subject was listening to at the time of shooting is detected and recorded. This makes it possible to record and reproduce sound that matches the image.

As described above, the embodiments of the present invention detect the sound that the photographer or the subject is listening to, and record or reproduce that sound. This accounts for the difference between the range the photographer is viewing and the range the photographer is hearing, so images can be reproduced with rich, atmospheric sound. The sound also lets viewers calmly recall the scene at the time of shooting. Furthermore, to convey the atmosphere of the surroundings at the time of shooting, the embodiments take into account the difference in directivity between the photographer's eyes and ears. The scene can then be recalled both visually and aurally.

In each embodiment of the present invention, playback display was performed either on the display unit 8 of the camera 10 or by transmitting from the camera 10 to the external device 20. However, the invention is not limited to this. For example, the recording medium written by the recording unit 4 may be loaded directly into a television or a personal computer.

In each embodiment, a digital camera was described as the shooting device, but other devices may be used:

- a digital single-lens reflex camera or a compact digital camera;
- a moving-image camera such as a video camera or a movie camera;
- a camera built into a mobile phone or a personal digital assistant (PDA).

In any case, the present invention can be applied to any shooting device that can record sound together with images.

The present invention is not limited to the above embodiments as they stand. At the implementation stage, the constituent elements can be modified without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiments. For example, some of the constituent elements shown in an embodiment may be omitted, and constituent elements from different embodiments may be combined as appropriate.

FIG. 1 is a block diagram showing the configuration of a camera and an external device according to the first embodiment of the present invention.
FIG. 2 illustrates how the camera according to the first embodiment is used: (a) shows shooting, (b) shows an image being transferred to an external device such as a television, and (c) shows the transferred image being played back and displayed.
FIG. 3 illustrates the recording of images and sound in the camera according to the first embodiment: (a) shows the camera shooting and recording sound, (b) shows an image taken at the position of the camera 10a, and (c) shows an image taken at the position of the camera 10b.
FIG. 4 is a flowchart showing the camera control operation of the camera according to the first embodiment.
FIG. 5 is a flowchart showing the shooting and sound recording operation in the first embodiment.
FIG. 6 is a block diagram showing the configuration of the audio recording unit 7 in the camera according to the first embodiment.
FIG. 7 is a block diagram showing the configuration of a camera according to the second embodiment.
FIG. 8 shows the camera in use in the second embodiment.
FIG. 9 is a flowchart showing the shooting and sound recording operation in the second embodiment.
FIG. 10 is a block diagram showing the configuration of the audio recording unit 7 in the camera according to the third embodiment.
FIG. 11 is a block diagram showing the configuration of a camera according to the third embodiment.
FIG. 12 illustrates the recording of images and sound in the camera according to the fourth embodiment: (a) shows the camera shooting and recording sound, and (b) shows an image taken at the position of the camera 10a in a scene containing the subject.
FIG. 13 is a block diagram showing the configuration of a camera according to the fourth embodiment.
FIG. 14 is a block diagram showing the configuration of the audio recording unit 7 in the camera according to the fourth embodiment.

Explanation of symbols

1 ... signal processing and control unit, 2 ... imaging unit, 3 ... face detection unit, 4 ... recording unit, 6 ... operation determination unit, 7 ... audio recording unit, 8 ... display unit, 9 ... clock unit, 10 ... camera, 10a ... camera, 10b ... camera, 12 ... communication unit, 15 ... user (photographer), 15a ... eye, 15b ... eye, 15c ... ear, 16 ... photographer, 17 ... subject, 18 ... waves the subject is listening to, 20 ... external device, 21 ... signal processing and control unit, 22 ... communication unit, 23 ... display/playback unit, 24 ... display priority unit, 25 ... remote control receiver, 31a ... angle of view, 31b ... angle of view, 31c ... angle of view, 32 ... captured image, 33a ... sound collection range, 33b ... sound collection range, 33c ... sound collection range, 35 ... audible range, 37 ... direction, 38 ... direction, 39 ... direction, 41a ... right microphone, 41b ... left microphone, 42a ... AD converter, 42b ... AD converter, 43a ... adder, 43b ... multiplier, 43c ... adder, 43d ... adder, 43e ... multiplier, 43f ... adder

Claims

An imaging unit for imaging a subject and outputting image data;
A sound collection changing unit capable of changing the sound collection range for sound from the subject direction;
A face detection unit that determines whether or not there is a person in the image by detecting a face part of the person from the image based on the image data obtained by the imaging unit;
A control unit that, when the face detection unit determines that a person has been present in the image for a predetermined time, causes the sound collection changing unit to narrow the sound collection range, and, when the face detection unit determines that no person has been present in the image for the predetermined time, causes the sound collection changing unit to widen the sound collection range;
A camera characterized by comprising:

The face detection unit further detects the facial expression of the person from the image,
The control unit changes the sound collection range of the sound collection changing unit according to a change in the facial expression detected by the face detection unit.
The camera according to claim 1.

An imaging unit for imaging a subject and outputting image data;
A sound collection changing unit capable of changing the sound collection range for sound from the subject direction;
A face detection unit that determines whether or not there is a person in the image by detecting a face part of the person from the image based on the image data obtained by the imaging unit;
A control unit that, when the face detection unit determines before shooting that a person is present in the image, causes the sound collection changing unit to narrow the sound collection range, and, when the face detection unit determines before shooting that no person is present in the image, causes the sound collection changing unit to widen the sound collection range;
A camera characterized by comprising:

An imaging unit for imaging a subject and outputting image data;
A sound collection changing unit capable of changing the sound collection range for sound from the subject direction;
A second imaging unit that images the photographer and outputs second image data;
A face detection unit for detecting a face portion of a person from an image based on the second image data obtained by the second imaging unit;
A control unit that changes the sound collection range of the sound collection changing unit in accordance with the image of the face portion detected by the face detection unit;
A camera characterized by comprising:

The face detection unit further detects the orientation of the person's face;
The control unit causes the sound collection change unit to narrow the sound collection range when the orientation of the face does not change;
The camera according to claim 4.

The camera according to claim 4, wherein, when the face detection unit determines that the viewpoint of the photographer of the camera has shifted from a first part to a second part, the control unit causes the sound collection change unit to widen the sound collection range temporarily and then narrow it again.
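
Claim 6 describes a short-lived widening that is triggered by a viewpoint shift and followed by a return to the narrow range. The Python sketch below models this with a simple deadline timer. The part labels, the 0.5-second widening duration, and the narrow initial state are hypothetical choices, because the claim leaves both the duration and the way parts are identified unspecified.

```python
from typing import Optional


class ViewpointShiftController:
    """Hypothetical sketch of claim 6: widen briefly on a viewpoint shift, then narrow."""

    def __init__(self, widen_duration_s: float = 0.5) -> None:
        self.widen_duration_s = widen_duration_s  # assumed length of the temporary widening
        self._last_part: Optional[str] = None     # part of the scene last looked at
        self._widen_until_s = float("-inf")       # end time of the current widening

    def update(self, viewed_part: str, now_s: float) -> str:
        # A change in the viewed part is treated as a viewpoint shift.
        if self._last_part is not None and viewed_part != self._last_part:
            self._widen_until_s = now_s + self.widen_duration_s
        self._last_part = viewed_part
        return "wide" if now_s < self._widen_until_s else "narrow"


ctrl = ViewpointShiftController(widen_duration_s=0.5)
for t, part in [(0.0, "A"), (1.0, "B"), (1.2, "B"), (1.6, "B")]:
    print(t, ctrl.update(part, t))
```

Widening during the transition lets sound from between the two parts reach the recording while the photographer's attention moves. Once the new part has held attention for the widening period, the range narrows onto it.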

An imaging unit for imaging a subject and outputting image data;
A sound collection unit that picks up sound from the subject direction with a plurality of microphones;
A playback unit for playing back the sound collected by the sound collection unit;
A second imaging unit that images the photographer and outputs second image data;
A face detection unit for detecting a face portion of a person from an image based on the second image data obtained by the second imaging unit;
A control unit that changes, according to the image of the face part detected by the face detection unit, the sound source position of the sound reproduced by the playback unit;
A camera characterized by comprising the above.

The face detection unit further detects the orientation of the person's face;
The control unit changes the balance of the plurality of sounds reproduced by the playback unit when the orientation of the face does not change;
The camera according to claim 7.
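
Claims 7 and 8 do not specify how the balance is changed. The Python sketch below uses ordinary stereo balance control as a stand-in: a two-channel mix is shifted toward the side the photographer faces, but only while the face orientation stays steady. The following are all assumptions: yaw angles in degrees, already expressed in the scene's left/right sense (the rear camera image may be mirrored), a 45-degree full-balance limit, and a 5-degree stability tolerance.

```python
from typing import List, Sequence, Tuple


def balance_gains(yaw_deg: float, max_yaw_deg: float = 45.0) -> Tuple[float, float]:
    """Left/right gains that shift the balance toward the facing direction.

    yaw_deg is a hypothetical face-yaw estimate: negative = left, positive = right.
    """
    # Map the yaw angle to a balance value b in [-1, 1].
    b = max(-1.0, min(1.0, yaw_deg / max_yaw_deg))
    # Conventional balance control: only the channel opposite the facing side is attenuated.
    g_left = 1.0 if b <= 0 else 1.0 - b
    g_right = 1.0 if b >= 0 else 1.0 + b
    return g_left, g_right


def rebalance(
    left: Sequence[float],
    right: Sequence[float],
    yaw_history_deg: Sequence[float],
    stable_tol_deg: float = 5.0,
) -> Tuple[List[float], List[float]]:
    """Change the balance only while the face orientation is not changing (claim 8)."""
    if max(yaw_history_deg) - min(yaw_history_deg) > stable_tol_deg:
        # Orientation still moving: leave the recorded balance untouched.
        return list(left), list(right)
    mean_yaw = sum(yaw_history_deg) / len(yaw_history_deg)
    g_left, g_right = balance_gains(mean_yaw)
    return [g_left * s for s in left], [g_right * s for s in right]


print(rebalance([1.0, 1.0], [1.0, 1.0], [20.0, 22.5, 25.0]))
```

With a steady yaw of about 22.5 degrees to the right, the left channel is halved and the right channel is left unchanged. This moves the apparent sound source toward the photographer's facing direction.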

An imaging unit for imaging a subject and outputting image data;
A sound collection change unit that can change the sound collection range of the sound from the subject direction;
A face detection unit that detects a human face part in an image based on the image data obtained by the imaging unit;
A control unit that changes the sound collection range of the sound collection change unit in accordance with the direction in which the person whose face is detected by the face detection unit is looking;
A camera characterized by comprising the above.

An imaging unit that outputs image data;
A sound collection change unit that can change the sound collection range of the sound from the subject direction;
A face detection unit that detects a human face part in an image based on the image data obtained by the imaging unit;
A control unit that, in a normal state, sets the sound collection range wider than the shooting range and narrows the sound collection range according to the detection result of the face detection unit;
A camera characterized by comprising the above.

The imaging unit outputs the image data based on a subject image formed by a photographing optical system,
The face detection unit can detect the position and/or the number of face portions in the subject image;
The control unit narrows the sound collection range when a face portion is at the center of the subject image and/or when the number of face portions is equal to or greater than a predetermined value;
The camera according to claim 10.
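
Claims 10 and 11 relate the sound collection range to the shooting range. The Python sketch below uses the standard rectilinear angle-of-view formula, 2·atan(w / 2f), as the shooting range. In the normal state the pickup angle is set wider than that, and it is narrowed when a face is near the center or when enough faces are present. The "and/or" is implemented as a logical OR. The following are hypothetical: the widening factor of 1.5, the narrowed target equal to the angle of view, the center tolerance, and the face-count threshold of 2.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Face:
    cx: float  # face center, normalized to [0, 1] across the frame width (assumed)
    cy: float  # face center, normalized to [0, 1] down the frame height (assumed)


def angle_of_view_deg(sensor_width_mm: float, focal_length_mm: float) -> float:
    # Horizontal angle of view of a rectilinear lens focused at infinity.
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))


def pickup_range_deg(
    faces: Sequence[Face],
    sensor_width_mm: float,
    focal_length_mm: float,
    wide_factor: float = 1.5,
    min_faces: int = 2,
    center_tol: float = 0.15,
) -> float:
    """Hypothetical claim 10/11 policy returning a pickup angle in degrees."""
    aov = angle_of_view_deg(sensor_width_mm, focal_length_mm)
    centered = any(
        abs(f.cx - 0.5) <= center_tol and abs(f.cy - 0.5) <= center_tol for f in faces
    )
    many = len(faces) >= min_faces
    if centered or many:
        return aov  # narrowed: here, to the shooting range itself (an assumption)
    return aov * wide_factor  # normal state: wider than the shooting range


print(pickup_range_deg([], 36.0, 18.0))
print(pickup_range_deg([Face(0.5, 0.5)], 36.0, 18.0))
```

For a 36 mm wide sensor at 18 mm focal length the angle of view is 90 degrees. The sketch therefore collects over 135 degrees with no faces and narrows to 90 degrees once a centered face appears.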

The face detection unit can detect a change in the person's facial expression;
The control unit quickly narrows the sound collection range when the face detection unit detects the change in facial expression;
The camera according to claim 10.

The imaging unit outputs image data related to the photographer's face,
The face detection unit detects the gaze direction of the photographer based on the image data,
The control unit controls the sound collection range based on the gaze direction.
The camera according to claim 10.

The camera according to claim 13, wherein, when the face detection unit detects that the photographer's gaze is directed at a display unit of the camera, the control unit narrows the sound collection range.
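
Claim 14 narrows the range while the photographer is looking at the camera's display. The Python sketch below assumes the following: a gaze estimate given as yaw and pitch angles measured from the optical axis of the camera facing the photographer, a display located next to that camera, and a display that subtends an angular window of plus or minus 20 by 15 degrees. Returning to the wide range when the gaze is elsewhere or unknown is also an assumption, based on the normal state described in claim 10.

```python
from typing import Optional, Tuple


def gaze_on_display(
    yaw_deg: float,
    pitch_deg: float,
    half_width_deg: float = 20.0,
    half_height_deg: float = 15.0,
) -> bool:
    # True if the gaze ray falls inside the display's assumed angular window.
    return abs(yaw_deg) <= half_width_deg and abs(pitch_deg) <= half_height_deg


def range_from_gaze(gaze: Optional[Tuple[float, float]]) -> str:
    """Hypothetical claim-14 rule: narrow while the photographer watches the display."""
    if gaze is None:
        return "wide"  # no gaze estimate: stay in the normal (wide) state
    yaw_deg, pitch_deg = gaze
    return "narrow" if gaze_on_display(yaw_deg, pitch_deg) else "wide"


print(range_from_gaze((5.0, -3.0)), range_from_gaze((40.0, 0.0)), range_from_gaze(None))
```

Watching the display implies that the photographer is concentrating on the framed subject. A narrow range is therefore the natural match for that attention.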

The imaging unit outputs the image data based on a subject image formed by a photographing optical system,
The face detection unit further detects the gaze direction of the photographer based on the image data,
The control unit controls the sound collection range based on the gaze direction.
The camera according to claim 10.

The camera according to claim 10, further comprising a reproduction display unit that reproduces and displays the subject image based on the image data, together with the audio data recorded along with the image data.

In a playback apparatus that plays back subject image data together with audio data recorded in stereo at the same time,
A determination unit that determines the gaze direction of the photographer based on data recorded in the audio data;
A playback unit that performs audio playback based on the determined gaze direction;
A playback apparatus characterized by comprising the above.

When the determination unit cannot detect the gaze direction, audio is reproduced over a wide range;
When the determination unit detects the gaze direction, audio is reproduced with emphasis toward the detected gaze direction.
The playback apparatus according to claim 17.
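
Claims 17 and 18 leave the format of the gaze data inside the audio data unspecified. The Python sketch below assumes that the gaze direction has already been read out and mapped to a pan value in [-1, 1], with None meaning that no gaze direction was found. As stand-ins for the claimed behavior, it uses two techniques. For wide-range playback it applies mid/side widening with a hypothetical width of 1.5. For emphasized playback it boosts the channel on the gaze side by up to a hypothetical 6 dB.

```python
from typing import List, Optional, Sequence, Tuple


def play_with_gaze(
    left: Sequence[float],
    right: Sequence[float],
    gaze_pan: Optional[float],
    width: float = 1.5,
    emphasis_db: float = 6.0,
) -> Tuple[List[float], List[float]]:
    """Hypothetical claim-18 playback: wide when gaze is unknown, emphasized otherwise.

    gaze_pan: gaze direction mapped to [-1, 1] (-1 = far left, +1 = far right),
    or None when no gaze direction could be determined from the audio data.
    """
    if gaze_pan is None:
        # Wide-range playback: scale the side (L - R) component to widen the stereo image.
        out_l: List[float] = []
        out_r: List[float] = []
        for lv, rv in zip(left, right):
            mid = (lv + rv) / 2.0
            side = (lv - rv) / 2.0 * width
            out_l.append(mid + side)
            out_r.append(mid - side)
        return out_l, out_r
    # Emphasized playback: boost the channel on the gaze side, more for larger |pan|.
    p = max(-1.0, min(1.0, gaze_pan))
    boost = 10.0 ** (emphasis_db * abs(p) / 20.0)  # dB to linear amplitude
    g_l = boost if p < 0 else 1.0
    g_r = boost if p > 0 else 1.0
    return [g_l * s for s in left], [g_r * s for s in right]


print(play_with_gaze([1.0], [0.0], None))
print(play_with_gaze([1.0], [1.0], 1.0, emphasis_db=20.0))
```

A real implementation would also guard against clipping after the boost. The sketch omits this to keep the mapping from gaze direction to channel gains easy to follow.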

In a playback method for playing back subject image data together with audio data recorded in stereo at the same time,
The gaze direction of the photographer is determined based on data recorded in the audio data, and
Audio playback is performed based on the determined gaze direction;
A playback method characterized by the above.

In a program for playing back subject image data together with audio data recorded in stereo at the same time,
Determining the gaze direction of the photographer based on data recorded in the audio data, and
Performing audio playback based on the determined gaze direction;
A program characterized by causing a computer to execute the above.

An imaging unit for imaging a subject and outputting image data;
A sound collection change unit that can change the sound collection range of the sound from the subject direction;
A gaze direction determination unit that determines the gaze direction of the photographer;
A control unit that controls the sound collection range of the sound collection change unit according to the gaze direction of the photographer determined by the gaze direction determination unit and to the captured image based on the image data;
A camera characterized by comprising the above.