JP2011114406A - Imaging apparatus, imaging method, and program

Info

Publication number: JP2011114406A
Application number: JP2009266703A
Authority: JP
Inventors: Yohei Sakuraba; 洋平櫻庭
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2009-11-24
Filing date: 2009-11-24
Publication date: 2011-06-09

Abstract

PROBLEM TO BE SOLVED: To provide an imaging apparatus, an imaging method and a program, capable of removing noise accompanying an optical zoom operation from recorded sound and/or output sound.

SOLUTION: The imaging apparatus includes: an imaging portion 15 for performing the optical zoom operation corresponding to a zoom instruction from a user; a sound signal supply portion (main control portion 11) for supplying sound signals inputted from a microphone 35 to a recorder and/or an output device; a sound signal determination portion (main control portion 11) for determining the input condition of the sound signals; an operation signal determination portion (main control portion 11) for determining whether or not the operation signals of the zoom instruction are inputted; and a zoom operation adjustment portion (main control portion 11) for adjusting the point of time of the optical zoom operation corresponding to the input condition of the sound signals when the operation signals of the zoom instruction are inputted.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an imaging apparatus, an imaging method, and a program.

In an imaging apparatus such as a still camera, a video camera, or a surveillance camera, motor sound or the like accompanying an optical zoom operation may be picked up by the microphone as noise (hereinafter also referred to as zoom noise) and then recorded and/or output.

For this reason, digital zoom is sometimes used instead of optical zoom, or the operating speed of the optical zoom is reduced to lessen the noise generated. Another approach is to record the zoom noise in advance and remove, by signal processing, the noise superimposed on the target sound to be recorded and/or output, so that the zoom noise becomes less noticeable (see Patent Documents 1 and 2 below).

Patent Document 1: JP 2006-262241 A
Patent Document 2: JP 2000-4494 A

However, it is generally known that removing noise superimposed on the target sound by signal processing distorts the target sound and lowers the recording quality and/or output quality.

Accordingly, the present invention aims to provide an imaging apparatus, an imaging method, and a program capable of removing noise associated with an optical zoom operation from recorded audio and/or output audio.

According to a first aspect of the present invention, there is provided an imaging apparatus including: an imaging unit that performs an optical zoom operation in response to a zoom instruction from a user; an audio signal supply unit that supplies an audio signal input from a microphone to a recording device and/or an output device; an audio signal determination unit that determines the input state of the audio signal; an operation signal determination unit that determines whether an operation signal for a zoom instruction has been input; and a zoom operation adjustment unit that, when the operation signal for the zoom instruction is input, adjusts the operation time point of the optical zoom according to the input state of the audio signal.

With this configuration, when the operation signal for a zoom instruction is input, the operation time point of the optical zoom is adjusted according to the input state of the audio signal. By shifting the optical zoom operation to a time at which no target sound is being input, it is possible to prevent the degradation in recording quality and/or output quality that would otherwise result from distortion of the target sound during noise removal.

The audio signal determination unit may determine a voiced section or a speech section from the input state of the audio signal. When the operation signal for a zoom instruction is input during a voiced section or a speech section, the zoom operation adjustment unit may adjust the operation time point of the optical zoom so that the imaging unit performs the optical zoom operation after the voiced section or the speech section ends.

The imaging apparatus may further include a zoom processing unit that performs digital zoom processing on the imaging signal input from the imaging unit. When the operation signal for a zoom instruction is input during a voiced section or a speech section, the zoom processing unit may perform the digital zoom processing during that section.

When the operation time point of the optical zoom is adjusted, the audio signal supply unit may refrain from supplying the audio signal input from the microphone to the recording device and/or the output device during the optical zoom operation, or may supply comfort noise to the recording device and/or the output device in place of the audio signal input from the microphone.

The audio signal determination unit may determine the voiced section or the speech section while taking into account the noise accompanying the optical zoom operation. When a voiced section or a speech section is determined during the optical zoom operation, the zoom operation adjustment unit may adjust the operation of the optical zoom so that the imaging unit temporarily stops the optical zoom operation. The zoom operation adjustment unit may further adjust the operation of the optical zoom so that the imaging unit performs the optical zoom operation after the voiced section or the speech section ends.

The audio signal determination unit may determine, as a voiced section or a speech section, a section in which the signal-to-noise ratio of the audio signal is equal to or greater than a threshold, or a section in which the voice detection value of the audio signal is equal to or greater than a threshold.
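As an illustration of the threshold rule above, the following Python sketch groups per-frame signal-to-noise ratios into voiced sections. The function name `voiced_sections`, the per-frame list representation, and the half-open index convention are assumptions made for this example; they do not appear in the specification.

```python
def voiced_sections(snr_per_frame, threshold):
    """Return (start, end) frame-index pairs where SNR >= threshold.

    The end index is exclusive. All names here are hypothetical.
    """
    sections = []
    start = None
    for i, snr in enumerate(snr_per_frame):
        if snr >= threshold:
            if start is None:
                start = i  # a voiced section begins at this frame
        elif start is not None:
            sections.append((start, i))  # the section ended before frame i
            start = None
    if start is not None:
        # the input ended while a section was still open
        sections.append((start, len(snr_per_frame)))
    return sections
```

Frames that fall outside every returned section would correspond to the silent or non-speech periods in which an optical zoom operation could be scheduled.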

According to a second aspect of the present invention, there is provided an imaging method including a step of supplying an audio signal input from a microphone to a recording device and/or an output device, the step including: determining the input state of the audio signal; determining whether an operation signal for a zoom instruction has been input; and, when the operation signal for the zoom instruction is input, adjusting the operation time point of the optical zoom according to the input state of the audio signal.

According to a third aspect of the present invention, there is provided a program for causing a computer to execute the imaging method according to the second aspect. The program may be provided on a computer-readable recording medium or via communication means.

As described above, the present invention can provide an imaging apparatus, an imaging method, and a program capable of removing noise associated with an optical zoom operation from recorded audio and/or output audio.

A block diagram showing the main functional configuration of a still camera according to an embodiment of the present invention.
A flowchart showing the operation of the still camera according to the first embodiment.
A schematic diagram showing a processing result of the still camera according to the first embodiment.
A schematic diagram showing a processing result of the still camera according to the second embodiment.
A schematic diagram showing a processing result of the still camera according to the third embodiment.
A schematic diagram showing a processing result of the still camera according to the fourth embodiment.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and redundant description is omitted.

[1. Still camera configuration]
First, the still camera 10 according to an embodiment of the present invention will be described. FIG. 1 is a block diagram showing the main functional configuration of the still camera 10 according to the embodiment. The following description takes the case where the present invention is applied to the still camera 10 as an example, but the present invention can be applied in the same way to other imaging apparatuses such as a video camera or a surveillance camera.

As shown in FIG. 1, the still camera 10 includes a main control unit 11, an operation unit 13, an imaging unit 15, a lens control unit 17, a shutter control unit 19, an iris control unit 21, a timing generation unit 23, an imaging signal conversion unit 25, an imaging signal processing unit 27, an image processing unit 29, a display control unit 31, and a display unit 33. The still camera 10 also includes a microphone 35, an audio signal conversion unit 37, an audio signal processing unit 39, an audio processing unit 41, a speaker 43, a memory control unit 45, a memory 47, a card control unit 49, and a memory card 51.

The main control unit 11 executes a program stored in the memory 47 and controls each unit of the still camera 10. The operation unit 13 converts instructions input by the user through a recording start button, a zoom button, and the like (not shown) into operation signals and supplies them to the main control unit 11. The imaging unit 15 includes a lens group 15a (including a zoom lens), a shutter 15b, an iris 15c, and an imaging element 15d such as a CCD. The imaging unit 15 forms an optical image of the subject on the imaging element 15d through the lens group 15a, the shutter 15b, and the iris 15c, converts the image into an imaging signal, and supplies the imaging signal to the imaging signal conversion unit 25.

The lens control unit 17, the shutter control unit 19, and the iris control unit 21 control the lens group 15a, the shutter 15b, and the iris 15c, respectively. The timing generation unit 23 supplies a timing signal to the imaging element 15d and the imaging signal conversion unit 25 and thereby controls the imaging timing. The imaging signal conversion unit 25 performs signal amplification, A/D conversion, white balance adjustment, and the like on the imaging signal supplied from the imaging element 15d and supplies the result to the imaging signal processing unit 27.

The imaging signal processing unit 27 performs color space conversion, noise removal, and the like on the imaging signal, generates image data, and supplies the image data to the main bus 61. The image processing unit 29 performs enlargement, reduction, encoding, decoding, and the like on the image data. The display control unit 31 converts the image data into an output signal and controls the output of images by the display unit 33, such as a liquid crystal display. The display control unit 31 also combines the image data expanded in the frame memory in the memory 47, the live moving image data, and the image data of the GUI area generated by the main control unit 11.

The microphone 35 picks up sound around the still camera 10, converts it into an audio signal, and supplies the audio signal to the audio signal conversion unit 37. The audio signal conversion unit 37 performs signal amplification, A/D conversion, and the like on the audio signal and supplies the result to the audio signal processing unit 39. The audio signal processing unit 39 performs various kinds of signal processing on the audio signal, generates audio data, and supplies the audio data to the main bus 61. The audio processing unit 41 performs encoding, decoding, and the like on the audio data and supplies the result to the main bus 61. The speaker 43 converts audio data into an output signal and outputs the sound externally.

The memory control unit 45 controls the writing and reading of data to and from the memory 47 under the control of the main control unit 11 in order to store image data and audio data temporarily. The card control unit 49 controls the writing and reading of data to and from the memory card 51 under the control of the main control unit 11 in order to store image data and audio data. Image data and audio data are encoded or decoded by the image processing unit 29 and the audio processing unit 41 when they are written to or read from the memory 47 and the memory card 51.

The still camera 10 has an optical zoom function and a digital zoom function. When starting an optical zoom operation, the main control unit 11 sets operating conditions such as the target magnification of the optical zoom of the lens group 15a, the speed of the shutter 15b (the exposure time of the imaging element 15d), and the aperture amount of the iris 15c. The main control unit 11 then controls the lens control unit 17, the shutter control unit 19, and the iris control unit 21 so that the imaging unit 15 is driven according to the operating conditions. When starting a digital zoom operation, the main control unit 11 sets operating conditions such as the target magnification of the digital zoom and the imaging range of the imaging element 15d. The main control unit 11 then controls the image processing unit 29 so that the image data is enlarged according to the operating conditions.

Based on the audio signal supplied to the audio signal processing unit 39, the main control unit 11 determines the input state of the audio signal, in particular a voiced section or a speech section. Here, a voiced section or a speech section means a section in which the target sound to be recorded and/or output is picked up. As described in detail later, the main control unit 11 controls the optical zoom operation and the digital zoom operation and, in particular, adjusts the operation time point of the optical zoom according to the input state of the audio signal.

When the still camera 10 is activated, the main control unit 11 waits for an operation signal to be input through the operation unit 13. When the user operates the recording start button, the main control unit 11 receives an operation signal indicating the start of recording and controls each unit to start recording image data and audio data. In the still camera 10, the recording of image data and audio data continues until the user operates the recording end button. In other words, the still camera 10 records live moving images and live audio.

In the imaging unit 15, an optical image of the subject is formed on the imaging element 15d and converted into an imaging signal. The imaging signal is processed at a constant cycle and supplied to the main bus 61 as image data. The image data is then encoded and written to the memory 47, and is also displayed as an image on the display unit 33.

The microphone 35 picks up sound around the still camera 10 and converts it into an audio signal. The audio signal is processed at a constant cycle and supplied to the main bus 61 as audio data. The audio data is then encoded and written to the memory 47.

When the user operates the data save button, the main control unit 11 controls the memory control unit 45 and the card control unit 49 so that the data temporarily stored in the memory 47 is stored on the memory card 51.

The following description covers the case where image data and audio data are recorded (stored). However, instead of or in addition to being recorded, the image data and audio data may be output to an external device through an output port or the like (not shown).

[2. Still camera operation]
Next, first to fifth embodiments relating to the operation of the still camera 10 of the present invention will be described. Description that would be repeated across the embodiments is omitted.

<First Embodiment>
First, a first embodiment of the present invention will be described with reference to FIGS. 2 and 3. FIG. 2 is a flowchart showing the operation of the still camera 10. FIG. 3 is a schematic diagram showing a processing result of the still camera 10 according to the first embodiment.

During the data recording operation, the main control unit 11 continuously determines the input state of the audio signal input from the microphone 35. When the user operates the zoom button, the main control unit 11 sets operating conditions such as the zoom target magnification according to the input state of the operation signal. The main control unit 11 then determines, according to the input state of the audio signal, whether the optical zoom operation is permitted.

If the main control unit 11 determines that the optical zoom operation is permitted, it causes the imaging unit 15 to perform the optical zoom operation according to the operating conditions. If it determines that the optical zoom operation is not permitted, it adjusts the operation time point of the optical zoom and causes the image processing unit 29 to perform a digital zoom operation according to the operating conditions.

The operation of the still camera 10 according to the first embodiment is described in detail below. As shown in FIG. 2, during the data recording operation, the main control unit 11 determines the input state of the audio signal based on the audio signal supplied to the audio signal processing unit 39. The input state of the audio signal may be determined for every sample at the sampling frequency (for example, 16 kHz) or for every frame (for example, 512 samples).
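To make the two example figures concrete, the short Python sketch below splits a sample sequence into frames and computes the frame duration. With 512 samples at 16 kHz, one frame spans 32 ms, so a per-frame decision is made roughly 31 times per second. The function names and the choice to drop a trailing partial frame are assumptions for illustration only.

```python
SAMPLE_RATE_HZ = 16_000  # example sampling frequency given in the text
FRAME_SIZE = 512         # example frame length given in the text


def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Split samples into consecutive full frames.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    n_frames = len(samples) // frame_size
    return [samples[i * frame_size:(i + 1) * frame_size] for i in range(n_frames)]


def frame_duration_ms(frame_size=FRAME_SIZE, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_size / sample_rate_hz
```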

To determine the input state of the audio signal, the main control unit 11 estimates the noise level n(t) and the signal level s(t) of the audio signal using Equations 1 to 3 and calculates the signal-to-noise ratio SNR of the audio signal (step S101). The noise level n(t) and the signal level s(t) need not be estimated with Equations 1 to 3; they may instead be estimated by a method that uses voice detection, a method based on the minimum value of the input signal, or the like.

Here, n(t) and s(t) are the estimated noise level and signal level at time t, and m(t) is the amplitude of the audio signal input from the microphone 35 at time t. α and β are constants that satisfy 0 < β < α < 1, and γ and δ are constants that satisfy 0 < γ < δ < 1.

Because β < α, the current amplitude m(t) is reflected less readily in the noise level n(t) when the noise level is increasing, and more readily when the noise level is decreasing. Likewise, because γ < δ, the current amplitude m(t) is reflected less readily in the signal level s(t) when the signal level is increasing, and more readily when the signal level is decreasing. As a result, the noise level n(t) and the signal level s(t) can be estimated with little influence from non-stationary sounds.
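Equations 1 to 3 themselves are not reproduced in this text, so the following Python sketch shows only one possible update rule that matches the behaviour described above: asymmetric exponential smoothing, in which the larger coefficient (α or δ) is applied while the level is rising and the smaller one (β or γ) while it is falling. The recursive form, the default coefficient values, the initialisation from the first sample, and the use of a plain ratio s(t)/n(t) as the SNR are all assumptions, as are the names `smooth` and `LevelTracker`.

```python
def smooth(prev, amplitude, rise_coeff, fall_coeff):
    """One step of asymmetric exponential smoothing (assumed form).

    A coefficient closer to 1 keeps more of the previous estimate, so the
    current amplitude is reflected less readily.
    """
    coeff = rise_coeff if amplitude > prev else fall_coeff
    return coeff * prev + (1.0 - coeff) * amplitude


class LevelTracker:
    """Tracks n(t) and s(t) from amplitudes m(t) and returns s(t) / n(t)."""

    def __init__(self, alpha=0.999, beta=0.9, gamma=0.5, delta=0.9, floor=1e-9):
        # Constraints stated in the text; the default values are illustrative.
        if not (0 < beta < alpha < 1 and 0 < gamma < delta < 1):
            raise ValueError("need 0 < beta < alpha < 1 and 0 < gamma < delta < 1")
        self.alpha, self.beta = alpha, beta
        self.gamma, self.delta = gamma, delta
        self.floor = floor  # keeps the levels strictly positive
        self.noise = None
        self.signal = None

    def update(self, sample):
        m = max(abs(sample), self.floor)
        if self.noise is None:
            # Assumption: start both estimates at the first amplitude.
            self.noise = self.signal = m
        else:
            self.noise = smooth(self.noise, m, self.alpha, self.beta)
            self.signal = smooth(self.signal, m, self.delta, self.gamma)
        return self.signal / self.noise
```

With these illustrative coefficients, a steady background leaves the ratio near 1, while a sudden loud sound raises s(t) much faster than n(t), so the ratio climbs and can be compared against a threshold.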

The main control unit 11 determines the operation state of the zoom button based on the operation signal input from the operation unit 13 (step S103). If the zoom button is being operated, the main control unit 11 updates the target magnification of the optical zoom according to the operation state (step S105).

The main control unit 11 determines whether the current magnification of the optical zoom is less than the target magnification (step S107). If so, the process proceeds to step S109; if not, the process proceeds to step S115.

If the current magnification of the optical zoom is less than the target magnification, the main control unit 11 determines whether the SNR calculated in step S101 is less than a predetermined threshold T (step S109). The threshold T is set in advance to a value that allows a voiced section or a speech section to be identified. If the SNR is less than T, the main control unit 11 causes the imaging unit 15 to perform an optical zoom operation toward the telephoto side (step S111); otherwise, it causes the image processing unit 29 to perform a digital zoom operation toward the telephoto side (step S113).

If the condition in step S107 is not satisfied (the current magnification of the optical zoom is equal to or greater than the target magnification), the main control unit 11 determines whether the current magnification of the optical zoom is equal to the target magnification (step S115). If it is (the optical zoom has reached the target magnification), no zoom operation is performed (step S117); if it is not (the optical zoom exceeds the target magnification), the main control unit 11 causes the imaging unit 15 to perform an optical zoom operation toward the wide-angle side (step S119).

According to the above flow, when a zoom instruction is input during a voiced section or a speech section, the operation time point of the optical zoom is adjusted, and a digital zoom operation is performed in place of the optical zoom operation (step S113). Then, after the voiced section or the speech section ends, the optical zoom operation whose operation time point was adjusted is performed (step S111). When this deferred optical zoom operation is performed, the digital zoom applied earlier is cancelled.
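The branching in steps S107 to S119 can be summarised in a short Python sketch. The enum `ZoomAction`, the function `decide_zoom_action`, and the idea of evaluating the decision once per sample or frame are assumptions made for illustration; the step numbers in the comments refer to the flow described above.

```python
from enum import Enum


class ZoomAction(Enum):
    OPTICAL_TELE = "optical zoom, telephoto side"
    DIGITAL_TELE = "digital zoom, telephoto side"
    NONE = "no zoom operation"
    OPTICAL_WIDE = "optical zoom, wide-angle side"


def decide_zoom_action(current_mag, target_mag, snr, threshold):
    """Choose one zoom action for the current sample or frame."""
    if current_mag < target_mag:           # S107
        if snr < threshold:                # S109: no target sound present
            return ZoomAction.OPTICAL_TELE # S111
        return ZoomAction.DIGITAL_TELE     # S113: defer the optical zoom
    if current_mag == target_mag:          # S115
        return ZoomAction.NONE             # S117
    # S119: as described, the wide-angle branch does not consult the SNR.
    return ZoomAction.OPTICAL_WIDE
```

Calling this function repeatedly while recording reproduces the described behaviour: digital zoom is selected for as long as the SNR stays at or above the threshold, and optical zoom takes over once the SNR falls below it.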

FIG. 3 shows the processing of the first embodiment in comparison with conventional processing. In the figure, the audio signal corresponding to the audio data to be recorded is denoted Vr. Conventionally, as shown in state ST-0, if the optical zoom operation is performed while the target sound is being input, the zoom noise Nz is picked up and recorded superimposed on the target sound. If an attempt is then made to remove the superimposed noise by signal processing, the target sound is distorted and the recording quality is degraded.

In the present embodiment, by contrast, as shown in state ST-1, the operation time point of the optical zoom is adjusted so that the optical zoom operation is performed at a time when no target sound is being input (although stationary noise is still being input). In the meantime, a digital zoom operation is performed in place of the optical zoom operation.

This prevents the degradation in recording quality that would result from distortion of the target sound during noise removal. Performing the digital zoom operation also prevents degradation in the recording quality of the image data.

<Second Embodiment>
Next, a second embodiment of the present invention will be described with reference to FIG. 4. FIG. 4 is a schematic diagram showing a processing result of the still camera 10 according to the second embodiment.

FIG. 4 shows the processing of the second embodiment in comparison with that of the first embodiment. According to the first embodiment, as shown in state ST-1, the optical zoom operation is performed in a silent section or a non-speech section, which moves the zoom noise Nz to a time at which no target sound is being input. In a silent section or a non-speech section, the only sound picked up by the microphone 35 apart from the zoom noise Nz is considered to be stationary noise.

Therefore, as shown in state ST-2, in addition to adjusting the operation time point of the optical zoom, the audio signal input from the microphone 35 may be silenced for the period within the silent section or non-speech section (in which stationary noise is present) during which the zoom noise Nz is picked up. Specifically, for example, the supply of audio data from the audio signal processing unit 39 to the main bus 61 may be restricted, or the audio data may be converted into silence data. In this way, audio data that does not contain the zoom noise Nz can be recorded even in a silent section or a non-speech section.
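The second option above, converting the audio data into silence data, might look like the following Python sketch. The function name `silence_zoom_frames`, the per-frame Boolean flag list `zoom_active`, and the use of zero-valued samples as silence are assumptions for illustration.

```python
def silence_zoom_frames(frames, zoom_active):
    """Replace every frame recorded during optical zoom with zeros.

    frames      -- list of frames, each a list of samples
    zoom_active -- list of booleans, True while the zoom motor is running
    Both arguments and the zero-fill convention are hypothetical.
    """
    if len(frames) != len(zoom_active):
        raise ValueError("frames and zoom_active must have the same length")
    return [
        [0] * len(frame) if active else list(frame)
        for frame, active in zip(frames, zoom_active)
    ]
```

Frames outside the zoom period are copied unchanged, so the target sound in voiced or speech sections is not touched.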

<Third Embodiment>
Next, a third embodiment of the present invention will be described with reference to FIG. 5. FIG. 5 is a schematic diagram illustrating a processing result of the still camera 10 according to the third embodiment.

FIG. 5 shows the processing according to the third embodiment in comparison with the processing according to the second embodiment. According to the second embodiment, as shown in state ST-2, the zoom noise Nz can be suppressed by silencing the audio signal, but the stationary noise is suppressed as well. Consequently, sections in which stationary noise is recorded and sections in which it is not recorded arise, which may sound unnatural to the user when the audio data is played back.

For this reason, as shown in state ST-3, data of comfort noise Nc may be recorded over the period during which the audio signal is silenced. Here, the comfort noise Nc is generated so as to reflect the characteristics of the stationary noise in the environment in which the still camera 10 is used. As the characteristics of the stationary noise, for example, the result of estimating the noise level for each frequency band using Equation 1 is applied.

Specifically, while the still camera 10 is running, data of the comfort noise Nc may be generated periodically and updated so that the most recent data is always available. The audio data supplied from the audio signal processing unit 39 to the main bus 61 may then be converted into the data of the comfort noise Nc. As a result, the zoom noise Nz can be suppressed, and level fluctuation of the stationary noise can also be suppressed.
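
As a rough illustration of this substitution, the sketch below synthesizes comfort noise from per-band noise levels and writes it over the silenced interval. Everything here is an assumption made for the example: the function names, the representation of each band by one sinusoid with a random phase (a crude stand-in for properly shaped noise), and the band levels themselves, which in the text would come from the Equation 1 estimate.

```python
import math
import random


def make_comfort_noise(band_levels, band_centers_hz, num_samples,
                       sample_rate_hz, seed=None):
    """Synthesize comfort noise as one random-phase sinusoid per band.

    `band_levels` are hypothetical per-band amplitudes standing in for the
    estimated stationary-noise levels.
    """
    if len(band_levels) != len(band_centers_hz):
        raise ValueError("one center frequency is required per band level")
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in band_levels]
    noise = []
    for n in range(num_samples):
        t = n / sample_rate_hz
        noise.append(sum(level * math.sin(2.0 * math.pi * fc * t + ph)
                         for level, fc, ph
                         in zip(band_levels, band_centers_hz, phases)))
    return noise


def substitute_comfort_noise(samples, start, end, noise):
    """Return a copy of `samples` with [start, end) replaced by `noise`."""
    if not noise:
        raise ValueError("noise buffer must not be empty")
    out = list(samples)
    start = max(0, start)
    end = min(len(out), end)
    for i in range(start, end):
        # Loop over the noise buffer if the interval is longer than it.
        out[i] = noise[(i - start) % len(noise)]
    return out
```

Regenerating the noise buffer periodically with fresh band levels would correspond to keeping the most recent comfort noise data available.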

<Fourth Embodiment>
Next, a fourth embodiment of the present invention will be described with reference to FIG. 6. FIG. 6 is a schematic diagram illustrating a processing result of the still camera 10 according to the fourth embodiment.

FIG. 6 shows the processing according to the fourth embodiment in comparison with the processing according to the first embodiment. According to the first embodiment, as shown in state ST-1', when the target sound Vt is input during the optical zoom operation, the zoom noise Nz is picked up superimposed on the target sound Vt. Because noise is superimposed on the target sound Vt, a voiced section or a speech section cannot be properly determined from the input state of the audio signal.

For this reason, it is conceivable to record the noise level n_0(t) during the optical zoom operation in advance and to use it for determining a voiced section or a speech section. That is, the main control unit 11 calculates the SNR using Equation 3 described above when the optical zoom is not operating, and calculates the SNR using Equation 4 shown below when the optical zoom is operating. As a result, a voiced section or a speech section can be properly determined from the input state of the audio signal even during the optical zoom operation.

Note that the determination of a voiced section or a speech section during the optical zoom operation is not limited to the method using the noise level during the optical zoom operation. For example, the determination may be made by a method that uses different thresholds T depending on whether or not the optical zoom is operating, or by a method that obtains the SNR for each frequency band and takes a weighted average.
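
The two alternatives mentioned here can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the decibel scale, and the default threshold values are hypothetical and are not taken from the specification.

```python
def is_speech(snr_db, zoom_active,
              threshold_idle_db=6.0, threshold_zoom_db=12.0):
    """Threshold the SNR with a different threshold T while zooming.

    Both default thresholds are placeholder values chosen for illustration.
    """
    threshold = threshold_zoom_db if zoom_active else threshold_idle_db
    return snr_db >= threshold


def weighted_band_snr(band_snrs_db, weights):
    """Weighted average of per-band SNR values."""
    if len(band_snrs_db) != len(weights):
        raise ValueError("one weight is required per band")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(band_snrs_db, weights)) / total
```

With a higher threshold while the zoom motor runs, a frame whose SNR is inflated by zoom noise is less likely to be mistaken for speech; giving low weights to the bands where zoom noise dominates would serve the same purpose in the weighted-average variant.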

Then, in the processing of step S111 shown in FIG. 2, a voiced section or a speech section is determined during the optical zoom operation, and when a voiced section or a speech section is detected, the optical zoom operation may be temporarily stopped, as shown in state ST-4. Furthermore, the operation time point of the optical zoom may be adjusted so that the optical zoom operation is performed after the voiced section or the speech section ends. As a result, even if the target sound is input during the optical zoom operation, degradation of the recording quality of the audio data can be prevented.
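
The pause-and-resume behavior of state ST-4 can be illustrated with a small frame-by-frame simulation. This is a sketch only: the function name `schedule_zoom`, the frame-based interface, and the assumption that per-frame speech decisions are already available are all hypothetical.

```python
def schedule_zoom(speech_flags, request_frame, zoom_frames_needed):
    """Return, per frame, whether the optical zoom motor runs.

    `speech_flags[i]` is True when frame i was judged to be a voiced or
    speech section. The zoom requested at `request_frame` needs
    `zoom_frames_needed` frames of motor operation in total.
    """
    running = []
    remaining = 0
    for i, speech in enumerate(speech_flags):
        if i == request_frame:
            remaining = zoom_frames_needed
        # Run only while work remains and no target sound is present;
        # otherwise the zoom is paused and resumes in a later frame.
        run = remaining > 0 and not speech
        if run:
            remaining -= 1
        running.append(run)
    return running
```

For example, with speech in frames 2 and 3 and a three-frame zoom requested at frame 0, the motor runs in frames 0, 1, and 4.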

<Fifth Embodiment>
Next, a fifth embodiment of the present invention will be described.

In the processing according to the preceding embodiments, a voiced section or a speech section was determined using the SNR of the audio signal, and whether the optical zoom operation is permitted was determined accordingly. However, the voiced section or the speech section may be determined by voice detection instead of using the SNR.

Specifically, the main control unit 11 calculates the spectral flux using Equation 5 shown below. Note that the voice detection is not limited to the method using the spectral flux, and may be performed by a method using autocorrelation, a method using LPC, or the like.

Here, X(t, f) represents the frequency of occurrence of the spectral component at frequency f in the frame at time t. As shown in Equation 5, the spectral flux F(t) is defined as the square root of the sum of squares of the differences in the frequency of occurrence of the spectral components between the frame at time t and the frame at time t-1.
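
The equation itself is not reproduced in this text. Written out from the verbal definition above, with the sum taken over the frequencies f, Equation 5 would presumably take the following form (a reconstruction, not a copy of the original figure):

```latex
F(t) = \sqrt{\sum_{f} \bigl( X(t, f) - X(t-1, f) \bigr)^{2}}
```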

It is known that the spectral flux F(t) is large for the target sound, because the spectral components vary greatly over time, and small for stationary noise, because the spectral components vary little over time. Therefore, by calculating the spectral flux F(t) in the processing of step S101 shown in FIG. 2 and applying threshold processing in the processing of step S109, a voiced section or a speech section can be determined by voice detection instead of using the SNR.
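
The calculation and threshold processing described above can be sketched as follows. This is an illustrative sketch, assuming each frame is given as a list of per-frequency spectral values X(t, f); the function names and the threshold value passed in are hypothetical.

```python
import math


def spectral_flux(prev_spectrum, curr_spectrum):
    """Square root of the summed squared per-frequency differences
    between two consecutive frames."""
    if len(prev_spectrum) != len(curr_spectrum):
        raise ValueError("frames must have the same number of bins")
    return math.sqrt(sum((c - p) ** 2
                         for p, c in zip(prev_spectrum, curr_spectrum)))


def detect_speech_frames(spectra, threshold):
    """Flag frames whose spectral flux is at or above `threshold`."""
    if not spectra:
        return []
    # The first frame has no predecessor, so it is treated as non-speech.
    flags = [False]
    for prev, curr in zip(spectra, spectra[1:]):
        flags.append(spectral_flux(prev, curr) >= threshold)
    return flags
```

A frame sequence with a steady spectrum yields a flux near zero and is classified as stationary noise, while a sudden spectral change pushes the flux over the threshold.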

[3. Summary]
As described above, according to the still camera 10 of the embodiments of the present invention, when an operation signal for a zoom instruction is input, the operation time point of the optical zoom is adjusted according to the input state of the audio signal. By adjusting the operation time point of the optical zoom to a time at which the target sound is not being input, degradation of recording quality and/or output quality caused by distortion of the target sound accompanying noise removal can be prevented.

The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to these examples. It is clear that a person having ordinary knowledge in the technical field to which the present invention belongs can conceive of various changes or modifications within the scope of the technical idea described in the claims, and these are naturally understood to belong to the technical scope of the present invention.

For example, the above embodiments have been described for the case where the audio data is supplied to the memory card. However, the audio data may be supplied to an external recording device such as a hard disk instead of the memory card. Further, the audio data may be supplied to an output device such as an external speaker instead of, or in addition to, being recorded by a recording device.

DESCRIPTION OF SYMBOLS
10 Still camera
11 Main control unit
13 Operation unit
15 Imaging unit
17 Lens control unit
19 Shutter control unit
21 Iris control unit
23 Timing generation unit
25 Imaging signal conversion unit
27 Imaging signal processing unit
29 Image processing unit
31 Display control unit
33 Display unit
35 Microphone
37 Audio signal conversion unit
39 Audio signal processing unit
41 Audio processing unit
43 Speaker
45 Memory control unit
47 Memory
49 Card control unit
51 Memory card

Claims

An imaging apparatus comprising:
an imaging unit that performs an optical zoom operation in response to a zoom instruction from a user;
an audio signal supply unit that supplies an audio signal input from a microphone to a recording device and/or an output device;
an audio signal determination unit that determines an input state of the audio signal;
an operation signal determination unit that determines whether an operation signal for the zoom instruction has been input; and
a zoom operation adjustment unit that, when the operation signal for the zoom instruction is input, adjusts an operation time point of the optical zoom according to the input state of the audio signal.

The imaging apparatus according to claim 1, wherein the audio signal determination unit determines a voiced section or a speech section from the input state of the audio signal, and
the zoom operation adjustment unit adjusts the operation time point of the optical zoom so that, when the operation signal for the zoom instruction is input during the voiced section or the speech section, the imaging unit performs the optical zoom operation after the end of the voiced section or the speech section.

The imaging apparatus according to claim 2, further comprising a zoom processing unit that performs digital zoom processing on an imaging signal input from the imaging unit,
wherein, when the operation signal for the zoom instruction is input during the voiced section or the speech section, the zoom processing unit performs the digital zoom processing during the voiced section or the speech section.

The imaging apparatus according to any one of claims 1 to 3, wherein, when the operation time point of the optical zoom is adjusted, the audio signal supply unit does not supply the audio signal input from the microphone to the recording device and/or the output device during the optical zoom operation.

The imaging apparatus according to any one of claims 1 to 3, wherein, when the operation time point of the optical zoom is adjusted, the audio signal supply unit supplies comfort noise to the recording device and/or the output device during the optical zoom operation instead of the audio signal input from the microphone.

The imaging apparatus according to any one of claims 1 to 5, wherein the audio signal determination unit determines the voiced section or the speech section in consideration of noise accompanying the optical zoom operation, and
the zoom operation adjustment unit adjusts the operation of the optical zoom so that the imaging unit temporarily stops the optical zoom operation when the voiced section or the speech section is determined during the optical zoom operation.

The imaging apparatus according to claim 6, wherein the zoom operation adjustment unit further adjusts the operation of the optical zoom so that the imaging unit performs the optical zoom operation after the end of the voiced section or the speech section.

The imaging apparatus according to claim 1, wherein the audio signal determination unit determines a section in which a signal-to-noise ratio of the audio signal is equal to or greater than a threshold value as the voiced section or the speech section.

The imaging apparatus according to claim 1, wherein the audio signal determination unit determines a section in which a voice detection value of the audio signal is equal to or greater than a threshold value as the voiced section or the speech section.

Supplying an audio signal input from a microphone to a recording device and / or an output device;
The step includes
Determining the input status of the audio signal;
Determining whether an operation signal for the zoom instruction is input;
When an operation signal for the zoom instruction is input, adjusting an operation time point of the optical zoom according to an input state of the audio signal;
An imaging method including:

Supplying an audio signal input from a microphone to a recording device and / or an output device;
The step includes
Determining the input status of the audio signal;
Determining whether an operation signal for the zoom instruction is input;
When an operation signal for the zoom instruction is input, adjusting an operation time point of the optical zoom according to an input state of the audio signal;
A program for causing a computer to execute an imaging method including: