JP2018207313A - Audio processing device and method of controlling the same

Info

Publication number: JP2018207313A
Application number: JP2017111162A
Authority: JP
Inventors: Yuki Tsujimoto; Keita Sonoda; Ryusuke Sato
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-06-05
Filing date: 2017-06-05
Publication date: 2018-12-27
Anticipated expiration: 2037-06-05
Also published as: JP6929137B2

Abstract

PROBLEM TO BE SOLVED: To obtain a noise-reduced stereo audio signal with the two-microphone configuration used for stereo sound collection, without adding any extra microphone.

SOLUTION: An audio processing device has: a first microphone of a stereo microphone pair, which is housed in the device at a predetermined position and converts sound entering mainly through a first opening provided at a predetermined position into an electric signal; and a second microphone, which is housed in the device at a position where it functions as the other microphone of the pair and converts sound entering through a second opening, having a smaller area than the first opening, into an electric signal. So that driving noise from a predetermined drive unit of the audio processing device propagates to the second microphone, the volume of the space between the second microphone and the second opening is larger than the volume of the space between the first microphone and the first opening. The audio processing device generates noise-reduced audio signals from the audio signals of the first and second microphones, converts them back into time-series audio signals by inverse transformation, and outputs them.

SELECTED DRAWING: Figure 6

Description

The present invention relates to an audio processing technique in an apparatus having a drive mechanism.

An imaging apparatus, typified by a digital camera or a video camera, can capture a moving subject, record the resulting moving image data, and also record the sound around the subject. Hereinafter, the sound around the subject that is the target of recording is referred to as "ambient environmental sound".

Further, the imaging apparatus can focus on or zoom in on the moving subject during imaging by moving the optical lens. The optical lens is moved mechanically, and a driving sound is generated when the lens moves. If this driving sound is superimposed on the ambient environmental sound, the quality of the moving image with sound is impaired. For this reason, a reduction of such driving noise has long been desired.

Patent Document 1: JP 2011-114465 A

Patent Document 1 is known as a document disclosing a method for reducing driving noise generated from a drive unit or the like of an apparatus. In Patent Document 1, a noise detection microphone is mounted in an imaging apparatus in addition to the usual microphone for acquiring audio signals. According to Patent Document 1, the imaging apparatus includes a first microphone for recording sound outside the apparatus and a second microphone for recording noise generated inside the apparatus. The first microphone faces the outside of the apparatus and the second microphone faces the inside, so that noise generated inside the apparatus is detected. However, the technique disclosed in Patent Document 1 amounts to adding a microphone for detecting noise in addition to the microphones required for the audio channels to be recorded. Because a microphone other than those intended for recording is mounted, problems of cost and mounting area arise. In addition, Patent Document 1 is configured for monaural audio recording; to perform stereo audio recording, for example, a total of three microphones would be required: two for stereo and one for noise.

The present invention has been made in view of this problem, and aims to provide a technique that enables stereo sound collection and noise detection in a configuration including two microphones for stereo audio recording, without adding a new microphone.

In order to solve this problem, an audio processing apparatus according to the present invention has, for example, the following configuration. That is, the apparatus comprises:
a housing;
a drive unit;
a first microphone housed inside the housing such that sound propagates to it through a first opening provided at a first predetermined position of the housing;
a second microphone to which sound propagates through a second opening that is provided at a second predetermined position of the housing related to the first predetermined position and has a smaller area than the first opening, the second microphone being housed inside the housing such that the volume of a second space between the second microphone and the second opening is larger than the volume of a first space between the first microphone and the first opening;
conversion means for converting time-series audio data obtained from the first microphone into first frequency spectrum data, and time-series audio data obtained from the second microphone into second frequency spectrum data;
calculation means for calculating, for each frequency, the amount of noise caused by the drive unit from the first frequency spectrum data and the second frequency spectrum data obtained by the conversion means;
generation means for generating left-channel frequency spectrum data and right-channel frequency spectrum data in which the noise is suppressed, based on the first frequency spectrum data, the second frequency spectrum data, and the amount of noise calculated by the calculation means; and
inverse conversion means for inversely converting the left-channel and right-channel frequency spectrum data generated by the generation means into time-series audio data of the left and right channels, respectively.
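The signal path in the configuration above (conversion, noise calculation, generation, inverse conversion) can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: this passage does not specify how the noise amount is calculated or how the left and right channels are derived, so the per-frequency rule in `estimate_noise`, the subtraction in `suppress`, and the mapping of the first microphone to the left channel are hypothetical placeholders, as are all function names.

```python
import cmath
import math


def dft(frame):
    """Conversion step: naive DFT of one time-series frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse conversion step: back to a real time-series frame."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


def estimate_noise(first_spec, second_spec):
    """Calculation step (ASSUMPTION, not from the patent text):
    treat the excess magnitude of the second microphone over the first,
    per frequency bin, as the amount of drive noise."""
    return [max(0.0, abs(s) - abs(f)) for f, s in zip(first_spec, second_spec)]


def suppress(spec, noise):
    """Generation step (ASSUMPTION): subtract the noise amount from the
    magnitude of each bin while keeping the original phase."""
    out = []
    for x, n in zip(spec, noise):
        mag = abs(x)
        new_mag = max(0.0, mag - n)
        out.append(x * (new_mag / mag) if mag > 0 else 0j)
    return out


def process_frame(first_frame, second_frame):
    """Run one frame from each microphone through the four steps.
    Mapping first microphone -> left, second -> right is hypothetical."""
    first_spec = dft(first_frame)
    second_spec = dft(second_frame)
    noise = estimate_noise(first_spec, second_spec)
    left = idft(suppress(first_spec, noise))
    right = idft(suppress(second_spec, noise))
    return left, right
```

When both microphones receive the same frame, the estimated noise is zero in every bin and the sketch returns the input unchanged on both channels, which is a convenient sanity check for the transform pair.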

According to the present invention, a noise-reduced stereo audio signal can be obtained with the two-microphone configuration for stereo sound collection as it is, without adding a new microphone.

Block configuration diagram of the imaging apparatus according to the embodiment.
Detailed block configuration diagram of the imaging unit and the audio input unit of the imaging apparatus according to the embodiment.
Mechanical configuration diagram of the audio input unit of the imaging apparatus according to the embodiment.
Flowchart showing the REC sequence of the imaging apparatus according to the embodiment.
Timing chart of the L/Rch generation unit of the imaging apparatus according to the embodiment.
Block diagram showing the detailed configuration of the audio input unit of the imaging apparatus according to the embodiment.
Diagram showing the system through which ambient environmental sound propagates to the imaging apparatus.
Diagram showing the phase relationship between the frequency spectrum from the main microphone a and the frequency spectrum from the sub microphone b of the imaging apparatus according to the embodiment.
Diagram showing the relationship between the stereo-effect emphasis coefficient and frequency according to the embodiment.
Diagram showing the amplitude spectrum at each frequency for each of the main microphone a and the sub microphone b of the imaging apparatus according to the embodiment.
Diagram showing the time-series amplitude spectrum at the N-th frequency point of the sub microphone b of the imaging apparatus according to the embodiment.
Diagram showing the time-series phase of each of the main microphone a and the sub microphone b of the imaging apparatus according to the embodiment.
Operation timing chart of the Mch-Sch calculation unit of the imaging apparatus according to the embodiment.
Operation timing chart of the sensitivity difference correction unit of the imaging apparatus according to the embodiment.
Mechanical configuration diagram of the audio input unit of the imaging apparatus according to the embodiment.
Diagram showing the frequency spectrum from the main microphone a and the frequency spectrum from the sub microphone b of the imaging apparatus according to the embodiment.
Diagram showing the relationship between wind noise gain and frequency for each wind noise level according to the embodiment.
Diagram showing the relationship between frequency and the ratio at which the frequency spectrum from the main microphone a and the frequency spectrum from the sub microphone b are combined in the imaging apparatus according to the embodiment.
Timing chart in which the stereo suppression unit of the imaging apparatus according to the embodiment changes the emphasis coefficient used for emphasizing the stereo effect depending on whether driving noise or wind noise is detected.
Diagram showing the relationship among the combination ratio, frequency, and the emphasis coefficient used for emphasizing the stereo effect when wind noise is detected, according to the embodiment.
Diagram showing the time constants of the driving noise removal gain, the wind noise subtraction amount, the stereo gain for Lch generation, and the stereo gain for Rch generation according to the embodiment.

Embodiments according to the present invention will be described below in detail with reference to the drawings. In the present embodiment, an audio processing apparatus housed in an imaging apparatus is described.

FIG. 1 is a block diagram showing the configuration of an imaging apparatus 100 according to the embodiment. The imaging apparatus 100 includes an imaging unit 101, an audio input unit 102, a memory 103, a display control unit 104, and a display unit 105. The imaging apparatus 100 further includes an encoding processing unit 106, a recording/reproducing unit 107, a recording medium 108, a control unit 109, an operation unit 110, an audio output unit 111, a speaker 112, an external output unit 113, and a bus 114 connecting these components.

The imaging unit 101 converts an optical image of a subject captured by a photographing optical lens into an image signal with an image sensor, performs analog-to-digital conversion, image adjustment processing, and the like, and generates image data. The photographing optical lens may be a built-in optical lens or a detachable optical lens. The image sensor may be any photoelectric conversion element, such as a CCD or CMOS sensor.

The audio input unit 102 collects sound from around the audio processing apparatus (in the embodiment, from around the imaging apparatus) with a microphone that is built in or connected via an audio terminal, and generates an electrical signal. The audio input unit 102 also performs analog-to-digital conversion, audio processing, and the like to generate audio data. The microphone may be directional or omnidirectional; in this embodiment, omnidirectional microphones are used.

The memory 103 is used to temporarily store image data obtained by the imaging unit 101 and audio data obtained by the audio input unit 102.

The display control unit 104 displays images based on the image data obtained by the imaging unit 101, as well as operation screens, menu screens, and the like of the imaging apparatus 100, on the display unit 105 or on an external display via a video terminal (not shown). The display unit 105 may be of any type; it is, for example, a liquid crystal display.

The encoding processing unit 106 reads the image data and audio data temporarily stored in the memory 103, performs predetermined encoding, and generates compressed image data, compressed audio data, and the like. The audio data may also be left uncompressed. The compressed image data may be compressed by any compression method, for example MPEG2 or H.264/MPEG4-AVC. Likewise, the compressed audio data may be compressed by a compression method such as AC3, AAC, ATRAC, or ADPCM. The encoding processing unit 106 also decodes the encoded data (compressed image data and compressed audio data).

The recording/reproducing unit 107 records the compressed image data, the compressed or uncompressed audio data, and various other data generated by the encoding processing unit 106 on the recording medium 108, and reads them from the recording medium 108. The recording medium 108 is a non-volatile recording medium that records image data, audio data, and the like. It may be of any type, for example a magnetic disk, an optical disc, or a semiconductor memory. The recording medium 108 may be fixed to the apparatus 100 or detachable from it.

The control unit 109 controls each block of the imaging apparatus 100 by transmitting control signals to each block via the bus 114, and is composed of a CPU, memory, and the like for executing various kinds of control. The memory used by the control unit 109 includes a ROM that stores various control programs, a RAM used as a work area for arithmetic processing, and the like, and also includes memory external to the control unit 109.

The operation unit 110 consists of buttons, dials, a touch panel, or a combination of these, and transmits an instruction signal to the control unit 109 in response to a user operation. Specifically, the operation unit 110 includes a shooting button for instructing the start and end of moving image recording, a zoom lever for instructing an optical or electronic zoom operation on the image, a cross key for making various adjustments, an enter key, and the like.

The audio output unit 111 outputs audio data or compressed audio data reproduced by the recording/reproducing unit 107, or audio data output by the control unit 109, to the speaker 112, an audio terminal, or the like. The external output unit 113 outputs compressed video data, compressed audio data, audio data, and the like reproduced by the recording/reproducing unit 107 to an external device. The bus 114 supplies various data, such as audio data and image data, and various control signals to each block of the imaging apparatus 100.

The above is the description of the configuration of the imaging apparatus 100 according to the embodiment. Next, the normal operation of the imaging apparatus according to the embodiment is described.

In the imaging apparatus 100 of the present embodiment, when the user operates the operation unit 110 to issue a power-on instruction, power from a power supply unit (not shown) is supplied to each block of the imaging apparatus.

When power is supplied, the control unit 109 checks, based on an instruction signal from the operation unit 110, which mode (for example, shooting mode or playback mode) is designated by the mode selection switch of the operation unit 110. In the moving image recording mode within the shooting mode, the image data obtained by the imaging unit 101 and the audio data obtained by the audio input unit 102 are saved as a single image file. In the playback mode, an image file recorded on the recording medium 108 is reproduced by the recording/reproducing unit 107, its images are displayed on the display unit 105, and its audio is output from the speaker 112.

In the shooting mode, the control unit 109 first transmits control signals to each block of the imaging apparatus 100 to shift the apparatus to the shooting standby state, and causes the blocks to operate as follows.

The imaging unit 101 converts an optical image of a subject captured by the photographing optical lens into a moving image signal with the image sensor, performs analog-to-digital conversion, image adjustment processing, and the like, and generates moving image data. The imaging unit 101 then transmits the obtained moving image data to the display control unit 104, which displays it on the display unit 105. The imaging unit 101 outputs a moving image signal in which one frame is 1920 horizontal pixels by 1080 vertical pixels at a frame rate of 30 frames per second. The user prepares for shooting while viewing the screen displayed in this way.

The audio input unit 102 converts the analog audio signals obtained by a plurality of microphones into digital signals, processes the resulting digital audio signals, and generates multi-channel audio data. The obtained audio data is transmitted to the audio output unit 111 and output as sound from the connected speaker 112 or from earphones (not shown). While listening to the sound output in this way, the user can also adjust the manual volume that determines the recording level.

Next, when the user operates the record button of the operation unit 110 and a shooting start instruction signal is transmitted to the control unit 109, the control unit 109 transmits a shooting start instruction signal to each block of the imaging apparatus 100 and shifts to the moving image recording mode within the shooting mode. The specific processing is as follows.

The imaging unit 101 converts an optical image of a subject captured by the photographing optical lens into a moving image signal with the image sensor, performs analog-to-digital conversion, image adjustment processing, and the like, and generates moving image data. The obtained moving image data is transmitted to the display control unit 104 and displayed on the display unit 105. The imaging unit 101 also transmits the obtained image data to the memory 103.

The audio input unit 102 converts the analog audio signals obtained by a plurality of microphones into digital signals, processes the resulting digital audio signals, and generates multi-channel audio data. The obtained audio data is transmitted to the memory 103. When there is only one microphone, the obtained analog audio signal is converted to digital form to generate audio data, and the audio data is transmitted to the memory 103.

The encoding processing unit 106 reads the moving image data and audio data temporarily stored in the memory 103, performs predetermined encoding, generates compressed moving image data, compressed audio data, and the like, and stores them in the memory 103 again.

The control unit 109 combines the compressed moving image data and the compressed audio data stored in the memory 103 to form a data stream, and outputs it to the recording/reproducing unit 107. When the audio data is not compressed, the control unit 109 combines the audio data stored in the memory 103 with the compressed moving image data to form a data stream, and outputs it to the recording/reproducing unit 107.

The recording/reproducing unit 107 writes the data stream to the recording medium 108 as a single moving image file under the management of a file system such as UDF or FAT.

The imaging apparatus 100 continues the above processing while in the moving image recording state. When the user operates the record button of the operation unit 110 and a shooting end instruction signal is transmitted to the control unit 109, the control unit 109 transmits a shooting end instruction signal to each block of the imaging apparatus 100 and causes the blocks to operate as follows.

The imaging unit 101 and the audio input unit 102 stop generating moving image data and audio data, respectively. The encoding processing unit 106 reads the remaining image data and audio data stored in the memory, performs predetermined encoding, and stops operating once it has finished generating the compressed moving image data, compressed audio data, and the like. When the audio data is not compressed, it stops operating once generation of the compressed moving image data is finished.

The control unit 109 then combines this final compressed moving image data with the compressed or uncompressed audio data to form a data stream, and outputs it to the recording/reproducing unit 107.

The recording/reproducing unit 107 writes the data stream to the recording medium 108 as a single moving image file under the management of a file system such as UDF or FAT. When the supply of the data stream stops, it completes the moving image file and stops the recording operation.

When the recording operation stops, the control unit 109 transmits control signals to each block of the imaging apparatus 100 to shift to the shooting standby state, and the apparatus returns to the shooting standby state.
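The recording sequence described above (standby, record button pressed, continuous multiplexing of encoded data, record button pressed again, file finalized, return to standby) can be summarized as a small state machine. The following Python sketch is only an illustration of that control flow; the class and method names are hypothetical, and the encoded chunks are stand-in values rather than real compressed data.

```python
from enum import Enum, auto


class State(Enum):
    STANDBY = auto()    # shooting standby state
    RECORDING = auto()  # moving image recording state


class Recorder:
    """Hypothetical model of the record button toggling between states."""

    def __init__(self):
        self.state = State.STANDBY
        self.stream = []  # data stream of the file currently being written
        self.files = []   # completed moving image files

    def on_encoded_chunk(self, video, audio):
        # Encoded video and audio are combined into the data stream only
        # while recording; in standby they are just monitored, not stored.
        if self.state is State.RECORDING:
            self.stream.append((video, audio))

    def press_record_button(self, remaining=None):
        if self.state is State.STANDBY:
            self.stream = []
            self.state = State.RECORDING
        else:
            # On stop, any remaining encoded data is flushed first, then
            # the file is completed and the apparatus returns to standby.
            if remaining is not None:
                self.stream.append(remaining)
            self.files.append(list(self.stream))
            self.stream = []
            self.state = State.STANDBY
```

A typical use would call `press_record_button()` once to start, feed chunks with `on_encoded_chunk`, and call `press_record_button()` again to finalize the file.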

Next, the playback mode is described. When the user operates the operation unit 110 to select the playback mode, the control unit 109 transmits control signals to each block of the imaging apparatus 100 to shift to the playback state, and causes the blocks to operate as follows.

The recording/reproducing unit 107 reads a moving image file, composed of compressed moving image data and compressed audio data, recorded on the recording medium 108, and sends the read compressed moving image data and compressed audio data to the encoding processing unit 106.

The encoding processing unit 106 decodes the compressed moving image data and the compressed audio data, and transmits them to the display control unit 104 and the audio output unit 111, respectively. The display control unit 104 displays the decoded moving image data on the display unit 105. The audio output unit 111 outputs the decoded audio data to the built-in speaker 112 or an attached external speaker, which reproduces it as sound.

As described above, the imaging apparatus 100 of the present embodiment can record and reproduce moving images and audio.

In the present embodiment, when obtaining an audio signal, the audio input unit 102 performs processing such as level adjustment of the audio signal obtained by the microphones. This processing may be performed at all times after the apparatus starts up, or after the shooting mode is selected. Alternatively, it may be performed after a mode related to audio recording is selected. In a mode related to audio recording, the processing may also be performed in response to the start of audio recording. In the present embodiment, the processing is described as being performed at the timing when moving image shooting starts.

FIG. 2 is a block configuration diagram of the imaging unit 101 and the audio input unit 102 of the imaging apparatus 100 of the present embodiment.

The imaging unit 101 includes an optical lens 201 that captures an optical image of a subject, and an image sensor 202 that converts the optical image captured by the optical lens 201 into an electrical signal (image signal). The imaging unit 101 further includes an image processing unit 203 that converts the analog image signal obtained by the image sensor 202 into a digital image signal, performs image quality adjustment processing to form image data, and transmits the image data to the memory. The imaging unit 101 also includes an optical lens control unit 204 having a known drive mechanism, such as a position sensor and a motor, for moving the optical lens 201. In the present embodiment, the optical lens 201 and the optical lens control unit 204 are described as being built into the imaging unit 101, but the optical lens 201 may be an interchangeable lens that is attachable to and detachable from the imaging apparatus 100 via a lens mount. The optical lens control unit 204 may likewise be provided in the interchangeable lens.

When the user operates the operation unit 110 to input an instruction such as a zoom operation or focus adjustment, the control unit 109 transmits a control signal (drive signal) for moving the optical lens 201 to the optical lens control unit 204. In response to this control signal, the optical lens control unit 204 checks the position of the optical lens 201 with a position sensor (not shown) and moves the optical lens 201 with a motor or the like (not shown). When the control unit 109 checks the image obtained by the image processing unit 203 and the distance to the subject and adjusts the lens automatically, it likewise transmits a control signal for driving the optical lens. When the apparatus has a so-called image stabilization function for preventing image blur, the control unit 109 transmits a control signal for moving the optical lens 201 to the optical lens control unit 204 based on vibration detected by a vibration sensor (not shown).

このときに、光学レンズ２０１の移動による駆動騒音や光学レンズ２０１を移動させるためのモータの駆動騒音が発生することになる。制御部１０９からの光学レンズ２０１を駆動させる制御信号に応じて、光学レンズ制御部２０４が光学レンズ２０１を駆動させる。従って、制御部１０９は、駆動騒音が発生するタイミングを知る（検出するまたは、決定する）ことができる。 At this time, driving noise due to movement of the optical lens 201 and driving noise of a motor for moving the optical lens 201 are generated. In response to a control signal for driving the optical lens 201 from the control unit 109, the optical lens control unit 204 drives the optical lens 201. Therefore, the control unit 109 can know (detect or determine) the timing at which driving noise occurs.

In the present embodiment, controlling the optical lens 201 allows optical zooming of, for example, 1x at minimum to 50x at maximum. This is referred to as optical zoom in the present embodiment. The optical zoom magnification may of course be higher or lower than these values. In optical zoom, the optical lens control unit 204 moves the optical lens 201 in accordance with an instruction from the control unit 109, thereby zooming the optical image of the subject. The image processing unit 203 also has an electronic zoom function that outputs an image signal in which part of the image signal obtained by the image sensor 202 is zoomed in. It further has an electronic zoom function that widens the range of the image obtained by the image sensor 202 and outputs an image signal whose image size has been zoomed out by the image processing unit 203.

The above is the configuration and operation of the imaging unit 101 in the embodiment. Next, the configuration and operation of the audio input unit 102 will be described.

The imaging apparatus 100 of the embodiment has two microphones denoted by reference numerals 205a and 205b. The microphones 205a and 205b convert vibrations propagating through the air (the medium) into electrical signals and output audio signals. The microphone 205a is the main (MAIN) microphone and the microphone 205b is the sub (SUB) microphone; they are referred to by these names hereinafter.

As will become clear from the description below, the main microphone 205a functions as the microphone corresponding to one channel of stereo audio and is a microphone for mainly acquiring sound from outside the audio processing apparatus (in the embodiment, outside the imaging apparatus 100). The sub microphone 205b is arranged at a position where it functions as the microphone corresponding to the other channel of stereo audio. Compared with the main microphone 205a, the sub microphone 205b is a microphone for mainly acquiring driving noise from the drive unit inside the audio processing apparatus (the imaging apparatus 100).

The main microphone 205a outputs an analog audio signal as Mch (main channel), and the sub microphone 205b outputs an analog audio signal as Sch (sub channel). In the present embodiment, the first audio input unit is the main microphone 205a and the first audio signal is Mch; the second audio input unit is the sub microphone 205b and the second audio signal is Sch. Because the present embodiment uses a two-channel stereo system, the main microphone 205a and the sub microphone 205b are placed at positions separated by a predetermined distance in the horizontal direction when the imaging unit 101 is held upright. Although the embodiment uses two microphones, a configuration with more microphones is also possible.

The analog audio signals obtained by the main microphone 205a and the sub microphone 205b are supplied to the A/D conversion unit 206, where each audio signal is converted into digital audio data. The A/D conversion unit 206 in the present embodiment samples at a sampling rate of 48 kHz and generates 16-bit digital data per sample.

The time-series digital audio data for a preset audio signal period (frame), obtained by the A/D conversion unit 206, is supplied to the FFT unit 207, where it undergoes a fast Fourier transform and is converted into frequency spectrum data for each frequency. In the present embodiment, the data is converted into 1024-point frequency spectrum data covering 0 Hz to 48 kHz, of which 512 points cover the range up to the Nyquist frequency of 24 kHz. The frequency spectrum data from the main microphone 205a is denoted Main[0] to [511], and that from the sub microphone 205b is denoted Sub[0] to [511]. In the present embodiment, the first audio spectrum data is Main[0] to [511] and the second audio spectrum data is Sub[0] to [511]. Index 0 of each spectrum represents the lowest frequency and index 511 the highest.
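The framing and transform step can be sketched in Python as follows. This is a minimal illustration and not the implementation of the FFT unit 207: it assumes one frame is exactly 1024 samples, applies no window, and uses a simple recursive radix-2 FFT. The names `fft`, `frame_spectrum`, and `bin_frequency_hz` are hypothetical; only the 48 kHz rate, the 1024-point transform, and the 512 retained points come from the text.

```python
import cmath
import math

SAMPLE_RATE_HZ = 48_000   # sampling rate stated in the text
FFT_SIZE = 1024           # transform length stated in the text
NUM_BINS = FFT_SIZE // 2  # 512 points up to the 24 kHz Nyquist frequency


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def frame_spectrum(frame):
    """Return the lower 512 complex bins of one 1024-sample frame."""
    if len(frame) != FFT_SIZE:
        raise ValueError("expected a 1024-sample frame (assumed frame length)")
    return fft(frame)[:NUM_BINS]


def bin_frequency_hz(index):
    """Frequency of bin `index`; bin spacing is 48000 / 1024 Hz."""
    return index * SAMPLE_RATE_HZ / FFT_SIZE


# Hypothetical test tone at 3000 Hz, which falls exactly on one bin.
tone = [math.sin(2 * math.pi * 3000 * t / SAMPLE_RATE_HZ) for t in range(FFT_SIZE)]
main = frame_spectrum(tone)
peak_bin = max(range(NUM_BINS), key=lambda i: abs(main[i]))
```

With these parameters the bin spacing is 46.875 Hz, so the 3000 Hz tone peaks at index 64 of the 512 retained points.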

The drive sound calculation processing unit 209 determines, in accordance with the control signal from the control unit 109 for driving the drive unit, the amount of driving noise to subtract for each frequency component of the frequency spectrum data obtained by the FFT unit 207. This driving noise is generated when the optical lens 201 is driven. The drive unit in the present embodiment refers to the optical lens 201, which is driven for zoom operation and focus adjustment. The drive sound calculation processing unit 209 outputs NC_Gain[0] to [511], which represent the subtraction amount for each frequency spectrum point, and a drive noise detection signal.

As will become clear from the description below, the sensitivity difference correction unit 208 corrects the sensitivity of Sub[0] to [511] relative to Main[0] to [511] of the current frame in accordance with the drive noise detection signal of the preceding frame from the drive sound calculation processing unit 209, and outputs the corrected frequency spectrum data Main[0] to [511] and Sub[0] to [511].

風雑音演算処理部２１０は、ＦＦＴ部２０７からの周波数スペクトルデータから、風雑音を検出し、減算量を決定する。そして、風雑音演算処理部２１０は、決定した風雑音の周波数スペクトルデータＷＣ＿Ｇａｉｎ［０］〜［５１１］と、風雑音レベル信号を出力する。 The wind noise calculation processing unit 210 detects wind noise from the frequency spectrum data from the FFT unit 207 and determines a subtraction amount. Then, the wind noise calculation processing unit 210 outputs the determined wind noise frequency spectrum data WC_Gain [0] to [511] and the wind noise level signal.

ステレオゲイン演算処理部２１１は、ＦＦＴ部２０７からの周波数スペクトルデータに対し、ステレオのＬｃｈ（左チャネル）及びＲｃｈ（右チャネル）それぞれのゲインを決定する。そして、ステレオゲイン演算処理部２１１は、各チャネルの、決定した周波数スペクトルの成分毎のゲインを表すＧａｉｎ＿Ｌ［０］〜［５１１］とＧａｉｎ＿Ｒ［０］〜［５１１］を出力する。ここで、左チャンネルのゲインがＧａｉｎ＿Ｌ［０］〜［５１１］、右チャンネルのゲインがＧａｉｎ＿Ｒ［０］〜［５１１］である。 Stereo gain calculation processing section 211 determines the gains of stereo Lch (left channel) and Rch (right channel) for the frequency spectrum data from FFT section 207. Then, the stereo gain calculation processing unit 211 outputs Gain_L [0] to [511] and Gain_R [0] to [511] representing the gains of the determined frequency spectrum components of each channel. Here, the gain of the left channel is Gain_L [0] to [511], and the gain of the right channel is Gain_R [0] to [511].

The total gain calculation unit 212 sums NC_Gain[0] to [511], WC_Gain[0] to [511], Gain_L[0] to [511], and Gain_R[0] to [511] determined by the drive sound calculation processing unit 209, the wind noise calculation processing unit 210, and the stereo gain calculation processing unit 211, and outputs Total_Gain_L[0] to [511] and Total_Gain_R[0] to [511]. Specifically, the calculation is as follows. In the embodiment, the total gain calculation unit 212 functions as a total gain determination unit.
Total_Gain_R[] = NC_Gain[] + WC_Gain[] + Gain_R[]
Total_Gain_L[] = NC_Gain[] + WC_Gain[] + Gain_L[]
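The per-point summation above can be written out as a short Python sketch. The function name `total_gains` and the use of plain lists are assumptions made for illustration; the text does not state the numeric scale of the gains, so the sample values below are arbitrary.

```python
def total_gains(nc_gain, wc_gain, gain_l, gain_r):
    """Sum the three gain terms per frequency point for each channel.

    Total_Gain_L[n] = NC_Gain[n] + WC_Gain[n] + Gain_L[n]
    Total_Gain_R[n] = NC_Gain[n] + WC_Gain[n] + Gain_R[n]
    """
    if not (len(nc_gain) == len(wc_gain) == len(gain_l) == len(gain_r)):
        raise ValueError("all gain arrays must have the same length")
    total_l = [nc + wc + gl for nc, wc, gl in zip(nc_gain, wc_gain, gain_l)]
    total_r = [nc + wc + gr for nc, wc, gr in zip(nc_gain, wc_gain, gain_r)]
    return total_l, total_r


# Arbitrary three-point example (the embodiment uses 512 points).
total_l, total_r = total_gains(
    nc_gain=[-3.0, 0.0, -1.0],
    wc_gain=[-2.0, -1.0, 0.0],
    gain_l=[1.0, 1.5, 1.0],
    gain_r=[1.0, 0.5, 1.0],
)
```

The drive noise and wind noise terms are shared by both channels, so the two totals differ only through the stereo gains.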

The L/Rch generation unit 213 generates the Lch and Rch frequency spectrum data using the frequency spectrum Main[0] to [511] for each frequency together with Total_Gain_L[0] to [511] and Total_Gain_R[0] to [511] determined by the total gain calculation unit 212 (details are given later). In other words, the L/Rch generation unit 213 in the present embodiment functions as a stereo generation unit.

The iFFT unit 214 applies an inverse fast Fourier transform to the frequency spectrum data of each channel generated by the L/Rch generation unit 213, returning each channel to a time-series audio signal.

The audio processing unit 215 performs processing such as equalization. The auto level controller (hereinafter, ALC unit 216) adjusts the amplitude of the time-series audio signal to a predetermined level.

以上の構成により、音声入力部１０２は、音声信号に所定の処理を行い音声データを形成し、メモリ１０３へ送信することになる。 With the above configuration, the audio input unit 102 performs predetermined processing on the audio signal to form audio data, and transmits the audio data to the memory 103.

Next, the recording operation of the imaging apparatus 100 of the present embodiment will be described with reference to FIG. 4, which is a flowchart showing the recording sequence of the imaging apparatus 100 of the embodiment.

Ｓ４０１にて、ユーザによる操作部１１０の操作により記録（ＲＥＣ）開始が指示されることで、本処理が開始される。Ｓ４０２にて、制御部１０９は音声録音するために音声のパスを接続する。音声パスが確立した後、Ｓ４０３にて、制御部１０９は、本実施形態で説明する制御を含めた信号処理の初期設定をおこない、処理を開始する。この信号処理の内容に関しては後述する。以降、ＲＥＣシーケンスが終了するまで、本実施形態で説明する制御を含めた信号処理は実施される。 In S401, the recording (REC) start is instructed by the operation of the operation unit 110 by the user, and this processing is started. In step S402, the control unit 109 connects a voice path for voice recording. After the voice path is established, in step S403, the control unit 109 performs initial setting of signal processing including control described in the present embodiment, and starts processing. The contents of this signal processing will be described later. Thereafter, signal processing including control described in the present embodiment is performed until the REC sequence ends.

記録処理シーケンス中、制御部１０９は、ユーザによる操作部１１０への操作を監視する。そして、ユーザにより、操作部１１０の一部であるズームレバーが操作された場合、Ｓ４０４からＳ４０５に処理を進め、制御部１０９は撮像部１０１を制御し、ズーム処理を行う。このズーム処理は、Ｓ４０６にて、ユーザがズームレバーの操作を止めたと判定されるまで継続する。ズーム処理中は、先に説明したように、レンズ２０１の移動による駆動騒音が発生し、その騒音が周囲環境音に重畳して録音されてしまう点に注意されたい。 During the recording processing sequence, the control unit 109 monitors an operation on the operation unit 110 by the user. When the zoom lever that is a part of the operation unit 110 is operated by the user, the process proceeds from S404 to S405, and the control unit 109 controls the imaging unit 101 to perform zoom processing. This zoom process continues until it is determined in S406 that the user has stopped operating the zoom lever. It should be noted that during the zoom process, as described above, driving noise is generated due to the movement of the lens 201, and the noise is recorded superimposed on the ambient environmental sound.

そして、制御部１０９は、ユーザによる操作部１１０の操作や、記録媒体１０８の状況によって、記録終了が指示されたと判断した場合、Ｓ４０７からＳ４０８に処理を進める。Ｓ４０８にて、制御部１０９は音声パスを切断し、次いで、Ｓ４０９にて信号処理も終了する。 If the control unit 109 determines that the end of recording is instructed by the operation of the operation unit 110 by the user or the status of the recording medium 108, the control unit 109 advances the process from S407 to S408. In step S408, the control unit 109 disconnects the voice path, and then ends the signal processing in step S409.

Next, details of the audio input unit 102 of the imaging apparatus 100 of the present embodiment will be described with reference to FIG. 6, which is a block diagram showing the detailed configuration of the audio input unit 102 of the present embodiment.

As described above, the audio input unit 102 in the present embodiment has the main microphone 205a and the sub microphone 205b, which convert sound vibrations propagating through the air into electrical signals and output audio signals. Also as described above, the A/D conversion unit 206 samples the analog audio signals at 48 kHz and 16 bits, converting them into digital audio data.

感度差補正部２０８は、メインマイク２０５ａからの周波数スペクトルデータＭａｉｎ［０］〜［５１１］と、サブマイク２０５ｂからの周波数スペクトルデータＳｕｂ［０］〜［５１１］との感度差を補正する。このため、感度差補正部２０８は、感度補正積分器２０８１、感度補正検出部２０８２、補正量演算部２０８３、感度補正ゲインテーブル２０８４、感度差補正ゲイン部２０８５を含む。 The sensitivity difference correction unit 208 corrects the difference in sensitivity between the frequency spectrum data Main [0] to [511] from the main microphone 205a and the frequency spectrum data Sub [0] to [511] from the sub microphone 205b. For this reason, the sensitivity difference correction unit 208 includes a sensitivity correction integrator 2081, a sensitivity correction detection unit 2082, a correction amount calculation unit 2083, a sensitivity correction gain table 2084, and a sensitivity difference correction gain unit 2085.

The sensitivity correction integrator 2081 applies a time constant to level changes along the time axis for the frequency spectrum data Main[0] to [511] from the main microphone 205a and the frequency spectrum data Sub[0] to [511] from the sub microphone 205b.

The sensitivity correction detection unit 2082 obtains, for every frequency point, the level difference Main[n] - Sub[n] between Main[0] to [511] and Sub[0] to [511], the frequency spectrum data to which the sensitivity correction integrator 2081 has applied the time constant. Note that this difference can be either positive or negative.

補正量演算部２０８３は、感度補正検出部２０８２からの差分レベルが負の場合（Ｍａｉｎ［ｎ］＜Ｓｕｂ［ｎ］の場合に等価）、Ｍａｉｎ［ｎ］＝Ｓｕｂ［ｎ］となるようにするため、Ｓｕｂ［ｎ］の補正量を算出する。 When the difference level from the sensitivity correction detection unit 2082 is negative (equivalent to Main [n] <Sub [n]), the correction amount calculation unit 2083 makes Main [n] = Sub [n]. Therefore, the correction amount of Sub [n] is calculated.

なお、感度補正検出部２０８２からの差分レベルが正の場合（Ｍａｉｎ［ｎ］≧Ｓｕｂ［ｎ］の場合に等価）、Ｓｕｂ［ｎ］を補正する必要が無い。したがって、この場合、補正量演算部２０８３はＳｕｂ［ｎ］の補正量として０を出力する。 When the difference level from the sensitivity correction detection unit 2082 is positive (equivalent to Main [n] ≧ Sub [n]), it is not necessary to correct Sub [n]. Therefore, in this case, the correction amount calculation unit 2083 outputs 0 as the correction amount of Sub [n].

感度補正ゲインテーブル２０８４は、補正量演算部２０８３にて算出された各周波数スペクトルＳｕｂ［０］〜［５１１］の具体的な補正量が格納している。 The sensitivity correction gain table 2084 stores specific correction amounts of the frequency spectra Sub [0] to [511] calculated by the correction amount calculation unit 2083.

感度差補正ゲイン部２０８５は、実際に、感度補正ゲインテーブル２０８４を基に各周波数スペクトルＳｕｂ［０］〜［５１１］のレベル補正を実行する。 The sensitivity difference correction gain unit 2085 actually executes level correction of each frequency spectrum Sub [0] to [511] based on the sensitivity correction gain table 2084.
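The correction rule described for units 2082 to 2085 can be sketched as follows. This is only an illustration under stated assumptions: levels are treated as additive quantities (for example, dB values), so that adding the correction amount to Sub[n] yields Main[n]; the text does not say which scale is used. The function names are hypothetical.

```python
def sub_correction_amounts(main_level, sub_level):
    """Correction for Sub[n]: Main[n] - Sub[n] when negative, otherwise 0."""
    return [min(m - s, 0.0) for m, s in zip(main_level, sub_level)]


def apply_sub_correction(sub_level, correction):
    """Apply the stored correction amounts to Sub (additive scale assumed)."""
    return [s + c for s, c in zip(sub_level, correction)]


# Arbitrary smoothed levels for three frequency points.
main_level = [10.0, 5.0, 7.0]
sub_level = [12.0, 5.0, 3.0]

correction = sub_correction_amounts(main_level, sub_level)
corrected_sub = apply_sub_correction(sub_level, correction)
```

Only points where Sub exceeds Main are pulled down to Main; points where Sub is already at or below Main are left unchanged.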

ここで上記の時定数については、感度補正の追従を限りなく遅くする事を目的とするので数十秒単位とする。また、感度補正積分器２０８１は、後述する駆動検出部２０９５により駆動騒音の検出を表す駆動騒音検出信号を受けた場合、その動作を停止する。これは、光学レンズ２０１が駆動している不安定な期間における積分を排除する事を意図する。 Here, the above time constant is set to a unit of several tens of seconds because the purpose is to delay the follow-up of the sensitivity correction as much as possible. Further, when the sensitivity correction integrator 2081 receives a drive noise detection signal indicating detection of drive noise from the drive detection unit 2095 described later, the sensitivity correction integrator 2081 stops its operation. This is intended to eliminate integration during an unstable period when the optical lens 201 is driven.

以上が実施形態における感度差補正部２０８を構成する各処理部の説明である。次に、駆動音演算処理部２０９について説明する。 The above is the description of each processing unit constituting the sensitivity difference correction unit 208 in the embodiment. Next, the drive sound calculation processing unit 209 will be described.

The drive sound calculation processing unit 209 determines the driving noise subtraction amount NC_Gain[0] to [511] from Main[0] to [511] and Sub[0] to [511], the frequency spectrum data from the main microphone 205a and the sub microphone 205b, and outputs a drive noise detection signal indicating that driving noise has been detected. For this purpose, the drive sound calculation processing unit 209 has an Mch-Sch calculation unit 2091, a drive noise removal gain calculation unit 2092, a temporal amplitude fluctuation detection unit 2093, a temporal phase fluctuation detection unit 2094, a drive detection unit 2095, an inter-frame amplitude difference detection unit 2096, and a drive sound subtraction amount integrator 2097.

The Mch-Sch calculation unit 2091 outputs, as the driving noise subtraction amount, the value obtained by subtracting the frequency spectrum data Sub[0] to [511] from the sub microphone 205b from the frequency spectrum data Main[0] to [511] from the main microphone 205a.

ただし、周波数スペクトルｎポイント目において、Ｍａｉｎ［ｎ］＞Ｓｕｂ［ｎ］の場合には、減算量［ｎ］は０とする。つまり、Ｍｃｈ−Ｓｃｈ演算部２０９１は、周波数スペクトルｎポイント目において、Ｍａｉｎ［ｎ］−Ｓｕｂ［ｎ］＜０であることを条件に負の値を減算量［ｎ］として出力する。 However, if Main [n]> Sub [n] at the nth point of the frequency spectrum, the subtraction amount [n] is 0. That is, the Mch-Sch operation unit 2091 outputs a negative value as the subtraction amount [n] on the condition that Main [n] −Sub [n] <0 at the nth point of the frequency spectrum.

When Sub[n] is sufficiently larger than Main[n] and Main[n] - Sub[n] falls below a preset threshold (a negative value), the Mch-Sch calculation unit 2091 outputs a detection signal [n] indicating that driving noise has been detected; otherwise it outputs no detection signal. In practice, detection may be represented as "1" and non-detection as "0".

また、駆動騒音検出の判定は、減算関係を逆にして、Ｓｕｂ［ｎ］−Ｍａｉｎ［ｎ］と閾値（正の値を持つ）との比較で行っても良い。この場合、Ｍｃｈ−Ｓｃｈ演算部２０９１は、この演算の結果が閾値を上回った場合に駆動騒音検出を示す信号を出力することになる。 The determination of driving noise detection may be performed by comparing Sub [n] −Main [n] and a threshold value (having a positive value) with the subtraction relationship reversed. In this case, the Mch-Sch calculation unit 2091 outputs a signal indicating drive noise detection when the result of this calculation exceeds a threshold value.

The drive detection unit 2095 receives the detection signals [0] to [511] for one frame from the Mch-Sch calculation unit 2091 and, if one or more of them indicate detection, outputs a drive noise detection signal indicating that driving noise has been detected in that frame.

With a positive threshold defined as Th, the processing by the Mch-Sch calculation unit 2091 and the drive detection unit 2095 amounts to determining whether there exists an index i (any of 0 to 511) that satisfies the following expression, and outputting the result as the signal indicating drive noise detection.
Main[i] + Th < Sub[i]
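The subtraction amount and the frame-level detection described for units 2091 and 2095 can be sketched together in Python. The function names and sample values are hypothetical; the logic follows the text: the subtraction amount is Main[n] - Sub[n] when negative and 0 otherwise, a point is flagged when Main[i] + Th < Sub[i], and the frame is flagged when at least one point is.

```python
def subtraction_amounts(main, sub):
    """Main[n] - Sub[n] when that difference is negative, otherwise 0."""
    return [min(m - s, 0.0) for m, s in zip(main, sub)]


def bin_detection_flags(main, sub, th):
    """1 where Main[i] + Th < Sub[i] (driving noise detected), else 0."""
    return [1 if m + th < s else 0 for m, s in zip(main, sub)]


def drive_noise_detected(main, sub, th):
    """Frame-level detection: true if any frequency point is flagged."""
    return any(bin_detection_flags(main, sub, th))


# Arbitrary three-point example with a positive threshold Th = 2.0.
main = [1.0, 2.0, 3.0]
sub = [1.5, 6.0, 2.0]

amounts = subtraction_amounts(main, sub)
flags = bin_detection_flags(main, sub, th=2.0)
detected = drive_noise_detected(main, sub, th=2.0)
```

In this example only the second point exceeds Main by more than the threshold, which is enough to mark the whole frame as containing driving noise.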

The temporal amplitude fluctuation detection unit 2093 detects the amount of amplitude fluctuation between frames in the time direction for the frequency spectrum data Main[0] to [511] from the main microphone 205a and Sub[0] to [511] from the sub microphone 205b. Specifically, it obtains and outputs the difference between the n-th component of the frequency spectrum of the current frame and the n-th component of the frequency spectrum of the preceding frame. When the fluctuation at the n-th point exceeds a preset threshold, the temporal amplitude fluctuation detection unit 2093 outputs the temporal amplitude fluctuation amount [n]; when it is at or below the threshold, it outputs 0.
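A minimal Python sketch of this thresholded frame-to-frame difference is given below. It assumes that the threshold is compared against the magnitude of the difference, which the text does not state explicitly, and the function name is hypothetical.

```python
def temporal_amplitude_fluctuation(current, previous, threshold):
    """Per-point difference between frames, zeroed at or below the threshold.

    Assumption: the threshold applies to the absolute value of the difference.
    """
    result = []
    for cur, prev in zip(current, previous):
        diff = cur - prev
        result.append(diff if abs(diff) > threshold else 0.0)
    return result


# Arbitrary amplitudes for three frequency points in two consecutive frames.
previous_frame = [1.0, 1.5, 6.0]
current_frame = [5.0, 1.0, 2.0]

fluctuation = temporal_amplitude_fluctuation(current_frame, previous_frame, threshold=1.0)
```

The same function would be applied separately to the Main and Sub spectra.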

Based on phase information acquired from the phase difference determination unit 2111 described later, the temporal phase fluctuation detection unit 2094 detects the amount of phase fluctuation of the frequency spectrum data Main[0] to [511] from the main microphone 205a and Sub[0] to [511] from the sub microphone 205b. For example, when the fluctuation at the n-th frequency spectrum point exceeds a predetermined threshold, the temporal phase fluctuation detection unit 2094 outputs the temporal phase fluctuation amount [n]. When the fluctuation is at or below the threshold, it either does not output the temporal phase fluctuation amount [n] or outputs it as 0.

Based on the drive noise detection signal from the drive detection unit 2095, the inter-frame amplitude difference detection unit 2096 detects the amplitude difference between frames in the time direction for Sub[0] to [511], the frequency spectrum data from the sub microphone 205b. For example, when the drive noise detection signal is present and the amplitude difference between the preceding frame and the current frame at the n-th frequency spectrum point exceeds a predetermined threshold, the inter-frame amplitude difference detection unit 2096 outputs the inter-frame amplitude difference amount [n]. When the difference is at or below the threshold, it either does not output the inter-frame amplitude difference amount [n] or outputs it as 0.

Within the same frame, the drive noise removal gain calculation unit 2092 multiplies each of the following by a predetermined coefficient and adds the results to calculate and output the drive noise removal amounts [0] to [511]: the subtraction amounts [0] to [511] from the Mch-Sch calculation unit 2091, the temporal amplitude fluctuation amounts [0] to [511] from the temporal amplitude fluctuation detection unit 2093, the temporal phase fluctuation amounts [0] to [511] from the temporal phase fluctuation detection unit 2094, and the inter-frame amplitude difference amounts [0] to [511] from the inter-frame amplitude difference detection unit 2096.
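The weighted combination can be sketched as below. The coefficient values are not given in the text, so the ones used here are placeholders chosen only to make the arithmetic easy to follow, and the function name is hypothetical.

```python
def drive_noise_removal_amounts(subtraction, amp_fluct, phase_fluct, frame_diff, coeffs):
    """Weighted sum of the four per-point quantities.

    `coeffs` holds the four predetermined coefficients in the same order as
    the inputs; their actual values are not specified in the text.
    """
    k_sub, k_amp, k_phase, k_diff = coeffs
    return [
        k_sub * a + k_amp * b + k_phase * c + k_diff * d
        for a, b, c, d in zip(subtraction, amp_fluct, phase_fluct, frame_diff)
    ]


# Arbitrary two-point example with placeholder coefficients.
removal = drive_noise_removal_amounts(
    subtraction=[-4.0, 0.0],
    amp_fluct=[2.0, 0.0],
    phase_fluct=[1.0, 0.0],
    frame_diff=[2.0, 1.0],
    coeffs=(1.0, 0.5, 0.25, 0.5),
)
```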

The drive sound subtraction amount integrator 2097 applies a time constant to the variation in the time direction of the drive noise removal amounts [0] to [511] output from the drive noise removal gain calculation unit 2092, and outputs the drive noise removal gains NC_Gain[0] to [511] (signed).
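The text does not specify how the time constant is realized. One common way, shown here purely as an assumption, is a first-order (one-pole) smoother applied independently to each frequency point. The class name, the time constant value, and the frame period of 1024 / 48000 s (non-overlapping frames) are all hypothetical.

```python
import math


class SmoothingIntegrator:
    """Per-point first-order smoother (an assumed form of the integrator)."""

    def __init__(self, num_bins, time_constant_s, frame_period_s):
        # Smoothing factor derived from the time constant and frame period.
        self.alpha = 1.0 - math.exp(-frame_period_s / time_constant_s)
        self.state = [0.0] * num_bins

    def update(self, values):
        """Move each stored value a fraction `alpha` toward the new input."""
        self.state = [s + self.alpha * (v - s) for s, v in zip(self.state, values)]
        return list(self.state)


# Hypothetical settings: 0.1 s time constant, one frame per 1024 samples at 48 kHz.
integrator = SmoothingIntegrator(num_bins=2, time_constant_s=0.1, frame_period_s=1024 / 48000)
first = integrator.update([-6.0, 0.0])
for _ in range(200):
    settled = integrator.update([-6.0, 0.0])
```

After a step change in the input, the output approaches the new value gradually instead of jumping, which is the behaviour the time constant is meant to provide.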

以上が実施形態の駆動音演算処理部２０９の構成と動作である。次に、風雑音演算処理部２１０について説明する。 The above is the configuration and operation of the drive sound calculation processing unit 209 of the embodiment. Next, the wind noise calculation processing unit 210 will be described.

風雑音演算処理部２１０は、メインマイク２０５ａからの周波数スペクトルデータＭａｉｎ［０］〜［５１１］、サブマイク２０５ｂからの周波数スペクトルデータＳｕｂ［０］〜［５１１］から風雑音を検出し、減算量を表すＷＣ＿Ｇａｉｎ［０］〜［５１１］と、風雑音レベル信号を出力する。風雑音演算処理部２１０は、風検出部２１０１、風雑音ゲイン演算部２１０２、風雑音減算量積分器２１０３を有する。 The wind noise calculation processing unit 210 detects wind noise from the frequency spectrum data Main [0] to [511] from the main microphone 205a and the frequency spectrum data Sub [0] to [511] from the sub microphone 205b, and calculates the subtraction amount. WC_Gain [0] to [511] and a wind noise level signal are output. The wind noise calculation processing unit 210 includes a wind detection unit 2101, a wind noise gain calculation unit 2102, and a wind noise subtraction amount integrator 2103.

The wind detection unit 2101 detects the wind noise level from the correlation of a predetermined number of low-frequency points in the frequency spectrum Main[0] to [511] from the main microphone 205a and the frequency spectrum Sub[0] to [511] from the sub microphone 205b. For example, using the 10 lowest-frequency points, it obtains and outputs the wind noise level according to the following expression. Here n ranges from 0 to 9 in the embodiment, but this number may be changed as appropriate.
Wind noise level = Σ (Main[n] - Sub[n]) / (Main[n] + Sub[n])
In the above expression, Σ denotes the sum over n = 0 to 9.
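The expression can be read in more than one way. The sketch below assumes the sum is taken over the per-point ratios (Main[n] - Sub[n]) / (Main[n] + Sub[n]) and that the inputs are non-negative magnitudes; both are interpretations, not statements from the text. Points whose denominator is zero are skipped, which is also an assumption.

```python
def wind_noise_level(main, sub, num_points=10):
    """Sum of normalized Main/Sub differences over the lowest points.

    Assumed reading: sum over n of (Main[n] - Sub[n]) / (Main[n] + Sub[n]).
    """
    level = 0.0
    for m, s in zip(main[:num_points], sub[:num_points]):
        denom = m + s
        if denom != 0.0:  # skip silent points to avoid division by zero
            level += (m - s) / denom
    return level


# Arbitrary magnitudes: Main is three times Sub in the ten lowest points.
main_mag = [3.0] * 10 + [1.0]
sub_mag = [1.0] * 10 + [1.0]

level = wind_noise_level(main_mag, sub_mag)
matched = wind_noise_level(sub_mag, sub_mag)
```

Identical low-frequency content in both microphones gives a level of zero, while a mismatch between the two pushes the level away from zero.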

また、風雑音ゲイン演算部２１０２は、図１７に示すような特性線分を持つテーブルを有する。図示のように、１つの線分は、或る周波数以下ではゲインが負、その周波数以上ではゲインが０となる。そして、ゲインが負から０となる周波数の位置が互いに異なる複数の線分を含む。そして、風雑音ゲイン演算部２１０２は、風雑音レベルに従った１つの線分を用いて、風雑音ゲイン［０］〜［５１１］を決定し、出力する。なお、実施形態では、風雑音ゲイン［０］〜［５１１］をテーブルを用いて決定するものとしたが、風雑音レベルを引数とする関数を用いて、風雑音ゲイン［０］〜［５１１］を決定しても良い。 The wind noise gain calculation unit 2102 has a table having characteristic line segments as shown in FIG. As shown in the figure, one line segment has a negative gain below a certain frequency, and has a gain of zero above that frequency. Then, it includes a plurality of line segments having different frequency positions at which the gain changes from negative to zero. Then, the wind noise gain calculation unit 2102 determines and outputs wind noise gains [0] to [511] using one line segment according to the wind noise level. In the embodiment, the wind noise gains [0] to [511] are determined using a table. However, the wind noise gains [0] to [511] are determined using a function having the wind noise level as an argument. May be determined.

The wind noise subtraction amount integrator 2103 applies a time constant to the variation in the time direction of the wind noise gains [0] to [511] output from the wind noise gain calculation unit 2102, and outputs the wind noise gains WC_Gain[0] to [511] (signed).

以上が実施形態における風雑音演算処理部２１０の構成と動作である。次に、実施形態におけるステレオゲイン演算処理部２１１を説明する。 The above is the configuration and operation of the wind noise calculation processing unit 210 in the embodiment. Next, the stereo gain calculation processing unit 211 in the embodiment will be described.

ステレオゲイン演算処理部２１１は、メインマイク２０５ａからの周波数スペクトルデータＭａｉｎ［０］〜［５１１］、サブマイク２０５ｂからの周波数スペクトルデータＳｕｂ［０］〜［５１１］から、ステレオのＬｃｈのゲインＧａｉｎ＿Ｌ［０］〜［５１１］と、ＲｃｈのゲインＧａｉｎ＿Ｒ［０］〜［５１１］を生成し、出力する。このために、ステレオゲイン演算処理部２１１は、位相差判定部２１１１、ステレオゲイン演算部２１１２、ステレオ抑制部２１１３、左ゲイン積分器２１１４，右ゲイン積分器２１１５を有する。 The stereo gain calculation processing unit 211 obtains the stereo Lch gain Gain_L [0] from the frequency spectrum data Main [0] to [511] from the main microphone 205a and the frequency spectrum data Sub [0] to [511] from the sub microphone 205b. ] To [511] and Rch gains Gain_R [0] to [511] are generated and output. For this purpose, the stereo gain calculation processing unit 211 includes a phase difference determination unit 2111, a stereo gain calculation unit 2112, a stereo suppression unit 2113, a left gain integrator 2114, and a right gain integrator 2115.

位相差判定部２１１１は、周波数スペクトルデータＭａｉｎ［０］〜［５１１］に対するＳｕｂ［０］〜［５１１］の位相情報を算出する。 The phase difference determination unit 2111 calculates the phase information of Sub [0] to [511] with respect to the frequency spectrum data Main [0] to [511].

For example, when the phase vector of each point in the frequency spectrum data is written as V(), the phase information [n] at frequency point n is calculated according to the following expression.
Phase information[n] = |V(Main[n]) × V(Sub[n])| / (|V(Main[n])| · |V(Sub[n])|)
Here, "|x|" on the right-hand side denotes the absolute value (a scalar) of the vector x, "·" in the denominator denotes the product of scalars, and "×" in the numerator denotes the cross product of the two vectors, which corresponds to the sine of the angle between them.

位相差判定部２１１１は上式に従って算出した位相情報［０］〜［５１１］を出力する。 The phase difference determination unit 2111 outputs the phase information [0] to [511] calculated according to the above formula.
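As a non-limiting illustration, the phase information calculation can be sketched in Python as follows. The sketch assumes that each frequency point is available as a complex FFT bin, treats the numerator as the signed two-dimensional cross product (so that the result can range from −1 to 1, as the later description of FIG. 8 requires), and returns 0 for silent bins; the function name `phase_info` and the sign convention are assumptions made for this example and are not taken from the embodiment.

```python
def phase_info(main_bin: complex, sub_bin: complex) -> float:
    """Return sin(theta) between two complex bins viewed as 2-D vectors (Re, Im)."""
    denom = abs(main_bin) * abs(sub_bin)
    if denom == 0.0:
        # Assumption: a silent bin carries no usable phase difference.
        return 0.0
    # Signed z-component of the 2-D cross product: a.re * b.im - a.im * b.re
    cross = main_bin.real * sub_bin.imag - main_bin.imag * sub_bin.real
    return cross / denom


if __name__ == "__main__":
    # The result depends only on the angle between the bins, not their magnitudes.
    print(phase_info(2 + 0j, 3 + 3j))
```

Because the cross product is divided by both magnitudes, a level difference between the two microphones does not affect the result, which matches the statement that only the phase is used.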

ステレオゲイン演算部２１１２は、位相差判定部２１１１からの位相情報［０］〜［５１１］からステレオゲイン［０］〜［５１１］の演算を行う。例えば周波数ポイントｎにおいて、次式に従って各チャネルのゲインを得る。
Ｌｃｈ生成用のステレオゲイン＝１＋位相情報［ｎ］×強調係数
Ｒｃｈ生成用のステレオゲイン＝１−位相情報［ｎ］×強調係数
ステレオゲイン演算部２１１２は、上式にて算出されたＬｃｈ，Ｒｃｈのステレオゲイン［ｎ］を出力する。ここで、強調係数は周波数に応じて変更されるものであり、上限を１、下限を０とするものである。 The stereo gain calculation unit 2112 calculates the stereo gains [0] to [511] from the phase information [0] to [511] supplied by the phase difference determination unit 2111. For example, at frequency point n, the gain of each channel is obtained according to the following equations.
Stereo gain for Lch generation = 1 + phase information [n] × enhancement coefficient
Stereo gain for Rch generation = 1 − phase information [n] × enhancement coefficient
The stereo gain calculation unit 2112 outputs the Lch and Rch stereo gains [n] calculated by the above equations. The enhancement coefficient varies with frequency and has an upper limit of 1 and a lower limit of 0.
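The two gain equations can be written as the following minimal Python sketch. The function name `stereo_gains` and the range check are assumptions added for illustration; the arithmetic itself follows the equations above.

```python
def stereo_gains(phase_info: float, coeff: float) -> tuple[float, float]:
    """Return (Lch gain, Rch gain) for one frequency point."""
    if not 0.0 <= coeff <= 1.0:
        raise ValueError("enhancement coefficient must lie in [0, 1]")
    lch = 1.0 + phase_info * coeff
    rch = 1.0 - phase_info * coeff
    return lch, rch


if __name__ == "__main__":
    print(stereo_gains(0.5, 1.0))   # one channel raised, the other lowered
    print(stereo_gains(0.7, 0.0))   # coefficient 0: both gains stay at 1
```

With the enhancement coefficient at 0, both gains equal 1 regardless of the phase information, which corresponds to the case in which stereo enhancement is suppressed.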

ステレオ抑制部２１１３は、駆動音演算処理部２０９内のＭｃｈ−Ｓｃｈ演算部２０９１からの駆動騒音を検出したことを示す検出信号を受けた場合に強調係数を０にする。また、ステレオ抑制部２１１３は、風雑音演算処理部２１０内の風検出部２１０１からの風雑音レベルに応じて強調係数を０にする。 The stereo suppression unit 2113 sets the enhancement coefficient to 0 when receiving a detection signal indicating that drive noise is detected from the Mch-Sch calculation unit 2091 in the drive sound calculation processing unit 209. Further, the stereo suppression unit 2113 sets the enhancement coefficient to 0 according to the wind noise level from the wind detection unit 2101 in the wind noise calculation processing unit 210.

左ゲイン積分器２１１４は、ステレオゲイン演算部２１１２から出力された、Ｌｃｈ生成用のステレオゲイン［０］〜［５１１］に対し、時間方向の変動量に所定の時定数を持たせ、それをステレオゲインＧａｉｎＬ［０］〜［５１１］（正負の符号付き）として出力する。 The left gain integrator 2114 applies a predetermined time constant to the temporal variation of the stereo gains [0] to [511] for Lch generation output from the stereo gain calculation unit 2112, and outputs the result as the stereo gains Gain_L[0] to [511] (signed).

右ゲイン積分器２１１５は、ステレオゲイン演算部２１１２から出力された、Ｒｃｈ生成用のステレオゲイン［０］〜［５１１］に対し、時間方向の変動量に所定の時定数を持たせ、それをステレオゲインＧａｉｎＲ［０］〜［５１１］（正負の符号付き）として出力する。 The right gain integrator 2115 applies a predetermined time constant to the temporal variation of the stereo gains [0] to [511] for Rch generation output from the stereo gain calculation unit 2112, and outputs the result as the stereo gains Gain_R[0] to [511] (signed).
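The embodiment does not specify how the time constant is realized. One common way to give a per-bin gain a time constant is a first-order (one-pole) smoother, sketched below in Python; the class name `GainIntegrator`, the smoothing factor `alpha`, and the initial value are assumptions for illustration only.

```python
class GainIntegrator:
    """Per-bin first-order smoother: y += alpha * (x - y) on every frame."""

    def __init__(self, n_bins: int, alpha: float, initial: float = 1.0):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must lie in (0, 1]")
        self.alpha = alpha
        self.state = [initial] * n_bins

    def update(self, gains: list[float]) -> list[float]:
        if len(gains) != len(self.state):
            raise ValueError("unexpected number of frequency points")
        self.state = [y + self.alpha * (x - y) for x, y in zip(gains, self.state)]
        return list(self.state)


if __name__ == "__main__":
    left = GainIntegrator(n_bins=2, alpha=0.5)
    print(left.update([2.0, 0.0]))  # moves halfway toward the new gains
```

A smaller `alpha` corresponds to a longer time constant, so that the output gain changes more gradually from frame to frame.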

以上が実施形態のステレオゲイン演算処理部２１１の構成と動作である。次に、実施形態におけるトータルゲイン演算部２１２を説明する。 The above is the configuration and operation of the stereo gain calculation processing unit 211 of the embodiment. Next, the total gain calculation unit 212 in the embodiment will be described.

トータルゲイン演算部２１２は、駆動音演算処理部２０９、風雑音演算処理部２１０、および、ステレオゲイン演算処理部２１１において決定したＮＣ＿Ｇａｉｎ［０］〜［５１１］、ＷＣ＿Ｇａｉｎ［０］〜［５１１］、Ｇａｉｎ＿Ｌ［０］〜［５１１］、Ｇａｉｎ＿Ｒ［０］〜［５１１］を合算し、Ｔｏｔａｌ＿Ｇａｉｎ＿Ｌ［０］〜［５１１］、Ｔｏｔａｌ＿Ｇａｉｎ＿Ｒ［０］〜［５１１］を出力する。具体的には次式である。
Total_Gain_L[]＝NC_Gain[] ＋ WC_Gain[] ＋ Gain_L[]
Total_Gain_R[]＝NC_Gain[] ＋ WC_Gain[] ＋ Gain_R[]
The total gain calculation unit 212 sums NC_Gain[0] to [511], WC_Gain[0] to [511], Gain_L[0] to [511], and Gain_R[0] to [511], which are determined by the drive sound calculation processing unit 209, the wind noise calculation processing unit 210, and the stereo gain calculation processing unit 211, and outputs Total_Gain_L[0] to [511] and Total_Gain_R[0] to [511]. Specifically, the following equations are used.
Total_Gain_L[] = NC_Gain[] + WC_Gain[] + Gain_L[]
Total_Gain_R[] = NC_Gain[] + WC_Gain[] + Gain_R[]
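The summation can be sketched per frequency point as follows. The function name `total_gains` is an assumption, and the sketch says nothing about the units of the gains, which the description leaves open.

```python
def total_gains(nc_gain, wc_gain, gain_l, gain_r):
    """Return (Total_Gain_L, Total_Gain_R) as element-wise sums over all bins."""
    if len({len(nc_gain), len(wc_gain), len(gain_l), len(gain_r)}) != 1:
        raise ValueError("all gain arrays must have the same length")
    total_l = [nc + wc + gl for nc, wc, gl in zip(nc_gain, wc_gain, gain_l)]
    total_r = [nc + wc + gr for nc, wc, gr in zip(nc_gain, wc_gain, gain_r)]
    return total_l, total_r


if __name__ == "__main__":
    print(total_gains([-3.0, 0.0], [-1.0, 0.0], [1.5, 1.0], [0.5, 1.0]))
```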

次に、Ｌ／Ｒｃｈ生成部２１３を説明する。このＬ／Ｒｃｈ生成部２１３は、周波数スペクトルデータＭＡＩＮ［０］〜［５１１］から、トータルゲイン演算部２１２で決定したＴｏｔａｌ＿Ｇａｉｎ＿Ｌ［０］〜［５１１］、Ｔｏｔａｌ＿Ｇａｉｎ＿Ｒ［０］〜［５１１］を用いて、ＬｃｈとＲｃｈの出力用の周波数スペクトルデータを作成する。Ｌ／Ｒｃｈ生成部２１３は、Ｍｃｈ／Ｓｃｈ選択部２１３１、Ｌ／Ｒｃｈゲイン加算部２１３２を有する。 Next, the L/Rch generation unit 213 will be described. The L/Rch generation unit 213 creates output frequency spectrum data for Lch and Rch from the frequency spectrum data Main[0] to [511], using Total_Gain_L[0] to [511] and Total_Gain_R[0] to [511] determined by the total gain calculation unit 212. The L/Rch generation unit 213 includes an Mch/Sch selection unit 2131 and an L/Rch gain addition unit 2132.

Ｍｃｈ／Ｓｃｈ選択部２１３１は、風検出部２１０１による風雑音レベルに応じて、周波数スペクトルデータＭａｉｎ［０］〜［５１１］に合成することになるＳｕｂ［０］〜［５１１］の周波数ポイントの範囲を選択する。また、Ｍｃｈ／Ｓｃｈ選択部２１３１は、風雑音レベルに応じて、合成する境界位置を低周波数ポイントから高周波数ポイントへと変化させる。また、風を検出されない場合、Ｍｃｈ／Ｓｃｈ選択部２１３１は合成を行わず、周波数スペクトルデータＭａｉｎ［０］〜［５１１］をそのまま出力する。 The Mch / Sch selection unit 2131 is a range of frequency points of Sub [0] to [511] to be combined with the frequency spectrum data Main [0] to [511] according to the wind noise level by the wind detection unit 2101. Select. Further, the Mch / Sch selection unit 2131 changes the boundary position to be synthesized from the low frequency point to the high frequency point according to the wind noise level. When no wind is detected, the Mch / Sch selection unit 2131 does not perform synthesis and outputs the frequency spectrum data Main [0] to [511] as they are.
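The selection behavior can be pictured with the following Python sketch. It models "combining" as simple substitution of Sub for Main below a boundary point, and maps a wind noise level normalized to the range 0 to 1 linearly onto that boundary; both choices, together with the names `select_mch_sch` and `max_boundary`, are assumptions made for illustration, since the description does not fix the combining rule or the mapping.

```python
def select_mch_sch(main, sub, wind_level, max_boundary=64):
    """Return a spectrum that takes Sub below the boundary bin and Main above it."""
    if len(main) != len(sub):
        raise ValueError("Main and Sub must have the same number of points")
    level = min(max(wind_level, 0.0), 1.0)      # clamp to [0, 1]
    boundary = min(len(main), int(level * max_boundary))
    # boundary == 0 (no wind detected) leaves Main untouched.
    return list(sub[:boundary]) + list(main[boundary:])


if __name__ == "__main__":
    main = [1, 2, 3, 4]
    sub = [9, 8, 7, 6]
    print(select_mch_sch(main, sub, wind_level=0.5, max_boundary=4))
```

As the wind noise level rises, the boundary moves toward higher frequency points, so a wider low-frequency range is taken from Sub.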

Ｌ／Ｒｃｈゲイン加算部２１３２は、Ｍｃｈ／Ｓｃｈ選択部２１３１から出力された周波数スペクトルデータＭａｉｎ［０］〜［５１１］に対して、トータルゲイン演算部２１２で決定したＴｏｔａｌ＿Ｇａｉｎ＿Ｌ［０］〜［５１１］、Ｔｏｔａｌ＿Ｇａｉｎ＿Ｒ［０］〜［５１１］を用いて、左右チャネル（ＬｃｈとＲｃｈ）の周波数スペクトルデータを作成する。 The L/Rch gain addition unit 2132 creates frequency spectrum data for the left and right channels (Lch and Rch) by applying Total_Gain_L[0] to [511] and Total_Gain_R[0] to [511], determined by the total gain calculation unit 212, to the frequency spectrum data Main[0] to [511] output from the Mch/Sch selection unit 2131.

以上が実施形態のＬ／Ｒｃｈ生成部２１３の構成と動作である。 The above is the configuration and operation of the L / Rch generation unit 213 of the embodiment.

ｉＦＦＴ部２１４は、Ｌ／Ｒｃｈ生成部２１３で生成された各チャネルの周波数スペクトルデータを逆変換（逆ＦＦＴ変換）し、元の時系列の音声信号に戻す。音声処理部２１５は、イコライザ等の処理を実施する。ＡＬＣ（オートレベルコントローラ）２１６は、時系列の音声信号の振幅を所定のレベルに調整する。 The iFFT unit 214 applies an inverse transform (inverse FFT) to the frequency spectrum data of each channel generated by the L/Rch generation unit 213, restoring time-series audio signals. The audio processing unit 215 performs processing such as equalization. The ALC (auto level controller) 216 adjusts the amplitude of the time-series audio signals to a predetermined level.

以上の構成を備え、音声入力部１０２は、音声信号に所定の処理を行い音声データを形成して、メモリ１０３へ送信し、格納することになる。 With the above configuration, the voice input unit 102 performs predetermined processing on the voice signal to form voice data, and transmits the voice data to the memory 103 for storage.

ここで、本実施形態の音声入力部１０２の一部を構成するメカ構成について、図３（ａ）、３（ｂ）を用いて説明する。 Here, the mechanical configuration that forms part of the audio input unit 102 of the present embodiment will be described with reference to FIGS. 3(a) and 3(b).

図３（ａ）は、本実施形態の撮像装置の筐体の外観図である。撮影対象に撮像装置が向いた状態で、撮影者から見て右側の所定位置の参照符号"ａ"がメインマイク２０５ａの入力穴（開口部）、左側の対向する位置の参照符号"ｂ"がサブマイク２０５ｂの入力穴となる。図３（ｂ）においての拡大図は、音声入力部１０２の一部であるメインマイク２０５ａとサブマイク２０５ｂのメカ構成部である。図３（ｂ）は、前記メカ構成を示す断面図である。マイク穴を構成する外装部１０２−１、メインマイク２０５ａを保持するメインマイクブッシュ１０２−２ａ、サブマイク２０５ｂを保持するサブマイクブッシュ１０２−２ｂ、其々のマイクブッシュを外装部へ押し付け保持をする押し付け部１０３により構成される。外装部１０２−１、押し付け部１０３についてはＰＣ材等のモールド部材で構成されるが、アルミ、ステンレス等の金属部材であっても問題ない。また、メインマイクブッシュ１０２−２ａ、サブマイクブッシュ１０２−２ｂについては、エチレンプロピレンジエンゴム等のゴム材にて構成される。 FIG. 3(a) is an external view of the housing of the imaging apparatus according to the present embodiment. With the imaging apparatus facing the subject, reference symbol "a", at a predetermined position on the right side as viewed from the photographer, denotes the input hole (opening) of the main microphone 205a, and reference symbol "b", at the opposing position on the left side, denotes the input hole of the sub microphone 205b. The enlarged view in FIG. 3(b) shows the mechanical components of the main microphone 205a and the sub microphone 205b, which form part of the audio input unit 102. FIG. 3(b) is a cross-sectional view of this mechanical configuration. The configuration comprises an exterior part 102-1 in which the microphone holes are formed, a main microphone bush 102-2a that holds the main microphone 205a, a sub microphone bush 102-2b that holds the sub microphone 205b, and a pressing part 103 that presses and holds each microphone bush against the exterior part. The exterior part 102-1 and the pressing part 103 are molded members made of a material such as PC (polycarbonate), but metal members such as aluminum or stainless steel may also be used. The main microphone bush 102-2a and the sub microphone bush 102-2b are made of a rubber material such as ethylene propylene diene rubber.

ここで、外装部におけるマイク穴の径について説明する。サブマイク２０５ｂへのマイク穴の径（開口している面積）は、メインマイク２０５ａへのマイク穴の径（同面積）に対して小さく、所定の倍率にて縮小された構成をとる。マイク穴形状については円状か楕円状が望ましいが、方形状でも構わない。また、其々の穴形状について、同形状でも別形状でも構わない。前記構成は、撮像装置内部でマイクに空気伝搬して伝わる駆動騒音についてサブマイク２０５ｂのマイク穴側から外部へ漏れにくくなる事を目的とする。 Here, the diameter of the microphone hole in the exterior part will be described. The diameter (open area) of the microphone hole to the sub microphone 205b is smaller than the diameter (same area) of the microphone hole to the main microphone 205a and is reduced at a predetermined magnification. The microphone hole shape is preferably circular or elliptical, but may be rectangular. Moreover, about each hole shape, the same shape or another shape may be sufficient. The above configuration is intended to make it difficult for drive noise transmitted by air propagation to the microphone inside the imaging apparatus to be leaked from the microphone hole side of the sub microphone 205b to the outside.

次に、外装部１０２−１とマイクブッシュで構成されるマイク前面の空間について説明する。外装部１０２−１とサブマイクブッシュ１０２−２ｂで構成されるサブマイク２０５ｂの前面の空間の容積は、外装部１０２−１とメインマイクブッシュ１０２−２ａで構成されるメインマイク２０５ａの前面の空間のそれより大きく、所定の倍率の容積を確保する構成をとる。この構成は、サブマイク２０５ｂの前面の空間において、空間内の気圧変化が大きくなり、駆動騒音が強調される事を目的とする。 Next, the space in front of the microphone constituted by the exterior part 102-1 and the microphone bush will be described. The volume of the space in front of the sub microphone 205b composed of the exterior portion 102-1 and the sub microphone bush 102-2b is the volume of the space in front of the main microphone 205a composed of the exterior portion 102-1 and the main microphone bush 102-2a. It is larger than that and has a configuration that secures a volume of a predetermined magnification. The purpose of this configuration is to increase the atmospheric pressure change in the space in front of the sub microphone 205b and enhance the driving noise.

前述の通り、マイク入力のメカ構成におけるサブマイク２０５ｂ入力は、メインマイク２０５ａ入力に対して、駆動騒音の振幅が大きく強調される構成をとる。各マイクへ入力される駆動騒音の音声レベルの関係は、メインマイク２０５ａ＜サブマイク２０５ｂとなる。一方、マイク穴の前面から空気伝搬により各マイクへ入力される、装置外からの音声（本来の集音目的である周辺環境音）のレベル関係は、メインマイク２０５ａ≧サブマイク２０５ｂの関係となることに注意されたい。 As described above, the sub microphone 205b input in the microphone input mechanical configuration has a configuration in which the amplitude of the driving noise is greatly emphasized with respect to the main microphone 205a input. The relationship between the sound levels of the drive noises input to the microphones is main microphone 205a <sub microphone 205b. On the other hand, the level relationship of sound from outside the device (peripheral environmental sound that is the original purpose of sound collection) input to each microphone by air propagation from the front of the microphone hole is such that main microphone 205a ≧ sub microphone 205b. Please be careful.

ここで、本実施形態の音声入力部１０２でのステレオゲイン演算処理部２１１の動作について、図７から図９を用いて説明する。 Here, the operation of the stereo gain calculation processing unit 211 in the audio input unit 102 of the present embodiment will be described with reference to FIGS. 7 to 9.

図７は、撮像装置１００に内蔵されたマイクに対する外部からの音声の経路と、内蔵の光学レンズ２０１の駆動時の音声の経路の一例を示している。この時のマイクは、図２に示すメインマイク２０５ａおよびサブマイク２０５ｂが該当する。図７のように周囲環境音の音源と撮像装置１００との距離は、メインマイク２０５ａとサブマイク２０５ｂ間の距離に対して十分に大きい。よって、周囲環境音の音源からのメインマイク２０５ａへの音声の伝播経路と、周囲環境音の音源とサブマイク２０５ｂへの音声の伝播経路は殆ど同一と考えて良い。しかし、撮像装置内蔵の光学レンズ２０１は、メインマイク２０５ａとサブマイク２０５ｂに近接している。また、光学レンズ２０１の移動を行うためのモータからマイクへの距離が均等でなかったり、撮像装置内での音声の経路が異なる可能性もある。故に、光学レンズ駆動系からメインマイク２０５ａ、サブマイク２０５ｂそれぞれへの音声経路（距離）は大きく異なってしまう。つまり、周囲環境音と駆動騒音とでは、ＭｃｈとＳｃｈの音声レベルの差分に大きな差が出る事となる。それ故、周囲環境音と光学レンズの駆動騒音は大きく差が出て、これらを容易に区別することができる。 FIG. 7 shows an example of a sound path from the outside to the microphone built in the imaging apparatus 100 and a sound path when the built-in optical lens 201 is driven. The microphones at this time correspond to the main microphone 205a and the sub microphone 205b shown in FIG. As shown in FIG. 7, the distance between the sound source of the ambient environmental sound and the imaging device 100 is sufficiently larger than the distance between the main microphone 205a and the sub microphone 205b. Therefore, the sound propagation path from the ambient sound source to the main microphone 205a and the sound propagation path from the ambient sound source to the sub microphone 205b may be considered to be almost the same. However, the optical lens 201 built in the imaging apparatus is close to the main microphone 205a and the sub microphone 205b. In addition, there is a possibility that the distance from the motor to the microphone for moving the optical lens 201 is not uniform, or the sound path in the imaging apparatus is different. Therefore, the sound paths (distances) from the optical lens driving system to the main microphone 205a and the sub microphone 205b are greatly different. That is, there is a large difference in the difference between the sound levels of Mch and Sch between the ambient environmental sound and the driving noise. 
Consequently, ambient environmental sound and the drive noise of the optical lens differ markedly in their Mch/Sch level difference, and the two can easily be distinguished.

一方、本来、周囲環境音は左右のどちら側から発生したかはＭｃｈとＳｃｈでは大きさでは判断することは難しい。そこで、周囲環境音は音声信号の位相を利用して判断することができる。詳細について説明する。 On the other hand, it is inherently difficult to judge from the magnitudes of Mch and Sch alone whether an ambient environmental sound originated on the left or on the right. The direction of the ambient environmental sound is therefore judged using the phase of the audio signals. The details are described below.

図８（ａ）〜（ｃ）は、或る周波数スペクトルデータＭａｉｎ［ｎ］とＳｕｂ［ｎ］の関係を示している。 FIGS. 8A to 8C show the relationship between certain frequency spectrum data Main [n] and Sub [n].

ステレオゲイン演算処理部２１１は、メインマイク２０５ａからの周波数スペクトルデータＭａｉｎ［０］〜［５１１］、サブマイク２０５ｂからの周波数スペクトルデータＳｕｂ［０］〜［５１１］から、ステレオのＬｃｈのゲインＧａｉｎ＿Ｌ［０］〜［５１１］、ＲｃｈのゲインＧａｉｎ＿Ｒ［０］〜［５１１］を出力する。ステレオゲイン演算処理部２１１は以下の構成を備えている。 The stereo gain calculation processing unit 211 outputs the stereo Lch gains Gain_L[0] to [511] and the Rch gains Gain_R[0] to [511] from the frequency spectrum data Main[0] to [511] from the main microphone 205a and the frequency spectrum data Sub[0] to [511] from the sub microphone 205b. The stereo gain calculation processing unit 211 has the following configuration.

位相差判定部２１１１は、周波数スペクトルデータＭａｉｎ［０］〜［５１１］に対する周波数スペクトルデータＳｕｂ［０］〜［５１１］の位相情報を算出する。 The phase difference determination unit 2111 calculates phase information of the frequency spectrum data Sub [0] to [511] with respect to the frequency spectrum data Main [0] to [511].

例えば周波数ポイントｎの周囲環境音が、メインマイク２０５ａ側から発生した場合、Ｖ（Ｍａｉｎ［ｎ］）とＶ（Ｓｕｂ［ｎ］）の関係は図８（ａ）のような関係になる。本実施形態でのマイク配置においても、周波数スペクトルの大きさは変わってしまっても、位相は変わることはない。そこで、位相情報を、Ｖ（Ｍａｉｎ［ｎ］) とＶ(Ｓｕｂ［ｎ］)の外積（｜Ｖ(Ｍａｉｎ［ｎ］) ×Ｖ(Ｓｕｂ［ｎ］) ｜）を用いることで得る。
位相情報［ｎ］＝｜Ｖ(Ｍａｉｎ［ｎ］) ×Ｖ(Ｓｕｂ［ｎ］) ｜／（｜Ｖ(Ｍａｉｎ［ｎ］) ｜・｜Ｖ(Ｓｕｂ［ｎ］) ｜）
位相差判定部２１１１は、上式にて算出された位相情報［ｎ］を出力する。ここで求められる位相情報［ｎ］は、すなわち、Ｖ（Ｍａｉｎ［ｎ］）とＶ（Ｓｕｂ［ｎ］）のｓｉｎθであり、周囲環境音がメインマイク２０５ａ側（撮像装置１００を構えるユーザの右側）から発生した場合は、０＜位相情報［ｎ］≦１となる。 For example, when the ambient environmental sound at frequency point n originates on the main microphone 205a side, V(Main[n]) and V(Sub[n]) are related as shown in FIG. 8(a). With the microphone arrangement of the present embodiment, the magnitude of the frequency spectrum may change, but the phase does not. The phase information is therefore obtained using the outer product of V(Main[n]) and V(Sub[n]), written |V(Main[n]) × V(Sub[n])|.
Phase information [n] = |V(Main[n]) × V(Sub[n])| / (|V(Main[n])| · |V(Sub[n])|)
The phase difference determination unit 2111 outputs the phase information [n] calculated by the above equation. The phase information [n] obtained here is sin θ for V(Main[n]) and V(Sub[n]); when the ambient environmental sound originates on the main microphone 205a side (the right side of the user holding the imaging apparatus 100), 0 < phase information [n] ≦ 1.

また、周波数ポイントｎの周囲環境音が、サブマイク２０５ｂ側から発生した場合、Ｖ（Ｍａｉｎ［ｎ］）とＶ（Ｓｕｂ［ｎ］）の関係は図８（ｂ）のような周波数スペクトルの関係になる。本実施形態でのマイク配置においても、周波数スペクトルの大きさは変わってしまっても、位相は変わることはない。 When the ambient environmental sound at frequency point n originates on the sub microphone 205b side, V(Main[n]) and V(Sub[n]) have the frequency spectrum relationship shown in FIG. 8(b). With the microphone arrangement of the present embodiment, the magnitude of the frequency spectrum may change, but the phase does not.

そこで、位相情報をＶ（Ｍａｉｎ［ｎ］) とＶ(Ｓｕｂ［ｎ］) の外積（｜Ｖ(Ｍａｉｎ［ｎ］) ×Ｖ(Ｓｕｂ［ｎ］) ｜）を用いることで得る。
位相情報［ｎ］＝｜Ｖ(Ｍａｉｎ［ｎ］) ×Ｖ(Ｓｕｂ［ｎ］) ｜／（｜Ｖ(Ｍａｉｎ［ｎ］) ｜・｜Ｖ(Ｓｕｂ［ｎ］) ｜）
位相差判定部２１１１は、上式にて算出された位相情報［ｎ］を出力する。ここで求められる位相情報［ｎ］は、すなわち、Ｖ（Ｍａｉｎ［ｎ］）とＶ（Ｓｕｂ［ｎ］）のｓｉｎθであり、周囲環境音がサブマイク２０５ｂ側からの場合、０＞位相情報［ｎ］≧−１となる。 The phase information is therefore obtained using the outer product of V(Main[n]) and V(Sub[n]), written |V(Main[n]) × V(Sub[n])|.
Phase information [n] = |V(Main[n]) × V(Sub[n])| / (|V(Main[n])| · |V(Sub[n])|)
The phase difference determination unit 2111 outputs the phase information [n] calculated by the above equation. The phase information [n] obtained here is sin θ for V(Main[n]) and V(Sub[n]); when the ambient environmental sound comes from the sub microphone 205b side, 0 > phase information [n] ≧ −1.

また周波数ポイントｎの周囲環境音がメインマイク２０５ａ、サブマイク２０５ｂと同じ距離、すなわち光学レンズ２０１の中心から発生した場合、Ｖ（Ｍａｉｎ［ｎ］）とＶ（Ｓｕｂ［ｎ］）の関係は図８（ｃ）のような周波数スペクトルの関係になる。本実施形態でのマイク配置においても、周波数スペクトルの大きさは変わってしまっても、位相は変わることはない。 When the ambient environmental sound at frequency point n originates at the same distance from the main microphone 205a and the sub microphone 205b, that is, on the center line of the optical lens 201, V(Main[n]) and V(Sub[n]) have the frequency spectrum relationship shown in FIG. 8(c). With the microphone arrangement of the present embodiment, the magnitude of the frequency spectrum may change, but the phase does not.

位相情報は、Ｖ(Ｍａｉｎ［ｎ］) とＶ(Ｓｕｂ［ｎ］) の外積（｜Ｖ(Ｍａｉｎ［ｎ］) ×Ｖ(Ｓｕｂ［ｎ］) ｜）を用いることで得ることができる。
位相情報［ｎ］＝｜Ｖ(Ｍａｉｎ［ｎ］) ×Ｖ(Ｓｕｂ［ｎ］) ｜／（｜Ｖ(Ｍａｉｎ［ｎ］) ｜・｜Ｖ(Ｓｕｂ［ｎ］) ｜）
位相差判定部２１１１は、上式にて算出された位相情報［ｎ］を出力する。ここで求められる位相情報［ｎ］は、Ｖ（Ｍａｉｎ［ｎ］）とＶ（Ｓｕｂ［ｎ］）のｓｉｎθであり、周囲環境音がメインマイク２０５ａとサブマイク２０５ｂから等距離の位置から発生した場合は、位相情報［ｎ］≒０となる。 The phase information can be obtained using the outer product of V(Main[n]) and V(Sub[n]), written |V(Main[n]) × V(Sub[n])|.
Phase information [n] = |V(Main[n]) × V(Sub[n])| / (|V(Main[n])| · |V(Sub[n])|)
The phase difference determination unit 2111 outputs the phase information [n] calculated by the above equation. The phase information [n] obtained here is sin θ for V(Main[n]) and V(Sub[n]); when the ambient environmental sound originates at a position equidistant from the main microphone 205a and the sub microphone 205b, phase information [n] ≈ 0.

ステレオゲイン演算部２１１２は、上記のようにして決定した位相情報［０］〜［５１１］を用いて、ステレオゲイン［０］〜［５１１］の演算を行っている。例えば周波数ポイントｎにおいて、ステレオゲイン演算部２１１２は次式に従って各チャネルのゲインを算出する。
Ｌｃｈ生成用のステレオゲイン＝１＋位相情報［ｎ］×強調係数
Ｒｃｈ生成用のステレオゲイン＝１−位相情報［ｎ］×強調係数
そして、ステレオゲイン演算部２１１２は、上式にて算出された各チャネルのステレオゲイン［ｎ］を出力する。 The stereo gain calculation unit 2112 calculates the stereo gains [0] to [511] using the phase information [0] to [511] determined as described above. For example, at frequency point n, the stereo gain calculation unit 2112 calculates the gain of each channel according to the following equations.
Stereo gain for Lch generation = 1 + phase information [n] × enhancement coefficient
Stereo gain for Rch generation = 1 − phase information [n] × enhancement coefficient
The stereo gain calculation unit 2112 then outputs the stereo gain [n] of each channel calculated by the above equations.

図９はステレオゲイン演算部２１１２で用いられる各周波数ポイントにおける強調係数を示した図である。 FIG. 9 is a diagram showing enhancement coefficients at each frequency point used in the stereo gain calculator 2112.

横軸を周波数ポイント、縦軸を強調係数とした時、もっとも強調したい周波数の強調係数を最大値の１．０として、位相差がでにくい低域と位相差が判断できない高域の強調係数は最小値の０とする。 With frequency points on the horizontal axis and the enhancement coefficient on the vertical axis, the enhancement coefficient is set to its maximum value of 1.0 at the frequencies to be emphasized most, and to its minimum value of 0 in the low range, where a phase difference hardly arises, and in the high range, where the phase difference cannot be determined.

例えばもっとも強調したい１ｋＨｚ〜５ｋＨｚは強調係数を１．０とし、２００Ｈｚ以下は０とする。 For example, the emphasis coefficient is 1.0 for 1 kHz to 5 kHz that is most emphasized, and 0 for 200 Hz or less.

位相差が判断できない高域の強調係数は、メインマイク２０５ａとサブマイク２０５ｂの距離で決定する。例えば、メインマイク２０５ａとサブマイク２０５ｂの距離が１５ｍｍの時、音速を３４０ｍ／ｓとすると、１５ｍｍの間に半波長が入る１１．３ｋＨｚ以上になると、正しい位相情報が取れず、左右が反転してしまう可能性がある。また、１５ｍｍの間に１／４波長の入る５．７ｋＨｚ以上は正確性が低い。そこで図９に示すような周波数に応じた強調係数のかけ方を行う。 The enhancement coefficient in the high range, where the phase difference cannot be determined, is decided by the distance between the main microphone 205a and the sub microphone 205b. For example, when that distance is 15 mm and the speed of sound is taken as 340 m/s, correct phase information cannot be obtained at or above 11.3 kHz, the frequency at which a half wavelength fits within 15 mm, and left and right may be reversed. In addition, accuracy is low at or above 5.7 kHz, the frequency at which a quarter wavelength fits within 15 mm. The enhancement coefficient is therefore applied as a function of frequency, as shown in FIG. 9.
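The two limit frequencies follow from f = c / (2d) and f = c / (4d) with c = 340 m/s and d = 15 mm. The Python sketch below reproduces that arithmetic and adds one possible enhancement coefficient curve. The curve shape (linear ramps between 200 Hz and 1 kHz and between 5 kHz and a chosen upper frequency `f_zero_hz`) is an assumption made for illustration, because FIG. 9 is not reproduced here; all function and parameter names are likewise hypothetical.

```python
SPEED_OF_SOUND_M_S = 340.0


def half_wavelength_freq(distance_m: float) -> float:
    """Frequency whose half wavelength equals the microphone spacing."""
    return SPEED_OF_SOUND_M_S / (2.0 * distance_m)


def quarter_wavelength_freq(distance_m: float) -> float:
    """Frequency whose quarter wavelength equals the microphone spacing."""
    return SPEED_OF_SOUND_M_S / (4.0 * distance_m)


def enhancement_coefficient(freq_hz, f_zero_hz, f_low=200.0,
                            f_flat_lo=1000.0, f_flat_hi=5000.0):
    """Assumed piecewise-linear curve: 0 -> 1.0 plateau -> 0, clipped to [0, 1]."""
    if freq_hz <= f_low or freq_hz >= f_zero_hz:
        return 0.0
    if freq_hz < f_flat_lo:
        return (freq_hz - f_low) / (f_flat_lo - f_low)
    if freq_hz <= f_flat_hi:
        return 1.0
    return (f_zero_hz - freq_hz) / (f_zero_hz - f_flat_hi)


if __name__ == "__main__":
    d = 0.015  # 15 mm microphone spacing
    print(half_wavelength_freq(d), quarter_wavelength_freq(d))
    print(enhancement_coefficient(3000.0, half_wavelength_freq(d)))
```

For a 15 mm spacing the two functions give approximately 11.3 kHz and 5.7 kHz, in agreement with the values stated above.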

ここで、本実施形態の音声入力部１０２での駆動音演算処理部２０９、トータルゲイン演算部２１２、Ｌ／Ｒｃｈ生成部２１３の動作について、図５、図１０から図１３を用いて説明する。 Here, operations of the drive sound calculation processing unit 209, the total gain calculation unit 212, and the L / Rch generation unit 213 in the voice input unit 102 of the present embodiment will be described with reference to FIGS. 5 and 10 to 13.

図１０は、メインマイク２０５ａとサブマイク２０５ｂそれぞれの各周波数の振幅スペクトルデータの例を示している。 FIG. 10 shows an example of amplitude spectrum data of each frequency of the main microphone 205a and the sub microphone 205b.

ＦＦＴ部２０７により、各チャネルの音声信号は０Ｈｚから４８ｋＨｚまでにおいて１０２４ポイントの周波数スペクトルとして変換される。変換後の周波数スペクトルデータは、ナイキスト周波数である２４ｋＨｚまでにおいては５１２ポイントの周波数スペクトルを持つものとする。 The FFT unit 207 converts the audio signal of each channel as a frequency spectrum of 1024 points from 0 Hz to 48 kHz. The converted frequency spectrum data is assumed to have a 512-point frequency spectrum up to 24 kHz which is the Nyquist frequency.
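For orientation, the relation between a frequency point and its frequency can be sketched as follows, assuming a 48 kHz sampling rate and a 1024-point FFT as the description implies; the helper names `bin_to_hz` and `hz_to_bin` are hypothetical.

```python
SAMPLE_RATE_HZ = 48_000
FFT_SIZE = 1024
NUM_POINTS = FFT_SIZE // 2  # points [0] to [511], below the 24 kHz Nyquist frequency


def bin_to_hz(k: int) -> float:
    """Centre frequency of frequency point k."""
    return k * SAMPLE_RATE_HZ / FFT_SIZE


def hz_to_bin(freq_hz: float) -> int:
    """Nearest frequency point for a frequency, clamped to [0, 511]."""
    k = round(freq_hz * FFT_SIZE / SAMPLE_RATE_HZ)
    return min(NUM_POINTS - 1, max(0, k))


if __name__ == "__main__":
    print(bin_to_hz(1))      # spacing between adjacent frequency points
    print(hz_to_bin(1000.0), hz_to_bin(5000.0))
```

Under these assumptions adjacent frequency points are 46.875 Hz apart, so the 1 kHz to 5 kHz range mentioned earlier corresponds roughly to points 21 to 107.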

先に図３（ａ），（ｂ）を用いて説明したように、実施形態の撮像装置１００のマイク入力のメカ構成によれば、サブマイク２０５ｂは、メインマイク２０５ａに対して、駆動騒音の振幅が大きく強調された信号を生成する。つまり振幅スペクトルにおいて、
周囲環境音レベル：メインマイク２０５ａ≧サブマイク２０５ｂ
駆動騒音レベル：メインマイク２０５ａ＜サブマイク２０５ｂ
との関係となる。 As described above with reference to FIGS. 3(a) and 3(b), with the mechanical configuration of the microphone inputs of the imaging apparatus 100 of the embodiment, the sub microphone 205b produces a signal in which the amplitude of the drive noise is strongly emphasized relative to the main microphone 205a. That is, the amplitude spectra satisfy the following relationships.
Ambient environmental sound level: main microphone 205a ≧ sub microphone 205b
Drive noise level: main microphone 205a < sub microphone 205b

図１０に、メインマイク２０５ａからの振幅スペクトルデータＭａｉｎ［］、サブマイク２０５ｂからの振幅スペクトルデータＳｕｂ［］の一例を示す。また、同図における「Ｍａｉｎ−Ｓｕｂ」は、Ｍｃｈ−Ｓｃｈ演算部２０９１にて演算される、Ｍａｉｎ［］からＳｕｂ［］を差し引いた減算量［０］〜［５１１］を示している。 FIG. 10 shows an example of amplitude spectrum data Main [] from the main microphone 205a and amplitude spectrum data Sub [] from the sub microphone 205b. Further, “Main-Sub” in the figure indicates subtraction amounts [0] to [511] calculated by the Mch-Sch operation unit 2091 by subtracting Sub [] from Main [].

例えば、ＳｃｈにおけるＮポイント目の周辺の振幅スペクトルを着目すると、Ｓｃｈ＞Ｍｃｈであり、つまり駆動騒音が支配的なポイントである事が言える。この時、Ｍａｉｎ−Ｓｕｂには、Ｎポイント目周辺にて予め定められたズーム検出閾値を超える（下回る）減算量が算出され、Ｎポイント目周辺は「駆動騒音」とされる振幅スペクトルと検出される。一方、ＭｃｈにおけるＮ２ポイント目の振幅スペクトルを着目すると、Ｓｃｈ≦Ｍｃｈである。つまり周囲環境音が支配的なポイントであることが言える。この時、Ｍａｉｎ−Ｓｕｂには、ズーム検出閾値を超える減算量は算出されないため、Ｎ２ポイント目周辺の振幅スペクトルは駆動騒音とは検出されることはない。上記演算を［０］〜［５１１］の振幅スペクトル全ての範囲において実行する。 For example, focusing on the amplitude spectrum around the N-th point, Sch > Mch, which means that drive noise is dominant at this point. In this case, Main−Sub around the N-th point yields a subtraction amount that crosses (falls below) the predetermined zoom detection threshold, so the amplitude spectrum around the N-th point is detected as "drive noise". Focusing on the amplitude spectrum at the N2-th point, on the other hand, Sch ≦ Mch, which means that ambient environmental sound is dominant at this point. Main−Sub then yields no subtraction amount that crosses the zoom detection threshold, so the amplitude spectrum around the N2-th point is not detected as drive noise. This calculation is performed over the entire range [0] to [511] of the amplitude spectrum.
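The per-point comparison can be sketched in Python as follows. The sketch assumes that the zoom detection threshold is a negative value and that a point is flagged when Main−Sub falls below it; the function name `detect_drive_noise` and the example values are hypothetical.

```python
def detect_drive_noise(main_amp, sub_amp, zoom_threshold):
    """Return (subtraction amounts, drive-noise flags) for every frequency point."""
    if len(main_amp) != len(sub_amp):
        raise ValueError("Main and Sub must have the same number of points")
    diffs = [m - s for m, s in zip(main_amp, sub_amp)]
    # A strongly negative Main - Sub means Sub dominates, i.e. drive noise.
    flags = [d < zoom_threshold for d in diffs]
    return diffs, flags


if __name__ == "__main__":
    main = [10.0, 20.0, 5.0]
    sub = [22.0, 18.0, 8.0]
    print(detect_drive_noise(main, sub, zoom_threshold=-6.0))
```

In this example only the first point is flagged, because only there does Sub exceed Main by more than the threshold margin.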

図１１は、サブマイク２０５ｂの周波数Ｎポイント目の時系列の振幅スペクトルを示す図である。 FIG. 11 is a diagram showing a time-series amplitude spectrum at the frequency N point of the sub microphone 205b.

図示の「Ｓｕｂｃｈ」は、Ｎポイント目の振幅スペクトルデータが時系列にて変動する事を示す。 “Sub ch” in the figure indicates that the amplitude spectrum data of the Nth point fluctuates in time series.

Ｓｃｈ｜ｔ_n−ｔ_(n-1)｜は、ＳｃｈＮポイント目の振幅スペクトルに対し、時間毎振幅変動検出部２０９３により演算される時間方向のフレーム間での振幅変動量を示し、時間毎変動量［ｎ］として出力される。例えば、ｔ１からｔ２にてＳｃｈの振幅スペクトルに着目すると、時間方向での変動量は大きくなっており、Ｓｃｈ｜ｔ_n−ｔ_(n-1)｜には、ｔ１からｔ２において、変動量検出閾値を超える時間毎変動量が算出される。この演算を［０］〜［５１１］の振幅スペクトルの全てのポイントにおいて実行する。 Sch|t_n − t_(n-1)| denotes the amount of amplitude variation between frames in the time direction, computed by the per-time amplitude variation detection unit 2093 for the amplitude spectrum at the N-th point of Sch, and is output as the per-time variation amount [n]. For example, looking at the Sch amplitude spectrum between t1 and t2, the variation in the time direction is large, so Sch|t_n − t_(n-1)| yields a per-time variation amount that exceeds the variation detection threshold between t1 and t2. This calculation is performed at every point [0] to [511] of the amplitude spectrum.
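The frame-to-frame comparison can be sketched as follows, assuming that the amplitude spectra of the current and previous frames are held as lists of equal length; the function name `amplitude_variation` and the threshold value are hypothetical.

```python
def amplitude_variation(curr_amp, prev_amp, threshold):
    """Return (|t_n - t_(n-1)| per point, flags for points exceeding the threshold)."""
    if len(curr_amp) != len(prev_amp):
        raise ValueError("frames must have the same number of points")
    variation = [abs(c - p) for c, p in zip(curr_amp, prev_amp)]
    exceeded = [v > threshold for v in variation]
    return variation, exceeded


if __name__ == "__main__":
    print(amplitude_variation([5.0, 9.0], [4.0, 2.0], threshold=3.0))
```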

図１２（ａ），（ｂ）は、メインマイク２０５ａからの振幅スペクトル、サブマイク２０５ｂからの振幅スペクトルにおける、周波数Ｎポイント目の時系列の位相を示す図である。 FIGS. 12A and 12B are diagrams showing time-series phases at the Nth frequency in the amplitude spectrum from the main microphone 205a and the amplitude spectrum from the sub microphone 205b.

同図（ａ）は複素数平面Ｉｍ，Ｒｅにより、時間方向における「周囲環境音」の位相の変化を示しており、実線部はＭｃｈを、点線部はＳｃｈを表している。ｔ０，ｔ１，ｔ２，ｔ３，ｔ４については、時間方向の推移を示す。 FIG. 12(a) shows, on the complex plane (Im, Re), how the phase of the "ambient environmental sound" changes in the time direction; the solid lines represent Mch and the dotted lines represent Sch. The labels t0, t1, t2, t3, and t4 indicate the progression in time.

同図（ｂ）は、「駆動騒音」の位相の変化を示している。 FIG. 12(b) shows the change in the phase of the "drive noise".

ここにおいて、周囲環境音については、ＭｃｈとＳｃｈの位相は、ｔ０からｔ４の時間の推移において一定である。駆動騒音については、ＭｃｈとＳｃｈの位相はｔ０からｔ４の時間の推移において大きく変動している。其々の時間方向での位相の変動は時間毎位相変動検出部２０９４にて検出され、時間毎位相変動量［ｎ］として出力される。時間毎位相変動検出部２０９４は、この演算を［０］〜［５１１］の振幅スペクトルの全て周波数ポイントについて実行する。 Here, for ambient environmental sound, the phases of Mch and Sch remain constant over the period from t0 to t4. For drive noise, the phases of Mch and Sch fluctuate greatly over the period from t0 to t4. The phase variation in the time direction is detected by the per-time phase variation detection unit 2094 and output as the per-time phase variation amount [n]. The per-time phase variation detection unit 2094 performs this calculation for all frequency points [0] to [511] of the amplitude spectrum.
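The description does not state how the phase variation is measured. One possible interpretation, sketched below in Python, tracks the phase of Sub relative to Main at a frequency point and measures how much that relative phase changes between consecutive frames, wrapping the change into the range −π to π. The function names and this interpretation are assumptions made for illustration only.

```python
import cmath
import math


def interchannel_phase(main_bin: complex, sub_bin: complex) -> float:
    """Phase of Sub relative to Main, in radians."""
    return cmath.phase(sub_bin * main_bin.conjugate())


def phase_variation(curr_phase: float, prev_phase: float) -> float:
    """Absolute frame-to-frame phase change, wrapped so it never exceeds pi."""
    delta = curr_phase - prev_phase
    delta = (delta + math.pi) % (2.0 * math.pi) - math.pi
    return abs(delta)


if __name__ == "__main__":
    p0 = interchannel_phase(1 + 0j, 1j)
    p1 = interchannel_phase(1 + 0j, 1j)
    print(phase_variation(p1, p0))  # steady source: no change between frames
```

Under this interpretation a stationary ambient source gives a variation close to zero, whereas drive noise with an erratic relative phase gives large values.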

図１３（ａ）、（ｂ）は、Ｍｃｈ−Ｓｃｈ演算部２０９１の動作タイミングチャートの一例を表している。 FIGS. 13(a) and 13(b) show an example of an operation timing chart of the Mch-Sch calculation unit 2091.

同図（ａ）におけるＭａｉｎ［Ｎ］、Ｓｕｂ［Ｎ］、Ｍａｉｎ［Ｎ］−Ｓｕｂ［Ｎ］は、それぞれ周波数Ｎポイント目のＭｃｈの振幅スペクトルデータ、Ｓｃｈの振幅スペクトルデータ、Ｍｃｈ振幅スペクトルからＳｃｈ振幅スペクトルを差し引いた減算量［Ｎ］を示している。Ｍａｉｎ［Ｎ］−Ｓｕｂ［Ｎ］は、Ｍｃｈ−Ｓｃｈ演算部２０９１にて演算を行われた結果を出力している。 Main[N], Sub[N], and Main[N] − Sub[N] in FIG. 13(a) denote, respectively, the Mch amplitude spectrum data at the N-th frequency point, the Sch amplitude spectrum data at that point, and the subtraction amount [N] obtained by subtracting the Sch amplitude spectrum from the Mch amplitude spectrum. Main[N] − Sub[N] is the result output by the calculation in the Mch-Sch calculation unit 2091.

ここで、同図（ａ）のｔ１からｔ２の期間を着目すると、Ｓｕｂ［Ｎ］の振幅スペクトルは、Ｍａｉｎ［Ｎ］に対して大きく上回っており、Ｍａｉｎ［Ｎ］−Ｓｕｂ［Ｎ］の演算結果はズーム閾値を上回る結果となっており、駆動騒音として検出され、減算量［Ｎ］が出力される。 Focusing on the period from t1 to t2 in FIG. 13(a), the amplitude spectrum of Sub[N] greatly exceeds that of Main[N], so the result of Main[N] − Sub[N] crosses the zoom threshold; drive noise is therefore detected and the subtraction amount [N] is output.

図１３（ｂ）におけるＭａｉｎ［Ｎ２］、Ｓｕｂ［Ｎ２］、Ｍａｉｎ［Ｎ２］−Ｓｕｂ［Ｎ２］は、それぞれ周波数Ｎ２ポイント目のＭｃｈの振幅スペクトル、Ｓｃｈの振幅スペクトル、Ｍｃｈ振幅スペクトルからＳｃｈ振幅スペクトルを差し引いた減算量［ｎ］を示す。ここで、同図（ｂ）のｔ１からｔ２の期間を着目すると、Ｍａｉｎ［Ｎ２］とＳｕｂ［Ｎ２］が同レベルで変動しており、Ｍａｉｎ［Ｎ２］−Ｓｕｂ［Ｎ２］の演算結果もズーム閾値を上回る結果はない。周波数Ｎ２ポイント目において駆動騒音は検出されない結果となる。Ｍｃｈ−Ｓｃｈ演算部２０９１は上記タイミングチャートで示した演算を［０］〜［５１１］の振幅スペクトル全てにおいて実行する。 Main[N2], Sub[N2], and Main[N2] − Sub[N2] in FIG. 13(b) denote, respectively, the Mch amplitude spectrum at the N2-th frequency point, the Sch amplitude spectrum at that point, and the subtraction amount [n] obtained by subtracting the Sch amplitude spectrum from the Mch amplitude spectrum. Focusing on the period from t1 to t2 in FIG. 13(b), Main[N2] and Sub[N2] fluctuate at the same level, and the result of Main[N2] − Sub[N2] never crosses the zoom threshold. Consequently, no drive noise is detected at the N2-th frequency point. The Mch-Sch calculation unit 2091 performs the calculation shown in the timing chart for all amplitude spectra [0] to [511].

図５はＬ／Ｒｃｈ生成部２１３のタイミングチャートの一例を表す。ズーム駆動動作は、制御部１０９からの制御を受け、ｔ１からｔ２のタイミングにおいて、光学レンズ２０１が駆動動作となる。Ｍｃｈスペクトルは、図５において抽出した特定の周波数Ｎポイント目のスペクトルを表す。Ｌｃｈ，Ｒｃｈについては、トータルゲイン演算部２１２で決定したＴｏｔａｌ＿Ｇａｉｎ＿Ｌ、Ｔｏｔａｌ＿Ｇａｉｎ＿ＲをＭｃｈに加算することで生成される。同図のタイミングチャートに示されるように、例えば、Ｍｃｈに対し、Ｔｏｔａｌ＿Ｇａｉｎ＿Ｌを下げ、Ｔｏｔａｌ＿Ｇａｉｎ＿Ｒを上げることで、Ｒｃｈが強調することができ、１ｃｈの入力で２ｃｈのステレオ信号を生成する事が可能である。 FIG. 5 shows an example of a timing chart of the L / Rch generation unit 213. The zoom driving operation is controlled by the control unit 109, and the optical lens 201 is driven at a timing from t1 to t2. The Mch spectrum represents the spectrum of the specific frequency N point extracted in FIG. Lch and Rch are generated by adding Total_Gain_L and Total_Gain_R determined by the total gain calculation unit 212 to Mch. As shown in the timing chart of the figure, for example, by lowering Total_Gain_L and raising Total_Gain_R with respect to Mch, Rch can be emphasized, and 2ch stereo signals can be generated with 1ch input. is there.

また、ｔ１からｔ２における光学レンズの駆動動作中においても、Ｔｏｔａｌ＿Ｇａｉｎ＿Ｌ、Ｔｏｔａｌ＿Ｇａｉｎ＿Ｒを下げることで、Ｌｃｈ，Ｒｃｈに対し、駆動騒音を除去することが可能である。 Further, even during the driving operation of the optical lens from t1 to t2, it is possible to remove drive noise from Lch and Rch by lowering Total_Gain_L and Total_Gain_R.

ここで、本実施形態の音声入力部１０２での感度差補正部２０８の動作について、図１４を用いて説明する。 Here, the operation of the sensitivity difference correction unit 208 in the audio input unit 102 of the present embodiment will be described with reference to FIG. 14.

図１４は、感度差補正部２０８の動作タイミングチャートの一例を示している。同図において、ズーム検出は駆動検出部２０９５の駆動騒音の検出結果を示す。入力スペクトルＮＰｏｉｎｔは、周波数Ｎポイント目のＭｃｈの振幅スペクトル、Ｓｃｈの振幅スペクトルを示す。実線部はＭｃｈを、点線部はＳｃｈを示わしている。 FIG. 14 shows an example of an operation timing chart of the sensitivity difference correction unit 208. In the figure, the zoom detection indicates the detection result of the drive noise of the drive detection unit 2095. The input spectrum NPoint indicates the Mch amplitude spectrum and the Sch amplitude spectrum at the Nth frequency point. The solid line portion indicates Mch, and the dotted line portion indicates Sch.

入力スペクトル（積分）ＮＰｏｉｎｔは、周波数Ｎポイント目の感度補正積分器２０８１のＭｃｈ、Ｓｃｈの積分結果を示す。感度調整出力スペクトルＮＰｏｉｎｔは、周波数Ｎポイント目の感度差補正ゲイン部２０８５によりレベル補正されたＭｃｈの振幅スペクトル、Ｓｃｈの振幅スペクトルを示す。実線部はＭｃｈを、点線部はＳｃｈを示す。 Input spectrum (integration) NPoint indicates the integration result of Mch and Sch of the sensitivity correction integrator 2081 at the frequency N point. The sensitivity adjustment output spectrum NPoint indicates the Mch amplitude spectrum and the Sch amplitude spectrum level-corrected by the sensitivity difference correction gain unit 2085 at the frequency N point. The solid line portion indicates Mch, and the dotted line portion indicates Sch.

In FIG. 14, t0 is the REC start timing, and the interval from t0 to t1 represents a sufficiently long time on the order of several tens of seconds. From timing t2 to t3, zoom detection is ON, which indicates that, according to the drive detection unit 2095, drive noise is occurring.

In the input spectrum NPoint, a level difference exists between Mch and Sch at the REC start time t0. In contrast, the input spectrum (integration) NPoint is integrated by the sensitivity correction integrator 2081 and slowly follows the level difference from t0 to t1. For the sensitivity adjustment output spectrum NPoint as well, the sensitivity difference correction gain unit 2085 applies gain correction based on the integration result, taking sufficient time from t0 to t1. Because the purpose of the sensitivity difference correction unit 208 is to correct the sensitivity difference between the main microphone 205a and the sub microphone 205b, a level correction that takes a sufficient time of several tens of seconds is adequate, and no transient responsiveness is required.

During the zoom detection ON period from timing t2 to t3, the sensitivity correction integrator 2081 is stopped. The drive noise produces a large level difference between the Mch amplitude spectrum and the Sch amplitude spectrum, but because the sensitivity correction integrator 2081 is stopped, its value is held and does not follow this level difference. As noted above, the purpose of the sensitivity difference correction unit 208 is to correct the sensitivity difference between the main microphone 205a and the sub microphone 205b, so it does not need to respond to transient level differences caused by drive noise. The sensitivity difference correction unit 208 performs the correction shown in the timing chart on all of the amplitude spectra [0] to [511].
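The slow tracking and the hold during zoom detection can be illustrated with a small sketch for a single frequency point. It is written under stated assumptions rather than taken from the embodiment: levels are in dB, the integrator is modeled as a one-pole smoother of the Mch minus Sch level difference, and the class name `SensitivityDifferenceTracker` and the coefficient `alpha` are hypothetical.

```python
class SensitivityDifferenceTracker:
    """Slowly tracks the Mch - Sch level difference at one frequency point.

    Assumptions: levels are in dB; `alpha` is a hypothetical smoothing
    coefficient (a small alpha gives a long time constant).
    """

    def __init__(self, alpha=0.01):
        self.alpha = alpha
        self.diff_db = 0.0  # integrated level difference

    def update(self, mch_db, sch_db, zoom_detected):
        # While zoom (drive noise) is detected, the integrator is stopped
        # and the previous value is held.
        if not zoom_detected:
            self.diff_db += self.alpha * ((mch_db - sch_db) - self.diff_db)
        return self.diff_db


tracker = SensitivityDifferenceTracker(alpha=0.5)  # large alpha only for the demo
first = tracker.update(-20.0, -26.0, zoom_detected=False)
second = tracker.update(-20.0, -26.0, zoom_detected=False)
held = tracker.update(0.0, -40.0, zoom_detected=True)  # drive noise: value is held
```

One such tracker per frequency point would cover the amplitude spectra [0] to [511].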

Next, the operation of the wind noise calculation processing unit 210 in the audio input unit 102 of the present embodiment will be described with reference to FIGS. 15 to 17.

FIG. 15 is a cross-sectional view showing a mechanical configuration in which a windshield material 102-3 is provided for the sub microphone 205b, which is part of the audio input unit 102.

The structure comprises the exterior part 102-1 in which the microphone holes are formed, the main microphone bush 102-2a that holds the main microphone 205a, the sub microphone bush 102-2b that holds the sub microphone 205b, and the pressing part 103 that presses each microphone bush against the exterior part and holds it there. The exterior part 102-1 and the pressing part 103 are molded members made of PC material or the like, but metal members such as aluminum or stainless steel may also be used. The main microphone bush 102-2a and the sub microphone bush 102-2b are made of a rubber material such as ethylene propylene diene rubber.

Here, the diameters of the microphone holes in the exterior part 102-1 will be described. The microphone hole for the sub microphone 205b is smaller in diameter than the microphone hole for the main microphone 205a. In the embodiment, the diameter of the microphone hole for the sub microphone 205b is one third of that of the microphone hole for the main microphone 205a. The microphone holes are preferably circular or elliptical, but may be rectangular. The two holes may have the same shape or different shapes.

Next, the space in front of each microphone, formed by the exterior part 102-1 and the microphone bushes 102-2a and 102-2b, and the arrangement of the cushion material will be described. The volume of the space in front of the sub microphone 205b, formed by the exterior part 102-1 and the sub microphone bush 102-2b, is larger than that of the space in front of the main microphone 205a, formed by the exterior part 102-1 and the main microphone bush 102-2a; the configuration secures three times the volume.
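As a worked illustration of these proportions, assume both microphone holes are circular (one of the shapes the text allows). The symbols $d$, $A$, and $V$ for hole diameter, hole area, and front-space volume are introduced here only for this example.

```latex
\frac{A_{\mathrm{sub}}}{A_{\mathrm{main}}}
  = \left(\frac{d_{\mathrm{sub}}}{d_{\mathrm{main}}}\right)^{2}
  = \left(\frac{1}{3}\right)^{2}
  = \frac{1}{9},
\qquad
\frac{V_{\mathrm{sub}}}{V_{\mathrm{main}}} = 3
```

Under this assumption the sub microphone opening has one ninth of the area of the main microphone opening, while the space behind it is three times as large.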

In the space in front of the sub microphone 205b, formed by the exterior part 102-1 and the sub microphone bush 102-2b, a windshield cushion material or a seal microphone is arranged as the windshield material 102-3. Either one serves as a member that filters signal components in the low-frequency band of about 0 to 4 kHz, which corresponds to the frequency of wind. The windshield material 102-3 can greatly reduce the effect of air-propagated wind noise, in which the low-frequency band is dominant, on the sub microphone 205b.

FIG. 16 shows the frequency spectrum data Main[0] to [511] from the main microphone 205a and the frequency spectrum data Sub[0] to [511] from the sub microphone 205b when wind noise is input. When wind noise is input, the wind noise component is present in the low-frequency band indicated by the dotted line. The wind detection unit 2101 detects the wind noise level by examining the correlation between Main[0] to [511] and Sub[0] to [511] at, for example, 10 points in the low-frequency band. For the low-band frequency points n, the wind detection unit 2101 calculates and outputs the wind noise level according to, for example, the following equation.
Wind noise level = Σ (Main[n] − Sub[n]) / (Main[n] + Sub[n])
Because the above equation uses 10 points of the low-frequency components, n ranges from 0 to 9. Although the embodiment uses 10 points for the low-frequency band, this number is only an example and is preferably set as appropriate for the design of the imaging apparatus.
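The wind noise level equation can be sketched in a few lines. The equation as written leaves open whether the sum is taken over per-point ratios or separately over numerators and denominators; this sketch assumes a sum of per-point ratios. The function name `wind_noise_level` and the skipping of points whose denominator is zero are also assumptions made for the example.

```python
def wind_noise_level(main, sub, num_points=10):
    """Sum (Main[n] - Sub[n]) / (Main[n] + Sub[n]) over the lowest bins.

    Assumptions: `main` and `sub` hold non-negative amplitude values, the sum
    is taken over per-point ratios, and points with a zero denominator are
    skipped to avoid division by zero.
    """
    total = 0.0
    for n in range(num_points):
        denom = main[n] + sub[n]
        if denom > 0.0:
            total += (main[n] - sub[n]) / denom
    return total


# Wind affects the main microphone more strongly in the low band.
windy = wind_noise_level([3.0] * 512, [1.0] * 512)
calm = wind_noise_level([2.0] * 512, [2.0] * 512)
```

With equal levels on both microphones the result is zero, and it grows as the main microphone picks up more low-frequency energy than the sub microphone.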

FIG. 17 shows the frequency characteristics of the wind noise gains [0] to [511] computed by the wind noise gain calculation unit 2102 as a function of the wind noise level from the wind detection unit 2101. The higher the wind noise level from the wind detection unit 2101, the further the wind noise gain shifts toward the negative side and the further the cutoff frequency indicated by the dotted line shifts toward the high-frequency band. The wind noise gains [0] to [511] are determined by this cutoff frequency.

Next, the operation of the Mch/Sch selection unit 2131 in the audio input unit 102 of the present embodiment will be described with reference to FIGS. 18(a) and 18(b).

FIG. 18(a) shows the relationship between frequency and the synthesis ratio, which depends on the wind noise level, used by the Mch/Sch selection unit 2131 to combine the frequency spectrum data Main[0] to [511] from the main microphone 205a (Main ch in the figure) with the frequency spectrum data Sub[0] to [511] from the sub microphone 205b (Sub ch in the figure).

Here, FIG. 3(a) shows an embodiment corresponding to the mechanical configuration of the main microphone 205a and the sub microphone 205b described in FIG. 3(b). As shown in FIG. 18(a), the Mch/Sch selection unit 2131 combines Main ch at a ratio between 1.0 and 0.5 and Sub ch at a ratio between 0 and 0.5, based on the wind noise level.

The higher the wind noise level, the further the synthesis ratio of Main ch is lowered from 1.0 toward 0.5, the further the synthesis ratio of Sub ch is raised from 0 toward 0.5, and the higher the crossover frequency (the upper limit frequency of the synthesis) at which Main ch and Sub ch are combined. At or below the upper limit frequency, which depends on the wind noise level, the Mch/Sch selection unit 2131 combines Main ch and Sub ch at the illustrated ratio; at frequencies above the upper limit frequency, it selects and outputs Main ch. When the wind noise level is 0, the synthesis ratio of Sch is 0. As described for FIG. 3(b), the microphone hole for the sub microphone 205b is smaller in diameter than the microphone hole for the main microphone 205a, being reduced to one third. The sub microphone 205b is therefore less affected by wind noise than the main microphone 205a. Consequently, combining Sch with Mch according to the wind noise level from the wind detection unit 2101 is effective in reducing wind noise.

Next, FIG. 18(b) shows an embodiment corresponding to the mechanical configuration in which the windshield material 102-3 is provided for the sub microphone 205b, as shown in FIG. 15. Here, based on the wind noise level, the Mch/Sch selection unit 2131 combines Mch at a ratio between 1.0 and 0 and Sch at a ratio between 0 and 1.0. That is, the higher the wind noise level, the further the synthesis ratio of Mch is lowered from 1.0 toward 0, the further the synthesis ratio of Sch is raised from 0 toward 1.0, and the higher the crossover frequency at which Mch and Sch are combined. When the wind noise level is 0, the synthesis ratio of Sch is 0. As described for FIG. 3(a) and FIG. 15, the microphone hole for the sub microphone 205b is smaller in diameter than the microphone hole for the main microphone 205a, being reduced to one third. In addition, the windshield material 102-3 is provided in the space in front of the sub microphone 205b formed by the exterior part 102-1 and the sub microphone bush 102-2b. The effect of wind noise on the sub microphone 205b can therefore be made even smaller relative to the main microphone 205a. Consequently, switching from Mch to Sch according to the wind noise level from the wind detection unit 2101 is effective in reducing wind noise.
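The two mixing schemes of FIGS. 18(a) and 18(b) differ only in how far the Sub ch ratio may rise (0.5 or 1.0). The sketch below illustrates this under several assumptions that are not in the source: the ratio and the crossover bin both scale linearly with the wind noise level, a single ratio is applied to every bin below the crossover, and the names `mix_main_sub`, `max_level`, and `max_crossover_bin` are hypothetical.

```python
def mix_main_sub(main, sub, wind_level, max_sub_ratio=0.5,
                 max_level=10.0, max_crossover_bin=64):
    """Blend Sub into Main below a wind-dependent crossover bin.

    Assumptions: linear mapping from wind level to ratio and crossover;
    max_sub_ratio=0.5 mimics FIG. 18(a), max_sub_ratio=1.0 mimics FIG. 18(b).
    """
    w = min(max(wind_level / max_level, 0.0), 1.0)  # normalized wind level
    sub_ratio = max_sub_ratio * w
    main_ratio = 1.0 - sub_ratio
    crossover_bin = int(round(max_crossover_bin * w))
    out = list(main)  # above the crossover, Main is output unchanged
    for n in range(min(crossover_bin, len(out))):
        out[n] = main_ratio * main[n] + sub_ratio * sub[n]
    return out


main = [4.0] * 8
sub = [2.0] * 8
no_wind = mix_main_sub(main, sub, 0.0, max_crossover_bin=4)
mix_a = mix_main_sub(main, sub, 10.0, max_sub_ratio=0.5, max_crossover_bin=4)
mix_b = mix_main_sub(main, sub, 10.0, max_sub_ratio=1.0, max_crossover_bin=4)
```

With no wind the output equals Main; at the maximum level the low bins are an equal blend in the first scheme and pure Sub in the second.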

Next, the specific operation of the stereo suppression unit 2113 in the audio input unit 102 of the present embodiment will be described with reference to FIGS. 19 and 20.

FIG. 19 is a timing chart showing how the stereo suppression unit 2113 changes the enhancement coefficient used to enhance the stereo effect, depending on whether drive noise or wind noise is detected. In FIG. 19, Main[N] indicates the Mch amplitude spectrum data at the N-th frequency point. The drive noise detection signal is a detection signal indicating that the drive detection unit 2095 has detected drive noise. The wind noise detection signal indicates a wind noise level (a wind noise level at or above a preset threshold) showing that the wind detection unit 2101 has detected wind noise. GainL[N] and GainR[N] indicate the stereo Lch and Rch gains, determined by the stereo gain calculation processing unit 2112, that are added to the Mch amplitude spectrum at the N-th frequency point.

On receiving the detection signal from the Mch-Sch calculation unit 2091 indicating that drive noise has been detected, the stereo suppression unit 2113 sets the enhancement coefficient to 0. On receiving a wind noise level from the wind detection unit 2101 indicating that wind noise has been detected, it sets the enhancement coefficient to 0 depending on the frequency.

Looking at the period from timing t1 to t2, the amplitude spectrum of Main[N] fluctuates widely, and the detection signal from the Mch-Sch calculation unit 2091 indicates that drive noise is present. During this period, GainL[N] and GainR[N] are fixed at 0, which shows that the stereo suppression unit 2113 has set the enhancement coefficient to 0. Looking at the period from timing t3 to t4, the amplitude spectrum of Main[N] again fluctuates widely, and the wind noise detection signal from the wind detection unit 2101 indicates detection. During this period, too, GainL[N] and GainR[N] are fixed at 0, which shows that the stereo suppression unit 2113 has set the enhancement coefficient to 0.

FIG. 20 shows, for the case where the wind detection unit 2101 detects a wind noise level, the ratio at which the Mch/Sch selection unit 2131 combines the frequency spectrum Main[0] to [511] from the main microphone 205a with the frequency spectrum Sub[0] to [511] from the sub microphone 205b, together with the frequency range over which the stereo suppression unit 2113 sets the enhancement coefficient to 0. Here, the higher the wind noise level, the further the Mch/Sch selection unit 2131 lowers the synthesis ratio of Mch from 1.0 toward 0.5, raises the synthesis ratio of Sch from 0 toward 0.5, and raises the crossover frequency at which Mch and Sch are combined. At the wind noise level depicted, the crossover frequency is 500 Hz. The stereo suppression unit 2113, in turn, fixes the enhancement coefficient at 0 up to 750 Hz, a frequency higher than the crossover frequency. The higher the wind noise level from the wind detection unit 2101, the higher the stereo suppression unit 2113 raises the frequency up to which the enhancement coefficient is fixed at 0. This prevents the wind noise from also being emphasized by the stereo-gain enhancement of GainL and GainR.
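The behavior of the stereo suppression unit can be sketched as a per-bin enhancement coefficient. This is an illustration under assumptions, not the embodiment itself: the 1.5 margin is inferred only from the 500 Hz and 750 Hz example, the bin spacing of 46.875 Hz (a 48 kHz sampling rate with a 1024-point FFT) is hypothetical, and the function name `enhancement_coefficients` is invented for the example.

```python
def enhancement_coefficients(num_bins, bin_hz, drive_noise, wind_detected,
                             crossover_hz, margin=1.5):
    """Return one enhancement coefficient (0.0 or 1.0) per frequency bin.

    Assumptions: `margin` scales the crossover frequency to the suppression
    limit (500 Hz -> 750 Hz in the example), and `bin_hz` is the bin spacing.
    """
    if drive_noise:
        # Drive noise detected: suppress stereo enhancement in every bin.
        return [0.0] * num_bins
    coeffs = [1.0] * num_bins
    if wind_detected:
        limit_hz = crossover_hz * margin
        for n in range(num_bins):
            if n * bin_hz <= limit_hz:
                coeffs[n] = 0.0  # wind noise: suppress only the low band
    return coeffs


wind = enhancement_coefficients(512, 46.875, False, True, crossover_hz=500.0)
drive = enhancement_coefficients(512, 46.875, True, False, crossover_hz=500.0)
quiet = enhancement_coefficients(512, 46.875, False, False, crossover_hz=500.0)
```

Multiplying the stereo gains by these coefficients would hold GainL and GainR at 0 wherever the coefficient is 0.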

Next, the operation of the drive sound subtraction amount integrator 2097, the wind noise subtraction amount integrator 2103, the left gain integrator 2114, and the right gain integrator 2115 in the audio input unit 102 of the present embodiment will be described with reference to FIG. 21.

FIG. 21 shows the time constants for the drive noise removal gain NC_GAIN[N], the wind noise subtraction amount WC_GAIN[N], the Lch-generation stereo gain L_GAIN[N], and the Rch-generation stereo gain R_GAIN[N], each of which is determined for the Mch amplitude spectrum data at the N-th frequency point. These are determined by the drive sound subtraction amount integrator 2097, the wind noise subtraction amount integrator 2103, the left gain integrator 2114, and the right gain integrator 2115. The time constant of the drive sound subtraction amount integrator is slower than those of the right gain integrator 2115 and the left gain integrator 2114, and so is the time constant of the wind noise subtraction amount integrator. Drive noise and wind noise are both noise components with large variation over time, so slowing the time constants, and with them the tracking of the drive noise subtraction and the wind noise subtraction, suppresses this variation. For the stereo gains, in contrast, a faster time constant speeds up tracking of the movement of a sound-emitting subject.
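The effect of the different time constants can be seen with two one-pole smoothers driven by the same step input. The source gives no coefficient values, so the class name `OnePoleSmoother` and the coefficients 0.05 and 0.5 are purely illustrative assumptions.

```python
class OnePoleSmoother:
    """First-order smoother: a small `coeff` means a slow time constant."""

    def __init__(self, coeff, initial=0.0):
        self.coeff = coeff
        self.value = initial

    def update(self, target):
        # Move a fixed fraction of the remaining distance toward the target.
        self.value += self.coeff * (target - self.value)
        return self.value


# Hypothetical coefficients: noise subtraction tracks slowly, stereo gain fast.
nc_gain = OnePoleSmoother(coeff=0.05)  # stands in for NC_GAIN[N] or WC_GAIN[N]
l_gain = OnePoleSmoother(coeff=0.5)    # stands in for L_GAIN[N] or R_GAIN[N]

for _ in range(3):
    nc_gain.update(1.0)
    l_gain.update(1.0)
```

After three frames the fast smoother has covered most of the step while the slow one has barely moved, which is the trade-off between suppressing fluctuation and following a moving subject.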

Although the present embodiment has been described for the case where two channels of audio are input, it can also be applied to a larger number of channels.

Although the present embodiment has been described for an imaging apparatus, the audio processing of the audio input unit 102 can be applied to any apparatus that records or inputs external audio, that is, to any audio recording apparatus. For example, it may be applied to an IC recorder or a mobile phone.

In the embodiment, an example was described in which the configuration shown in FIG. 6 is implemented in hardware. However, most of the processing units in that figure, other than the microphones, the A/D conversion unit, and the like, may instead be implemented as programs such as procedures or subroutines executed by a processor.

(Other Embodiments)
The present invention can also be realized by processing in which a program that implements one or more functions of the above embodiment is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

100 ... imaging apparatus, 101 ... imaging unit, 102 ... audio input unit, 103 ... memory, 104 ... display control unit, 105 ... display unit, 106 ... encoding processing unit, 107 ... recording/reproducing unit, 108 ... recording medium, 109 ... control unit, 110 ... operation unit, 111 ... audio output unit, 112 ... speaker, 113 ... external output unit, 114 ... data bus, 201 ... optical lens, 202 ... image sensor, 203 ... image processing unit, 204 ... optical lens control unit, 205 ... microphone, 205a ... main microphone, 205b ... sub microphone, 206 ... A/D conversion unit, 207 ... FFT unit, 208 ... sensitivity difference correction unit, 209 ... drive sound calculation processing unit, 210 ... wind noise calculation processing unit, 211 ... stereo gain calculation processing unit, 212 ... total gain calculation unit, 213 ... L/Rch generation unit, 214 ... iFFT unit, 215 ... audio processing unit, 216 ... ALC unit, 102-1 ... exterior part, 102-2a ... main microphone bush, 102-2b ... sub microphone bush, 102-3 ... windshield material

Claims

An audio processing apparatus comprising:
a housing;
a drive unit;
a first microphone housed in the housing such that sound is propagated through a first opening provided at a first predetermined position of the housing;
a second microphone housed in the housing such that sound is propagated through a second opening that has a smaller area than the first opening and is provided at a second predetermined position of the housing related to the first predetermined position, the second microphone being housed such that the volume of a second space between the second microphone and the second opening is larger than the volume of a first space between the first microphone and the first opening;
conversion means for converting time-series audio data obtained from the first microphone into first frequency spectrum data and converting time-series audio data obtained from the second microphone into second frequency spectrum data;
calculation means for calculating, for each frequency, the amount of noise caused by the drive unit from the first frequency spectrum data and the second frequency spectrum data obtained by the conversion means;
generation means for generating left channel frequency spectrum data and right channel frequency spectrum data in which the noise is suppressed, based on the first frequency spectrum data, the second frequency spectrum data, and the amount of noise calculated by the calculation means; and
inverse conversion means for inversely converting the left and right channel frequency spectrum data generated by the generation means into time-series left and right channel audio data, respectively.

The audio processing apparatus according to claim 1, further comprising:
a first microphone bush for holding the first microphone; and
a second microphone bush for holding the second microphone,
wherein the first space is formed by the housing and the first microphone bush, and
the second space is formed by the housing and the second microphone bush.

The audio processing apparatus according to claim 1, wherein the noise propagated to the second microphone via the second space is larger than the noise propagated to the first microphone via the first space.

The audio processing apparatus according to claim 1, wherein the first microphone is a microphone corresponding to one of a left channel and a right channel, and the second microphone is a microphone corresponding to the other.

The audio processing apparatus according to claim 1, wherein the generation means determines a gain for each of the right channel and the left channel based on the first frequency spectrum data, the second frequency spectrum data, and the amount of noise calculated by the calculation means, generates the right channel frequency spectrum data by controlling the first frequency spectrum data with the right channel gain, and generates the left channel frequency spectrum data by controlling the first frequency spectrum data with the left channel gain.

A method for controlling an audio processing apparatus,
the audio processing apparatus including a housing, a drive unit, a first microphone housed in the housing such that sound is propagated through a first opening provided at a first predetermined position of the housing, and a second microphone housed in the housing such that sound is propagated through a second opening that has a smaller area than the first opening and is provided at a second predetermined position of the housing related to the first predetermined position, the second microphone being housed such that the volume of a second space between the second microphone and the second opening is larger than the volume of a first space between the first microphone and the first opening,
the method comprising:
a conversion step of converting time-series audio data obtained from the first microphone into first frequency spectrum data and converting time-series audio data obtained from the second microphone into second frequency spectrum data;
a calculation step of calculating, for each frequency, the amount of noise caused by the drive unit from the first frequency spectrum data and the second frequency spectrum data obtained in the conversion step;
a generation step of generating left channel frequency spectrum data and right channel frequency spectrum data in which the noise is suppressed, based on the first frequency spectrum data, the second frequency spectrum data, and the amount of noise calculated in the calculation step; and
an inverse conversion step of inversely converting the left and right channel frequency spectrum data generated in the generation step into time-series left and right channel audio data, respectively.

A program that is read and executed by a processor of an audio processing apparatus,
the audio processing apparatus including a housing, a drive unit, a first microphone housed in the housing such that sound is propagated through a first opening provided at a first predetermined position of the housing, and a second microphone housed in the housing such that sound is propagated through a second opening that has a smaller area than the first opening and is provided at a second predetermined position of the housing related to the first predetermined position, the second microphone being housed such that the volume of a second space between the second microphone and the second opening is larger than the volume of a first space between the first microphone and the first opening,
wherein the program causes the processor to execute each step of the method according to claim 6.