JP4639902B2

JP4639902B2 - Imaging apparatus, audio recording method, and program

Info

Publication number: JP4639902B2
Application number: JP2005098492A
Authority: JP
Inventors: 孝夫菅家
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2005-03-30
Filing date: 2005-03-30
Publication date: 2011-02-23
Anticipated expiration: 2025-03-30
Also published as: JP2006279757A

Abstract

PROBLEM TO BE SOLVED: To properly remove mechanical noise generated during shooting from a sound signal and record the resulting sound signal.

SOLUTION: A storage section 54 stores, as first noise spectra, a plurality of spectral patterns of motor sound picked up in advance through a sound input section 51 (main microphone). A storage section 64 stores, as second noise spectra, a plurality of spectral patterns of motor sound picked up in advance through a reference input section 61 (reference microphone). During shooting, the second noise spectrum that matches the motor sound obtained from the reference input section 61 is selected from the storage section 64, the first noise spectrum corresponding to that second noise spectrum is selected from the storage section 54, and both are supplied to a subtraction section 55. The noise component can thus be properly removed by subtracting, from the spectrum of the sound signal obtained through the sound input section 51, a noise spectrum with the same input characteristics as that spectrum.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an imaging apparatus such as a digital camera, and more particularly to an imaging apparatus capable of recording an audio signal input during shooting together with the captured image, and to an audio recording method and program used in the imaging apparatus.

Spectral subtraction is a conventionally known technique for removing noise superimposed on an audio signal. The spectral subtraction method (hereinafter, the SS method) takes the spectrum of a silent section as an estimate of the noise spectrum, multiplies that noise spectrum by a predetermined coefficient (the subtraction coefficient), and subtracts the result from the input speech spectrum to remove the noise component.
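
The subtraction step described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: spectra are assumed to be lists of per-bin magnitudes, the function name `spectral_subtract` is hypothetical, and clamping negative results to a floor of zero is an added assumption that the text does not mention.

```python
def spectral_subtract(speech_mag, noise_mag, alpha=1.0, floor=0.0):
    """Subtract alpha * noise magnitude from the input magnitude, bin by bin.

    alpha plays the role of the subtraction coefficient. Clamping to
    `floor` is an assumption added here so magnitudes never go negative.
    """
    if len(speech_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    return [max(s - alpha * n, floor) for s, n in zip(speech_mag, noise_mag)]


noisy = [4.0, 3.0, 1.0, 0.5]   # input speech spectrum (magnitudes)
noise = [1.0, 1.0, 1.0, 1.0]   # noise spectrum estimated from a silent section
print(spectral_subtract(noisy, noise))  # [3.0, 2.0, 0.0, 0.0]
```

The last two bins show why the choice of coefficient matters: wherever the estimated noise exceeds the input, the result is clamped, which is the over-subtraction distortion discussed in the following paragraphs.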

Patent Document 1 discloses a noise removal system based on the SS method in which the subtraction coefficient is varied from frame to frame according to the frame power of the audio signal, thereby reducing the spectral distortion caused by over-subtraction of the estimated noise spectrum. Specifically, the normal subtraction coefficient is applied in sections where speech power is sufficient, such as vowels, and a smaller coefficient is applied in sections with low speech power, such as plosive consonants, so that distortion of the input speech spectrum due to over-subtraction of the estimated noise spectrum is suppressed.
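
A rough sketch of a frame-power-dependent coefficient follows. The two-level rule, the threshold, and the coefficient values are all illustrative assumptions; the text only says that the coefficient is made smaller for low-power frames, and Patent Document 1 may well use a different mapping.

```python
def frame_power(samples):
    """Mean squared amplitude of one frame of time-domain samples."""
    if not samples:
        return 0.0
    return sum(x * x for x in samples) / len(samples)


def subtraction_coefficient(power, threshold=0.01, normal=1.0, reduced=0.5):
    """Return the normal coefficient for high-power frames (e.g. vowels)
    and a smaller one for low-power frames (e.g. plosive consonants).
    All numeric values here are hypothetical."""
    return normal if power >= threshold else reduced


vowel_like = [0.5, -0.5, 0.5, -0.5]
plosive_like = [0.05, -0.05, 0.0, 0.0]
print(subtraction_coefficient(frame_power(vowel_like)))    # 1.0
print(subtraction_coefficient(frame_power(plosive_like)))  # 0.5
```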

Patent Document 2 proposes a method in which the noise spectrum is estimated not from the spectrum of a silent section but from the spectrum of a signal input through a separately provided reference input unit.

Patent Document 1: JP-A-8-221092
Patent Document 2: JP-A-5-165492

As described above, methods that use the SS method to remove noise components from input speech are known. However, in a digital camera with a function for shooting moving images with sound, mechanical sounds such as zoom and focus sounds occur during shooting independently of the sound input and are mixed into the input sound.

In this case, a method that estimates the noise spectrum from the speech spectrum of a silent section, as in Patent Document 1, cannot remove as noise a mechanical sound that occurs independently of the sound input.

The configuration of Patent Document 2, in which a reference input unit is additionally provided, has the following problem.

Consider applying the SS method to reduce mechanical sound, for example zoom motor sound, during moving image shooting with a digital camera. Two microphones are provided: a main microphone, installed on the housing of the camera and mainly picking up the audio signal, and a reference microphone, installed near the motor inside the housing and mainly picking up the motor sound. The spectrum of the audio signal input from the main microphone and the spectrum of the motor sound input from the reference microphone are generated, and the motor sound spectrum is then subtracted from the audio spectrum in order to reduce the zoom sound contained in the audio signal from the main microphone.

However, because the main microphone and the reference microphone are installed in different places, the transfer functions from the motor to the two microphones differ considerably. The reference microphone is meant to pick up mainly the motor sound inside the device rather than external sound, so its signal includes multiple reflections inside the housing, vibration transmitted through the circuit board and the housing, and the like.

Consequently, if the spectrum of the motor sound input through the reference microphone is subtracted from the spectrum of the audio signal input through the main microphone, the difference in input characteristics means that not only is the motor sound not removed correctly, but the input speech itself is distorted by over-subtraction of the spectrum and similar effects.

The present invention has been made in view of the above, and its object is to provide an imaging apparatus, an audio recording method, and a program capable of appropriately removing, from an audio signal, mechanical sound generated during shooting and recording the result.

The imaging apparatus according to claim 1 of the present invention is an imaging apparatus having an audio recording function that, when shooting a moving image with sound, removes as noise the mechanical sound generated by the shooting operation from the audio signal before recording it. The apparatus comprises: first input means for inputting an audio signal; first storage means storing, as first noise spectra, a plurality of patterns of the spectrum of the mechanical sound collected in advance through the first input means; second input means provided near the source of the mechanical sound; second storage means storing, as second noise spectra, a plurality of patterns of the spectrum of the mechanical sound collected in advance through the second input means, each associated with a pattern of the first noise spectrum; selection means for selecting from the second storage means, based on the spectrum of the mechanical sound obtained from the second input means during shooting, the second noise spectrum that matches the characteristics of that mechanical sound, and for selecting from the first storage means the first noise spectrum corresponding to the selected second noise spectrum; noise removal means for removing the noise component by subtracting, from the spectrum of the audio signal obtained from the first input means, the first noise spectrum selected by the selection means multiplied by a predetermined coefficient; inverse transform means for inversely transforming the noise-removed speech spectrum back into an audio signal; and recording means for recording the audio signal obtained by the inverse transform means together with the captured image.

With this configuration, a signal containing only the mechanical sound is input through the second input means, which is provided near the source of the mechanical sound separately from the first input means, and a noise spectrum corresponding to the characteristics of that mechanical sound is subtracted from the spectrum of the input speech. Because the subtraction uses a noise spectrum having the same input characteristics as the audio signal, the mechanical sound contained in the audio signal can be appropriately removed as a noise component before recording.
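
The key idea, storing the two pre-recorded spectra as an associated pair and subtracting the main-microphone member of the pair, can be sketched as follows. The class name `NoisePatternPair`, the function `remove_noise`, and the zero floor are hypothetical choices made for illustration; the source does not describe a data layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NoisePatternPair:
    """One stored pattern: the same mechanical sound as heard by each input."""
    main_mic: List[float]       # first noise spectrum (via the first input means)
    reference_mic: List[float]  # second noise spectrum (via the second input means)


def remove_noise(speech_spectrum, pair, coefficient=1.0):
    """Subtract the main-microphone noise pattern, never the reference one.

    The reference-microphone pattern is used only to decide which pair
    applies; clamping at zero is an assumption added for this sketch.
    """
    return [max(s - coefficient * n, 0.0)
            for s, n in zip(speech_spectrum, pair.main_mic)]


pair = NoisePatternPair(main_mic=[0.5, 0.25, 0.125],
                        reference_mic=[2.0, 1.5, 0.5])
print(remove_noise([1.5, 1.0, 0.125], pair))  # [1.0, 0.75, 0.0]
```

Subtracting `pair.reference_mic` directly would remove far too much energy in this example, which is the transfer-function mismatch the description attributes to the prior art.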

According to claim 2 of the present invention, the imaging apparatus of claim 1 further comprises similarity calculation means for calculating the similarity between the spectrum of the mechanical sound obtained from the second input means during shooting and each pattern of the noise spectrum stored in the second storage means. Based on the result calculated by the similarity calculation means, the selection means selects from the second storage means the second noise spectrum with the highest similarity, and selects from the first storage means the first noise spectrum corresponding to that second noise spectrum.

With this configuration, the noise spectrum closest to the characteristics of the mechanical sound is used, so the mechanical sound contained in the audio signal can be appropriately removed as a noise component before recording.
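
A possible shape for the similarity-based selection is sketched below. The source does not say which similarity measure is used; cosine similarity is an assumption chosen here because it ignores overall level, and the function names are hypothetical. The two pattern lists are assumed to share indices so that entry `i` of one corresponds to entry `i` of the other.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two magnitude spectra (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def select_pattern_index(observed, reference_patterns):
    """Index of the stored reference-microphone pattern most similar to `observed`."""
    scores = [cosine_similarity(observed, p) for p in reference_patterns]
    return max(range(len(scores)), key=scores.__getitem__)


reference_patterns = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # second noise spectra
main_patterns = [[0.5, 0.0, 0.0], [0.0, 0.25, 0.25]]     # first noise spectra, same order

idx = select_pattern_index([0.1, 0.9, 0.8], reference_patterns)
first_noise_spectrum = main_patterns[idx]
print(idx, first_noise_spectrum)  # 1 [0.0, 0.25, 0.25]
```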

According to claim 3 of the present invention, in the imaging apparatus of claim 1, the mechanical sound includes the driving sound of a specific motor related to the shooting operation, and the first and second storage means store noise spectrum patterns corresponding to at least three periods: the start of motor drive, steady rotation, and the stop of motor drive.

With this configuration, when the mechanical sound includes the driving sound of a specific motor related to the shooting operation, the noise spectrum patterns are used selectively according to the period of motor drive, so the mechanical sound contained in the audio signal can be appropriately removed as a noise component before recording.

According to claim 4 of the present invention, in the imaging apparatus of claim 1, the mechanical sound includes the driving sound of a specific motor related to the shooting operation, and the first and second storage means store noise spectrum patterns for the case where the motor is driven in a predetermined direction and for the case where it is driven in the opposite direction.

With this configuration, when the mechanical sound includes the driving sound of a specific motor related to the shooting operation, the noise spectrum patterns are used selectively according to the drive direction of the motor, so the mechanical sound contained in the audio signal can be appropriately removed as a noise component before recording.

According to claim 5 of the present invention, in the imaging apparatus of claim 1, the mechanical sound includes the driving sound of a specific motor related to the shooting operation, and the first and second storage means store, both for the case where the motor is driven in a predetermined direction and for the case where it is driven in the opposite direction, noise spectrum patterns corresponding to at least three periods: the start of motor drive, steady rotation, and the stop of motor drive.

With this configuration, when the mechanical sound includes the driving sound of a specific motor related to the shooting operation, the noise spectrum patterns are used selectively according to the rotation direction and drive period of the motor, so the mechanical sound contained in the audio signal can be appropriately removed as a noise component before recording.
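
Two drive directions and three drive periods give at least six stored pattern pairs per motor. One way to organise them is sketched below; the labels `"forward"`, `"reverse"`, `"start"`, `"steady"`, and `"stop"` and the function `build_pattern_table` are hypothetical names, and the one-bin spectra are placeholders rather than measured data.

```python
from itertools import product

# Hypothetical labels: the source names two directions and three periods only.
DIRECTIONS = ("forward", "reverse")
PHASES = ("start", "steady", "stop")


def build_pattern_table(main_patterns, reference_patterns):
    """Pair pre-recorded spectra under a (direction, phase) key.

    Both arguments hold six spectra in product(DIRECTIONS, PHASES) order.
    """
    keys = list(product(DIRECTIONS, PHASES))
    if not (len(main_patterns) == len(reference_patterns) == len(keys)):
        raise ValueError("expected one spectrum per (direction, phase) key")
    return {key: (m, r)
            for key, m, r in zip(keys, main_patterns, reference_patterns)}


# Placeholder one-bin spectra, used only to show the lookup.
main = [[float(i)] for i in range(6)]
ref = [[float(i) * 2.0] for i in range(6)]
table = build_pattern_table(main, ref)

first_noise, second_noise = table[("reverse", "steady")]
print(len(table), first_noise, second_noise)  # 6 [4.0] [8.0]
```

In the claimed apparatus the applicable entry is chosen from the reference-microphone spectrum itself, so the keys above serve only to label what each stored pattern represents.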

According to claim 6 of the present invention, the imaging apparatus of any one of claims 1 to 5 further comprises power calculation means for calculating the power of the mechanical sound obtained from the second input means, and control means for controlling the noise removal operation of the noise removal means based on the power of the mechanical sound calculated by the power calculation means.

With this configuration, noise removal using the noise spectrum can be performed only when a mechanical sound is actually being generated during shooting.

According to claim 7 of the present invention, in the imaging apparatus of claim 6, the control means prohibits the noise removal operation of the noise removal means when the power of the mechanical sound calculated by the power calculation means is smaller than a predetermined value, and permits the noise removal operation when the power of the mechanical sound is equal to or greater than the predetermined value.

With this configuration, prohibiting the noise removal operation when the power of the mechanical sound is below the predetermined value prevents noise removal from being performed when no mechanical sound is being generated during shooting, which would otherwise distort the audio signal through over-subtraction of the spectrum.
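
The permit/prohibit rule of claim 7 reduces to a threshold test on the reference-microphone power. In the sketch below, power is taken as the mean squared amplitude of a frame, which is an assumption, as are the function names and the threshold value.

```python
def mechanical_sound_power(samples):
    """Mean squared amplitude of one frame from the reference microphone."""
    if not samples:
        return 0.0
    return sum(x * x for x in samples) / len(samples)


def noise_removal_permitted(samples, threshold):
    """Permit removal at or above the threshold, prohibit it below."""
    return mechanical_sound_power(samples) >= threshold


motor_running = [0.5, -0.5, 0.5, -0.5]   # power 0.25
motor_idle = [0.0, 0.25, 0.0, -0.25]     # power 0.03125
print(noise_removal_permitted(motor_running, threshold=0.25))  # True
print(noise_removal_permitted(motor_idle, threshold=0.25))     # False
```

Note that the comparison is inclusive, matching the wording "equal to or greater than the predetermined value".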

According to claim 8 of the present invention, the imaging apparatus of any one of claims 1 to 5 further comprises amplification means for amplifying the audio signal obtained from the first input means; amplification factor adjustment means for calculating the power of the audio signal and adjusting the amplification factor of the amplification means based on that power value; and coefficient varying means for changing the value of the coefficient by which the first noise spectrum is multiplied, in accordance with the amplification factor of the amplification means as adjusted by the amplification factor adjustment means.

With this configuration, when the recording level is controlled by adjusting the amplification factor of the input sound, changing the value of the coefficient by which the first noise spectrum is multiplied in accordance with that amplification factor prevents both over-subtraction and under-subtraction of the spectrum.
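
The text says only that the coefficient is changed in accordance with the amplification factor. One plausible reading, shown here purely as an assumption, is linear scaling relative to the gain at which the stored patterns were recorded; `scaled_coefficient` and `reference_gain` are hypothetical names.

```python
def scaled_coefficient(base_coefficient, gain, reference_gain=1.0):
    """Scale the subtraction coefficient with the current amplifier gain.

    Assumes the stored first noise spectrum was recorded at
    `reference_gain` and that noise magnitude scales linearly with gain.
    """
    if reference_gain <= 0.0:
        raise ValueError("reference_gain must be positive")
    return base_coefficient * gain / reference_gain


print(scaled_coefficient(1.0, gain=2.0))  # 2.0: louder recording, subtract more
print(scaled_coefficient(1.0, gain=0.5))  # 0.5: quieter recording, subtract less
```

Without such an adjustment, a pattern recorded at unity gain would leave residual noise when the gain is raised and would over-subtract when it is lowered.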

The audio recording method according to claim 9 of the present invention is used in an imaging apparatus in which a second input unit is provided near the source of the mechanical sound, separately from a first input unit for inputting an audio signal. The method comprises the steps of: storing in a first storage unit, as first noise spectra, a plurality of patterns of the spectrum of the mechanical sound collected in advance through the first input unit; storing in a second storage unit, as second noise spectra, a plurality of patterns of the spectrum of the mechanical sound collected in advance through the second input unit, each associated with a pattern of the first noise spectrum; selecting from the second storage unit, based on the spectrum of the mechanical sound obtained from the second input unit during shooting, the second noise spectrum that matches the characteristics of that mechanical sound, and selecting from the first storage unit the first noise spectrum corresponding to that second noise spectrum; removing the noise component by subtracting, from the spectrum of the audio signal obtained from the first input unit, the selected first noise spectrum multiplied by a predetermined coefficient; inversely transforming the noise-removed speech spectrum back into an audio signal; and recording the audio signal obtained by the inverse transform in a predetermined memory together with the captured image.

With this audio recording method, executing the processing according to each of the above steps provides the same effects as the invention of claim 1.

The program according to claim 10 of the present invention is executed by a computer mounted in an imaging apparatus in which a second input unit is provided near the source of the mechanical sound, separately from a first input unit for inputting an audio signal. The program causes the computer to realize: a function of storing in a first storage unit, as first noise spectra, a plurality of patterns of the spectrum of the mechanical sound collected in advance through the first input unit; a function of storing in a second storage unit, as second noise spectra, a plurality of patterns of the spectrum of the mechanical sound collected in advance through the second input unit, each associated with a pattern of the first noise spectrum; a function of selecting from the second storage unit, based on the spectrum of the mechanical sound obtained from the second input unit during shooting, the second noise spectrum that matches the characteristics of that mechanical sound, and of selecting from the first storage unit the first noise spectrum corresponding to that second noise spectrum; a function of removing the noise component by subtracting, from the spectrum of the audio signal obtained from the first input unit, the selected first noise spectrum multiplied by a predetermined coefficient; a function of inversely transforming the noise-removed speech spectrum back into an audio signal; and a function of recording the audio signal obtained by the inverse transform in a predetermined memory together with the captured image.

Therefore, when the computer executes the program for realizing each of these functions, the same effects as the invention of claim 1 are obtained.

According to the present invention, when shooting a moving image with sound, a noise spectrum having the same input characteristics as the audio signal can be subtracted from the spectrum of the audio signal. As a result, the mechanical sound mixed into the audio signal during shooting can be appropriately removed as a noise component, and the audio can be recorded with high quality together with the captured image.

Embodiments of the present invention are described below with reference to the drawings.

(First Embodiment)
FIG. 1 shows the external configuration of a digital camera taken as an example of the imaging apparatus of the present invention. FIG. 1(a) is a perspective view mainly showing the front, and FIG. 1(b) is a perspective view mainly showing the rear.

The digital camera 1 has, on the front of a substantially rectangular thin plate-like body 2, a photographing lens 3, a self-timer lamp 4, an optical viewfinder window 5, a strobe light emitting unit 6, a microphone unit 7, and the like. A power key 8, a shutter key 9, and the like are provided at the right end (as seen by the user) of the top surface.

The power key 8 is operated each time the power is turned on or off, and the shutter key 9 is used to indicate the shooting timing during shooting.

The rear of the digital camera 1 is provided with a shooting mode (R) key 10, a playback mode (P) key 11, an optical viewfinder 12, a speaker unit 13, a macro key 14, a strobe key 15, a menu (MENU) key 16, a ring key 17, a set (SET) key 18, a display unit 19, and the like.

When operated with the power off, the shooting mode key 10 automatically turns the power on and puts the camera in still image shooting mode; when operated repeatedly with the power on, it cycles between still image mode and moving image mode. The still image mode is a mode for shooting still images. The moving image mode is a mode for shooting moving images, and in this embodiment in particular it is assumed that moving images can be shot with sound.

The shutter key 9 is shared by these shooting modes. In still image mode, a still image is shot at the moment the shutter key 9 is pressed. In moving image mode, shooting of a moving image starts when the shutter key 9 is pressed and ends when the shutter key 9 is pressed again.

When operated with the power off, the playback mode key 11 automatically turns the power on and puts the camera in playback mode.

The macro key 14 is operated to switch between normal shooting and macro shooting in still image shooting mode. The strobe key 15 is operated to switch the light emission mode of the strobe light emitting unit 6. The menu key 16 is operated to select various menu items. The ring key 17 integrates the item selection keys for the up, down, left, and right directions, and the set key 18 located at the center of the ring key 17 is operated to confirm the item selected at that time.

The display unit 19 is a backlit color liquid crystal panel. In shooting mode it serves as an electronic viewfinder and displays the through image on the monitor; in playback mode it plays back and displays the selected image or the like.

The digital camera 1 also has an optical zoom function: operating the zoom keys 20a and 20b physically changes the focal length and thereby the magnification of the image. Of the two, the zoom key 20a is the telephoto key and is used to change the zoom magnification toward the telephoto side. The zoom key 20b is the wide-angle key and is used to change the zoom magnification toward the wide-angle side.

Although not shown, the bottom of the digital camera 1 is provided with a memory card slot for attaching and detaching a memory card used as a recording medium, and with a serial interface connector, for example a USB (Universal Serial Bus) connector, for connecting to an external personal computer or the like.

FIG. 2 is a block diagram showing the electronic circuit configuration of the digital camera 1.

In the digital camera 1, a lens optical system 22, which includes a focus lens, a zoom lens, and other elements (not shown) constituting the photographing lens 3, is provided so as to be movable within a predetermined range along the optical axis. The lens optical system 22 is moved by a motor 21, which is rotationally driven by a motor drive unit 21a.

The motor 21 comprises a plurality of different motors, such as a motor for adjusting zoom magnification (zoom motor) and a motor for adjusting focus (focus motor), and a corresponding motor drive unit 21a is provided for each of them.

A CCD (charge coupled device) 23, which is the image sensor, is disposed behind the motor 21 on the optical axis. The CCD 23 receives light from each part of the subject through the photographing lens 3 and outputs an electrical signal corresponding to the intensity of that light.

In the recording mode, which is the basic mode, the CCD 23 is scan-driven by a timing generator (TG) 24 and a driver 25, and at fixed intervals outputs one screen of photoelectric conversion output corresponding to the optical image formed on it. This photoelectric conversion output, while still an analog signal, is gain-adjusted as appropriate for each of the RGB primary color components, then sampled and held by a sample-and-hold circuit 26 and converted into digital data by an A/D converter 27.

An image processing circuit 28 then performs image processing, including pixel interpolation and gamma correction, to generate a digital luminance signal Y and color difference signals U and V (Cb, Cr), which are output to a DMA (Direct Memory Access) controller 29.

The DMA controller 29 first writes the luminance signal Y and the color difference signals U and V output by the image processing circuit 28 into a buffer inside the DMA controller 29, using the composite synchronization signal, memory write enable signal, and clock signal also supplied by the image processing circuit 28. It then DMA-transfers the data via a DRAM interface (I/F) 30 to a DRAM 31 used as a buffer memory.

The control unit 32 controls the entire digital camera 1 and is a microcomputer that includes a CPU, a ROM storing the operation program executed by the CPU, a RAM used as work memory, and the like. After the DMA transfer of the luminance and color difference signals to the DRAM 31 is complete, the control unit 32 reads these signals from the DRAM 31 via the DRAM interface 30 and writes them to a VRAM 34 via a VRAM controller 33.

デジタルビデオエンコーダ３５は、前記輝度及び色差信号をＶＲＡＭコントローラ３３を介してＶＲＡＭ３４より定期的に読み出し、これらのデータを元にビデオ信号を発生して表示部１９に出力する。 The digital video encoder 35 periodically reads the luminance and color difference signals from the VRAM 34 via the VRAM controller 33, generates a video signal based on these data, and outputs the video signal to the display unit 19.

この表示部１９は、上述した如く撮影時にはモニタ表示部（電子ファインダ）として機能するもので、デジタルビデオエンコーダ３５からのビデオ信号に基づいた表示を行うことで、その時点でＶＲＡＭコントローラ３３から取込んでいる画像情報に基づく画像をリアルタイムに表示することとなる。 As described above, the display unit 19 functions as a monitor display (electronic viewfinder) during shooting. By displaying the video signal from the digital video encoder 35, it shows in real time an image based on the image information being taken in from the VRAM controller 33 at that moment.

このように、表示部１９にその時点での画像がモニタ画像としてリアルタイムに表示されている状態で、例えば静止画撮影を行いたいタイミングでシャッタキー９を押下操作すると、トリガ信号が発生する。 As described above, when the image at that time is displayed in real time as the monitor image on the display unit 19, for example, when the shutter key 9 is pressed at a timing at which still image shooting is desired, a trigger signal is generated.

制御部３２は、このトリガ信号に応じて、その時点でＣＣＤ２３から取込んでいる１画面分の輝度及び色差信号のＤＲＡＭ３１へのＤＭＡ転送の終了後、直ちにＣＣＤ２３からのＤＲＡＭ３１への経路を停止し、記録保存の状態に遷移する。 In response to this trigger signal, the control unit 32 waits until the DMA transfer to the DRAM 31 of the one screen's worth of luminance and color difference signals currently being captured from the CCD 23 has finished, then immediately stops the path from the CCD 23 to the DRAM 31 and transitions to the record-and-save state.

この記録保存の状態では、制御部３２がＤＲＡＭ３１に書き込まれている１フレーム分の輝度及び色差信号をＤＲＡＭインタフェース３０を介してＹ，Ｃｂ，Ｃｒの各コンポーネント毎に縦８画素×横８画素の基本ブロックと呼称される単位で読み出して、ＪＰＥＧ（ＪｏｉｎｔＰｈｏｔｏｇｒａｐｈｉｃＥｘｐｅｒｔｓＧｒｏｕｐ）回路３７に書き込み、このＪＰＥＧ回路３７でＡＤＣＴ（ＡｄａｐｔｉｖｅＤｉｓｃｒｅｔｅＣｏｓｉｎｅＴｒａｎｓｆｏｒｍ：適応離散コサイン変換）、エントロピ符号化方式であるハフマン符号化等の処理によりデータ圧縮する。 In this record-and-save state, the control unit 32 reads the one frame of luminance and color difference signals written in the DRAM 31, via the DRAM interface 30, in units called basic blocks of 8 pixels high by 8 pixels wide for each of the Y, Cb, and Cr components, and writes them to the JPEG (Joint Photographic Experts Group) circuit 37. The JPEG circuit 37 compresses the data by processing such as ADCT (Adaptive Discrete Cosine Transform) and Huffman coding, which is an entropy coding method.

そして得た符号データを１画像のデータファイルとして該ＪＰＥＧ回路３７から読み出して記録用のメモリ３８に書き込む。このメモリ３８としては、予め本体に内蔵されたフラッシュメモリ等の内部メモリの他に、記録媒体として着脱自在に装着されるメモリカードなどを含む。１フレーム分の輝度及び色差信号の圧縮処理及びメモリ３８への全圧縮データの書込み終了に伴って、制御部３２はＣＣＤ２３からＤＲＡＭ３１への経路を再び起動する。 The obtained code data is read out from the JPEG circuit 37 as a data file of one image and written in the recording memory 38. The memory 38 includes a memory card that is detachably mounted as a recording medium in addition to an internal memory such as a flash memory built in the main body in advance. With the compression processing of the luminance and color difference signals for one frame and the completion of writing all the compressed data to the memory 38, the control unit 32 activates the path from the CCD 23 to the DRAM 31 again.

制御部３２には、さらに音声処理部３９、ＵＳＢインタフェース（Ｉ／Ｆ）４０、ストロボ駆動部４１が接続される。 The control unit 32 is further connected with an audio processing unit 39, a USB interface (I / F) 40, and a strobe driving unit 41.

音声処理部３９は、ＰＣＭ音源等の音源回路を備え、音声の録音時には前記マイクロホン部（ＭＩＣ）７より入力された音声信号をデジタル化し、所定のデータファイル形式、例えばＭＰ３（ＭＰＥＧ−１ａｕｄｉｏｌａｙｅｒ３）規格に従ってデータ圧縮して音声データファイルを作成してメモリ３８へ送出する一方、音声の再生時にはメモリ３８から読み出された音声データファイルの圧縮を解いてアナログ化し、上述したデジタルカメラ１の背面側に設けられるスピーカ部（ＳＰ）１３を通じて出力する。 The audio processing unit 39 includes a sound source circuit such as a PCM sound source. When recording audio, it digitizes the audio signal input from the microphone unit (MIC) 7, compresses it according to a predetermined data file format, for example the MP3 (MPEG-1 Audio Layer 3) standard, to create an audio data file, and sends the file to the memory 38. When playing back audio, it decompresses the audio data file read from the memory 38, converts it to analog form, and outputs it through the speaker unit (SP) 13 provided on the back of the digital camera 1.

なお、この音声処理部３９には、後述するように、マイクロホン部（ＭＩＣ）７とは別にモータ２１の近くに設置された参照マイク７ａが接続されている。この参照マイク７ａは、雑音除去用として主にモータ音を入力するための入力手段として用いられるものである。 Note that, as will be described later, a reference microphone 7 a installed near the motor 21 is connected to the audio processing unit 39 separately from the microphone unit (MIC) 7. This reference microphone 7a is mainly used as an input means for inputting motor sound for noise removal.

ＵＳＢインタフェース４０は、ＵＳＢコネクタを介して有線接続されるパーソナルコンピュータ等の他の情報端末装置との間で画像データ、その他の送受を行う場合の通信制御を行う。ストロボ駆動部４１は、撮影時に図示せぬストロボ用の大容量コンデンサを充電した上で、制御部３２からの制御に基づいてストロボ発光部６を閃光駆動する。 The USB interface 40 performs communication control when image data and other information are transmitted / received to / from another information terminal device such as a personal computer connected by wire via a USB connector. The strobe drive unit 41 charges a strobe capacitor (not shown) at the time of shooting, and then drives the strobe light emitting unit 6 to flash based on control from the control unit 32.

なお、前記キー入力部３６は、上述したシャッタキー９の他に、電源キー８、撮影モードキー１０、再生モードキー１１、マクロキー１４、ストロボキー１５、メニューキー１６、リングキー１７、セットキー１８、ズームキー２０ａ，２０ｂなどから構成され、それらのキー操作に伴う信号は直接制御部３２へ送出される。 In addition to the shutter key 9 described above, the key input unit 36 includes a power key 8, a shooting mode key 10, a playback mode key 11, a macro key 14, a strobe key 15, a menu key 16, a ring key 17, and a set key. 18, zoom keys 20 a and 20 b and the like, and signals accompanying these key operations are sent directly to the control unit 32.

また、静止画像ではなく動画像の撮影時においては、シャッタキー９が押下操作されたときに、上述したＪＰＥＧ回路３７によりｍｏｔｉｏｎ−ＪＰＥＧ（ＪｏｉｎｔＰｈｏｔｏｇｒａｐｈｉｃＥｘｐｅｒｔｓＧｒｏｕｐ）などの手法により撮影動画をデータ圧縮してメモリ３８へ記録する。この場合、音声付き動画撮影であれば、その撮影中にマイクロホン部（ＭＩＣ）７より入力された音声信号が動画データと共に前記メモリ３８に記録されることになる。再度シャッタキー９が操作されると、動画データの記録を終了する。 When shooting a moving image rather than a still image, pressing the shutter key 9 causes the JPEG circuit 37 described above to compress the captured video using a technique such as motion-JPEG (Joint Photographic Experts Group) and record it in the memory 38. If the video is being shot with audio, the audio signal input from the microphone unit (MIC) 7 during shooting is recorded in the memory 38 together with the video data. When the shutter key 9 is operated again, recording of the video data ends.

一方、基本モードである再生モード時には、制御部３２がメモリ３８に記録されている画像データを選択的に読み出し、ＪＰＥＧ回路３７で記録モード時にデータ圧縮した手順と全く逆の手順で、圧縮されている画像データを伸長する。そして、この伸長した画像データをＤＲＡＭインタフェース３０を介してＤＲＡＭ３１に保持させた上で、このＤＲＡＭ３１の保持内容をＶＲＡＭコントローラ３３を介してＶＲＡＭ３４に記憶させ、このＶＲＡＭ３４より定期的に画像データを読み出してビデオ信号を発生し、表示部１９で再生出力させる。 In the playback mode, which is also a basic mode, the control unit 32 selectively reads image data recorded in the memory 38, and the JPEG circuit 37 decompresses the compressed image data by exactly reversing the procedure used for compression in the recording mode. The decompressed image data is held in the DRAM 31 via the DRAM interface 30, and the contents of the DRAM 31 are then stored in the VRAM 34 via the VRAM controller 33. The image data is read periodically from the VRAM 34 to generate a video signal, which is reproduced on the display unit 19.

選択した画像データが静止画像ではなく動画像であった場合には、その動画データを構成する複数フレームの静止画データを時系列の順で順次再生して表示し、すべての静止画データの再生を終了した時点で、例えば、次に再生の指示がなされるまで先頭に位置する静止画データを表示するなどを行う。その際、当該動画データに音声データが含まれていれば、その音声データがスピーカ部（ＳＰ）１３を通じて出力されることになる。 If the selected image data is a moving image rather than a still image, the still-image frames that make up the video data are played back and displayed one after another in chronological order. Once all the frames have been played back, the device, for example, displays the first still-image frame until the next playback instruction is given. If the video data contains audio data, that audio is output through the speaker unit (SP) 13.

次に、このデジタルカメラ１に用いられる雑音除去機能を備えた音声記録装置について説明する。 Next, an audio recording apparatus having a noise removal function used in the digital camera 1 will be described.

図３は本発明の第１の実施形態に係るデジタルカメラ１に用いられる雑音除去機能を備えた音声記録装置の構成を示すブロック図である。 FIG. 3 is a block diagram showing a configuration of an audio recording apparatus having a noise removal function used in the digital camera 1 according to the first embodiment of the present invention.

この音声記録装置は、主としてデジタルカメラ１の音声付き動画撮影に用いられるものであり、その撮影中に音声信号に混入するズーム音やフォーカス音などの機構音を雑音として除去する機能を備えている。 This audio recording apparatus is used mainly when the digital camera 1 shoots video with audio, and has a function of removing, as noise, mechanism sounds such as zoom and focus sounds that are mixed into the audio signal during shooting.

第１の実施形態において、この音声記録装置は、モータ２１、モータ駆動部２１ａ、制御部３２、キー入力部３６、音声入力部５１、フレーム分割部５２、フーリエ変換部５３、第１のスペクトル記憶部５４、サブトラクト部５５、スペクトル切り替え部５６、逆フーリエ変換部５７、波形合成部５８を備える。さらに、別系統として、参照入力部６１、フレーム分割部６２、フーリエ変換部６３、第２のスペクトル記憶部６４、類似度計算部６５、スペクトル選択部６６を備える。 In the first embodiment, the voice recording apparatus includes a motor 21, a motor drive unit 21a, a control unit 32, a key input unit 36, a voice input unit 51, a frame division unit 52, a Fourier transform unit 53, and a first spectrum storage. Unit 54, subtracting unit 55, spectrum switching unit 56, inverse Fourier transform unit 57, and waveform synthesis unit 58. Furthermore, as another system, a reference input unit 61, a frame division unit 62, a Fourier transform unit 63, a second spectrum storage unit 64, a similarity calculation unit 65, and a spectrum selection unit 66 are provided.

なお、前記各構成部のうち、５１〜５８、６１〜６６の部分は図２に示したデジタルカメラ１の音声処理部３９に含まれる。 Of the components, 51 to 58 and 61 to 66 are included in the audio processing unit 39 of the digital camera 1 shown in FIG.

モータ２１はズームレンズなどのレンズ光学系２２を光軸方向に移動させるためのモータであり、モータ駆動部２１ａはそのモータ２１を回転駆動させるための駆動機構である。 The motor 21 is a motor for moving the lens optical system 22 such as a zoom lens in the optical axis direction, and the motor drive unit 21a is a drive mechanism for driving the motor 21 to rotate.

制御部３２は、キー入力部３６に含まれるズームキー２０ａ，２０ｂなどの操作信号を受けてモータ駆動制御信号をモータ駆動部２１ａに出力すると共に、ここでは、音声付き動画撮影中にモータ２１の駆動タイミングに基づいてスペクトル切り換え部５６を制御する機能を備える。 The control unit 32 receives an operation signal from the zoom keys 20a and 20b included in the key input unit 36 and outputs a motor drive control signal to the motor drive unit 21a. Here, the drive of the motor 21 is performed during video recording with sound. A function of controlling the spectrum switching unit 56 based on the timing is provided.

一方、音声入力部５１は、図１に示すデジタルカメラ１の機器筐体上に設置されたマイクロホン部７を主マイクとして含み、この主マイクを通じて入力される音声信号を主信号としてフレーム分割部５２に与える。この場合、音声付き動画撮影中に例えばズーム操作が行われると、そのズーム操作に伴って発生するモータ音（ズーム音）が音声入力部５１を通じて音声信号と共に入り込むことになる。 The audio input unit 51 includes, as the main microphone, the microphone unit 7 installed on the housing of the digital camera 1 shown in FIG. 1, and supplies the audio signal input through this main microphone to the frame dividing unit 52 as the main signal. If, for example, a zoom operation is performed while shooting video with audio, the motor sound (zoom sound) generated by the zoom operation enters through the audio input unit 51 together with the audio signal.

フレーム分割部５２は、この音声入力部５１によって入力された音声信号（主信号）を所定時間分のフレーム単位で分割する。フーリエ変換部５３は、このフレーム分割部５２によってフレーム単位で分割された音声信号をフーリエ変換し、周波数毎のパワーを示したスペクトル信号（Ｉａ）に変換する。 The frame dividing unit 52 divides the audio signal (main signal) input by the audio input unit 51 in units of frames for a predetermined time. The Fourier transform unit 53 performs Fourier transform on the audio signal divided by the frame unit by the frame division unit 52 and converts it into a spectrum signal (Ia) indicating the power for each frequency.
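The frame division and Fourier transform steps can be sketched as follows. This is a minimal illustration, not the patented circuit: the function names `split_frames` and `power_spectrum` are hypothetical, frames are assumed not to overlap, and a naive DFT stands in for whatever transform the Fourier transform unit 53 actually uses.

```python
import cmath


def split_frames(samples, frame_len):
    """Split a sample sequence into consecutive, non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped
    (an assumption made for this sketch).
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def power_spectrum(frame):
    """Return the power |X[k]|^2 of each DFT bin k = 0 .. N/2 of one frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive O(N^2) DFT, written out for clarity rather than speed.
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum
```

Each list returned by `power_spectrum` plays the role of the per-frame spectrum signal (Ia); bin N/2 corresponds to half the sampling frequency.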

第１のスペクトル記憶部５４には、音声入力部５１（主マイク）を通じて事前に採取したモータ音のスペクトル信号が第１の雑音スペクトル信号として複数パターン記憶されている（図４参照）。 In the first spectrum storage unit 54, a plurality of patterns of motor sound spectrum signals collected in advance through the voice input unit 51 (main microphone) are stored as first noise spectrum signals (see FIG. 4).

サブトラクト部５５は、フーリエ変換部５３によって得られた入力音声スペクトル信号（Ｉａ）と、第１のスペクトル記憶部５４の中から選択された雑音スペクトル信号（Ｘｖ）に基づいて、ＳＳ（ｓｐｅｃｔｒａｌｓｕｂｔｒａｃｔｉｏｎ）法による雑音除去処理を行う。 Based on the input speech spectrum signal (Ia) obtained by the Fourier transform unit 53 and the noise spectrum signal (Xv) selected from the first spectrum storage unit 54, the subtracting unit 55 performs SS (spectral subtraction). The noise removal process by the method is performed.

詳しくは、入力音声スペクトル信号（Ｉａ）から雑音スペクトル信号（Ｘｖ）に所定のサブトラクト係数αを乗じた信号を減算することで、音声信号に含まれる雑音成分を除去する処理を行う。スペクトル切り替え部５６は、フーリエ変換部５３によって得られた入力音声スペクトル信号（Ｉａ）と、このサブトラクト部５５によって得られる雑音除去後の音声スペクトル信号（Ｉｂ）を制御部３２から出力される選択信号によって切り替えて逆フーリエ変換部５７に与える。 Specifically, the noise component contained in the audio signal is removed by subtracting, from the input audio spectrum signal (Ia), the noise spectrum signal (Xv) multiplied by a predetermined subtract coefficient α. The spectrum switching unit 56 switches between the input audio spectrum signal (Ia) obtained by the Fourier transform unit 53 and the noise-removed audio spectrum signal (Ib) obtained by the subtracting unit 55, according to a selection signal output from the control unit 32, and supplies the selected signal to the inverse Fourier transform unit 57.
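The subtraction performed by the subtracting unit 55 can be sketched per frequency bin as Ib = Ia − α·Xv. The function name `spectral_subtract` is hypothetical, and clamping negative results to a `floor` value is an assumption of this sketch; the description does not say how negative differences are handled.

```python
def spectral_subtract(ia, xv, alpha=1.0, floor=0.0):
    """Return Ib[k] = max(Ia[k] - alpha * Xv[k], floor) for every bin k.

    ia    -- input audio power spectrum (Ia)
    xv    -- selected noise power spectrum (Xv)
    alpha -- subtract coefficient
    floor -- lower bound for the result (assumption, not from the source)
    """
    if len(ia) != len(xv):
        raise ValueError("spectra must have the same number of bins")
    return [max(a - alpha * n, floor) for a, n in zip(ia, xv)]
```

For example, `spectral_subtract([5.0, 2.0, 1.0], [1.0, 1.0, 1.0], alpha=2.0)` yields `[3.0, 0.0, 0.0]`: the last bin would be negative and is clamped to the floor.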

逆フーリエ変換部５７は、スペクトル切り替え部５６を通じて入力された入力音声スペクトル信号（Ｉａ）または雑音除去後の音声スペクトル信号（Ｉｂ）を逆フーリエ変換して元のフレーム単位毎の音声信号に戻す。 The inverse Fourier transform unit 57 performs inverse Fourier transform on the input speech spectrum signal (Ia) input through the spectrum switching unit 56 or the speech spectrum signal (Ib) after noise removal, and returns the original speech signal for each frame unit.

波形合成部５８は、この逆フーリエ変換部５７によって得られるフレーム単位毎の音声信号を合成することで時系的に連続した音声信号に復元する。この音声信号は、最終的な記録用の音声信号として用いられ、デジタルカメラ１の撮像系から得られる動画データと共に図２に示したメモリ３８に記録される。 The waveform synthesizing unit 58 synthesizes the audio signal for each frame obtained by the inverse Fourier transform unit 57 to restore the audio signal continuous in time. This audio signal is used as an audio signal for final recording, and is recorded in the memory 38 shown in FIG. 2 together with moving image data obtained from the imaging system of the digital camera 1.

また、参照入力部６１は、主としてモータ音を集音するための参照マイク７ａを含み、この参照マイク７ａを通じて入力されるモータ音のみの信号を参照信号としてフレーム分割部６２に与える。参照マイク７ａは、主マイクであるマイクロホン部（ＭＩＣ）７とは別に機器筐体内のモータ２１の近傍に設置されており、そのモータ駆動時に発生するモータ音のみを入力する。 The reference input unit 61 includes a reference microphone 7a intended mainly for picking up motor sound, and supplies the motor-sound-only signal input through this reference microphone 7a to the frame dividing unit 62 as a reference signal. The reference microphone 7a is installed inside the device housing near the motor 21, separately from the microphone unit (MIC) 7 that serves as the main microphone, and picks up only the motor sound generated when the motor is driven.

フレーム分割部６２は、この参照入力部６１によって入力されたモータ音のみの信号（参照信号）を所定時間分のフレーム単位で分割する。フーリエ変換部６３は、このフレーム分割部６２によってフレーム単位で分割されたモータ音信号をフーリエ変換し、周波数毎のパワーを示したスペクトル信号（Ｒｖ）に変換する。 The frame dividing unit 62 divides the motor sound only signal (reference signal) input by the reference input unit 61 in units of frames for a predetermined time. The Fourier transform unit 63 performs a Fourier transform on the motor sound signal divided by the frame unit by the frame division unit 62 and converts the signal into a spectrum signal (Rv) indicating the power for each frequency.

第２のスペクトル記憶部６４には、参照入力部６１（参照マイク）を通じて事前に採取したモータ音のスペクトル信号が第２の雑音スペクトル信号として複数パターン記憶されている（図５参照）。 The second spectrum storage unit 64 stores a plurality of patterns of motor sound spectrum signals collected in advance through a reference input unit 61 (reference microphone) as second noise spectrum signals (see FIG. 5).

前記第１のスペクトル記憶部５４に記憶された第１の雑音スペクトル信号と、前記第２のスペクトル記憶部６４に記憶された第２の雑音スペクトル信号との違いは、第１の雑音スペクトル信号は音声入力部５１つまり主マイク（マイクロホン部７）を通じて得られるモータ音の入力特性を有し、第２の雑音スペクトルは参照入力部６１つまり参照マイク７ａを通じて得られるモータ音の入力特性を有することである。 The first noise spectrum signal stored in the first spectrum storage unit 54 and the second noise spectrum signal stored in the second spectrum storage unit 64 differ as follows: the first has the input characteristics of motor sound picked up through the audio input unit 51, that is, the main microphone (microphone unit 7), whereas the second has the input characteristics of motor sound picked up through the reference input unit 61, that is, the reference microphone 7a.

類似度計算部６５は、参照入力部６１を通じて得られた入力モータ音スペクトル信号（Ｒｖ）と、予め第２のスペクトル記憶部６４に記憶されている第２の雑音スペクトル信号の各パターンとの類似度を計算する。 The similarity calculation unit 65 is similar to the input motor sound spectrum signal (Rv) obtained through the reference input unit 61 and each pattern of the second noise spectrum signal stored in the second spectrum storage unit 64 in advance. Calculate the degree.

スペクトル選択部６６は、類似度計算部６５による類似度計算結果に基づき、第２のスペクトル記憶部６４の中で最も類似度の高い第２の雑音スペクトル信号を選択すると共に、第１のスペクトル記憶部５４の中から当該雑音スペクトル信号に対応した第１の雑音スペクトル信号を選択する。このとき選択された雑音スペクトル信号は、音声信号と同じ入力特性を有する最適な雑音スペクトル信号（Ｘｖ）としてサブトラクト部５５に与えられて、ＳＳ法による雑音除去処理に用いられる。 The spectrum selection unit 66 selects the second noise spectrum signal having the highest similarity in the second spectrum storage unit 64 based on the similarity calculation result by the similarity calculation unit 65, and also stores the first spectrum storage. A first noise spectrum signal corresponding to the noise spectrum signal is selected from the unit 54. The noise spectrum signal selected at this time is given to the subtractor 55 as an optimum noise spectrum signal (Xv) having the same input characteristics as the voice signal, and is used for noise removal processing by the SS method.

次に、第１の実施形態の動作について説明する。 Next, the operation of the first embodiment will be described.

今、音声付き動画撮影を行っている最中に、例えばユーザがキー入力部３６に含まれるズームキー２０ａ，２０ｂを操作したとする。 Now, assume that, for example, the user operates the zoom keys 20a and 20b included in the key input unit 36 while shooting a moving image with sound.

デジタルカメラ全体の動作を制御する制御部３２は、キー入力部３６に含まれるズームキー２０ａ，２０ｂのズーム操作信号を入力すると、モータ駆動部２１ａに対して駆動開始信号を送る。モータ駆動部２１ａは、この駆動開始信号を受けてモータ２１を回転駆動する。このモータ２１の回転に伴い、図２のレンズ光学系２２に含まれる図示せぬズームレンズが光軸上に移動してズーム倍率が変化する。 When the control unit 32 that controls the operation of the entire digital camera inputs zoom operation signals of the zoom keys 20a and 20b included in the key input unit 36, it sends a drive start signal to the motor drive unit 21a. The motor drive unit 21a receives the drive start signal and rotationally drives the motor 21. As the motor 21 rotates, a zoom lens (not shown) included in the lens optical system 22 shown in FIG. 2 moves on the optical axis and the zoom magnification changes.

また、ユーザがズーム操作を終了すると、制御部３２はモータ駆動部２１ａに対して駆動停止信号を送る。これにより、モータ２１の回転駆動が停止し、ズーム動作が終了する。 When the user finishes the zoom operation, the control unit 32 sends a drive stop signal to the motor drive unit 21a. Thereby, the rotational drive of the motor 21 is stopped and the zoom operation is finished.

ここで、音声付き動画の撮影中は、常に主マイク（マイクロホン部７）による音声入力機能がＯＮ状態にある。このため、前記ズーム操作に伴って発生するモータ音が入力音声の中に雑音として混入する問題がある。このようなモータ音を音声信号から除去して記録するべく、以下のような処理が行われる。 Here, during shooting of a moving image with sound, the sound input function by the main microphone (microphone unit 7) is always ON. For this reason, there is a problem that the motor sound generated by the zoom operation is mixed as noise in the input voice. In order to remove such motor noise from the audio signal and record it, the following processing is performed.

すなわち、まず、雑音除去対象となるモータ音（機構音）のスペクトル信号を事前に採取しておき、第１のスペクトル記憶部５４および第２のスペクトル記憶部６４に記憶しておく。以下では、ズーム操作時に発生するモータ音つまりズーム音を雑音除去対象として説明する。 That is, first, a spectrum signal of a motor sound (mechanism sound) to be subjected to noise removal is collected in advance and stored in the first spectrum storage unit 54 and the second spectrum storage unit 64. In the following, a motor sound generated during zoom operation, that is, a zoom sound will be described as a noise removal target.

ズーム音の採取方法は、無音状態でズーム操作を行い、そのときに発生するズーム音のみを主マイク（マイクロホン部７）である音声入力部５１と、参照マイク７ａである参照入力部６１からそれぞれ入力することで行う。 The zoom sound is collected by performing a zoom operation in a silent environment and inputting only the zoom sound generated at that time through both the audio input unit 51 (the main microphone, microphone unit 7) and the reference input unit 61 (the reference microphone 7a).

この場合、ズーム音はモータ２１の駆動タイミングにより駆動開始点、駆動中間点、駆動停止点でそれぞれ異なる特性を有する。また、モータ２１を広角方向に回転駆動する場合と望遠方向に駆動する場合とでもズーム音の特性が異なってくる。 In this case, the zoom sound has different characteristics depending on the drive timing of the motor 21 at the drive start point, the drive intermediate point, and the drive stop point. Also, the zoom sound characteristics differ between when the motor 21 is rotated in the wide-angle direction and when it is driven in the telephoto direction.

そこで、少なくとも、モータ２１を広角方向に回転駆動した場合の駆動開始点でのズーム音「ＺＯ１」、駆動中間点でのズーム音「ＺＯ２」、駆動停止点でのズーム音「ＺＯ３」と、モータ２１を望遠方向に回転駆動した場合の駆動開始点でのズーム音「ＺＩ１」、駆動中間点でのズーム音「ＺＩ２」，駆動停止点でのズーム音「ＺＩ３」の計６種類のパターンを採取するものとする。 Therefore, at least six patterns are collected: the zoom sounds "ZO1" at the drive start point, "ZO2" at the drive intermediate point, and "ZO3" at the drive stop point when the motor 21 is rotated in the wide-angle direction, and the zoom sounds "ZI1" at the drive start point, "ZI2" at the drive intermediate point, and "ZI3" at the drive stop point when the motor 21 is rotated in the telephoto direction.

第１のスペクトル記憶部５４への記憶は、ズーム音のみの信号を音声入力部５１から入力し、フレーム分割部５２により数１０ｍｓ程度のフレーム区間に切り出し、これをフーリエ変換部５３によりスペクトル信号に変換する。このスペクトル信号を上述した駆動開始点、駆動中間点、駆動停止点の３つの期間を対象にして、それぞれモータ２１を広角方向に回転駆動した場合と、望遠方向に回転駆動した場合について算出して、各期間毎にレベルの平均値を求める。 The first spectrum storage unit 54 stores a zoom sound only signal from the audio input unit 51, cuts out a frame section of about several tens of ms by the frame division unit 52, and converts this into a spectrum signal by the Fourier transform unit 53. Convert. The spectrum signal is calculated for the three periods of the drive start point, drive intermediate point, and drive stop point described above for the case where the motor 21 is rotated in the wide angle direction and the case where it is rotated in the telephoto direction. The average value of the level is obtained for each period.
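The averaging step used to build each stored noise pattern can be sketched as below. The names `average_spectrum` and `build_patterns` are hypothetical, and the sketch assumes that the frame spectra recorded in silence have already been grouped by period and direction (for example under the label "MZO1").

```python
def average_spectrum(frame_spectra):
    """Average a list of equal-length spectra bin by bin."""
    if not frame_spectra:
        raise ValueError("need at least one frame spectrum")
    n_bins = len(frame_spectra[0])
    if any(len(s) != n_bins for s in frame_spectra):
        raise ValueError("all spectra must have the same number of bins")
    count = len(frame_spectra)
    return [sum(s[k] for s in frame_spectra) / count for k in range(n_bins)]


def build_patterns(recordings):
    """Map each label (e.g. "MZO1") to the average of its frame spectra.

    recordings -- dict of label -> list of per-frame spectra
    """
    return {label: average_spectrum(spectra)
            for label, spectra in recordings.items()}
```

Averaging over the frames of one period smooths frame-to-frame fluctuation, so each stored pattern represents the typical level of the motor sound in that period.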

このようにして、モータ２１を広角方向に回転駆動した場合と、望遠方向に回転駆動した場合にそれぞれ得られた第１の雑音スペクトル信号の各パターンをＭＺＯ１，ＭＺＯ２，ＭＺＯ３，ＭＺＩ１，ＭＺＩ２，ＭＺＩ３として、図４に示すように第１のスペクトル記憶部５４に記憶させておく。 In this way, the patterns of the first noise spectrum signal obtained when the motor 21 is rotated in the wide-angle direction and in the telephoto direction are stored in the first spectrum storage unit 54 as MZO1, MZO2, MZO3, MZI1, MZI2, and MZI3, as shown in FIG. 4.

同様にして、第２のスペクトル記憶部６４への記憶は、ズーム音のみの信号を参照入力部６１から入力し、フレーム分割部６２により数１０ｍｓ程度のフレーム区間に切り出し、これをフーリエ変換部６３によりスペクトル信号に変換する。このスペクトル信号を上述した駆動開始点、駆動中間点、駆動停止点の３つの期間を対象にして、それぞれモータ２１を広角方向に回転駆動した場合と、望遠方向に回転駆動した場合について算出して、各期間毎にレベルの平均値を求める。 Similarly, for storage in the second spectrum storage unit 64, a zoom-sound-only signal is input from the reference input unit 61, cut into frame sections of several tens of milliseconds by the frame dividing unit 62, and converted into a spectrum signal by the Fourier transform unit 63. This spectrum signal is calculated for the three periods described above (drive start point, drive intermediate point, and drive stop point), both when the motor 21 is rotated in the wide-angle direction and when it is rotated in the telephoto direction, and the average level is obtained for each period.

このようにして、モータ２１を広角方向に回転駆動した場合と、望遠方向に回転駆動した場合にそれぞれ得られた第２の雑音スペクトル信号の各パターンをＲＺＯ１，ＲＺＯ２，ＲＺＯ３、ＲＺＩ１，ＲＺＩ２，ＲＺＩ３として、図５に示すように第２のスペクトル記憶部６４に記憶させておく。 In this way, the patterns of the second noise spectrum signals obtained when the motor 21 is rotationally driven in the wide-angle direction and when it is rotationally driven in the telephoto direction are RZO1, RZO2, RZO3, RZI1, RZI2, and RZI3. As shown in FIG. 5, it is stored in the second spectrum storage unit 64.

この場合、制御部３２では、第１のスペクトル記憶部５４に記憶されたＭＺＯ１〜ＭＺＯ３，ＭＺＩ１〜ＭＺＩ３の各パターンと、第２のスペクトル記憶部６４に記憶されたＲＺＯ１〜ＲＺＯ３、ＲＺＩ１〜ＲＺＩ３の各パターンを図示せぬ管理テーブルなどを用いて関連付けて管理している。 In this case, the control unit 32 associates and manages the patterns MZO1 to MZO3 and MZI1 to MZI3 stored in the first spectrum storage unit 54 with the patterns RZO1 to RZO3 and RZI1 to RZI3 stored in the second spectrum storage unit 64, using a management table or the like (not shown).

ここで、ズーム操作が行われていない状態では、制御部３２はフーリエ変換部５３から得られる入力音声スペクトル信号（Ｉａ）を選択するようにスペクトル切り替え部５６を切り替え制御する。これにより、逆フーリエ変換部５７および波形合成部５８を通じて入力音声信号がそのまま出力されることになる。 Here, when the zoom operation is not performed, the control unit 32 switches and controls the spectrum switching unit 56 so as to select the input audio spectrum signal (Ia) obtained from the Fourier transform unit 53. As a result, the input audio signal is output as it is through the inverse Fourier transform unit 57 and the waveform synthesis unit 58.

一方、制御部３２はキー入力部３６からのズーム操作信号に基づいてズーム操作が開始されたことを判断すると、モータ２１（ここではズームモータ）の駆動開始と同時にサブトラクト部５５から得られる雑音除去後の音声スペクトル信号（Ｉｂ）を選択するようにスペクトル切り替え部５６を切り替え制御する。 On the other hand, when the control unit 32 determines that the zoom operation is started based on the zoom operation signal from the key input unit 36, the noise removal obtained from the subtracting unit 55 simultaneously with the start of driving of the motor 21 (here, the zoom motor). The spectrum switching unit 56 is controlled so as to select a later audio spectrum signal (Ib).
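The switching behavior of the spectrum switching unit 56 under control of the control unit 32 can be sketched as a per-frame selection. The function name `route_spectra` and the per-frame boolean list `motor_driving` are assumptions made for illustration; in the device the selection signal comes from the motor drive timing.

```python
def route_spectra(input_spectra, denoised_spectra, motor_driving):
    """Per frame, pick Ib while the motor is driving and Ia otherwise.

    input_spectra    -- list of per-frame input spectra (Ia)
    denoised_spectra -- list of per-frame noise-removed spectra (Ib)
    motor_driving    -- list of booleans, True while the motor is driven
    """
    if not (len(input_spectra) == len(denoised_spectra) == len(motor_driving)):
        raise ValueError("all three sequences must have one entry per frame")
    return [ib if driving else ia
            for ia, ib, driving in zip(input_spectra, denoised_spectra,
                                       motor_driving)]
```

Frames captured while the motor is idle therefore pass through unchanged, which avoids subtracting a noise pattern from audio that contains no motor sound.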

上述したように、ズーム操作を行っているとき、音声入力部５１には音声信号に加えて、そのときに発生するモータ音が主マイク（マイクロホン部７）を通じて入力されている。このため、フーリエ変換部５３からは入力音声のスペクトルとモータ音のスペクトルが混合した入力音声スペクトル信号（Ｉａ）が出力される。 As described above, during a zoom operation the motor sound generated at that time is input to the audio input unit 51 through the main microphone (microphone unit 7) in addition to the audio signal. The Fourier transform unit 53 therefore outputs an input audio spectrum signal (Ia) in which the spectrum of the input audio and the spectrum of the motor sound are mixed.

一方、参照入力部６１には、機器筐体内に参照マイク７ａが設置されていることから、基本的にモータ音のみが入力される。したがって、フーリエ変換部６３の出力はモータ音のみのスペクトル信号（Ｒｖ）になっている。 On the other hand, only the motor sound is basically input to the reference input unit 61 because the reference microphone 7a is installed in the device casing. Therefore, the output of the Fourier transform unit 63 is a spectrum signal (Rv) of only motor sound.

ここで、類似度計算部６５において、フーリエ変換部６３によって得られたモータ音のスペクトル信号（Ｒｖ）と、第２のスペクトル記憶部６４に予め記憶された第２の雑音スペクトルの各パターンＲＺＯ１〜ＲＺＯ３，ＲＺＩ１〜ＲＺＩ３との類似度が計算される。 Here, the similarity calculation unit 65 calculates the similarity between the motor sound spectrum signal (Rv) obtained by the Fourier transform unit 63 and each of the second noise spectrum patterns RZO1 to RZO3 and RZI1 to RZI3 stored in advance in the second spectrum storage unit 64.

類似度の一例として、入力モータ音スペクトル信号（Ｒｖ）の絶対値を｜ＲＣ（ω）｜、第２のスペクトル記憶部６４に記憶されている第２の雑音スペクトル信号の絶対値を｜ＲＭ（ω）｜とすると、それらの二乗平均誤差をＭＳＥとした場合に、そのＭＳＥの逆数が類似度となる。 As one example of the similarity, let |RC(ω)| be the absolute value of the input motor sound spectrum signal (Rv) and |RM(ω)| be the absolute value of a second noise spectrum signal stored in the second spectrum storage unit 64. If the mean squared error between them is denoted MSE, the reciprocal of MSE is the similarity.

$$\mathrm{MSE} = \sum_{\omega=0}^{\pi} \bigl\{ |RC(\omega)| - |RM(\omega)| \bigr\}^{2} \qquad (1)$$

Σは角周波数ω＝０〜πまでの積算値 Σ denotes the sum over angular frequencies ω = 0 to π.
π＝サンプリング周波数の１／２ π corresponds to 1/2 of the sampling frequency.

$$\text{similarity} = \frac{1}{\mathrm{MSE}} \qquad (2)$$

前記式（１），（２）によれば、ＲＣ（ω）とＲＭ（ω）が似ていれば、ＭＳＥの値は小さくなり、類似度は大きくなる。逆に、ＲＣ（ω）とＲＭ（ω）が似ていなければ、ＭＳＥは大きくなり、類似度は小さくなる。 According to equations (1) and (2), if RC(ω) and RM(ω) are similar, the value of MSE is small and the similarity is large. Conversely, if RC(ω) and RM(ω) are not similar, MSE is large and the similarity is small.
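Equations (1) and (2) can be sketched in discrete form as follows, treating each spectrum as a list of bins covering ω = 0 to π. The function names `squared_error` and `similarity` are hypothetical, and the small `eps` term that prevents division by zero for identical spectra is an assumption of this sketch, not part of the description.

```python
def squared_error(rc, rm):
    """Equation (1): sum over bins of (|RC| - |RM|)^2."""
    if len(rc) != len(rm):
        raise ValueError("spectra must have the same number of bins")
    return sum((abs(c) - abs(m)) ** 2 for c, m in zip(rc, rm))


def similarity(rc, rm, eps=1e-12):
    """Equation (2): reciprocal of the MSE from equation (1).

    eps avoids a division by zero when the spectra match exactly
    (an assumption added for this sketch).
    """
    return 1.0 / (squared_error(rc, rm) + eps)
```

A stored pattern close to the live reference spectrum gives a small error and therefore a large similarity, matching the behavior described for equations (1) and (2).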

ここで、例えばＲＺＯ１〜ＲＺＯ３，ＲＺＩ１〜ＲＺＩ３の中で最も類似度の高いパターンをＲＺＯ１とすると、スペクトル選択部６６は第１のスペクトル記憶部５４の中からＲＺＯ１に対応するＭＺＯ１を最適な雑音スペクトル信号（Ｘｖ）として選択する。ＲＺＯ１とＭＺＯ１は、モータ２１を広角方向に回転駆動した場合の駆動開始点で事前に採取したモータ特性を有する。 Here, suppose for example that RZO1 has the highest similarity among RZO1 to RZO3 and RZI1 to RZI3. The spectrum selection unit 66 then selects MZO1, which corresponds to RZO1, from the first spectrum storage unit 54 as the optimum noise spectrum signal (Xv). RZO1 and MZO1 both have the motor characteristics collected in advance at the drive start point when the motor 21 is rotated in the wide-angle direction.

つまり、例えばモータ２１を広角方向に回転駆動するようなズーム操作が行われた場合には、モータ２１の駆動開始が定常回転に達するまでの期間ではＲＺＯ１→ＭＺＯ１、定常回転期間ではＲＺＯ２→ＭＺＯ２、駆動停止から実際にモータ２１の回転が停止するまでの期間ではＲＺＯ３→ＭＺＯ３といったように、第１のスペクトル記憶部５４の中から第２の雑音スペクトル信号に対応した第１の雑音スペクトル信号が現時点でのモータ特性に応じた最適な雑音スペクトルとして順次選択されてサブトラクト部５５に与えられることになる。 In other words, when a zoom operation rotates the motor 21 in the wide-angle direction, for example, the selection proceeds as RZO1 → MZO1 during the period from the start of driving until the motor 21 reaches steady rotation, RZO2 → MZO2 during steady rotation, and RZO3 → MZO3 during the period from the stop command until the motor 21 actually stops rotating. In this way, the first noise spectrum signal corresponding to the matched second noise spectrum signal is selected in turn from the first spectrum storage unit 54 as the optimum noise spectrum for the current motor characteristics and supplied to the subtracting unit 55.
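The two-step selection (match the live reference spectrum against the reference-microphone patterns, then look up the associated main-microphone pattern) can be sketched as below. The function name `select_noise_spectrum` and the dictionary-based `table` are hypothetical stand-ins for the spectrum selection unit 66 and the management table; choosing the smallest squared error is equivalent to choosing the largest similarity 1/MSE.

```python
def select_noise_spectrum(rv, ref_patterns, main_patterns, table):
    """Return (best reference label, matching main-microphone spectrum Xv).

    rv            -- live reference-microphone spectrum (Rv)
    ref_patterns  -- dict label -> stored reference spectrum (e.g. "RZO1")
    main_patterns -- dict label -> stored main-mic spectrum (e.g. "MZO1")
    table         -- dict mapping reference label -> main-mic label
    """
    def sq_err(a, b):
        return sum((abs(x) - abs(y)) ** 2 for x, y in zip(a, b))

    # The smallest squared error corresponds to the highest similarity.
    best_ref = min(ref_patterns, key=lambda label: sq_err(rv, ref_patterns[label]))
    return best_ref, main_patterns[table[best_ref]]
```

A usage example with two made-up patterns:

```python
ref = {"RZO1": [1.0, 0.0], "RZO2": [0.0, 1.0]}
main = {"MZO1": [0.5, 0.1], "MZO2": [0.1, 0.5]}
table = {"RZO1": "MZO1", "RZO2": "MZO2"}
label, xv = select_noise_spectrum([0.9, 0.1], ref, main, table)
# label is "RZO1" and xv is the stored MZO1 spectrum
```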

サブトラクト部５５では、フーリエ変換部５３によって得られた入力音声スペクトル信号（Ｉａ）と第１のスペクトル記憶部５４の中から選択された雑音スペクトル信号（Ｘｖ）に基づいてＳＳ法による雑音除去処理を行う。 The subtractor 55 performs noise removal processing by the SS method based on the input speech spectrum signal (Ia) obtained by the Fourier transform unit 53 and the noise spectrum signal (Xv) selected from the first spectrum storage unit 54. Do.

この雑音除去処理について、図６を参照して詳しく説明する。 This noise removal processing will be described in detail with reference to FIG.

図６はＳＳ法（スペクトルサブトラクション法）を用いた雑音除去処理を説明するための図である。図６（ａ）は入力音声の波形データ、同図（ｂ）はこの入力音声をフレーム単位でフーリエ変換して得られた音声スペクトル信号（Ｉａ）である。 FIG. 6 is a diagram for explaining noise removal processing using the SS method (spectral subtraction method). 6A shows the waveform data of the input speech, and FIG. 6B shows the speech spectrum signal (Ia) obtained by Fourier transforming the input speech in units of frames.

また、同図（ｃ）は雑音除去用に採取したモータ音のスペクトルつまり雑音スペクトル信号（Ｘｖ）、同図（ｄ）はその雑音スペクトル信号（Ｘｖ）に所定のサブトラクト係数αを乗じた信号である。同図（ｅ）は入力音声スペクトル信号（Ｉａ）から係数乗算後の雑音スペクトル信号（Ｘｖ）を減算して得られるスペクトル信号つまり雑音除去後の音声スペクトル信号（Ｉｂ）である。同図（ｆ）はその雑音除去後の音声スペクトル信号（Ｉｂ）を逆フーリエ変換して得られた音声信号、同図（ｇ）はフレーム単位で分割された音声信号を時系列に合成して元の音声波形に戻した状態を示している。 FIG. 6(c) shows the spectrum of the motor sound collected for noise removal, that is, the noise spectrum signal (Xv), and FIG. 6(d) shows the signal obtained by multiplying that noise spectrum signal (Xv) by a predetermined subtract coefficient α. FIG. 6(e) shows the spectrum signal obtained by subtracting the coefficient-multiplied noise spectrum signal (Xv) from the input audio spectrum signal (Ia), that is, the audio spectrum signal (Ib) after noise removal. FIG. 6(f) shows the audio signal obtained by inverse Fourier transforming the audio spectrum signal (Ib) after noise removal, and FIG. 6(g) shows the frame-divided audio signals synthesized in time series and restored to the original audio waveform.

今、図６（ａ）に示すような波形を有する音声信号が音声入力部５１（主マイク）に入力されたとする。この音声信号には、例えばズーム操作に伴って発生するモータ音つまりズーム音が雑音として混入されている。 Assume that an audio signal having a waveform as shown in FIG. 6A is input to the audio input unit 51 (main microphone). In this audio signal, for example, a motor sound generated by a zoom operation, that is, a zoom sound is mixed as noise.

まず、フレーム分割部５２において、例えば１０ｍｓ程度のフレーム区間で音声信号を切り出し、同図（ｂ）に示すように、フーリエ変換部５３にて周波数毎のパワーを表した入力音声スペクトル信号（Ｉａ）を生成する。 First, the frame dividing unit 52 cuts the audio signal into frame sections of, for example, about 10 ms, and, as shown in FIG. 6(b), the Fourier transform unit 53 generates the input audio spectrum signal (Ia) representing the power at each frequency.

ここで、参照入力部６１（参照マイク）から入力されたモータ音のみのスペクトル信号（Ｒｖ）に基づいて、第２のスペクトル記憶部６４の中から最も類似度の高い第２の雑音スペクトル信号が選択され、さらに、その選択された雑音スペクトル信号に対応した第１の雑音スペクトル信号が第１のスペクトル記憶部５４の中からモータ特性に応じた最適な雑音スペクトル信号（Ｘｖ）として選択される。 Here, based on the spectrum signal (Rv) of only the motor sound input from the reference input unit 61 (reference microphone), the second noise spectrum signal having the highest similarity is selected from the second spectrum storage unit 64. Further, the first noise spectrum signal corresponding to the selected noise spectrum signal is selected from the first spectrum storage unit 54 as the optimum noise spectrum signal (Xv) corresponding to the motor characteristics.

Then, as shown in FIGS. 6(c) to 6(e), the subtract unit 55 obtains the noise-removed audio spectrum signal (Ib) by subtracting, from the input audio spectrum signal (Ia), the optimum noise spectrum signal (Xv) multiplied by the predetermined subtract coefficient α.
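The subtraction performed by the subtract unit 55 can be illustrated with a minimal Python sketch. The function name `spectral_subtract`, the sample values, and the clipping of negative results to a floor are assumptions made for illustration; the text itself specifies only the subtraction Ib = Ia − α·Xv for each frequency.

```python
import numpy as np

def spectral_subtract(ia, xv, alpha=1.0, floor=0.0):
    """Return Ib = Ia - alpha * Xv for each frequency bin, clipped at `floor`.

    The clipping is an assumption of this sketch: the text only describes
    the subtraction, but a negative spectrum value has no physical meaning.
    """
    ia = np.asarray(ia, dtype=float)
    xv = np.asarray(xv, dtype=float)
    if ia.shape != xv.shape:
        raise ValueError("Ia and Xv must have the same number of bins")
    return np.maximum(ia - alpha * xv, floor)

# Illustrative values only (not taken from the patent).
ia = np.array([4.0, 9.0, 2.0, 1.0])   # input audio spectrum (speech plus motor sound)
xv = np.array([1.0, 2.0, 2.0, 3.0])   # stored motor-noise spectrum
ib = spectral_subtract(ia, xv, alpha=1.5)
print(ib)
```

Bins where α·Xv exceeds Ia are set to the floor instead of going negative, which is one common way to handle over-subtraction.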

The subtract coefficient α is determined in advance according to the level of the first noise spectrum signal stored in the first spectrum storage unit 54, and is normally a value of 1 or more.

While the zoom operation is in progress, that is, during the drive period of the motor 21, which is the zoom motor (from the start of motor drive until it stops), the control unit 32 controls the spectrum switching unit 56 so that it selects the noise-removed audio spectrum signal (Ib) obtained from the subtract unit 55.

As shown in FIG. 6(f), the noise-removed audio spectrum signal (Ib) is inverse Fourier transformed by the inverse Fourier transform unit 57. Then, as shown in FIG. 6(g), the waveform synthesis unit 58 combines the audio signals of the individual frames in time series to restore the audio signal as the original analog waveform signal. This audio signal is recorded in the memory 38 together with the image data during moving image shooting as the noise-removed audio signal.

In practice, in the noise removal processing described above, a window function such as a Hanning window is applied to the audio signal before the input audio divided into frames by the frame division unit 52 is Fourier transformed. In addition, when the downstream waveform synthesis unit 58 combines the inverse-Fourier-transformed audio signals frame by frame, the frames are overlapped slightly so that the waveform does not become discontinuous at frame boundaries.

For example, with a frame length of 256 samples, the analysis point is shifted by 128 samples at a time. The Hanning window in this case can be expressed by equation (3).

w(n) = 0.5 − 0.5 cos{2πn / (L − 1)}  …(3)
L: number of samples in one frame
n = 0, 1, …, L − 1

When the signals are shifted by 1/2 frame and superimposed in this way, an audio waveform with essentially constant amplitude and no discontinuities is obtained.
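The half-frame overlap can be checked numerically. The sketch below uses the standard Hanning definition w(n) = 0.5 − 0.5 cos(2πn/(L − 1)) with the frame length (256) and shift (128) given in the text; the function names and the number of frames are assumptions for illustration. With the L − 1 denominator the overlapped windows sum to a value very close to 1 but not exactly 1, so the sketch reports the worst-case deviation without assuming a particular figure.

```python
import numpy as np

def hanning(L):
    """Hanning window with the L - 1 denominator used in equation (3)."""
    n = np.arange(L)
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * n / (L - 1))

def overlap_sum(L=256, hop=128, frames=8):
    """Sum of windows placed every `hop` samples, fully overlapped region only."""
    w = hanning(L)
    total = np.zeros(hop * (frames - 1) + L)
    for i in range(frames):
        total[i * hop : i * hop + L] += w
    # Drop the first and last half frame, which only one window covers.
    return total[hop:-hop]

s = overlap_sum()
deviation = float(np.max(np.abs(s - 1.0)))
print(f"max deviation from 1.0: {deviation:.5f}")
```

If an exactly constant sum is needed, a periodic variant of the window (denominator L instead of L − 1) is a common alternative; that variant is not what the text describes.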

As described above, according to the first embodiment of the present invention, the second noise spectrum signal is selected based on the motor sound input from the reference input unit 61 at the time of shooting, and the corresponding first noise spectrum signal is then selected and used for noise removal. A noise spectrum with the same input characteristics as the audio signal can therefore be subtracted from the spectrum of the audio signal. As a result, the mechanical sound contained in the audio signal at the time of shooting is appropriately removed as a noise component, and high-quality audio can be recorded together with the captured image.

When calculating the similarity between the input motor sound spectrum signal (Rv) and each pattern of the second noise spectrum stored in the second spectrum storage unit 64, it is not necessary to calculate the similarity for all patterns (six patterns here). Depending on the rotation direction of the motor 21, the calculation may be limited: for rotation in the wide-angle direction, for example, only the wide-angle-side spectra RZO1 to RZO3 need be considered. This halves the amount of similarity calculation.
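The selection of a stored pattern, with the optional restriction by rotation direction, might be sketched as follows. Everything concrete here is hypothetical: the pattern names, the dictionary layout, the sample spectra, and in particular the similarity measure (negative squared Euclidean distance), which is a stand-in and not necessarily the measure the patent uses.

```python
import numpy as np

def select_noise_spectrum(rv, patterns, direction=None):
    """Pick the stored pattern whose second-noise spectrum is closest to Rv.

    Returns the pattern name and its associated first-noise spectrum (Xv).
    If `direction` is given, only patterns for that direction are compared.
    """
    rv = np.asarray(rv, dtype=float)
    candidates = [p for p in patterns
                  if direction is None or p["direction"] == direction]
    if not candidates:
        raise ValueError("no stored pattern for the requested direction")
    # Smallest squared distance = highest similarity (assumed measure).
    best = min(candidates, key=lambda p: float(np.sum((rv - p["second"]) ** 2)))
    return best["name"], best["first"]

# Hypothetical stored patterns (second spectrum from the reference microphone,
# first spectrum from the main microphone).
patterns = [
    {"name": "wide_start",  "direction": "wide",
     "second": np.array([1.0, 0.0, 0.0]), "first": np.array([0.5, 0.0, 0.0])},
    {"name": "wide_steady", "direction": "wide",
     "second": np.array([0.0, 1.0, 0.0]), "first": np.array([0.0, 0.5, 0.0])},
    {"name": "tele_start",  "direction": "tele",
     "second": np.array([0.0, 0.0, 1.0]), "first": np.array([0.0, 0.0, 0.5])},
]

rv = np.array([0.1, 0.9, 0.2])
name_all, xv_all = select_noise_spectrum(rv, patterns)
name_tele, xv_tele = select_noise_spectrum(rv, patterns, direction="tele")
print(name_all, name_tele)
```

Restricting the candidates by direction reduces the number of comparisons, which is the saving the paragraph above describes.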

In the first embodiment, motor sound is collected in advance for three periods of the motor 21 (drive start, mid-drive, and drive stop), and the spectrum signal for each period is stored for noise removal. The motor drive period may instead be classified more finely, with a spectrum signal stored for each class and selected as appropriate for noise removal. This allows appropriate noise removal processing for each drive period of the motor 21.

FIG. 7 is a flowchart for the case in which the audio recording processing of the first embodiment is implemented in software. The processing shown in this flowchart is assumed to be recorded in advance on a recording medium such as a ROM in the form of a program readable by the control unit 32, which is a computer.

During moving image shooting with audio, the control unit 32 repeatedly executes the following processing until the end of shooting is explicitly instructed, for example by operation of the shutter key 9 (steps S11 to S23).

First, the control unit 32 generates frame data from the audio signal input as the main signal through the microphone unit 7, which is the main microphone, and applies a window function to this frame data (step S11). It then Fourier transforms the frame data to generate the input audio spectrum signal (Ia), which indicates the power at each frequency (step S12).

The control unit 32 then determines whether the motor 21 is being driven by a zoom operation (step S13). If the motor is not being driven (No in step S13), the control unit 32 applies an inverse Fourier transform to the input audio spectrum signal (Ia) to return it to the original audio waveform data (step S20), combines it frame by frame so that it is continuous with the preceding audio waveform data (step S21), and records it in the memory 38 in synchronization with the captured image (moving image data) (step S22).

If, on the other hand, the motor is being driven, that is, if the motor 21 is being rotated in the wide-angle or telephoto direction by a zoom operation (Yes in step S13), the control unit 32 generates frame data from the motor-sound-only signal input as the reference signal through the reference microphone 7a, and applies a window function to this frame data (step S14). It then Fourier transforms the frame data to generate the input motor sound spectrum signal (Rv), which indicates the power at each frequency (step S15).

The control unit 32 then calculates the similarity between the input motor sound spectrum signal (Rv) and each pattern of the second noise spectrum signal stored in the second spectrum storage unit 64 (steps S16 and S17). It selects the second noise spectrum signal with the highest similarity from the second spectrum storage unit 64 and selects the first noise spectrum signal corresponding to it from the first spectrum storage unit 54 (step S18).

The control unit 32 uses the first noise spectrum signal selected from the first spectrum storage unit 54 as the optimum noise spectrum signal (Xv), which has the same input characteristics as the audio signal, and removes the noise component by subtracting, from the input audio spectrum signal (Ia), the noise spectrum signal (Xv) multiplied by the predetermined subtract coefficient α (step S19).

The noise-removed audio spectrum signal (Ib) is then inverse Fourier transformed to return it to the original audio waveform data (step S20), combined frame by frame so that it is continuous with the preceding audio waveform data (step S21), and recorded in the memory 38 in synchronization with the captured image (moving image data) (step S22).
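The per-frame loop of FIG. 7 can be outlined in Python under some simplifying assumptions. This sketch takes the noise spectrum Xv as already selected (steps S14 to S18 are omitted), works on magnitude spectra, and reuses the phase of the input frame when rebuilding the waveform; none of these choices is stated in the text. The names `record_audio` and `is_motor_on` are hypothetical.

```python
import numpy as np

def record_audio(main, is_motor_on, xv, alpha=1.0, L=256, hop=128):
    """Frame-wise noise removal with overlap-add (simplified FIG. 7 flow).

    main        : 1-D array of samples from the main microphone
    is_motor_on : callable frame_index -> bool (stand-in for step S13)
    xv          : noise magnitude spectrum, length L // 2 + 1
    """
    main = np.asarray(main, dtype=float)
    n = np.arange(L)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / (L - 1))
    out = np.zeros(len(main))
    for i, start in enumerate(range(0, len(main) - L + 1, hop)):
        frame = main[start:start + L] * window            # S11: frame + window
        spec = np.fft.rfft(frame)                         # S12: Fourier transform
        if is_motor_on(i):                                # S13: motor driven?
            mag = np.maximum(np.abs(spec) - alpha * xv, 0.0)   # S19: subtract
            spec = mag * np.exp(1j * np.angle(spec))      # keep input phase (assumption)
        out[start:start + L] += np.fft.irfft(spec, n=L)   # S20, S21: inverse + overlap-add
    return out

# With the motor never driven, the fully overlapped part of the output
# should closely follow the input.
t = np.arange(1152)
x = np.sin(2.0 * np.pi * t / 64.0)
y = record_audio(x, lambda i: False, np.zeros(129))
err = float(np.max(np.abs(y[128:-128] - x[128:-128])))
print(f"max reconstruction error without subtraction: {err:.5f}")
```

The first and last half frames are covered by only one window, so they are attenuated; a real recorder would handle the stream edges separately.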

Thus, even when the apparatus is implemented in software, the same effects as those of the configuration shown in FIG. 3 are obtained.

(Second Embodiment)
Next, a second embodiment of the present invention will be described.

FIG. 8 is a block diagram showing the configuration of an audio recording apparatus with a noise removal function used in the digital camera 1 according to the second embodiment of the present invention. Parts that are the same as in the configuration of FIG. 3 (first embodiment) are given the same reference numerals, and their description is omitted.

The configuration of FIG. 8 differs from that of FIG. 3 in that a short-time power calculation unit 71 is provided. The short-time power calculation unit 71 calculates the frame-by-frame power (short-time power) of the motor sound signal input as the reference signal through the reference input unit 61 (reference microphone). With the signal in a frame denoted r(i) and the power value denoted P, the power is given by equation (4).

P = Σ{r(i) * r(i)}  …(4)

Based on the power of the motor sound signal calculated by the short-time power calculation unit 71, the control unit 32 determines the period during which motor sound is being generated and controls switching by the spectrum switching unit 56.

The operation of the second embodiment is described below.

Suppose, for example, that the user operates the zoom keys 20a and 20b included in the key input unit 36. When the control unit 32, which controls the operation of the entire digital camera, receives a zoom operation signal from the zoom keys 20a and 20b included in the key input unit 36, it sends a drive start signal to the motor drive unit 21a. On receiving this signal, the motor drive unit 21a rotates the motor 21. As the motor 21 rotates, the zoom lens (not shown) included in the lens optical system 22 of FIG. 2 moves along the optical axis and the zoom operation starts; the captured image is enlarged or reduced and recorded in the memory 38.

When the user ends the zoom operation, the control unit 32 sends a drive end signal to the motor drive unit 21a. The rotation of the motor 21 then stops and the zoom operation ends.

There is a slight delay between the moment the control unit 32 outputs the drive start signal for the motor 21 and the moment the motor 21 actually begins to rotate. If noise removal processing (subtract processing) were started at the same time as the drive start signal is output, the subtract unit 55 would subtract the motor sound spectrum from the input audio spectrum signal (Ia) even though no motor sound (here, zoom sound) has yet been generated, and the audio signal output from the waveform synthesis unit 58 could be distorted.

To prevent this problem, the second embodiment takes advantage of the fact that the input signal from the reference input unit 61 (reference microphone) consists almost entirely of motor sound: the short-time power calculation unit 71 calculates the short-time power of the motor sound signal obtained from the reference input unit 61. The calculated power is compared with a preset reference value, and when it is smaller than the reference value, the spectrum switching unit 56 is controlled so that it selects the input audio spectrum signal (Ia) output from the Fourier transform unit 53.

If, on the other hand, the power of the motor sound signal is equal to or greater than the reference value, motor sound is judged to be actually present, and the spectrum switching unit 56 is controlled so that it selects the noise-removed audio spectrum signal (Ib) output from the subtract unit 55.

The same applies when the motor 21 is stopped: the power of the motor sound signal obtained from the short-time power calculation unit 71 is compared with the reference value, and the spectrum switching unit 56 is controlled to select the input audio spectrum signal (Ia) only after it has been confirmed that the motor 21 has actually stopped rotating.

As a result, even if the motor drive command and the actual onset of motor sound are offset in time, spectrum subtraction (noise removal processing) can be performed in step with the actual timing of the motor sound, and an undistorted audio signal is obtained from the waveform synthesis unit 58.
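The power-based switching of the second embodiment can be sketched as follows. The short-time power follows equation (4), P = Σ r(i)·r(i); the function names, the threshold value, and the sample frames are assumptions for illustration only.

```python
import numpy as np

def short_time_power(frame):
    """Equation (4): sum of squared samples within one frame."""
    r = np.asarray(frame, dtype=float)
    return float(np.sum(r * r))

def select_spectrum(ia, ib, ref_frame, threshold):
    """Return Ib when the reference-microphone power reaches the threshold,
    otherwise pass the unprocessed spectrum Ia through."""
    return ib if short_time_power(ref_frame) >= threshold else ia

ia = np.array([4.0, 9.0])          # input audio spectrum (illustrative)
ib = np.array([2.5, 6.0])          # noise-removed spectrum (illustrative)
quiet_ref = [0.01, -0.02, 0.01]    # motor not yet turning
loud_ref = [0.8, -0.9, 0.7]        # motor sound present

before_start = select_spectrum(ia, ib, quiet_ref, threshold=0.5)
while_running = select_spectrum(ia, ib, loud_ref, threshold=0.5)
print(before_start, while_running)
```

Because the decision depends only on the measured power, the same comparison covers both the delay after the drive start signal and the run-down after the drive end signal.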

The configuration of FIG. 8 can likewise be implemented in software, with the control unit 32 performing the series of processes according to a program as described above.

(Third Embodiment)
Next, a third embodiment of the present invention will be described.
The third embodiment assumes an audio recording apparatus provided with an automatic recording level control system (ALS).

FIG. 9 is a block diagram showing the configuration of an audio recording apparatus with a noise removal function used in the digital camera 1 according to the third embodiment of the present invention. Parts that are the same as in the configuration of FIG. 3 (first embodiment) are given the same reference numerals, and their description is omitted.

The configuration of FIG. 9 differs from that of FIG. 3 in that a recording level control unit 72, an amplifier 73, and a multiplication unit 74 are provided. The recording level control unit 72 performs control to keep the level of the input audio approximately constant. Specifically, it calculates the power of the audio signal input from the audio input unit 51 (main microphone); when the power value is smaller than a predetermined value, it raises the recording level (amplification factor) given to the amplifier 73, and when the power value is larger than the predetermined value, it lowers the recording level (amplification factor) given to the amplifier 73.

The amplifier 73 adjusts the gain of the input audio according to instructions from the recording level control unit 72 and supplies the result to the frame division unit 52. The multiplication unit 74 changes the value of the subtract coefficient α by which the noise spectrum signal (Xv) is multiplied, also according to instructions from the recording level control unit 72.

The operation of the third embodiment is described below.

As in the first embodiment, the spectrum signals of motor sound (here, zoom sound) collected under silent conditions through the audio input unit 51 (main microphone) and the reference input unit 61 (reference microphone) are stored in advance as noise spectrum signals in the first spectrum storage unit 54 and the second spectrum storage unit 64, respectively. The recording level during this collection is fixed at the reference value of 1.

During moving image shooting with audio, the recording level control unit 72 calculates the power (volume) of the audio signal input as the main signal from the audio input unit 51 (main microphone) and determines the recording level based on that power value.

Let the recording level at this time be k. The audio signal input as the main signal from the audio input unit 51 (main microphone) is multiplied by k in the amplifier 73, divided into frames of several tens of milliseconds by the frame division unit 52, and then converted into the spectrum signal (Ia) by the Fourier transform unit 53. The spectrum of the motor sound contained as noise in this input audio spectrum signal (Ia) is k times that obtained at a recording level of 1.

Meanwhile, the noise spectrum signal (Xv) stored in the first spectrum storage unit 54 is the spectrum of motor sound collected at the normal recording level (k = 1). If it is applied as is to the noise removal processing of the subtract unit 55 when k is greater than 1, part of the noise spectrum is left unsubtracted, and zoom sound remains in the audio signal output from the waveform synthesis unit 58. When k is less than 1, an excessive spectrum is subtracted from the input audio spectrum signal (Ia): the noise component in the input audio is removed, but the audio signal itself is distorted by the over-subtraction.

To prevent this, the noise spectrum signal (Xv) selected from the first spectrum storage unit 54 is not supplied to the subtract unit 55 as is but is multiplied by k, like the input audio. Specifically, the multiplication unit 74 changes the subtract coefficient α applied to the noise spectrum signal (Xv) in accordance with the amplification factor of the amplifier 73. The noise spectrum signal (Xv) is thereby scaled by k along with the input audio signal before reaching the subtract unit 55, and an undistorted audio signal is obtained from the waveform synthesis unit 58.
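The effect of scaling the coefficient with the recording level can be seen in a small numerical sketch. It assumes, as the text does, that the motor-sound spectrum in the amplified input is k times the stored one, and additionally assumes that speech and noise spectra simply add; the function name `subtract_scaled` and all values are hypothetical.

```python
import numpy as np

def subtract_scaled(ia, xv, k, alpha=1.0):
    """Subtract the stored noise spectrum after scaling it by recording level k."""
    ia = np.asarray(ia, dtype=float)
    xv = np.asarray(xv, dtype=float)
    return np.maximum(ia - alpha * k * xv, 0.0)

speech = np.array([3.0, 0.0, 1.0])   # hypothetical clean spectrum at level 1
xv = np.array([1.0, 2.0, 0.5])       # motor-noise spectrum stored at level 1
k = 2.0                              # recording level chosen by the level control

ia = k * (speech + xv)               # amplified input: speech and noise both scaled

unscaled = np.maximum(ia - 1.0 * xv, 0.0)   # coefficient not adjusted
scaled = subtract_scaled(ia, xv, k)         # coefficient multiplied by k

print("residual noise without scaling:", unscaled - k * speech)
print("residual noise with scaling:   ", scaled - k * speech)
```

Without the adjustment a residue of (k − 1)·Xv remains when k is greater than 1; with it, the noise term cancels under the additive assumption.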

The configuration of FIG. 9 can likewise be implemented in software, with the control unit 32 performing the series of processes according to a program as described above.

In each of the above embodiments, zoom sound was described as the target of noise removal. The same approach applies not only to zoom sound but also to, for example, focus sound and shutter sound; in short, it is applicable whenever mechanical sound generated by a shooting operation is to be removed from the input audio.

Each of the above embodiments was described using a digital camera capable of shooting moving images with audio as an example. The present invention is not limited to digital cameras, however, and is applicable to any electronic device that can record captured images together with an audio signal, such as a mobile phone with a camera.

In short, the present invention is not limited to the above embodiments as they stand; at the implementation stage, the constituent elements can be modified without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiments. For example, some constituent elements may be deleted from the full set shown in an embodiment, and constituent elements of different embodiments may be combined as appropriate.

The methods described in the above embodiments can also be written, as a program executable by a computer, to a recording medium such as a magnetic disk (flexible disk, hard disk, etc.), an optical disc (CD-ROM, DVD-ROM, etc.), or a semiconductor memory and applied to various apparatuses, or the program itself can be transmitted over a transmission medium such as a network and applied to various apparatuses. A computer that implements the apparatus reads the program recorded on the recording medium or provided through the transmission medium and executes the processing described above with its operation controlled by that program.

FIG. 1 shows the external configuration of a digital camera as an example of the imaging apparatus of the present invention; FIG. 1(a) is a perspective view mainly showing the front, and FIG. 1(b) is a perspective view mainly showing the rear.
FIG. 2 is a block diagram showing the electronic circuit configuration of the digital camera.
FIG. 3 is a block diagram showing the configuration of an audio recording apparatus with a noise removal function used in the digital camera according to the first embodiment of the present invention.
FIG. 4 is a diagram showing the configuration of the first spectrum storage unit in the first embodiment.
FIG. 5 is a diagram showing the configuration of the second spectrum storage unit in the first embodiment.
FIG. 6 is a diagram for explaining noise removal processing using the SS method (spectral subtraction method).
FIG. 7 is a flowchart for the case in which the audio recording processing of the first embodiment is implemented in software.
FIG. 8 is a block diagram showing the configuration of an audio recording apparatus with a noise removal function used in the digital camera according to the second embodiment of the present invention.
FIG. 9 is a block diagram showing the configuration of an audio recording apparatus with a noise removal function used in the digital camera according to the third embodiment of the present invention.

Explanation of symbols

1 … digital camera, 2 … body, 3 … shooting lens, 7 … microphone unit (main microphone), 7a … reference microphone, 9 … shutter key, 20a, 20b … zoom keys, 21 … motor, 21a … motor drive unit, 32 … control unit, 36 … key input unit, 51 … audio input unit, 52 … frame division unit, 53 … Fourier transform unit, 54 … first spectrum storage unit, 55 … subtract unit, 56 … spectrum switching unit, 57 … inverse Fourier transform unit, 58 … waveform synthesis unit, 61 … reference input unit, 62 … frame division unit, 63 … Fourier transform unit, 64 … second spectrum storage unit, 65 … similarity calculation unit, 66 … spectrum selection unit, 71 … short-time power calculation unit, 72 … recording level control unit, 73 … amplifier, 74 … multiplication unit, Ia … input audio spectrum signal, Ib … noise-removed audio spectrum signal, Xv … noise spectrum signal, Rv … input motor sound spectrum signal.

Claims

An imaging apparatus having an audio recording function that, when recording moving images with audio, removes mechanical sound generated by a shooting operation from the audio signal as noise, the imaging apparatus comprising:
first input means for inputting an audio signal;
first storage means for storing, as first noise spectra, a plurality of patterns of the spectrum of the mechanical sound collected in advance through the first input means;
second input means provided in the vicinity of the source of the mechanical sound;
second storage means for storing, as second noise spectra, a plurality of patterns of the spectrum of the mechanical sound collected in advance through the second input means, each in association with a pattern of the first noise spectrum;
selection means for selecting from the second storage means, based on the spectrum of the mechanical sound obtained from the second input means at the time of shooting, a second noise spectrum corresponding to the characteristics of the mechanical sound, and for selecting from the first storage means the first noise spectrum corresponding to that second noise spectrum;
noise removal means for removing a noise component by subtracting, from the spectrum of the audio signal obtained from the first input means, a signal obtained by multiplying the first noise spectrum selected by the selection means by a predetermined coefficient;
inverse conversion means for inversely converting the noise-removed audio spectrum obtained by the noise removal means into the original audio signal; and
recording means for recording the audio signal obtained by the inverse conversion means together with a captured image.

The imaging apparatus according to claim 1, further comprising similarity calculation means for calculating the degree of similarity between the spectrum of the mechanical sound obtained from the second input means at the time of shooting and each pattern of the noise spectrum stored in the second storage means,
wherein the selection means selects, based on the calculation result of the similarity calculation means, the second noise spectrum having the highest similarity from the second storage means, and selects the first noise spectrum corresponding to that second noise spectrum from the first storage means.

The imaging apparatus according to claim 1, wherein the mechanical sound includes the driving sound of a specific motor involved in the shooting operation, and
the first and second storage means each store noise spectrum patterns corresponding to at least three periods: when the motor starts driving, during steady rotation, and when driving stops.

The imaging apparatus according to claim 1, wherein the mechanical sound includes the driving sound of a specific motor involved in the shooting operation, and
the first and second storage means each store noise spectrum patterns for when the motor is driven in a predetermined direction and for when it is driven in the direction opposite to the predetermined direction.

The imaging apparatus according to claim 1, wherein the mechanical sound includes the driving sound of a specific motor involved in the shooting operation, and
the first and second storage means each store, both for when the motor is driven in a predetermined direction and for when it is driven in the direction opposite to the predetermined direction, noise spectrum patterns corresponding to at least three periods: when the motor starts driving, during steady rotation, and when driving stops.

The imaging apparatus according to claim 1, further comprising:
power calculating means for calculating the power of the mechanical sound obtained from the second input means; and
control means for controlling the noise removal operation by the noise removal means based on the power of the mechanical sound calculated by the power calculating means.

The imaging apparatus according to claim 6, wherein the control means prohibits the noise removal operation of the noise removal means when the power of the mechanism sound calculated by the power calculation means is smaller than a predetermined value, and permits the noise removal operation of the noise removal means when the power of the mechanism sound is equal to or greater than the predetermined value.
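
The power-based control in the two claims above can be sketched as a simple gate. This is a minimal illustration: mean squared amplitude is assumed as the power measure, and the names `frame_power`, `noise_removal_permitted`, and the value of `THRESHOLD` are hypothetical (the claim only says "predetermined value").

```python
def frame_power(samples):
    """Mean squared amplitude of one frame of reference-microphone samples."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)

def noise_removal_permitted(reference_frame, threshold):
    """Permit noise removal when the mechanism sound power is equal to or
    greater than the threshold; prohibit it when the power is smaller."""
    return frame_power(reference_frame) >= threshold

THRESHOLD = 0.01  # hypothetical value chosen for this example

quiet = [0.01, -0.02, 0.01, 0.0]   # motor idle: power well below the threshold
motor = [0.5, -0.4, 0.6, -0.5]     # motor running: power well above the threshold
```

Skipping the subtraction while the motor is idle avoids removing spectral energy from the recorded audio when no mechanism sound is present.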

The imaging apparatus according to any one of the above claims, further comprising:
Amplification means for amplifying the audio signal obtained from the first input means;
Amplification factor adjustment means for calculating the power of the audio signal and adjusting the amplification factor of the amplification means based on the power value; and
Coefficient variable means for changing the value of the coefficient by which the first noise spectrum is multiplied, in accordance with the amplification factor of the amplification means adjusted by the amplification factor adjustment means.
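
The interaction between the amplification factor and the subtraction coefficient can be sketched as below. The claim states only that the coefficient changes in accordance with the amplification factor; the linear scaling used here, the gain limits, and the names `adjust_gain` and `subtraction_coefficient` are assumptions made for illustration.

```python
import math

def adjust_gain(frame, target_power, min_gain=0.5, max_gain=8.0):
    """Choose an amplification factor that brings the frame power toward
    target_power, clamped to [min_gain, max_gain] (hypothetical limits)."""
    if not frame:
        return max_gain
    power = sum(s * s for s in frame) / len(frame)
    if power == 0.0:
        return max_gain
    gain = math.sqrt(target_power / power)
    return max(min_gain, min(max_gain, gain))

def subtraction_coefficient(base_coefficient, gain):
    """Scale the coefficient with the gain (assumed linear relationship), so
    the stored noise spectrum tracks the amplified noise level."""
    return base_coefficient * gain

frame = [0.5, -0.5, 0.5, -0.5]                 # power = 0.25
gain = adjust_gain(frame, target_power=1.0)    # amplitude gain of about 2
alpha = subtraction_coefficient(1.0, gain)
```

The motivation for the linear choice is that amplifying the input by some factor also amplifies the mechanism sound it contains by that factor, while the stored first noise spectrum was captured at a fixed level.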

An audio recording method used for an imaging apparatus in which a second input unit is provided in the vicinity of a mechanism sound source, separately from a first input unit for inputting an audio signal, the method comprising:
Storing, in a first storage unit, a plurality of patterns of the spectrum of the mechanism sound collected in advance through the first input unit, as first noise spectra;
Storing, in a second storage unit, a plurality of patterns of the spectrum of the mechanism sound collected in advance through the second input unit, as second noise spectra, in association with the respective patterns of the first noise spectrum;
Selecting, from the second storage unit, a second noise spectrum corresponding to the characteristic of the mechanism sound, based on the spectrum of the mechanism sound obtained from the second input unit at the time of shooting, and selecting, from the first storage unit, the first noise spectrum corresponding to that second noise spectrum;
Removing a noise component by subtracting a signal obtained by multiplying the selected first noise spectrum by a predetermined coefficient from the spectrum of an audio signal obtained from the first input unit;
Inversely converting the audio spectrum after the noise removal into an audio signal; and
Recording the audio signal obtained by the inverse conversion in a predetermined memory together with a photographed image.
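
The subtraction and inverse conversion steps of this method can be sketched as follows. This is a minimal illustration, not the patented implementation: a plain discrete Fourier transform stands in for whatever transform the apparatus uses, the phase of the input is kept unchanged, and negative magnitudes are clamped to zero. The last two choices are common in spectral subtraction but are assumptions here. All function names and the synthetic signals are hypothetical, and the selection and recording steps are omitted.

```python
import cmath
import math

def dft(frame):
    """Discrete Fourier transform of a real frame (O(n^2), for illustration)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse transform; the real part is returned as the time-domain signal."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def spectral_subtract(frame, noise_magnitude, coefficient=1.0):
    """Subtract coefficient * noise magnitude from each bin magnitude, keep
    the input phase, clamp at zero, and convert back to a signal."""
    cleaned = []
    for value, noise in zip(dft(frame), noise_magnitude):
        magnitude = max(abs(value) - coefficient * noise, 0.0)
        cleaned.append(cmath.rect(magnitude, cmath.phase(value)))
    return idft(cleaned)

# Synthetic example: a "voice" tone in bin 1 plus a "motor" tone in bin 3.
N = 8
voice = [math.cos(2 * math.pi * 1 * t / N) for t in range(N)]
motor = [0.5 * math.cos(2 * math.pi * 3 * t / N) for t in range(N)]
frame = [v + m for v, m in zip(voice, motor)]

# Stands in for the selected first noise spectrum.
noise_magnitude = [abs(x) for x in dft(motor)]
restored = spectral_subtract(frame, noise_magnitude)
```

In this synthetic case the stored noise pattern matches the motor tone exactly, so `restored` is numerically close to `voice`; with real recordings the stored pattern only approximates the actual noise.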

A program executed by a computer mounted on an imaging apparatus in which a second input unit is provided in the vicinity of a mechanism sound source, separately from a first input unit for inputting an audio signal,
The program causing the computer to realize:
A function of storing, in a first storage unit, a plurality of patterns of the spectrum of the mechanism sound collected in advance through the first input unit, as first noise spectra;
A function of storing, in a second storage unit, a plurality of patterns of the spectrum of the mechanism sound collected in advance through the second input unit, as second noise spectra, in association with the respective patterns of the first noise spectrum;
A function of selecting, from the second storage unit, a second noise spectrum corresponding to the characteristic of the mechanism sound, based on the spectrum of the mechanism sound obtained from the second input unit at the time of shooting, and selecting, from the first storage unit, the first noise spectrum corresponding to that second noise spectrum;
A function of removing a noise component by subtracting a signal obtained by multiplying the selected first noise spectrum by a predetermined coefficient from a spectrum of an audio signal obtained from the first input unit;
A function of inversely converting the audio spectrum after the noise removal into an audio signal; and
A function of recording the audio signal obtained by the inverse conversion in a predetermined memory together with a photographed image.