JP2012168477A

JP2012168477A - Noise estimation device, signal processor, imaging apparatus, and program

Info

Publication number: JP2012168477A
Application number: JP2011031439A
Authority: JP
Inventors: Kosuke Okano; 康介岡野
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2011-02-16
Filing date: 2011-02-16
Publication date: 2012-09-06

Abstract

PROBLEM TO BE SOLVED: To appropriately reduce noise superimposed on a sound signal. SOLUTION: A noise estimation device (150) includes: a calculation part (232) for calculating a noise similarity indicating the degree of similarity between a sound signal and noise on the basis of the frequency spectrum of an input sound signal and the frequency spectrum of the noise; and a noise estimation part (233) for estimating an estimated noise included in the sound signal on the basis of the noise similarity calculated by the calculation part (232).

Description

The present invention relates to a noise estimation device, a signal processing device, an imaging apparatus, and a program.

Techniques for reducing noise superimposed on a sound signal are known (see, for example, Patent Document 1). In the technique described in Non-Patent Document 1, stationary noise superimposed on a sound signal is reduced by using a predetermined estimated noise.

Boll, S. F., "Suppression of Acoustic Noise in Speech Using Spectral Subtraction," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-27, pp. 113-120, April 1979.

However, with the technique described in Non-Patent Document 1, when reducing noise whose magnitude is non-stationary, for example, a difference arises between the noise actually mixed into the sound signal and the estimated noise. Over-subtraction or under-subtraction of the noise may then degrade the sound or leave residual noise. Likewise, when reducing intermittently occurring noise, for example, noise is over-subtracted in portions where no noise is present, which may degrade the sound.
That is, the technique described in Non-Patent Document 1 has the problem that noise superimposed on a sound signal cannot be reduced appropriately.
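
The over-subtraction problem can be illustrated with a minimal sketch. It assumes a two-bin magnitude spectrum, a fixed noise estimate, and clamping of negative results to zero; the function name and all numeric values are hypothetical and are not taken from Non-Patent Document 1.

```python
def fixed_subtraction(frame_magnitudes, fixed_noise):
    """Subtract the same predetermined noise estimate from every frame.

    Negative results are clamped to zero (an assumption of this sketch).
    """
    return [max(x - n, 0.0) for x, n in zip(frame_magnitudes, fixed_noise)]


fixed_noise = [2.0, 2.0]   # hypothetical predetermined noise estimate
noisy_frame = [3.0, 7.0]   # hypothetical frame that really contains the noise
quiet_frame = [1.0, 5.0]   # hypothetical frame recorded while no noise occurred

print(fixed_subtraction(noisy_frame, fixed_noise))  # noise removed as intended
print(fixed_subtraction(quiet_frame, fixed_noise))  # target sound is also reduced
```

In the quiet frame the same amount is subtracted even though no noise is present, so part of the target sound is lost.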

The present invention has been made to solve the above problem, and an object thereof is to provide a noise estimation device, a signal processing device, an imaging apparatus, and a program capable of appropriately reducing noise superimposed on a sound signal.

To solve the above problem, the present invention is a noise estimation device comprising: a calculation unit that calculates, based on the frequency spectrum of an input sound signal and the frequency spectrum of noise, a noise similarity indicating the degree of similarity between the sound signal and the noise; and a noise estimation unit that estimates an estimated noise included in the sound signal based on the noise similarity calculated by the calculation unit.

The present invention is also a signal processing device comprising: the above noise estimation device; and a noise reduction processing unit that reduces noise included in the sound signal based on the estimated noise estimated by the noise estimation device.

The present invention is also an imaging apparatus comprising the above signal processing device.

The present invention is also a program for causing a computer serving as a noise estimation device to execute: a calculation procedure in which a calculation unit calculates, based on the frequency spectrum of an input sound signal and the frequency spectrum of noise, a noise similarity indicating the degree of similarity between the sound signal and the noise; and a noise estimation procedure in which a noise estimation unit estimates an estimated noise included in the sound signal based on the noise similarity calculated in the calculation procedure.

According to the present invention, noise superimposed on a sound signal can be reduced appropriately.

A schematic block diagram showing the imaging apparatus according to the present embodiment.
A schematic block diagram showing the signal processing device in the embodiment.
A conceptual diagram showing an example of an input frequency feature vector and a noise frequency feature vector in the embodiment.
A flowchart showing the operation of the noise reduction processing in the embodiment.
An explanatory diagram showing an example of a sound signal on which noise is superimposed in the embodiment.
An explanatory diagram illustrating an example of a method of generating the frequency spectrum of noise in the embodiment.

Hereinafter, a signal processing device and an imaging apparatus according to an embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a schematic block diagram showing the imaging apparatus 1 according to the present embodiment.
In this figure, the imaging apparatus 1 according to the present embodiment includes an imaging unit 10, a buffer memory unit 30, an image processing unit 40, a display unit 50, a storage unit 60, a communication unit 70, an operation unit 80, a CPU (Central Processing Unit) 90, a microphone 21, an A/D (Analog/Digital) conversion unit 22, a sound signal processing unit 23, and a bus 300. Among the components of the imaging apparatus 1, for example, the sound signal processing unit 23 and a part of the storage unit 60 correspond to the signal processing device 100. Also, for example, a part of the sound signal processing unit 23 and a part of the storage unit 60 correspond to the noise estimation device 150.

The imaging unit 10 includes an optical system 11, an image sensor 19, and an A/D conversion unit 20. It is controlled by the CPU 90 in accordance with set imaging conditions (for example, aperture value and exposure value), forms an optical image from the optical system 11 on the image sensor 19, and generates image data based on that optical image after conversion into a digital signal by the A/D conversion unit 20.

The optical system 11 includes a focus adjustment lens (hereinafter, AF (Auto Focus) lens) 12, a camera shake prevention lens (hereinafter, VR (Vibration Reduction) lens) 13, a zoom lens 14, a zoom encoder 15, a lens driving unit 16, an AF encoder 17, and a camera shake prevention unit 18.
The optical system 11 guides the optical image that has passed through the zoom lens 14, the VR lens 13, and the AF lens 12 to the light receiving surface of the image sensor 19.

The lens driving unit 16 controls the position of the zoom lens 14 or the AF encoder 17 based on a drive control signal input from the CPU 90, described later.
The camera shake prevention unit 18 controls the position of the VR lens 13 based on a drive control signal input from the CPU 90, described later. The camera shake prevention unit 18 may also detect the position of the VR lens 13.

The zoom encoder 15 detects a zoom position representing the position of the zoom lens 14 and outputs the detected zoom position to the CPU 90.
The AF encoder 17 detects a focus position representing the position of the AF lens 12 and outputs the detected zoom position and focus position to the CPU 90.

The optical system 11 described above may be attached to and integrated with the imaging apparatus 1, or may be detachably attached to the imaging apparatus 1.

The image sensor 19, for example, converts the optical image formed on its light receiving surface into an electrical signal and outputs the electrical signal to the A/D conversion unit 20.
In addition, the image sensor 19 causes the image data obtained when a shooting instruction is received via the operation unit 80 to be stored in the storage medium 200, via the A/D conversion unit 20 and the image processing unit 40, as captured image data of a still image.

On the other hand, while no imaging instruction has been received via the operation unit 80, for example, the image sensor 19 outputs continuously obtained image data as through-image data to the CPU 90 and the display unit 50 via the A/D conversion unit 20 and the image processing unit 40.

The A/D conversion unit 20 performs analog-to-digital conversion on the electrical signal produced by the image sensor 19 and outputs the resulting digital signal as image data.

The buffer memory unit 30 temporarily stores the image data captured by the imaging unit 10, the sound signal converted by the sound signal processing unit 23, and the like.
The image processing unit 40 refers to the image processing conditions stored in the storage unit 60 and performs image processing on the image data recorded in the buffer memory unit 30 or the storage medium 200.

The display unit 50 is, for example, a liquid crystal display, and displays image data obtained by the imaging unit 10, an operation screen, and the like.

The storage unit 60 stores determination conditions referred to by the CPU 90 during scene determination, imaging conditions, and the like. The storage unit 60 also stores information used in the noise reduction processing, in which the sound signal processing unit 23, described later, reduces noise in the sound signal. The information used in the noise reduction processing is, for example, the noise frequency feature vector described later.
The storage unit 60 also includes a noise frequency feature vector storage unit 61.

The noise frequency feature vector storage unit 61 stores the noise frequency feature vector described later. The noise frequency feature vector is stored in advance, for example, during manufacturing or pre-shipment inspection of the imaging apparatus 1. The noise frequency feature vector storage unit 61 may also store a plurality of noise frequency feature vectors, one for each type of noise. For example, a noise frequency feature vector for the noise produced by the mechanical sound of each of the zoom lens 14, the VR lens 13, and the AF lens 12 may be stored separately in the noise frequency feature vector storage unit 61.

The microphone 21 picks up sound and outputs a sound signal corresponding to the sound picked up. This sound signal is an analog signal.
The A/D conversion unit 22 converts the analog sound signal input from the microphone 21 into a digital sound signal.

The sound signal processing unit 23 performs sound signal processing, such as noise reduction, on the sound signal converted into a digital signal by the A/D conversion unit 22, and stores the processed sound signal in the storage medium 200. The sound signal processing unit 23 includes a noise reduction processing unit 24, a frequency feature vector generation unit 231, an inner product calculation unit 232, and a noise estimation unit 233. Details of the sound signal processing unit 23 will be described later.
When the sound signal processed by the sound signal processing unit 23 is stored in the storage medium 200, it may be stored in temporal association with the image data captured by the image sensor 19, or it may be stored as a moving image that includes the sound signal.

The communication unit 70 is connected to a removable storage medium 200, such as a card memory, and writes information to, reads information from, or erases information on the storage medium 200.
The operation unit 80 includes, for example, a power switch, a shutter button, and other operation keys. When operated by the user, it accepts the user's operation input and outputs it to the CPU 90.

The storage medium 200 is a storage unit that is detachably connected to the imaging apparatus 1 and stores, for example, image data generated (captured) by the imaging unit 10 and sound signals processed by the sound signal processing unit 23.

The CPU 90 controls the entire imaging apparatus 1. As one example, it generates a drive control signal for controlling the positions of the zoom encoder 15 and the AF encoder 17 based on the zoom position input from the zoom encoder 15, the focus position input from the AF encoder 17, and the operation input received from the operation unit 80. Based on this drive control signal, the CPU 90 controls the positions of the zoom encoder 15 and the AF encoder 17 via the lens driving unit 16.

The bus 300 is connected to the imaging unit 10, the sound signal processing unit 23, the buffer memory unit 30, the image processing unit 40, the display unit 50, the storage unit 60, the communication unit 70, the operation unit 80, and the CPU 90, and transfers data and the like output from each unit.

Next, the detailed configurations of the signal processing device 100 and the sound signal processing unit 23 within the configuration of FIG. 1 will be described with reference to FIG. 2.
FIG. 2 is a schematic block diagram showing the signal processing device 100 in the present embodiment.
In this figure, the signal processing device 100 includes the sound signal processing unit 23 and the noise frequency feature vector storage unit 61. The noise frequency feature vector storage unit 61 is provided in the storage unit 60, for example, and is a part of the storage unit 60. The signal processing device 100 also includes the noise estimation device 150. Among the components of the signal processing device 100, for example, the frequency feature vector generation unit 231, the inner product calculation unit 232, the noise estimation unit 233, and the noise frequency feature vector storage unit 61 correspond to the noise estimation device 150.

The sound signal processing unit 23 performs sound signal processing, such as noise reduction, on the sound signal converted into a digital signal by the A/D conversion unit 22, and stores the processed sound signal in the storage medium 200. In the present embodiment, "noise" refers to a noise signal that is generated by the operation of an operating part and is included in (that is, superimposed on) the sound signal. In other words, "noise" here is non-stationary noise.

The operating part referred to here, also called a mechanism part, is, for example, the zoom lens 14, the VR lens 13, the AF lens 12, or the operation unit 80 described above. An operating part is a component of the imaging apparatus 1 that produces sound (or may produce sound) when it operates or is operated.
In addition, an operating part is a component of the imaging apparatus 1 whose sound, produced when it operates or is operated, is picked up (or may be picked up) by the microphone 21.

The sound signal processing unit 23 also performs sound signal processing, such as noise reduction, based on a control signal supplied from the CPU 90.
The sound signal processing unit 23 includes the frequency feature vector generation unit 231, the inner product calculation unit 232, the noise estimation unit 233, and the noise reduction processing unit 24.

The frequency feature vector generation unit 231 converts the sound signal, which has been converted into a digital signal by the A/D conversion unit 22, into a frequency spectrum by applying a Fourier transform (for example, an FFT (Fast Fourier Transform)) to each frame. A frame is a section obtained by dividing the sound signal. The description here assumes, as an example, that frames are sections of a predetermined length, each shifted from the previous one by half that length.

The frequency feature vector generation unit 231 then generates an input frequency feature vector based on the frequency spectrum of the converted sound signal. That is, the frequency feature vector generation unit 231 converts the sound signal into a frequency spectrum and generates the input frequency feature vector from that spectrum.

The input frequency feature vector is a vector whose elements are the intensities (for example, absolute values) of the individual frequency components (frequency bins) in the frequency spectrum of the sound signal. Details of the input frequency feature vector will be described later with reference to FIG. 3.
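
The framing and feature vector generation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the frame length, and the test signal are hypothetical, a naive DFT stands in for the FFT, and only the non-negative-frequency bins are kept.

```python
import cmath


def split_frames(samples, frame_len):
    """Split samples into frames of frame_len, each shifted by half a frame."""
    hop = frame_len // 2
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def dft(frame):
    """Naive discrete Fourier transform; an FFT computes the same values faster."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def feature_vector(frame):
    """Absolute value of each non-negative-frequency bin of the frame's spectrum."""
    spectrum = dft(frame)
    return [abs(c) for c in spectrum[:len(frame) // 2 + 1]]


# Hypothetical input: a sinusoid with a period of four samples.
samples = [0.0, 1.0, 0.0, -1.0] * 4
frames = split_frames(samples, frame_len=8)
X = [feature_vector(frame) for frame in frames]  # one feature vector per frame
```

With a frame length of 8 and a shift of 4, the 16 samples yield three overlapping frames, and each feature vector has five elements.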

Further, the frequency feature vector generation unit 231 supplies the generated input frequency feature vector to the inner product calculation unit 232 and the noise reduction processing unit 24.

The inner product calculation unit 232 (calculation unit) calculates the noise similarity based on the inner product of the input frequency feature vector supplied from the frequency feature vector generation unit 231 and the noise frequency feature vector read from the noise frequency feature vector storage unit 61. As one example, the inner product calculation unit 232 (calculation unit) uses the inner product of the input frequency feature vector and the noise frequency feature vector as the noise similarity. The noise similarity is information indicating the degree of similarity between the sound signal and the noise. That is, the noise similarity is a value indicating how much noise is included in (mixed into) the sound signal. The inner product calculation unit 232 supplies the calculated noise similarity to the noise estimation unit 233.
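
As a rough sketch of this step, the noise similarity can be computed per frame as a plain inner product. The function name and all numeric values below are hypothetical; the middle frame is assumed to contain the noise.

```python
def noise_similarity(input_vector, noise_vector):
    """Inner product of the input and noise frequency feature vectors."""
    return sum(x * n for x, n in zip(input_vector, noise_vector))


noise_vector = [0.6, 0.8]                      # hypothetical stored noise vector
frames = [[0.5, 0.1], [3.0, 4.2], [0.4, 0.2]]  # hypothetical input vectors per frame
similarities = [noise_similarity(X_k, noise_vector) for X_k in frames]
```

Because the value is recomputed for every frame, frames recorded while the noise source is idle yield a small similarity and frames containing the noise yield a large one.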

Here, the noise frequency feature vector is a vector that represents the features of the noise, is generated based on the frequency spectrum of the noise, and is stored in advance in the noise frequency feature vector storage unit 61. The elements of the noise frequency feature vector are the intensities (for example, absolute values) of the individual frequency components (frequency bins) in the frequency spectrum of the noise.

That is, the noise frequency feature vector is generated based on the frequency spectrum of the noise, and the input frequency feature vector is generated based on the frequency spectrum of the sound signal. In other words, the inner product calculation unit 232 calculates the noise similarity described above based on the frequency spectrum of the sound signal and the frequency spectrum of the noise.

The noise frequency feature vector may be, for example, a normalized unit vector. That is, if the noise frequency feature vector before normalization is n_0, the normalized noise frequency feature vector n is given by equation (1).
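
The body of equation (1) is not preserved in this text. Since n is described as the unit vector obtained by normalizing n_0, a form consistent with that description, offered here as a reconstruction rather than the original notation, is:

```latex
\mathbf{n} = \frac{\mathbf{n}_0}{\lVert \mathbf{n}_0 \rVert} \qquad (1)
```

Here \lVert \mathbf{n}_0 \rVert is assumed to denote the Euclidean norm of n_0, so that n has length 1.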

The noise estimation unit 233 estimates the estimated noise included in the sound signal based on the noise similarity calculated by the inner product calculation unit 232. That is, in the present embodiment, the noise estimation unit 233 estimates the estimated noise based on the inner product of the input frequency feature vector and the noise frequency feature vector n. The estimated noise (vector ^n_k) is given by equation (2). The symbol "^" denotes an estimated value; in the running text, "^" stands for a hat placed directly above the character that follows it.

Here, k denotes the frame number, and the vector X_k denotes the input frequency feature vector of the k-th frame.
The noise estimation unit 233 supplies the estimated noise (vector ^n_k) calculated by equation (2) to the noise reduction processing unit 24.
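
The body of equation (2) is not preserved in this text. One form consistent with the description, assumed here purely for illustration, scales the normalized noise vector n by the inner product of X_k and n, which is the projection of X_k onto n. The function names and values below are hypothetical.

```python
def inner_product(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def estimate_noise(X_k, n):
    """Assumed form of equation (2): project X_k onto the unit noise vector n."""
    similarity = inner_product(X_k, n)
    return [similarity * n_i for n_i in n]


n = [0.6, 0.8]                          # hypothetical normalized noise vector
estimated = estimate_noise([3.0, 5.0], n)  # hypothetical input vector for frame k
```

Under this assumption the estimate grows with the noise similarity of each frame, so a frame with little noise receives a correspondingly small estimate.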

The noise reduction processing unit 24 performs processing to reduce the noise included in the sound signal based on the input frequency feature vector generated by the frequency feature vector generation unit 231 and the estimated noise (vector ^n_k) estimated by the noise estimation unit 233. That is, the noise reduction processing unit 24 reduces the noise included in the sound signal based on the estimated noise estimated by the noise estimation device 150. The noise reduction processing unit 24 then stores the sound signal on which this noise reduction processing has been performed in the storage medium 200.
The noise reduction processing unit 24 includes a noise subtraction unit 234 and an inverse transform unit 235.

The noise subtraction unit 234 subtracts the noise included in the sound signal, based on the input frequency feature vector X_k generated by the frequency feature vector generation unit 231 and the estimated noise (vector ^n_k) estimated by the noise estimation unit 233, and thereby calculates the estimated frequency feature vector ^S_k of the target sound with the noise removed. The noise subtraction unit 234 calculates the estimated frequency feature vector ^S_k of the target sound using the relation shown in equation (3). The target sound is the sound that the user intends to pick up with the microphone 21 of the imaging apparatus 1 (the sound to be recorded).
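
The body of equation (3) is not preserved in this text. Since the target sound estimate is described as the input frequency feature vector with the estimated noise subtracted, a form consistent with that description, given as a reconstruction and not as the original notation, is:

```latex
\hat{\mathbf{S}}_k = \mathbf{X}_k - \hat{\mathbf{n}}_k \qquad (3)
```

The symbols follow the surrounding text: X_k is the input frequency feature vector of the k-th frame, and \hat{\mathbf{n}}_k is the estimated noise written as ^n_k in the running text.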

The noise subtraction unit 234 supplies the calculated estimated frequency feature vector ^S_k of the target sound to the inverse transform unit 235.

The inverse transform unit 235 returns the estimated frequency feature vector ^S_k of the target sound, calculated by the noise subtraction unit 234, to a frequency spectrum. It then performs an inverse Fourier transform (for example, an inverse FFT) into a time waveform using the phase information of the noise-contaminated sound signal, and synthesizes the sound signal of the target sound with reduced noise. The inverse transform unit 235 further stores the generated sound signal in the storage medium 200 via the communication unit 70.
When the frequency spectrum obtained from the estimated frequency feature vector ^S_k contains negative values, the inverse transform unit 235 replaces those negative values with "0" before performing the inverse Fourier transform.
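
The resynthesis step can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the magnitude vector is assumed to cover every DFT bin of the frame, and a naive DFT and inverse DFT stand in for the FFT and inverse FFT.

```python
import cmath


def dft(frame):
    """Naive discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each time sample."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * t / n)
                for f in range(n)).real / n
            for t in range(n)]


def resynthesize(clean_magnitudes, noisy_spectrum):
    """Combine cleaned magnitudes with the phase of the noisy spectrum, then invert."""
    combined = []
    for magnitude, noisy in zip(clean_magnitudes, noisy_spectrum):
        magnitude = max(magnitude, 0.0)  # negative values are replaced with 0
        combined.append(cmath.rect(magnitude, cmath.phase(noisy)))
    return idft(combined)


noisy_frame = [0.0, 1.0, 0.0, -1.0]            # hypothetical time-domain frame
noisy_spectrum = dft(noisy_frame)
unchanged = [abs(c) for c in noisy_spectrum]   # magnitudes with nothing subtracted
restored = resynthesize(unchanged, noisy_spectrum)
```

When nothing is subtracted, the frame is recovered unchanged, which is a convenient check that magnitude and phase are recombined correctly. Overlapping frames would still need to be recombined into a continuous signal, which this sketch does not cover.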

Next, the concept by which the noise estimation device 150 estimates the noise included in the sound signal will be described in detail with reference to FIG. 3.
FIG. 3 is a conceptual diagram showing an example of an input frequency feature vector and a noise frequency feature vector in the present embodiment. To explain the concept of the present embodiment, a case with two frequency components (two frequency bins) is described here.

FIG. 3(a) shows an example of the frequency spectrum of noise and the frequency spectra of sound signals. In each graph of this figure, the horizontal axis represents the frequency components (frequency bins B1 and B2), and the vertical axis represents intensity (for example, PSD (Power Spectral Density)).

In this figure, the frequency spectrum W1 is the frequency spectrum of the noise. The frequency spectrum W2 is the frequency spectrum of the input sound signal (sound signal) x1, and the frequency spectrum W3 is the frequency spectrum of the input sound signal (sound signal) x2.

FIG. 3(b) shows the vector space obtained by converting the noise frequency spectrum and the sound signal frequency spectra shown in FIG. 3(a) into frequency feature vectors. In this figure, the horizontal axis represents the intensity of frequency bin B1, and the vertical axis represents the intensity of frequency bin B2.

In this figure, the vector n is the noise frequency feature vector (the normalized noise frequency feature vector) corresponding to the noise frequency spectrum W1. This normalized noise frequency feature vector n is calculated by equation (1) above and is stored in advance in the noise frequency feature vector storage unit 61.

In this figure, the vector X1 is the input frequency feature vector corresponding to the frequency spectrum W2 of the sound signal x1, and the vector X2 is the input frequency feature vector corresponding to the frequency spectrum W3 of the sound signal x2.
The frequency spectrum W2 of the sound signal x1, the frequency spectrum W3 of the sound signal x2, the input frequency feature vector X1, and the input frequency feature vector X2 are generated by the frequency feature vector generation unit 231.

In this figure, the inner product value $I_1$ is the inner product of the input frequency feature vector X1 and the noise frequency feature vector n, and the inner product value $I_2$ is the inner product of the input frequency feature vector X2 and the noise frequency feature vector n.

In the present embodiment, the noise estimation unit 233 estimates the noise included in the sound signal (the estimated noise) based on the inner product value calculated by the inner product calculation unit 232.
Here, for the sound signal x2, the inner product value $I_2$ is larger than the inner product value $I_1$. This indicates that the direction of the input frequency feature vector X2 is close to the direction of the noise frequency feature vector n. Therefore, most of the sound signal x2 is estimated to be noise.
In contrast, for the sound signal x1, the inner product value $I_1$ is smaller than the inner product value $I_2$. This indicates that the direction of the input frequency feature vector X1 is far from the direction of the noise frequency feature vector n. Therefore, the amount of noise in the sound signal x1 is estimated to be smaller than that in the sound signal x2.
That is, the noise estimation unit 233 can appropriately estimate the noise included in (superimposed on) the sound signal.
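The two-bin comparison of FIG. 3B can be sketched numerically as follows. The intensities below are hypothetical values chosen only for illustration (the figure's actual values are not given in the text), and the helper names `normalize` and `inner_product` are likewise assumptions.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (the role attributed to equation (1))."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical intensities for frequency bins (B1, B2); not taken from the patent.
noise_spectrum = [1.0, 3.0]  # stands in for W1
x1 = [3.0, 1.0]              # stands in for W2 (sound signal x1)
x2 = [1.5, 3.5]              # stands in for W3 (sound signal x2)

n = normalize(noise_spectrum)
i1 = inner_product(x1, n)    # smaller: x1 points away from the noise direction
i2 = inner_product(x2, n)    # larger: x2 points along the noise direction
```

With these values `i2` exceeds `i1`, which corresponds to the sound signal x2 being judged the noisier of the two.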

In the noise estimation device 150 according to the present embodiment, the noise estimation unit 233 estimates the estimated noise by the noise estimation method described above and supplies the estimated noise to the noise reduction processing unit 24 described above. FIG. 3 illustrates the case of two frequency bins in order to explain the concept of the present embodiment, but in practice the inner product calculation unit 232 calculates the inner product of vectors having many more elements. For example, when one frame contains 4096 samples, the inner product calculation unit 232 calculates the inner product of vectors having 4096 elements.

Next, the operations of the imaging device 1 and the signal processing device 100 in the present embodiment will be described.

First, the imaging operation of the imaging device 1 will be described.
In the imaging device 1, for example, when the CPU 90 receives an imaging instruction via the operation unit 80, it stores the image data obtained via the imaging unit 10 in the storage medium 200. At this time, the CPU 90 controls the lens driving unit 16 and the camera shake prevention unit 18 to drive the zoom lens 14, the VR lens 13, or the AF lens 12.

When the user captures a moving image, a sound signal is picked up by the microphone 21 and stored in the storage medium 200 via the A/D conversion unit 22 and the sound signal processing unit 23. In this case, noise may be superimposed on the sound signal picked up by the microphone 21 due to the driving of the zoom lens 14, the VR lens 13, or the AF lens 12, or due to operation of the operation unit 80. The imaging device 1 according to the present embodiment includes the signal processing device 100, which includes the sound signal processing unit 23, and the signal processing device 100 executes noise reduction processing that reduces the noise superimposed on the sound signal.

Next, the operations related to the noise reduction processing in the signal processing device 100 will be described.
FIG. 4 is a flowchart showing the operation of the noise reduction processing in the present embodiment.

In this figure, the signal processing device 100 first performs a Fourier transform on the input sound signal in units of frames (step S101). That is, the frequency feature vector generation unit 231 of the sound signal processing unit 23 converts the sound signal, which has been converted into a digital signal by the A/D conversion unit 22, into a frequency spectrum frame by frame. When performing the Fourier transform, the frequency feature vector generation unit 231 uses, for example, a Hamming window as the window function.
Next, the frequency feature vector generation unit 231 generates the input frequency feature vector ($X_k$) based on the frequency spectrum obtained for each frame (step S102).
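Steps S101 and S102 can be sketched as below. This is a minimal illustration, assuming the feature vector elements are spectral magnitudes and using a naive O(N²) DFT from the standard library for clarity; an FFT would normally be used for real frame sizes. The function names are hypothetical.

```python
import cmath
import math

def hamming(length):
    """Symmetric Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]

def dft(samples):
    """Naive discrete Fourier transform (illustration only)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def feature_vector(frame):
    """Window one frame, transform it, and keep the magnitude of each bin."""
    window = hamming(len(frame))
    spectrum = dft([s * w for s, w in zip(frame, window)])
    return [abs(c) for c in spectrum]

# A 16-sample test frame containing a sinusoid centered on bin 2.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
x_k = feature_vector(frame)
```

The resulting `x_k` has one non-negative element per frequency bin, with its largest value in the lower half of the spectrum at the bin of the test sinusoid.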

Next, the signal processing device 100 calculates the inner product value of the input frequency feature vector and the noise frequency feature vector (step S103). That is, the inner product calculation unit 232 calculates, as the noise similarity, the inner product of the input frequency feature vector ($X_k$) supplied from the frequency feature vector generation unit 231 and the noise frequency feature vector (n) read from the noise frequency feature vector storage unit 61.

Next, the signal processing device 100 estimates the estimated noise (step S104). That is, the noise estimation unit 233 estimates the estimated noise (vector $\tilde{n}_k$) included in the sound signal based on the noise similarity (inner product value $\langle X_k, n \rangle$) calculated by the inner product calculation unit 232. The noise estimation unit 233 calculates this estimated noise (vector $\tilde{n}_k$) by, for example, equation (2) described above.
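Step S104 can be illustrated with the sketch below. Equation (2) is not reproduced in this part of the description, so the form used here, scaling the unit noise vector by the inner product (a projection of the input onto the noise direction), is an assumption, as is the function name `estimate_noise`.

```python
def estimate_noise(x_k, n_unit):
    """Return (similarity, estimated noise vector).

    Assumed form: estimated noise = <x_k, n> * n, with n a unit vector.
    """
    similarity = sum(x * n for x, n in zip(x_k, n_unit))
    return similarity, [similarity * n for n in n_unit]

n_unit = [0.6, 0.8]   # hypothetical unit noise vector (0.36 + 0.64 = 1)
x_k = [3.0, 4.0]      # hypothetical input feature vector, parallel to n_unit
similarity, noise_hat = estimate_noise(x_k, n_unit)
```

Because this `x_k` lies exactly along `n_unit`, the whole input is attributed to noise under the assumed form; an input orthogonal to `n_unit` would yield a zero estimate.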

Next, the signal processing device 100 subtracts the noise using the estimated noise (step S105). That is, the noise subtraction unit 234 of the noise reduction processing unit 24 subtracts the noise included in the sound signal based on the input frequency feature vector ($X_k$) and the estimated noise (vector $\tilde{n}_k$), and calculates the estimated frequency feature vector of the target sound ($\tilde{S}_k$) from which the noise has been subtracted. The noise subtraction unit 234 calculates this estimated frequency feature vector of the target sound ($\tilde{S}_k$) by, for example, equation (3) described above.
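The subtraction of step S105 might look like the following sketch. Equation (3) is not reproduced here; element-wise subtraction is assumed, and the clamp to a floor of zero (so that no bin intensity becomes negative) is an added assumption rather than something stated in the text.

```python
def subtract_noise(x_k, noise_hat, floor=0.0):
    """Element-wise spectral subtraction, clamped at `floor` (assumed)."""
    return [max(x - n, floor) for x, n in zip(x_k, noise_hat)]

x_k = [5.0, 2.0, 1.0]          # hypothetical input feature vector
noise_hat = [1.5, 2.5, 0.25]   # hypothetical estimated noise
s_hat = subtract_noise(x_k, noise_hat)
```

In the second bin the estimate exceeds the input, so the clamp holds that bin at zero instead of producing a negative intensity.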

Next, the signal processing device 100 performs an inverse Fourier transform to synthesize the sound signal (step S106). That is, the inverse transform unit 235 first converts the estimated frequency feature vector of the target sound ($\tilde{S}_k$) calculated by the noise subtraction unit 234 back into a frequency spectrum. The inverse transform unit 235 then converts this spectrum into a time waveform based on the phase information of the sound signal in which the noise is mixed, and synthesizes the sound signal of the target sound with reduced noise. Further, the inverse transform unit 235 stores the generated sound signal in the storage medium 200 via the communication unit 70 and ends the noise reduction processing for one frame.
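A minimal sketch of step S106 follows: the cleaned magnitudes are combined with the phase of the noisy spectrum and transformed back to a time waveform. The naive DFT pair and the function name `resynthesize` are assumptions for illustration, and windowing and the overlap-add of successive frames are omitted.

```python
import cmath
import math

def dft(samples):
    """Naive forward DFT (illustration only)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def resynthesize(clean_magnitude, noisy_spectrum):
    """Attach the noisy signal's phase to the cleaned magnitudes, then invert."""
    combined = [cmath.rect(m, cmath.phase(c))
                for m, c in zip(clean_magnitude, noisy_spectrum)]
    return idft(combined)

noisy_frame = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, -0.25]
noisy_spectrum = dft(noisy_frame)
# Sanity check: with unchanged magnitudes the original frame is recovered.
restored = resynthesize([abs(c) for c in noisy_spectrum], noisy_spectrum)
```

In actual use, `clean_magnitude` would be the output of the subtraction step rather than the unmodified magnitudes.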

The processing of steps S101 to S106 is repeated until the CPU 90 issues, by a control signal, an instruction to end the noise reduction processing.

As described above, in the noise estimation device 150, the inner product calculation unit 232 calculates the noise similarity (inner product value), which indicates the degree of similarity between the sound signal and the noise, based on the frequency spectrum of the input sound signal and the frequency spectrum of the noise. The noise estimation unit 233 then estimates the estimated noise included in the sound signal based on the noise similarity calculated by the inner product calculation unit 232.

As a result, the noise estimation device 150 can appropriately estimate the estimated noise included in the sound signal. In the signal processing device 100, the noise reduction processing unit 24 reduces the noise included in the sound signal based on the estimated noise estimated by the noise estimation device 150. Therefore, even when reducing noise whose magnitude is non-stationary, for example, the noise estimation device 150 and the signal processing device 100 can reduce the sound degradation or residual noise caused by over-subtraction or under-subtraction of the noise.

In addition, even when reducing intermittently occurring noise, for example, the noise estimation device 150 and the signal processing device 100 can prevent the noise from being over-subtracted and can reduce the occurrence of sound degradation. Therefore, the noise estimation device 150 and the signal processing device 100 can appropriately reduce the noise superimposed on the sound signal.
Furthermore, the noise estimation device 150 and the signal processing device 100 do not require a plurality of microphones for a one-channel sound signal in order to obtain the estimated noise of non-stationary noise. Non-stationary noise can therefore be appropriately reduced with a small number of microphones.

In the present embodiment, the inner product calculation unit 232 calculates the noise similarity based on the inner product of the input frequency feature vector, which is generated from the frequency spectrum of the sound signal, and the noise frequency feature vector, which is generated from the frequency spectrum of the noise and represents the characteristics of the noise. Here, the noise frequency feature vector is a vector whose elements are the intensities corresponding to the respective frequency components (frequency bins) of the noise frequency spectrum. Likewise, the input frequency feature vector is a vector whose elements are the intensities corresponding to the respective frequency components of the frequency spectrum of the sound signal.

As shown in FIG. 3, the inner product of the input frequency feature vector and the noise frequency feature vector indicates how close the direction of the input frequency feature vector is to the direction of the noise frequency feature vector. Therefore, the noise estimation device 150 can appropriately estimate the estimated noise based on this inner product value. Consequently, the noise estimation device 150 and the signal processing device 100 can appropriately reduce the noise superimposed on the sound signal.

In the present embodiment, the noise frequency feature vector is a unit vector.
This prevents the inner product value calculated by the inner product calculation unit 232 from becoming extremely large and thereby bounds the inner product value.
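The bound implied by using a unit vector can be made explicit with the Cauchy-Schwarz inequality. This short derivation is added for illustration and uses only the symbols $X_k$ and $n$ from the description; it is not one of the patent's numbered equations.

```latex
\[
  \bigl|\langle X_k, n \rangle\bigr|
  \;\le\; \lVert X_k \rVert \,\lVert n \rVert
  \;=\; \lVert X_k \rVert
  \qquad \text{since } \lVert n \rVert = 1 .
\]
```

The inner product therefore never exceeds the magnitude of the input frequency feature vector, with equality when $X_k$ is parallel to $n$.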

In the present embodiment, the unit vector (n) is used as the noise frequency feature vector when the inner product calculation unit 232 calculates the inner product, but a non-normalized noise frequency feature vector ($n_0$) may be used instead. In this case, as shown in equation (4), the noise estimation unit 233 may estimate the estimated noise by multiplying the noise frequency feature vector by a coefficient α determined according to the noise similarity (for example, the inner product value). Here, the coefficient α may be changed according to the operation unit (mechanism unit) that generates the noise, for example "0.5" for noise from the AF lens 12 and "0.1" for noise from the zoom lens 14.

In this case, the noise subtraction unit 234 calculates the estimated frequency feature vector of the target sound $\tilde{S}_k$ by the relational expression shown in equation (5).
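The variant with a non-normalized vector and a coefficient α can be sketched as follows. Equations (4) and (5) are not reproduced in this part of the description, so the forms below are only one possible reading: the estimated noise is taken as α·⟨X_k, n_0⟩·n_0 and the subtraction is clamped at zero. The example α values 0.5 and 0.1 come from the text; the dictionary keys and function names are hypothetical.

```python
# Example coefficients from the description; the keys are hypothetical labels.
ALPHA_BY_MECHANISM = {"af_lens_12": 0.5, "zoom_lens_14": 0.1}

def estimate_noise_scaled(x_k, n0, mechanism):
    """Assumed reading of equation (4): alpha * <x_k, n0> * n0."""
    alpha = ALPHA_BY_MECHANISM[mechanism]
    similarity = sum(x * n for x, n in zip(x_k, n0))
    return [alpha * similarity * n for n in n0]

def subtract_noise(x_k, noise_hat):
    """Assumed reading of equation (5): element-wise subtraction, floored at 0."""
    return [max(x - n, 0.0) for x, n in zip(x_k, noise_hat)]

x_k = [2.0, 1.0]   # hypothetical input feature vector
n0 = [1.0, 0.5]    # hypothetical non-normalized noise feature vector
af_noise = estimate_noise_scaled(x_k, n0, "af_lens_12")
zoom_noise = estimate_noise_scaled(x_k, n0, "zoom_lens_14")
s_af = subtract_noise(x_k, af_noise)
```

Under this reading, a larger α removes more of the component along `n0`, so the AF lens setting subtracts more than the zoom lens setting for the same input.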

This eliminates the need to normalize the noise frequency feature vector and simplifies the process of generating it. Moreover, because the amount of noise reduction can be corrected by the coefficient α as necessary, the noise estimation device 150 and the signal processing device 100 can appropriately reduce the noise superimposed on the sound signal.

Next, a method for generating the noise frequency feature vector in the present embodiment will be described.
The noise frequency feature vector is generated in advance, for example by a calibration device at the time of manufacture or shipping inspection of the imaging device 1, and is stored in the noise frequency feature vector storage unit 61.
In a quiet environment (an environment with low floor noise), the calibration device operates an operation unit (mechanism unit) such as the AF lens 12 and acquires the sound signal of the noise picked up by the microphone 21 of the imaging device 1. The calibration device performs a Fourier transform on this noise sound signal in units of frames to generate the frequency spectrum of the noise.

The calibration device then generates the noise frequency feature vector, using as its elements the intensities corresponding to the respective frequency components (frequency bins) of the noise frequency spectrum. As described above, the noise frequency feature vector may be a normalized vector (n) or a non-normalized vector ($n_0$). The calibration device stores the generated noise frequency feature vector in the noise frequency feature vector storage unit 61.
That is, the noise frequency feature vector is generated in advance based on the sound signal obtained when the operation unit (mechanism unit) that generates the noise is operated.

As a result, the noise estimation device 150 in the present embodiment can obtain an appropriate noise frequency feature vector and can therefore appropriately estimate the noise included in the sound signal. Consequently, the noise estimation device 150 and the signal processing device 100 can appropriately reduce the noise superimposed on the sound signal.

The calibration device may also calculate frequency feature vectors for a plurality of frames, obtained by dividing a predetermined section of the noise sound signal at predetermined time intervals, and use the minimum, average, median, or maximum of those values as each frequency component of the noise frequency feature vector.
For example, when noise reduction is to be prioritized, the maximum value across the frequency feature vectors of the plurality of frames is used. When degradation of the target sound is to be suppressed, the minimum value across the frames is used. When a well-balanced trade-off between noise reduction and degradation of the target sound is desired, the average or median value across the frames is used.
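The per-bin aggregation over several frames can be sketched as below. The frame values are hypothetical, the function name `aggregate_frames` is an assumption, and the "intermediate value" of the source text is read here as the median.

```python
import statistics

# Per-bin reducers for the four options named in the text.
AGGREGATORS = {
    "min": min,                    # favors preserving the target sound
    "mean": statistics.mean,       # balanced
    "median": statistics.median,   # balanced, less sensitive to outlier frames
    "max": max,                    # favors stronger noise reduction
}

def aggregate_frames(frames, mode):
    """Combine several per-frame feature vectors into one, bin by bin."""
    reducer = AGGREGATORS[mode]
    return [reducer(bin_values) for bin_values in zip(*frames)]

# Three hypothetical frames with two frequency bins each.
frames = [[1.0, 4.0], [2.0, 6.0], [6.0, 5.0]]
noise_max = aggregate_frames(frames, "max")
noise_min = aggregate_frames(frames, "min")
noise_mean = aggregate_frames(frames, "mean")
noise_median = aggregate_frames(frames, "median")
```

Choosing `"max"` yields the largest noise template and hence the most aggressive subtraction, while `"min"` yields the most conservative one.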

Next, another example of the method for generating the noise frequency feature vector in the present embodiment will be described.
FIG. 5 is an explanatory diagram showing an example of a sound signal on which noise is superimposed in the present embodiment.
In this figure, the horizontal axis indicates time t. FIG. 5A is a graph showing the operating state of the operation unit: the H (high) state indicates that the operation unit is operating, and the L (low) state indicates that the operation unit is stopped. In FIG. 5A, the waveform W4 indicates that the operation unit started operating at time T1.

Next, FIG. 5B shows an example of the frames into which the sound signal is divided. In FIG. 5B, the sound signal is divided into frames F1 to F7, where each frame spans a predetermined period and successive frames are shifted by half of that period.
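The half-overlapping framing of FIG. 5B can be sketched as follows. The function name `split_frames` and the sample values are hypothetical; only the half-period shift between successive frames is taken from the text.

```python
def split_frames(samples, frame_len):
    """Split a signal into frames of `frame_len`, each shifted by half a frame."""
    hop = frame_len // 2
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]

samples = list(range(16))          # stand-in for a sampled sound signal
frames = split_frames(samples, 4)  # yields seven frames, like F1 to F7
```

Each frame shares its first half with the second half of the preceding frame, so every sample apart from those at the two ends of the signal appears in two frames.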

FIG. 5C is a graph showing the waveform of the sound signal picked up by the microphone 21. In FIG. 5C, the waveform W5 is the sound signal in the period up to time T1 (the period before the noise occurs), and the waveform W6 is the sound signal in the period after time T1 (the period after the noise occurs). As shown by the waveform W6, in the period after time T1, noise is superimposed on the sound signal by the operation of the operation unit described above (driving of the zoom lens 14, the VR lens 13, or the AF lens 12, or operation of the operation unit 80).

In the present embodiment, the calibration device first acquires a sound signal such as that shown in FIG. 5 (waveform W5 and waveform W6) using the microphone 21 of the imaging device 1. Based on the sound signal acquired in this way, the calibration device then calculates the frequency spectrum of the noise (W9) as shown in FIG. 6.

FIG. 6 is an explanatory diagram illustrating an example of the method for generating the noise frequency spectrum in the present embodiment. In this figure, the frequency spectrum W7 is the frequency spectrum before the noise occurs (before time T1, for example, frame F1), and the frequency spectrum W8 is the frequency spectrum after the noise occurs (after time T1, for example, frame F4). The frequency spectrum W7 represents the background noise (floor noise) component included in the sound signal as stationary noise. The frequency spectrum W9 is the frequency spectrum of the noise, obtained by subtracting the frequency spectrum W7 from the frequency spectrum W8 (that is, subtracting frame F1 from frame F4).

That is, the calibration device calculates the frequency spectrum of the noise as shown in FIG. 6 and generates the noise frequency feature vector based on the calculated spectrum. In other words, the noise frequency feature vector is generated by subtracting, from the sound signal obtained when the mechanism unit is operated, the background noise component included in that sound signal as stationary noise. The calibration device stores the generated noise frequency feature vector in the noise frequency feature vector storage unit 61.
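The derivation of W9 from W8 and W7 can be sketched as below. The spectra are hypothetical values, the function name is an assumption, and clamping negative differences to zero is an added assumption not stated in the text.

```python
def mechanism_noise_spectrum(after_noise, before_noise):
    """Subtract the background (pre-noise) spectrum bin by bin, floored at 0."""
    return [max(a - b, 0.0) for a, b in zip(after_noise, before_noise)]

w7 = [0.5, 0.5, 0.5, 0.5]     # hypothetical background spectrum (e.g. frame F1)
w8 = [0.75, 2.5, 1.5, 0.25]   # hypothetical spectrum with mechanism noise (e.g. frame F4)
w9 = mechanism_noise_spectrum(w8, w7)
```

The resulting `w9` retains only what the mechanism adds on top of the floor noise and could then be used as the basis of the noise frequency feature vector.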

As a result, the background noise (floor noise) component included in the sound signal as stationary noise is reduced in the noise frequency feature vector, which prevents the background noise from inflating the inner product of the input frequency feature vector and the noise frequency feature vector. Therefore, the noise estimation device 150 in the present embodiment can appropriately estimate the noise included in the sound signal. Consequently, the noise estimation device 150 and the signal processing device 100 can appropriately reduce the noise superimposed on the sound signal.

Even when the noise frequency feature vector is generated by subtracting the background noise as described above, the technique of using the minimum, average, median, or maximum of the frequency feature vectors over a plurality of frames may also be applied.

According to the embodiment of the present invention, the imaging device 1 includes the signal processing device 100 described above.
As a result, the imaging device 1 can be expected to provide the same effects as the signal processing device 100 and can appropriately reduce the noise superimposed on the sound signal.

Further, according to the embodiment of the present invention, the program causes a computer serving as the noise estimation device 150 to execute: a calculation procedure (step S103) in which the inner product calculation unit 232 calculates the noise similarity, which indicates the degree of similarity between the sound signal and the noise, based on the frequency spectrum of the input sound signal and the frequency spectrum of the noise; and a noise estimation procedure (step S104) in which the noise estimation unit 233 estimates the estimated noise included in the sound signal based on the noise similarity calculated in the calculation procedure.
As a result, the program can be expected to provide the same effects as the noise estimation device 150 and can appropriately reduce the noise superimposed on the sound signal.

The present invention is not limited to the embodiments described above and can be modified without departing from the spirit of the present invention.
In each of the above embodiments, the noise estimation unit 233 estimates the estimated noise based on the inner product of the input frequency feature vector and the noise frequency feature vector, calculated as the noise similarity, but the invention is not limited to this. For example, the calculation unit that calculates the noise similarity may calculate it based on the correlation coefficient between the frequency spectrum of the sound signal and the frequency spectrum of the noise, and the noise estimation unit 233 may estimate the estimated noise based on this correlation coefficient. In this case, the calculation unit calculates the correlation coefficient over the pairs of values of the sound-signal frequency spectrum and the noise frequency spectrum at each frequency component (frequency bin).
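The correlation-based alternative can be sketched as follows. The text does not specify which correlation coefficient is meant; the Pearson coefficient over the per-bin pairs is assumed here, and the spectra are hypothetical values.

```python
import math

def correlation(a, b):
    """Pearson correlation coefficient over paired per-bin intensities."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

noise = [1.0, 3.0, 2.0, 0.5]       # hypothetical noise spectrum
similar = [2.0, 6.0, 4.0, 1.0]     # same shape as the noise, twice the level
different = [3.0, 0.5, 1.0, 2.5]   # energy concentrated in other bins

r_similar = correlation(noise, similar)
r_different = correlation(noise, different)
```

Unlike the raw inner product, this measure is insensitive to the overall level of the input, so a louder copy of the noise spectrum still scores close to 1.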

As a result, as in the embodiment in which the estimated noise is estimated based on the inner product value, the noise estimation device 150 and the signal processing device 100 can appropriately reduce the noise superimposed on the sound signal.
The noise similarity may also be calculated based on, for example, the Mahalanobis distance, or in some other manner.

In each of the above embodiments, the absolute value is used as the intensity of the noise frequency spectrum and of the sound-signal frequency spectrum, but complex values including the imaginary part, or the power, may be used instead.

In each of the above embodiments, the signal processing device 100 may include a detection unit that detects the timing at which noise is generated by the operation unit (mechanism unit), and may determine whether to perform the noise reduction processing based on the timing detected by the detection unit. That is, the signal processing device 100 may execute the noise reduction processing when it determines, based on the timing detected by the detection unit, that noise is being generated.

The noise estimation device 150 may also select and use one or more of a plurality of noise frequency feature vectors stored in the noise frequency feature vector storage unit 61, based on the timing detected by the detection unit. For example, the noise estimation device 150 may use the detection unit to detect that any one of the zoom lens 14, the VR lens 13, and the AF lens 12 is operating and, according to the detection result, select the noise frequency feature vector corresponding to that one of the zoom lens 14, the VR lens 13, and the AF lens 12.
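The detection-based gating and selection can be sketched as below. The stored vectors, the mechanism labels, and both function names are hypothetical; the text states only that stored vectors may be selected according to what the detection unit reports and that noise reduction may be run only when noise is judged to be occurring.

```python
# Hypothetical stored vectors, one per mechanism (values are illustrative only).
STORED_NOISE_VECTORS = {
    "zoom_lens_14": [0.2, 0.9, 0.4],
    "vr_lens_13": [0.7, 0.1, 0.7],
    "af_lens_12": [0.5, 0.5, 0.7],
}

def select_noise_vectors(active_mechanisms):
    """Return the stored vectors for the mechanisms reported as operating."""
    return [STORED_NOISE_VECTORS[name] for name in active_mechanisms
            if name in STORED_NOISE_VECTORS]

def should_reduce_noise(active_mechanisms):
    """Run noise reduction only when a known mechanism is operating."""
    return len(select_noise_vectors(active_mechanisms)) > 0

selected = select_noise_vectors(["af_lens_12"])
idle = should_reduce_noise([])
```

When no mechanism is reported as operating, `should_reduce_noise` returns `False`, so frames without mechanism noise are left untouched.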

In the above embodiment, the noise subtraction unit 234 estimates the estimated frequency feature vector of the target sound $\tilde{S}_k$ based on the estimated noise estimated by the noise estimation unit 233, but a weighting coefficient may additionally be applied in the calculation. For example, the SNR (signal-to-noise ratio), which is the ratio of the frequency spectrum of the sound signal before the noise occurs to the estimated noise, or a degree of tonality may be calculated, and the estimated noise amount may be corrected by multiplying it by a weighting coefficient based on the calculated value.

In each of the above embodiments, the calibration device generates the noise frequency feature vector and stores it in the noise frequency feature vector storage unit 61. However, the imaging device 1, the signal processing device 100, or the noise estimation device 150 may instead include a generation unit that generates the noise frequency feature vector. Alternatively, the calibration device may store a noise frequency feature vector that has already been generated in advance in the noise frequency feature vector storage unit 61.
In each of the above embodiments, the noise frequency feature vector storage unit 61 stores noise frequency feature vectors. However, the noise frequency feature vector storage unit 61 may instead store the frequency spectrum of the noise, and the noise estimation device 150 may generate the noise frequency feature vector. For example, the inner product calculation unit 232 may generate the noise frequency feature vector based on the frequency spectrum of the noise.
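A minimal sketch of this variant follows: a stored noise spectrum is normalized into a unit noise frequency feature vector at estimation time, its inner product with the input frequency feature vector serves as the noise similarity, and the estimated noise is the unit vector scaled by a coefficient. Using the inner product itself (clamped at zero) as that coefficient is an assumption of this example, as are the function names.

```python
import math
from typing import List


def to_unit_vector(spectrum: List[float]) -> List[float]:
    """Normalize a magnitude spectrum to unit length (all zeros stay zeros)."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    if norm == 0.0:
        return [0.0] * len(spectrum)
    return [v / norm for v in spectrum]


def inner_product(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def estimate_noise(input_vec: List[float], noise_spectrum: List[float]) -> List[float]:
    """Estimate the noise contained in `input_vec` from a stored noise spectrum."""
    noise_unit = to_unit_vector(noise_spectrum)        # generated on the fly
    similarity = inner_product(input_vec, noise_unit)  # noise similarity
    coeff = max(similarity, 0.0)                       # assumed coefficient rule
    return [coeff * v for v in noise_unit]
```

Storing the raw spectrum and normalizing on demand trades a small amount of computation per frame for the freedom to change how the feature vector is derived without recalibrating.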

In each of the above embodiments, the signal processing device 100 is applied to the imaging device 1, but the invention is not limited to this. For example, it may be applied to a recording device such as a recorder, or to a device such as a telephone in which sound produced by key operations is superimposed as noise on the sound signal that is the target sound.
In each of the above embodiments, FFT is used as the Fourier transform, but a DFT (Discrete Fourier Transform) or another method may be used instead.
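For reference, a direct DFT produces the same spectrum as an FFT at a higher computational cost. The sketch below computes a magnitude spectrum from one frame of samples using only the Python standard library; the function name `dft_magnitudes` and the absence of windowing are simplifications made for this example.

```python
import cmath
from typing import List


def dft_magnitudes(frame: List[float]) -> List[float]:
    """Magnitude of each DFT bin of `frame`, computed directly in O(n^2)."""
    n = len(frame)
    magnitudes = []
    for k in range(n):
        # X[k] = sum over t of x[t] * exp(-2*pi*i*k*t / n)
        acc = sum(
            frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(acc))
    return magnitudes
```

The resulting list of magnitudes could serve as the elements of a frequency feature vector in the same way as the output of an FFT.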

The noise estimation device 150 and the signal processing device 100 described above each contain a computer system. The processing steps of the noise estimation device 150 and the signal processing device 100 are stored on a computer-readable recording medium in the form of a program, and the above processing is performed when a computer reads and executes this program. Here, the computer-readable recording medium refers to a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, a semiconductor memory, or the like. Alternatively, the computer program may be distributed to a computer via a communication line, and the computer that receives the distribution may execute the program.

DESCRIPTION OF SYMBOLS 1 ... imaging device, 24 ... noise reduction processing unit, 61 ... noise frequency feature vector storage unit, 100 ... signal processing device, 150 ... noise estimation device, 231 ... frequency feature vector generation unit, 232 ... inner product calculation unit, 233 ... noise estimation unit

Claims

A noise estimation device comprising:
a calculation unit that calculates a noise similarity indicating a degree of similarity between an input sound signal and noise, based on the frequency spectrum of the sound signal and the frequency spectrum of the noise; and
a noise estimation unit that estimates estimated noise included in the sound signal based on the noise similarity calculated by the calculation unit.

The noise estimation device according to claim 1, wherein
the calculation unit calculates the noise similarity based on an inner product value of an input frequency feature vector generated based on the frequency spectrum of the sound signal and a noise frequency feature vector that indicates the feature of the noise and is generated based on the frequency spectrum of the noise,
the noise frequency feature vector is a vector whose elements are the intensities corresponding to the respective frequency components in the frequency spectrum of the noise, and
the input frequency feature vector is a vector whose elements are the intensities corresponding to the respective frequency components in the frequency spectrum of the sound signal.

The noise estimation apparatus according to claim 2, wherein the noise frequency feature vector is a unit vector.

The noise estimation device according to any one of claims 2 to 3, wherein
the noise estimation unit estimates the estimated noise by multiplying the noise frequency feature vector by a coefficient determined in accordance with the noise similarity.

The noise estimation device according to any one of the above, comprising:
a storage unit in which the noise frequency feature vector is stored in advance; and
a frequency feature vector generation unit that converts the sound signal into a frequency spectrum and generates the input frequency feature vector based on the converted frequency spectrum of the sound signal.

6. The noise frequency feature vector is generated in advance based on a sound signal obtained when a mechanism unit that generates the noise is operated. The noise estimation apparatus described in 1.

The noise frequency feature vector is generated by subtracting, from the sound signal, a background noise component included as stationary noise in the sound signal when the mechanism unit is operated. The noise estimation apparatus described in 1.

The noise estimation device according to claim 1, wherein
the calculation unit calculates the noise similarity based on a correlation coefficient between the frequency spectrum of the sound signal and the frequency spectrum of the noise.

A signal processing apparatus comprising:
the noise estimation device according to any one of claims 1 to 8; and
a noise reduction processing unit that reduces noise included in the sound signal based on the estimated noise estimated by the noise estimation device.

An image pickup apparatus comprising the signal processing apparatus according to claim 9.

A program for causing a computer serving as a noise estimation device to execute:
a calculation procedure for calculating a noise similarity indicating a degree of similarity between an input sound signal and noise, based on the frequency spectrum of the sound signal and the frequency spectrum of the noise; and
a noise estimation procedure for estimating estimated noise included in the sound signal based on the noise similarity calculated in the calculation procedure.