JP2012173371A

JP2012173371A - Imaging apparatus and noise reduction method for imaging apparatus

Info

Publication number: JP2012173371A
Application number: JP2011032786A
Authority: JP
Inventors: Mitsuhiro Okazaki; 光宏岡崎; Kosuke Okano; 康介岡野
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2011-02-18
Filing date: 2011-02-18
Publication date: 2012-09-10
Anticipated expiration: 2031-02-18
Also published as: JP5750932B2

Abstract

PROBLEM TO BE SOLVED: To provide an imaging apparatus and a noise reduction method for the imaging apparatus capable of reducing noise appropriately without degrading a target sound such as voice. SOLUTION: An imaging apparatus according to the present invention comprises: a sound collection device 131; a voice section detecting part 134 for detecting a voice section from sound information collected by the sound collection device 131; and a noise reduction processing part 133 for performing a different noise reduction process based on the detection result of the voice section detecting part 134.

Description

The present invention relates to an imaging apparatus that applies noise reduction processing to sound information input during shooting, and to a noise reduction method for the imaging apparatus.

When a moving image is shot with an imaging apparatus capable of moving-image shooting, noise such as the operation sound generated by the drive unit of the autofocus lens (hereinafter, AF noise) may be picked up by a sound collection device such as a microphone, mix into the target sound such as the voice of the subject, and impair the quality of the target sound.
As a method of reducing such AF noise, a method has been proposed in which the power value of the audio signal input before the AF drive unit operates is acquired and the flooring coefficient is controlled (varied) based on that power value to remove the noise (see, for example, Patent Document 1).

Patent Document 1: JP 2008-252389 A

However, although the noise reduction processing of Patent Document 1 can reduce AF noise, it is highly likely to degrade the target sound such as voice.

An object of the present invention is to provide an imaging apparatus and a noise reduction method for the imaging apparatus that can appropriately reduce noise without degrading a target sound such as voice.

The present invention solves the above problem by the following means. For ease of understanding, the description uses the reference numerals of the corresponding embodiment, but the invention is not limited to that embodiment.

The invention according to claim 1 is an imaging apparatus (100) comprising: a sound collection device (131); a voice section detection unit (134) that detects a voice section from sound information collected by the sound collection device (131); and a noise reduction processing unit (133) that performs different noise reduction processing based on the detection result of the voice section detection unit (134).
The invention according to claim 2 is the imaging apparatus (100) according to claim 1, further comprising a noise timing detection unit (135) that detects the generation timing of operation noise from operation information of a drive unit in the imaging apparatus (100), wherein the noise reduction processing unit (133) performs different noise reduction processing based on the detection result of the noise timing detection unit (135).
The invention according to claim 3 is the imaging apparatus (100) according to claim 1 or 2, wherein, when the voice section detection unit (134) detects a voice section, the noise reduction processing unit (133) performs a first noise reduction process that is weaker than the one performed when the voice section detection unit (134) detects a non-voice section.
The invention according to claim 4 is the imaging apparatus (100) according to any one of claims 1 to 3, wherein the noise reduction processing unit (133) estimates noise from the sound information obtained when the voice section detection unit (134) determines a non-voice section, and performs a second noise reduction process that subtracts the estimated noise from the sound information before estimated-noise subtraction.
The invention according to claim 5 is the imaging apparatus (100) according to any one of claims 1 to 4, wherein the noise reduction processing unit (133) obtains a flooring spectrum from the sound information obtained when the voice section detection unit (134) determines a non-voice section, and applies flooring to the sound information before flooring using that flooring spectrum.
The invention according to claim 6 is the imaging apparatus (100) according to any one of claims 1 to 5, wherein the voice section detection unit (134) detects a voice section by cutting out a part of the voice waveform, obtaining its autocorrelation function, and using the peak value of the obtained autocorrelation function.
The invention according to claim 7 is a noise reduction method for an imaging apparatus (100), comprising detecting a voice section from collected sound information and performing different noise reduction processing based on the detection result of the voice section.
The invention according to claim 8 is the noise reduction method according to claim 7, comprising detecting the generation timing of operation noise from operation information of a drive unit in the imaging apparatus (100) and performing different noise reduction processing based on the detection result of the generation timing of the operation noise.
The invention according to claim 9 is the noise reduction method according to claim 7 or 8, wherein, when a voice section is detected, a first noise reduction process is performed that is weaker than the one performed when a non-voice section is detected.
The invention according to claim 10 is the noise reduction method according to any one of claims 7 to 9, wherein noise is estimated from the sound information obtained when a non-voice section is determined, and a second noise reduction process is performed that subtracts the estimated noise from the sound information before estimated-noise subtraction.
The invention according to claim 11 is the noise reduction method according to any one of claims 7 to 10, wherein a flooring spectrum is obtained from the sound information obtained when a non-voice section is determined, and flooring is applied to the sound information before flooring using that flooring spectrum.
The invention according to claim 12 is the noise reduction method according to any one of claims 7 to 11, wherein a voice section is detected by cutting out a part of the voice waveform, obtaining its autocorrelation function, and using the peak value of the obtained autocorrelation function.
The configurations described with reference numerals may be modified as appropriate, and at least part of them may be replaced with other components.

According to the present invention, it is possible to provide an imaging apparatus and a noise reduction method for the imaging apparatus that can appropriately reduce noise without degrading a target sound such as voice.

FIG. 1 is a block diagram showing the configuration of the imaging apparatus according to an embodiment of the present invention.
FIG. 2 is a voice waveform diagram.
FIG. 3 is a diagram explaining the autocorrelation function of a voice waveform.
FIG. 4 shows an example of detecting a voice section using the autocorrelation function: FIG. 4(A) is the output waveform from the microphone, and FIG. 4(B) is a waveform in which a threshold is set on the peak of the autocorrelation function and the portions at or above the threshold are shown as High.
FIG. 5 is a diagram explaining in detail how the noise timing detection unit detects the generation timing of operation noise.
FIG. 6 is a flowchart showing the flow of the noise reduction processing operation.
FIG. 7 is a schematic diagram explaining the form of the first processing target sound subjected to noise reduction processing.
FIG. 8 is a diagram showing the spectrum of section A.
FIG. 9 is a diagram showing the spectrum of section B.
FIG. 10 is a diagram showing the spectrum of section C.
FIG. 11 is a diagram showing the estimated noise spectrum.
FIG. 12 is a diagram showing the spectrum obtained by subtracting the noise from the spectrum of section C.
FIG. 13 is a diagram showing the spectrum after flooring with flooring spectrum A.
FIG. 14 is a diagram showing flooring spectrum A.
FIG. 15 is a diagram showing flooring spectrum B.
FIG. 16 is a diagram showing the spectrum after flooring with flooring spectrum B.
FIG. 17 is a schematic diagram explaining the form of the second processing target sound subjected to noise reduction processing.
FIG. 18 is a diagram showing the spectrum of the background sound and noise in section E.
FIG. 19 is a diagram showing the estimated noise obtained from the spectrum of section E.
FIG. 20 is a diagram showing the spectrum of section F.
FIG. 21 is a diagram showing the spectrum after flooring with the estimated noise of section E.
FIG. 22 is a diagram showing the estimated noise obtained from the spectrum of section F.
FIG. 23 is a diagram showing the spectrum after flooring with the estimated noise of section F.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the imaging apparatus according to an embodiment of the present invention.

As shown in FIG. 1, the imaging apparatus 100 includes: a lens barrel 110; an image processing unit 120 that captures the subject image that has passed through the lens barrel 110, A/D-converts it, and processes it to generate image data; a sound information processing unit 130 that A/D-converts the collected sound information and applies noise reduction processing to it; a recording unit 140 that records the image data obtained by the image processing unit 120 and the audio signal obtained by the sound information processing unit 130; and a CPU 150.

The lens barrel 110 includes a focus adjustment lens (hereinafter, AF (Auto Focus) lens), a camera shake correction lens (hereinafter, VR (Vibration Reduction) lens), a zoom lens, a zoom lens drive unit, a zoom encoder, a VR unit 111 having an image blur correction unit and the like, an AF encoder 112, and an AF drive motor 113.

The AF encoder 112 detects the position of the AF lens of the optical system and outputs it to the CPU 150. The AF drive motor 113 receives from the CPU 150 a drive control signal for controlling the position of the AF lens, and the position of the AF lens is controlled according to that signal.

The CPU 150 controls the lens barrel 110 according to the set imaging conditions (for example, aperture value and exposure value). The CPU 150 generates drive control signals for driving the zoom lens drive unit and the AF drive motor 113 and outputs them to the zoom lens drive unit and the AF drive motor 113.

The sound information processing unit 130 includes a microphone 131 serving as the sound collection device, a sound signal processing unit 132 that processes the collected and A/D-converted sound information, and a noise reduction processing unit 133.

The sound signal processing unit 132 includes a voice section detection unit 134 that detects a voice section from the sound information collected by the microphone 131, and a noise timing detection unit 135 that detects the timing at which operation noise occurs from operation information of the AF drive motor 113.

From the sound information collected by the microphone 131, the voice section detection unit 134 distinguishes sections that contain a voice signal (voice sections) from all other sections (non-voice sections) based on the peak value of the autocorrelation function. Voice section detection by the voice section detection unit 134 is outlined below.

FIG. 2 shows a voice waveform. Cutting out an arbitrary part of this voice waveform and computing its autocorrelation function gives the waveform shown in FIG. 3. A voice waveform is harmonic: its peaks concentrate at the fundamental frequency, which corresponds to the vibration frequency of the vocal cords, and in the frequency bands corresponding to its overtones. Using this harmonicity, voice and non-voice can be distinguished by the magnitude of the peak value of the autocorrelation function.

FIG. 4 shows an example of detecting a voice section using the autocorrelation function. FIG. 4(A) is the output waveform from the microphone 131; AF noise occurs in its first half, and voice and AF noise occur in its second half. Computing the autocorrelation function of an output waveform such as that in FIG. 4(A), setting a threshold on the peak of the autocorrelation function, and marking the portions at or above the threshold as High gives the waveform shown in FIG. 4(B). This makes it possible to detect that the second half of the output waveform contains a voice section that coincides with the position of the voice.
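The autocorrelation-peak test described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the normalization by frame energy, the lag search range, and the threshold value are all assumptions, and the names `autocorr_peak` and `is_voice` are hypothetical.

```python
import math

def autocorr_peak(frame, min_lag, max_lag):
    """Largest autocorrelation value over lags in [min_lag, max_lag],
    normalized by the frame energy (the lag-0 value)."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, r / energy)
    return best

def is_voice(frame, threshold=0.5, min_lag=20, max_lag=200):
    """Mark the frame High (voice) when the peak reaches the threshold.
    The threshold and lag range are assumed values for illustration."""
    return autocorr_peak(frame, min_lag, max_lag) >= threshold

# A periodic (harmonic) frame: sine wave with a period of 50 samples.
harmonic = [math.sin(2.0 * math.pi * i / 50.0) for i in range(400)]
# A single click: no periodic structure at the searched lags.
click = [0.0] * 400
click[10] = 1.0

print(is_voice(harmonic), is_voice(click))
```

A harmonic frame repeats itself after one pitch period, so its autocorrelation has a strong peak at that lag; a frame without periodic structure does not.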

The noise timing detection unit 135 detects the timing at which operation noise occurs from operation information of the AF drive motor 113. It detects (estimates) this timing using the AF drive command, which instructs the CPU 150 to output a drive control signal to the AF drive motor 113, and the output of the AF encoder 112.

The detection of operation noise generation timing by the noise timing detection unit 135 is described in detail below.
As shown in FIG. 5, when the AF drive motor 113 is operated by the output of the AF drive command, operation noise is generated continuously from the operation start time t1 of the AF drive motor 113, which is the output time of the AF drive command, until the operation end time t3. The microphone 131 collects sound information in which the operation noise is superimposed on the recording target sound, such as the voice of the subject, and outputs the collected sound information.

At this time, because of backlash and similar effects in the gear train of the AF drive system, the output of the AF encoder 112 may begin at a time t2 that is later than the operation start time t1 of the AF drive motor 113. The noise timing detection unit 135 therefore detects the period from the output time t1 of the AF drive command to the time t3 at which the output of the AF encoder 112 stops as operation noise generation timing, and detects all other periods as non-noise timing.
The signal actually output from the microphone 131 during AF operation is the target sound with operation noise superimposed on it; to simplify the description, FIG. 5 shows only the operation noise.
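The timing rule above (start at the AF drive command, end when the encoder output stops) can be sketched as follows. The function names, the representation of encoder output as a list of timestamps, and the handling of an empty encoder list are assumptions made for illustration.

```python
def noise_interval(command_time, encoder_times):
    """Return (t1, t3): the interval treated as operation-noise timing.

    t1 is the AF drive command output time; t3 is the time of the last
    encoder output. The encoder may start late (t2 > t1) because of
    backlash, which is why the command time is used for the start.
    """
    if not encoder_times:
        # Assumption: with no encoder output, the interval collapses to t1.
        return (command_time, command_time)
    return (command_time, max(encoder_times))

def is_noise_timing(t, interval):
    """True when time t falls inside the operation-noise interval."""
    start, end = interval
    return start <= t <= end

# Hypothetical times in seconds: command at 1.0, encoder pulses from 1.2 to 2.0.
interval = noise_interval(1.0, [1.2, 1.5, 2.0])
print(interval, is_noise_timing(1.1, interval), is_noise_timing(2.5, interval))
```

Time 1.1 counts as noise timing even though the encoder has not yet produced output, which is the case the backlash remark is meant to cover.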

Of the operation noise shown in FIG. 5, the noise reduction processing unit 133 reduces the impact noise generated at the start and at the end of the AF operation. The noise reduction processing unit 133 acquires a first frequency spectrum from window X, before the operation noise occurs, and a second frequency spectrum from window Y, after the operation noise occurs, both shown in FIG. 5. It compares the two spectra and, if the second frequency spectrum is larger than the first, replaces the second frequency spectrum with the first, thereby performing the first noise reduction process.

Here, when the voice section detection unit 134 detects a voice section, the spectrum up to a predetermined frequency (for example, 4000 Hz) is preserved without replacement; when it detects a non-voice section, the spectrum up to a lower predetermined frequency (for example, 500 Hz) is preserved without replacement. In other words, by setting the upper limit of the preserved frequencies to, for example, 4000 Hz for a voice section and 500 Hz for a non-voice section, the first impact noise reduction process applied in a voice section is weaker than the one applied in a non-voice section.
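The first noise reduction process can be sketched as follows, using the example cutoffs of 4000 Hz and 500 Hz from the text. Applying the comparison bin by bin, treating the cutoff as inclusive, and the function and parameter names are assumptions for illustration only.

```python
def reduce_impact_noise(spec_before, spec_after, bin_hz, voice_detected,
                        keep_hz_voice=4000.0, keep_hz_nonvoice=500.0):
    """Replace bins of spec_after (window Y) with spec_before (window X)
    where spec_after is larger, but leave bins up to the preserved
    frequency untouched. Both inputs are amplitude spectra."""
    keep_hz = keep_hz_voice if voice_detected else keep_hz_nonvoice
    result = []
    for k, (x, y) in enumerate(zip(spec_before, spec_after)):
        if k * bin_hz <= keep_hz:
            result.append(y)          # preserved band: no replacement
        else:
            result.append(min(x, y))  # replace only where y exceeds x
    return result

# Hypothetical spectra with 1000 Hz per bin (bins at 0, 1000, ..., 5000 Hz).
before = [1.0] * 6
after = [5.0] * 6
weak = reduce_impact_noise(before, after, 1000.0, voice_detected=True)
strong = reduce_impact_noise(before, after, 1000.0, voice_detected=False)
print(weak)
print(strong)
```

With a voice section, only the bin above 4000 Hz is replaced; with a non-voice section, every bin above 500 Hz is replaced, so more of the impact noise is removed.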

The noise reduction processing unit 133 also estimates noise from the frequency spectrum obtained when the voice section detection unit 134 detects a non-voice section and strong impact noise reduction has been performed, and updates the estimated noise. Using that estimated noise, it performs spectral subtraction (the second noise reduction process), which generates a frequency spectrum by subtracting the estimated noise from the frequency spectrum on which the first impact noise reduction process has been performed.
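The second noise reduction process can be sketched as a per-bin subtraction of amplitudes. How the noise is estimated is not detailed here, so the estimate is simply passed in; clamping negative results to zero is an assumption, and `spectral_subtract` is a hypothetical name.

```python
def spectral_subtract(amplitude, estimated_noise):
    """Subtract the estimated noise amplitude from each frequency bin.
    Results below zero are clamped to zero (an assumption), which is how
    a bin can be greatly reduced or vanish after subtraction."""
    return [max(a - n, 0.0) for a, n in zip(amplitude, estimated_noise)]

# Hypothetical amplitude spectrum and noise estimate.
spectrum = [3.0, 1.0, 2.0, 4.5]
noise = [1.0, 2.0, 2.0, 0.5]
print(spectral_subtract(spectrum, noise))
```

Bins where the estimate is as large as the signal come out as zero, which is the situation the flooring function described later is meant to correct.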

In addition to the configuration described above, the sound information processing unit 130 has a processing unit that divides the sound information output from the microphone 131 into predetermined sections, weights each section with a window function, and converts the sound data of each section to the frequency domain by a Fourier transform (FFT: Fast Fourier Transform). The FFT separates the data into frequency-domain amplitude information and phase information, and noise reduction (spectral subtraction) is performed using the amplitude information. The sound information processing unit 130 also has a processing unit that applies an inverse Fourier transform (IFFT: Inverse Fast Fourier Transform) to the resulting spectrum to convert the noise-reduced spectrum (sound information) back to the time domain. These processing units are not shown in the drawings.
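The analysis and synthesis path can be sketched as follows. To stay self-contained, this sketch uses a plain discrete Fourier transform from the standard library in place of an FFT (the result is the same, only slower), and the Hann window is an assumed choice since the text does not name the window function. All function names are hypothetical.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n (assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]

def analyze(frame):
    """Window one frame and split its spectrum into amplitude and phase."""
    windowed = [a * w for a, w in zip(frame, hann(len(frame)))]
    spec = dft(windowed)
    return [abs(c) for c in spec], [cmath.phase(c) for c in spec]

def synthesize(amplitude, phase):
    """Recombine (possibly modified) amplitude with the original phase."""
    return idft([cmath.rect(a, p) for a, p in zip(amplitude, phase)])

frame = [math.sin(2.0 * math.pi * 3 * i / 32) for i in range(32)]
amplitude, phase = analyze(frame)
restored = synthesize(amplitude, phase)
windowed = [a * w for a, w in zip(frame, hann(32))]
print(max(abs(a - b) for a, b in zip(restored, windowed)))
```

Noise reduction would modify only `amplitude` between `analyze` and `synthesize`; the phase is carried through unchanged.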

Furthermore, the noise reduction processing unit 133 has a flooring function that corrects the spectrum when the second noise reduction process (spectral subtraction) has reduced it significantly or eliminated it. The flooring compares the spectrum after subtraction in the second noise reduction process with a flooring spectrum generated from sound information obtained when the noise timing detection unit 135 detects non-noise timing and the voice section detection unit 134 detects a non-voice section. If the spectrum after subtraction is below the flooring spectrum (its spectral intensity is lower), sound information (a spectrum) that adopts the flooring spectrum is generated and then IFFT-processed.
If the spectrum after subtraction is above the flooring spectrum (its spectral intensity is higher), flooring may or may not be performed.
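The flooring comparison can be sketched as follows. Comparing bin by bin and leaving bins that already exceed the floor unchanged are assumptions (the text allows either behavior for the latter), and `apply_flooring` is a hypothetical name.

```python
def apply_flooring(subtracted, flooring_spectrum):
    """For each bin, adopt the flooring spectrum where the subtracted
    spectrum has fallen below it; otherwise keep the subtracted value."""
    return [f if s < f else s for s, f in zip(subtracted, flooring_spectrum)]

# Hypothetical spectrum after subtraction and a flat background floor.
after_subtraction = [0.0, 2.5, 0.3, 0.0]
floor = [1.0, 1.0, 1.0, 1.0]
print(apply_flooring(after_subtraction, floor))
```

Bins that spectral subtraction wiped out are raised to the background level, so the output does not contain unnatural gaps.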

The flooring spectrum used by the flooring function is updated using sound information obtained when the noise timing detection unit 135 detects non-noise timing and the voice section detection unit 134 detects a non-voice section. As a result, the flooring spectrum contains neither the operation noise spectrum nor the voice spectrum, only background sound, so flooring does not add a voice spectrum and does not produce voice that was not originally present in the sound information after noise reduction.

Next, the noise reduction processing operation (noise reduction method) of the imaging apparatus 100 of this embodiment is described with reference to the drawings. FIG. 6 is a flowchart showing the flow of the noise reduction processing operation. FIG. 7 is a schematic diagram explaining the form of the first processing target sound subjected to noise reduction processing.

(First processing target sound)
As shown in FIG. 7, the first processing target sound consists of section A, which contains only background sound; section B, which contains background sound and voice (the target sound); and section C, which contains background sound and AF noise. The following describes the operation of reducing AF noise in the sound information collected and output by the microphone 131 in section C of the first processing target sound shown in FIG. 7, together with the flooring update.

(Step ST1)
First, the noise timing detection unit 135 starts detecting noise timing based on the sound information output from the microphone 131.
The sound information (spectrum) collected by the microphone 131 at this time is shown in FIG. 8 for section A and in FIG. 9 for section B.

(Step ST2)
Next, the voice section detection unit 134 starts detecting voice sections based on the sound information output from the microphone 131.

(Step ST3)
The sound information output from the microphone 131 is FFT-processed and separated into frequency-domain amplitude information and phase information.

(Step ST4)
Next, the noise timing detection unit 135 detects (determines) whether the current timing is operation noise generation timing or non-noise timing (that is, whether or not it is an AF section).

(Step ST4, YES)
In step ST4, section C is determined to be operation noise generation timing (AF section, YES), and the process proceeds to step ST5.
(Step ST4, NO)
Sections A and B are determined to be non-noise timing, and the process proceeds to step ST11.

(Step ST5)
In step ST5, the voice section detection unit 134 detects (determines) whether the section is a voice section or a non-voice section. Since section C is a non-voice section (NO), the process proceeds to step ST7.
(Step ST7)
Here, when the start and end of the AF operation are included, strong impact noise reduction is performed, in which only the spectrum up to a predetermined upper-limit frequency (for example, 500 Hz) is preserved without replacement, and the spectrum of FIG. 10 is obtained for section C.
When the start and end of the AF operation are not included, the section is judged to contain no impact noise, and impact noise reduction is not performed.

(Step ST8)
Next, the noise in the spectrum obtained by the noise reduction of step ST7 (FIG. 10) is estimated, and an estimated noise spectrum such as that shown in FIG. 11 is passed to step ST9.

(Step ST9)
Next, spectral subtraction (the second noise reduction process) is performed, in which the estimated noise spectrum obtained in step ST8 (FIG. 11) is subtracted from the spectrum obtained by the impact noise reduction of step ST7 (FIG. 10), giving a spectrum such as that shown in FIG. 12.

(Step ST10)
The second noise reduction processing (spectrum subtraction processing) may significantly reduce or eliminate parts of the spectrum of FIG. 12. To cope with this, flooring is performed to correct the spectrum of FIG. 12.
In this flooring, the magnitude of the spectrum of FIG. 12 is compared with that of a reference flooring spectrum. As a result of the comparison, the spectrum with the larger intensity is adopted, generating the spectrum shown in FIG. 13. The flooring spectrum used here is, as described later, the flooring spectrum obtained from section A.
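The comparison in step ST10 might look like the following minimal sketch. It assumes that the comparison is made independently at each frequency bin, which the text does not state explicitly; `apply_flooring` and the sample values are hypothetical.

```python
def apply_flooring(spectrum, flooring_spectrum):
    """Keep, for each frequency bin, the larger of the two amplitudes.

    The per-bin comparison is an assumption of this sketch.
    """
    if len(spectrum) != len(flooring_spectrum):
        raise ValueError("spectra must have the same number of bins")
    return [max(s, f) for s, f in zip(spectrum, flooring_spectrum)]


# Hypothetical amplitude values for four frequency bins.
subtracted = [0.5, 0.0, 0.25, 0.0]          # stands in for FIG. 12
flooring = [0.125, 0.125, 0.125, 0.125]     # stands in for FIG. 14
floored = apply_flooring(subtracted, flooring)
```

Bins that spectrum subtraction drove to zero are raised back to the background level, while bins that still carry signal are left untouched.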

(Step ST11)
Returning to step ST11, the voice section detection unit 134 detects (determines) whether the section is a voice section or a non-voice section (a section containing only background sound). As a result, section B is determined to be a voice section (YES), and the process proceeds to step ST13 without noise reduction processing, spectrum subtraction, or flooring. Section A is determined to be a non-voice section (NO), and the process proceeds to step ST12.

(Step ST12)
In step ST12, the amplitude at each frequency of the spectrum of section A (FIG. 8), in which only background sound occurs, is halved to obtain a flooring spectrum as shown in FIG. 14. This flooring spectrum (FIG. 14) is used for the flooring in step ST10 as described above, and the stored flooring spectrum is updated to it.
If flooring were instead performed using the flooring spectrum of FIG. 15, obtained by halving the amplitude at each frequency of the spectrum of section B shown in FIG. 9, the result would be the spectrum shown in FIG. 16. If the spectrum of FIG. 16 were taken as the spectrum of section C, it would also contain the voice spectrum components of section B (FIG. 9), in particular f2 and f4, and an accurate target sound could not be obtained.
According to this embodiment, however, the flooring spectrum (FIG. 14) used for flooring contains neither voice nor operation noise spectra. The flooring in step ST10 therefore does not add AF noise or voice spectra, which prevents operation noise or voice that was not originally present from appearing in the sound information after noise reduction processing.
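The update rule of step ST12 can be sketched as a small state holder that refreshes the flooring spectrum only from background-only frames. The class name `FlooringTracker`, its method signature, and the sample amplitudes are hypothetical; only the halving and the restriction to sections without voice or operation noise come from the description.

```python
class FlooringTracker:
    """Holds the most recent flooring spectrum (hypothetical helper)."""

    def __init__(self, scale=0.5):
        self.scale = scale      # amplitudes are halved in the description
        self.flooring = None

    def update(self, spectrum, is_noise_timing, is_voice):
        # Only frames with neither operation noise nor voice (background
        # sound only) may refresh the flooring spectrum.
        if not is_noise_timing and not is_voice:
            self.flooring = [self.scale * a for a in spectrum]
        return self.flooring


tracker = FlooringTracker()
# Section A: background only, so the flooring spectrum is updated.
from_a = tracker.update([0.5, 0.25, 0.5, 0.25], is_noise_timing=False, is_voice=False)
# Section B: voice present, so the stored flooring spectrum is kept as is.
from_b = tracker.update([0.5, 2.0, 0.5, 2.0], is_noise_timing=False, is_voice=True)
```

Because the voiced frame is ignored, the strong components in its second and fourth bins never leak into the flooring spectrum, which is the point made with FIG. 15 and FIG. 16.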

(Step ST13)
In the final step ST13, IFFT processing is performed using the phase separated in step ST3, whereby the spectrum after noise reduction processing is converted into the time domain and output to the recording unit 140.
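The recombination of processed amplitudes with the stored phase can be sketched as below. A naive DFT pair is used in place of an FFT/IFFT purely to keep the sketch free of external libraries; the function names and the sample frame are hypothetical, and a practical implementation would use an FFT library.

```python
import cmath
import math


def dft(samples):
    """Naive forward DFT (O(n^2)), standing in for an FFT."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(bins):
    """Naive inverse DFT, standing in for the IFFT of step ST13."""
    n = len(bins)
    return [
        sum(bins[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def to_time_domain(amplitudes, phases):
    """Recombine processed amplitudes with the stored phases, then invert."""
    bins = [cmath.rect(a, p) for a, p in zip(amplitudes, phases)]
    # The original frame was real-valued, so only the real part is kept.
    return [value.real for value in idft(bins)]


frame = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]   # hypothetical samples
spectrum = dft(frame)
amplitudes = [abs(b) for b in spectrum]      # the part that gets processed
phases = [cmath.phase(b) for b in spectrum]  # kept aside, as in step ST3

# Stand-in for noise reduction: scale every amplitude by one half.
halved = to_time_domain([0.5 * a for a in amplitudes], phases)
```

Since only the amplitudes are modified and the phases are reused unchanged, the output here is the input waveform at half the level.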

(Second processing target sound)
Next, the noise reduction processing operation (noise reduction method) for a second processing target sound, which has a different form from the first processing target sound described above, will be described. Since each step of the noise reduction processing flow is substantially the same as for the first processing target sound, the description focuses mainly on the differences in processing content at each step.

FIG. 17 is a schematic diagram illustrating the form of the second processing target sound subject to noise reduction processing. As shown in FIG. 17, in this processing target sound, section D contains only background sound, section E contains background sound and AF noise, and section F contains background sound, voice, and AF noise. The operation of reducing AF noise in the sound information collected and output by the microphone 131 in sections E and F of the processing target sound shown in FIG. 17, and the flooring update, are described below.

Steps ST1 to ST4 are the same as for section C of the first processing target sound described above, and their description is omitted.
(Step ST5)
In step ST5, section F is determined to be a voice section (YES), and the process proceeds to step ST6.
(Step ST6)
In step ST6, when the AF operation start time and AF operation end time are included, weak first impact noise reduction processing is performed, in which the spectrum up to a predetermined upper-limit frequency (for example, 4000 Hz) is preserved without being replaced.
When the AF operation start time and AF operation end time are not included, it is determined that no impact noise is present, and the impact noise reduction processing is not performed.
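The difference between the strong processing of step ST7 and the weak processing of step ST6 comes down to how wide a band is preserved. The sketch below is one possible reading: bins at or below the limit are kept and bins above it are replaced. What the replaced bins are filled with is not stated in this passage, so `replacement` is a hypothetical input, as are the function name, the `has_af_edge` flag, and the bin spacing.

```python
STRONG_PRESERVE_HZ = 500.0    # example upper limit for non-voice sections (ST7)
WEAK_PRESERVE_HZ = 4000.0     # example upper limit for voice sections (ST6)


def reduce_impact_noise(spectrum, replacement, bin_hz, is_voice, has_af_edge):
    """Replace bins above the preserved band when AF start/end is included.

    `replacement` is a hypothetical spectrum supplied by the caller; the
    passage does not say what the replaced bins are filled with.
    """
    if not has_af_edge:
        # AF start/end not included: no impact noise reduction is performed.
        return list(spectrum)
    limit = WEAK_PRESERVE_HZ if is_voice else STRONG_PRESERVE_HZ
    return [
        s if i * bin_hz <= limit else r
        for i, (s, r) in enumerate(zip(spectrum, replacement))
    ]


frame = [1.0] * 6    # hypothetical bins at 0, 1000, ..., 5000 Hz
fill = [0.0] * 6     # hypothetical replacement values
weak = reduce_impact_noise(frame, fill, 1000.0, is_voice=True, has_af_edge=True)
strong = reduce_impact_noise(frame, fill, 1000.0, is_voice=False, has_af_edge=True)
```

With a voice section the band up to 4000 Hz survives, so most of the voice spectrum is left alone; with a non-voice section only the band up to 500 Hz survives.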

The spectrum subjected to this first impact noise reduction processing contains the voice spectrum components f2 and f4. This spectrum is not used to update the estimated noise, and the process proceeds to step ST9, where the spectrum subtraction processing (the second noise reduction processing) is performed.
For the second processing target sound, a spectrum as shown in FIG. 18 is obtained in section E, which is an operation noise timing and a non-voice section, and a spectrum as shown in FIG. 20 is obtained in section F.

Therefore, in step ST8, the noise is estimated from the spectrum obtained in section E and the estimate is updated. The updated estimated noise has a spectrum as shown in FIG. 19.

Then, in step ST9, the estimated noise spectrum (FIG. 19) is subtracted from the spectrum of section F (FIG. 20), and flooring is further performed in step ST10, generating a spectrum as shown in FIG. 21.
The flooring spectrum for the second processing target sound is obtained from section D, in which only background sound occurs. As with the first processing target sound, the spectrum of FIG. 14, obtained by halving the spectrum of FIG. 8, is used as this flooring spectrum.

Here, if spectrum subtraction were instead performed based on the estimated noise of FIG. 22, obtained by multiplying the spectrum of section F (FIG. 20) by 0.9, the spectrum shown in FIG. 23 would result. In that case the voice spectrum components indicated by f2 and f4 would also be subtracted, and correct sound information could not be obtained. According to this embodiment, however, the voice spectrum is retained, as shown in FIG. 21.
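The contrast between the two noise estimates can be made concrete with a small numeric sketch. All amplitudes are invented for illustration, the voice components are placed in the second and fourth bins to mirror f2 and f4, and using the section E spectrum directly as the noise estimate is an assumption; the description does not give the estimation formula.

```python
def spectral_subtract(spectrum, noise_estimate):
    """Bin-by-bin subtraction, clamped at zero (clamping is an assumption)."""
    return [max(s - n, 0.0) for s, n in zip(spectrum, noise_estimate)]


# Hypothetical amplitudes; bins 1 and 3 play the role of f2 and f4.
section_e = [0.5, 0.5, 0.5, 0.5]    # background sound + AF noise, no voice
section_f = [0.5, 2.0, 0.5, 2.0]    # background sound + AF noise + voice

# Estimate taken from the non-voice section E (assumed equal to it here).
kept_voice = spectral_subtract(section_f, section_e)

# Estimate taken as 0.9 times the voiced section F itself.
from_voiced = [0.9 * a for a in section_f]
lost_voice = spectral_subtract(section_f, from_voiced)
```

In this sketch the voice bins keep an amplitude of 1.5 with the section E estimate, but fall to about 0.2 when the estimate is derived from the voiced section.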

As described above, this embodiment has the following effects.
(1) A voice section is detected from the sound information collected by the microphone 131, and when a voice section is detected, first noise reduction processing weaker than that performed when a non-voice section is detected is performed. Therefore, compared with performing strong noise reduction processing without distinguishing between voice and non-voice sections, noise can be reduced appropriately without degrading the target sound consisting of voice and background sound, in particular its voice portion.
(2) After the first noise reduction processing, second noise reduction processing (spectrum subtraction processing) is performed, in which noise is estimated from the sound information of a section determined to be a non-voice section and the estimated noise is subtracted. Because the noise is obtained from the sound information of a non-voice section, a processed sound very close to the target sound can be obtained without erasing the voice itself.
(3) The operation noise generation timing is detected from the operation information of the drive unit in the imaging apparatus 100, and the process proceeds to noise reduction processing when a noise generation timing is detected. Therefore, unnecessary noise reduction processing is avoided, and noise reduction processing can be performed appropriately and efficiently only when needed.
(4) Since flooring is performed on the sound information after the second noise reduction processing (spectrum subtraction processing), a spectrum that may be reduced or eliminated by spectrum subtraction can be corrected. This prevents the noise from being reduced excessively and makes it possible to secure (record) a sound close to the target sound from the collected sound information.
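The branching described across steps ST4 to ST12 can be summarized in a short dispatch sketch. The function name and the step labels are hypothetical, and the assumption that steps ST10 and ST12 are both followed by step ST13 is made for this sketch rather than stated in this passage.

```python
def select_processing(is_noise_timing, is_voice):
    """Return the ordered step labels for one section (labels only)."""
    if is_noise_timing:                      # step ST4: YES
        if is_voice:                         # step ST5: YES
            # The noise estimate is not updated from a voiced section.
            return ["ST6 weak impact noise reduction",
                    "ST9 spectrum subtraction",
                    "ST10 flooring",
                    "ST13 IFFT"]
        return ["ST7 strong impact noise reduction",
                "ST8 noise estimation",
                "ST9 spectrum subtraction",
                "ST10 flooring",
                "ST13 IFFT"]
    if is_voice:                             # step ST11: YES
        # No noise reduction, subtraction, or flooring.
        return ["ST13 IFFT"]
    return ["ST12 flooring spectrum update", "ST13 IFFT"]


section_c = select_processing(is_noise_timing=True, is_voice=False)
section_f = select_processing(is_noise_timing=True, is_voice=True)
section_b = select_processing(is_noise_timing=False, is_voice=True)
section_a = select_processing(is_noise_timing=False, is_voice=False)
```

Laid out this way, the noise estimate (ST8) and the flooring spectrum (ST12) are both refreshed only from non-voice sections, which is what keeps voice components out of them.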

The present invention is not limited to the embodiment described above. Various modifications and changes such as the following are possible, and these also fall within the scope of the present invention.
For example, the present embodiment has been described with a configuration in which the sound information collected by the microphone 131 is subjected to noise reduction processing in real time. However, the invention is not limited to this; the sound information collected by the microphone 131 may be temporarily stored in a buffer memory or the like, and read out from the buffer memory or the like as necessary for noise reduction processing. In this case, the load placed on the apparatus by real-time processing can be reduced.
The embodiment and the modifications may also be used in appropriate combination; a detailed description is omitted. The present invention is not limited by the embodiment described above.

100: imaging apparatus, 131: microphone (sound collecting device), 133: noise reduction processing unit, 134: voice section detection unit, 135: noise timing detection unit, 136: first noise reduction processing unit, 137: second noise reduction processing unit

Claims

A sound collector;
A voice section detector for detecting a voice section from sound information collected by the sound collecting device;
A noise reduction processing unit that performs different noise reduction processing based on the detection result of the voice section detection unit,
An imaging apparatus characterized by the above.

The imaging apparatus according to claim 1,
A noise timing detection unit for detecting operation noise generation timing from the operation information of the drive unit in the imaging device;
The noise reduction processing unit performs different noise reduction processing based on a detection result of the noise timing detection unit;
An imaging apparatus characterized by the above.

The imaging apparatus according to claim 1, wherein:
The noise reduction processing unit
Performs, when the voice section detection unit detects a voice section, a first noise reduction process that is weaker than the one performed when the voice section detection unit detects a non-voice section;
An imaging apparatus characterized by the above.

The imaging apparatus according to any one of claims 1 to 3,
The noise reduction processing unit
Performs a second noise reduction process that estimates noise from the sound information of a section determined by the voice section detection unit to be a non-voice section, and subtracts the estimated noise from the sound information before estimated-noise subtraction,
An imaging apparatus characterized by the above.

The imaging apparatus according to any one of claims 1 to 4, wherein:
The noise reduction processing unit
Obtaining a flooring spectrum from sound information when it is determined as a non-voice section in the voice section detection unit, and flooring processing the sound information before flooring processing using the flooring spectrum;
An imaging apparatus characterized by the above.

The imaging apparatus according to any one of claims 1 to 5,
The detection of the voice section by the voice section detection unit is to cut out a part of the voice waveform to obtain an autocorrelation function, and to detect using the peak value of the obtained autocorrelation function,
An imaging apparatus characterized by the above.

Detecting a voice section from collected sound information,
Performing different noise reduction processing based on the detection result of the voice section,
A noise reduction method for an image pickup apparatus.

The noise reduction method according to claim 7,
Detecting the generation timing of the operation noise from the operation information of the drive unit in the imaging device;
Performing different noise reduction processing based on the detection result of the operation noise generation timing,
A noise reduction method for an image pickup apparatus.

The noise reduction method according to claim 7 or 8,
Performing a first noise reduction process that is weaker than when it is detected as a non-speech section when it is detected as a speech section;
A noise reduction method for an image pickup apparatus.

The noise reduction method according to any one of claims 7 to 9,
Performing a second noise reduction process that estimates noise from the sound information of a section determined to be a non-voice section, and subtracts the estimated noise from the sound information before estimated-noise subtraction;
A noise reduction method for an image pickup apparatus.

The noise reduction method according to any one of claims 7 to 10,
Obtaining a flooring spectrum from sound information when it is determined to be a non-speech section, and flooring the sound information before flooring using the flooring spectrum;
A noise reduction method for an image pickup apparatus.

The noise reduction method according to any one of claims 7 to 11,
The detection of the speech section is to cut out a part of the speech waveform to obtain an autocorrelation function, and to detect using the peak value of the obtained autocorrelation function,
A noise reduction method for an image pickup apparatus.