JP2013161041A

JP2013161041A - Signal processor, camera and signal processing program

Info

Publication number: JP2013161041A
Application number: JP2012025095A
Authority: JP
Inventors: Kosuke Okano; 康介岡野
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2012-02-08
Filing date: 2012-02-08
Publication date: 2013-08-19

Abstract

PROBLEM TO BE SOLVED: To provide a signal processor that properly performs noise reduction. SOLUTION: A signal processor 1 includes transformation means 16 that transforms an acoustic signal into a spectrum signal for each prescribed frame, spectrum suppression means 16 that sets to substantially zero the frequency components targeted for noise reduction in the spectrum signal produced by the transformation means 16, and inverse transformation means 16 that inversely transforms the suppressed spectrum signal back into an acoustic signal.

Description

The present invention relates to a signal processing device, a camera, and a signal processing program.

A known technique for recording an acoustic signal acquired during shooting together with the captured image stores, in advance and as a spectrum signal, the mechanical sound of the shooting operation that becomes noise. Based on the timing at which the mechanical sound occurs, the technique subtracts the stored spectrum signal, multiplied by a predetermined coefficient, from the acquired acoustic signal, thereby suppressing the mechanical sound mixed into the acoustic signal (see Patent Document 1).

Patent Document 1: JP 2006-279185 A

In general, the mechanical sound produced during AF (autofocus) operation is a non-stationary sound that varies with time. Consequently, when a loud mechanical sound occurs, the subtraction may be too small and noise may remain. Conversely, when a quiet mechanical sound occurs, the subtraction is too large and sound quality deteriorates.
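The trade-off can be seen in a small numeric sketch. This is not code from the patent or from the cited document: the function name `spectral_subtract`, the single-bin amplitudes, and the clamp at zero are all assumptions made for illustration.

```python
def spectral_subtract(observed, noise_template, coefficient):
    """Subtract a scaled, pre-stored noise spectrum from an observed spectrum.

    Hypothetical helper: amplitudes are clamped at 0 so they never go negative.
    """
    return [max(o - coefficient * n, 0.0)
            for o, n in zip(observed, noise_template)]


# Assume the wanted sound has amplitude 4.0 in one frequency bin and the
# stored noise template has amplitude 2.0 in that bin.
template = [2.0]

# Loud mechanical sound (5.0): observed = 4.0 + 5.0 = 9.0.
# Result 7.0 is still 3.0 above the wanted 4.0 (under-subtraction).
loud = spectral_subtract([9.0], template, 1.0)

# Quiet mechanical sound (0.5): observed = 4.0 + 0.5 = 4.5.
# Result 2.5 is 1.5 below the wanted 4.0 (over-subtraction).
quiet = spectral_subtract([4.5], template, 1.0)
```

With a fixed template and coefficient, no single setting handles both the loud and the quiet case, which is the problem the invention addresses.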

The signal processing device according to the present invention comprises: conversion means that converts an acoustic signal into a spectrum signal for each predetermined frame; spectrum suppression means that sets to substantially zero the signal of the frequency components targeted for noise reduction in the spectrum signal converted by the conversion means; and inverse conversion means that inversely converts the spectrum signal after spectrum suppression into an acoustic signal.

According to the present invention, noise can be reduced appropriately.

The drawings show, in order: a block diagram illustrating the configuration of an electronic camera equipped with a signal processing device for noise reduction according to one embodiment of the present invention; a flowchart explaining the flow of the noise reduction processing executed by the CPU; and a diagram illustrating a computer apparatus.

Hereinafter, embodiments for carrying out the present invention will be described with reference to the drawings. FIG. 1 is a block diagram illustrating the configuration of an electronic camera 1 equipped with a signal processing device for noise reduction according to an embodiment of the present invention. In FIG. 1, the electronic camera 1 includes a photographing optical system 11, an image sensor 12, an image processing unit 13, a RAM 14, an LCD monitor 15, a CPU 16, a nonvolatile memory 17, a card interface (I/F) 18, a communication interface (I/F) 19, an operation member 20, a microphone 21, and an acoustic processing circuit 22.

The CPU 16, the nonvolatile memory 17, the card interface 18, the communication interface 19, the acoustic processing circuit 22, the image processing unit 13, the RAM 14, and the LCD monitor 15 are connected to one another via a bus 25.

The photographing optical system 11 is composed of a plurality of lens groups including a zoom lens and a focusing lens, and forms a subject image on the light receiving surface of the image sensor 12. To keep FIG. 1 simple, the photographing optical system 11 is drawn as a single lens.

The image sensor 12 is configured by a CMOS image sensor or the like in which light receiving elements are arranged two-dimensionally on the light receiving surface. The image sensor 12 photoelectrically converts the image formed by the light beam that has passed through the photographing optical system 11 and generates a digital image signal. The digital image signal is input to the image processing unit 13. The image processing unit 13 applies various kinds of image processing (color interpolation, gradation conversion, contour enhancement, white balance adjustment, and so on) to the digital image data.

The LCD monitor 15 is configured by a liquid crystal panel or the like. The LCD monitor 15 displays images, operation icons, menu screens, and the like according to instructions from the CPU 16. The RAM 14 temporarily stores digital image data before and after image processing by the image processing unit 13, temporarily stores digital acoustic data before and after the noise reduction processing described later, and serves as working memory when the CPU 16 executes a program. The nonvolatile memory 17 is configured by a flash memory or the like. Because the nonvolatile memory 17 retains its contents even when the power is off, it stores the programs executed by the CPU 16 and similar data.

The CPU 16 controls the operations of the electronic camera 1 by executing a program stored in the nonvolatile memory 17. The CPU 16 also performs AF (autofocus) operation control and automatic exposure (AE) calculation. In the AF operation, for example, the in-focus position of the focusing lens (not shown) is obtained based on contrast information of the live view image. The live view image is a monitor image that the image sensor 12 acquires repeatedly at a predetermined time interval (for example, 30 frames per second) before the release operation.

The card interface 18 has a connector (not shown), to which a storage medium 30 such as a memory card is connected. The card interface 18 writes data to the connected storage medium 30 and reads data from the storage medium 30. The storage medium 30 is configured by a memory card incorporating a semiconductor memory, a hard disk drive, or the like.

The communication interface 19 communicates, for example using the TCP/IP protocol, with an external device connected to a connector (not shown). Through this communication, the camera receives commands and data from the external device and transmits image data, acoustic data, and the like stored in the storage medium 30 to the external device. The operation member 20 includes a power switch as well as a release button, a recording button, a cross key switch, and other operation members. The operation member 20 sends the CPU 16 an operation signal corresponding to each instruction, such as a still image shooting instruction, a video (and audio) recording instruction, a mode switching instruction, or various selection instructions.

The acoustic processing circuit 22 amplifies the acoustic signal collected by the microphone 21 and converts the amplified signal into digital acoustic data with an A/D conversion circuit (not shown). The lens driving mechanism 23 moves the focusing lens (not shown) forward and backward along the optical axis in response to instructions from the CPU 16.

The electronic camera 1 has a function of capturing a still image in response to pressing of the release button, and a function of recording video and audio in response to pressing of the recording button. During recording, the CPU 16 starts moving-image acquisition and audio recording when the recording button is pressed. For example, it starts imaging at the same frame rate as the live view image, 30 frames per second, and starts acquiring acoustic data. When the recording button is pressed again, the CPU 16 ends image acquisition and sound acquisition, and generates a file storing the moving image composed of the acquired frame images and a file storing the acquired acoustic data.

The acoustic data file generated by the CPU 16 of the present embodiment includes drive information of the lens driving mechanism 23 expressed on the same time axis as the acoustic data. The drive information is data indicating whether the lens driving mechanism 23 was driving the focusing lens at the time of recording. It may be, for example, time data indicating a drive start time and a drive end time, or binary data in which "1" represents driving and "0" represents not driving.
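The two representations of the drive information are interchangeable. The following is a minimal sketch of converting start/end times into one binary flag per audio frame; the function name `drive_flags`, the millisecond units, and the rule that any overlap marks a frame as "driving" are assumptions, not details given in the source.

```python
def drive_flags(intervals_ms, frame_count, frame_ms):
    """Return one flag per audio frame: 1 while the lens is driven, else 0.

    intervals_ms: list of (start, end) drive times in milliseconds.
    A frame is flagged if it overlaps any drive interval at all (assumption).
    """
    flags = [0] * frame_count
    for start, end in intervals_ms:
        for i in range(frame_count):
            frame_start = i * frame_ms
            frame_end = frame_start + frame_ms
            if start < frame_end and end > frame_start:
                flags[i] = 1
    return flags


# Lens driven from 50 ms to 120 ms, five frames of 50 ms each.
flags = drive_flags([(50, 120)], 5, 50)  # [0, 1, 1, 0, 0]
```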

Because the present embodiment is characterized by the noise reduction processing applied to the acoustic data acquired during the recording described above, the following description focuses on the processing that the CPU 16 executes for noise reduction.

FIG. 2 is a flowchart explaining the flow of the noise reduction processing executed by the CPU 16. For example, the CPU 16 temporarily stores the acoustic data in the RAM 14 and records, in the storage medium 30, the acoustic data obtained after running the noise reduction processing program. The CPU 16 repeats the noise reduction processing for each frame constituting the acoustic data.

In step S10 of FIG. 2, the CPU 16 reads one frame of acoustic data and the drive information of the lens driving mechanism 23 from the RAM 14, and proceeds to step S20. In step S20, the CPU 16 applies a short-time Fourier transform to the acoustic data of the target frame and proceeds to step S30. This yields the amplitude of each frequency component of the acoustic signal (referred to as the frequency feature spectrum). For example, when the sampling frequency is 44.1 kHz and the signal is divided into about 2000 frequency components, the spectral width per frequency component is about 20 Hz.
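As a rough illustration of step S20, the sketch below computes per-bin amplitudes of one frame with a naive discrete Fourier transform from the standard library. It is a sketch under assumptions: the names `amplitude_spectrum` and `bin_width_hz` are hypothetical, no window function is applied (the source does not specify one), and the frame length of 2048 samples is chosen only because it gives a bin spacing close to the "about 20 Hz" mentioned above.

```python
import cmath
import math


def amplitude_spectrum(frame):
    """Amplitude of each DFT bin from 0 up to the Nyquist bin (naive O(n^2))."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc))
    return spectrum


def bin_width_hz(sample_rate_hz, frame_length):
    """Frequency spacing between adjacent DFT bins."""
    return sample_rate_hz / frame_length


# A 16-sample cosine with exactly 2 cycles per frame peaks in bin 2.
frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
spec = amplitude_spectrum(frame)
peak_bin = spec.index(max(spec))

# Assumed frame length of 2048 samples at 44.1 kHz: about 21.5 Hz per bin.
width = bin_width_hz(44100, 2048)
```

A production implementation would normally use an FFT routine rather than this quadratic loop; the naive form is used here only to keep the example dependency-free.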

In step S30, the CPU 16 determines whether the acoustic data contains voice. In general, a human voice has a plurality of peaks ("mountain" waveforms) at a predetermined frequency interval at frequencies of 2 kHz or less. When, in the frequency feature spectrum, a plurality of peaks are arranged at that frequency interval in the frequency band of 2 kHz or less and the peak values are equal to or greater than a first predetermined value, the CPU 16 makes an affirmative determination in step S30 and proceeds to step S40. When step S30 is affirmative, the acoustic data of the target frame is likely to contain conversation, narration, or the like, so the acoustic data of the frame is not targeted for noise reduction and is not used to determine the environmental sound threshold either. The environmental sound threshold is a determination threshold used to decide whether to perform the noise reduction described later. On the other hand, when peaks whose values are equal to or greater than the first predetermined value are not arranged at that frequency interval in the frequency band of 2 kHz or less, the CPU 16 makes a negative determination in step S30 and proceeds to step S60.
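The source does not give the peak-detection details of step S30, so the following is only one possible sketch. The function name `looks_like_voice`, the local-maximum test, the tolerance `tol_hz`, and the minimum of three peaks are all assumptions; `peak_min` stands in for the first predetermined value and `spacing_hz` for the predetermined frequency interval.

```python
def looks_like_voice(spectrum, bin_hz, peak_min, spacing_hz, tol_hz, min_peaks=3):
    """Heuristic: evenly spaced strong peaks below 2 kHz suggest voice."""
    # Only examine bins at or below 2 kHz, leaving room for the k + 1 lookup.
    limit = min(len(spectrum) - 1, int(2000 / bin_hz))
    peaks = [k for k in range(1, limit)
             if spectrum[k] >= peak_min
             and spectrum[k] > spectrum[k - 1]
             and spectrum[k] >= spectrum[k + 1]]
    if len(peaks) < min_peaks:
        return False
    # Every gap between neighbouring peaks must match the expected interval.
    gaps = [(b - a) * bin_hz for a, b in zip(peaks, peaks[1:])]
    return all(abs(g - spacing_hz) <= tol_hz for g in gaps)


# Synthetic spectrum with 20 Hz bins and peaks at 200, 400 and 600 Hz.
voiced = [0.1] * 120
for k in (10, 20, 30):
    voiced[k] = 5.0
flat = [0.1] * 120

is_voice = looks_like_voice(voiced, 20, 1.0, 200, 20)
is_flat_voice = looks_like_voice(flat, 20, 1.0, 200, 20)
```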

In step S40, the CPU 16 applies a short-time inverse Fourier transform to the frequency feature spectrum of the target frame and proceeds to step S50. In step S50, the CPU 16 records the processed acoustic data in the RAM 14 and ends the processing of FIG. 2. In this embodiment, the acoustic data of the target frame is not subjected to noise reduction unless the processing proceeds to step S110 described later. That is, when the processing does not proceed to step S110, the acoustic data of the target frame is saved as is; when it does proceed to step S110, the acoustic data processed in step S110 is saved.

In step S60, reached after a negative determination in step S30, the CPU 16 determines whether the sum of the frequency feature spectrum of the target frame (that is, the sum of the amplitude values of the approximately 2000 frequency components) is equal to or greater than a second predetermined value. When the sum is equal to or greater than the second predetermined value, the CPU 16 makes an affirmative determination in step S60 and proceeds to step S40. When step S60 is affirmative, the acoustic data of the target frame is likely to contain a sudden sound, so the acoustic data of the frame is not targeted for noise reduction and is not used to determine the environmental sound threshold either. On the other hand, when the sum is less than the second predetermined value, the CPU 16 makes a negative determination in step S60 and proceeds to step S70.

In step S70, the CPU 16 determines whether the acoustic data of the target frame was acquired while a movable part was being driven. For example, when the drive information of the lens driving mechanism 23 described above indicates that the frame falls within a period in which the focusing lens was being driven, the CPU 16 makes an affirmative determination in step S70 and proceeds to step S90. When step S70 is affirmative, the acoustic data of the target frame is likely to contain movable-part drive sound, so the acoustic data of the frame is not used to determine the environmental sound threshold. When the drive information of the lens driving mechanism 23 indicates that the frame does not fall within a period in which the focusing lens was being driven, the CPU 16 makes a negative determination in step S70 and proceeds to step S80.
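The branching through steps S30, S60, and S70 can be summarized as a small routing function. This is a sketch, not the patent's code: the name `route_frame` and the string labels are hypothetical, and the voice decision of step S30 is passed in as a precomputed boolean.

```python
def route_frame(has_voice, spectrum, second_value, lens_driving):
    """Return the label of the step that follows S30, S60 and S70."""
    if has_voice:
        # S30 affirmative: leave the frame untouched.
        return "S40"
    if sum(spectrum) >= second_value:
        # S60 affirmative: likely a sudden sound, leave the frame untouched.
        return "S40"
    if lens_driving:
        # S70 affirmative: frame may contain drive sound, go on to S90.
        return "S90"
    # S70 negative: frame is usable for updating the threshold.
    return "S80"


next_step = route_frame(False, [0.2, 0.3, 0.1], 10.0, True)  # "S90"
```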

In step S80, the CPU 16 determines and updates the environmental sound threshold using the frequency feature spectrum of the target frame, and proceeds to step S40. For example, from the frequency feature spectra of the frames from the first frame of the acoustic data to the target frame, the CPU 16 extracts the maximum value for each of the approximately 2000 frequency components, and uses the frequency feature spectrum made up of the extracted components as the environmental sound threshold spectrum. When the value of any frequency component of the frame's frequency feature spectrum is such a maximum, the CPU 16 updates the environmental sound threshold spectrum by replacing the value of the corresponding frequency component with that maximum. This allows the acoustic data of the most recent frame that contains no movable-part drive sound to be reflected in the environmental sound threshold spectrum.
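The per-component maximum described for step S80 amounts to a running element-wise maximum. A minimal sketch follows, assuming spectra are plain lists of amplitudes and that `None` marks "no threshold yet"; the function name `update_threshold` is hypothetical.

```python
def update_threshold(threshold, spectrum):
    """Fold one frame's spectrum into the environmental sound threshold.

    Each threshold bin holds the largest amplitude seen so far in that bin
    among the frames that reached step S80.
    """
    if threshold is None:
        return list(spectrum)
    return [max(t, s) for t, s in zip(threshold, spectrum)]


threshold = None
for frame_spectrum in ([1.0, 5.0, 3.0], [2.0, 4.0, 3.0]):
    threshold = update_threshold(threshold, frame_spectrum)
# threshold is now [2.0, 5.0, 3.0]
```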

In step S90, the CPU 16 determines whether the frequency feature spectrum of the target frame has changed by a third predetermined value or more. When at least one frequency component of the frame's frequency feature spectrum (that is, of the approximately 2000 frequency components) has increased or decreased by more than the third predetermined value relative to the previous frame, the CPU 16 makes an affirmative determination in step S90 and proceeds to step S100. When no frequency component of the frame's frequency feature spectrum has increased or decreased by more than the third predetermined value relative to the previous frame, the CPU 16 makes a negative determination in step S90 and proceeds to step S40. When step S90 is affirmative, the acoustic data of the target frame is likely to have changed abruptly, so the acoustic data of the frame is not targeted for noise reduction and is not used to determine the environmental sound threshold either.
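The comparison at the heart of step S90 can be sketched as follows. Only the test itself is shown, not the branching that follows it; the function name `changed_beyond` is hypothetical and spectra are assumed to be equal-length lists of amplitudes.

```python
def changed_beyond(previous, current, third_value):
    """True if any bin rose or fell by more than third_value between frames."""
    return any(abs(c - p) > third_value for p, c in zip(previous, current))


steady = changed_beyond([1.0, 2.0, 3.0], [1.1, 2.0, 2.9], 0.5)  # False
jumped = changed_beyond([1.0, 2.0, 3.0], [1.0, 4.0, 3.0], 0.5)  # True
```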

In step S100, the CPU 16 determines whether any frequency component of the target frame's frequency feature spectrum (that is, of the approximately 2000 frequency components) has a value larger than the corresponding frequency component of the environmental sound threshold spectrum. When at least one frequency component of the frame's frequency feature spectrum has a value larger than the corresponding frequency component of the environmental sound threshold spectrum, the CPU 16 makes an affirmative determination in step S100 and proceeds to step S110. When no frequency component of the frame's frequency feature spectrum exceeds the corresponding frequency component of the environmental sound threshold spectrum, the CPU 16 makes a negative determination in step S100 and proceeds to step S40. When step S100 is negative, the movable-part drive sound contained in the acoustic data of the frame is judged to be within the allowable range, and the acoustic data of the frame is not targeted for noise reduction. However, because the acoustic data of the frame contains movable-part drive sound, it is not used to determine the environmental sound threshold.

In step S110, the CPU 16 replaces with 0 the value of every frequency component of the target frame's frequency feature spectrum (that is, of the approximately 2000 frequency components) that is larger than the corresponding frequency component of the environmental sound threshold spectrum, and proceeds to step S40. Because the value is suppressed to 0, neither under-subtraction nor over-subtraction occurs even when the value before suppression belongs to a non-stationary sound, so there is no risk of noise remaining (as with under-subtraction) or of sound quality deteriorating (as with over-subtraction).
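Steps S100 and S110 together can be sketched as a single function that checks for bins above the threshold and zeroes them. The name `suppress` and the returned `(spectrum, changed)` pair are assumptions; the sketch operates on amplitudes only, whereas an implementation that feeds the inverse transform would zero the corresponding complex coefficients.

```python
def suppress(spectrum, threshold):
    """Zero every bin that exceeds the environmental sound threshold.

    Returns (new_spectrum, changed). changed is False when no bin exceeded
    the threshold, which corresponds to a negative determination in S100.
    """
    exceeded = [s > t for s, t in zip(spectrum, threshold)]
    if not any(exceeded):
        return list(spectrum), False
    return [0.0 if e else s for s, e in zip(spectrum, exceeded)], True


cleaned, changed = suppress([1.0, 5.0, 2.0], [2.0, 3.0, 2.0])
# cleaned == [1.0, 0.0, 2.0], changed == True
```

Bins equal to the threshold are left alone, following the "larger than" wording of step S100.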

According to the embodiment described above, the following effects can be obtained.
(1) The electronic camera 1 equipped with the signal processing device that performs noise reduction includes the CPU 16 that Fourier-transforms an acoustic signal into a frequency feature spectrum for each predetermined frame, the CPU 16 that sets to substantially zero the signal of the frequency components targeted for noise reduction in the frequency feature spectrum converted by the CPU 16, and the CPU 16 that inverse-Fourier-transforms the frequency feature spectrum, after suppression to zero, into an acoustic signal; noise reduction can therefore be performed appropriately. That is, even when the noise to be suppressed is a non-stationary sound, the targeted frequency components are reliably suppressed to substantially zero, so that, unlike the spectral subtraction method, the risk of noise remaining (under-subtraction) or of sound quality deteriorating (over-subtraction) is eliminated. Furthermore, because no spectral subtraction is performed, there is no need to estimate the noise spectrum to be subtracted.

(2) The electronic camera 1 of (1) above further includes the CPU 16 that determines whether the acoustic signal was acquired while the focusing lens was being driven, and the CPU 16 performs the suppression processing on the frequency feature spectrum corresponding to an acoustic signal for which that determination is affirmative. Noise in the acoustic signal caused by driving of the movable part can therefore be reduced appropriately.

(3) The electronic camera 1 of (2) above further includes the CPU 16 that generates an environmental sound threshold spectrum corresponding to the environmental sound, based on the frequency feature spectrum Fourier-transformed by the CPU 16. In this case, the CPU 16 targets for noise reduction those frequency components of the Fourier-transformed frequency feature spectrum that are larger than the environmental sound threshold spectrum, so the suppression processing can be restricted when the noise is within the allowable range. Deterioration of sound quality can therefore be kept lower than when the targeted frequency components are always suppressed.

(4) In the electronic camera 1 of (3) above, the CPU 16 extracts the maximum value of each frequency component across the frequency feature spectra of a plurality of frames, each Fourier-transformed separately, and uses the frequency feature spectrum composed of the extracted components as the environmental sound threshold spectrum corresponding to the environmental sound. A criterion for judging whether the noise is within the allowable range can therefore be obtained appropriately.

(5) In the electronic camera 1, the CPU 16 generates the environmental sound threshold spectrum based on the frequency feature spectrum corresponding to an acoustic signal for which the determination of whether the signal was acquired while the focusing lens was being driven is negative. The environmental sound threshold spectrum therefore reflects the environmental sound while the focusing lens is not being driven, and a criterion for judging whether the noise is within the allowable range can be obtained appropriately.

(6) The electronic camera 1 further includes the CPU 16 that determines whether the sum of the signal values of the frequency components within a frame of the frequency feature spectrum Fourier-transformed by the CPU 16 is larger than a second predetermined value. The CPU 16 generates the environmental sound threshold spectrum based on a frequency feature spectrum for which the sum of the signal values is determined to be smaller than the second predetermined value. A criterion for judging whether the noise is within the allowable range can therefore be obtained appropriately.

(7) The electronic camera 1 further includes the CPU 16 that determines whether the acoustic signal contains voice. The CPU 16 generates the environmental sound threshold spectrum based on the frequency feature spectrum corresponding to an acoustic signal determined not to contain voice. A criterion for judging whether the noise is within the allowable range can therefore be obtained appropriately.

(8) In the electronic camera 1, the CPU 16 performs the suppression processing on the frequency feature spectrum corresponding to an acoustic signal determined not to contain voice, so the processing can be controlled so as not to suppress voice information.

(9) The electronic camera 1 further includes the CPU 16 that determines whether the change in the frequency feature spectrum between consecutive frames Fourier-transformed by the CPU 16 is larger than a third predetermined value. The CPU 16 performs the suppression processing on a frequency feature spectrum for which the change between consecutive frames is determined to be smaller than the third predetermined value, so the noise reduction processing can be performed appropriately.

(Modification 1)
In the example described above, the noise reduction processing program illustrated in FIG. 2 is started immediately before the acoustic data file is recorded in the storage medium 30; however, acquisition of the acoustic data and the noise reduction processing may instead be performed in real time. In this case, the CPU 16 performs the noise reduction processing frame by frame and sequentially records the processed acoustic data in the storage medium 30.

(Modification 2)
When the environmental sound threshold is determined and updated using the frequency feature spectrum of the target frame (step S80), it may instead be determined and updated as follows. The CPU 16 of Modification 2 selects, for example, the frame for which the sum of the frequency feature spectrum (that is, the sum of the amplitude values of the approximately 2000 frequency components) is smallest among the frames from the first frame of the acoustic data to the target frame. It then updates the environmental sound threshold spectrum by replacing the previous environmental sound threshold spectrum with the frequency feature spectrum of that frame, composed of the approximately 2000 frequency components, multiplied by a fourth predetermined value.

According to Modification 2, the environmental sound threshold spectrum is determined and updated based on the acoustic data of a frame that is unlikely to contain a sudden sound, which prevents the environmental sound threshold spectrum from taking a large value at a specific frequency component. This reduces the risk of insufficient noise reduction at a specific frequency component.
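The threshold update of Modification 2 can be sketched as below. The function name `update_threshold_min_frame` and the parameter name `fourth_value` are illustrative assumptions; each spectrum is represented as a plain list of amplitude values (three bins here instead of the approximately 2000 in the text).

```python
def update_threshold_min_frame(spectra, fourth_value):
    """Pick the frame whose amplitudes sum to the smallest total and
    return that frame's spectrum scaled by fourth_value."""
    quietest = min(spectra, key=sum)
    return [fourth_value * amplitude for amplitude in quietest]


# Frames from the first frame through the target frame (toy values).
spectra = [
    [1.0, 2.0, 3.0],  # sum 6.0
    [0.5, 0.5, 0.5],  # sum 1.5, the quietest frame
    [4.0, 0.0, 0.0],  # sum 4.0
]
print(update_threshold_min_frame(spectra, fourth_value=2.0))  # prints [1.0, 1.0, 1.0]
```

Because the whole spectrum of a single quiet frame is used, a loud transient in any other frame has no influence on the threshold.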

(Modification 3)
When the environmental sound threshold is determined and updated using the frequency feature spectrum of the target frame (step S80), it may be determined and updated as follows. The CPU 16 of Modification 3 uses, for example, the frequency feature spectrum of each frame from the first frame of the acoustic data through the target frame to calculate the average amplitude value for each of the approximately 2000 frequency components. The CPU 16 then updates the environmental sound threshold spectrum by replacing the previous environmental sound threshold spectrum with the frequency feature spectrum composed of the average values of the respective frequency components, multiplied by a fifth predetermined value.

According to Modification 3, the environmental sound threshold spectrum is determined and updated so that a sound component that suddenly appears in a particular frame is averaged out, which prevents the environmental sound threshold spectrum from taking a large value at a specific frequency component. This reduces the risk of insufficient noise reduction at a specific frequency component.
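A minimal sketch of the per-component averaging in Modification 3 follows. The names `update_threshold_mean` and `fifth_value` are illustrative assumptions, and the spectra are toy lists with two bins rather than approximately 2000.

```python
def update_threshold_mean(spectra, fifth_value):
    """Average the amplitude of each frequency component over all frames,
    then scale every average by fifth_value."""
    frame_count = len(spectra)
    # zip(*spectra) groups the amplitudes of the same frequency bin together.
    return [fifth_value * sum(bin_values) / frame_count
            for bin_values in zip(*spectra)]


spectra = [
    [1.0, 2.0],
    [3.0, 4.0],
]
print(update_threshold_mean(spectra, fifth_value=2.0))  # prints [4.0, 6.0]
```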

(Modification 4)
In the determination of whether the acoustic data contains voice (step S30) described above, voice may be detected using the peak value of the autocorrelation function of the voice signal. The CPU 16 of Modification 4 obtains the peak value of the autocorrelation function in the time-lag range corresponding to one period of the fundamental frequency of a voice signal (for example, 50 Hz to 400 Hz; at a sampling frequency of 44.1 kHz, lags from approximately 880 points down to approximately 110 points). The CPU 16 determines that voice is contained when the peak value is greater than a predetermined determination threshold, and that no voice is contained when the peak value is smaller than the determination threshold.
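The autocorrelation test of Modification 4 can be sketched as follows. The lag range follows from the sampling frequency divided by the fundamental frequency (44100 / 400 ≈ 110 and 44100 / 50 = 882). Normalizing the autocorrelation by the frame energy and the threshold of 0.5 are assumptions made for illustration, as are the function names; the example runs at 8 kHz only to keep the toy computation small.

```python
import math


def autocorr_peak(frame, min_lag, max_lag):
    """Largest energy-normalized autocorrelation value for lags in [min_lag, max_lag]."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, r / energy)
    return best


def contains_voice(frame, sample_rate=44100, f_low=50.0, f_high=400.0, threshold=0.5):
    """Assumed decision rule: voice is present if the peak exceeds the threshold."""
    min_lag = int(sample_rate / f_high)  # 110 points at 44.1 kHz
    max_lag = int(sample_rate / f_low)   # 882 points at 44.1 kHz
    return autocorr_peak(frame, min_lag, max_lag) > threshold


# Toy check at 8 kHz: a 200 Hz tone is strongly periodic, silence is not.
rate = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * n / rate) for n in range(800)]
silence = [0.0] * 800
print(contains_voice(tone, sample_rate=rate))     # prints True
print(contains_voice(silence, sample_rate=rate))  # prints False
```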

(Modification 5)
Alternatively, voice may be detected using the envelope of the voice spectrum waveform (horizontal axis: signal frequency; vertical axis: signal amplitude). The CPU 16 of Modification 5 obtains the envelope of the voice spectrum waveform by linear predictive analysis or the like. The CPU 16 determines that voice is contained when the amount of temporal change in the envelope is greater than a predetermined determination threshold, and that no voice is contained when the amount of temporal change in the envelope is smaller than the determination threshold.
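One way to realize the envelope test of Modification 5 is sketched below, under several assumptions that the text does not fix: the envelope is obtained by the autocorrelation method of linear prediction (Levinson-Durbin recursion), the prediction order is 8, the envelope is sampled at 32 frequencies, and the "amount of temporal change" is taken as the sum of absolute differences between the envelopes of two consecutive frames. All function names are hypothetical.

```python
import cmath
import math
import random


def lpc_coefficients(frame, order):
    """Levinson-Durbin recursion; returns (a, err) with A(z) = 1 + sum a[k] z^-k."""
    r = [sum(frame[i] * frame[i + k] for i in range(len(frame) - k))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:  # silent or numerically degenerate frame
            break
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err
        new_a = a[:]
        for j in range(1, m):
            new_a[j] = a[j] + k * a[m - j]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_envelope(frame, order=8, n_bins=32):
    """Spectral envelope gain / |A(e^{jw})| sampled at n_bins frequencies in [0, pi)."""
    a, err = lpc_coefficients(frame, order)
    gain = math.sqrt(max(err, 0.0))
    envelope = []
    for b in range(n_bins):
        w = math.pi * b / n_bins
        denom = sum(a[k] * cmath.exp(-1j * w * k) for k in range(order + 1))
        envelope.append(gain / max(abs(denom), 1e-12))
    return envelope


def envelope_change(prev_env, cur_env):
    """Assumed change measure between the envelopes of consecutive frames."""
    return sum(abs(c - p) for p, c in zip(prev_env, cur_env))


def contains_voice_by_envelope(prev_frame, cur_frame, threshold):
    return envelope_change(lpc_envelope(prev_frame), lpc_envelope(cur_frame)) > threshold


# Toy frames: two different tones with a little seeded noise.
rng = random.Random(1)
frame_a = [math.sin(0.3 * n) + 0.05 * rng.uniform(-1.0, 1.0) for n in range(256)]
frame_b = [math.sin(1.2 * n) + 0.05 * rng.uniform(-1.0, 1.0) for n in range(256)]
print(envelope_change(lpc_envelope(frame_a), lpc_envelope(frame_b)))
```

The determination threshold would have to be tuned to the chosen order, bin count, and signal level, so no particular value is suggested here.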

(Modification 6)
The embodiment described above is an example in which the signal processing device for noise reduction is mounted in the electronic camera 1, but the signal processing device may instead be configured using a personal computer. The signal processing device is configured by causing the computer device 100 shown in FIG. 3 to execute a program that performs the processing based on the flowchart illustrated in FIG. 2. When the program is loaded into and used on the personal computer 100, the program is loaded into the data storage device of the personal computer 100 and then executed, so that the personal computer 100 is used as the signal processing device.

The program may be loaded into the personal computer 100 by setting a storage medium 104, such as a CD-ROM, storing the program in the personal computer 100, or by way of a communication line 101 such as a network. When the communication line 101 is used, the program is stored in, for example, a hard disk device 103 of a server (computer) 102 connected to the communication line 101. The program can be supplied as a computer program product in various forms, such as provision via the storage medium 104 or the communication line 101.

According to Modification 6 described above, the same operational effects as in the above embodiment are obtained. Besides a personal computer, a digital photo frame, a projector, or another device having a playback function may also be configured to perform the noise reduction processing described above.

The above description is merely an example, and the invention is in no way limited to the configuration of the embodiment described above.

1 ... Electronic camera
14 ... RAM
15 ... LCD monitor
16 ... CPU
18 ... Card interface
20 ... Operation member
21 ... Microphone
22 ... Acoustic processing circuit
23 ... Lens drive mechanism
30 ... Storage medium
100 ... Personal computer

Claims

A signal processing device comprising:
conversion means for converting an acoustic signal into a spectrum signal for each predetermined frame;
spectrum suppression means for reducing to substantially zero the signal of each frequency component that is a noise reduction target in the spectrum signal converted by the conversion means; and
inverse conversion means for inversely converting the spectrum signal after the spectrum suppression into an acoustic signal.

The signal processing device according to claim 1,
further comprising acquisition timing determination means for determining whether the acoustic signal was acquired at a noise generation timing,
wherein the spectrum suppression means performs the spectrum suppression on a spectrum signal corresponding to an acoustic signal for which the acquisition timing determination means has made an affirmative determination.

The signal processing device according to claim 2,
further comprising environmental sound threshold generation means for generating a spectrum signal corresponding to environmental sound based on the spectrum signal converted by the conversion means,
wherein the spectrum suppression means sets, as the noise reduction target, the signal of each frequency component that is larger than the spectrum signal corresponding to the environmental sound among the spectrum signals converted by the conversion means.

The signal processing device according to claim 3,
wherein the environmental sound threshold generation means extracts, for each frequency component, the maximum value among the spectrum signals of a plurality of frames converted in different frames by the conversion means, and uses the spectrum signal composed of the extracted signals of the respective frequency components as the spectrum signal corresponding to the environmental sound.

The signal processing device according to claim 3,
wherein the environmental sound threshold generation means uses, as the spectrum signal corresponding to the environmental sound, a spectrum signal obtained by multiplying by a predetermined value the spectrum signal of the frame in which the sum of the signal values of the frequency components within the frame is smallest among the spectrum signals of a plurality of frames converted in different frames by the conversion means.

The signal processing device according to claim 3,
wherein the environmental sound threshold generation means calculates, for each frequency component, the average value of the spectrum signals of a plurality of frames converted in different frames by the conversion means, and uses a spectrum signal composed of the signals obtained by multiplying each average value by a predetermined value as the spectrum signal corresponding to the environmental sound.

The signal processing device according to any one of claims 3 to 6,
wherein the environmental sound threshold generation means generates the spectrum signal corresponding to the environmental sound based on a spectrum signal corresponding to an acoustic signal for which the acquisition timing determination means has made a negative determination.

The signal processing device according to any one of claims 3 to 7,
further comprising sum determination means for determining whether the sum of the signal values of the frequency components within a frame of the spectrum signal converted by the conversion means is greater than a first determination threshold,
wherein the environmental sound threshold generation means generates the spectrum signal corresponding to the environmental sound based on a spectrum signal for which the sum determination means has made a negative determination.

The signal processing device according to any one of claims 3 to 8,
further comprising voice determination means for determining whether the acoustic signal contains voice,
wherein the environmental sound threshold generation means generates the spectrum signal corresponding to the environmental sound based on a spectrum signal corresponding to an acoustic signal for which the voice determination means has made a negative determination.

The signal processing device according to claim 9,
wherein the spectrum suppression means performs the spectrum suppression on a spectrum signal corresponding to an acoustic signal for which the voice determination means has made a negative determination.

The signal processing device according to claim 10,
further comprising change determination means for determining whether a change in the spectrum signal between consecutive frames converted by the conversion means is greater than a second determination threshold,
wherein the spectrum suppression means performs the spectrum suppression on a spectrum signal for which the change determination means has made a negative determination.

A camera comprising the signal processing device according to claim 1.

A signal processing program for causing a computer to execute:
a first process of reading an acoustic signal;
a second process of converting the acoustic signal into a spectrum signal for each predetermined frame;
a third process of reducing to substantially zero the signal of each frequency component that is a noise reduction target in the converted spectrum signal; and
a fourth process of inversely converting the spectrum signal after the third process into an acoustic signal.