JP5034605B2 - Imaging apparatus, noise removal method, and program

Info

Publication number: JP5034605B2
Application number: JP2007089663A
Authority: JP
Inventors: 孝夫菅家
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2007-03-29
Filing date: 2007-03-29
Publication date: 2012-09-26
Anticipated expiration: 2027-03-29
Also published as: JP2008252389A

Description

The present invention relates to an imaging apparatus such as a digital camera, and more particularly to an imaging apparatus capable of recording an audio signal input during shooting together with the captured image, and to a noise removal method and program used in such an imaging apparatus.

For example, in an imaging apparatus with an audio recording function, performing a zoom operation during recording causes the zoom sound (the driving sound of the zoom motor) to enter the audio signal as noise. Spectral subtraction is a known technique for removing noise superimposed on an audio signal.

The spectral subtraction method (hereinafter, the SS method) estimates the spectrum in a silent section as the noise spectrum and removes the noise component by subtracting, from the input speech spectrum, the noise spectrum multiplied by a predetermined coefficient (the subtract coefficient α).

Here, the subtract coefficient α is set to a large value in order to suppress noise. Consequently, depending on the level difference between speech and noise, spectral subtraction can drive the output to zero or below. To prevent this, a coefficient that sets a lower limit, called the flooring coefficient β, is normally used (see, for example, Patent Document 1).

Let N(ω) be the noise spectrum, Y(ω) the spectrum of the input speech mixed with noise, and S(ω) the speech spectrum after noise removal. S(ω) is then given by the following equation.

S(ω) = max(Y(ω) − α·N(ω), β·Y(ω)) …(1)

In equation (1), max(Y(ω) − α·N(ω), β·Y(ω)) denotes outputting the larger of "Y(ω) − α·N(ω)" and "β·Y(ω)". α is the subtract coefficient (also called the overestimation coefficient), and a fixed value greater than 1 is normally used. β is the flooring coefficient, for which a fixed value of about 0.01 to 0.1 is used, for example.
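As a minimal sketch of equation (1), the following Python function applies the subtraction and the floor to each frequency bin. The function name, the list-of-magnitudes representation, and the default values of `alpha` and `beta` are assumptions chosen for illustration; they are not taken from the patent.

```python
def spectral_subtract(y_mag, n_mag, alpha=2.0, beta=0.05):
    """Apply equation (1) bin by bin.

    y_mag: magnitudes Y(w) of the noisy input spectrum (one value per bin)
    n_mag: magnitudes N(w) of the noise spectrum (same length as y_mag)
    alpha: subtract coefficient (illustrative default, greater than 1)
    beta:  flooring coefficient (illustrative default in the 0.01 to 0.1 range)
    """
    if len(y_mag) != len(n_mag):
        raise ValueError("spectra must have the same number of bins")
    # S(w) = max(Y(w) - alpha * N(w), beta * Y(w))
    return [max(y - alpha * n, beta * y) for y, n in zip(y_mag, n_mag)]


if __name__ == "__main__":
    # First bin: the subtraction result wins. Second bin: the floor wins,
    # because Y - alpha * N would be negative.
    print(spectral_subtract([1.0, 0.2], [0.1, 0.3]))
```

In the second bin the plain subtraction would give a negative value, so the output falls back to β·Y(ω), which is the case the flooring coefficient exists to handle.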

In this way, the input speech multiplied by β is set as the lower limit value, and the output is controlled so that the result of spectral subtraction does not fall below this lower limit.

Patent Document 1: JP 2001-228892 A

The spectral subtraction described above is performed only while the motor is being driven, and the subtract coefficient α is greater than 1. As a result, if the original speech spans the periods before and after motor driving, the subtraction can remove not only the driving sound (the noise) but also part of the spectrum of the original speech.

If the volume of the original speech is low, subtracting the noise component from the input speech during motor driving (Y(ω) − α·N(ω)) causes no large change in volume, because the volume was low to begin with. If the volume of the original speech is high, however, the volume after subtraction becomes extremely low, which sounds unnatural. The flooring coefficient β does guarantee a lower limit, but because it is fixed regardless of the volume of the original speech, it cannot suppress this extreme change in volume.

The present invention has been made in view of the above, and its object is to provide an imaging apparatus, a noise removal method, and a program that can appropriately remove noise from input speech while suppressing the change in volume of the original speech before and after driving of the mechanism unit that is the noise source.

An imaging apparatus of the present invention is capable of recording an audio signal during shooting and comprises: a mechanism unit that is driven in accordance with a shooting operation; conversion means for converting an input audio signal into a spectrum signal; storage means in which a noise spectrum signal, obtained by converting the sound generated when the mechanism unit is driven into a spectrum, is stored in advance; spectral subtraction means which, while the mechanism unit is driven, subtracts from the spectrum signal of the input speech obtained by the conversion means a signal obtained by multiplying the noise spectrum signal stored in the storage means by a predetermined coefficient, sets as a lower limit value the spectrum signal of the input speech multiplied by a flooring coefficient, and outputs the larger of the subtraction result and the lower limit value as the speech spectrum signal after noise removal; power acquisition means for acquiring the power value of the audio signal input before the mechanism unit is driven; flooring coefficient control means for controlling the flooring coefficient based on the power value acquired by the power acquisition means; and recording level control means for controlling the recording level of the audio signal. The flooring coefficient control means makes the flooring coefficient smaller as the recording level set by the recording level control means is higher, and larger as the recording level is lower.
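The control of the flooring coefficient described above can be sketched as follows. The text only states that the coefficient depends on the pre-drive power value and decreases as the recording level increases; the linear mappings, the direction of the power dependence (a louder original speech gets a larger floor), and every constant below are assumptions made for this illustration.

```python
def average_power(samples):
    """Mean squared amplitude of the samples captured before the drive."""
    return sum(s * s for s in samples) / len(samples)


def flooring_coefficient(pre_drive_power, recording_level,
                         beta_min=0.01, beta_max=0.5,
                         power_ref=0.25, level_max=10):
    """Hypothetical mapping from power and recording level to beta.

    All parameters other than the two inputs are illustrative constants.
    """
    # Assumed: louder pre-drive speech gets a larger floor, saturating at power_ref.
    ratio = min(pre_drive_power / power_ref, 1.0)
    base = beta_min + (beta_max - beta_min) * ratio
    # Stated in the text: a higher recording level gives a smaller coefficient.
    # The linear scale from 1.0 down to 0.5 is an assumption.
    scale = 1.0 - 0.5 * (recording_level / level_max)
    return max(beta_min, base * scale)


if __name__ == "__main__":
    p = average_power([0.5, -0.5, 0.5, -0.5])
    print(flooring_coefficient(p, recording_level=2))
    print(flooring_coefficient(p, recording_level=8))
```

With the same pre-drive power, the call with the higher recording level returns the smaller coefficient, which is the relationship the paragraph describes.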

An imaging apparatus according to another aspect of the present invention is capable of recording an audio signal during shooting and comprises: a mechanism unit that is driven in accordance with a shooting operation; conversion means for converting an input audio signal into a spectrum signal; storage means in which a noise spectrum signal, obtained by converting the sound generated when the mechanism unit is driven into a spectrum, is stored in advance; spectral subtraction means which, while the mechanism unit is driven, subtracts from the spectrum signal of the input speech obtained by the conversion means a signal obtained by multiplying the noise spectrum signal stored in the storage means by a predetermined coefficient, sets as a lower limit value the spectrum signal of the input speech multiplied by a flooring coefficient, and outputs the larger of the subtraction result and the lower limit value as the speech spectrum signal after noise removal; power acquisition means for acquiring the power value of the audio signal input before the mechanism unit is driven; flooring coefficient control means for controlling the flooring coefficient based on the power value acquired by the power acquisition means; and transition period setting means for setting predetermined transition periods before and after the mechanism unit is driven. In the transition period before driving, the flooring coefficient control means changes the flooring coefficient gradually from a predetermined value to an optimum value corresponding to the average power value of the audio signal; in the transition period after driving, it changes the flooring coefficient gradually from the optimum value back to the predetermined value.
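One way to picture the transition periods is as a per-frame schedule for the flooring coefficient. The sketch below assumes frame-based processing and a linear ramp; the text only says the change is gradual, so the ramp shape, the function name, and the frame counts are all hypothetical.

```python
def beta_schedule(n_pre, n_drive, n_post, beta_default, beta_opt):
    """Return one flooring coefficient per frame.

    n_pre:   frames in the transition period before the drive
    n_drive: frames during which the mechanism unit is driven
    n_post:  frames in the transition period after the drive
    The linear interpolation is an assumption; any gradual curve would fit
    the description.
    """
    schedule = []
    # Before the drive: ramp from the predetermined value to the optimum.
    for i in range(n_pre):
        t = (i + 1) / n_pre
        schedule.append(beta_default + (beta_opt - beta_default) * t)
    # During the drive: hold the optimum value.
    schedule.extend([beta_opt] * n_drive)
    # After the drive: ramp from the optimum back to the predetermined value.
    for i in range(n_post):
        t = (i + 1) / n_post
        schedule.append(beta_opt + (beta_default - beta_opt) * t)
    return schedule


if __name__ == "__main__":
    for frame, beta in enumerate(beta_schedule(4, 3, 4, 0.05, 0.25)):
        print(frame, round(beta, 3))
```

Ramping the coefficient rather than switching it at the drive boundaries avoids an abrupt step in the floor, which is the stated purpose of the transition periods.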

An imaging apparatus according to another aspect of the present invention is capable of recording an audio signal during shooting and comprises: a mechanism unit that is driven in accordance with a shooting operation; conversion means for converting an input audio signal into a spectrum signal; storage means in which a noise spectrum signal, obtained by converting the sound generated when the mechanism unit is driven into a spectrum, is stored in advance; spectral subtraction means which, while the mechanism unit is driven, subtracts from the spectrum signal of the input speech obtained by the conversion means a signal obtained by multiplying the noise spectrum signal stored in the storage means by a predetermined coefficient, sets as a lower limit value the spectrum signal of the input speech multiplied by a flooring coefficient, and outputs the larger of the subtraction result and the lower limit value as the speech spectrum signal after noise removal; power acquisition means for acquiring the power value of the audio signal input before the mechanism unit is driven; and flooring coefficient control means for controlling the flooring coefficient based on the power value acquired by the power acquisition means. Specific frequencies of the noise spectrum signal stored in the storage means are replaced in advance with a predetermined value, and the spectral subtraction means sets the output to zero when it detects the predetermined value in the noise spectrum signal.
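The marker-value scheme above can be sketched as a variant of the subtraction loop. The marker `MUTE_MARKER = -1.0` is a hypothetical choice (a value that cannot occur as a real magnitude); the patent text does not say which predetermined value is used.

```python
MUTE_MARKER = -1.0  # hypothetical predetermined value; magnitudes are never negative


def subtract_with_marker(y_mag, n_mag, alpha, beta):
    """Spectral subtraction that zeroes bins flagged in the stored noise spectrum."""
    out = []
    for y, n in zip(y_mag, n_mag):
        if n == MUTE_MARKER:
            # The stored noise spectrum flags this bin: output zero.
            out.append(0.0)
        else:
            # Ordinary subtraction with flooring, as in equation (1).
            out.append(max(y - alpha * n, beta * y))
    return out


if __name__ == "__main__":
    stored_noise = [0.1, MUTE_MARKER, 0.3]
    print(subtract_with_marker([1.0, 0.5, 0.2], stored_noise, alpha=2.0, beta=0.05))
```

Storing the flag inside the noise spectrum itself means no separate table is needed to identify the bins to silence.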

An imaging apparatus according to another aspect of the present invention is capable of recording an audio signal during shooting and comprises: a mechanism unit that is driven in accordance with a shooting operation; conversion means for converting an input audio signal into a spectrum signal; storage means in which a noise spectrum signal, obtained by converting the sound generated when the mechanism unit is driven into a spectrum, is stored in advance; spectral subtraction means which, while the mechanism unit is driven, subtracts from the spectrum signal of the input speech obtained by the conversion means a signal obtained by multiplying the noise spectrum signal stored in the storage means by a predetermined coefficient, sets as a lower limit value the spectrum signal of the input speech multiplied by a flooring coefficient, and outputs the larger of the subtraction result and the lower limit value as the speech spectrum signal after noise removal; power acquisition means for acquiring the power value of the audio signal input before the mechanism unit is driven; flooring coefficient control means for controlling the flooring coefficient based on the power value acquired by the power acquisition means; and control data storage means storing control data that designates specific frequencies of the noise spectrum signal stored in the storage means. When the power of the audio signal is smaller than a predetermined value, the spectral subtraction means sets the output at the specific frequencies to zero based on the control data stored in the control data storage means.
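In this aspect the bins to silence are listed in separate control data and are only zeroed when the audio power is low. A minimal sketch, assuming the control data is a collection of bin indices and that the power and threshold are plain numbers (both representations are hypothetical):

```python
def subtract_with_control_data(y_mag, n_mag, alpha, beta,
                               power, power_threshold, mute_bins):
    """Spectral subtraction with conditional zeroing of designated bins.

    mute_bins: hypothetical control data, a collection of bin indices.
    power / power_threshold: audio power and the predetermined value it is
    compared against; units are left unspecified in this sketch.
    """
    # Ordinary subtraction with flooring, as in equation (1).
    out = [max(y - alpha * n, beta * y) for y, n in zip(y_mag, n_mag)]
    # Only when the audio is quiet are the designated bins forced to zero.
    if power < power_threshold:
        for k in mute_bins:
            out[k] = 0.0
    return out


if __name__ == "__main__":
    y = [1.0, 0.5, 0.2]
    n = [0.1, 0.1, 0.3]
    print(subtract_with_control_data(y, n, 2.0, 0.05, 0.001, 0.01, {1}))
    print(subtract_with_control_data(y, n, 2.0, 0.05, 0.5, 0.01, {1}))
```

The first call zeroes bin 1 because the power is below the threshold; the second leaves it to the ordinary subtraction.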

The noise removal method of the present invention removes from input speech, as noise, the mechanism sound that a drive unit generates in accordance with a shooting operation when a moving image with sound is shot. The method comprises the steps of: converting the input speech into a spectrum signal; subtracting, from the input speech spectrum signal obtained by the spectrum conversion, a signal obtained by multiplying a mechanism sound spectrum signal, obtained in advance by converting the mechanism sound into a spectrum, by a predetermined coefficient, setting as a lower limit value the spectrum signal of the input speech multiplied by a flooring coefficient, and outputting the larger of the subtraction result and the lower limit value as the speech spectrum signal after noise removal; acquiring the power value of the audio signal input before the mechanism unit is driven; controlling the recording level of the audio signal; and controlling the flooring coefficient based on the power value of the audio signal, making the flooring coefficient smaller as the set recording level is higher and larger as the set recording level is lower.

The program of the present invention causes a computer of an imaging apparatus, which can record an audio signal during shooting and has a mechanism unit driven in accordance with a shooting operation, to function as: conversion means for converting an input audio signal into a spectrum signal; storage means in which a noise spectrum signal, obtained by converting the sound generated when the mechanism unit is driven into a spectrum, is stored in advance; spectral subtraction means which, while the mechanism unit is driven, subtracts from the spectrum signal of the input speech obtained by the conversion means a signal obtained by multiplying the noise spectrum signal stored in the storage means by a predetermined coefficient, sets as a lower limit value the spectrum signal of the input speech multiplied by a flooring coefficient, and outputs the larger of the subtraction result and the lower limit value as the speech spectrum signal after noise removal; power acquisition means for acquiring the power value of the audio signal input before the mechanism unit is driven; flooring coefficient control means for controlling the flooring coefficient based on the power value acquired by the power acquisition means; and recording level control means for controlling the recording level of the audio signal. The flooring coefficient control means makes the flooring coefficient smaller as the recording level set by the recording level control means is higher, and larger as the recording level is lower.

A program according to another aspect of the present invention causes a computer of an imaging apparatus, which can record an audio signal during shooting and has a mechanism unit driven in accordance with a shooting operation, to function as: conversion means for converting an input audio signal into a spectrum signal; storage means in which a noise spectrum signal, obtained by converting the sound generated when the mechanism unit is driven into a spectrum, is stored in advance; spectral subtraction means which, while the mechanism unit is driven, subtracts from the spectrum signal of the input speech obtained by the conversion means a signal obtained by multiplying the noise spectrum signal stored in the storage means by a predetermined coefficient, sets as a lower limit value the spectrum signal of the input speech multiplied by a flooring coefficient, and outputs the larger of the subtraction result and the lower limit value as the speech spectrum signal after noise removal; power acquisition means for acquiring the power value of the audio signal input before the mechanism unit is driven; flooring coefficient control means for controlling the flooring coefficient based on the power value acquired by the power acquisition means; and transition period setting means for setting predetermined transition periods before and after the mechanism unit is driven. In the transition period before driving, the flooring coefficient control means changes the flooring coefficient gradually from a predetermined value to an optimum value corresponding to the average power value of the audio signal; in the transition period after driving, it changes the flooring coefficient gradually from the optimum value back to the predetermined value.

A program according to another aspect of the present invention causes a computer of an imaging apparatus, which can record an audio signal during shooting and has a mechanism unit driven in accordance with a shooting operation, to function as: conversion means for converting an input audio signal into a spectrum signal; storage means in which a noise spectrum signal, obtained by converting the sound generated when the mechanism unit is driven into a spectrum, is stored in advance; spectral subtraction means which, while the mechanism unit is driven, subtracts from the spectrum signal of the input speech obtained by the conversion means a signal obtained by multiplying the noise spectrum signal stored in the storage means by a predetermined coefficient, sets as a lower limit value the spectrum signal of the input speech multiplied by a flooring coefficient, and outputs the larger of the subtraction result and the lower limit value as the speech spectrum signal after noise removal; power acquisition means for acquiring the power value of the audio signal input before the mechanism unit is driven; and flooring coefficient control means for controlling the flooring coefficient based on the power value acquired by the power acquisition means. Specific frequencies of the noise spectrum signal stored in the storage means are replaced in advance with a predetermined value, and the spectral subtraction means sets the output to zero when it detects the predetermined value in the noise spectrum signal.

A program according to another aspect of the present invention causes a computer of an imaging apparatus, which can record an audio signal during shooting and has a mechanism unit driven in accordance with a shooting operation, to function as: conversion means for converting an input audio signal into a spectrum signal; storage means in which a noise spectrum signal, obtained by converting the sound generated when the mechanism unit is driven into a spectrum, is stored in advance; spectral subtraction means which, while the mechanism unit is driven, subtracts from the spectrum signal of the input speech obtained by the conversion means a signal obtained by multiplying the noise spectrum signal stored in the storage means by a predetermined coefficient, sets as a lower limit value the spectrum signal of the input speech multiplied by a flooring coefficient, and outputs the larger of the subtraction result and the lower limit value as the speech spectrum signal after noise removal; power acquisition means for acquiring the power value of the audio signal input before the mechanism unit is driven; flooring coefficient control means for controlling the flooring coefficient based on the power value acquired by the power acquisition means; and control data storage means storing control data that designates specific frequencies of the noise spectrum signal stored in the storage means. When the power of the audio signal is smaller than a predetermined value, the spectral subtraction means sets the output at the specific frequencies to zero based on the control data stored in the control data storage means.

According to the present invention, the noise contained in the original speech can be removed appropriately while suppressing the change in volume of the original speech before and after driving of the mechanism unit that is the noise source.

Embodiments of the present invention will be described below with reference to the drawings.

(First embodiment)
FIG. 1 shows the external configuration of a digital camera taken as an example of the imaging apparatus of the present invention. FIG. 1(a) is a perspective view mainly showing the front, and FIG. 1(b) is a perspective view mainly showing the rear.

The digital camera 1 has, on the front of a substantially rectangular thin-plate body 2, a photographing lens 3, a self-timer lamp 4, an optical viewfinder window 5, a strobe light emitting unit 6, a microphone unit 7, and the like. A power key 8, a shutter key 9, and the like are provided at the right end (as seen by the user) of the top surface.

The power key 8 is operated each time the power is turned on or off, and the shutter key 9 is used to indicate the shooting timing during shooting.

On the back of the digital camera 1 are provided a shooting mode (R) key 10, a playback mode (P) key 11, an optical viewfinder 12, a speaker unit 13, a macro key 14, a strobe key 15, a menu (MENU) key 16, a ring key 17, a set (SET) key 18, a display unit 19, and the like.

When operated while the power is off, the shooting mode key 10 automatically turns the power on and enters the still image shooting mode. When operated repeatedly while the power is on, it cycles between the still image mode and the moving image mode. The still image mode is for shooting still images. The moving image mode is for shooting moving images, and in this embodiment moving images can be shot with sound.

The shutter key 9 is used in common for these shooting modes. In the still image mode, a still image is shot at the moment the shutter key 9 is pressed. In the moving image mode, shooting of a moving image starts when the shutter key 9 is pressed and ends when the shutter key 9 is pressed again.

When operated while the power is off, the playback mode key 11 automatically turns the power on and enters the playback mode.

The macro key 14 is operated to switch between normal shooting and macro shooting in the still image shooting mode. The strobe key 15 is operated to switch the light emission mode of the strobe light emitting unit 6. The menu key 16 is operated to select various menu items. The ring key 17 integrates item selection keys for the up, down, left, and right directions, and the set key 18 located at the center of the ring key 17 is operated to confirm the item selected at that time.

The display unit 19 is a color liquid crystal panel with a backlight. In the shooting mode it serves as an electronic viewfinder and displays the through image on the monitor; in the playback mode it reproduces and displays the selected image and the like.

The digital camera 1 also has an optical zoom function, and the magnification of the image can be changed by physically changing the focal length through operation of the zoom keys 20a and 20b. Of these, the zoom key 20a is for the telephoto end and is used to change the zoom magnification toward the telephoto side. The zoom key 20b is for the wide end and is used to change the zoom magnification toward the wide-angle side.

Although not shown, the bottom of the digital camera 1 is provided with a memory card slot for inserting and removing a memory card used as a recording medium, and with a serial interface connector for connection to an external personal computer or the like, for example a USB (Universal Serial Bus) connector.

FIG. 2 is a block diagram showing the electronic circuit configuration of the digital camera 1.

In the digital camera 1, a lens optical system 22, which includes a focus lens and a zoom lens (not shown) constituting the photographing lens 3, is provided so as to be movable within a predetermined range along the optical axis. The lens optical system 22 is moved by a motor 21, which is rotationally driven by a motor drive unit 21a.

The motor 21 comprises several different motors, such as a motor for adjusting the zoom magnification (zoom motor) and a motor for adjusting focus (focus motor), and a corresponding motor drive unit 21a is provided for each.

A CCD (charge coupled device) 23, which is the image sensor, is disposed behind the motor 21 along the optical axis. The CCD 23 receives light from each part of the subject through the photographing lens 3 and outputs an electrical signal corresponding to the intensity of that light.

基本モードである記録モード時において、ＣＣＤ２３がタイミング発生器（ＴＧ）２４、ドライバ２５によって走査駆動され、一定周期毎に結像した光像に対応する光電変換出力を１画面分出力する。このＣＣＤ２３の光電変換出力は、アナログ値の信号の状態でＲＧＢの各原色成分毎に適宜ゲイン調整された後に、サンプルホールド回路２６でサンプルホールドされ、Ａ／Ｄ変換器２７でデジタルデータに変換される。 In the recording mode, which is the basic mode, the CCD 23 is scanned and driven by a timing generator (TG) 24 and a driver 25, and outputs a photoelectric conversion output corresponding to a light image formed at regular intervals for one screen. The photoelectric conversion output of the CCD 23 is appropriately gain-adjusted for each primary color component of RGB in the state of an analog value signal, sampled and held by the sample hold circuit 26, and converted into digital data by the A / D converter 27. The

The image processing circuit 28 then performs image processing, including pixel interpolation and γ correction, to generate a digital luminance signal Y and color difference signals U and V (Cb, Cr), which are output to a DMA (Direct Memory Access) controller 29.

The DMA controller 29 first writes the luminance signal Y and the color difference signals U and V output by the image processing circuit 28 into a buffer inside the DMA controller 29, using the composite synchronization signal, memory write enable signal, and clock signal that also come from the image processing circuit 28. It then DMA-transfers them via a DRAM interface (I/F) 30 to a DRAM 31 used as a buffer memory.

制御部３２は、デジタルカメラ１全体の制御を行うものであり、ＣＰＵと、このＣＰＵで実行される動作プログラムを記憶したＲＯＭ、及びワークメモリとして使用されるＲＡＭなどを含むマイクロコンピュータにより構成される。この制御部３２は、前記輝度及び色差信号のＤＲＡＭ３１へのＤＭＡ転送終了後に、この輝度及び色差信号をＤＲＡＭインタフェース３０を介してＤＲＡＭ３１より読み出し、ＶＲＡＭコントローラ３３を介してＶＲＡＭ３４に書き込む。 The control unit 32 controls the entire digital camera 1 and is constituted by a microcomputer including a CPU, a ROM storing an operation program executed by the CPU, a RAM used as a work memory, and the like. . After the DMA transfer of the luminance and color difference signals to the DRAM 31, the control unit 32 reads the luminance and color difference signals from the DRAM 31 via the DRAM interface 30 and writes them to the VRAM 34 via the VRAM controller 33.

デジタルビデオエンコーダ３５は、前記輝度及び色差信号をＶＲＡＭコントローラ３３を介してＶＲＡＭ３４より定期的に読み出し、これらのデータを元にビデオ信号を発生して表示部１９に出力する。 The digital video encoder 35 periodically reads the luminance and color difference signals from the VRAM 34 via the VRAM controller 33, generates a video signal based on these data, and outputs the video signal to the display unit 19.

この表示部１９は、上述した如く撮影時にはモニタ表示部（電子ファインダ）として機能するもので、デジタルビデオエンコーダ３５からのビデオ信号に基づいた表示を行うことで、その時点でＶＲＡＭコントローラ３３から取込んでいる画像情報に基づく画像をリアルタイムに表示することとなる。 As described above, the display unit 19 functions as a monitor display unit (electronic finder) at the time of shooting. By performing display based on the video signal from the digital video encoder 35, the display unit 19 captures from the VRAM controller 33 at that time. An image based on the image information is displayed in real time.

このように、表示部１９にその時点での画像がモニタ画像としてリアルタイムに表示されている状態で、例えば静止画撮影を行いたいタイミングでシャッタキー９を押下操作すると、トリガ信号が発生する。 As described above, when the image at that time is displayed in real time as the monitor image on the display unit 19, for example, when the shutter key 9 is pressed at a timing at which still image shooting is desired, a trigger signal is generated.

In response to this trigger signal, the control unit 32 waits until the DMA transfer to the DRAM 31 of the one screen of luminance and color difference signals being captured from the CCD 23 at that moment is complete, then immediately stops the path from the CCD 23 to the DRAM 31 and transitions to the record-and-save state.

In this record-and-save state, the control unit 32 reads the one frame of luminance and color difference signals written in the DRAM 31 via the DRAM interface 30, in units called basic blocks of 8 pixels high by 8 pixels wide for each of the Y, Cb, and Cr components, and writes them to a JPEG (Joint Photographic Experts Group) circuit 37. The JPEG circuit 37 compresses the data by processing such as ADCT (Adaptive Discrete Cosine Transform) and Huffman coding, which is an entropy coding method.

そして得た符号データを１画像のデータファイルとして該ＪＰＥＧ回路３７から読み出して記録用のメモリ３８に書き込む。このメモリ３８としては、予め本体に内蔵されたフラッシュメモリ等の内部メモリの他に、記録媒体として着脱自在に装着されるメモリカードなどを含む。１フレーム分の輝度及び色差信号の圧縮処理及びメモリ３８への全圧縮データの書込み終了に伴って、制御部３２はＣＣＤ２３からＤＲＡＭ３１への経路を再び起動する。 The obtained code data is read out from the JPEG circuit 37 as a data file of one image and written in the recording memory 38. The memory 38 includes a memory card that is detachably mounted as a recording medium in addition to an internal memory such as a flash memory built in the main body in advance. With the compression processing of the luminance and color difference signals for one frame and the completion of writing all the compressed data to the memory 38, the control unit 32 activates the path from the CCD 23 to the DRAM 31 again.

制御部３２には、さらに音声処理部３９、ＵＳＢインタフェース（Ｉ／Ｆ）４０、ストロボ駆動部４１が接続される。 The control unit 32 is further connected with an audio processing unit 39, a USB interface (I / F) 40, and a strobe driving unit 41.

The audio processing unit 39 includes a sound source circuit such as a PCM sound source. During audio recording, it digitizes the audio signal input from the microphone unit (MIC) 7, compresses the data according to a predetermined data file format, for example the MP3 (MPEG-1 Audio Layer 3) standard, to create an audio data file, and sends the file to the memory 38. During audio playback, it decompresses the audio data file read from the memory 38, converts it to analog, and outputs it through the speaker unit (SP) 13 provided on the back of the digital camera 1 described above.

ＵＳＢインタフェース４０は、ＵＳＢコネクタを介して有線接続されるパーソナルコンピュータ等の他の情報端末装置との間で画像データ、その他の送受を行う場合の通信制御を行う。ストロボ駆動部４１は、撮影時に図示せぬストロボ用の大容量コンデンサを充電した上で、制御部３２からの制御に基づいてストロボ発光部６を閃光駆動する。 The USB interface 40 performs communication control when image data and other information are transmitted / received to / from another information terminal device such as a personal computer connected by wire via a USB connector. The strobe drive unit 41 charges a strobe capacitor (not shown) at the time of shooting, and then drives the strobe light emitting unit 6 to flash based on control from the control unit 32.

なお、前記キー入力部３６は、上述したシャッタキー９の他に、電源キー８、撮影モードキー１０、再生モードキー１１、マクロキー１４、ストロボキー１５、メニューキー１６、リングキー１７、セットキー１８、ズームキー２０ａ，２０ｂなどから構成され、それらのキー操作に伴う信号は直接制御部３２へ送出される。 In addition to the shutter key 9 described above, the key input unit 36 includes a power key 8, a shooting mode key 10, a playback mode key 11, a macro key 14, a strobe key 15, a menu key 16, a ring key 17, and a set key. 18, zoom keys 20 a and 20 b and the like, and signals accompanying these key operations are sent directly to the control unit 32.

また、静止画像ではなく動画像の撮影時においては、シャッタキー９が押下操作されたときに、上述したＪＰＥＧ回路３７によりｍｏｔｉｏｎ−ＪＰＥＧ（ＪｏｉｎｔＰｈｏｔｏｇｒａｐｈｉｃＥｘｐｅｒｔｓＧｒｏｕｐ）などの手法により撮影動画をデータ圧縮してメモリ３８へ記録する。この場合、音声付き動画撮影であれば、その撮影中にマイクロホン部（ＭＩＣ）７より入力された音声信号が動画データと共に前記メモリ３８に記録されることになる。再度シャッタキー９が操作されると、動画データの記録を終了する。 Further, when shooting a moving image instead of a still image, when the shutter key 9 is pressed, the above-described JPEG circuit 37 compresses the captured moving image using a technique such as motion-JPEG (Joint Photographic Experts Group). To the memory 38. In this case, in the case of moving image shooting with audio, the audio signal input from the microphone unit (MIC) 7 during the shooting is recorded in the memory 38 together with the moving image data. When the shutter key 9 is operated again, the recording of the moving image data is finished.

一方、基本モードである再生モード時には、制御部３２がメモリ３８に記録されている画像データを選択的に読み出し、ＪＰＥＧ回路３７で記録モード時にデータ圧縮した手順と全く逆の手順で、圧縮されている画像データを伸長する。そして、この伸長した画像データをＤＲＡＭインタフェース３０を介してＤＲＡＭ３１に保持させた上で、このＤＲＡＭ３１の保持内容をＶＲＡＭコントローラ３３を介してＶＲＡＭ３４に記憶させ、このＶＲＡＭ３４より定期的に画像データを読み出してビデオ信号を発生し、表示部１９で再生出力させる。 On the other hand, in the playback mode which is the basic mode, the control unit 32 selectively reads out the image data recorded in the memory 38 and is compressed by a procedure completely opposite to the procedure of data compression in the recording mode by the JPEG circuit 37. Decompress image data. The decompressed image data is held in the DRAM 31 via the DRAM interface 30, and then the content held in the DRAM 31 is stored in the VRAM 34 via the VRAM controller 33. The image data is periodically read out from the VRAM 34. A video signal is generated and reproduced and output by the display unit 19.

選択した画像データが静止画像ではなく動画像であった場合には、その動画データを構成する複数フレームの静止画データを時系列の順で順次再生して表示し、すべての静止画データの再生を終了した時点で、例えば、次に再生の指示がなされるまで先頭に位置する静止画データを表示するなどを行う。その際、当該動画データに音声データが含まれていれば、その音声データがスピーカ部（ＳＰ）１３を通じて出力されることになる。 If the selected image data is not a still image but a moving image, the multiple frames of still image data that make up the moving image data are played back and displayed sequentially in chronological order, and all the still image data is played back. For example, the top still image data is displayed until the next playback instruction is given. At this time, if the moving image data includes audio data, the audio data is output through the speaker unit (SP) 13.

次に、このデジタルカメラ１に用いられる雑音除去機能を備えた音声記録装置について説明する。 Next, an audio recording apparatus having a noise removal function used in the digital camera 1 will be described.

図３は本発明の第１の実施形態におけるデジタルカメラ１に用いられる音声記録装置の構成を示すブロック図である。 FIG. 3 is a block diagram showing the configuration of the audio recording apparatus used in the digital camera 1 according to the first embodiment of the present invention.

この音声記録装置は、主としてデジタルカメラ１の音声付き動画撮影に用いられるものであり、その撮影中に音声信号に混入するズーム機構の駆動音などを雑音として除去する機能を備えている。 This audio recording apparatus is mainly used for moving image recording with sound of the digital camera 1 and has a function of removing, as noise, a driving sound of a zoom mechanism mixed in an audio signal during the shooting.

In the first embodiment, the audio recording apparatus includes the motor 21, the motor drive unit 21a, the control unit 32, the key input unit 36, the microphone unit 7, an A/D conversion unit 51, a frame division unit 52, a Fourier transform unit 53, a noise spectrum storage unit 54, a spectrum subtraction unit 55, an inverse Fourier transform unit 56, a waveform synthesis unit 57, an input power calculation unit 58, a smoothing processing unit 59, and a flooring coefficient control unit 60. Of these components, units 51 to 57 are included in the audio processing unit 39 of the digital camera 1 shown in FIG. 2.

モータ２１はズームレンズなどのレンズ光学系２２を光軸方向に移動させるためのモータであり、モータ駆動部２１ａはそのモータ２１を回転駆動させるための駆動機構である。 The motor 21 is a motor for moving the lens optical system 22 such as a zoom lens in the optical axis direction, and the motor drive unit 21a is a drive mechanism for driving the motor 21 to rotate.

The control unit 32 receives operation signals from the zoom keys 20a and 20b and other keys included in the key input unit 36 and outputs a motor drive control signal to the motor drive unit 21a. In this apparatus it also controls the input power calculation unit 58 and the flooring coefficient control unit 60 during moving image shooting with sound.

一方、マイクロホン部７を通じて入力された音声信号は、Ａ／Ｄ変換部５１を介してデジタル信号に変換された後、フレーム分割部５２に与えられる。この場合、音声付き動画撮影中に例えばズーム操作が行われると、そのズーム操作に伴って発生するモータ音（ズーム音）がマイクロホン部７を通じて音声信号と共に入り込むことになる。 On the other hand, the audio signal input through the microphone unit 7 is converted into a digital signal through the A / D conversion unit 51 and then given to the frame division unit 52. In this case, for example, when a zoom operation is performed during moving image recording with sound, a motor sound (zoom sound) generated along with the zoom operation enters along with the sound signal through the microphone unit 7.

フレーム分割部５２は、入力された音声信号を所定時間分のフレーム単位で分割する。フーリエ変換部５３は、このフレーム分割部５２によってフレーム単位で分割された音声信号をフーリエ変換し、周波数毎のパワーを示したスペクトル信号Ｙ（ω）に変換する。 The frame dividing unit 52 divides the input audio signal in units of frames for a predetermined time. The Fourier transform unit 53 Fourier transforms the audio signal divided by the frame unit by the frame division unit 52 and converts it into a spectrum signal Y (ω) indicating the power for each frequency.
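The framing and Fourier transform steps can be sketched as follows. This is only an illustrative sketch, not the circuit of the embodiment: the function names `split_frames` and `magnitude_spectrum` are hypothetical, a naive DFT stands in for whatever transform the apparatus actually uses, and the spectrum is assumed to be represented as a per-bin magnitude |Y(ω)|.

```python
import cmath
import math


def split_frames(samples, frame_len, hop):
    """Cut a sample sequence into fixed-length frames, dropping an incomplete tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Naive O(L^2) DFT of one frame, returning |Y(w)| for every frequency bin."""
    n = len(frame)
    result = []
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        result.append(abs(acc))
    return result
```

A real implementation would normally use an FFT; the naive transform is used here only to keep the sketch free of dependencies.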

The noise spectrum storage unit 54 stores in advance a noise spectrum signal N(ω) obtained by converting the drive sound (zoom sound) of the mechanism targeted for noise removal into a spectrum. The spectrum subtraction unit 55 performs noise removal by the SS (spectral subtraction) method based on the spectrum signal Y(ω) of the input audio obtained by the Fourier transform unit 53 and the noise spectrum signal N(ω) stored in the noise spectrum storage unit 54.

詳しくは、入力音声のスペクトル信号Ｙ（ω）から雑音スペクトル信号Ｎ（ω）に所定のサブトラクト係数αを乗じた信号を減算することで、入力音声に含まれる雑音成分を除去する処理を行う。減算後の音声スペクトル信号をＳ（ω）とする。 Specifically, the noise component included in the input speech is removed by subtracting a signal obtained by multiplying the noise spectrum signal N (ω) by a predetermined subtract coefficient α from the spectrum signal Y (ω) of the input speech. Let S (ω) be the audio spectrum signal after subtraction.

Here, a lower limit setting unit 55a is provided to prevent excessive noise removal by the spectrum subtraction unit 55. The lower limit setting unit 55a sets, as a lower limit, the signal obtained by multiplying the spectrum signal Y(ω) of the input audio by a flooring coefficient β, and performs control so that the output signal S(ω) of the spectrum subtraction unit 55 does not fall below this lower limit. As described later, the flooring coefficient control unit 60 adjusts the flooring coefficient β to a suitable value as needed.
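The subtraction with a lower limit described above can be sketched per frequency bin as S(ω) = max(Y(ω) − α·N(ω), β·Y(ω)). The function name `spectral_subtract` is hypothetical, and the sketch assumes that Y(ω) and N(ω) are non-negative per-bin values of equal length.

```python
def spectral_subtract(y, n, alpha, beta):
    """Subtract alpha * N from Y per bin, never going below the floor beta * Y.

    y, n  : per-bin spectra of the input audio and the stored noise (assumed >= 0)
    alpha : subtract coefficient
    beta  : flooring coefficient
    """
    if len(y) != len(n):
        raise ValueError("input and noise spectra must have the same length")
    return [max(yk - alpha * nk, beta * yk) for yk, nk in zip(y, n)]
```

With β = 1 the floor equals the input itself, so the function returns Y(ω) unchanged; with β = 0 it reduces to plain subtraction clipped at zero.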

逆フーリエ変換部５６は、スペクトル減算部５５から出力された音声スペクトル信号Ｓ（ω）を逆フーリエ変換して元のフレーム単位毎の音声信号に戻す。 The inverse Fourier transform unit 56 performs inverse Fourier transform on the speech spectrum signal S (ω) output from the spectrum subtracting unit 55 to return the speech signal to the original frame unit.

波形合成部５７は、この逆フーリエ変換部５６によって得られるフレーム単位毎の音声信号を合成することで時系的に連続した音声信号に復元する。この波形合成部５７から出力された音声信号は、最終的な記録用の音声信号として用いられ、デジタルカメラ１の撮像系から得られる動画データと共に図２に示したメモリ３８に記録される。 The waveform synthesizing unit 57 synthesizes the audio signal for each frame obtained by the inverse Fourier transform unit 56 to restore the audio signal continuous in time. The audio signal output from the waveform synthesizer 57 is used as the final recording audio signal, and is recorded in the memory 38 shown in FIG. 2 together with the moving image data obtained from the imaging system of the digital camera 1.

The input power calculation unit 58 calculates the power of the audio signal input before the drive period of the motor 21. The smoothing processing unit 59 smooths the power calculated by the input power calculation unit 58 along the time axis. The flooring coefficient control unit 60 changes the flooring coefficient β used by the lower limit setting unit 55a according to the power of the audio signal input before the drive period of the motor 21. Specifically, the larger the average power of the audio signal obtained by the smoothing processing unit 59, the larger the flooring coefficient β is made, and the smaller the average power, the smaller β is made.

ここで、理解を容易にするため、ＳＳ法（スペクトルサブトラクション法）を用いた雑音除去処理の基本的な動作について説明しておく。 Here, in order to facilitate understanding, a basic operation of noise removal processing using the SS method (spectral subtraction method) will be described.

今、音声付き動画撮影を行っている最中に、例えばユーザがキー入力部３６に含まれるズームキー２０ａ，２０ｂを操作したとする。 Now, assume that, for example, the user operates the zoom keys 20a and 20b included in the key input unit 36 while shooting a moving image with sound.

デジタルカメラ全体の動作を制御する制御部３２は、キー入力部３６に含まれるズームキー２０ａ，２０ｂのズーム操作信号を入力すると、モータ駆動部２１ａに対して駆動開始信号を送る。モータ駆動部２１ａは、この駆動開始信号を受けてモータ２１を回転駆動する。このモータ２１の回転に伴い、図２のレンズ光学系２２に含まれる図示せぬズームレンズが光軸上に移動してズーム倍率が変化する。 When the control unit 32 that controls the operation of the entire digital camera inputs zoom operation signals of the zoom keys 20a and 20b included in the key input unit 36, it sends a drive start signal to the motor drive unit 21a. The motor drive unit 21a receives the drive start signal and rotationally drives the motor 21. As the motor 21 rotates, a zoom lens (not shown) included in the lens optical system 22 shown in FIG. 2 moves on the optical axis and the zoom magnification changes.

また、ユーザがズーム操作を終了すると、制御部３２はモータ駆動部２１ａに対して駆動停止信号を送る。これにより、モータ２１の回転駆動が停止し、ズーム動作が終了する。 When the user finishes the zoom operation, the control unit 32 sends a drive stop signal to the motor drive unit 21a. Thereby, the rotational drive of the motor 21 is stopped and the zoom operation is finished.

ここで、音声付き動画の撮影中は常にマイクロホン部７による音声入力機能がＯＮ状態にある。このため、前記ズーム操作に伴って発生するモータ音が入力音声の中に雑音として混入する問題がある。このようなモータ音を音声信号から除去して記録するべく、以下のような処理が行われる。 Here, the sound input function by the microphone unit 7 is always in an ON state during shooting of a moving image with sound. For this reason, there is a problem that the motor sound generated by the zoom operation is mixed as noise in the input voice. In order to remove such motor noise from the audio signal and record it, the following processing is performed.

すなわち、まず、雑音除去対象となるモータ音（機構音）のスペクトル信号を事前に採取しておき、これを雑音スペクトル信号Ｎ（ω）として雑音スペクトル記憶部５４に記憶しておく。以下では、ズーム操作時に発生するモータ音つまりズーム音を雑音除去対象として説明する。 That is, first, a spectrum signal of a motor sound (mechanism sound) to be noise-removed is collected in advance and stored in the noise spectrum storage unit 54 as a noise spectrum signal N (ω). In the following, a motor sound generated during zoom operation, that is, a zoom sound will be described as a noise removal target.

The zoom sound is sampled by performing a zoom operation in a silent environment so that only the zoom sound generated at that time is input from the microphone unit 7. The input zoom sound is converted into a digital signal by the A/D conversion unit 51, cut into frame sections of roughly several tens of milliseconds by the frame division unit 52, and converted into a spectrum signal by the Fourier transform unit 53. This is done over the motor drive period (from the start to the stop of zoom motor driving), and the average of the spectrum signals obtained frame by frame during that period is stored in the noise spectrum storage unit 54 as the noise spectrum signal N(ω).
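The averaging step used to build the stored noise spectrum can be sketched as below. The function name `average_spectrum` is hypothetical, and each frame spectrum is assumed to be a list of per-bin values of the same length.

```python
def average_spectrum(frame_spectra):
    """Average per-bin values over all frames captured during the motor drive period."""
    if not frame_spectra:
        raise ValueError("at least one frame spectrum is required")
    bins = len(frame_spectra[0])
    if any(len(s) != bins for s in frame_spectra):
        raise ValueError("all frame spectra must have the same number of bins")
    count = len(frame_spectra)
    return [sum(s[k] for s in frame_spectra) / count for k in range(bins)]
```

The result would play the role of N(ω) in the later subtraction step.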

上述したように、ズーム操作を行っているとき、マイクロホン部７には原音声に加えて、そのときに発生するモータ音が雑音として入力されている。このため、フーリエ変換部５３からは原音声とモータ音が混合したスペクトル信号Ｙ（ω）が出力される。 As described above, when the zoom operation is performed, in addition to the original sound, the motor sound generated at that time is input to the microphone unit 7 as noise. Therefore, the Fourier transform unit 53 outputs a spectrum signal Y (ω) in which the original sound and the motor sound are mixed.

スペクトル減算部５５では、このモータ音を含んだ入力音声のスペクトル信号Ｙ（ω）と、雑音スペクトル記憶部５４に予め記憶された雑音スペクトル信号Ｎ（ω）とに基づいてＳＳ法による雑音除去処理を行う。 The spectrum subtracting unit 55 performs noise removal processing by the SS method based on the spectrum signal Y (ω) of the input sound including the motor sound and the noise spectrum signal N (ω) stored in advance in the noise spectrum storage unit 54. I do.

この雑音除去処理について、図４を参照して詳しく説明する。 This noise removal processing will be described in detail with reference to FIG.

FIG. 4 is a diagram for explaining noise removal using the SS method (spectral subtraction method). FIG. 4(a) shows the waveform data of the input audio, and FIG. 4(b) shows the spectrum signal of the input audio obtained by Fourier transforming it frame by frame. FIG. 4(c) shows the spectrum signal of the motor sound targeted for noise removal (the noise spectrum), and FIG. 4(d) shows the signal obtained by multiplying that noise spectrum signal by a predetermined subtract coefficient α. FIG. 4(e) shows the spectrum signal obtained by subtracting the coefficient-multiplied noise spectrum signal from the spectrum signal of the input audio, that is, the audio spectrum signal after noise removal. FIG. 4(f) shows the audio signal obtained by inverse Fourier transforming that audio spectrum signal, and FIG. 4(g) shows the frame-divided audio signals combined in time series and returned to the original audio waveform.

Suppose now that an audio signal with a waveform like that shown in FIG. 4(a) is input to the A/D conversion unit 51. This audio signal contains, as noise, the motor sound generated by a zoom operation, that is, the zoom sound.

First, the frame division unit 52 cuts the input signal into frame sections of, for example, about 10 ms, and the Fourier transform unit 53 generates a spectrum signal Y(ω) representing the power at each frequency, as shown in FIG. 4(b).

As shown in FIG. 4(c), the noise spectrum storage unit 54 stores in advance the noise spectrum signal N(ω) obtained by converting the motor sound into a spectrum. As shown in FIG. 4(c) to FIG. 4(e), the spectrum subtraction unit 55 subtracts from the spectrum signal Y(ω) of the input audio the signal obtained by multiplying the noise spectrum signal N(ω) by the predetermined subtract coefficient α, which yields the audio spectrum signal S(ω) after noise removal.

なお、前記サブトラクト係数αは、雑音スペクトル記憶部５４に記憶された雑音スペクトル信号Ｎ（ω）のレベルに応じて予め決められており、通常、“１”以上の値である。 The subtract coefficient α is determined in advance according to the level of the noise spectrum signal N (ω) stored in the noise spectrum storage unit 54, and is usually a value of “1” or more.

As shown in FIG. 4(f), the audio spectrum signal S(ω) after noise removal is inverse Fourier transformed by the inverse Fourier transform unit 56. Then, as shown in FIG. 4(g), the waveform synthesis unit 57 combines the audio signals of the individual frames in time series and restores an audio signal with the original continuous waveform. This audio signal is recorded in the memory 38 together with the image data during moving image shooting as the noise-removed audio signal.

In the noise removal described above, a window function such as a Hanning window is in practice applied to the audio signal of each frame cut out by the frame division unit 52 before it is Fourier transformed. In addition, to prevent discontinuous waveforms at frame boundaries when the waveform synthesis unit 57 combines the inverse-transformed audio signals frame by frame, the audio signals of adjacent frames are overlapped somewhat as they are combined.

例えば、フレーム長が２５６サンプルとして分析ポイントを１２８サンプルずつシフトしていく。この場合のハニング窓は（２）式のように表せる。 For example, the analysis point is shifted by 128 samples with a frame length of 256 samples. The Hanning window in this case can be expressed as in equation (2).

w(n) = 0.5 - 0.5 * cos{2 * PI * n / (L - 1)} ... (2)
L: number of samples in one frame
n = 0, 1, ..., L - 1

When the windowed signals are overlapped with a shift of 1/2 frame in this way, an audio waveform with an essentially constant amplitude and no discontinuities is obtained.
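The effect of the half-frame overlap can be checked numerically. The sketch below assumes the conventional Hanning form w(n) = 0.5 − 0.5·cos{2πn/(L − 1)} and uses hypothetical helper names `hanning` and `overlap_sum`; it adds a window to a copy of itself shifted by half a frame and inspects the result.

```python
import math


def hanning(length):
    """Hanning window w(n) = 0.5 - 0.5*cos(2*pi*n/(L-1)) for n = 0..L-1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def overlap_sum(window, hop):
    """Sum of a window and its copy shifted by `hop` samples, over the overlap region."""
    return [window[i] + window[i + hop] for i in range(len(window) - hop)]


window = hanning(256)
gain = overlap_sum(window, 128)
# Largest deviation of the summed gain from 1 across the overlap region.
worst = max(abs(g - 1.0) for g in gain)
```

For L = 256 and a shift of 128 samples the summed gain stays close to 1 across the overlap, which is why the recombined waveform shows no audible amplitude ripple at frame boundaries.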

Next, the operation of the first embodiment will be described.
In the first embodiment, to prevent an abrupt change in the original audio before and after motor driving, the flooring coefficient β used for setting the lower limit is changed based on the power of the audio signal input before the drive period (that is, the original audio signal, which does not contain the zoom sound noise).

図５は第１の実施形態におけるデジタルカメラ１の動画記録時の音声処理の動作を示すフローチャートである。なお、このフローチャートで示される処理は、コンピュータである制御部３２によって読み取り可能なプログラムの形態でＲＯＭ等の記録媒体に予め記録されているものとする。 FIG. 5 is a flowchart showing an operation of audio processing during moving image recording of the digital camera 1 in the first embodiment. Note that the processing shown in this flowchart is recorded in advance in a recording medium such as a ROM in the form of a program readable by the control unit 32 which is a computer.

(a) When there is no zoom operation
First, the processing when there is no zoom operation will be described.

図３に示したように、マイクロホン部７より入力された音声信号は、Ａ／Ｄ変換部５１によりデジタル信号に変換された後、フレーム分割部５２により、例えば２５６サンプル毎のフレーム単位に分割される（ステップＡ１１）。このフレーム単位の音声データにハニング窓等の窓掛け処理を行った後、フーリエ変換部５３により周波数領域信号に変換され、入力音声のスペクトル信号Ｙ（ω）が生成される（ステップＡ１２）。 As shown in FIG. 3, the audio signal input from the microphone unit 7 is converted into a digital signal by the A / D conversion unit 51, and then divided by the frame division unit 52 into frames, for example, every 256 samples. (Step A11). The audio data in units of frames is subjected to a windowing process such as a Hanning window, and then converted into a frequency domain signal by the Fourier transform unit 53 to generate a spectrum signal Y (ω) of the input audio (step A12).

When there is no zoom operation (No in step A13), the input power calculation unit 58 calculates the power value of the input audio by summing the spectrum signal Y(ω) of the input audio obtained from the Fourier transform unit 53 over all frequencies (step A14). Denoting this power value as Power, it is expressed by the following equation.

Power = Σ(|Y(ω)| * |Y(ω)|) ... (3)
Because the spectrum signal Y(ω) of the input audio is calculated continuously anyway, the only additional processing needed for equation (3) is the summation.

To suppress small fluctuations in the power value of the input audio, the smoothing processing unit 59 calculates the average power value of the input audio before the zoom operation period (step A15). Denoting this average power value as Power_Ave, it is expressed by the following equation.

Power_Ave = K * Power_Ave(n-1) + (1 - K) * Power ... (4)
Power_Ave(n-1) is the average power value of the input audio obtained in the previous smoothing step, and K is a smoothing coefficient between 0 and 1 that determines the degree of smoothing. As an alternative smoothing method, the input power values of the past several frames may simply be averaged.
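Equations (3) and (4) translate almost directly into code. The sketch below is illustrative only; the function names `frame_power` and `smooth_power` are hypothetical, and the spectrum is assumed to be given as per-bin magnitudes |Y(ω)|.

```python
def frame_power(magnitudes):
    """Equation (3): Power = sum over all bins of |Y(w)| * |Y(w)|."""
    return sum(m * m for m in magnitudes)


def smooth_power(prev_ave, power, k):
    """Equation (4): Power_Ave = K * Power_Ave(n-1) + (1 - K) * Power."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("smoothing coefficient K must lie between 0 and 1")
    return k * prev_ave + (1.0 - k) * power
```

A value of K close to 1 makes Power_Ave follow the input slowly, while a value close to 0 makes it track the latest frame almost directly.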

Next, the flooring coefficient control unit 60 sets the flooring coefficient β = 1 (step A16).

スペクトル減算部５５は、入力音声のスペクトル信号Ｙ(ω)と、雑音スペクトル記憶部５４に記憶された雑音スペクトル信号Ｎ(ω)とにより前記（１）式に従ってスペクトル減算処理を行う（ステップＡ１８）。 The spectrum subtraction unit 55 performs the spectrum subtraction process according to the equation (1) using the spectrum signal Y (ω) of the input speech and the noise spectrum signal N (ω) stored in the noise spectrum storage unit 54 (step A18). .

Because the flooring coefficient β = 1, the lower limit equals Y(ω) itself, and the subtracted value can never exceed Y(ω), so the spectrum signal after subtraction is S(ω) = Y(ω). When the inverse Fourier transform unit 56 inverse Fourier transforms the spectrum signal S(ω) into a time domain signal (step A19), the audio signal from before the Fourier transform is obtained. When the waveform synthesis unit 57 generates a continuous audio signal from the preceding and following frame data (step A20), the result is the original input signal, which is recorded in the memory 38, the recording medium, in time with the moving image data (step A21).

Thus, when there is no zoom operation, the audio signal input through the microphone unit 7 is recorded as is in the memory 38 together with the moving image data.

(b) When there is a zoom operation
Next, the processing when there is a zoom operation will be described.

フーリエ変換部５３により入力音声のスペクトル信号Ｙ(ω)を生成するまでは、ズーム動作がない場合と同様である（ステップＡ１１〜Ａ１２）。 Until the spectral signal Y (ω) of the input sound is generated by the Fourier transform unit 53, it is the same as when there is no zoom operation (steps A11 to A12).

When there is a zoom operation (Yes in step A13), the flooring coefficient control unit 60 controls the flooring coefficient β to a suitable value based on the average power value Power_Ave of the input audio obtained before the zoom period (step A17).

詳しくは、入力音声の平均パワー値Power_Aveがズーム音に比べて大きく、ズーム音が目立たない時にはフロアリング係数βを１．０に近い値に設定しておき、Power_Aveが小さくなるに従ってフロアリング係数βを小さくしていき、Power_Aveが０の場合は例えばフロアリング係数βを０とするように制御する。 Specifically, when the average power value Power_Ave of the input sound is larger than the zoom sound and the zoom sound is not noticeable, the flooring coefficient β is set to a value close to 1.0, and the flooring coefficient β becomes smaller as the Power_Ave becomes smaller. When Power_Ave is 0, for example, the flooring coefficient β is controlled to be 0.
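The text specifies only the end points and the direction of this control, not the curve in between. One possible mapping, given purely as an assumption, is a linear ramp that is clamped to 1; the function name `flooring_coefficient`, the reference `noise_power`, and the parameter `full_ratio` are all hypothetical.

```python
def flooring_coefficient(power_ave, noise_power, full_ratio=4.0):
    """Map the pre-zoom average power to a flooring coefficient beta in [0, 1].

    power_ave   : smoothed input power measured before the zoom period
    noise_power : assumed reference power of the stored zoom noise
    full_ratio  : assumed power ratio at which beta saturates at 1.0
    """
    if noise_power <= 0.0 or full_ratio <= 0.0:
        raise ValueError("noise_power and full_ratio must be positive")
    if power_ave <= 0.0:
        return 0.0  # silent input: allow the full subtraction
    return min(1.0, power_ave / (full_ratio * noise_power))
```

Any monotonically increasing curve with the same end points would satisfy the description equally well; the linear ramp is chosen here only for simplicity.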

The spectrum subtraction unit 55 performs spectral subtraction according to equation (1) (step A18). If the average power value Power_Ave of the input audio is large compared with the zoom sound and the zoom sound is inconspicuous, the lower limit determined by the flooring coefficient β rises, so the input audio is not over-subtracted and can be output almost in its original form. If Power_Ave is small compared with the zoom sound and the zoom sound is conspicuous, the lower limit determined by the flooring coefficient β falls, so the zoom sound can be appropriately removed from the input audio as noise.

The inverse Fourier transform unit 56 applies an inverse Fourier transform to the noise-removed speech spectrum signal S(ω) output from the spectrum subtraction unit 55, converting it into a time-domain signal (step A19). The waveform synthesis unit 57 generates a continuous audio signal from the preceding and following frame data (step A20) and records it in the memory 38, the recording medium, in synchronization with the moving image data (step A21).

Thus, when a zoom operation occurs, the flooring coefficient β is set to an optimum value according to the average power value Power_Ave of the audio signal, and the lower limit of the spectral subtraction is adjusted using that β. This suppresses sudden volume changes caused by excessive subtraction, so the audio can be recorded together with the captured image with the volume of the original sound preserved as far as possible and without impairing the sense of presence.

As described above, according to the first embodiment, the flooring coefficient β is varied according to the average power value Power_Ave of the audio signal input before zoom driving. This suppresses the change in volume of the original sound before and after motor driving while appropriately removing the zoom sound from the input sound as noise.

In addition, the average power value Power_Ave before the zoom period is obtained by summing the spectrum signal Y(ω) of the input sound and applying a simple smoothing process. This saves processing time in a software implementation and circuit scale in a hardware implementation.
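A minimal sketch of this power tracking is given below. The description only says that Y(ω) is summed and then smoothed, so the use of absolute values, the exponential moving average, the `weight` parameter, and the function names are all assumptions made for illustration.

```python
def frame_power(spectrum):
    """Sum the spectrum over all frequency bins (magnitudes assumed)."""
    return sum(abs(v) for v in spectrum)


def update_power_ave(prev_ave, spectrum, zooming, weight=0.1):
    """Update the running average power, frozen while the zoom motor runs.

    An exponential moving average stands in for the 'simple smoothing
    process'; `weight` is a hypothetical smoothing factor in (0, 1].
    """
    if zooming:
        # Keep the value obtained before the zoom period.
        return prev_ave
    return (1.0 - weight) * prev_ave + weight * frame_power(spectrum)
```

Freezing the average during the zoom period keeps the zoom sound itself from inflating Power_Ave.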

(Modification 1 of the first embodiment)
FIG. 6 is a block diagram showing the configuration of the audio recording apparatus in Modification 1 of the first embodiment. Parts identical to those in FIG. 3 are given the same reference numerals, and their description is omitted.

The difference from FIG. 3 is the addition of a recording level control unit 61 and an amplifier 62. The recording level control unit 61 controls the amplifier 62 so as to raise the recording level when the audio signal input from the microphone unit 7 is small and to lower it when the signal is large. The recording level is adjusted, for example, according to changes in the power of the audio signal, obtained by integrating the audio signal input from the microphone unit 7.

When the digital camera 1 has a recording level control function, the proportion of zoom sound contained in the input sound changes with the recording level. That is, when the recording level is high, the proportion of zoom sound in the input sound is large, and when the recording level is low, the proportion is small.

The flooring coefficient control unit 60 therefore makes the flooring coefficient β small when the recording level set by the recording level control unit 61 is high, and large when the recording level is low. Specifically, the value of β corresponding to the average power value Power_Ave of the input sound, obtained by the input power calculation unit 58 and the smoothing processing unit 59, is multiplied by the reciprocal of the recording level, and the resulting final flooring coefficient β is supplied to the spectrum subtraction unit 55.

As a result, when the recording level is high and the zoom sound is conspicuous relative to the input sound, β is reduced and the zoom sound is appropriately removed from the input sound as noise. When the recording level is low and the zoom sound is inconspicuous, β is increased, which suppresses excessive subtraction so that the input sound is output almost in its original form.
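The recording-level correction of Modification 1 can be sketched as follows. The multiplication by the reciprocal of the recording level comes from the description; treating the recording level as a linear gain and clipping the result to [0, 1] are assumptions, as is the function name.

```python
def level_adjusted_beta(base_beta: float, recording_level: float) -> float:
    """Scale the power-based flooring coefficient by 1 / recording level.

    base_beta       -- beta derived from Power_Ave
    recording_level -- gain applied by the amplifier (assumed linear, > 0)
    """
    if recording_level <= 0.0:
        raise ValueError("recording_level must be positive")
    beta = base_beta * (1.0 / recording_level)
    # Clipping is an assumption: the text does not say how values above
    # 1.0 are handled when the recording level is below unity.
    return max(0.0, min(1.0, beta))
```

A gain of 2.0 halves β, so more of the amplified zoom sound is subtracted; a gain below 1.0 raises β toward 1.0.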

(Modification 2 of the first embodiment)
As shown in FIG. 7, a transition period A and a transition period B are set before and after the zoom period. Transition period A is set, for example, from the time the zoom is switched ON until the zoom operation actually starts. Transition period B is set, for example, from the time the zoom is switched OFF until the zoom operation actually ends. In transition period A, the flooring coefficient β is changed smoothly from 1.0 to the optimum value set as in the first embodiment; in transition period B, β is changed smoothly from that optimum value back to 1.0.

More specifically, as shown in FIG. 8, the control unit 32 is provided with a transition period setting unit 32a for setting transition period A and transition period B. For transition period A, the control unit 32 outputs a zoom advance-notice signal to the flooring coefficient control unit 60 a fixed period before the zoom operation starts. At that point, the flooring coefficient control unit 60 calculates the optimum value of β based on the average power value Power_Ave of the input sound obtained before the zoom period, and controls β so that it moves from 1.0 to that optimum value. In practice, the value of β changes in steps, for example every 50 msec, so linear interpolation is used to make the change smooth.

In transition period B, on the other hand, the control unit 32 outputs a zoom end signal to the flooring coefficient control unit 60 when the zoom operation ends. On receiving this zoom end signal, the flooring coefficient control unit 60 controls the current flooring coefficient β so that it moves from the optimum value to 1.0. Here too, linear interpolation is used to make the change smooth.

By changing the flooring coefficient β smoothly in transition periods A and B before and after the zoom period in this way, the spectral subtraction applied to the input sound is introduced and withdrawn gradually, which further suppresses sudden volume changes around the zoom period.
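The linear interpolation of β over a transition period can be sketched as below. The function name and the millisecond arguments are illustrative; the specification states only that β changes between 1.0 and the optimum value and is smoothed by linear interpolation.

```python
def ramp_beta(start: float, end: float, elapsed_ms: float, duration_ms: float) -> float:
    """Linearly interpolate beta from `start` to `end` over a transition period.

    Transition period A: start = 1.0, end = optimum value.
    Transition period B: start = optimum value, end = 1.0.
    """
    if duration_ms <= 0.0:
        return end  # degenerate period: jump straight to the target
    # Fraction of the transition completed, held within [0, 1].
    t = max(0.0, min(1.0, elapsed_ms / duration_ms))
    return start + (end - start) * t
```

For example, halfway through a 200 ms transition period A toward an optimum value of 0.5, the call `ramp_beta(1.0, 0.5, 100, 200)` yields 0.75.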

The same applies when the optimum value of the flooring coefficient β is calculated based on the recording level as in Modification 1: changing β gradually toward that optimum value further suppresses sudden volume changes before and after the zoom period.

(Second Embodiment)
Next, a second embodiment of the present invention will be described.
In the second embodiment, spectral subtraction is used to remove not only the zoom sound but also the noise that electronic circuits on the circuit board generate steadily while the digital camera 1 is powered on.

The basic configuration of the audio recording apparatus is the same as in FIG. 3. The noise spectrum storage unit 54 stores, as the noise spectrum signal N(ω), a per-frequency spectrum of the noise generated during the zoom period, that is, the zoom sound. In this embodiment, the value of N(ω) at a specific frequency is replaced in advance with a predetermined value. The "specific frequency" is the frequency of the noise steadily generated by the electronic circuits on the circuit board, and it is higher than the frequencies of the zoom sound. The "predetermined value" is a value so large that it could never occur in a normal noise spectrum signal N(ω).

When noise from multiple electronic circuits is to be removed, the portions of the noise spectrum signal N(ω) corresponding to the noise frequency of each electronic circuit are each replaced with the predetermined value.

The audio processing performed with this configuration is described below.

FIG. 9 is a flowchart showing the audio processing performed by the digital camera 1 during moving image recording in the second embodiment. The processing shown in this flowchart is assumed to be recorded in advance on a recording medium such as a ROM in the form of a program readable by the control unit 32, which is a computer.

In FIG. 9, the processing of steps B11 to B17 is the same as that of steps A11 to A17 in FIG. 5. That is, after the spectrum signal Y(ω) of the audio signal input from the microphone unit 7 is generated (steps B11 and B12), the value of the flooring coefficient β is set according to whether a zoom operation is in progress (steps B13 to B17). As described above, when there is no zoom operation, β is set to 1, so S(ω) = Y(ω). When there is a zoom operation, β is set to an optimum value based on the average power value Power_Ave of the input sound, and S(ω) is controlled by that value of β.

The spectrum subtraction unit 55 removes the noise contained in the input sound, that is, the zoom sound, by comparing the noise spectrum signal N(ω) stored in the noise spectrum storage unit 54 with the spectrum signal Y(ω) of the input sound (spectral subtraction). If, in doing so, it detects the predetermined value in the noise spectrum signal N(ω) stored in the noise spectrum storage unit 54 (Yes in step B18), the spectrum subtraction unit 55 outputs zero as the output S(ω) at the specific frequency (step B19). If it does not detect the predetermined value (No in step B18), it calculates and outputs S(ω) according to equation (1) (step B20).

The subsequent processing is the same as in the first embodiment. The inverse Fourier transform unit 56 applies an inverse Fourier transform to the noise-removed speech spectrum signal S(ω) output from the spectrum subtraction unit 55, converting it into a time-domain signal (step B21). The waveform synthesis unit 57 generates a continuous audio signal from the preceding and following frame data (step B22) and records it in the memory 38, the recording medium, in synchronization with the moving image data (step B23).

Thus, wherever N(ω) equals the predetermined value (a value larger than any normal value), the output S(ω) of the spectrum subtraction unit 55 is zero regardless of the flooring coefficient β, so the audio signal recorded in the memory 38 always lacks the component at the specific frequency.
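The branch in steps B18 to B20 can be sketched as follows. The choice of positive infinity as the "predetermined value" and the names `NOTCH_SENTINEL` and `subtract_with_notch` are assumptions for illustration; the specification only requires a value that a normal noise spectrum could never contain. Equation (1) is again assumed to be max(Y(ω) − α·N(ω), β·Y(ω)), as in the claims.

```python
# Hypothetical marker stored in N(w) at the circuit-noise frequencies.
NOTCH_SENTINEL = float("inf")


def subtract_with_notch(y, n, alpha, beta):
    """Spectral subtraction that zeroes bins marked in the noise spectrum."""
    out = []
    for y_k, n_k in zip(y, n):
        if n_k == NOTCH_SENTINEL:
            # Step B19: marked bin, output zero regardless of beta.
            out.append(0.0)
        else:
            # Step B20: ordinary floored subtraction.
            out.append(max(y_k - alpha * n_k, beta * y_k))
    return out
```

Because the marker lives in the stored noise spectrum, the notch is applied even when β = 1.0 and the remaining bins pass through unchanged.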

As described above, according to the second embodiment, simply changing the value of the noise spectrum signal N(ω) used for spectral subtraction at a specific frequency to a predetermined value yields the same effect as a notch filter. Therefore, without dedicated software notch processing or a notch filter circuit, not only the zoom sound but also noise in a specific frequency band that leaks in from the electronic circuits can be appropriately removed from the input sound before recording.

(Modification 1 of the second embodiment)
FIG. 10 is a block diagram showing the configuration of the audio recording apparatus in Modification 1 of the second embodiment. Parts identical to those in FIG. 3 are given the same reference numerals, and their description is omitted.

In this modification, the noise spectrum signal N(ω) stored in the noise spectrum storage unit 54 is left unchanged, and a control data storage unit 63 is newly provided. The control data storage unit 63 stores control data designating the specific frequency of the circuit noise described above. Specifically, a notch spectrum signal having the value "1" at the specific frequency and "0" at all other frequencies is preset in the control data storage unit 63 as the control data.

When the average power value Power_Ave of the input sound before the zoom period is smaller than a predetermined value, the spectrum subtraction unit 55 refers to the control data storage unit 63 and, at frequencies where the preset control data is "1", sets S(ω) = 0 regardless of the flooring coefficient β. When Power_Ave is larger than the predetermined value, spectral subtraction is performed according to equation (1). The rest of the processing is the same as in the second embodiment.

As a result, when the signal level of the input sound is high and the noise of a specific frequency leaking in from the electronic circuits (circuit noise) is inconspicuous, the notch filter effect is disabled; the notch filter effect is applied only when the signal level is low and the circuit noise is conspicuous. Consequently, an audio signal closer to the original sound can be recorded.
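The conditional notch of this modification can be sketched as below. The 0/1 control data and the comparison of Power_Ave with a predetermined value follow the description; the function name, the `power_threshold` parameter, and the assumed form of equation (1), max(Y(ω) − α·N(ω), β·Y(ω)), are illustrative.

```python
def subtract_with_control_mask(y, n, mask, alpha, beta, power_ave, power_threshold):
    """Floored spectral subtraction with a notch that is active only for quiet input.

    mask            -- control data: 1 at circuit-noise bins, 0 elsewhere
    power_threshold -- hypothetical 'predetermined value' for Power_Ave
    """
    notch_active = power_ave < power_threshold
    out = []
    for y_k, n_k, m_k in zip(y, n, mask):
        if notch_active and m_k == 1:
            out.append(0.0)  # quiet input: remove the circuit-noise bin
        else:
            out.append(max(y_k - alpha * n_k, beta * y_k))
    return out
```

Unlike the marker stored in N(ω), the separate mask leaves the noise spectrum intact, so the masked bins still receive ordinary subtraction when the input is loud.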

Although each of the above embodiments has been described using a digital camera capable of shooting moving images with sound as an example, the present invention is not limited to digital cameras. It is applicable to any electronic device that can record captured images together with an audio signal, such as a camera-equipped mobile phone.

Furthermore, not only the zoom sound but any driving sound generated by a shooting operation, such as a focus sound, can be removed by the same technique.

In short, the present invention is not limited to the above embodiments as they stand, and at the implementation stage the constituent elements can be modified without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiments. For example, some constituent elements may be deleted from all those shown in an embodiment, and constituent elements of different embodiments may be combined as appropriate.

The techniques described in the above embodiments can also be written, as a program executable by a computer, to a recording medium such as a magnetic disk (flexible disk, hard disk, etc.), an optical disk (CD-ROM, DVD-ROM, etc.), or a semiconductor memory and applied to various apparatuses, or the program itself can be transmitted over a transmission medium such as a network and applied to various apparatuses. A computer that implements this apparatus reads the program recorded on the recording medium or provided via the transmission medium, and executes the processing described above with its operation controlled by that program.

FIG. 1 shows the external configuration of a digital camera taken as an example of the imaging apparatus of the present invention; FIG. 1(a) is a perspective view mainly showing the front, and FIG. 1(b) is a perspective view mainly showing the rear.
FIG. 2 is a block diagram showing the electronic circuit configuration of the digital camera.
FIG. 3 is a block diagram showing the configuration of the audio recording apparatus used in the digital camera according to the first embodiment of the present invention.
FIG. 4 is a diagram for explaining noise removal using the SS method (spectral subtraction method).
FIG. 5 is a flowchart showing the audio processing performed by the digital camera 1 during moving image recording in the first embodiment.
FIG. 6 is a block diagram showing the configuration of the audio recording apparatus in Modification 1 of the first embodiment.
FIG. 7 is a diagram for explaining how the flooring coefficient is changed in Modification 2 of the first embodiment.
FIG. 8 is a block diagram partially showing the configuration of the audio recording apparatus in Modification 2 of the first embodiment.
FIG. 9 is a flowchart showing the audio processing performed by the digital camera during moving image recording in the second embodiment of the present invention.
FIG. 10 is a block diagram showing the configuration of the audio recording apparatus in Modification 1 of the second embodiment.

Explanation of symbols

1: digital camera, 2: body, 3: photographing lens, 7: microphone unit, 9: shutter key, 20a, 20b: zoom keys, 21: motor, 21a: motor drive unit, 32: control unit, 32a: transition period setting unit, 36: key input unit, 51: A/D conversion unit, 52: frame division unit, 53: Fourier transform unit, 54: noise spectrum storage unit, 55: spectrum subtraction unit, 55a: lower limit value setting unit, 56: inverse Fourier transform unit, 57: waveform synthesis unit, 58: input power calculation unit, 59: smoothing processing unit, 60: flooring coefficient control unit, 61: recording level control unit, 62: amplifier, 63: control data storage unit, Y(ω): spectrum signal of the input sound, N(ω): noise spectrum signal, S(ω): speech spectrum signal after noise removal.

Claims

In an imaging device capable of recording audio signals during shooting,
A mechanism that is driven in accordance with the shooting operation;
Conversion means for converting the input audio signal into a spectrum signal;
Storage means for storing in advance a noise spectrum signal obtained by spectralizing sound generated when the mechanism unit is driven;
Spectral subtraction means for, when the mechanism unit is driven, subtracting from the spectrum signal of the input voice obtained by the conversion means a signal obtained by multiplying the noise spectrum signal stored in the storage means by a predetermined coefficient, setting as a lower limit value a value obtained by multiplying the spectrum signal of the input voice by a flooring coefficient, and outputting the larger of the value obtained by the subtraction and the lower limit value as a speech spectrum signal after noise removal;
Power acquisition means for acquiring a power value of an audio signal input before driving the mechanism unit;
Flooring coefficient control means for controlling the flooring coefficient based on the power value of the audio signal acquired by the power acquisition means ;
Recording level control means for controlling the recording level of the audio signal;
wherein the flooring coefficient control means makes the flooring coefficient smaller as the recording level set by the recording level control means is larger, and makes the flooring coefficient larger as the recording level is smaller.

2. The imaging apparatus according to claim 1, wherein the recording level control means controls the recording level of the audio signal in accordance with a change in the power value of the input audio signal.

In an imaging device capable of recording audio signals during shooting,
A mechanism that is driven in accordance with the shooting operation;
Conversion means for converting the input audio signal into a spectrum signal;
Storage means for storing in advance a noise spectrum signal obtained by spectralizing sound generated when the mechanism unit is driven;
Spectral subtraction means for, when the mechanism unit is driven, subtracting from the spectrum signal of the input voice obtained by the conversion means a signal obtained by multiplying the noise spectrum signal stored in the storage means by a predetermined coefficient, setting as a lower limit value a value obtained by multiplying the spectrum signal of the input voice by a flooring coefficient, and outputting the larger of the value obtained by the subtraction and the lower limit value as a speech spectrum signal after noise removal;
Power acquisition means for acquiring a power value of an audio signal input before driving the mechanism unit;
Flooring coefficient control means for controlling the flooring coefficient based on the power value of the audio signal acquired by the power acquisition means;
Transition period setting means for setting predetermined transition periods before and after the driving of the mechanism unit,
wherein the flooring coefficient control means gradually changes the flooring coefficient from a predetermined value to an optimum value corresponding to the average power value of the audio signal during the transition period before the driving of the mechanism unit set by the transition period setting means, and gradually changes the flooring coefficient from the optimum value to the predetermined value during the transition period after the driving of the mechanism unit.

In an imaging device capable of recording audio signals during shooting,
A mechanism that is driven in accordance with the shooting operation;
Conversion means for converting the input audio signal into a spectrum signal;
Storage means for storing in advance a noise spectrum signal obtained by spectralizing sound generated when the mechanism unit is driven;
Spectral subtraction means for, when the mechanism unit is driven, subtracting from the spectrum signal of the input voice obtained by the conversion means a signal obtained by multiplying the noise spectrum signal stored in the storage means by a predetermined coefficient, setting as a lower limit value a value obtained by multiplying the spectrum signal of the input voice by a flooring coefficient, and outputting the larger of the value obtained by the subtraction and the lower limit value as a speech spectrum signal after noise removal;
Power acquisition means for acquiring a power value of an audio signal input before driving the mechanism unit;
Flooring coefficient control means for controlling the flooring coefficient based on the power value of the audio signal acquired by the power acquisition means;
wherein the value of the noise spectrum signal stored in the storage means at a specific frequency is replaced with a predetermined value, and the spectrum subtracting means sets its output to zero when it detects the predetermined value in the noise spectrum signal.

In an imaging device capable of recording audio signals during shooting,
A mechanism that is driven in accordance with the shooting operation;
Conversion means for converting the input audio signal into a spectrum signal;
Storage means for storing in advance a noise spectrum signal obtained by spectralizing sound generated when the mechanism unit is driven;
Spectral subtraction means for, when the mechanism unit is driven, subtracting from the spectrum signal of the input voice obtained by the conversion means a signal obtained by multiplying the noise spectrum signal stored in the storage means by a predetermined coefficient, setting as a lower limit value a value obtained by multiplying the spectrum signal of the input voice by a flooring coefficient, and outputting the larger of the value obtained by the subtraction and the lower limit value as a speech spectrum signal after noise removal;
Power acquisition means for acquiring a power value of an audio signal input before driving the mechanism unit;
Flooring coefficient control means for controlling the flooring coefficient based on the power value of the audio signal acquired by the power acquisition means;
Control data storage means storing control data for designating a specific frequency of the noise spectrum signal stored in the storage means,
wherein the spectrum subtracting means sets the output at the specific frequency to zero, based on the control data stored in the control data storage means, when the power of the audio signal is smaller than a predetermined value.

The imaging apparatus according to any one of claims 1 to 5, wherein the power acquisition means acquires the power value of the audio signal by summing the spectrum signal converted by the conversion means over all frequencies.

The imaging apparatus according to any one of claims 1 to 5, wherein the power acquisition means acquires the power value of the audio signal by averaging the power values of the audio signal input during a predetermined period before the driving of the mechanism unit.

The imaging apparatus according to claim 1, further comprising:
Inverse transform means for inversely transforming the speech spectrum signal after noise removal obtained by the spectrum subtracting means back into an audio signal; and
Recording means for recording the audio signal obtained by the inverse transform means together with a captured image.

The imaging apparatus according to any one of claims 1 to 5, wherein the flooring coefficient control means increases the flooring coefficient as the power value of the audio signal is larger, and decreases the flooring coefficient as the power value of the audio signal is smaller.

A noise removal method for removing from input voice, as noise, the mechanical sound generated by a mechanism unit driven in accordance with a shooting operation when recording a moving image with sound, the method comprising the steps of:
Converting the input voice into a spectrum signal;
Subtracting, from the spectrum signal of the input voice obtained by the conversion, a signal obtained by multiplying by a predetermined coefficient a mechanical sound spectrum signal obtained in advance by converting the mechanical sound into a spectrum, setting as a lower limit value a value obtained by multiplying the spectrum signal of the input voice by a flooring coefficient, and outputting the larger of the result of the subtraction and the lower limit value as a speech spectrum signal after noise removal;
Obtaining a power value of the audio signal input before the mechanism unit is driven;
Controlling the recording level of the audio signal; and
Controlling the flooring coefficient based on the power value of the audio signal, such that the flooring coefficient is made smaller as the set recording level is larger and larger as the set recording level is smaller.

A program for causing a computer of an imaging apparatus, which can record an audio signal at the time of shooting and includes a mechanism unit that is driven in accordance with a shooting operation, to function as:
conversion means for converting the input audio signal into a spectrum signal;
storage means for storing in advance a noise spectrum signal obtained by spectralizing sound generated when the mechanism unit is driven;
spectrum subtraction means for subtracting, when the mechanism unit is driven, a signal obtained by multiplying the noise spectrum signal stored in the storage means by a predetermined coefficient from the spectrum signal of the input audio obtained by the conversion means, setting as a lower limit value a value obtained by multiplying the spectrum signal of the input audio by a flooring coefficient, and outputting the larger of the value obtained by the subtraction and the lower limit value as the audio spectrum signal after noise removal;
power acquisition means for acquiring a power value of the audio signal input before the mechanism unit is driven;
flooring coefficient control means for controlling the flooring coefficient based on the power value of the audio signal acquired by the power acquisition means; and
recording level control means for controlling the recording level of the audio signal,
wherein the flooring coefficient control means decreases the flooring coefficient as the recording level set by the recording level control means is larger, and increases the flooring coefficient as the recording level is smaller.
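
The recording-level dependence in this claim can be sketched as a correction applied to a flooring coefficient already derived from the power value. The inverse scaling, the clamp limits, and the representation of the recording level as a linear gain are assumptions for illustration only; the claim requires just that the coefficient fall as the level rises.

```python
def adjust_for_recording_level(beta_from_power, recording_gain,
                               beta_min=0.01, beta_max=1.0):
    """Lower the flooring coefficient as the recording level rises.

    beta_from_power: coefficient chosen from the pre-drive power value.
    recording_gain:  hypothetical linear gain (> 0) set by the recording
                     level control; 1.0 means no correction.
    """
    if recording_gain <= 0:
        raise ValueError("recording_gain must be positive")
    adjusted = beta_from_power / recording_gain  # assumed inverse scaling
    return min(beta_max, max(beta_min, adjusted))
```

Doubling the gain halves the coefficient under this assumption, and halving the gain doubles it, up to the clamp limits.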

A program for causing a computer of an imaging apparatus, which can record an audio signal at the time of shooting and includes a mechanism unit that is driven in accordance with a shooting operation, to function as:
conversion means for converting the input audio signal into a spectrum signal;
storage means for storing in advance a noise spectrum signal obtained by spectralizing sound generated when the mechanism unit is driven;
spectrum subtraction means for subtracting, when the mechanism unit is driven, a signal obtained by multiplying the noise spectrum signal stored in the storage means by a predetermined coefficient from the spectrum signal of the input audio obtained by the conversion means, setting as a lower limit value a value obtained by multiplying the spectrum signal of the input audio by a flooring coefficient, and outputting the larger of the value obtained by the subtraction and the lower limit value as the audio spectrum signal after noise removal;
power acquisition means for acquiring a power value of the audio signal input before the mechanism unit is driven;
flooring coefficient control means for controlling the flooring coefficient based on the power value of the audio signal acquired by the power acquisition means; and
transition period setting means for setting a predetermined transition period before and after driving of the mechanism unit,
wherein the flooring coefficient control means gradually changes the flooring coefficient from a predetermined value to an optimum value corresponding to the average power value of the audio signal in the transition period before driving of the mechanism unit set by the transition period setting means, and gradually changes the flooring coefficient from the optimum value back to the predetermined value in the transition period after driving of the mechanism unit.
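
The gradual change over the transition periods can be sketched as a per-frame schedule. The linear ramp, the frame-based timing, and all parameter names are assumptions for illustration; the claim does not specify the shape of the change. The optimum value `beta_opt` is taken as an input here and would in practice be derived from the average power measured in the pre-drive transition period.

```python
def beta_at_frame(frame, drive_start, drive_end, transition,
                  beta_default, beta_opt):
    """Flooring coefficient for a frame index, with linear transitions.

    drive_start, drive_end: frames where the mechanism drive begins/ends.
    transition: length in frames of each transition period (> 0).
    """
    if transition <= 0:
        raise ValueError("transition must be positive")
    if frame < drive_start - transition or frame >= drive_end + transition:
        return beta_default                      # outside both transitions
    if frame < drive_start:                      # ramp toward the optimum
        t = (frame - (drive_start - transition)) / transition
        return beta_default + t * (beta_opt - beta_default)
    if frame < drive_end:                        # mechanism is being driven
        return beta_opt
    t = (frame - drive_end) / transition         # ramp back to the default
    return beta_opt + t * (beta_default - beta_opt)
```

Ramping rather than switching avoids an abrupt change in the lower limit at the moment the drive starts or stops.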

A program for causing a computer of an imaging apparatus, which can record an audio signal at the time of shooting and includes a mechanism unit that is driven in accordance with a shooting operation, to function as:
conversion means for converting the input audio signal into a spectrum signal;
storage means for storing in advance a noise spectrum signal obtained by spectralizing sound generated when the mechanism unit is driven;
spectrum subtraction means for subtracting, when the mechanism unit is driven, a signal obtained by multiplying the noise spectrum signal stored in the storage means by a predetermined coefficient from the spectrum signal of the input audio obtained by the conversion means, setting as a lower limit value a value obtained by multiplying the spectrum signal of the input audio by a flooring coefficient, and outputting the larger of the value obtained by the subtraction and the lower limit value as the audio spectrum signal after noise removal;
power acquisition means for acquiring a power value of the audio signal input before the mechanism unit is driven; and
flooring coefficient control means for controlling the flooring coefficient based on the power value of the audio signal acquired by the power acquisition means,
wherein a specific frequency of the noise spectrum signal stored in the storage means is replaced with a predetermined value, and the spectrum subtraction means sets the output to zero when the predetermined value is detected in the noise spectrum signal.
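
The marker-value scheme in this claim can be sketched as follows. The marker `-1.0` is a hypothetical choice (a value that cannot occur in a magnitude spectrum), and the function name is illustrative; the claim only says that a predetermined value stored at a specific frequency causes zero output for that bin.

```python
ZERO_MARKER = -1.0  # hypothetical predetermined value stored in the noise table


def subtract_with_marker(input_spec, noise_spec, alpha, beta,
                         marker=ZERO_MARKER):
    """Spectral subtraction where marked noise bins force zero output."""
    output = []
    for x, n in zip(input_spec, noise_spec):
        if n == marker:
            output.append(0.0)                   # marked bin: mute entirely
        else:
            output.append(max(x - alpha * n, beta * x))
    return output
```

Because the marker lives in the stored noise spectrum itself, no separate table is needed to identify the frequencies to mute.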

A program for causing a computer of an imaging apparatus, which can record an audio signal at the time of shooting and includes a mechanism unit that is driven in accordance with a shooting operation, to function as:
conversion means for converting the input audio signal into a spectrum signal;
storage means for storing in advance a noise spectrum signal obtained by spectralizing sound generated when the mechanism unit is driven;
spectrum subtraction means for subtracting, when the mechanism unit is driven, a signal obtained by multiplying the noise spectrum signal stored in the storage means by a predetermined coefficient from the spectrum signal of the input audio obtained by the conversion means, setting as a lower limit value a value obtained by multiplying the spectrum signal of the input audio by a flooring coefficient, and outputting the larger of the value obtained by the subtraction and the lower limit value as the audio spectrum signal after noise removal;
power acquisition means for acquiring a power value of the audio signal input before the mechanism unit is driven;
flooring coefficient control means for controlling the flooring coefficient based on the power value of the audio signal acquired by the power acquisition means; and
control data storage means for storing control data that designates a specific frequency of the noise spectrum signal stored in the storage means,
wherein the spectrum subtraction means sets the output at the specific frequency to zero, based on the control data stored in the control data storage means, when the power of the audio signal is smaller than a predetermined value.
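
The control-data variant can be sketched in the same style. Here the control data is assumed to be a set of bin indices and the power is compared against a hypothetical threshold; both representations are illustrative and are not specified by the claim.

```python
def apply_control_data(spectrum, control_bins, power, power_threshold):
    """Zero the designated bins only when the audio power is low.

    spectrum:        output of the spectral subtraction step (list of floats).
    control_bins:    assumed form of the control data, a set of bin indices.
    power:           power value of the audio signal.
    power_threshold: hypothetical predetermined value for the comparison.
    """
    if power >= power_threshold:
        return list(spectrum)                    # loud audio: leave unchanged
    return [0.0 if i in control_bins else v
            for i, v in enumerate(spectrum)]
```

Unlike the marker approach, the stored noise spectrum stays intact, so the same table can be used with or without muting depending on the measured power.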