JP5018860B2

JP5018860B2 - Signal processing apparatus and imaging apparatus

Info

Publication number: JP5018860B2
Application number: JP2009248953A
Authority: JP
Inventors: 豪松本; 光宏岡崎
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2009-10-29
Filing date: 2009-10-29
Publication date: 2012-09-05
Anticipated expiration: 2029-10-29
Also published as: JP2011097335A

Description

The present invention relates to a signal processing apparatus and an imaging apparatus that reduce a noise signal contained in an audio signal.

A common way to reduce the component attributable to noise (the noise component) in an audio signal that mixes a target sound with noise is to estimate the noise from the acquired audio signal. The estimated noise signal (hereinafter, the noise signal) is then subtracted from the audio signal (see, for example, Patent Document 1).

Patent Document 1: JP 2005-195955 A

This method works well when the noise has been recorded in advance or consists of a periodic sound. In those cases the noise's magnitude and the times at which it occurs in the audio signal are easy to estimate, so the noise signal can be reduced appropriately. The situation is different when the noise is the sound produced by the various mechanisms inside the apparatus as they are driven (hereinafter, operation sound). Operation sound occurs at irregular times, which makes it difficult to estimate the noise contained in the audio signal. As a result, the audio signal left after subtracting the noise signal may contain a noise component known as musical noise.

An object of the present invention is to provide a signal processing apparatus and an imaging apparatus that can appropriately reduce operation sound contained in an audio signal.

To solve the problem described above, a signal processing apparatus according to the present invention comprises the following means.

Signal conversion means: an audio signal mixes a target sound with an operation sound and is expressed as a function of time. It is divided into segments of a predetermined time width, giving a plurality of first audio signals. The signal conversion means converts each first audio signal into a second audio signal expressed as a function of frequency.

Calculation means: this obtains, from the second audio signal and an audio signal representing the operation sound, a third audio signal in which the influence of the operation sound is reduced.

Correction means: this takes as a reference signal an audio signal, among the plurality of second audio signals, that represents the target sound. For each frequency band, it compares the frequency-spectrum value of the third audio signal with the value in the corresponding band of the reference signal multiplied by a coefficient. Where the value of the third audio signal differs from that corresponding value, the correction means replaces it with the corresponding value.

Signal inverse conversion means: this converts the audio signal corrected by the correction means back from an audio signal expressed as a function of frequency into an audio signal expressed as a function of time.
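The following is a minimal Python sketch of the correction step in this first aspect. It assumes that the spectra are per-band magnitude values held in plain lists. The names `replace_if_different`, `reduced`, `reference`, `k` and `tol` are hypothetical.

With an exact comparison (`tol=0.0`), every band that does not already match is replaced, so the output equals the scaled reference. The tolerance parameter is an added assumption, not part of the source.

```python
def replace_if_different(reduced, reference, k, tol=0.0):
    """Correct a noise-reduced spectrum against a coefficient-scaled reference.

    reduced:   per-band spectrum values of the noise-reduced (third) signal
    reference: per-band spectrum values of a target-sound-only frame
    k:         coefficient applied to the reference signal
    tol:       hypothetical tolerance; 0.0 reproduces the literal rule
    """
    if len(reduced) != len(reference):
        raise ValueError("spectra must have the same number of bands")
    corrected = []
    for x, r in zip(reduced, reference):
        scaled = k * r
        # Replace the band value when it differs from the scaled reference.
        corrected.append(scaled if abs(x - scaled) > tol else x)
    return corrected


print(replace_if_different([1.0, 5.0, 3.0], [2.0, 4.0, 6.0], 0.5))
# prints [1.0, 2.0, 3.0]
```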

Alternatively, a signal processing apparatus according to the present invention comprises the same signal conversion means, calculation means, and signal inverse conversion means as above, together with a different correction means. This correction means takes as the reference signal an audio signal, among the plurality of second audio signals, that represents the target sound. For each frequency band, it compares the frequency-spectrum value of the third audio signal with the value in the corresponding band of the reference signal multiplied by a coefficient. Where the value of the third audio signal is less than that corresponding value, the correction means corrects it so that it does not fall below the corresponding value.

Alternatively, a signal processing apparatus according to the present invention comprises the same signal conversion means, calculation means, and signal inverse conversion means as above, together with a different correction means. This correction means takes as the reference signal an audio signal, among the plurality of second audio signals, that represents the target sound. For each frequency band, it compares the frequency-spectrum value of the third audio signal with the value in the corresponding band of the reference signal multiplied by a coefficient. Where the value of the third audio signal exceeds that corresponding value, the correction means corrects it so that it does not exceed the corresponding value.
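The second and third aspects bound the noise-reduced spectrum instead of overwriting it. In the sketch below, the hypothetical function `clamp_to_reference` works on per-band magnitude lists. It sets an out-of-range band exactly equal to the bound. The source only requires that the corrected value not fall below the bound (floor) or exceed it (ceiling), so setting it equal is one possible choice among several.

```python
def clamp_to_reference(reduced, reference, k, mode):
    """Bound each band of the noise-reduced spectrum by k * reference.

    mode="floor":   bands below k * reference are raised to it
    mode="ceiling": bands above k * reference are lowered to it
    """
    if len(reduced) != len(reference):
        raise ValueError("spectra must have the same number of bands")
    scaled = [k * r for r in reference]
    if mode == "floor":
        return [max(x, s) for x, s in zip(reduced, scaled)]
    if mode == "ceiling":
        return [min(x, s) for x, s in zip(reduced, scaled)]
    raise ValueError(f"unknown mode: {mode!r}")


print(clamp_to_reference([0.0, 3.0, 1.0], [2.0, 2.0, 2.0], 0.5, "floor"))
# prints [1.0, 3.0, 1.0]
```

The floor keeps bands that subtraction drove too low from collapsing toward zero. The ceiling limits bands where residual noise may remain.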

Alternatively, a signal processing apparatus according to the present invention comprises the same signal conversion means, calculation means, and signal inverse conversion means as above, together with a different correction means. This correction means takes as the reference signal an audio signal, among the plurality of second audio signals, that represents the target sound. For each frequency band, it compares the frequency-spectrum value of the third audio signal with the value in the corresponding band of the reference signal multiplied by a coefficient. Where the value of the third audio signal differs from that corresponding value, the correction means replaces it with a correction value. The correction value is calculated from the corresponding value of the coefficient-multiplied reference signal and from that signal's values in the neighboring frequency bands.

Alternatively, a signal processing apparatus according to the present invention comprises the same signal conversion means, calculation means, and signal inverse conversion means as above, together with a different correction means. This correction means takes as the reference signal an audio signal, among the plurality of second audio signals, that represents the target sound. For each frequency band, it compares the frequency-spectrum value of the third audio signal with the value in the corresponding band of the reference signal multiplied by a coefficient. Where the value of the third audio signal exceeds that corresponding value, the correction means corrects it so that it does not exceed a correction value. The correction value is calculated from the corresponding value of the coefficient-multiplied reference signal and from that signal's values in the neighboring frequency bands.

Alternatively, a signal processing apparatus according to the present invention comprises the same signal conversion means, calculation means, and signal inverse conversion means as above, together with a different correction means. This correction means takes as the reference signal an audio signal, among the plurality of second audio signals, that represents the target sound. For each frequency band, it compares the frequency-spectrum value of the third audio signal with the value in the corresponding band of the reference signal multiplied by a coefficient. Where the value of the third audio signal is less than that corresponding value, the correction means corrects it so that it does not fall below a correction value. The correction value is calculated from the corresponding value of the coefficient-multiplied reference signal and from that signal's values in the neighboring frequency bands.

In some aspects, the correction value is calculated from the corresponding frequency-spectrum value of the coefficient-multiplied reference signal and the values in its neighboring frequency bands. In that case, the correction value is preferably one of the following: the mean of those values, their maximum, or their minimum.
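A sketch of the neighborhood-based aspects, covering replacement, the ceiling and the floor. It uses the three statistics just listed. The neighborhood half-width `radius=1` is an assumption, since the source does not specify how many neighboring bands are used. The function and parameter names are hypothetical.

```python
from statistics import mean


def neighborhood_value(scaled_ref, i, radius=1, stat="mean"):
    """Correction value for band i from band i and its neighbors.

    radius: hypothetical neighborhood half-width (not given in the source)
    stat:   "mean", "max" or "min"
    """
    lo = max(0, i - radius)
    hi = min(len(scaled_ref), i + radius + 1)
    window = scaled_ref[lo:hi]
    if stat == "mean":
        return mean(window)
    if stat == "max":
        return max(window)
    if stat == "min":
        return min(window)
    raise ValueError(f"unknown stat: {stat!r}")


def correct_with_neighborhood(reduced, reference, k, mode="replace",
                              radius=1, stat="mean"):
    """Apply the replace / ceiling / floor rule using neighborhood values."""
    if len(reduced) != len(reference):
        raise ValueError("spectra must have the same number of bands")
    scaled = [k * r for r in reference]
    out = []
    for i, x in enumerate(reduced):
        c = neighborhood_value(scaled, i, radius, stat)
        if mode == "replace":
            # Replacing x whenever it differs from c always leaves c.
            out.append(c)
        elif mode == "ceiling":
            out.append(min(x, c))  # must not exceed c
        elif mode == "floor":
            out.append(max(x, c))  # must not fall below c
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return out


print(correct_with_neighborhood([0.0, 5.0, 2.5, 10.0], [2.0, 4.0, 6.0, 8.0], 0.5))
# prints [1.5, 2.0, 3.0, 3.5]
```

Averaging over neighboring bands smooths the correction target across frequency. Using the maximum or minimum instead makes the correction more or less aggressive.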

An imaging apparatus according to the present invention comprises:
- imaging means for acquiring an image signal;
- sound collection means for acquiring an audio signal in synchronization with the acquisition of the image signal by the imaging means;
- any one of the signal processing apparatuses described above; and
- storage means for storing the image signal acquired by the imaging means and the audio signal processed by the signal processing apparatus.

According to the present invention, operation sound contained in an audio signal can be reduced effectively.

FIG. 1 is a functional block diagram showing the configuration of a digital camera.
FIG. 2 is a functional block diagram showing the configuration of a signal processing device.
FIG. 3 is a diagram showing the relationship among the acquired audio signal, the window function, frame division, and the AF operation signal.
FIG. 4 is a diagram showing the flow of signal processing in the signal processing device.
FIG. 5 is a diagram showing the flow of signal processing in another embodiment.

FIG. 1 is a functional block diagram showing the configuration of a digital camera 10. As is well known, the digital camera 10 uses an image sensor 16 to photoelectrically convert subject light captured by an imaging optical system 15. It then obtains image data from the resulting electrical signal (image signal).

The imaging optical system 15 consists of a plurality of lenses. When the zoom magnification is changed or focus is adjusted, a lens drive unit 18 moves these lenses along the optical axis L. The imaging optical system 15 also includes an aperture 19, which changes the amount of subject light reaching the image sensor 16 by changing the size of its opening. An aperture drive unit 20 adjusts the size of this opening to achieve a preset aperture value.

A shutter 21 is disposed between the imaging optical system 15 and the image sensor 16. The shutter 21 switches between two states:
- an open state, in which subject light captured through the imaging optical system 15 falls on the image sensor 16; and
- a closed (light-shielding) state, in which that light is blocked.

During shooting, the shutter 21 is first held closed and then switched to the open state. After a preset time has elapsed, it is switched back to the closed state. Switching between the two states is performed by a shutter drive unit 22.

The image sensor 16 is, for example, a CCD (Charge Coupled Device) or CMOS (Complementary Metal-Oxide Semiconductor) sensor. It receives subject light captured by the imaging optical system 15 and converts the received light into signal charge (photoelectric conversion). The sensor accumulates this charge and then outputs it to an AFE (Analog Front End) circuit 25.

The AFE circuit 25 includes an AGC circuit, a CDS circuit, and an A/D conversion circuit (not shown). It first applies processing such as gain control and noise removal to the input image signal, then converts the analog image signal into a digital image signal. The digital image signal is grouped frame by frame and recorded in an image memory 30. A timing generator (not shown) controls the drive timing of the image sensor 16 and the AFE circuit 25.

The image processing circuit 35 applies image processing to the image signal stored in the image memory 30. This processing includes white balance, color interpolation, contour compensation, and gamma correction. The circuit then formats the result for compression in a storage format such as JPEG. The image processing circuit 35 also encodes and decodes image data. Reference numeral 37 denotes a recording I/F.

The LCD 38 displays the following:
- images acquired by the digital camera 10;
- images stored in a storage medium 36;
- through images captured in the shooting standby state; and
- setting screens used to set or change shooting conditions.

Images acquired by the digital camera 10 or stored in the storage medium 36 include moving images as well as still images. The LCD 38 is controlled by a display control circuit 39. When a moving image is displayed on the LCD 38, a speaker 40 outputs the associated audio. An audio control circuit 41 controls the audio output from the speaker 40.

The sound collection unit 43 consists of, for example, a microphone. It acquires sound during moving-image shooting or audio recording. The audio signal it acquires is recorded in an audio memory 44.

The signal processing device 45 processes the audio signal acquired by the sound collection unit 43 to reduce the noise it contains. This noise includes two kinds of sound:
- operation sound, generated when the mechanisms inside the digital camera 10 are driven during moving-image shooting or audio recording; and
- handling sound, produced by the operations performed to drive those mechanisms.

The signal processing device 45 also compression-encodes the acquired audio signal, including audio signals that have undergone the noise reduction described above. It likewise decodes compression-encoded audio signals.

The CPU 50 controls every part of the digital camera 10 by executing a control program (not shown) stored in a built-in memory 51. This control falls into two groups:
- control based on operation of a release button 52, which includes the well-known AE, AF, and imaging processing; and
- control based on operation of a setting operation unit 53, which includes initial setup and the setting of shooting conditions.

The CPU 50 also writes image data acquired during shooting to the storage medium 36.

For still-image shooting, the CPU 50 takes the image data encoded by the image processing circuit 35 and combines it with the model information of the digital camera 10, the shooting information, and similar data. It writes the result to the storage medium 36 as a single file (still-image file).

Likewise, for moving-image shooting, the image processing circuit 35 encodes each frame of image data. The CPU 50 combines the encoded frame image data and the audio data processed by the signal processing device 45 with the model information and shooting information. It writes the result to the storage medium 36 as a single file (moving-image file).

Next, the configuration of the signal processing device 45 is described with reference to the functional block diagram of FIG. 2. As shown in FIG. 2, the signal processing device 45 includes a frequency conversion unit 61, a signal calculation unit 62, a storage unit 63, a signal correction unit 64, and a frequency inverse conversion unit 65.

The frequency conversion unit 61 converts the audio signal acquired by the sound collection unit 43 from a signal expressed as a function of time (time-domain signal) into a signal expressed as a function of frequency (frequency-domain signal). First, it determines the window width of the window function described below. Taking that window width as one frame, it then divides the input audio signal so that each frame contains, for example, 1024 samples.
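A sketch of the framing step. It assumes the signal is a plain Python list of samples and that consecutive frames start half a frame apart, matching the 0.5-frame shift described next. A trailing partial frame is dropped; this is an assumption, since padding it would also be reasonable.

The 48 kHz sample rate in `frame_duration_ms` is likewise an assumption for illustration, because the source gives none. At that rate, a 1024-sample frame lasts about 21.3 ms.

```python
def split_frames(signal, frame_len=1024, overlap=0.5):
    """Split a sample sequence into overlapping frames (50 % overlap by default).

    A trailing partial frame is dropped (an assumption; padding also works).
    """
    hop = int(frame_len * (1 - overlap))
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len must be positive and overlap below 1")
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def frame_duration_ms(frame_len=1024, sample_rate=48_000):
    """Frame length in milliseconds at an assumed sample rate."""
    return 1000.0 * frame_len / sample_rate


frames = split_frames(list(range(4096)))
print(len(frames), frames[1][0])  # prints 7 512
```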

Next, the frequency conversion unit 61 applies a window function such as a Hanning window while shifting by 0.5 frame at a time. It then performs a Fourier transform on the windowed audio signal.

As is well known, the Hanning window is 0 at both ends and 1 at the center, so a windowed signal emphasizes the center of the frame. If a time-varying signal such as a vibration were analyzed by shifting a full frame at a time, features near the frame boundaries would be attenuated and hard to capture. Shifting by only 0.5 frame makes consecutive frames overlap, which allows characteristic portions of the signal to be detected.

Through this processing, the acquired audio signal is converted frame by frame, with a 0.5-frame shift, from a time-domain signal into a frequency-domain signal. The resulting audio signal is output to the signal calculation unit 62 and the signal correction unit 64.
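A sketch of the windowing and transform step in pure Python. It uses a direct O(N^2) DFT as a stand-in for the FFT a real implementation would use. The symmetric Hann window matches the description above: zero at both ends, with its peak at the center, which reaches exactly 1 for odd lengths.

The function names are hypothetical. For real-valued audio, only bins 0 to N/2 carry independent information.

```python
import cmath
import math


def hann(n, symmetric=True):
    """Hann window of length n.

    symmetric=True: w[0] = w[n-1] = 0, peak at the center (exactly 1 for odd n).
    symmetric=False: periodic form, usually preferred for 50 % overlap-add.
    """
    if n < 2:
        raise ValueError("window length must be at least 2")
    denom = n - 1 if symmetric else n
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / denom) for i in range(n)]


def dft(x):
    """Direct O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def windowed_spectra(frames):
    """Apply the same Hann window to every frame, then transform each one."""
    if not frames:
        return []
    w = hann(len(frames[0]))
    return [dft([s * wi for s, wi in zip(frame, w)]) for frame in frames]
```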

The signal calculation unit 62 reduces the noise contained in the acquired audio signal. As described above, the audio signal acquired by the sound collection unit 43 mixes the intended sound (target sound) with noise (operation sound). The signal calculation unit 62 subtracts a signal based on the noise (noise signal) from the audio signal output by the frequency conversion unit 61, thereby reducing the noise in the audio signal. The noise signal is a frequency-domain signal with the same frame width as the audio signal output from the frequency conversion unit 61, and it is stored in advance in the storage unit 63.

In this embodiment, the noise signal is stored in the storage unit 63 in advance and subtracted from each frame of the audio signal output by the frequency conversion unit 61. The invention is not limited to this arrangement. The noise signal may instead be multiplied by a coefficient before being subtracted from each frame of the audio signal output from the frequency conversion unit 61.
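A sketch of the subtraction performed by the signal calculation unit, with `alpha` playing the role of the optional coefficient on the noise signal. It assumes three common spectral-subtraction conventions that the source does not specify:
- subtraction is done on magnitudes;
- the phase of the noisy frame is kept; and
- negative results are clipped to zero.

The function name is hypothetical.

```python
import cmath


def subtract_noise(frame_spectrum, noise_magnitude, alpha=1.0):
    """Magnitude spectral subtraction for one frame.

    frame_spectrum:  complex DFT bins of one frame of the input audio
    noise_magnitude: stored per-bin noise magnitudes (same frame width)
    alpha:           coefficient applied to the noise (1.0 = plain subtraction)
    """
    if len(frame_spectrum) != len(noise_magnitude):
        raise ValueError("noise spectrum must match the frame width")
    out = []
    for X, N in zip(frame_spectrum, noise_magnitude):
        # Subtract in magnitude, clip negatives to zero, keep the noisy phase.
        mag = max(abs(X) - alpha * N, 0.0)
        out.append(cmath.rect(mag, cmath.phase(X)))
    return out
```

Applied to every frame, this would read `[subtract_noise(S, noise, alpha) for S in spectra]`. The clipping to zero is one source of the isolated residual peaks associated with musical noise.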

Alternatively, the noise signal need not be stored in the storage unit 63 in advance. It may instead be obtained with a conventional noise estimation method. The obtained noise signal, or that signal multiplied by a coefficient, is then subtracted from each frame of the audio signal output by the frequency conversion unit 61.
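The source does not specify which "conventional noise estimation method" is meant. One simple stand-in, swapped in here purely for illustration, is to average the magnitude spectra of frames believed to contain only operation sound. Both the frame-selection assumption and the function name are hypothetical.

```python
def estimate_noise_magnitude(noise_frames_spectra):
    """Average per-bin magnitude over frames assumed to contain only noise."""
    if not noise_frames_spectra:
        raise ValueError("need at least one noise-only frame")
    n_bins = len(noise_frames_spectra[0])
    totals = [0.0] * n_bins
    for spec in noise_frames_spectra:
        if len(spec) != n_bins:
            raise ValueError("all frames must have the same number of bins")
        for i, X in enumerate(spec):
            totals[i] += abs(X)
    return [t / len(noise_frames_spectra) for t in totals]


print(estimate_noise_magnitude([[3 + 4j, 1], [1 + 0j, 3]]))  # prints [3.0, 2.0]
```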

The signal correction unit 64 corrects the audio signal whose noise component has been reduced by the signal calculation unit 62 (hereinafter, the noise-reduced audio signal). The correction proceeds as follows:
1. The frequency conversion unit 61 supplies the signal correction unit 64 with one of the frequency-domain frames that contains no noise, that is, an audio signal consisting of the target sound alone.
2. The signal correction unit 64 uses this target-sound-only audio signal as the reference signal and multiplies it by a coefficient.
3. It corrects the noise-reduced audio signal on the basis of the coefficient-multiplied reference signal and outputs the corrected audio signal to the frequency inverse conversion unit 65.
4. The frequency inverse conversion unit 65 converts the corrected audio signal back from a signal expressed as a function of frequency into a signal expressed as a function of time.
5. The resulting time-domain audio signal is written to the audio memory 44.
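A sketch of the last stage, the inverse transform followed by overlap-add. It uses a direct O(N^2) inverse DFT in place of an inverse FFT. The source does not describe how the overlapping frames are recombined, so overlap-add is an assumption here.

With a periodic Hann analysis window at 50 % overlap, the shifted windows sum to one away from the signal edges, so plain overlap-add reconstructs the interior. The symmetric window described above sums to one only approximately. All names are hypothetical.

```python
import cmath
import math


def idft(X):
    """Direct O(N^2) inverse DFT (stand-in for an inverse FFT)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def inverse_frames(spectra):
    """Return each corrected spectrum to the time domain, keeping the real part."""
    return [[v.real for v in idft(S)] for S in spectra]


def overlap_add(frames_time, hop):
    """Sum time-domain frames that start hop samples apart."""
    if not frames_time:
        return []
    frame_len = len(frames_time[0])
    out = [0.0] * (hop * (len(frames_time) - 1) + frame_len)
    for m, frame in enumerate(frames_time):
        for t, s in enumerate(frame):
            out[m * hop + t] += s
    return out


print(overlap_add([[1.0] * 4, [1.0] * 4], 2))  # prints [1.0, 1.0, 2.0, 2.0, 1.0, 1.0]
```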

Next, the flow of signal processing in the signal processing device 45 is described with reference to FIGS. 3 and 4. As shown in FIG. 3, the sound acquired by the sound collection unit 43 behaves as a periodic signal over short intervals of a few tens of milliseconds. As described above, when an audio signal is input, the frequency conversion unit 61 first sets the window width of the window function and then divides the signal into frames.

As described above, the frequency conversion unit 61 applies a window function such as a Hanning window while shifting by 0.5 frame. It then Fourier-transforms the windowed audio signal. The input audio signal is thereby turned into a sequence of per-frame signals: one frame covers the regions denoted 71 and 72, the next covers regions 72 and 73, and so on. These frames are output to the signal calculation unit 62. For each frame it receives, the signal calculation unit 62 reads the noise signal representing the operation sound from the storage unit 63 and subtracts it from that frame.
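
The framing step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the device's implementation. The function names `hann`, `dft`, and `analyze` are hypothetical. A periodic Hann window stands in for "a window function such as a Hanning window", and a naive DFT replaces the FFT that a real implementation would use.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (an assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Naive O(N^2) discrete Fourier transform; an FFT would be used in practice."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def analyze(signal, frame_len):
    """Window the signal in frames shifted by half a frame, then transform each frame."""
    hop = frame_len // 2  # the "0.5 frame" shift described in the text
    window = hann(frame_len)
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = signal[start:start + frame_len]
        spectra.append(dft([s * w for s, w in zip(segment, window)]))
    return spectra


spectra = analyze([1.0] * 32, frame_len=8)
print(len(spectra))  # 7 frames, starting at samples 0, 4, ..., 24
```

Because consecutive frames overlap by half, each sample, apart from those at the two ends, is seen by two frames. This mirrors the 71+72, 72+73 pairing in FIG. 3.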

For example, if AF (autofocus) processing is executed while the sound collection unit 43 is acquiring sound, an AF drive signal is output. In response, the lens drive unit 18 is driven and the lens of the imaging optical system 15 moves along the optical axis L. Driving the lens drive unit 18 and moving the lens produce an operation sound, so the audio signal acquired by the sound collection unit 43 becomes a mixture of the target sound and the noise sound. For example, if the AF drive signal is output during the region denoted 76, it can be assumed that a noise component is superimposed on the audio signal in the subsequent regions (denoted 77 and 78).
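
Deciding which frames carry operation sound from the timing of a drive signal can be sketched as below. The rule used here is a hypothetical simplification: noise is assumed present from the trigger sample onward, and every frame that reaches past the trigger is flagged. The helper names `noisy_frames` and `clean_frames` are illustrative only. Frames before the trigger are the natural candidates for the target-sound-only reference signal used by the signal correction unit 64.

```python
def noisy_frames(num_frames, frame_len, trigger_sample):
    """Return indices of frames that may contain operation sound.

    Hypothetical rule: operation sound is assumed present from trigger_sample
    onward, so any frame whose span reaches the trigger is flagged.
    Frames are shifted by half a frame, as in the text.
    """
    hop = frame_len // 2
    return [i for i in range(num_frames) if i * hop + frame_len > trigger_sample]


def clean_frames(num_frames, frame_len, trigger_sample):
    """Frames lying entirely before the trigger (reference-signal candidates)."""
    noisy = set(noisy_frames(num_frames, frame_len, trigger_sample))
    return [i for i in range(num_frames) if i not in noisy]


# Frames of 8 samples with a hop of 4; the AF drive signal arrives at sample 13.
print(noisy_frames(7, 8, 13))  # [2, 3, 4, 5, 6]
print(clean_frames(7, 8, 13))  # [0, 1]
```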

An audio signal with a superimposed noise component can be expressed, for example, by equation (1):

$$x(t) = s(t) + n(t) \qquad (1)$$

where x(t) is the audio signal acquired by the sound collection unit 43, s(t) is the audio signal of the target sound, and n(t) is the noise signal, such as the operation sound. All three are functions of time.
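
Because the Fourier transform is linear, equation (1) carries over unchanged to the complex spectra: X(f) = S(f) + N(f). The magnitudes used in equation (2), however, add exactly only when the two components are in phase within a band. The short Python check below illustrates both points. The toy signals and the naive `dft` helper are illustrative assumptions.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (illustration only)."""
    size = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / size) for t in range(size))
        for k in range(size)
    ]


N_SAMPLES = 8
target = [math.sin(2 * math.pi * t / N_SAMPLES) for t in range(N_SAMPLES)]        # toy s(t)
noise = [-0.5 * math.sin(2 * math.pi * t / N_SAMPLES) for t in range(N_SAMPLES)]  # toy n(t), opposite phase
mixed = [a + b for a, b in zip(target, noise)]                                    # equation (1)

X, S, N = dft(mixed), dft(target), dft(noise)

# Complex spectra add exactly (linearity of the transform).
linear_ok = all(abs(Xk - (Sk + Nk)) < 1e-9 for Xk, Sk, Nk in zip(X, S, N))

# Magnitudes do not: in bin 1, |X| = |S| - |N| here rather than |S| + |N|.
print(linear_ok, round(abs(X[1]), 6), round(abs(S[1]) + abs(N[1]), 6))  # True 2.0 6.0
```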

The Fourier transform described above converts the mixed audio signal from the time-domain signal x(t) into the frequency-domain signal X(f), where f denotes frequency.

Denoting the audio signal of the target sound by Se(f), its magnitude is given by equation (2):

$$|S_e(f)| = |X(f)| - \alpha\,|N_e(f)| \qquad (2)$$

Here Ne(f) is the noise signal and α is a subtraction coefficient. When equation (2) is used to obtain the target-sound-only signal, the outcome depends on the magnitude of the noise signal that is subtracted. The frequency characteristics of the computed target component may be altered, or artificial components such as musical noise may be introduced into the subtracted signal. For this reason, α is preferably set to a value between 0.5 and 4.
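
Applied band by band to magnitude spectra, equation (2) reduces to a few lines. The sketch below is hedged in two ways. The function name `spectral_subtract` is hypothetical. Clamping negative results to a floor is an assumption added here, a common practice in spectral subtraction that equation (2) itself does not specify. With `alpha=1.0` and the stored noise spectrum, this corresponds to the per-band subtraction of 82a to 82h from 81a to 81h described for FIG. 4.

```python
def spectral_subtract(mag_x, mag_noise, alpha=1.0, floor=0.0):
    """Equation (2) applied band by band: |Se(f)| = |X(f)| - alpha * |Ne(f)|.

    The text recommends 0.5 <= alpha <= 4. Clamping negative results to
    `floor` is an added assumption; equation (2) does not say what to do
    when the difference is negative.
    """
    return [max(x - alpha * n, floor) for x, n in zip(mag_x, mag_noise)]


print(spectral_subtract([5.0, 3.0, 1.0], [1.0, 1.0, 1.0], alpha=2.0))
# [3.0, 1.0, 0.0]
```

The last band shows why α matters. An oversized α drives small bands to the floor, which is one source of the spectral distortion and musical noise mentioned above.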

FIG. 4 shows the frequency spectra of the signals 81, 82, 83, 83′, 84, and 84′. In each spectrum plot, the horizontal axis is the frequency band and the vertical axis is the sound intensity (also referred to below as the "frequency spectrum value").

As shown in FIG. 4, the acquired audio signal is denoted 81 and the noise signal 82. After reading the noise signal 82 from the storage unit 63, the signal calculation unit 62 subtracts the spectrum values 82a to 82h of the noise signal 82 from the spectrum values 81a to 81h of the Fourier-transformed audio signal 81, band by band. This subtraction produces the noise-reduced audio signal 83.

Next, the signal correction unit 64 multiplies the reference signal 84 by a coefficient β to generate the signal 84′. It then compares the spectrum of the β-multiplied reference signal 84′ with the spectrum of the noise-reduced audio signal 83 band by band. That is, the sound intensity in each frequency band of signal 83 is compared with the sound intensity in the corresponding band of signal 84′.

For example, suppose the sound intensity of the noise-reduced audio signal 83 in a given frequency band is less than that of the β-multiplied reference signal 84′ in the corresponding band. In that case, the value of signal 83 in that band is replaced with the value of signal 84′ in that band.

Likewise, if the sound intensity of signal 83 in a given band exceeds that of signal 84′ in the corresponding band, the value of signal 83 in that band is replaced with the value of signal 84′ in that band.

FIG. 4 illustrates the case in which the value (sound intensity) of spectrum component 83e of the noise-reduced audio signal 83 is less than the value of spectrum component 84′e of the β-multiplied reference signal 84′. In this case, the signal correction unit 64 replaces the value of 83e with the value of 84′e in the same frequency band.
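
The comparison-and-replacement step can be sketched as follows. The function name and the two flags are hypothetical. They simply let the "below" case (FIG. 4, band e) and the "above" case be switched independently, which also matches the one-sided variants in the claims. With both flags set, every band that differs from β times the reference is replaced.

```python
def correct_with_reference(reduced, reference, beta, raise_below=True, lower_above=True):
    """Compare each band of the noise-reduced spectrum with beta * reference.

    raise_below: replace values that fall below beta * reference.
    lower_above: replace values that exceed beta * reference.
    With both flags set, every band that differs is replaced.
    """
    corrected = []
    for value, ref in zip(reduced, reference):
        limit = beta * ref
        if (raise_below and value < limit) or (lower_above and value > limit):
            corrected.append(limit)
        else:
            corrected.append(value)
    return corrected


reduced = [2.0, 0.1, 3.0]    # toy values for the noise-reduced signal 83
reference = [1.0, 1.0, 1.0]  # toy values for the target-only reference signal 84
print(correct_with_reference(reduced, reference, beta=0.5, lower_above=False))
# [2.0, 0.5, 3.0]
```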

This produces the corrected audio signal 83′. The frequency inverse transform unit 65 converts the corrected audio signal 83′ from a function of frequency into a function of time, for example by an inverse Fourier transform. The frequency-domain signal of each frame was generated by Fourier transforms shifted by 0.5 frame. Accordingly, the time-domain segments produced by the inverse transform are joined together with the same 0.5-frame shift.
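
The inverse transform and joining step can be sketched as below. The text only says the segments are joined with a 0.5-frame shift. This sketch assumes that means overlap-add of the inverse-transformed frames. It also assumes the periodic Hann analysis window from the framing step, whose half-shifted copies sum to one, so the input is reproduced wherever two frames overlap. All function names are hypothetical, and no correction is applied in between, so that the round trip can be checked.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window; copies shifted by n/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, keeping the real part (the audio input is real-valued)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def overlap_add(frames, frame_len):
    """Join time-domain frames, each shifted by half a frame, summing overlaps."""
    hop = frame_len // 2
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        for t, value in enumerate(frame):
            out[i * hop + t] += value
    return out


def roundtrip(signal, frame_len):
    """Analysis (window + DFT) then synthesis (IDFT + overlap-add), no correction."""
    hop = frame_len // 2
    window = hann(frame_len)
    spectra = [dft([s * w for s, w in zip(signal[i:i + frame_len], window)])
               for i in range(0, len(signal) - frame_len + 1, hop)]
    return overlap_add([idft(spec) for spec in spectra], frame_len)


signal = [math.sin(0.3 * t) for t in range(32)]
rebuilt = roundtrip(signal, 8)
# Samples covered by two frames (indices 4..27) are reproduced.
print(max(abs(a - b) for a, b in zip(rebuilt[4:28], signal[4:28])) < 1e-9)  # True
```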

In this way, the component of the operation sound generated when a mechanism inside the camera is driven is first subtracted from the audio signal. This yields a signal in which the influence of the operation sound is reduced. That signal is then corrected on the basis of an audio signal containing only the target sound. This prevents the musical noise that arises when the estimated noise has characteristics different from those of the actual operation sound. Consequently, in this embodiment the noise sound contained in the acquired audio signal can be reduced appropriately.

In this embodiment, the output timing of the AF drive signal is used as the condition under which the operation sound (noise sound) is generated. Other possible conditions include:
- the timing at which an operation signal is output from an operation member of the digital camera, such as a zoom button;
- the timing at which a drive signal is output to the aperture drive unit when the aperture value is changed;
- in a digital camera with an image-stabilization function, the timing at which the start signal for image-stabilization processing is output.

In this embodiment, the band-by-band comparison may show that the spectrum value (sound intensity) of the noise-reduced audio signal 83 in a given band is below that of the β-multiplied reference signal 84′ in the corresponding band. In that case, the value of signal 83 in that band is replaced with the value of signal 84′. The correction is not limited to this, however.

For example, when the spectrum value of the noise-reduced audio signal in a given band is below the value of the β-multiplied reference signal in the corresponding band, it may instead be corrected to a value above that of the β-multiplied reference signal. In this case, the ratio of spectrum values between adjacent frequency bands is first calculated for both the noise-reduced audio signal 83 and the β-multiplied reference signal 84′. The spectrum values of signal 83 are then corrected so that its adjacent-band ratios match those of signal 84′.
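
The ratio-matching correction leaves several details open, so the sketch below is only one possible reading, and every specific in it is an assumption. It uses the lower-frequency neighbour (band k-1) to rebuild band k. It processes bands in ascending order. It never lets the result fall below the β-multiplied reference value in that band, which keeps it consistent with the "not below" condition. Band 0 is left unchanged because it has no lower neighbour. The name `ratio_match_floor` is hypothetical.

```python
def ratio_match_floor(reduced, scaled_ref):
    """Rebuild bands that fall below scaled_ref (= beta * reference).

    Hypothetical reading: for band k >= 1 below the floor, set it so that
    corrected[k] / corrected[k-1] == scaled_ref[k] / scaled_ref[k-1],
    but never below scaled_ref[k]. Band 0 is left as-is.
    """
    corrected = list(reduced)
    for k in range(1, len(corrected)):
        if corrected[k] < scaled_ref[k] and scaled_ref[k - 1] > 0:
            ratio = scaled_ref[k] / scaled_ref[k - 1]
            corrected[k] = max(corrected[k - 1] * ratio, scaled_ref[k])
    return corrected


print(ratio_match_floor([4.0, 0.2, 3.0], [2.0, 1.0, 1.0]))  # [4.0, 2.0, 3.0]
```

In the example, band 1 is rebuilt as 4.0 × (1.0 / 2.0) = 2.0, so its ratio to band 0 (0.5) now equals that of the scaled reference.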

The same applies when the spectrum value of the noise-reduced audio signal in a given band exceeds the value of the β-multiplied reference signal in the corresponding band. Rather than replacing it with that value, the value may be corrected to one below the value of the β-multiplied reference signal. This correction can likewise be made so that the ratios of spectrum values between adjacent bands match.

In this embodiment, the spectrum value in each frequency band of the reference signal is multiplied by the coefficient β. The resulting per-band values are compared with the per-band spectrum values of the noise-reduced audio signal, and each band whose values differ is treated as a band to be corrected. The spectrum value of the noise-reduced audio signal in that band is then corrected on the basis of the β-multiplied reference value in the same band. The correction is not limited to this, however. It is also possible to take several frequency bands of the reference signal that include the band to be corrected (for example, that band and its adjacent bands). The simple average, weighted average, maximum, or minimum of their spectrum values is computed and used as a limit value. This limit value is then used to correct the spectrum value of the noise-reduced audio signal in the band to be corrected. The case of the simple average is described below; the number of spectrum values used to compute it may be set as appropriate.

As shown in FIG. 5, consider the case in which the band of spectrum component 83e is the band to be corrected among the bands of the noise-reduced audio signal 83. The signal correction unit 64 reads the spectrum values of several bands of the reference signal that include the band of 83e (for example, spectrum components 86d, 86e, and 86f) and calculates their simple average. It then multiplies this average by a coefficient σ to obtain the limit value. Finally, it compares the limit value with the spectrum value in the band to be corrected.

If this comparison shows that the value of spectrum component 83e is below or above the calculated limit value, the signal correction unit 64 replaces the value of 83e with the limit value.
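
The neighbour-based limit of FIG. 5 can be sketched as follows. The helper names, the `width` parameter (how many neighbours on each side), and the `mode` switch are hypothetical. The simple average follows the FIG. 5 description, while "max" and "min" are the alternatives listed in the text. A weighted average is also mentioned there but is omitted here because its weights are not specified.

```python
def neighbour_limit(reference, k, sigma, width=1, mode="mean"):
    """Limit value for band k, from reference bands k-width .. k+width.

    "mean" is the simple average described for FIG. 5; "max" and "min"
    are the alternatives listed in the text. The result is scaled by sigma.
    """
    lo, hi = max(0, k - width), min(len(reference), k + width + 1)
    bands = reference[lo:hi]
    if mode == "mean":
        base = sum(bands) / len(bands)
    elif mode == "max":
        base = max(bands)
    elif mode == "min":
        base = min(bands)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sigma * base


def correct_band(reduced, reference, k, sigma, **options):
    """Replace band k of the noise-reduced spectrum with the limit if it is below or above it."""
    corrected = list(reduced)
    limit = neighbour_limit(reference, k, sigma, **options)
    if corrected[k] != limit:  # below or above the limit
        corrected[k] = limit
    return corrected


reference = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy reference spectrum
print(neighbour_limit(reference, 2, sigma=0.5))                    # 1.5 = 0.5 * mean(2, 3, 4)
print(correct_band([0.0, 0.0, 0.2, 0.0, 0.0], reference, 2, 0.5))  # [0.0, 0.0, 1.5, 0.0, 0.0]
```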

In this embodiment, the spectrum value of the reference signal in each band multiplied by the coefficient β serves as the limit value for that band. The correction uses this limit either as an upper bound or as a lower bound. Alternatively, the β-multiplied value can be set as a target value rather than a limit. The spectrum values of the noise-reduced audio signal that are to be corrected can then be adjusted so that they approach the target value.
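
The target-value variant says only that values are brought closer to β times the reference. The sketch below assumes a simple linear step toward the target. The `step` parameter and the function name are hypothetical: `step=1.0` jumps straight to the target, and smaller values move only part of the way.

```python
def approach_target(reduced, reference, beta, step=0.5):
    """Move each band of the noise-reduced spectrum toward the target beta * reference.

    `step` (0 < step <= 1) is a hypothetical parameter; the text only says
    the value should approach the target.
    """
    return [value + step * (beta * ref - value) for value, ref in zip(reduced, reference)]


print(approach_target([0.0, 4.0], [2.0, 2.0], beta=1.0))  # [1.0, 3.0]
```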

This embodiment uses the audio signal acquired during moving-image capture as an example. The invention is not limited to this, however, and can also be applied, for example, when only audio is recorded; that is, it can be applied to any electronic device with a recording function. Likewise, although a digital camera is described as the device that captures moving images, the device may instead be a portable terminal such as a mobile phone or a PDA. The invention may also take the form of a program that causes a computer to execute the functions of the signal processing device shown in FIG. 2. Such a program is preferably stored on a computer-readable storage medium such as a memory card, an optical disc, or a magnetic disk.

Reference signs: 10 digital camera; 15 imaging optical system; 16 image sensor; 18 lens drive unit; 19 aperture; 20 aperture drive unit; 21 shutter; 22 shutter drive unit; 36 storage medium; 43 sound collection unit; 44 audio memory; 45 signal processing device; 61 frequency conversion unit; 62 signal calculation unit; 63 storage unit; 64 signal correction unit; 65 signal inverse transform unit.

Claims

1. A signal processing apparatus comprising:
signal conversion means for converting a plurality of first audio signals into second audio signals represented as functions of frequency, the first audio signals being obtained by dividing, by a predetermined time width, an audio signal that is represented as a function of time and in which a target sound and an operation sound are mixed;
calculation means for obtaining, from the second audio signals and an audio signal representing the operation sound, a third audio signal in which the influence of the operation sound is reduced;
correction means for taking, as a reference signal, an audio signal representing the target sound among the plurality of second audio signals; comparing the frequency spectrum value in each frequency band of the third audio signal with the frequency spectrum value in the corresponding frequency band of the reference signal multiplied by a coefficient; and, where a frequency spectrum value of the third audio signal differs from the corresponding frequency spectrum value of the coefficient-multiplied reference signal, replacing that value of the third audio signal with the corresponding frequency spectrum value; and
signal inverse conversion means for inversely converting the audio signal corrected by the correction means from an audio signal represented as a function of frequency into an audio signal represented as a function of time.

2. A signal processing apparatus comprising:
signal conversion means for converting a plurality of first audio signals into second audio signals represented as functions of frequency, the first audio signals being obtained by dividing, by a predetermined time width, an audio signal that is represented as a function of time and in which a target sound and an operation sound are mixed;
calculation means for obtaining, from the second audio signals and an audio signal representing the operation sound, a third audio signal in which the influence of the operation sound is reduced;
correction means for taking, as a reference signal, an audio signal representing the target sound among the plurality of second audio signals; comparing the frequency spectrum value in each frequency band of the third audio signal with the frequency spectrum value in the corresponding frequency band of the reference signal multiplied by a coefficient; and, where a frequency spectrum value of the third audio signal is less than the corresponding frequency spectrum value of the coefficient-multiplied reference signal, correcting that value of the third audio signal so that it does not fall below the corresponding frequency spectrum value; and
signal inverse conversion means for inversely converting the audio signal corrected by the correction means from an audio signal represented as a function of frequency into an audio signal represented as a function of time.

3. A signal processing apparatus comprising:
signal conversion means for converting a plurality of first audio signals into second audio signals represented as functions of frequency, the first audio signals being obtained by dividing, by a predetermined time width, an audio signal that is represented as a function of time and in which a target sound and an operation sound are mixed;
calculation means for obtaining, from the second audio signals and an audio signal representing the operation sound, a third audio signal in which the influence of the operation sound is reduced;
correction means for taking, as a reference signal, an audio signal representing the target sound among the plurality of second audio signals; comparing the frequency spectrum value in each frequency band of the third audio signal with the frequency spectrum value in the corresponding frequency band of the reference signal multiplied by a coefficient; and, where a frequency spectrum value of the third audio signal exceeds the corresponding frequency spectrum value of the coefficient-multiplied reference signal, correcting that value of the third audio signal so that it does not exceed the corresponding frequency spectrum value; and
signal inverse conversion means for inversely converting the audio signal corrected by the correction means from an audio signal represented as a function of frequency into an audio signal represented as a function of time.

4. A signal processing apparatus comprising:
signal conversion means for converting a plurality of first audio signals into second audio signals represented as functions of frequency, the first audio signals being obtained by dividing, by a predetermined time width, an audio signal that is represented as a function of time and in which a target sound and an operation sound are mixed;
calculation means for obtaining, from the second audio signals and an audio signal representing the operation sound, a third audio signal in which the influence of the operation sound is reduced;
correction means for taking, as a reference signal, an audio signal representing the target sound among the plurality of second audio signals; comparing the frequency spectrum value in each frequency band of the third audio signal with the frequency spectrum value in the corresponding frequency band of the reference signal multiplied by a coefficient; and, where a frequency spectrum value of the third audio signal differs from the corresponding frequency spectrum value of the coefficient-multiplied reference signal, replacing that value of the third audio signal with a correction value calculated from the corresponding frequency spectrum value and the frequency spectrum values of the neighboring frequency bands of the coefficient-multiplied reference signal; and
signal inverse conversion means for inversely converting the audio signal corrected by the correction means from an audio signal represented as a function of frequency into an audio signal represented as a function of time.

5. A signal processing apparatus comprising:
signal conversion means for converting a plurality of first audio signals into second audio signals represented as functions of frequency, the first audio signals being obtained by dividing, by a predetermined time width, an audio signal that is represented as a function of time and in which a target sound and an operation sound are mixed;
calculation means for obtaining, from the second audio signals and an audio signal representing the operation sound, a third audio signal in which the influence of the operation sound is reduced;
correction means for taking, as a reference signal, an audio signal representing the target sound among the plurality of second audio signals; comparing the frequency spectrum value in each frequency band of the third audio signal with the frequency spectrum value in the corresponding frequency band of the reference signal multiplied by a coefficient; and, where a frequency spectrum value of the third audio signal exceeds the corresponding frequency spectrum value of the coefficient-multiplied reference signal, correcting that value of the third audio signal so that it does not exceed a correction value calculated from the corresponding frequency spectrum value and the frequency spectrum values of the neighboring frequency bands of the coefficient-multiplied reference signal; and
signal inverse conversion means for inversely converting the audio signal corrected by the correction means from an audio signal represented as a function of frequency into an audio signal represented as a function of time.

6. A signal processing apparatus comprising:
signal conversion means for converting a plurality of first audio signals into second audio signals represented as functions of frequency, the first audio signals being obtained by dividing, by a predetermined time width, an audio signal that is represented as a function of time and in which a target sound and an operation sound are mixed;
calculation means for obtaining, from the second audio signals and an audio signal representing the operation sound, a third audio signal in which the influence of the operation sound is reduced;
correction means for taking, as a reference signal, an audio signal representing the target sound among the plurality of second audio signals; comparing the frequency spectrum value in each frequency band of the third audio signal with the frequency spectrum value in the corresponding frequency band of the reference signal multiplied by a coefficient; and, where a frequency spectrum value of the third audio signal is less than the corresponding frequency spectrum value of the coefficient-multiplied reference signal, correcting that value of the third audio signal so that it does not fall below a correction value calculated from the corresponding frequency spectrum value and the frequency spectrum values of the neighboring frequency bands of the coefficient-multiplied reference signal; and
signal inverse conversion means for inversely converting the audio signal corrected by the correction means from an audio signal represented as a function of frequency into an audio signal represented as a function of time.

7. The signal processing apparatus according to any one of claims 4 to 6, wherein the correction value is any one of: the average of the corresponding frequency spectrum value and the frequency spectrum values of the neighboring frequency bands; the maximum of those values; or the minimum of those values.

8. An imaging apparatus comprising:
imaging means for obtaining an image signal;
sound collection means for acquiring an audio signal in synchronization with the acquisition of the image signal by the imaging means;
the signal processing apparatus according to any one of claims 1 to 7; and
storage means for storing the image signal acquired by the imaging means and the audio signal processed by the signal processing apparatus.