JP5246134B2

JP5246134B2 - Signal processing apparatus and imaging apparatus

Info

Publication number: JP5246134B2
Application number: JP2009248954A
Authority: JP
Inventors: 豪松本; 光宏岡崎
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2009-10-29
Filing date: 2009-10-29
Publication date: 2013-07-24
Anticipated expiration: 2029-10-29
Also published as: JP2011095478A

Description

The present invention relates to a signal processing apparatus and an imaging apparatus that reduce a noise signal contained in an audio signal.

A common way to reduce the noise component of an audio signal that mixes a target sound with noise is to estimate the noise from the acquired audio signal and subtract the estimated noise (hereinafter, the noise signal) from the audio signal (see, for example, Patent Document 1).

Patent Document 1: JP 2005-195955 A

This method works well when the noise has been recorded in advance or is periodic. In those cases the magnitude of the noise and the times at which it occurs are easy to estimate, so the noise signal can be reduced appropriately.

The situation is different when the noise is the sound made by the apparatus's own internal mechanisms as they are driven (hereinafter, operation sound). Operation sound occurs at irregular times, which makes the noise in the audio signal hard to estimate. Estimation becomes harder still when the level of the target sound varies. As a result, the audio signal left after subtracting the noise signal contains a noise component known as musical noise.

An object of the present invention is to provide a signal processing apparatus and an imaging apparatus that can effectively reduce the influence of operation sound contained in an audio signal, even when the level of the audio signal itself changes.

To solve the problem described above, a signal processing apparatus according to the present invention comprises the following means.

Signal conversion means divides an audio signal, expressed as a function of time, at predetermined time intervals into a plurality of first audio signals. It converts each of these into a second audio signal expressed as a function of frequency.

Signal calculation means obtains a third audio signal in which the influence of the operation sound is reduced. It does so from two inputs: a second audio signal, among the plurality of second audio signals, in which operation sound is mixed; and an audio signal representing the operation sound.

Ratio calculation means takes a reference signal and computes a ratio. The reference signal is a second audio signal, among the plurality of second audio signals, that was obtained before the operation signal that triggers the sound-producing operation had been output, so no operation sound is mixed in it. The ratio is between the magnitude of this reference signal and the magnitude of the second audio signal in which the operation sound is mixed.

Correction means compares the frequency characteristics of the third audio signal with those of the reference signal multiplied by the ratio. When the comparison result is outside a predetermined range, it corrects the frequency characteristics of the third audio signal so that they approach those of the scaled reference signal.

Signal inverse conversion means converts the third audio signal corrected by the correction means from an audio signal expressed as a function of frequency back into an audio signal expressed as a function of time.

The correction means may act on the third audio signal whenever, in at least one frequency band, its frequency spectrum value differs from that of the reference signal multiplied by the ratio by more than a predetermined range. In that case it corrects the value in that band so that it approaches the value of the scaled reference signal.

The correction means may also compute a ratio for a specific frequency band, separately for the third audio signal and for the reference signal multiplied by the ratio. This ratio is between the frequency spectrum value in that band and a value derived from the frequency spectrum of at least one frequency band out of all bands. The correction means then corrects the spectrum value in the specific band so that the third audio signal's ratio becomes approximately equal to the scaled reference signal's ratio.

In one variant, the value used in that ratio is taken over all frequency bands. The correction means obtains, from both the third audio signal and the scaled reference signal, the ratio between the spectrum value in the specific band and either the sum or the average of the spectrum values over all bands.
Alternatively, the value may be taken over a local neighborhood. The correction means then obtains, from both signals, the ratio between the spectrum value in the specific band and either the sum or the average of the spectrum values over that band and the frequency bands near it.
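
The two correction variants just described can be sketched as below. This is a minimal illustration, not the patent's implementation. The assumptions are:

- Spectra are plain lists of magnitudes.
- `corrected_bands`, `width` (the number of neighboring bands on each side; `None` means all bands) and `use_mean` are hypothetical names.
- The sums or averages are computed on the uncorrected spectra.

Both spectra contribute the same bands to each neighborhood. Switching from sum to average therefore divides both ratios by the same count and leaves the corrected value unchanged.

```python
def corrected_bands(reduced, scaled_ref, bands, width=None, use_mean=False):
    """Correct selected bands of a noise-reduced magnitude spectrum (sketch).

    For each band i in `bands`, reduced[i] is replaced so that
        value_i / agg(reduced) == scaled_ref[i] / agg(scaled_ref),
    where agg is the sum (or mean) over all bands when width is None, or over
    bands i - width .. i + width otherwise. Aggregates are computed on the
    uncorrected input, which is an assumption the text leaves open.
    """
    n = len(reduced)
    out = list(reduced)
    for i in bands:
        if width is None:
            lo, hi = 0, n  # all frequency bands
        else:
            lo, hi = max(0, i - width), min(n, i + width + 1)  # band i and its neighbors
        agg_reduced = sum(reduced[lo:hi])
        agg_ref = sum(scaled_ref[lo:hi])
        if use_mean:
            agg_reduced /= hi - lo
            agg_ref /= hi - lo
        if agg_ref == 0:
            continue  # nothing to match against; leave the band unchanged
        out[i] = scaled_ref[i] * agg_reduced / agg_ref
    return out
```

For example, `corrected_bands([4, 2, 1, 11], [2, 1, 3, 3], [2, 3], width=1)` raises band 2 and lowers band 3, both to 6.0.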

An imaging apparatus according to the present invention comprises:

- imaging means for acquiring an image signal;
- sound collection means for acquiring an audio signal in synchronization with the acquisition of the image signal by the imaging means;
- any one of the signal processing apparatuses described above; and
- storage means for storing the image signal acquired by the imaging means and the audio signal processed by the signal processing apparatus.

According to the present invention, the influence of operation sound superimposed on an audio signal can be effectively reduced, even when the level of the audio signal itself changes.

FIG. 1 is a functional block diagram showing the configuration of a digital camera.
FIG. 2 is a functional block diagram showing the configuration of the signal processing apparatus.
FIG. 3 is a diagram showing the relationship among the acquired audio signal, the window function, the frame division, and the AF operation signal.
FIG. 4 is a diagram showing the flow of signal processing in the signal processing apparatus.

FIG. 1 is a functional block diagram showing the configuration of a digital camera 10. As is well known, the digital camera 10 uses an image sensor 16 to photoelectrically convert subject light captured through an imaging optical system 15. It then acquires image data from the resulting electrical signal (the image signal).

The imaging optical system 15 consists of a plurality of lenses. A lens drive unit 18 moves these lenses along the optical axis L when the zoom magnification is changed or the focus is adjusted.

The imaging optical system 15 also includes a diaphragm 19. The diaphragm changes the amount of subject light reaching the image sensor 16 by changing the size of its aperture. A diaphragm drive unit 20 sets the aperture size to obtain a preset f-number.

A shutter 21 is disposed between the imaging optical system 15 and the image sensor 16. It switches between two states:

- an open state, in which subject light captured through the imaging optical system 15 falls on the image sensor 16;
- a closed state, in which the subject light is blocked.

During shooting, the shutter 21 is first held closed and then opened. Once a preset time has elapsed after opening, it is closed again. A shutter drive unit 22 performs the switching between the closed and open states.

The image sensor 16 is, for example, a CCD (Charge Coupled Device) or CMOS (Complementary Metal-Oxide Semiconductor) sensor. It receives the subject light captured by the imaging optical system 15, converts the received light into signal charge (photoelectric conversion), and accumulates that charge. The accumulated charge is then output to an AFE (Analog Front End) circuit 25.

The AFE circuit 25 includes an AGC circuit, a CDS circuit, and an A/D conversion circuit (not shown). It applies processing such as gain control and noise removal to the input image signal. It then converts the analog image signal into a digital image signal.

The digital image signals are grouped frame by frame and recorded in an image memory 30. A timing generator (not shown) controls the drive timing of the image sensor 16 and the AFE circuit 25.

An image processing circuit 35 applies image processing to the image signal stored in the image memory 30, such as white balance, color interpolation, edge (contour) compensation, and gamma correction. It then formats the data for compression in a storage format such as JPEG. The image processing circuit 35 also encodes and decodes image data. Reference numeral 37 denotes a recording I/F.

An LCD 38, controlled by a display control circuit 39, displays:

- images acquired by the digital camera 10 and images stored in a storage medium 36, which include moving images as well as still images;
- through images captured in the shooting standby state;
- setting screens for setting or changing shooting conditions and the like.

A speaker 40 outputs the audio associated with a moving image while that moving image is shown on the LCD 38. An audio control circuit 41 controls the audio output from the speaker 40.

A sound collection unit 43 consists of, for example, a microphone. It acquires sound during moving image shooting or audio recording. The audio signal acquired by the sound collection unit 43 is recorded in an audio memory 44.

A signal processing device 45 processes the audio signal acquired by the sound collection unit 43 to reduce the noise it contains. During moving image shooting or audio recording, that noise includes:

- operation sound produced when mechanisms inside the digital camera 10 are driven;
- handling sound produced when the user operates controls to drive those mechanisms.

The signal processing device 45 also compression-encodes the acquired audio signal, including the noise-reduced audio signal described above. It also decodes compression-encoded audio signals.

A CPU 50 controls every part of the digital camera 10 by executing a control program (not shown) stored in a built-in memory 51. This includes:

- control based on operation of a release button 52, which covers well-known AE processing, AF processing, and image capture processing;
- control based on operation of a setting operation unit 53, which covers processing such as initial settings and setting shooting conditions.

The CPU 50 also writes image data acquired during shooting to the storage medium 36. Each shot is written as a single file that also carries model information for the digital camera 10 and shooting information:

- For still image shooting, the file (a still image file) contains the image data encoded by the image processing circuit 35.
- For moving image shooting, the image processing circuit 35 encodes each frame of image data. The file (a moving image file) contains these encoded frames together with the audio data processed by the signal processing device 45.

The configuration of the signal processing device 45 is described next with reference to the functional block diagram of FIG. 2. As shown in FIG. 2, the signal processing device 45 includes:

- a frequency conversion unit 61;
- a signal calculation unit 62;
- a storage unit 63;
- a ratio calculation unit 64;
- a signal correction unit 65;
- a frequency inverse conversion unit 66.

The frequency conversion unit 61 converts the audio signal acquired by the sound collection unit 43 from a signal expressed as a function of time (a time-domain signal) into a signal expressed as a function of frequency (a frequency-domain signal).

First, the frequency conversion unit 61 determines the width of the window function described below. Taking that window width as one frame, it divides the input audio signal so that each frame contains, for example, 1024 samples.

Next, the frequency conversion unit 61 applies a window function such as a Hanning window, shifting by 0.5 frame each time. It then Fourier-transforms the windowed audio signal.

As is well known, the Hanning window is 0 at both ends and 1 at the center, so the windowed signal emphasizes the middle of each frame. If a time-varying signal such as a vibration were analyzed with a shift of one whole frame, features falling near the frame edges would be attenuated and hard to capture. Overlapping the analysis by shifting only 0.5 frame ensures that such characteristic parts of the signal are still detected.

Through this processing, the acquired audio signal is converted frame by frame, with a 0.5-frame shift, from a time-domain signal into a frequency-domain signal. The result is output to the signal calculation unit 62 and the signal correction unit 65.
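
The framing, windowing, and Fourier transform described above can be sketched as follows. It is a minimal illustration under several assumptions:

- The window is the periodic form of the Hann window.
- The transform is a naive DFT rather than an FFT, so the sketch stays dependency-free.
- Trailing samples that do not fill a whole frame are dropped.
- The names `hann`, `dft`, and `stft` are hypothetical.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window: 0 at the first sample and 1 at the center.

    The periodic form (dividing by n rather than n - 1) is an assumption; it
    makes copies shifted by n // 2 sum exactly to 1.
    """
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def dft(frame):
    """Naive O(N^2) discrete Fourier transform (illustration only; use an FFT in practice)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def stft(signal, frame_len=1024):
    """Window and transform the signal frame by frame with a 0.5-frame shift.

    Trailing samples that do not fill a whole frame are dropped (an assumption).
    """
    hop = frame_len // 2  # 0.5-frame shift
    window = hann(frame_len)
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spectra.append(dft(frame))
    return spectra
```

With the default 1024-sample frames, the naive DFT is slow in pure Python. For a quick check, call it with a short frame such as `stft(samples, frame_len=16)`, which returns one 16-bin spectrum per half-frame step.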

The signal calculation unit 62 reduces the noise contained in the acquired audio signal. As described above, the audio signal acquired by the sound collection unit 43 is a mixture of the desired sound (the target sound) and noise (operation sound).

The signal calculation unit 62 reduces the noise by subtracting a signal based on the noise (the noise signal) from the audio signal output by the frequency conversion unit 61. The noise signal is a frequency-domain signal for the same frame width as that audio signal. It is stored in the storage unit 63 in advance.

In this embodiment, the noise signal stored in the storage unit 63 is subtracted from each frame of the audio signal output by the frequency conversion unit 61. This is not limiting: the noise signal may instead be multiplied by a coefficient before being subtracted from each frame.

Alternatively, the noise signal need not be stored in the storage unit 63 in advance. It may instead be obtained with a conventional noise estimation technique. The obtained noise signal, or that signal multiplied by a coefficient, is then subtracted from each frame of the audio signal output by the frequency conversion unit 61.

The ratio calculation unit 64 takes an audio signal containing only the target sound as a reference signal. For each input frame, it obtains the ratio of the frame's magnitude to the reference signal's magnitude.

As described above, the frequency conversion unit 61 converts each frame into a frequency-domain signal. This is a signal expressed as the relationship between frequency band and frequency spectrum value (the sound intensity in that band).

The ratio calculation unit 64 computes two sums over all frequency bands: A0 for the reference signal and A1 for the audio signal. It then obtains the ratio B = A1/A0. The ratio B shows whether, and by how much, the level of the acquired audio signal has changed. The ratio calculation unit 64 outputs B and the reference signal to the signal correction unit 65.
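
The ratio computation just described can be sketched as below. It assumes the spectra are lists of non-negative magnitudes, one value per frequency band. The name `level_ratio` and the error raised for a silent reference are illustrative choices, not from the text.

```python
def level_ratio(reference, frame):
    """Return B = A1 / A0 for two magnitude spectra.

    reference: magnitudes of a frame containing only the target sound (A0 is its sum)
    frame:     magnitudes of the current input frame (A1 is its sum)
    """
    a0 = sum(reference)
    a1 = sum(frame)
    if a0 == 0:
        raise ValueError("reference spectrum is silent; ratio B is undefined")
    return a1 / a0
```

Multiplying every band of the reference by B gives a spectrum whose total equals A1. In other words, the scaled reference has the same overall level as the current frame.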

The signal correction unit 65 corrects the audio signal whose noise component has been reduced by the signal calculation unit 62 (hereinafter, the noise-reduced audio signal). It first multiplies the reference signal by the ratio B obtained by the ratio calculation unit 64. It then corrects the noise-reduced audio signal on the basis of this scaled reference signal.

The corrected audio signal is output to the frequency inverse conversion unit 66, which converts it from a frequency-domain signal back into a time-domain signal. The resulting time-domain audio signal is written to the audio memory 44.
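
The inverse conversion step can be sketched as follows. The text does not say how frames are recombined or how the phase is handled, so this sketch assumes:

- The corrected magnitudes are paired with the phases of the original noisy frame.
- Each frame is inverse-transformed, and the frames are summed at a half-frame hop (overlap-add).

With periodic Hann analysis windows at 50% overlap, the windows sum to 1, so unmodified interior samples are reconstructed exactly. The names `with_magnitudes` and `overlap_add` are hypothetical. A naive forward DFT is included only so the round trip can be checked.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window (assumed form); copies shifted by n // 2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def dft(frame):
    """Naive forward DFT, included only so the round trip can be checked."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(spectrum):
    """Naive inverse DFT; keeps the real part, since the frames are real-valued."""
    n = len(spectrum)
    return [(sum(spectrum[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n).real
            for t in range(n)]


def with_magnitudes(spectrum, magnitudes):
    """Rebuild a complex spectrum from corrected magnitudes and the original phases."""
    return [m * cmath.exp(1j * cmath.phase(z)) for z, m in zip(spectrum, magnitudes)]


def overlap_add(spectra, frame_len):
    """Inverse-transform each frame and sum the frames at a half-frame hop."""
    hop = frame_len // 2
    out = [0.0] * (hop * (len(spectra) - 1) + frame_len)
    for k, spectrum in enumerate(spectra):
        for t, value in enumerate(idft(spectrum)):
            out[k * hop + t] += value
    return out
```

Only the first and last half-frames are not fully reconstructed, because a single window covers them. If magnitudes are corrected, bins f and N - f should be changed together to keep the time-domain output real.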

The flow of signal processing in the signal processing device 45 is described next with reference to FIGS. 3 and 4. As shown in FIG. 3, the sound acquired by the sound collection unit 43 is approximately periodic over short intervals of a few tens of milliseconds. As described above, when an audio signal is input, the frequency conversion unit 61 sets the window width of the window function and then divides the signal into frames.

As described above, the frequency conversion unit 61 applies a window function such as a Hanning window, shifting by 0.5 frame each time, and then Fourier-transforms the windowed signal. Applied to the input audio signal, this produces frames in the following order, each of which is output to the signal calculation unit 62:

- a frame made up of regions 71 and 72;
- a frame made up of regions 72 and 73;
- and so on.

When a frame arrives, the signal calculation unit 62 reads the noise signal, which consists of operation sound, from the storage unit 63. It then subtracts the noise signal from that frame.

Consider AF (autofocus) processing executed while the sound collection unit 43 is acquiring sound. An AF operation signal is output, and in response the lens drive unit 18 moves the lenses of the imaging optical system 15 along the optical axis L. Driving the lens drive unit 18 and moving the lenses produce operation sound, so the acquired audio signal becomes a mixture of the target sound and noise.

For example, if the AF operation signal is output within the region indicated by reference numeral 76, the noise component can be assumed to be superimposed on the audio signal in the regions that follow.
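
The timing rule just described can be sketched as a classification of the half-overlapped frames. The assumptions are:

- The operation sound starts at the sample (`onset`) where the AF operation signal is output; any mechanical delay is ignored.
- A frame counts as noisy if any of its samples lies at or after that point.
- The reference frame is taken to be the last clean frame. The claims only require that the reference be obtained before the operation signal is output.

The function name `split_frames` is hypothetical.

```python
def split_frames(num_frames, onset, frame_len=1024):
    """Classify half-overlapped frames relative to the AF operation signal.

    Frame k spans samples [k * hop, k * hop + frame_len) with hop = frame_len // 2.
    Returns (reference, noisy): the index of the last frame that ends before
    `onset` (or None if there is none) and the list of frames that reach it.
    """
    hop = frame_len // 2
    clean = [k for k in range(num_frames) if k * hop + frame_len <= onset]
    noisy = [k for k in range(num_frames) if k * hop + frame_len > onset]
    reference = clean[-1] if clean else None
    return reference, noisy
```

With 1024-sample frames and the AF signal at sample 2000, frames 0 and 1 end by sample 1536 and stay clean. Frame 2 and later overlap the operation sound.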

An audio signal with a noise component superimposed on it can be expressed by equation (1) below.

$$x(t) = s(t) + n(t) \qquad (1)$$

Here, x(t) is the audio signal acquired by the sound collection unit 43, s(t) is the audio signal of the target sound, and n(t) is a noise signal such as operation sound. All three are functions of time.

The Fourier transform described above converts the mixed audio signal from the time-domain signal x(t) into the frequency-domain signal X(f), where f denotes frequency.
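
The following derivation connects equation (1) to the subtraction in equation (2). Because the Fourier transform is linear, the additive model carries over to the frequency domain. Magnitudes, however, add only where the two components share the same phase. Subtracting a magnitude, as equation (2) does, therefore yields an estimate of |S(f)| rather than its exact value. The symbols S(f) and N(f), the transforms of s(t) and n(t), are introduced here for illustration.

```latex
X(f) = \mathcal{F}\{s(t) + n(t)\} = S(f) + N(f), \qquad
|X(f)| \le |S(f)| + |N(f)|
```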

Letting Se(f) denote the estimated audio signal of the target sound, Se(f) is given by equation (2) below.

$$|S_e(f)| = |X(f)| - \alpha\,|N_e(f)| \qquad (2)$$

Here, Ne(f) is the noise signal and α is a subtraction coefficient. If the subtracted noise signal has an unsuitable magnitude, the result of equation (2) can go wrong in two ways: the frequency characteristics of the calculated target component can be altered, or musical noise can be artificially superimposed on the subtracted signal. A value of α between 0.5 and 4 is therefore preferred.
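
Equation (2) can be applied band by band as in the sketch below. Two points are assumptions rather than details from the text. Negative results are clamped to zero, a common choice in spectral subtraction that the text does not specify. The recommended range for α is documented but not enforced. The function name is hypothetical.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=1.0):
    """Equation (2) per band: |Se(f)| = |X(f)| - alpha * |Ne(f)|.

    alpha is the subtraction coefficient (the text prefers 0.5 to 4).
    Negative results are clamped to 0 (an assumption, not from the text).
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bands")
    return [max(x - alpha * n, 0.0) for x, n in zip(noisy_mag, noise_mag)]
```

For example, with α = 2, the bands [5, 3, 1] minus noise [1, 1, 1] become [3, 1, 0]. In the last band the subtraction overshoots and is clamped.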

FIG. 4 shows the frequency spectra of signals 81, 82, 83, 83′, 84, and 84′. In each spectrum graph, the horizontal axis is the frequency band and the vertical axis is the sound intensity (hereinafter also called the frequency spectrum value).

As shown in FIG. 4, reference numeral 81 denotes the acquired audio signal and reference numeral 82 the noise signal. The signal calculation unit 62 reads the noise signal 82 from the storage unit 63. It then subtracts the spectrum components 82a to 82h of the noise signal from the spectrum components 81a to 81h of the Fourier-transformed audio signal 81, band by band. This subtraction produces the noise-reduced audio signal 83.

Next, the ratio calculation unit 64 computes two sums: A0, the sum of the magnitudes of the spectrum components 84a to 84h of the reference signal 84, and A1, the sum of the magnitudes of the components 81a to 81h of the audio signal 81. From these it calculates the ratio B = A1/A0.

The signal correction unit 65 multiplies the spectrum value of each frequency band of the reference signal 84 by the ratio B calculated by the ratio calculation unit 64. This produces the scaled reference signal 84′.

The unit then compares 84′ with the noise-reduced audio signal 83. For each frequency band, it checks whether the value of component 83a to 83h lies within a predetermined error range of the corresponding component 84′a to 84′h. Let S1 be a spectrum value of the noise-reduced audio signal 83, and S0 the corresponding value of the scaled reference signal 84′. The unit checks whether

S0 − K ≤ S1 ≤ S0 + K

where K is the allowed error and may be set as appropriate. FIG. 4 shows a case in which components 83f and 83g of the noise-reduced audio signal 83 do not satisfy this condition.
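
The tolerance check just described can be sketched as follows. Spectra are assumed to be equal-length lists of band values. The function name and the sample numbers are hypothetical.

```python
def out_of_tolerance(reduced, scaled_ref, k):
    """Return the indices of bands where S0 - K <= S1 <= S0 + K does not hold.

    reduced:    values S1 of the noise-reduced signal (83a..83h)
    scaled_ref: values S0 of the reference scaled by B (84'a..84'h)
    k:          allowed error K
    """
    return [i for i, (s1, s0) in enumerate(zip(reduced, scaled_ref))
            if not (s0 - k <= s1 <= s0 + k)]
```

With eight bands labeled "a" to "h", as in FIG. 4, a result of `[5, 6]` corresponds to components f and g.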

In this case, the signal correction unit 65 obtains, for the reduced audio signal 83, the ratio between the sum of the values of the spectra 83a to 83h over all frequency bands and the value of each target spectrum: the ratio for spectrum 83f (first intensity ratio) and the ratio for spectrum 83g (second intensity ratio). Likewise, for the scaled reference signal 84′, it obtains the ratio between the sum of the values of the spectra 84′a to 84′h over all frequency bands and the value of spectrum 84′f (third intensity ratio) and of spectrum 84′g (fourth intensity ratio).
The signal correction unit 65 then corrects the value of spectrum 83f so that the first intensity ratio equals the third intensity ratio, and corrects the value of spectrum 83g so that the second intensity ratio equals the fourth intensity ratio. The result is the corrected audio signal 83′. In FIG. 4, the correction increases the value of spectrum 83f and decreases the value of spectrum 83g of the reduced audio signal 83, producing the audio signal 83′.
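A minimal sketch of the intensity-ratio correction follows. It defines each intensity ratio as band value divided by the sum over all bands (the text only says "the ratio between" the two, so the direction is an assumption) and, as a simplifying assumption, holds the reduced signal's total at its pre-correction value. Under that choice, matching the ratios reduces to giving the flagged band the same share of the reduced signal's total that the corresponding band has in the scaled reference.

```python
def correct_by_intensity_ratio(reduced, scaled_reference, bands):
    """Set each flagged band so its share of the frame total matches the reference's share.

    Intensity ratio is taken as band value / sum over all bands (assumed direction),
    and the reduced signal's total is held at its pre-correction value (assumption).
    """
    total_reduced = sum(reduced)
    total_reference = sum(scaled_reference)
    corrected = list(reduced)
    for i in bands:
        # scaled_reference[i] / total_reference is the reference (third/fourth) ratio;
        # scaling it by total_reduced gives the value with the same share of the reduced total.
        corrected[i] = scaled_reference[i] * total_reduced / total_reference
    return corrected


scaled_84p = [8.0, 12.0, 10.0, 6.0, 4.0, 4.0, 2.0, 2.0]   # reference 84' (already x B)
reduced_83 = [8.5, 12.5, 10.0, 6.0, 4.0, 1.0, 5.0, 2.0]   # bands f, g out of tolerance
corrected_83p = correct_by_intensity_ratio(reduced_83, scaled_84p, bands=[5, 6])
```

Holding the total fixed keeps the corrections of 83f and 83g independent of each other; re-solving with the updated total after each change would be an equally plausible reading of the text.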

The corrected audio signal 83′ is converted by the signal inverse conversion unit 66, for example by an inverse Fourier transform, from a signal represented as a function of frequency into a signal represented as a function of time. Because the signal correction described above is executed frame by frame, with successive frames shifted by 0.5 frame, the time-domain frames output by the signal inverse conversion unit 66 are joined while shifted by 0.5 frame from one another and then written to the audio memory 44.
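The reconstruction step can be illustrated with a naive inverse DFT followed by overlap-add at a hop of half a frame. The periodic Hann analysis window, the 8-sample frame length and the test signal are assumptions made for the demo (the source specifies only the 0.5-frame shift). With that window the two overlapping halves sum to one, so unmodified frames reconstruct the signal everywhere except the first and last half-frame.

```python
import cmath
import math


def dft(frame):
    """Naive forward DFT (O(N^2)); stands in for the frequency conversion unit."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT; returns the real part, since the input came from a real signal."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def overlap_add(spectra, frame_len):
    """Inverse-transform each frame and join the frames at a hop of half a frame."""
    hop = frame_len // 2
    out = [0.0] * (hop * (len(spectra) - 1) + frame_len)
    for m, spec in enumerate(spectra):
        for t, value in enumerate(idft(spec)):
            out[m * hop + t] += value
    return out


# Demo (assumed parameters): periodic Hann window, 8-sample frames, 50% overlap.
frame_len = 8
hop = frame_len // 2
window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
signal = [math.sin(0.3 * t) + 0.1 * t for t in range(32)]
spectra = [dft([signal[s + n] * window[n] for n in range(frame_len)])
           for s in range(0, len(signal) - frame_len + 1, hop)]
rebuilt = overlap_add(spectra, frame_len)
```

In a real pipeline the corrected spectra (signal 83′ for each frame) would be passed to `overlap_add` instead of the untouched `spectra`, and an FFT library would replace the O(N^2) loops.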

In this way, the component of the operation sound generated when an internal mechanism of the camera is driven is subtracted from the audio signal to produce an audio signal in which the effect of the operation sound is reduced, and this signal is then corrected on the basis of an audio signal containing only the target sound. Obtaining the ratio between the magnitude of the reference signal and that of the audio signal makes it possible to determine whether the level of the audio signal has changed, and multiplying the reference signal by this ratio allows the method to cope with an audio signal whose level changes. Furthermore, because the noise-reduced audio signal is corrected against the reference signal multiplied by the ratio, musical noise in the noise-reduced audio signal can be prevented. Thus, in the present embodiment, the noise sound contained in an audio signal can be reduced appropriately even when the signal level varies.

In the present embodiment, the timing at which the AF drive signal is output is used as the condition indicating that an operation sound (the noise sound) is being generated. Other usable timings include the output of an operation signal from an operation unit provided on the digital camera, such as the signal output when the zoom button is operated, and, for a digital camera with a camera-shake correction function, the output of the start signal for camera-shake correction processing.
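Which frames count as mixed with operation sound, and which can therefore serve as the reference, can be sketched from the trigger timings listed above. The event list, its half-open sample intervals and the function name are hypothetical; the sketch only assumes that a frame is affected if it overlaps any interval during which an AF drive, zoom or camera-shake correction signal was being output.

```python
def classify_frames(num_samples, frame_len, events):
    """Split frame indices into frames overlapping an operation event and clean frames.

    `events` is a hypothetical list of half-open (start, end) sample intervals during
    which an AF drive, zoom or camera-shake correction signal was being output.
    Frames advance by half a frame, as in the embodiment.
    """
    hop = frame_len // 2
    noisy, clean = [], []
    for idx, start in enumerate(range(0, num_samples - frame_len + 1, hop)):
        end = start + frame_len
        overlaps = any(s < end and start < e for s, e in events)
        (noisy if overlaps else clean).append(idx)
    return noisy, clean


# An AF drive signal active for samples 10-13 affects frames 1, 2 and 3.
noisy, clean = classify_frames(num_samples=32, frame_len=8, events=[(10, 14)])
```

Frames in `clean` are candidates for the reference signal 84; frames in `noisy` go through subtraction and correction.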

In the present embodiment, each intensity ratio is calculated as the ratio between the sum of the spectrum values over all frequency bands of the audio signal and the spectrum value to be corrected, but this is not limiting. For example, the intensity ratio may instead be obtained from the ratio between the average of the spectrum values over all frequency bands and the spectrum value to be corrected. Instead of the sum over all frequency bands, the intensity ratio may also be obtained using the sum, or the average, of the spectrum value to be corrected and the spectrum values of the frequency bands near that band. Furthermore, instead of a sum or an average, the ratio between the spectrum value of the band to be corrected and the spectrum value of an adjacent frequency band may be used.
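The alternative ways of forming the intensity ratio can be written side by side. The direction (band value over the reference quantity) is an assumption, as are the default neighbourhood radius of one band and the choice of the upper neighbour as "adjacent" (falling back to the lower one at the top band). Zero-valued denominators are not handled in this sketch.

```python
def ratio_to_total(spec, i):
    """Band value over the sum of all bands (the embodiment's definition)."""
    return spec[i] / sum(spec)


def ratio_to_mean(spec, i):
    """Band value over the mean of all bands."""
    return spec[i] / (sum(spec) / len(spec))


def ratio_to_neighbourhood(spec, i, radius=1, use_mean=False):
    """Band value over the sum (or mean) of itself and its neighbours within `radius` bands."""
    window = spec[max(0, i - radius):min(len(spec), i + radius + 1)]
    denom = sum(window) / len(window) if use_mean else sum(window)
    return spec[i] / denom


def ratio_to_adjacent(spec, i):
    """Band value over an adjacent band: the next one up, or the one below at the top edge."""
    j = i + 1 if i + 1 < len(spec) else i - 1
    return spec[i] / spec[j]


spec = [8.0, 12.0, 10.0, 6.0, 4.0, 4.0, 2.0, 2.0]
```

Because both signals have the same number of bands, the total and average variants differ only by the constant factor N and therefore lead to the same correction; the local variants make the correction depend only on nearby bands.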

In the present embodiment, the spectrum value to be corrected is adjusted so that the intensity ratio obtained from the reduced audio signal equals the intensity ratio obtained from the reference signal 84 multiplied by the ratio B. This is not limiting: the spectrum to be corrected may instead be adjusted so that the intensity ratio obtained from the reduced audio signal falls within an error range of the intensity ratio obtained from the reference signal 84 multiplied by the ratio B. In this case, the error can be, for example, a value calculated from the value K used when comparing the spectrum values of the reduced audio signal 83 with those of the reference signal 84 multiplied by the ratio B.
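A sketch of this tolerance variant follows. The text does not say how the ratio tolerance is calculated from K, so converting the absolute band tolerance K into a share tolerance K / (sum of the scaled reference) is an assumption, as is holding the reduced signal's total fixed. Each flagged band is moved only as far as the nearest edge of the allowed range.

```python
def correct_within_ratio_tolerance(reduced, scaled_reference, bands, k):
    """Move each flagged band's share of the total to the nearest edge of the allowed range.

    Assumptions: shares are band / total, the reduced total is held fixed, and the
    share tolerance is derived from K as K / total of the scaled reference.
    """
    total_reduced = sum(reduced)
    total_reference = sum(scaled_reference)
    tol = k / total_reference
    corrected = list(reduced)
    for i in bands:
        target = scaled_reference[i] / total_reference  # reference intensity ratio
        share = reduced[i] / total_reduced              # current intensity ratio
        clamped = min(max(share, target - tol), target + tol)
        corrected[i] = clamped * total_reduced
    return corrected


scaled_84p = [8.0, 12.0, 10.0, 6.0, 4.0, 4.0, 2.0, 2.0]
reduced_83 = [8.5, 12.5, 10.0, 6.0, 4.0, 1.0, 5.0, 2.0]
result = correct_within_ratio_tolerance(reduced_83, scaled_84p, bands=[5, 6], k=1.0)
```

Compared with exact matching, this changes the flagged bands less, which may leave more of the original signal intact at the cost of a looser fit to the reference.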

It is also not necessary to calculate intensity ratios from the reduced audio signal 83 and the reference signal 84 multiplied by the ratio B. For example, the spectrum values of the reduced audio signal may be corrected directly so that they fall within the error range of the corresponding spectrum values of the reference signal 84 multiplied by the ratio B.
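The direct variant needs no intensity ratios at all. In this sketch, "corrected so that it falls within the error range" is interpreted as clamping each band to the nearest bound of [S0 − K, S0 + K]; that interpretation and the function name are assumptions.

```python
def clamp_to_reference(reduced, scaled_reference, k):
    """Clamp each band S1 into [S0 - K, S0 + K] around the scaled reference value S0."""
    return [min(max(s1, s0 - k), s0 + k) for s1, s0 in zip(reduced, scaled_reference)]


scaled_84p = [8.0, 12.0, 10.0, 6.0, 4.0, 4.0, 2.0, 2.0]
reduced_83 = [8.5, 12.5, 10.0, 6.0, 4.0, 1.0, 5.0, 2.0]
clamped_83 = clamp_to_reference(reduced_83, scaled_84p, k=1.0)
```

Bands already inside the range pass through unchanged, so only bands f and g are altered here (band f is raised from 1.0 to 3.0 and band g is lowered from 5.0 to 3.0).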

In the present embodiment, an audio signal acquired during moving-image shooting is used as an example, but this is not limiting; the invention can also be applied when only an audio signal is acquired. In other words, it can be applied to any electronic device having a recording function. Likewise, although a digital camera has been described as an example of a device that shoots moving images, the device may instead be a portable terminal such as a mobile phone or a PDA. The invention may also take the form of a program that causes a computer to execute the functions of the signal processing device shown in FIG. 2. In that case, the program is preferably stored on a computer-readable storage medium such as a memory card, an optical disk, or a magnetic disk.

DESCRIPTION OF SYMBOLS: 10 ... digital camera, 15 ... imaging optical system, 16 ... image sensor, 18 ... lens drive unit, 19 ... aperture, 20 ... aperture drive unit, 21 ... shutter, 22 ... shutter drive unit, 36 ... storage medium, 43 ... sound collection unit, 44 ... audio memory, 45 ... signal processing device, 61 ... frequency conversion unit, 62 ... signal calculation unit, 63 ... recording unit, 64 ... ratio calculation unit, 65 ... signal correction unit, 66 ... signal inverse conversion unit

Claims

Signal conversion means for converting a plurality of first audio signals obtained by dividing an audio signal represented by a time function at predetermined time intervals into second audio signals represented by a frequency function;
Signal calculation means for obtaining, from a second audio signal among the plurality of second audio signals in which an operation sound is mixed and an audio signal indicating the operation sound, a third audio signal in which the influence of the operation sound is reduced;
Ratio calculation means for taking, as a reference signal, a second audio signal among the plurality of second audio signals that is obtained while no operation signal for executing the operation that generates the operation sound is output and in which the operation sound is not mixed, and for obtaining the ratio between the magnitude of the reference signal and the magnitude of the second audio signal in which the operation sound is mixed;
Correction means for comparing the frequency characteristic of the third audio signal with that of the reference signal multiplied by the ratio and, when the comparison result is not within a predetermined range, correcting the frequency characteristic of the third audio signal so that it approaches the frequency characteristic of the reference signal multiplied by the ratio;
Signal inverse conversion means for converting the third audio signal corrected by the correction means from an audio signal represented by a frequency function into an audio signal represented by a time function;
A signal processing apparatus comprising:

The signal processing device according to claim 1,
wherein, when the frequency spectrum value of at least one frequency band of the third audio signal is not within a predetermined range of that of the reference signal multiplied by the ratio, the correction means corrects the third audio signal so that the frequency spectrum value of the at least one frequency band approaches that of the reference signal multiplied by the ratio.

The signal processing device according to claim 1 or 2,
wherein the correction means obtains, from each of the third audio signal and the reference signal multiplied by the ratio, the ratio between the frequency spectrum value of a specific frequency band and a value obtained from the frequency spectrum of at least one of all the frequency bands, and corrects the frequency spectrum value of the specific frequency band so that the ratio obtained from the third audio signal is approximately equal to the ratio obtained from the reference signal multiplied by the ratio.

The signal processing device according to claim 3,
wherein the correction means obtains, from each of the third audio signal and the reference signal multiplied by the ratio, the ratio between the frequency spectrum value of the specific frequency band and either the sum or the average of the frequency spectrum values over all frequency bands.

The signal processing device according to claim 3,
wherein the correction means obtains, from each of the third audio signal and the reference signal multiplied by the ratio, the ratio between the frequency spectrum value of the specific frequency band and either the sum or the average of the frequency spectrum values of the specific frequency band and of the frequency bands in its vicinity.

Imaging means for obtaining an image signal;
  Sound collection means for acquiring an audio signal in synchronization with acquisition of the image signal by the imaging means;
  A signal processing device according to any one of claims 1 to 5,
  Storage means for storing the image signal acquired by the imaging means and the audio signal subjected to signal processing by the signal processing device;
  An imaging apparatus comprising: