JP2016111478A

JP2016111478A - Voice processing device

Info

Publication number: JP2016111478A
Application number: JP2014246195A
Authority: JP
Inventors: 太郎松野; Taro Matsuno
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2014-12-04
Filing date: 2014-12-04
Publication date: 2016-06-20

Abstract

PROBLEM TO BE SOLVED: To achieve noise reduction that accounts for frequency fluctuations of a noise component caused by temperature change, aging, and the like. SOLUTION: A voice processing device includes: voice input means; drive means; a memory that stores a noise profile generated from a noise spectrum obtained by converting the audio signal of the noise produced by the drive means into the frequency domain; correction means for correcting the noise profile using auxiliary data that indicates fluctuation of that noise due to disturbances; and voice processing means for reducing the noise produced by the drive means in a voice signal input through the voice input means, using the noise profile corrected by the correction means. SELECTED DRAWING: Figure 1

Description

The present invention relates to an audio processing apparatus that reduces noise contained in audio captured by audio input means such as a microphone, and more specifically to an audio processing apparatus that reduces noise by the spectral subtraction (SS) method.

Conventionally, a known method for reducing noise contained in an audio signal captured by a microphone uses spectral subtraction (SS), which selectively reduces the frequency components of the noise. The method estimates the frequency components of the noise and subtracts them from the input audio signal.

Patent Document 1 describes a noise reduction system that stores in advance, as a noise profile, a noise spectrum that may be contained in an input audio signal.

Patent Document 1: JP 2011-97260 A

Although noise has frequency components that are fixed to some extent, its spectrum can easily fluctuate under various disturbances. The technique described in Patent Document 1 performs noise reduction with a noise profile that does not account for frequency fluctuations of the noise components due to temperature change, aging, and the like, so it is difficult to reduce noise appropriately under real conditions. The processing may even introduce unpleasant audio artifacts.

An object of the present invention is to provide an audio processing apparatus that uses a noise profile while coping with noise fluctuations caused by temperature change, aging, or the like.

To achieve this object, an audio processing apparatus according to the present invention is incorporated in a main apparatus that includes noise generating means and audio input means that captures the noise generated by the noise generating means together with other sounds. The audio processing apparatus includes: a memory that stores a noise profile created from a noise spectrum obtained by converting an audio signal of the noise into the frequency domain; correction means for correcting the noise profile using auxiliary data that indicates fluctuation, due to disturbances, of the noise generated by the noise generating means; and audio processing means for reducing the noise contained in an audio signal input by the audio input means, using the noise profile corrected by the correction means.

According to the present invention, correcting the noise profile to account for disturbance-induced fluctuations in the frequency components of the noise enables noise reduction that leaves substantially less residual noise.

FIG. 1 is a schematic block diagram of an embodiment of an imaging apparatus to which the present invention is applied.
FIG. 2 is an explanatory diagram of the procedure for creating a corrected noise profile.
FIG. 3 is an explanatory diagram of the procedure for creating a pre-correction noise profile.
FIG. 4 is a waveform diagram showing an example floor noise spectrum.
FIG. 5 is a waveform diagram showing an example noise spectrum.
FIG. 6 is a waveform diagram showing an example pre-correction noise profile.
FIG. 7 is a waveform diagram showing the relationship between an example pre-correction noise profile and the threshold γ.
FIG. 8 is a waveform diagram showing the relationship between an example pre-correction noise profile and the thresholds γ and β.
FIG. 9 is an explanatory diagram showing an example frequency characteristic of the fluctuation level.
FIG. 10 is an explanatory diagram showing an example frequency characteristic of auxiliary data.
FIG. 11 is an explanatory diagram showing another example frequency characteristic of auxiliary data.
FIG. 12 is an explanatory diagram of the procedure for creating a corrected noise profile.
FIG. 13 is a waveform diagram showing an example frequency characteristic of a corrected noise profile.
FIG. 14 is a schematic block diagram of a noise profile correction apparatus.
FIG. 15 is a schematic block diagram of a configuration in which the imaging apparatus shown in FIG. 1 is combined with the noise profile correction apparatus shown in FIG. 14.
FIG. 16 is an explanatory diagram of the procedure for creating a corrected noise profile in the configuration shown in FIG. 15.

Embodiments of the present invention will now be described in detail with reference to the drawings.

FIG. 1 is a schematic block diagram of an imaging apparatus serving as a main apparatus that incorporates an embodiment of the audio processing apparatus according to the present invention.

In the imaging apparatus 100 shown in FIG. 1, the imaging unit 101 uses an image sensor such as a CCD or CMOS sensor to convert an optical image of a subject, captured through the imaging lens, into an image signal, performs analog-to-digital conversion, and supplies the result to the image processing unit 102. The imaging unit 101 includes a motor, gears, and the like for driving the imaging lens, and zooms the imaging lens in and out by rotating the motor and gears in accordance with a control signal from the control unit 114. The motor, gears, and the like constitute drive means for driving the imaging lens.

The image processing unit 102 applies image quality adjustment to the digital image signal from the imaging unit 101, adjusting white balance, color, brightness, and the like based on set values. The image processing unit 102 sends the processed image data to the memory 105, the video output unit 110, the display control unit 111, and the control unit 114, as the situation requires.

The audio input unit 103 captures sound around the imaging apparatus 100 through one or more built-in or external microphones, converts it into a digital signal, and supplies the signal to the audio processing unit 104.

The audio processing unit 104 applies audio processing, such as level optimization and reduction of specific frequencies, to the audio data from the audio input unit 103 and supplies the results to the memory 105 and the audio output unit 109. The audio processing unit 104 includes noise reduction means. Using the audio data from the audio input unit 103, the audio processing unit 104 also generates data such as the noise profile used for noise reduction by the SS (spectral subtraction) method and supplies the data to the memory 105. The audio processing unit 104 includes a fast Fourier transform (FFT) circuit and an inverse fast Fourier transform (IFFT) circuit for the audio data.

The memory 105 stores the image signal from the image processing unit 102, as well as the audio signal and data such as noise profiles from the audio processing unit 104.

The encoding processing unit 106 can encode and decode image data and audio data using a predetermined image coding scheme and audio coding scheme, respectively. It reads the image data and audio data temporarily stored in the memory 105, encodes them to generate compressed image data and compressed audio data, and supplies these to the recording control unit 107.

The recording control unit 107 records the compressed image data and compressed audio data generated by the encoding processing unit 106, together with control data related to shooting, on the recording medium 108. The recording control unit 107 can also read (play back) compressed image data, compressed audio data, various data, and programs recorded on the recording medium 108, and supplies the read compressed image and audio data to the encoding processing unit 106. The recording medium 108 is a general-purpose medium capable of recording various data, such as a magnetic disk, an optical disc, or semiconductor memory, and may be a combination of multiple media of the same or different types.

The encoding processing unit 106 temporarily stores the compressed image data and compressed audio data from the recording control unit 107 in the memory 105 and decodes them by a predetermined procedure. It supplies the decoded audio data to the audio output unit 109 and the decoded image data to the video output unit 110 and the display control unit 111.

The audio output unit 109 is, for example, an audio output terminal that outputs an audio signal to an audio output device such as earphones or a speaker. The audio output unit 109 may also be a speaker built into the imaging apparatus 100. The video output unit 110 is, for example, a video output terminal that outputs an image signal to an external display or the like. The audio output unit 109 and the video output unit 110 may be configured as a single integrated terminal, for example an HDMI (High-Definition Multimedia Interface) (registered trademark) terminal.

The display control unit 111 displays on the display unit 112 images based on the image signal from the encoding processing unit 106 and the image signal from the image processing unit 102, as well as an operation screen (menu screen) for operating the imaging apparatus 100. The display unit 112 may be any display device, such as a liquid crystal display, an organic EL display, or electronic paper.

The operation unit 113 sends the control unit 114 an instruction signal corresponding to the user's operation. The control unit 114 controls each block of the imaging apparatus 100 by sending it control signals based on the instruction signals from the operation unit 113. The operation unit 113 includes, for example, a power button, a record start button, a menu display button, an OK button, cursor keys, a pointing device for designating an arbitrary point on the display unit 112, and a touch panel.

The control unit 114 includes, for example, a CPU (MPU) and memory (DRAM, SRAM) for executing various processes (programs), and controls shooting and other operations of the imaging apparatus 100. For example, on receiving a zoom instruction signal from the operation unit 113, the control unit 114 sends a control signal indicating a zoom operation to the imaging unit 101.

The communication unit 115 communicates with external devices wirelessly or by wire, sending and receiving data such as audio data and image data. It also sends and receives shooting-related control signals, such as shooting start and end commands, and other information. The communication unit 115 comprises one or more of, for example, an infrared communication module, a Bluetooth (registered trademark) communication module, a wireless LAN communication module, a wired LAN communication module, a USB communication module, and a Thunderbolt (registered trademark) communication module. The communication unit 115 also serves as an interface for connecting devices such as a remote controller or a personal computer (PC).

The bus 116 carries various data, control signals, and the like between the blocks of the imaging apparatus 100.

The moving image and audio recording operation of this embodiment will now be described. When the user selects the moving image shooting mode with the operation unit 113 and presses the REC button, the control unit 114, in response to this operation signal, sends a recording start control signal to the relevant blocks. On receiving this control signal, the imaging unit 101 starts shooting the moving image, and the audio input unit 103 starts capturing the surrounding sound. The image processing unit 102 applies image signal processing such as color temperature and white balance adjustment to the moving image captured by the imaging unit 101, and the result is temporarily stored in the memory 105 as moving image data.

If the user performs a zoom operation with the operation unit 113 during moving image recording, the control unit 114 supplies a zoom start control signal to the imaging unit 101. On receiving it, the imaging unit 101 rotates the motor, gears, and the like to move the imaging lens, and these components generate noise. The noise generated by the motor, gears, and the like that move the imaging lens is called "zoom noise". When zoom noise occurs, the audio signal captured by the audio input unit 103 contains both the audio accompanying the moving image and the zoom noise. The audio input unit 103 supplies this mixed audio signal to the audio processing unit 104.

The audio processing unit 104 applies audio signal processing such as level adjustment to the audio signal from the audio input unit 103 and temporarily stores the processed audio data in the memory 105. On receiving a zoom control signal from the control unit 114, the audio processing unit 104 starts noise reduction processing to reduce the zoom noise, in addition to its normal audio signal processing during moving image shooting. In other words, the control unit 114 enables the noise reduction means of the audio processing unit 104 while zoom noise is being generated. While it is receiving the zoom control signal during moving image shooting, the audio processing unit 104 performs SS noise reduction using the corrected noise profile.

The encoding processing unit 106 compresses and encodes the moving image data and audio data temporarily stored in the memory 105. The recording control unit 107 records the compressed moving image data and compressed audio data from the encoding processing unit 106 on the recording medium 108 as an AV (Audio Visual) file.

The noise reduction processing in the audio processing unit 104 is now described in detail. The audio processing unit 104 applies an FFT to the audio data from the audio input unit 103 (an audio signal in which moving image audio and zoom noise are mixed) to obtain an input audio spectrum. In accordance with the zoom control signal from the control unit 114, the audio processing unit 104 selects and reads the most suitable corrected noise profile stored in the memory 105. It subtracts the corrected noise profile read from the memory 105 from the input audio spectrum and applies an inverse FFT to the result. This processing reduces the noise component contained in the input audio signal. The audio signal with reduced noise is then processed by the encoding processing unit 106 and the recording control unit 107 and recorded on the recording medium 108, as described above.
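The FFT, subtract, IFFT sequence above can be sketched for a single frame in Python with NumPy. This is a minimal illustration, not the device's implementation: it assumes the corrected noise profile is stored as one magnitude value per `rfft` bin, reuses the phase of the noisy input, and clamps negative magnitudes to zero, none of which the text specifies. The function name `spectral_subtract` is hypothetical.

```python
import numpy as np


def spectral_subtract(frame: np.ndarray, noise_profile: np.ndarray) -> np.ndarray:
    """Reduce noise in one time-domain frame by magnitude spectral subtraction.

    noise_profile must hold one magnitude per rfft bin, i.e. len(frame) // 2 + 1
    values (an assumed storage format for the corrected noise profile).
    """
    spectrum = np.fft.rfft(frame)  # FFT: input audio spectrum
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    # Subtract the profile bin by bin. Clamping at zero is an assumption;
    # the text only says the profile is subtracted.
    cleaned = np.maximum(magnitude - noise_profile, 0.0)
    # Recombine with the noisy phase and return to the time domain (IFFT).
    return np.fft.irfft(cleaned * np.exp(1j * phase), n=len(frame))
```

A real recorder would process the stream frame by frame, usually with windowing and overlap-add, which the text does not describe. With an all-zero profile the frame comes back unchanged.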

As shown in FIG. 2, the audio processing unit 104 subtracts a floor noise spectrum from a noise spectrum to generate a pre-correction noise profile, and then corrects the pre-correction noise profile using auxiliary data to generate a corrected noise profile. The floor noise spectrum is the spectrum of the background noise generated by the imaging apparatus 100 itself; the noise spectrum is the spectrum of that background noise with the noise to be reduced (here, the zoom noise generated by the imaging unit 101) added to it. Subtracting the floor noise spectrum from the noise spectrum yields the spectrum of the target noise alone, which serves as the pre-correction noise profile. The auxiliary data reflects changes in the frequency characteristics of the target noise caused by disturbances and the like. The corrected noise profile, obtained by correcting the pre-correction noise profile with this auxiliary data, is therefore suited to noise reduction under the current conditions.
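The relation between the three spectra reduces to a per-bin array operation. The sketch below assumes both spectra are linear magnitudes sampled on the same frequency bins and clamps negative differences to zero; the text states neither. `pre_correction_profile` is a hypothetical name.

```python
import numpy as np


def pre_correction_profile(noise_spectrum, floor_spectrum) -> np.ndarray:
    """Subtract the floor (background) noise spectrum from the noise spectrum.

    Both inputs are per-frequency magnitudes on the same frequency grid.
    Negative differences are clamped to zero (an assumption, not in the text).
    """
    noise = np.asarray(noise_spectrum, dtype=float)
    floor = np.asarray(floor_spectrum, dtype=float)
    if noise.shape != floor.shape:
        raise ValueError("spectra must share the same frequency grid")
    return np.maximum(noise - floor, 0.0)
```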

The procedure for creating the pre-correction noise profile is described with reference to FIG. 3. The imaging apparatus 100 is placed in an anechoic soundproof box, that is, a sealable box or room in which, when closed, sound inside does not reverberate and outside sound does not enter. The imaging apparatus 100 is connected wirelessly or by wire to a remote controller outside the box so that it can be operated from outside. All operations of the imaging apparatus 100 inside the box are performed with the remote controller connected to the communication unit 115.

An instruction signal (for example, a recording start instruction) is input from the remote controller to the control unit 114 via the communication unit 115 of the imaging apparatus 100. In response, the control unit 114 causes the audio input unit 103 to start capturing audio. With the imaging unit 101 not zooming, the audio input unit 103 captures audio for a certain period, for example one second. The audio processing unit 104 applies an FFT to the captured one-second audio signal, which yields the spectrum of the noise generated by the imaging apparatus 100 itself (background noise). The per-frequency average of the spectrum over this one second is defined as the floor noise spectrum (FIG. 4). The audio processing unit 104 writes the floor noise spectrum to the memory 105.
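The text does not say how the one-second FFT is framed. A common approach, assumed here, is to split the capture into short non-overlapping frames, take the magnitude spectrum of each, and average per frequency bin. The frame length of 1024 samples and both function names are hypothetical.

```python
import numpy as np


def frame_magnitudes(signal, frame_len: int = 1024) -> np.ndarray:
    """Split a signal into non-overlapping frames; return |FFT| per frame (one row each)."""
    x = np.asarray(signal, dtype=float)
    n_frames = len(x) // frame_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.abs(np.fft.rfft(frames, axis=1))


def floor_noise_spectrum(signal, frame_len: int = 1024) -> np.ndarray:
    """Per-frequency average of the frame spectra (e.g. over one second of audio)."""
    return frame_magnitudes(signal, frame_len).mean(axis=0)
```

At an assumed 48 kHz sampling rate, one second of audio gives 46 full 1024-sample frames; the remaining samples are dropped.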

Next, with the anechoic soundproof box closed, the control unit 114 has the imaging unit 101 rotate its motor, gears, and the like to perform zoom operations while the audio input unit 103 is capturing audio. Two zoom operations are performed: from the wide-angle end to the telephoto end, and from the telephoto end to the wide-angle end. Assume, for example, that each takes exactly four seconds. The audio processing unit 104 applies an FFT to the four-second audio signal captured by the audio input unit 103 during the zoom from the wide-angle end to the telephoto end (excluding the start-up sound that appears only at the beginning of the zoom operation). This yields the spectrum of the noise generated by the wide-to-telephoto zoom. As shown in FIG. 5, the audio processing unit 104 takes the peak value of this spectrum at each frequency as the noise spectrum and temporarily stores it in the memory 105. Similarly, the audio processing unit 104 temporarily stores in the memory 105 the noise spectrum of the noise generated by the telephoto-to-wide zoom.
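Unlike the floor noise spectrum, which is averaged, the noise spectrum keeps the per-frequency peak. Assuming the four-second capture is split into short non-overlapping frames (the text does not specify framing), this is a per-bin maximum over frames rather than a mean. Dropping leading frames via `skip_frames` is a hypothetical way to exclude the start-up sound; all names are illustrative.

```python
import numpy as np


def frame_magnitudes(signal, frame_len: int = 1024) -> np.ndarray:
    """Split a signal into non-overlapping frames; return |FFT| per frame (one row each)."""
    x = np.asarray(signal, dtype=float)
    n_frames = len(x) // frame_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.abs(np.fft.rfft(frames, axis=1))


def zoom_noise_spectrum(signal, frame_len: int = 1024, skip_frames: int = 0) -> np.ndarray:
    """Per-frequency peak of the frame spectra recorded during one zoom sweep.

    skip_frames drops the leading frames that contain the start-up sound
    (a hypothetical mechanism; the text does not say how it is excluded).
    """
    mags = frame_magnitudes(signal, frame_len)[skip_frames:]
    if mags.shape[0] == 0:
        raise ValueError("no frames left after skipping the start-up sound")
    return mags.max(axis=0)
```

The same function would be run once on the wide-to-telephoto capture and once on the telephoto-to-wide capture, giving one noise spectrum per zoom direction.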

The auxiliary data is created in the frequency domain from the range and the width (fluctuation level) over which the zoom noise shifts along the frequency axis at each frequency due to disturbances, and is used to correct the pre-correction noise profile. Disturbances include, for example, changes in temperature or humidity and aging of the imaging apparatus. Some components of the noise spectrum shift along the frequency axis under these disturbances, while others do not. Components that shift along the frequency axis leave residual noise after SS noise reduction.

The audio processing unit 104 (or the control unit 114) calculates the auxiliary data as described below and stores it in the memory 105. The disturbances described above are applied to the imaging apparatus 100, and the audio processing unit 104 calculates the pre-correction noise profile as described above. FIG. 6 shows an example frequency characteristic of the pre-correction noise profile for the floor noise spectrum shown in FIG. 4 and the noise spectrum shown in FIG. 5.

For example, pre-correction noise profiles are obtained at each of several temperatures and humidities in an anechoic soundproof box in which temperature and humidity can be varied. The control unit 114 compares the resulting pre-correction noise profiles and detects spectral components at or above a threshold γ that shift along the frequency axis as temperature and humidity change. Here, as an example, the threshold γ is the mean of the pre-correction noise profile across the frequency axis; this choice was determined experimentally, and γ is not limited to it.
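With γ taken as the mean of the pre-correction noise profile across frequency, the components worth tracking are simply the bins at or above that mean. A minimal sketch with hypothetical names, assuming the profile is an array with one value per frequency bin:

```python
import numpy as np


def threshold_gamma(profile) -> float:
    """Example threshold: mean of the pre-correction profile over all frequency bins."""
    return float(np.mean(np.asarray(profile, dtype=float)))


def bins_at_or_above(profile, gamma: float) -> np.ndarray:
    """Indices of frequency bins whose profile value is >= gamma."""
    return np.flatnonzero(np.asarray(profile, dtype=float) >= gamma)
```

Running this on each profile measured at a different temperature or humidity, and comparing where the above-threshold bins land, is one way to see which components move along the frequency axis.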

The control unit 114 creates auxiliary data from the fluctuation range and fluctuation level over which the components at or above γ shift along the frequency axis, and stores the auxiliary data in the memory 105 so that the audio processing unit 104 can access it. For example, suppose the pre-correction noise profile contains a component at or above γ near 3000 Hz, as shown in FIG. 7, and that this component shifts between 2900 Hz and 3100 Hz when the temperature or humidity is changed. As shown in FIG. 9, the fluctuation range then has a minimum (min_freq) of 2900 Hz and a maximum (max_freq) of 3100 Hz, and the fluctuation level (lev_freq = max_freq - min_freq) is 200 Hz. Substituting these values into the following equation

$$\alpha(x) = \mathrm{lev\_freq} - \left(x - \frac{\mathrm{max\_freq} + \mathrm{min\_freq}}{2}\right)^{2} \times \frac{2}{\mathrm{lev\_freq}}$$

gives the correction parameter α for frequency x (Hz). The auxiliary data is the set of correction parameters α arranged in order of frequency; its frequency distribution is shown in FIG. 10. In other words, the auxiliary data assigns exactly one correction parameter α to each frequency x (Hz).
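The path from observed component positions to auxiliary data can be sketched as below. It assumes the fluctuation range is the lowest and highest frequency at which the tracked component was observed across test conditions, and that α is 0 outside that range (the text does not say so explicitly). Inside the range, α is computed as lev_freq - (x - (max_freq + min_freq)/2)^2 × 2/lev_freq, the form that reproduces the worked example for x = 2965 Hz in this text. All names are hypothetical.

```python
import numpy as np


def fluctuation_range(peak_freqs_hz):
    """min_freq, max_freq and lev_freq from peak frequencies observed across conditions."""
    min_freq = float(min(peak_freqs_hz))
    max_freq = float(max(peak_freqs_hz))
    return min_freq, max_freq, max_freq - min_freq


def correction_parameter(x: float, min_freq: float, max_freq: float) -> float:
    """alpha(x) = lev_freq - (x - center)^2 * 2 / lev_freq inside the range.

    Returning 0 outside [min_freq, max_freq] is an assumption.
    """
    lev_freq = max_freq - min_freq
    if lev_freq <= 0 or not (min_freq <= x <= max_freq):
        return 0.0
    center = (max_freq + min_freq) / 2
    return lev_freq - (x - center) ** 2 * 2 / lev_freq


def auxiliary_data(freqs_hz, min_freq: float, max_freq: float) -> np.ndarray:
    """One alpha per frequency, in frequency order."""
    return np.array([correction_parameter(f, min_freq, max_freq) for f in freqs_hz])
```

With the 2900 Hz to 3100 Hz range, `correction_parameter(2965, 2900, 3100)` returns 187.75, and α reaches lev_freq = 200 at the 3000 Hz center.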

For disturbances other than temperature and humidity as well, the control unit 114 compares pre-correction noise profiles obtained under the actual disturbance conditions, or under conditions that produce equivalent results, determines the fluctuation range and fluctuation level of the spectrum, and creates the auxiliary data.

The correction parameter α may also be obtained by a different equation. Depending on the noise reduction conditions, α may follow a frequency distribution such as that shown in FIG. 11. The feature of this embodiment is that auxiliary data is created from the fluctuation range and fluctuation level and used to correct the pre-correction noise profile; the correction parameter α is not limited to the examples in FIGS. 10 and 11.

As shown in FIG. 12, the audio processing unit 104 generates a corrected noise profile by correcting the pre-correction noise profile corresponding to the current operating state with the auxiliary data corresponding to the current disturbance. The pre-correction noise profile used to create the corrected noise profile is not necessarily the same as the one used when the auxiliary data was created.

In frequency bands where the correction parameter α of the auxiliary data is 0 Hz, the pre-correction noise profile is not corrected. Likewise, where the value of the pre-correction noise profile is at or below the threshold γ, it is not corrected. In both cases, the pre-correction value is used unchanged as the value of the corrected noise profile. In other words, only in frequency bands where α ≠ 0 and the pre-correction noise profile is at or above the threshold γ does the value corrected with the auxiliary data become the value of the corrected noise profile.
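The condition above reduces to a per-bin test. In this minimal sketch, the hypothetical helper `correctable_bins` returns the indices of the bins that qualify for correction. It assumes the profile (in dB) and the auxiliary data are equal-length lists over the same frequency bins, and it treats a value exactly equal to γ as correctable, because the text's "at or below" and "at or above" bounds overlap at γ.

```python
def correctable_bins(pre_profile, aux_alpha, gamma):
    """Indices of bins where correction applies: alpha != 0 and profile >= gamma.

    Every other bin keeps its pre-correction value unchanged.
    """
    if len(pre_profile) != len(aux_alpha):
        raise ValueError("profile and auxiliary data must have the same length")
    return [i for i, (p, a) in enumerate(zip(pre_profile, aux_alpha))
            if a != 0.0 and p >= gamma]


if __name__ == "__main__":
    # Bin 0: alpha is 0; bin 3: below gamma. Only bins 1 and 2 qualify.
    print(correctable_bins([10.0, 40.0, 45.0, 12.0], [0.0, 120.0, 150.0, 90.0], 30.0))
```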

For example, as shown in FIG. 8, consider a pre-correction noise profile in which a spectral peak stands at 2965 Hz within the range min_freq to max_freq. All spectral components in the range min_freq to max_freq are referred to as spectrum group A, and the peak value of spectrum group A is denoted β (dB). In this case, the correction parameter α is calculated with the α calculation formula described above:
α = 200 − (2965 − (3100 + 2900)/2)² × 2/200 = 187.75
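Expanded step by step, the calculation above runs as follows. The general expression in the first line is an assumption inferred from this single example, and the last line derives the shift range used for spectrum group A (α/2 on each side of the 2965 Hz peak).

```latex
\begin{aligned}
\alpha(x) &= \mathrm{lev\_freq} - \left(x - \frac{\mathrm{max\_freq} + \mathrm{min\_freq}}{2}\right)^{2} \cdot \frac{2}{\mathrm{lev\_freq}} \\
\frac{3100 + 2900}{2} &= 3000~\mathrm{Hz}, \qquad 2965 - 3000 = -35~\mathrm{Hz} \\
\alpha(2965) &= 200 - (-35)^{2} \cdot \frac{2}{200} = 200 - 12.25 = 187.75~\mathrm{Hz} \\
2965 \pm \frac{\alpha}{2} &= 2965 \pm 93.875 \;\Rightarrow\; [\,2871.125,\ 3058.875\,]~\mathrm{Hz}
\end{aligned}
```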

Spectrum group A is shifted along the frequency axis over a width of α = 187.75 Hz centered on its peak frequency of 2965 Hz, that is, over the range 2871.125 Hz to 3058.875 Hz. The peak hold (maximum) of spectrum group A over all of these shifts forms the corrected noise profile. FIG. 13 shows the frequency characteristics of the corrected noise profile obtained in this way. Part of the shifted spectrum group A extends beyond the range min_freq to max_freq. Outside that range, the peak-held value of the shifted spectrum group A is compared with the value of the pre-correction noise profile, and the larger of the two is used as the corrected noise profile value.
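The shift-and-hold step can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the profile is taken to be a list of dB values on uniformly spaced, ascending frequency bins, the shift is quantized to whole bins, α is the single value evaluated at the peak frequency as in the example above, and the threshold test on γ is omitted for brevity. The name `corrected_profile` is hypothetical.

```python
def corrected_profile(pre_db, freqs_hz, min_freq, max_freq, alpha_hz):
    """Peak-hold correction of spectrum group A (sketch).

    pre_db   : pre-correction noise profile (dB), one value per bin
    freqs_hz : bin centre frequencies, uniformly spaced and ascending
    Spectrum group A is every bin inside [min_freq, max_freq]. It is shifted
    by every whole-bin offset within +/- alpha_hz / 2, and the maximum over
    all shifts is held. Each output bin is max(pre-correction, held value).
    """
    n = len(pre_db)
    df = freqs_hz[1] - freqs_hz[0]
    max_shift = int((alpha_hz / 2.0) // df)   # ASSUMPTION: whole bins only
    group_a = [i for i, f in enumerate(freqs_hz) if min_freq <= f <= max_freq]

    held = [float("-inf")] * n
    for s in range(-max_shift, max_shift + 1):
        for i in group_a:
            j = i + s
            if 0 <= j < n:                    # ignore shifts past the band edges
                held[j] = max(held[j], pre_db[i])
    return [max(p, h) for p, h in zip(pre_db, held)]


if __name__ == "__main__":
    freqs = list(range(2800, 3201, 50))       # 9 bins, 50 Hz apart
    pre = [0.0, 0.0, 10.0, 30.0, 10.0, 0.0, 0.0, 0.0, 0.0]  # peak at 2950 Hz
    print(corrected_profile(pre, freqs, 2900.0, 3100.0, 100.0))
    # [0.0, 10.0, 30.0, 30.0, 30.0, 10.0, 0.0, 0.0, 0.0]
```

Because a shift of zero is included, the held value inside [min_freq, max_freq] is never below the original, so taking max(pre-correction, held) for every bin covers both rules in the text: the peak hold inside the range and the larger-of-the-two comparison outside it.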

By performing SS-method noise reduction with the corrected noise profile obtained in this way, residual noise components left unremoved because of, for example, changes in temperature and humidity or aging can be greatly reduced.
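For context, a basic magnitude-domain spectral subtraction step using the corrected noise profile might look like the following. The storage format (profile in dB of magnitude, one value per bin), the zero floor, and the name `spectral_subtract` are assumptions; framing, windowing, the FFT, and resynthesis with the original phase are outside this sketch.

```python
def spectral_subtract(frame_mag, noise_db, floor=0.0):
    """Subtract a noise magnitude spectrum from one frame (magnitude-domain SS).

    frame_mag : magnitude spectrum |X(k)| of one windowed frame
    noise_db  : corrected noise profile in dB (20*log10 of magnitude), per bin
    floor     : lower bound applied after subtraction to avoid negative values
    """
    out = []
    for x, n_db in zip(frame_mag, noise_db):
        n = 10.0 ** (n_db / 20.0)             # dB -> linear magnitude
        out.append(max(x - n, floor))
    return out


if __name__ == "__main__":
    print(spectral_subtract([15.0, 1.0, 0.5], [20.0, 0.0, 0.0]))  # [5.0, 0.0, 0.0]
```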

FIG. 14 is a block diagram showing the schematic configuration of a noise profile correction apparatus that corrects the noise profile of the imaging apparatus 100. The noise profile correction apparatus 1400 shown in FIG. 14 comprises an arithmetic device 1410, together with an audio device 1417, a video display device 1418, and an operation device 1419 connected to the arithmetic device 1410.

The arithmetic device 1410 is, for example, a personal computer or a notebook computer. The central arithmetic control unit 1411 is, for example, the CPU of the personal computer; it executes a predetermined program loaded into the memory 1412 and controls each block of the arithmetic device 1410 in accordance with instruction signals from the operation device 1419. The memory 1412 temporarily stores data, programs, and the like used by the arithmetic device 1410.

The communication unit 1413 communicates with external devices wirelessly or by wire and sends and receives data such as audio data and image data. It also exchanges instruction signals for the arithmetic device 1410 and other information with external devices. The communication unit 1413 consists of, for example, one or more of an infrared communication module, a Bluetooth (registered trademark) communication module, a wireless LAN communication module, a wired LAN communication module, a USB communication module, and a Thunderbolt (registered trademark) communication module. The communication unit 115 is an interface for connecting to, for example, an external digital device such as a digital camera, or to a device such as a mouse or a keyboard.

The recording medium 1414 is a general-purpose recording medium capable of recording various data, such as a magnetic disk, an optical disc, or a semiconductor memory, and may be a combination of multiple media of different or the same types.

The audio output unit 1415 consists of, for example, an audio output terminal and outputs an audio signal to an audio output device such as earphones or a speaker. The audio output unit 1415 may instead be a speaker built into the arithmetic device 1410. The video output unit 1416 consists of, for example, a video output terminal and outputs an image signal to an external display or the like. The audio output unit 1415 and the video output unit 1416 may be configured as a single integrated terminal, for example a High-Definition Multimedia Interface (HDMI) (registered trademark) terminal.

The audio device 1417 is connected to the arithmetic device 1410 by wire or wirelessly and is, for example, earphones or a speaker. The audio device 1417 may also be a combination of multiple audio devices, such as a USB audio device with an accompanying headphone amplifier and headphones.

The video display device 1418 is connected to the arithmetic device 1410 by wire or wirelessly and may be any display device, such as a liquid crystal display, an organic EL display, or electronic paper.

The operation device 1419 is connected to the arithmetic device 1410 by wire or wirelessly and may be any operation device, such as a mouse, a keyboard, or a touch panel. The operation device 1419 may also consist of multiple devices.

The operation of creating a corrected noise profile with the noise profile correction apparatus 1400 is described next. As shown in FIG. 15, the noise profile correction apparatus 1400 is connected to the imaging apparatus 100 by connecting the communication unit 1413 of the noise profile correction apparatus 1400 to the communication unit 115 of the imaging apparatus 100. The noise profile correction apparatus 1400 stores the auxiliary data described above in the memory 1412.

As described above, the imaging apparatus 100 is placed in an anechoic soundproof box, and with the box sealed, a pre-correction noise profile is created and stored in the memory 105. The arithmetic device 1410 transfers the pre-correction noise profile stored in the memory 105 to the memory 1412 via the communication unit 115 and the communication unit 1413.

The central arithmetic control unit 1411 reads the auxiliary data and the pre-correction noise profile from the memory 1412 and creates a corrected noise profile using the calculation procedure shown in FIG. 16, which is the same as the procedure described above. The central arithmetic control unit 1411 temporarily stores the resulting corrected noise profile in the memory 202 and then transfers it to the memory 105 of the imaging apparatus 100 via the communication unit 1413 and the communication unit 115.
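The host-side flow described in the preceding paragraphs (read from the memory 105, correct on the PC, write back) can be sketched as below. `FakeCameraLink`, its methods, and `correct_on_host` are hypothetical stand-ins for the link between the communication units 1413 and 115; the text does not specify the transfer protocol, and the `correct` argument would be an implementation of the FIG. 16 procedure.

```python
class FakeCameraLink:
    """HYPOTHETICAL stand-in for the link between communication units 1413 and 115.

    Holds what would be the contents of the camera's memory 105.
    """

    def __init__(self, pre_profile):
        self.memory_105 = {"noise_profile": list(pre_profile)}

    def read_profile(self):
        return list(self.memory_105["noise_profile"])

    def write_profile(self, profile):
        self.memory_105["noise_profile"] = list(profile)


def correct_on_host(link, aux_data, correct):
    """Read the pre-correction profile, correct it on the PC, and write it back.

    correct: callable (profile, aux_data) -> corrected profile.
    """
    pre = link.read_profile()            # camera memory 105 -> PC memory 1412
    corrected = correct(pre, aux_data)   # computed by the CPU (unit 1411)
    link.write_profile(corrected)        # PC -> camera memory 105
    return corrected


if __name__ == "__main__":
    link = FakeCameraLink([1.0, 5.0])
    # Placeholder correction for illustration only: add aux values bin by bin.
    toy = lambda p, a: [x + y for x, y in zip(p, a)]
    print(correct_on_host(link, [0.0, 1.0], toy))   # [1.0, 6.0]
    print(link.read_profile())                      # [1.0, 6.0]
```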

In this way, the noise profile correction apparatus 1400 can correct the pre-correction noise profile stored in the memory 105 of the imaging apparatus 100 using the auxiliary data and write the result back to the memory 105. The audio processing unit 104 of the imaging apparatus 100 performs SS-method noise reduction using the corrected noise profile sent from the noise profile correction apparatus 1400. This enables noise reduction in which residual noise components caused by, for example, changes in temperature and humidity or aging are greatly reduced.

(Other Embodiments)
The audio processing apparatus according to the present invention is not limited to the imaging apparatus 100 described in the first embodiment. For example, it can also be implemented as a system composed of multiple apparatuses.

The various processes and functions described in the first embodiment can also be implemented by a computer program. In this case, the computer program according to the present invention is executable by a computer (including a CPU and the like) and implements the various functions described in this embodiment.

The computer program according to the present invention may, of course, implement the various processes and functions described in this embodiment by making use of an OS (Operating System) running on the computer.

The computer program according to the present invention is read from a computer-readable recording medium and executed by a computer. A hard disk device, an optical disc, a CD-ROM, a CD-R, a memory card, a ROM, or the like can be used as the computer-readable recording medium. The computer program may also be provided to a computer from an external device via a communication interface and executed by that computer.

Claims

An audio processing apparatus comprising:
voice input means;
driving means;
a memory that stores a noise profile created from a noise spectrum obtained by converting an audio signal of noise related to the driving means into the frequency domain;
correction means for correcting the noise profile using auxiliary data indicating fluctuation, caused by disturbance, of the noise related to the driving means; and
audio processing means for reducing the noise related to the driving means from an audio signal input by the voice input means, using the noise profile corrected by the correction means.

The audio processing apparatus according to claim 1, wherein the audio processing apparatus is an imaging apparatus, and the driving means drives a zoom.

The audio processing apparatus according to claim 1, wherein the auxiliary data includes data indicating a fluctuation range and a fluctuation level of the noise in the frequency axis direction.

The audio processing apparatus according to claim 1, further comprising control means for operating the audio processing means while the driving means is generating the noise.