JP2018066963A - Sound processing device

Info

Publication number: JP2018066963A
Application number: JP2016207316A
Authority: JP
Inventors: 和広並木; Kazuhiro Namiki
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-10-21
Filing date: 2016-10-21
Publication date: 2018-04-26

Abstract

PROBLEM TO BE SOLVED: To reduce each noise in the case where wind noise and drive noise have been superimposed on sound.

SOLUTION: A sound processing device includes conversion means that converts an input sound signal into a sound signal spectrum, storage means that stores a profile concerning the amplitude per frequency of drive noise, calculation means that calculates a subtraction coefficient on the basis of an output from the conversion means, correction amount determination means that determines a correction amount in accordance with the magnitude of wind noise included in the input sound signal, correction means that corrects the subtraction coefficient with the correction amount, determination means that determines a reduction value of noise per frequency from the corrected subtraction coefficient and the profile, noise reduction means that performs drive noise reduction processing by subtracting the reduction value from the sound signal spectrum output from the conversion means, inverse conversion means that converts the sound signal spectrum from the noise reduction means into a sound signal of a time domain, and reduction means that reduces wind noise from the sound signal output from the inverse conversion means.

SELECTED DRAWING: Figure 5

Description

The present invention relates to a sound processing device.

Conventionally, an imaging apparatus such as a digital camera records audio together with the captured moving image. Digital cameras are often used outdoors, so wind strikes the microphone and the resulting noise is included in the recorded audio. In addition, a digital camera has a smaller body than a video camera, and its optical system, such as the lens, is close to the microphone. As a result, the sound of the motors that drive the aperture and the lens is included in the audio as noise.

Techniques have been proposed for reducing noise caused by wind (wind noise) and noise caused by driving of the optical system (drive noise) in the audio from the microphone. The spectral subtraction method (hereinafter, the SS method) is known as one such noise reduction method (Patent Document 1).

Patent Document 1: JP 2000-330597 A

However, when the input audio is such that the drive noise band and the wind noise band overlap, performing drive noise and wind noise reduction processing with the SS method leads to the following problem.

FIG. 12 shows the phase vectors of the Lch and Rch audio signals when wind noise and drive sound are superimposed. Here, the drive sound is taken to be, for example, the noise generated by driving the zoom lens (zoom sound). In FIG. 12(a), it is assumed that wind noise R1201 and wind noise L1202 are input in opposite phase, and that zoom sound R1203 and zoom sound L1204 are input in the same phase.

In that case, the wind noise reduction processing and the drive noise reduction processing are performed as shown in FIG. 12(b). In FIG. 12(b), the drive noise reduction processing 1207 and 1208 subtracts a scalar amount from the zoom sound plus wind noise 1205 and 1206. The wind noise reduction processing 1209 and 1210 subtracts the scalar amount corresponding to the phase difference. As a result, in-phase noise 1211 remains in the zoom sound and cannot be suppressed.

An object of the present invention is to provide a device capable of reducing each noise when wind noise and drive noise are superimposed on the audio.

The sound processing device includes: input means; driving means; conversion means for dividing the time-domain audio signal obtained by the input means at fixed time intervals and converting it into a frequency-domain audio signal spectrum; storage means for storing a profile of the amplitude, for each frequency, of drive noise, which is noise produced by the driving means; calculation means for calculating a subtraction coefficient based on the output signal from the conversion means; correction amount determination means for determining a correction amount for the subtraction coefficient according to the magnitude of wind noise included in the audio signal obtained from the input means; correction means for correcting the subtraction coefficient obtained by the calculation means with the correction amount obtained by the correction amount determination means; determination means for determining a noise reduction value for each frequency from the subtraction coefficient corrected by the correction means and the profile stored in the storage means; noise reduction means for performing drive noise reduction processing by subtracting the reduction value obtained by the determination means from the spectrum of the frequency-domain audio signal output from the conversion means; inverse conversion means for converting the frequency-domain audio signal spectrum from the noise reduction means into a time-domain audio signal; and wind noise reduction means for reducing wind noise from the audio signal output from the inverse conversion means.

According to the present invention, each noise can be reduced even when wind noise and drive noise are superimposed on the audio.

FIG. 1 is a block diagram of an imaging apparatus including the sound processing device.
FIG. 2 is a flowchart showing audio processing.
FIG. 3 is a diagram showing a wind noise reduction circuit.
FIG. 4 is a diagram showing a drive noise reduction circuit.
FIG. 5 is a diagram showing a drive noise reduction circuit.
FIG. 6 is a flowchart showing the processing of the correction amount determination unit.
FIG. 7 is a diagram showing the wind noise detection amount and the subtraction coefficient γ.
FIG. 8 is a flowchart showing the processing of the correction amount determination unit.
FIG. 9 is a diagram showing subtraction coefficients.
FIG. 10 is a diagram showing a drive noise reduction circuit.
FIG. 11 is a diagram showing noise profile generation processing.
FIG. 12 is a diagram explaining drive noise reduction processing and wind noise reduction processing.

(Example 1)
Embodiments of the present invention will be described below.

FIG. 1 is a block diagram showing a configuration example of an imaging apparatus to which the sound processing device of the present invention is applied. In the imaging apparatus 100 of FIG. 1, noise reduction processing is performed mainly by the audio processing unit 107. Many of the components described below are connected to the memory bus 117 and exchange data with the memory 116.

The memory 116 is a dynamic RAM that can be randomly accessed at high speed. The memory contains an audio data area, an image data area, and a control signal area. The audio data stored in the audio data area, the image data stored in the image data area, and the timing signals in the timing signal area are managed frame by frame so that the time to which each piece of data corresponds can be identified.

In FIG. 1, the lens 101 performs a zoom operation by means of the motor 102, which is controlled by the motor driving unit 103. The imaging unit 104 photoelectrically converts the optical image of the subject formed through the lens 101 with an imaging element such as a CCD sensor or a CMOS sensor to generate an analog image signal. It then converts the generated analog image signal into a digital signal and transmits it to the image processing unit 105. The image processing unit 105 applies image quality adjustment processing to the input digital image signal, adjusting white balance, color, brightness, and the like according to the set values, then transmits the result to the memory 116 and stores it in the image data area of the memory 116.

The audio input unit 106 is a built-in microphone, an external microphone connected via an audio input terminal, or the like. The audio processing unit 107 converts the analog audio signal, obtained when the audio input unit 106 picks up the sound around the apparatus, into a digital signal and performs audio-related processing such as level optimization and reduction of specific frequencies. As described later, the audio processing unit 107 also performs drive noise reduction processing and wind noise reduction processing. The audio processing unit 107 then transmits the digital audio signal to the memory 116 and stores it in the audio data area of the memory 116. The audio processing unit 107 also reads audio data that was stored in the audio data area of the memory 116 from the communication unit 109, applies the same processing, and stores the result in the audio data area of the memory 116 again.

The display unit 108 may be any display device, for example a liquid crystal display, an organic EL display, or electronic paper. Image data for display is, for example, temporarily stored by the image processing unit 105 in the image data area of the memory 116. The display unit 108 reads the image data from the image data area and shows it on the display.

The communication unit 109 receives a digital audio signal transmitted from a wireless microphone and temporarily stores it in the control signal area of the memory 116. The communication unit 109 communicates with external devices and transmits and receives, for example, audio signals, image signals, and control signals for shooting operations such as shooting start and end commands. The communication unit 109 is a wireless communication module such as an infrared communication module, a Bluetooth (registered trademark) communication module, or a wireless LAN module.

The recording/reproducing unit 110 reads (reproduces) the image data recorded on the recording medium 111 and transmits it to the display unit 108 and the image output unit 114. The recording medium 111 is a random access recording medium such as a memory card or an HDD.

The image output unit 114 is, for example, an image output terminal, and transmits an image signal so that video can be shown on an external display or the like connected to the imaging apparatus 100. The audio output unit 115 is, for example, an audio output terminal; it reads the audio data stored in the audio data area of the memory 116 and transmits an audio signal so that sound can be output from an earphone, a speaker, or the like connected to the imaging apparatus 100. The audio output unit 115 may instead be a speaker that is built into the imaging apparatus 100 and outputs sound corresponding to the audio signal. The image output unit 114 and the audio output unit 115 may also be a single integrated terminal, such as an HDMI (registered trademark) (High-Definition Multimedia Interface) terminal.

The operation unit 113 is, for example, a button or a dial, and transmits an instruction signal corresponding to a user operation to the control unit 112. The control unit 112 controls each block of the imaging apparatus 100 by transmitting control signals to the blocks based on the instruction signal from the operation unit 113. The operation unit 113 includes, for example, a power button, a recording start button, a menu display button, an enter button, cursor keys, and a touch panel. The control unit 112 consists of, for example, a CPU, DRAM, and SRAM for executing various processes. The memory bus 117 carries various data and control signals to each block of the imaging apparatus 100.

Next, the internal configuration of the audio processing unit 107 will be described. FIG. 2 is a flowchart showing the flow of audio processing in a conventional audio processing unit 107. In S201, drive noise reduction processing is performed by the SS method; its details are described later. In S202, wind noise reduction processing is performed on the audio output that underwent the camera drive noise processing in S201; its details are also described later. In S203, auto level control (ALC) is applied to the audio signal that has passed through the noise reduction processing up to S202, so that the signal is adjusted to a constant volume. For example, when the sound picked up by the microphone is quiet, the amplitude is amplified, and when the volume is appropriate, the amplification factor is returned to its original value. In general, an upper limit is placed on the amplification factor.
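The ALC behavior of S203 can be illustrated with a small gain rule. The following Python is only a sketch under assumptions: the function name `auto_level_gain`, the target level of 0.5, and the gain ceiling of 8.0 are hypothetical values chosen for the example and do not come from the specification.

```python
def auto_level_gain(level, target=0.5, max_gain=8.0):
    """Return an amplification factor for one block of audio.

    level    : measured signal level of the block (for example, its peak)
    target   : level regarded as appropriate (assumed value)
    max_gain : upper limit of the amplification factor (assumed value)
    """
    if level >= target:
        # Volume is already appropriate: return the gain to its original value.
        return 1.0
    if level <= 0.0:
        # Silent input: clamp to the upper limit instead of dividing by zero.
        return max_gain
    # Quiet input: amplify toward the target, but never beyond the limit.
    return min(target / level, max_gain)
```

A practical ALC would also smooth the gain over time to avoid audible pumping; that is omitted here for brevity.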

The wind noise reduction processing and the camera drive noise reduction processing by the SS method are described below.

<Wind noise reduction method>
When outdoor sound is picked up with a microphone or the like, wind may strike the microphone and wind noise may be recorded. In this embodiment, therefore, wind noise reduction processing is performed on each of the left and right audio signals acquired by microphones that pick up sound from the left and right directions independently (a stereo microphone). Sound picked up by two closely spaced omnidirectional microphones generally has the following characteristics.
(1) When sound is picked up in an environment without wind, the left and right microphones pick up in-phase sound. Low frequencies in particular are in phase.
(2) When sound is picked up in an environment with wind, the wind noise is concentrated at low frequencies (about 50 Hz to several kHz), and a phase difference arises between the audio signals picked up by the left and right microphones.

Wind noise reduction processing is performed with these two characteristics in mind. FIG. 3(a) shows the circuit block 300 that performs wind noise reduction processing in the audio processing unit 107. FIG. 3(b) is a graph showing an example of the amplitude characteristic of the high-pass filter 303. As described later, the wind noise processing circuit 300 of FIG. 3(a) receives the Lch and Rch audio signals after noise reduction processing by the SS method and performs wind noise reduction processing on them.

In FIG. 3(a), the left channel (Lch) and right channel (Rch) of the stereo audio signal input to the wind noise reduction circuit 300 are added by the addition circuit 301 and subtracted by the subtraction circuit 302. The high-pass filter circuit 303 attenuates components of 50 kHz or less, a frequency band that contains much of the wind noise, as shown for example in FIG. 3(b). The high-pass filter 303 removes the low frequencies from the difference of the stereo audio signals obtained by the subtraction circuit 302 and outputs the result. Here, the filter is assumed to attenuate by 3 dB at 1 kHz, for example.

The addition circuit 304 adds the output of the high-pass filter 303 to the output of the addition circuit 301, and the controller 307 halves the volume of the output of the addition circuit 304 and outputs it to the Lch audio output. The subtraction circuit 305 subtracts the output of the high-pass filter 303 from the output of the addition circuit 301, and the controller 307 halves the volume of the output of the subtraction circuit 305 and outputs it to the Rch audio output. In this way, the wind noise reduction circuit 300 reduces the wind noise included in the input audio.
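The signal flow of circuit 300 can be sketched in a few lines of Python. This is an illustration under assumptions, not the circuit itself: the first-order high-pass filter, the 48 kHz sample rate, and the function names are hypothetical; only the sum/difference structure, the high-pass filtering of the difference, and the halving of the outputs follow the description.

```python
import math

def one_pole_highpass(x, cutoff_hz, fs):
    """First-order high-pass filter (assumed stand-in for filter 303)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = rc / (rc + 1.0 / fs)
    y = [0.0] * len(x)
    for i in range(1, len(x)):
        y[i] = a * (y[i - 1] + x[i] - x[i - 1])
    return y

def reduce_wind_noise(left, right, cutoff_hz=1000.0, fs=48000):
    """Sum/difference wind noise reduction for one stereo block."""
    total = [l + r for l, r in zip(left, right)]       # addition circuit 301
    diff = [l - r for l, r in zip(left, right)]        # subtraction circuit 302
    diff_hp = one_pole_highpass(diff, cutoff_hz, fs)   # high-pass filter 303
    # Recombine and halve the volume (circuits 304 and 305 plus controller).
    out_l = [0.5 * (t + d) for t, d in zip(total, diff_hp)]
    out_r = [0.5 * (t - d) for t, d in zip(total, diff_hp)]
    return out_l, out_r
```

In-phase content has a zero difference signal and passes through unchanged, while low-frequency content that differs between the channels is attenuated by the filter, which is the behavior that characteristics (1) and (2) rely on.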

<Drive noise reduction processing by the SS method>
The configuration of drive noise reduction processing using the SS method is described with reference to FIG. 4(a). In FIG. 4(a), the noise profiles 401 and 402 store the frequency components of the noise to be reduced as a noise profile. Specifically, an audio signal consisting only of the noise to be reduced is Fourier transformed to obtain its frequency components. If the noise to be reduced continues for some length of time (for example, 4 seconds), the noise profile is obtained by peak-holding the frequency components over their variation during that time. The noise profiles stored in the noise profiles 401 and 402 may be compressed as long as they can be restored to a reasonable degree; whether the stored noise profile is compressed or uncompressed is not restricted. The noise profiles 401 and 402 transmit the stored noise profile to the frequency component comparison units 405 and 406 and to the adder/subtractors 411 and 412.
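Peak-holding over time amounts to taking, for each frequency bin, the maximum magnitude seen across all frames of the noise-only recording. A minimal Python sketch, assuming the magnitude spectra have already been computed per frame (the function name `build_noise_profile` is hypothetical):

```python
def build_noise_profile(magnitude_frames):
    """Peak-hold each frequency bin across a sequence of magnitude spectra.

    magnitude_frames: list of frames, each a list of per-bin magnitudes
                      taken from a recording that contains only the noise.
    """
    profile = list(magnitude_frames[0])
    for frame in magnitude_frames[1:]:
        # Keep the largest magnitude observed so far in every bin.
        profile = [max(held, new) for held, new in zip(profile, frame)]
    return profile
```

Because the profile records the largest value each bin ever reached, it represents an upper envelope of the noise rather than its average.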

The Fourier transform units 403 and 404 divide the input audio signal into fixed time intervals (frames). They then apply a Fourier transform to each divided time-domain digital audio signal, converting it into a frequency-domain audio signal spectrum. This yields the phase information for each frequency of the audio signal and the absolute value of the amplitude for each frequency (the frequency components). The Fourier transform units 403 and 404 transmit the calculated frequency components to the frequency component comparison units 405 and 406 and to the noise reduction units 413 and 414. They also transmit the calculated phase information for each frequency to the inverse Fourier transform units 415 and 416.
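The framing and transform step can be sketched as follows. The example uses a naive discrete Fourier transform from the Python standard library so that it runs without dependencies; a real implementation would use an FFT. The non-overlapping frames, the absence of a window function, and the function names are assumptions of this sketch, since the description does not specify them.

```python
import cmath
import math

def split_into_frames(samples, frame_len):
    """Cut the signal into consecutive, non-overlapping frames (assumed)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def analyze(frame):
    """Return per-bin magnitudes (frequency components) and phases."""
    spectrum = dft(frame)
    magnitudes = [abs(c) for c in spectrum]
    phases = [cmath.phase(c) for c in spectrum]
    return magnitudes, phases
```

The magnitudes feed the comparison and subtraction stages, while the phases are set aside unchanged for the inverse transform.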

The frequency component comparison units 405 and 406 are dividers: for each frequency, they divide the frequency component of the input audio transmitted from the Fourier transform units 403 and 404 by the noise profile from the noise profile storage units 401 and 402. The frequency component comparison units 405 and 406 transmit the per-frequency results to the time change control units 407 and 408.

The time change control units 407 and 408 smooth the per-frequency division results transmitted from the frequency component comparison units 405 and 406 by applying a low-pass filter (LPF) in the time direction for each frequency. The time change control units 407 and 408 transmit the smoothed per-frequency results to the subtraction coefficient calculation units 409 and 410.
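The division and the temporal smoothing can be combined in one small stateful object. The sketch below is illustrative only: the one-pole low-pass filter, its coefficient of 0.9, the guard against division by zero, and the class name `RatioSmoother` are assumptions, since the description does not state the filter design.

```python
class RatioSmoother:
    """Per-bin input/profile ratio followed by a low-pass filter over time."""

    def __init__(self, noise_profile, smoothing=0.9, eps=1e-12):
        self.profile = noise_profile
        self.smoothing = smoothing   # closer to 1.0 means slower tracking
        self.eps = eps               # avoids division by zero in empty bins
        self.state = None

    def update(self, magnitudes):
        """Feed one frame of magnitudes; return the smoothed ratios."""
        ratio = [m / max(p, self.eps)
                 for m, p in zip(magnitudes, self.profile)]
        if self.state is None:
            # First frame: start the filter from the measured ratio.
            self.state = ratio
        else:
            s = self.smoothing
            self.state = [s * old + (1.0 - s) * new
                          for old, new in zip(self.state, ratio)]
        return self.state
```

Smoothing along the time axis keeps the subtraction coefficient from jumping between frames, which would otherwise be heard as fluctuation in the residual noise.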

The subtraction coefficient calculation units 409 and 410 calculate a subtraction coefficient for each frequency from the per-frequency results transmitted from the time change control units 407 and 408. They use a table in which the subtraction coefficient γ[n] gradually decreases as the LPF output level, the output of the time change control units 407 and 408, increases; FIG. 4(b) shows an example of such a table. When the LPF output level is sufficiently large, a sufficiently large frequency component of the desired sound is superimposed on the frequency component of the noise to be reduced, and because of the masking effect human hearing can hardly perceive the noise. Reducing the subtraction coefficient γ[n] therefore reduces the size of the profile that the noise reduction units 413 and 414 subtract from the input audio, which limits degradation of the desired sound. The subtraction coefficient calculation units 409 and 410 transmit the calculated per-frequency subtraction coefficients to the multipliers 411 and 412.
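A table of this kind can be approximated by a piecewise-linear function. In the Python sketch below, the breakpoints `low` and `high` and the range of γ are hypothetical values chosen for illustration; the actual curve of FIG. 4(b) is not reproduced here.

```python
def subtraction_coefficient(lpf_level, low=1.0, high=4.0,
                            gamma_max=1.0, gamma_min=0.0):
    """Map a smoothed input/profile ratio to a subtraction coefficient.

    The coefficient stays at gamma_max while the input is at or below the
    noise profile level, then falls linearly to gamma_min as the desired
    sound increasingly dominates (assumed shape).
    """
    if lpf_level <= low:
        return gamma_max
    if lpf_level >= high:
        return gamma_min
    t = (lpf_level - low) / (high - low)
    return gamma_max + t * (gamma_min - gamma_max)
```

With these assumed breakpoints, a ratio of 2.5 lies halfway along the slope and yields a coefficient of 0.5.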

The multipliers 411 and 412 multiply, for each frequency, the subtraction coefficient transmitted from the subtraction coefficient calculation units 409 and 410 by the noise profile transmitted from the noise profile storage units 401 and 402. The per-frequency result of the multipliers 411 and 412 is called the subtraction coefficient spectrum. The multipliers 411 and 412 transmit the calculated subtraction coefficient spectrum to the noise reduction units 413 and 414.

The noise reduction units 413 and 414 are subtractors: they subtract the subtraction coefficient spectrum transmitted from the multipliers 411 and 412 from the frequency components transmitted from the Fourier transform units 403 and 404, obtaining a per-frequency spectrum in which the noise is reduced. The spectrum obtained by the noise reduction units 413 and 414 is called the noise-reduced spectrum. The noise reduction units 413 and 414 transmit the noise-reduced spectrum to the inverse Fourier transform units 415 and 416.

The inverse Fourier transform units 415 and 416 apply an inverse Fourier transform to the noise-reduced spectrum transmitted from the noise reduction units 413 and 414, using the phase information transmitted from the Fourier transform units 403 and 404. That is, they perform the inverse conversion from a frequency-domain audio signal spectrum to a time-domain audio signal, yielding a digital audio signal with reduced noise. The inverse Fourier transform units 415 and 416 transmit the restored noise-reduced digital audio signal to the signal output control units for Lch and Rch.
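The multiply, subtract, and inverse transform steps can be sketched together. As before, this is a dependency-free Python illustration with a naive inverse DFT; the function names are hypothetical, and clamping negative magnitudes to zero is an assumption of the sketch that the description does not state.

```python
import cmath
import math

def subtract_noise(magnitudes, profile, gammas):
    """Spectral subtraction on magnitudes, one value per frequency bin.

    The reduction value per bin is gamma[n] * profile[n] (multipliers 411
    and 412). Negative results are clamped to zero (assumed behavior).
    """
    return [max(m - g * p, 0.0)
            for m, p, g in zip(magnitudes, profile, gammas)]

def inverse_dft(magnitudes, phases):
    """Rebuild a time-domain frame from magnitudes and the original phases."""
    n = len(magnitudes)
    spectrum = [cmath.rect(m, ph) for m, ph in zip(magnitudes, phases)]
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]
```

Only the magnitudes are modified; reusing the phase of the noisy input is what allows the time-domain signal to be rebuilt without estimating the phase of the clean sound.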

The wind noise reduction processing circuit and the noise reduction processing circuit using the SS method have been described above. Next, the noise reduction processing performed by the audio processing unit 107 in this embodiment is described.

FIG. 5 shows the noise reduction processing circuit of this embodiment. The noise reduction circuit 500 is included in the audio processing unit 107 and performs drive noise reduction processing by the SS method. The drive noise is the noise generated by driving the motor 102 in FIG. 1. Blocks in common with FIG. 4(a) are given the same reference numerals, and their detailed description is omitted.

The noise reduction circuit 500 receives the digital audio signals obtained by converting the Lch and Rch audio signals from the audio input unit 106 into digital signals. The Lch and Rch audio signals output from the noise reduction circuit 500 are passed to the wind noise processing circuit 300 of FIG. 3.

The noise reduction circuit 500 adjusts the correction amount of the noise profile according to the magnitude of the wind noise component included in the input audio. In FIG. 5, the correction amount determination unit 511 uses the per-frequency-band audio data output by the fast Fourier transform circuits 503 and 504 to detect low frequencies at which the Lch and Rch audio signals differ in phase, and determines a correction value α for correcting the attenuation coefficient γ[n].

FIG. 6(a) is a flowchart showing the processing by which the correction amount determination unit 511 determines the correction amount α of the attenuation coefficient γ[n]. Because the noise reduction circuit 500 adjusts the correction amount of the noise profile according to the magnitude of the wind noise component in the input audio, it first detects that component. In S601, the wind detection evaluation value is initialized. In S602, it is determined whether the frequency band of the audio spectrum transmitted from the Fourier transform units 401 and 402 is free of any band at or above the threshold thf. If no band at or above thf is included, the process proceeds to S603. If a band at or above thf is included, the flow ends. The frequency threshold thf is set to about 1 kHz, the band in which wind noise is heavily concentrated in the audio signal.

In S603, the phase characteristics of the stereo microphone audio spectra output by the Fourier transform units 401 and 402 are compared. As described above, when the audio contains wind noise, the phases are uncorrelated in the low frequency band, so if the phase difference is at or above a threshold thφ, the process proceeds to S604. If the phase difference is at or below thφ, it is judged that no wind noise is included and the process returns to S602. In a digital camera such as the imaging apparatus 100 of FIG. 1, the distance between the Lch and Rch microphones is considered to be small. When the distance between microphones is small, the sound entering the stereo microphone is normally picked up almost in phase in the low frequency band at or below thf.

In S604, the amplitudes of the stereo microphone audio spectra output by the Fourier transform units 401 and 402 are compared. An audio spectrum containing wind noise has a large amplitude. Therefore, if the amplitudes of the left and right audio spectra are at or above a threshold thA, the audio is judged to contain wind noise and the process proceeds to S605. If the amplitude is at or below thA, the process returns to S602.

In S605, since the conditions of S602 through S604 have been met, the audio spectrum is regarded as containing wind noise and the wind detection evaluation value is incremented by 1. In S606, if the wind detection evaluation value is at or above a threshold thn, the process proceeds to S607; if it is at or below thn, the process returns to S602. In S607, the correction amount α is determined and the flow ends. A relationship between the wind detection evaluation value and the correction amount α of the attenuation coefficient γ, such as the one shown in FIG. 7(a), is defined in advance, and α is determined by referring to the evaluation value obtained in S606. The larger the evaluation value, the larger α is made. In FIG. 7(a), α is held constant once the evaluation value reaches a certain value, but the relationship may be set in any way that suits the situation.
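The detection flow of S601 to S607 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names, the default threshold values, the requirement that both channels exceed thA, and the rising-then-constant shape of the α mapping (loosely following the description of FIG. 7(a)) are all hypothetical. The sketch also simplifies the flow by counting every qualifying bin first and mapping the count to α afterwards.

```python
import cmath
import math


def wind_evaluation(spec_l, spec_r, freqs, thf=1000.0, th_phi=0.5, th_a=1.0):
    """Count low-frequency bins that look like wind noise (S601 to S605).

    spec_l, spec_r: complex spectra of the L and R channels (same length).
    freqs: centre frequency in Hz of each bin.
    All threshold defaults are illustrative assumptions.
    """
    evaluation = 0  # S601: initialise the evaluation value
    for left, right, f in zip(spec_l, spec_r, freqs):
        if f >= thf:  # S602: only bins below thf are examined
            continue
        # S603: phase difference between channels, wrapped to [0, pi]
        diff = abs(cmath.phase(left) - cmath.phase(right))
        diff = min(diff, 2 * math.pi - diff)
        if diff < th_phi:
            continue
        # S604: assumed here to require a large amplitude on both channels
        if abs(left) < th_a or abs(right) < th_a:
            continue
        evaluation += 1  # S605: this bin counts as wind noise
    return evaluation


def correction_amount(evaluation, thn=3, slope=0.05, alpha_max=0.5):
    """Map the evaluation value to alpha (S606 to S607).

    The linear rise capped at alpha_max is an assumed shape.
    """
    if evaluation < thn:  # S606: too few windy bins, no correction
        return 0.0
    return min(slope * evaluation, alpha_max)
```

Calling `correction_amount(wind_evaluation(spec_l, spec_r, freqs))` would then give the α to hand to the coefficient correction step.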

The attenuation coefficient correction units 512 and 513 use the correction amount α obtained by the correction amount determination unit 511 in this way to correct the attenuation coefficient γ[n] calculated by the attenuation coefficient calculation unit 509. The multipliers 411 and 412 multiply, frequency by frequency, the subtraction coefficient output for each frequency by the subtraction coefficient correction units 512 and 513 and the noise profile sent from the noise profile storage units 401 and 402. This determines the noise reduction value for each frequency based on the noise profile.

FIG. 7(b) is a graph of the attenuation coefficient γ[n] after correction by the correction amount α. In FIG. 7(b), the correction amount α obtained by the correction amount determination unit 511 is added to the attenuation coefficient γ for each audio level. Addition is used as the correction method here, but the correction amount α may be varied according to the situation.

Thus, in this embodiment, when wind noise is present, the attenuation coefficient applied to the noise profile is corrected according to the magnitude of the wind noise. Specifically, the correction amount of the attenuation coefficient is determined so that the stronger the wind noise, the larger the profile becomes. When wind noise is judged to be present, the noise profile component subtracted from the input audio is therefore corrected to be larger. Adding the correction amount α to the attenuation coefficient γ[n] resolves the problem of drive noise remaining in the low band during drive noise reduction by the SS method.
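The correction and subtraction described above can be sketched as follows. The function name is hypothetical, and flooring the result at zero is an assumption; the text does not state how a negative difference is handled.

```python
def reduce_drive_noise(spectrum, profile, gamma, alpha):
    """Subtract the scaled noise profile from an amplitude spectrum.

    spectrum, profile: per-frequency amplitudes (non-negative floats).
    gamma: per-frequency attenuation coefficients gamma[n].
    alpha: correction amount obtained from wind detection.
    """
    output = []
    for amp, noise, g in zip(spectrum, profile, gamma):
        # Corrected coefficient times profile gives the reduction value.
        reduction = (g + alpha) * noise
        # Flooring at zero is an assumption, not taken from the source.
        output.append(max(amp - reduction, 0.0))
    return output
```

With a larger α the reduction value grows for every frequency, which is the behaviour the paragraph above describes for strong wind noise.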

(Example 2)
This example describes a method of correcting the attenuation coefficient γ. The noise reduction circuit of this example has the same configuration as that of Example 1, so its description is omitted. The correction amount determination unit 511, however, differs from that of Example 1 and is described in detail below.

FIG. 8 is a flowchart explaining the processing of the correction amount determination unit 511 in this example, and FIG. 9 is a graph of the attenuation coefficient γ[n] after correction by the correction amount α. In S801, the correction amount determination unit 511 determines whether the level of the audio signal is at or above a threshold. In S802, if the level of the audio signal is at or above the threshold, the correction amount α is added to the attenuation coefficient γ corresponding to that level. As shown in FIG. 9, any corrected attenuation coefficient of 1.0 or more is set to 1.0.

In S803, if the audio signal is at or below the threshold, the correction amount α is not set and the attenuation coefficient correction units 510 and 511 do not correct the attenuation coefficient γ. In S804, the attenuation coefficient γ[n] resulting from S802 or S803 is multiplied into the profiles 501 and 502 to update the profiles.

In this example, when the level of the audio signal obtained from the stereo microphones is below the threshold, the correction amount α is not added even if wind noise is present. When the audio signal is at or above the threshold level, the correction amount α is added.

Deciding whether to add the correction amount α according to the level of the audio signal in this way makes it possible to reduce drive noise and wind noise without overcorrecting when the audio signal level is low.
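The level-gated correction of S801 to S803 can be sketched as follows. The function and parameter names are hypothetical, and treating a level exactly equal to the threshold as "at or above" is an assumption made to resolve the overlap between the two conditions in the text.

```python
def corrected_gamma(gamma, level, alpha, level_threshold):
    """Apply alpha only when the signal level reaches the threshold."""
    if level < level_threshold:  # S803: quiet signal, gamma left unchanged
        return gamma
    # S802: add alpha, then cap at 1.0 as described for FIG. 9
    return min(gamma + alpha, 1.0)
```

The cap keeps the corrected coefficient from exceeding 1.0, so the reduction value never exceeds the stored profile itself.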

(Example 3)
The noise reduction circuit 1000 of this example is described with reference to FIGS. 10 and 11. Compared with the noise reduction device of FIG. 5 used in Examples 1 and 2, the noise reduction circuit 1000 differs in its noise profile generation units 1001 and 1002.

FIG. 10 shows the configuration of the noise reduction circuit 1000 of this example, and FIG. 11 is a conceptual diagram of profile generation for drive noise reduction by the SS method. The noise profile generation units 1001 and 1002 store a noise profile indicating the frequency components of the noise to be reduced. Specifically, the audio signal in FIG. 11, which consists of the audio 1101 before the noise to be reduced occurs and the noise 1102 to be reduced, is Fourier transformed to obtain the frequency component Rt1[i] of the audio signal 1101 and the frequency component Rt2[i] of the audio signal 1102. Rt1[i] and Rt2[i] are scalar amplitudes. Here i indexes the frequency samples; by the sampling theorem, audio data of N samples yields N/2 frequency samples. For N = 1024 audio samples, for example, there are 512 frequency samples.

Subtracting the frequency component (Rt1) of the noise-free audio at time t1 from the frequency component (Rt2) of the noisy audio at time t2 yields the frequency component of the noise alone, that is, the noise profile Rp.
Rp[i] = Rt2[i] - Rt1[i]   Formula (1)

The calculated frequency component Rp of the noise alone is stored as the noise profile, and the noise reduction units 413 and 414 use it for noise reduction processing.
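Formula (1) can be illustrated with the short sketch below. It is only an assumption-laden example: the function names are hypothetical, and a naive DFT stands in for whatever Fourier transform the device uses, which keeps the code dependency-free but is too slow for N = 1024 frames in practice.

```python
import cmath
import math


def amplitude_spectrum(samples):
    """Amplitudes of the first N/2 DFT bins of a real signal (naive DFT)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        spectrum.append(abs(acc))
    return spectrum


def noise_profile(frame_before, frame_with_noise):
    """Rp[i] = Rt2[i] - Rt1[i], as in Formula (1).

    frame_before: samples at time t1, before the noise occurs.
    frame_with_noise: samples at time t2, audio plus noise.
    """
    rt1 = amplitude_spectrum(frame_before)
    rt2 = amplitude_spectrum(frame_with_noise)
    return [b - a for a, b in zip(rt1, rt2)]
```

For a frame of N samples the returned profile has N/2 entries, matching the relationship between audio samples and frequency samples given above.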

Generating a noise profile each time noise occurs in this way allows reduction processing suited to each noise.

Claims

1. An audio processing apparatus comprising:
input means;
driving means;
conversion means for dividing the time-domain audio signal obtained by the input means at regular intervals and converting it into a frequency-domain audio signal spectrum;
storage means for storing a profile of the amplitude for each frequency of drive noise, which is noise generated by the driving means;
calculation means for calculating a subtraction coefficient based on the signal output from the conversion means;
correction amount determination means for determining a correction amount for the subtraction coefficient according to the magnitude of wind noise contained in the audio signal obtained from the input means;
correction means for correcting the subtraction coefficient obtained by the calculation means with the correction amount obtained by the correction amount determination means;
determination means for determining a noise reduction value for each frequency from the subtraction coefficient corrected by the correction means and the profile stored in the storage means;
noise reduction means for reducing the drive noise by subtracting the reduction value obtained by the determination means from the frequency-domain audio signal spectrum output from the conversion means;
inverse conversion means for converting the frequency-domain audio signal spectrum from the noise reduction means into a time-domain audio signal; and
wind noise reduction means for reducing wind noise from the audio signal output from the inverse conversion means.

2. The audio processing apparatus according to claim 1, wherein the correction amount determination means determines the correction amount so that the reduction value increases as the wind noise contained in the audio signal obtained from the input means increases.

3. The audio processing apparatus according to claim 1, wherein the correction amount determination means obtains an evaluation value of wind noise from an amplitude characteristic and a phase characteristic of the audio signal spectrum output from the conversion means, and determines the correction amount based on the evaluation value.

4. The audio processing apparatus according to claim 3, wherein the input means inputs a stereo audio signal, and the correction amount determination means obtains the evaluation value according to the phase difference and the amplitude magnitude in the audio signal spectra of the left channel and the right channel of the stereo audio signal output from the conversion means.

5. The audio processing apparatus according to claim 1, wherein the correction means corrects the subtraction coefficient by adding the correction amount determined by the correction amount determination means to the subtraction coefficient for each audio level.

6. The audio processing apparatus according to claim 1, wherein the calculation means decreases the subtraction coefficient as the time change of the result of dividing the profile by the audio signal spectrum output from the conversion means increases.

7. The audio processing apparatus according to claim 6, wherein the calculation means obtains, for each frequency, a ratio between the audio signal spectrum output from the conversion means and the profile, and calculates the subtraction coefficient based on a result of smoothing the ratio for each frequency over a fixed time.