JP6300464B2

JP6300464B2 - Audio processing device

Info

Publication number: JP6300464B2
Application number: JP2013165850A
Authority: JP
Inventors: 太郎松野
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2013-08-09
Filing date: 2013-08-09
Publication date: 2018-03-28
Anticipated expiration: 2033-08-09
Also published as: JP2015034898A

Description

The present invention relates to an audio processing device, and in particular to an audio processing device that reduces noise in the frequency domain.

Conventionally, the spectral subtraction method (SS method) is known as a method for reducing noise mixed into an audio signal: the spectrum of the noise component is measured in advance, and the noise component is subtracted from the input audio signal in the frequency domain.

The subtraction coefficient, which defines how much of the noise component is subtracted, is generally a fixed value determined experimentally. However, as described in Patent Document 1, a method of determining the subtraction coefficient based on the ratio between the audio signal and the noise is also known.

Patent Document 1: JP 2000-330597 A

In the technique described in Patent Document 1, when the amplitude of the audio signal varies greatly over time, the attenuation coefficient changes abruptly in the time direction, and as a result the amplitude (sound pressure) of the audio signal after noise reduction also changes abruptly. This sounds unnatural to the listener and degrades the sound quality.

In addition, an abrupt change of the subtraction coefficient in the time direction, combined with the time variation of the noise component contained in the input signal, can cause the degree of noise removal to differ between frequency bands. That is, some frequency bands may have the noise component completely removed while others do not. In this state, when the audio signal is restored by the inverse Fourier transform (IFFT) after noise reduction, a harsh noise called musical noise is produced and the sound quality deteriorates.

An object of the present invention is to eliminate these drawbacks and to provide an audio processing device that reduces noise with little deterioration in sound quality.

An audio processing device according to the present invention comprises: audio input means; Fourier transform means for converting a time-domain audio signal input by the audio input means into a frequency-domain audio signal spectrum; noise profile storage means for storing a noise profile indicating the frequency amplitude components of the noise to be reduced; frequency component division means for obtaining, for each frequency, the ratio between the audio signal spectrum obtained by the Fourier transform means and the noise profile; time change control means for smoothing, for each frequency, the per-frequency ratio obtained by the frequency component division means; subtraction coefficient calculation means for calculating a subtraction coefficient according to the output of the time change control means; multiplication means for multiplying the noise profile by the subtraction coefficient; noise reduction means for subtracting the output of the multiplication means from the audio signal spectrum; inverse Fourier transform means for restoring the output of the noise reduction means to a time-domain audio signal; and audio output means for outputting the audio signal restored by the inverse Fourier transform means.

According to the present invention, the noise-equivalent component is removed after the time variation of the signal-to-noise ratio of each frequency component in the frequency domain has been smoothed, so musical noise can be reduced and sound quality improved.

FIG. 1 is a schematic block diagram of a first embodiment of the present invention.
FIG. 2 is a schematic block diagram of the noise reduction processing unit.
FIG. 3 is a schematic block diagram of the time change control unit.
FIG. 4 shows an example output waveform of the frequency component division unit.
FIG. 5 shows an example output waveform of the time change control unit.
FIG. 6 shows an example of the input/output characteristic of the subtraction coefficient calculation unit.
FIG. 7 is a schematic block diagram of a second embodiment of the present invention.
FIG. 8 is a schematic block diagram of the noise reduction processing unit of the embodiment shown in FIG. 7.
FIG. 9 is a schematic block diagram of the noise reduction unit of the noise reduction processing unit shown in FIG. 8.
FIG. 10 is a graph showing the variation tendency of a noise frequency component (increasing over time).
FIG. 11 is a graph showing the variation tendency of a noise frequency component (no change over time).
FIG. 12 is a graph showing the variation tendency of a noise frequency component (decreasing over time).
FIG. 13 shows an example of the input/output characteristic of the attenuation factor calculation unit.
FIG. 14 shows an example of the relationship between the frequency component of zoom noise and time when the variation tendency is decreasing.
FIG. 15 shows an example of switching when fluctuation[n] = 0.
FIG. 16 shows an example of switching when fluctuation[n] = 1.
FIG. 17 shows an example of switching when fluctuation[n] = 2.
FIG. 18 shows an example of the variation tendency of a noise frequency component with respect to motor position.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a schematic block diagram of an imaging apparatus 100 incorporating an embodiment of the audio processing device according to the present invention. Imaging apparatuses, and video cameras in particular, are generally equipped with a motorized zoom lens as the photographing lens. Consequently, the sound of the zoom operation may be mixed as noise into the audio signal that the microphone acquires from the surroundings or the subject. In other words, the zoom drive system can be a noise source. An embodiment that reduces the noise accompanying this zoom operation is described below.

The imaging apparatus 100 comprises an operation unit 101, a control unit 102, an imaging unit 103, an audio input unit 104, a noise reduction processing unit 105, an audio output unit 106, a video output unit 107, a memory 108, and a memory bus 109. The operation unit 101, the control unit 102, the imaging unit 103, the audio input unit 104, the noise reduction processing unit 105, the audio output unit 106, and the video output unit 107 read and write data in the memory 108 via the memory bus 109.

The operation unit 101 is the means by which a user inputs instructions to the imaging apparatus 100, and consists of buttons, a touch panel, a zoom lever, and the like. The control unit 102 consists of a CPU (central processing unit) that controls the imaging apparatus 100, and controls the relevant blocks via the memory bus 109 in accordance with user instructions from the operation unit 101.

For example, in accordance with a zoom operation performed by the user on the operation unit 101, the control unit 102 controls the zoom lens of the imaging unit 103 and stores zoom control information in the control signal area of the memory 108. The zoom control information consists of information indicating the start time and end time of zoom driving, a flag indicating that zoom driving is in progress, the zoom direction, the rotation speed of the zoom drive motor, and so on. The start time may be an absolute time or a time relative to the start of audio capture. The end time may be an absolute time or a time relative to the start time. As described in detail later, the zoom control information is used to remove or reduce the zoom drive sound mixed into the audio signal. The zoom control information also serves as noise time information, containing the times at which the noise generated by zoom driving starts and stops, and the control signal area of the memory 108 thus acts as noise time holding means.
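The zoom control information can be pictured as a small record. The following is only an illustrative sketch: the class name, field names, units, and direction labels are hypothetical and do not come from the patent, and the "zoom driving in progress" flag is modelled here as a method derived from the start and end times.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ZoomControlInfo:
    """Hypothetical record mirroring the zoom control information fields."""

    start_time_s: float          # start of zoom driving, relative to audio capture start
    end_time_s: Optional[float]  # end of zoom driving; None while the zoom is still driving
    direction: str               # zoom direction, e.g. "tele" or "wide" (assumed labels)
    motor_speed: float           # rotation speed of the zoom drive motor (unit assumed)

    def is_driving(self, t_s: float) -> bool:
        """Return True if time t_s falls inside the zoom (noise) interval."""
        if t_s < self.start_time_s:
            return False
        return self.end_time_s is None or t_s <= self.end_time_s
```

A record like this is enough to decide, for any audio frame timestamp, whether zoom noise may be present in that frame.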

The imaging unit 103 consists of a lens with a zoom function, a diaphragm, an image sensor, and an A/D converter. The image sensor converts the optical image that passes through the lens and falls on the imaging surface into an image signal, and the A/D converter converts the analog image signal output from the image sensor into a digital image signal. The digital image signal output from the imaging unit 103 is written to the image data area of the memory 108.

The audio input unit 104 consists of, for example, an electroacoustic transducer such as a microphone, which picks up external sound and converts it into an audio signal, and an A/D converter that converts the analog output into a digital signal. The audio input unit 104 writes the digital audio signal, sampled at a given frequency such as 48 kHz, to the audio data area of the memory 108.

The noise reduction processing unit 105 performs noise reduction processing on the audio data stored in the audio data area of the memory 108, in accordance with the zoom control information stored in the control signal area. Specifically, when the zoom control information indicates that zooming is in progress, the noise reduction processing unit 105 applies noise reduction processing, in units of fixed-length frames, to the audio data read from the memory 108 and writes the processed audio data back to the memory 108. When zoom driving is not in progress, no zoom drive sound is generated, so the noise reduction processing unit 105 writes the audio data read from the memory 108 back to the memory 108 unchanged.

When there is no need to keep the audio data from before noise reduction in the memory 108 separately from the processed audio data, the audio data output from the noise reduction processing unit 105 may overwrite the unprocessed audio data in the audio data area of the memory 108. Indeed, the portions of audio data that are not subjected to noise reduction need not be read from the memory 108 at all.

The audio output unit 106 consists of a D/A converter that converts audio data into an analog audio signal, a speaker, and an audio output terminal. The audio output unit 106 reads audio data from the audio data area of the memory 108, converts it into an analog audio signal, outputs it as sound from the speaker, and outputs it externally from the audio output terminal.

The video output unit 107 is a display device; it reads image data from the image data area of the memory 108 and displays the image.

The memory 108 is a dynamic RAM that allows high-speed random access. The storage area of the memory 108 is divided into an audio data area for storing audio data, an image data area for storing image data, and a control signal area for storing control signals. The control unit 102 manages the audio data, image data, and control signals stored in the memory 108 so that the time to which each frame of data corresponds can be identified.

A memory control circuit (not shown) arbitrates access to the memory 108 by the blocks connected to the memory bus 109, enabling time-division reads and writes to the memory 108.

FIG. 2 is a schematic block diagram of the noise reduction processing unit 105. The signal input control unit 201 is an interface to the memory bus 109. The audio data stored in the audio data area of the memory 108 and the zoom control information stored in the control signal area are input to the signal input control unit 201 via the memory bus 109. The signal input control unit 201 divides the incoming audio data into frames of a predetermined fixed length and, in accordance with the zoom control information, supplies the frames to the Fourier transform unit 202 when noise reduction is to be performed and to the signal output control unit 210 when it is not. That is, when the zoom control information indicates that zooming is in progress, meaning that noise due to zoom driving is being generated (or may be), the signal input control unit 201 supplies the audio data to the Fourier transform unit 202. When the zoom control information indicates that zooming is not in progress, meaning that no noise due to zoom driving is generated, the signal input control unit 201 supplies the audio data to the signal output control unit 210. In either case, the signal input control unit 201 supplies the zoom control information to the signal output control unit 210.

The Fourier transform unit 202 applies a Fourier transform to the frame-divided audio data from the signal input control unit 201 and calculates, for each frequency, the phase information and the absolute value of the amplitude (the frequency amplitude component). The Fourier transform unit 202 supplies the per-frequency amplitude components to the frequency component division unit 204 and the noise reduction unit 208, and supplies the per-frequency phase information to the inverse Fourier transform unit 209.
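As an illustration of the amplitude/phase split performed by the Fourier transform unit 202, here is a minimal sketch. It uses a naive discrete Fourier transform from the Python standard library rather than an FFT, purely to stay self-contained; the function names are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def split_magnitude_phase(spectrum):
    """Return (magnitudes, phases) for each complex frequency bin."""
    magnitudes = [abs(c) for c in spectrum]
    phases = [cmath.phase(c) for c in spectrum]
    return magnitudes, phases


# Example: one cosine cycle over 8 samples concentrates its energy in bins 1 and 7.
frame = [math.cos(2 * math.pi * t / 8) for t in range(8)]
magnitudes, phases = split_magnitude_phase(dft(frame))
```

Only the magnitudes are modified by the later noise reduction stages; the phases are passed through unchanged for resynthesis.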

The noise profile storage unit 203 stores, as a noise profile, the frequency amplitude components of the noise to be reduced (here, noise generated during zoom driving, such as the zoom drive sound). Specifically, an audio signal consisting only of the noise to be reduced is Fourier transformed to obtain its frequency amplitude components. When the noise to be reduced lasts for some time (for example, 4 seconds), the noise profile is the peak hold of the frequency components over their variation within that time.

The noise profile stored in the noise profile storage unit 203 may be compressed in a recoverable form. In that case, however, decompression means must be provided at the output stage of the noise profile storage unit 203. The noise profile storage unit 203 outputs the stored noise profile to the frequency component division unit 204 and the multiplier 207.

The frequency component division unit 204 consists of a divider for each frequency, and divides, frequency by frequency, the frequency amplitude component (A) from the Fourier transform unit 202 by the value (B) of the corresponding frequency in the noise profile from the noise profile storage unit 203. The frequency component division unit 204 supplies the per-frequency division results to the time change control unit 205.

The time change control unit 205 is essentially a low-pass filter (LPF), and smooths the per-frequency division results from the frequency component division unit 204 along the time axis. The specific configuration of the time change control unit 205 is described later. The time change control unit 205 outputs the per-frequency filter results, that is, the low-frequency components, to the subtraction coefficient calculation unit 206.

The subtraction coefficient calculation unit 206 calculates a subtraction coefficient for each frequency (a weighting coefficient for the noise profile) from the per-frequency low-frequency component of the time variation supplied by the time change control unit 205, and outputs it to the multiplier 207. The specific configuration of the subtraction coefficient calculation unit 206 is described later.

The multiplier 207 multiplies the value of each frequency component of the noise profile read from the noise profile storage unit 203 by the corresponding per-frequency subtraction coefficient from the subtraction coefficient calculation unit 206. In other words, the multiplier 207 adjusts the strength of the noise profile read from the noise profile storage unit 203 using the per-frequency subtraction coefficients (weighting coefficients) from the subtraction coefficient calculation unit 206. In this sense, the output of the multiplier 207 represents a weighted noise spectrum.

The noise reduction unit 208 consists of a subtractor, and subtracts the weighted noise spectrum from the multiplier 207 from the frequency amplitude components from the Fourier transform unit 202, frequency by frequency. This subtraction reduces the noise caused by zoom driving in the frequency domain. The output signal of the noise reduction unit 208 represents the spectrum of the audio signal after noise reduction and is input to the inverse Fourier transform unit 209.
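The multiplier 207 and the noise reduction unit 208 together amount to a per-bin weighted subtraction. A minimal sketch follows; the function name and the `floor` clamp that keeps amplitudes non-negative are assumptions of this sketch, since the text here does not say how a negative difference is handled.

```python
def subtract_weighted_noise(sound_fft, profile, coefficients, floor=0.0):
    """Per-bin spectral subtraction: sound_fft[n] - coefficients[n] * profile[n].

    The clamp to `floor` is an assumption of this sketch, not taken from the text.
    """
    reduced = []
    for amplitude, noise, coef in zip(sound_fft, profile, coefficients):
        weighted_noise = coef * noise  # role of the multiplier 207
        reduced.append(max(amplitude - weighted_noise, floor))  # role of the noise reduction unit 208
    return reduced


# Three bins: full subtraction, over-subtraction (clamped), and half-weight subtraction.
reduced = subtract_weighted_noise([10.0, 2.0, 5.0], [4.0, 4.0, 4.0], [1.0, 1.0, 0.5])
```

In the second bin the weighted noise exceeds the input amplitude, which is the case where some handling of negative values is needed before resynthesis.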

The inverse Fourier transform unit 209 applies an inverse Fourier transform to the audio signal spectrum from the noise reduction unit 208, using the phase information from the Fourier transform unit 202. This yields noise-reduced audio data in the time domain. The inverse Fourier transform unit 209 supplies the audio data restored by the inverse Fourier transform to the signal output control unit 210.
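The resynthesis step can be sketched as recombining each processed amplitude with its original phase and then inverting the transform. This sketch assumes a full-length spectrum (all N bins) and uses a naive inverse DFT from the standard library in place of an IFFT; the function names are hypothetical.

```python
import cmath
import math


def inverse_dft(spectrum):
    """Naive O(N^2) inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [
        sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(spectrum)).real / n
        for t in range(n)
    ]


def rebuild_frame(magnitudes, phases):
    """Combine processed magnitudes with the original phases, then invert."""
    spectrum = [cmath.rect(m, p) for m, p in zip(magnitudes, phases)]
    return inverse_dft(spectrum)


# Example: magnitude 4 in bins 1 and 7 with zero phase is one cosine cycle over 8 samples.
magnitudes = [0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0]
phases = [0.0] * 8
restored = rebuild_frame(magnitudes, phases)
```

Because only the magnitudes are altered by noise reduction, reusing the input phases keeps the restored waveform aligned with the original signal.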

The signal output control unit 210 is an interface to the memory bus 109. In accordance with the zoom control information from the signal input control unit 201, when noise reduction processing is performed, the signal output control unit 210 selects the audio data from the inverse Fourier transform unit 209 and outputs it to the memory bus 109. When noise reduction processing is not performed, the signal output control unit 210 selects the frame-divided audio data from the signal input control unit 201, reassembles the frames, and outputs the result to the memory bus 109. The port of the signal output control unit 210 that receives the audio signal from the signal input control unit 201 is provided with a delay element that applies a delay equal to the processing time of the path from the Fourier transform unit 202 to the inverse Fourier transform unit 209.

The operation of reducing, as noise, the zoom operation sound that is mixed into the audio during video recording by the imaging apparatus 100 is described next.

First, suppose the user instructs the imaging apparatus 100, via the operation unit 101, to start video recording. In accordance with the instruction from the operation unit 101, the control unit 102 starts each block for video recording. For example, the imaging unit 103 writes the image data of the captured moving image to the image data area of the memory 108, and the audio input unit 104 picks up sound from the surroundings or the subject and writes the audio data to the audio data area of the memory 108.

Suppose that, during video recording, the user instructs the control unit 102 to zoom via the operation unit 101. The control unit 102 supplies the imaging unit 103 with a zoom control signal corresponding to the zoom operation signal from the operation unit 101 and, at the same time, writes zoom control information to the control signal area of the memory 108. The imaging unit 103 drives the zoom lens in accordance with the zoom control signal from the control unit 102. At this time, the zoom drive motor, the lens barrel, and related parts rotate and generate noise. The audio input unit 104 picks up this noise together with the intended sound from the surroundings or the subject, and writes the captured audio data to the audio data area of the memory 108.

The noise reduction processing unit 105 reads the audio data stored in the audio data area of the memory 108 and the zoom control information stored in the control signal area of the memory 108, and switches noise reduction processing of the audio data on or off in accordance with the zoom control information. That is, the noise reduction processing unit 105 does not apply noise reduction to audio data captured while the zoom was not operating, and writes the audio data it read back to the audio data area of the memory 108 unchanged. For audio data captured while the zoom was operating, the noise reduction processing unit 105 applies noise reduction and writes the processed audio data back to the audio data area of the memory 108.

When the captured audio is to be output, the audio output unit 106 reads the noise-reduced audio data from the audio data area of the memory 108 and outputs it as sound. The video output unit 107 reads the image data stored in the image data area of the memory 108 and outputs it as a moving image.

The noise reduction processing that the noise reduction processing unit 105 applies to the audio data input by the audio input unit 104 is now described concretely. As an example, the sampling frequency of the A/D conversion in the audio input unit 104 is 48 kHz. The frame division unit for the FFT is 1024 points; that is, the frequency resolution of the Fourier transform unit 202 and the inverse Fourier transform unit 209 is 1024 points. The Fourier transform result of the Fourier transform unit 202 therefore represents frequency components up to 24 kHz with a 512-point spectrum.
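The numbers in this example imply a specific bin spacing and frame duration, which can be checked with a few lines of arithmetic. The constant and function names below are illustrative only. Note that with bins numbered 0 to 511, the highest bin sits one bin spacing below the 24 kHz Nyquist frequency.

```python
SAMPLE_RATE_HZ = 48_000  # sampling frequency from the example
FRAME_SIZE = 1024        # samples per FFT frame

bin_spacing_hz = SAMPLE_RATE_HZ / FRAME_SIZE            # width of one frequency bin
frame_duration_ms = 1000 * FRAME_SIZE / SAMPLE_RATE_HZ  # length of one frame in time
usable_bins = FRAME_SIZE // 2                           # bins 0..511


def bin_to_hz(n):
    """Frequency in hertz represented by bin n."""
    return n * bin_spacing_hz
```

With these values each bin is 46.875 Hz wide and each frame covers roughly 21.3 ms of audio, so the per-frequency smoothing described later operates on one value per bin about every 21 ms (assuming non-overlapping frames).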

The noise profile stored in the noise profile storage unit 203 is obtained by Fourier transforming an audio signal that consists only of the noise to be reduced (here, zoom noise). In advance, this noise alone is captured by the audio input unit 104, divided into frames by the signal input control unit 201, and Fourier transformed by the Fourier transform unit 202. The frequency components up to 512 points obtained in this way are peak-held across frames in the time direction, and the result is stored in the noise profile storage unit 203 as the noise profile. The noise profile is denoted profile[n], where n = 0, 1, 2, 3, ..., 511.
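Peak-holding across frames means keeping, for each frequency bin, the largest amplitude seen in any noise-only frame. A minimal sketch, with a hypothetical function name and tiny three-bin frames standing in for the 512-bin spectra:

```python
def build_noise_profile(noise_frames):
    """Peak-hold each frequency bin across frames of noise-only magnitudes."""
    profile = list(noise_frames[0])
    for frame in noise_frames[1:]:
        profile = [max(held, value) for held, value in zip(profile, frame)]
    return profile


# Three noise-only frames with three bins each (real data would have 512 bins).
profile = build_noise_profile([
    [1.0, 5.0, 2.0],
    [3.0, 4.0, 2.5],
    [2.0, 6.0, 1.0],
])
```

Because the profile is an upper envelope of the measured noise, the ratio of input amplitude to profile tends toward 1 or below in bins dominated by the zoom noise.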

The signal input control unit 201 of the noise reduction processing unit 105 reads the audio data stored in the audio data area of the memory 108 through the memory bus 109 and divides it into frames of 1024 samples. At this time, it also reads the zoom control information stored in the memory 108. If the audio data was not captured during a zoom operation, it supplies the audio data to the signal output control unit 210; if the audio data was captured during a zoom operation, it supplies the audio data to the Fourier transform unit 202.
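The framing and routing performed by the signal input control unit 201 can be sketched as follows. The function names are hypothetical, `reduce_noise` stands for the whole Fourier transform to inverse Fourier transform path, and the use of consecutive non-overlapping frames (with any short tail dropped) is an assumption of this sketch.

```python
FRAME_SIZE = 1024


def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Split samples into consecutive non-overlapping frames; drop a short tail."""
    count = len(samples) // frame_size
    return [samples[i * frame_size:(i + 1) * frame_size] for i in range(count)]


def process_audio(samples, zooming_per_frame, reduce_noise):
    """Apply reduce_noise only to frames recorded while the zoom was driving."""
    output = []
    for frame, zooming in zip(split_into_frames(samples), zooming_per_frame):
        output.extend(reduce_noise(frame) if zooming else frame)
    return output
```

For example, `process_audio(samples, [False, True], reduce_noise)` passes the first 1024 samples through unchanged and sends only the second frame through the noise reduction path.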

The Fourier transform unit 202 applies a Fourier transform to the 1024 points of audio data from the signal input control unit 201. The Fourier transform result is denoted sound_fft[N], where N = 0, 1, 2, 3, ..., 1023.

The Fourier transform unit 202 supplies the first 512 points of the Fourier transform result to the frequency component division unit 204 and the noise reduction unit 208. These first 512 points are denoted sound_fft[n], where n = 0, 1, 2, 3, ..., 511.

The frequency component division unit 204 divides the Fourier transform result sound_fft[n] from the Fourier transform unit 202 by the noise profile profile[n] from the noise profile storage unit 203. The division result is the so-called signal-to-noise ratio (SN ratio):

signal_noise[n] = sound_fft[n] / profile[n]

where n = 0, 1, 2, ..., 511.

At frequencies where this value is large, the frequency component of the noise to be reduced is considered to be superimposed on the frequency component of the desired sound (sound from the surroundings or the subject). Conversely, the closer the SN ratio is to 1, the larger the proportion of the noise to be reduced at that frequency is considered to be. When the SN ratio is 1 or less, the frequency component of the noise to be reduced is considered to have changed relative to the noise profile.
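The per-bin division can be sketched in a few lines. The function name and the `eps` guard against a zero profile entry are assumptions of this sketch and are not described in the text.

```python
def signal_to_noise(sound_fft, profile, eps=1e-12):
    """signal_noise[n] = sound_fft[n] / profile[n] for each bin n.

    `eps` guards against a zero profile entry; it is an assumption of this sketch.
    """
    return [s / max(p, eps) for s, p in zip(sound_fft, profile)]


# Bin 0: desired sound dominates; bin 1: mostly noise; bin 2: noise weaker than the profile.
signal_noise = signal_to_noise([8.0, 2.0, 1.0], [2.0, 2.0, 4.0])
```

The three example bins correspond to the three cases discussed above: a ratio well above 1, a ratio near 1, and a ratio below 1.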

The time change control unit 205 applies a recursive low-pass filter in the time direction to the division result signal_noise[n] from the frequency component division unit 204. FIG. 3 is a schematic block diagram of the time change control unit 205. It consists of an adder 301, a register 302 that holds a value calculated during the noise reduction processing of the previous frame, a multiplier 303 that multiplies by a coefficient coef of 1 or less, and a multiplier 304 that multiplies by the coefficient (1 - coef). The coefficient coef is determined experimentally.

The calculation result signal_noise[n_F] of the frequency component division unit 204 is input to the adder 301. Here F denotes the frame number, and n_F = 0, 1, 2, ..., 511. The adder 301 adds the value Reg[n_F] held in the register 302 to the output signal_noise[n_F] of the frequency component division unit 204 and outputs the sum tmp[n_F]. The register 302 holds, as Reg[n_F], the output tmp[n_{F-1}] of the adder 301 for the previous frame multiplied by the coefficient (1 - coef). That is,

Reg[n_F] = (signal_noise[n_{F-1}] + Reg[n_{F-1}]) × (1 - coef)
tmp[n_F] = signal_noise[n_F] + Reg[n_F]

The multiplier 303 multiplies the output tmp[n_F] of the adder 301 by the coefficient coef. The output of the multiplier 303 is the output of the time change control unit 205 and is expressed as

output_LPF[n_F] = tmp[n_F] × coef
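Substituting Reg[n_F] = tmp[n_{F-1}] × (1 - coef) into the expression for the output gives output_LPF[n_F] = coef × signal_noise[n_F] + (1 - coef) × output_LPF[n_{F-1}], which is a one-pole recursive smoother. The following Python sketch mirrors the adder, register, and two multipliers of FIG. 3 as described in the text. The class name `TimeChangeControl` and the zero initial register value are assumptions; the description does not state how the register is initialized.

```python
class TimeChangeControl:
    """Per-bin recursive low-pass filter following the FIG. 3 description."""

    def __init__(self, num_bins, coef):
        if not 0.0 < coef <= 1.0:
            raise ValueError("coef is expected to lie in (0, 1]")
        self.coef = coef
        # Register 302; starting from zero is an assumption of this sketch.
        self.reg = [0.0] * num_bins

    def process(self, signal_noise):
        """Filter one frame of S/N ratios and return output_LPF for it."""
        # Adder 301: tmp[n_F] = signal_noise[n_F] + Reg[n_F]
        tmp = [s + r for s, r in zip(signal_noise, self.reg)]
        # Multiplier 304 feeding register 302: value used by the next frame.
        self.reg = [t * (1.0 - self.coef) for t in tmp]
        # Multiplier 303: output_LPF[n_F] = tmp[n_F] * coef
        return [t * self.coef for t in tmp]


# A constant input of 1.0 is approached gradually rather than at once.
lpf = TimeChangeControl(num_bins=1, coef=0.5)
outputs = [lpf.process([1.0])[0] for _ in range(3)]
```

A smaller coef gives heavier smoothing, and coef = 1 passes the input through unchanged.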

As described above, the time change control unit 205 applies a low-pass filter in the time direction to the calculation result of the frequency component division unit 204. This filtering suppresses abrupt changes over time in the output of the frequency component division unit 204, that is, in the S/N ratio. For example, when the output of the frequency component division unit 204 varies over time as shown in FIG. 4, the output of the time change control unit 205 varies more smoothly, with reduced high-frequency components, as shown in FIG. 5.

The subtraction coefficient calculation unit 206 determines the weighting coefficient for the noise profile from the result smoothed by the time change control unit 205, so the weighting coefficient is less susceptible to abrupt fluctuations in the S/N ratio. As a result, noise can be reduced with less musical noise, and the degradation of sound quality is mitigated.

In the following description the frame number F plays no role, so it is omitted from the notation.

The subtraction coefficient calculation unit 206 calculates or determines the subtraction coefficient γ[n] from the calculation result output_LPF[n] of the time change control unit 205. For example, the subtraction coefficient calculation unit 206 consists of a lookup table that uniquely determines γ[n] from the value of output_LPF[n], as shown in FIG. 6.

As shown in FIG. 6, the subtraction coefficient calculation unit 206 makes the subtraction coefficient γ[n] smaller as output_LPF[n] becomes larger, and holds γ[n] constant once output_LPF[n] reaches a certain value. The reason is as follows. When output_LPF[n] is sufficiently large, a sufficiently large frequency component of the desired sound is superimposed on the frequency component of the noise to be reduced. In that case the masking effect makes the noise almost imperceptible to human hearing, so reducing γ[n] suppresses degradation of the desired sound. The subtraction coefficient calculation unit 206 supplies the subtraction coefficient γ[n] determined for each frequency to the multiplier 207.
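The mapping from output_LPF[n] to γ[n] might look like the following piecewise-linear function. The text gives only the shape of FIG. 6 (γ decreases as output_LPF grows, then stays constant), so the breakpoints `low` and `high`, the linear segment between them, and the function name are all hypothetical choices made for this sketch.

```python
def subtraction_coefficient(output_lpf, low=1.0, high=4.0):
    """Map a smoothed S/N ratio to a subtraction coefficient gamma.

    low and high are hypothetical breakpoints; the description states that
    the real table is set as in FIG. 6 and gives no numeric values.
    """
    if output_lpf <= low:
        return 1.0          # noise dominates: subtract the full profile
    if output_lpf >= high:
        return 0.0          # desired sound masks the noise: subtract nothing
    # Assumed linear transition between the two plateaus.
    return (high - output_lpf) / (high - low)


gammas = [subtraction_coefficient(x) for x in (0.5, 2.5, 4.0)]
```

A real implementation would more likely store the curve as a table indexed by a quantized output_LPF value, as the description suggests.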

The multiplier 207 multiplies, for each frequency, the profile profile[n] from the noise profile storage unit 203 by the subtraction coefficient γ[n] from the subtraction coefficient calculation unit 206. This determines, for each frequency, the amount of noise to be removed, sub_profile[n]. That is,

sub_profile[n] = profile[n] × γ[n]

The noise reduction unit 208 subtracts, for each frequency, the noise-equivalent amount supplied by the multiplier 207 from the frequency amplitude component sound_fft[n] supplied by the Fourier transform unit 202, and outputs the noise-reduced spectrum after_subtract[n]. That is,

after_subtract[n] = sound_fft[n] - sub_profile[n]
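The weighting and subtraction steps can be sketched together as follows. The function name `subtract_noise` is an assumption. The sketch applies the two formulas literally, so a bin can become negative when sound_fft[n] is smaller than sub_profile[n]; the description does not say how that case is handled, and no clamping is added here.

```python
def subtract_noise(sound_fft, profile, gamma):
    """Return after_subtract[n] = sound_fft[n] - profile[n] * gamma[n]."""
    if not (len(sound_fft) == len(profile) == len(gamma)):
        raise ValueError("all inputs must have the same number of bins")
    # Multiplier 207: weighted noise estimate per bin.
    sub_profile = [p * g for p, g in zip(profile, gamma)]
    # Noise reduction unit 208: spectral subtraction per bin.
    return [s - sp for s, sp in zip(sound_fft, sub_profile)]


# Two bins: full subtraction in the first, half-weighted in the second.
after_subtract = subtract_noise([4.0, 1.0], [2.0, 1.0], [1.0, 0.5])
```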

The inverse Fourier transform unit 209 extends the 512-point noise-reduced spectrum after_subtract[n] (n = 0, 1, 2, ..., 511) from the noise reduction unit 208 to 1024 points. The extended spectrum is denoted after_subtract2[N], where N = 0, 1, 2, ..., 1023.

For N < 512:
after_subtract2[N] = after_subtract[N]
For N = 512:
after_subtract2[N] = 0
For N > 512:
after_subtract2[N] = after_subtract[1024 - N]
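The three cases above amount to mirroring the half spectrum around the midpoint. A short Python sketch, with the hypothetical function name `extend_spectrum` and a length taken from the input so that it can be tried on a small example:

```python
def extend_spectrum(after_subtract):
    """Mirror a half spectrum of length H into a full spectrum of length 2H.

    With H = 512 this reproduces the three cases in the text:
    N < 512 copies the bin, N = 512 is set to 0, and N > 512 takes
    the bin at index 1024 - N.
    """
    half = len(after_subtract)
    full = 2 * half
    extended = [0.0] * full
    for N in range(full):
        if N < half:
            extended[N] = after_subtract[N]
        elif N == half:
            extended[N] = 0.0
        else:
            extended[N] = after_subtract[full - N]
    return extended


# Four bins instead of 512.
after_subtract2 = extend_spectrum([1.0, 2.0, 3.0, 4.0])
```

The mirrored amplitudes, combined with the phase of the original transform, give the symmetric spectrum that a real-valued time signal requires.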

The inverse Fourier transform unit 209 applies an inverse Fourier transform to the noise-reduced spectrum after_subtract2[N], extended to 1024 points, using the phase information from the Fourier transform unit 202. This yields audio data in which the zoom noise has been reduced in the frequency domain. The inverse Fourier transform unit 209 supplies this audio data to the signal output control unit 210.

The signal output control unit 210 joins the audio data from the inverse Fourier transform unit 209 frame by frame and writes it back to the audio data area of the memory 108 through the memory bus 109.

In this embodiment, the noise reduction processing unit 105 reads audio data recorded while the zoom is not operating from the memory 108 and writes it back to the memory 108, but such data may instead bypass the noise reduction processing unit 105. In that case, the signal path leading directly from the signal input control unit 201 to the signal output control unit 210 in the noise reduction processing unit 105 becomes unnecessary.

FIG. 7 is a schematic block diagram of a second embodiment of the present invention, and FIG. 8 is a schematic block diagram of the noise reduction processing unit 705. The imaging apparatus 700 shown in FIG. 7 consists of an operation unit 701, a control unit 702, an imaging unit 703, an audio input unit 704, a noise reduction processing unit 705, an audio output unit 706, a video output unit 707, a memory 708, and a memory bus 709. The blocks 701 to 704 and 706 to 709, other than the noise reduction processing unit 705, perform the same functions as the corresponding blocks 101 to 104 and 106 to 109 of the embodiment shown in FIG. 1, so their detailed description is omitted.

The noise reduction processing unit 705 performs noise reduction on the audio data stored in the audio data area of the memory 708 in accordance with the zoom control information stored in the control signal area. Specifically, when the zoom control information indicates that a zoom operation is in progress, the noise reduction processing unit 705 applies noise reduction to the audio data read from the memory 708 in units of frames of a fixed length and writes the processed audio data back to the memory 708. When the zoom is not being driven, no zoom drive sound is generated, so the noise reduction processing unit 705 writes the audio data read from the memory 708 back to the memory 708 unchanged.

The configuration and noise reduction operation of the noise reduction processing unit 705 are described in detail with reference to FIG. 8.

The signal input control unit 801 is the interface to the memory bus 709. The audio data stored in the audio data area of the memory 708 and the zoom control information stored in the control signal area are input to the signal input control unit 801 via the memory bus 709. The signal input control unit 801 divides the incoming audio data into frames of a predetermined fixed length and, in accordance with the zoom control information, supplies them to the Fourier transform unit 802 when noise reduction is to be performed and to the signal output control unit 811 when it is not. That is, when the zoom control information indicates that zooming is in progress, meaning that zoom drive noise is being generated (or may be), the signal input control unit 801 supplies the audio data to the Fourier transform unit 802. When the zoom control information indicates that zooming is not in progress, meaning that no zoom drive noise is generated, the signal input control unit 801 supplies the audio data to the signal output control unit 811. In either case, the signal input control unit 801 supplies the zoom control information to the signal output control unit 811.

The Fourier transform unit 802 operates in the same way as the Fourier transform unit 202. It applies a Fourier transform to the frame-divided audio data from the signal input control unit 801 and calculates, for each frequency, the phase information and the absolute value of the amplitude (the frequency amplitude component). The Fourier transform unit 802 supplies the frequency amplitude components to the frequency component division unit 804 and the noise reduction unit 809, and supplies the phase information to the inverse Fourier transform unit 810.

Like the noise profile storage unit 203, the noise profile storage unit 803 stores, as a noise profile, the frequency amplitude components of the noise to be reduced (here, noise generated during zoom driving, such as the zoom drive sound). The noise profile storage unit 803 outputs the stored noise profile to the frequency component division unit 804 and the multiplier 807.

Like the frequency component division unit 204, the frequency component division unit 804 divides, for each frequency, the frequency amplitude component (A) from the Fourier transform unit 802 by the value (B) of the noise profile at the corresponding frequency from the noise profile storage unit 803. The frequency component division unit 804 supplies the per-frequency quotients to the time change control unit 805.

The time change control unit 805 is essentially a low-pass filter (LPF) and smooths the per-frequency division results from the frequency component division unit 804 along the time axis. The time change control unit 805 outputs the per-frequency filter results, that is, the low-frequency components, to the subtraction coefficient calculation unit 806 and the noise reduction unit 809.

The subtraction coefficient calculation unit 806 operates in the same way as the subtraction coefficient calculation unit 206. That is, it calculates a subtraction coefficient for each frequency (a weighting coefficient for the noise profile) from the low-frequency component of the time variation at each frequency supplied by the time change control unit 805, and outputs it to the multiplier 807. Like the multiplier 207, the multiplier 807 multiplies the value of the corresponding frequency component of the noise profile read from the noise profile storage unit 803 by the per-frequency subtraction coefficient from the subtraction coefficient calculation unit 806.

The noise time change information storage unit 808 stores the tendency of the noise to be reduced to vary over time at each frequency, and supplies the stored information to the noise reduction unit 809 at appropriate times. The information stored in the noise time change information storage unit 808 is described in detail later.

The noise reduction unit 809 receives the calculation result 805s of the time change control unit 805, the multiplication result 807s of the multiplier 807, and the frequency amplitude component 802s from the Fourier transform unit 802. In accordance with the zoom control information 801s from the signal input control unit 801 and the noise time change tendency information 808s from the noise time change information storage unit 808, the noise reduction unit 809 reduces the noise contained in the frequency amplitude component 802s in the frequency domain. It supplies the noise-reduced audio signal to the inverse Fourier transform unit 810. The noise reduction unit 809 is described in detail later.

Like the inverse Fourier transform unit 209, the inverse Fourier transform unit 810 applies an inverse Fourier transform to the audio signal spectrum from the noise reduction unit 809 using the phase information from the Fourier transform unit 802. This yields noise-reduced audio data in the time domain. The inverse Fourier transform unit 810 supplies the audio data restored by the inverse Fourier transform to the signal output control unit 811.

The signal output control unit 811 operates in the same way as the signal output control unit 210. That is, in accordance with the zoom control information from the signal input control unit 801, when noise reduction is performed it selects the audio data from the inverse Fourier transform unit 810 and outputs it to the memory bus 709. When noise reduction is not performed, it selects the frame-divided audio data from the signal input control unit 801, reassembles the frames, and outputs the result to the memory bus 709.

The noise reduction applied by the noise reduction processing unit 705 to the audio data input through the audio input unit 704 is now described concretely. As an example, the sampling frequency of the A/D conversion in the audio input unit 704 is taken to be 48 kHz. The frame division unit for the FFT is 1024 points; that is, the Fourier transform unit 802 and the inverse Fourier transform unit 810 have a frequency resolution of 1024 points. The Fourier transform result of the Fourier transform unit 802 therefore represents frequency components up to 24 kHz with a 512-point spectrum.
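With these example figures, each bin spans 48000 / 1024 = 46.875 Hz and one frame lasts 1024 / 48000 s, roughly 21.3 ms. The following sketch makes the bin-to-frequency mapping explicit; the constant and function names are assumptions introduced for illustration.

```python
SAMPLE_RATE_HZ = 48_000   # example sampling frequency from the text
FFT_SIZE = 1024           # example frame length from the text


def bin_frequency_hz(n, sample_rate_hz=SAMPLE_RATE_HZ, fft_size=FFT_SIZE):
    """Return the centre frequency in Hz of FFT bin n."""
    return n * sample_rate_hz / fft_size


bin_width_hz = bin_frequency_hz(1)                  # spacing between bins
nyquist_hz = bin_frequency_hz(FFT_SIZE // 2)        # upper edge of the half spectrum
frame_duration_ms = 1000.0 * FFT_SIZE / SAMPLE_RATE_HZ
```

Bins 0 to 511 therefore cover the range from 0 Hz up to just below the 24 kHz Nyquist limit.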

The noise profile stored in the noise profile storage unit 803 is obtained by applying a Fourier transform to an audio signal that consists only of the noise to be reduced (here, zoom noise). In advance, only this noise is captured by the audio input unit 704, divided into frames by the signal input control unit 801, and Fourier-transformed by the Fourier transform unit 802. Of the frequency components obtained in this way, the first 512 points are peak-held in the time direction across frames, and the result is stored in the noise profile storage unit 803 as the noise profile. The noise profile is denoted profile[n], where n = 0, 1, 2, 3, ..., 511.
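Peak-holding across frames means keeping, for each bin, the largest amplitude seen in any noise-only frame. A minimal Python sketch, assuming the per-frame amplitude spectra are already available as lists (the function name `peak_hold_profile` is hypothetical):

```python
def peak_hold_profile(noise_frames):
    """Build profile[n] as the per-bin maximum over noise-only frames.

    noise_frames is a non-empty sequence of amplitude spectra, one per
    frame, all with the same number of bins (512 in the text).
    """
    if not noise_frames:
        raise ValueError("at least one noise frame is required")
    profile = list(noise_frames[0])
    for frame in noise_frames[1:]:
        if len(frame) != len(profile):
            raise ValueError("all frames must have the same number of bins")
        # Keep the larger of the running peak and the new frame, bin by bin.
        profile = [max(p, x) for p, x in zip(profile, frame)]
    return profile


# Two bins and three frames instead of 512 bins.
profile = peak_hold_profile([[1.0, 5.0], [3.0, 2.0], [2.0, 4.0]])
```

Because the profile records peaks rather than averages, the ratio sound_fft[n] / profile[n] can fall below 1 even while the noise is present.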

The signal input control unit 801 of the noise reduction processing unit 705 reads the audio data stored in the audio data area of the memory 708 through the memory bus 709 and divides it into frames of 1024 samples. At the same time it reads the zoom control information stored in the memory 708. If the audio data was not recorded during a zoom operation, it supplies the audio data to the signal output control unit 811; if the audio data was recorded during a zoom operation, it supplies the audio data to the Fourier transform unit 802.

The Fourier transform unit 802 applies a Fourier transform to the 1024 points of audio data from the signal input control unit 801. The result is denoted sound_fft[N], where N = 0, 1, 2, 3, ..., 1023.

The Fourier transform unit 802 supplies the first half of the result, that is, the first 512 points, to the frequency component division unit 804 and the noise reduction unit 809. This first half is denoted sound_fft[n], where n = 0, 1, 2, 3, ..., 511.

The frequency component division unit 804 divides the Fourier transform result sound_fft[n] from the Fourier transform unit 802 by the noise profile profile[n] from the noise profile storage unit 803. The quotient is the so-called signal-to-noise ratio (S/N ratio):

signal_noise[n] = sound_fft[n] / profile[n]

where n = 0, 1, 2, ..., 511.

The time change control unit 805 has the same configuration as the time change control unit 205 and applies a recursive low-pass filter in the time direction to the division result signal_noise[n] from the frequency component division unit 804.

In this embodiment, as in the first embodiment, the low-pass filtering by the time change control unit 805 suppresses abrupt changes over time in the output of the frequency component division unit 804, that is, in the S/N ratio. As a result, abrupt changes in sound pressure are suppressed, musical noise is reduced, and the degradation of sound quality is mitigated.

As in the first embodiment, the frame number F plays no role in the following description, so it is omitted from the notation.

The subtraction coefficient calculation unit 806 calculates or determines the subtraction coefficient γ[n] from the calculation result output_LPF[n] of the time change control unit 805. For example, like the subtraction coefficient calculation unit 206, the subtraction coefficient calculation unit 806 consists of a lookup table that uniquely determines γ[n] from the value of output_LPF[n].

As shown in FIG. 6, the subtraction coefficient calculation unit 806 gradually makes the subtraction coefficient γ[n] smaller as output_LPF[n] becomes larger. The reason is as follows. When output_LPF[n] is sufficiently large, a sufficiently large frequency component of the desired sound is superimposed on the frequency component of the noise to be reduced. In that case the masking effect makes the noise almost imperceptible to human hearing, so reducing γ[n] suppresses degradation of the desired sound. The subtraction coefficient calculation unit 806 supplies the subtraction coefficient γ[n] determined for each frequency to the multiplier 807.

The multiplier 807 multiplies, for each frequency, the profile profile[n] from the noise profile storage unit 803 by the subtraction coefficient γ[n] from the subtraction coefficient calculation unit 806. This determines, for each frequency, the amount of noise to be removed, sub_profile[n]. That is,

sub_profile[n] = profile[n] × γ[n]

The noise time change information storage unit 808 stores, for each frequency, the tendency of the noise to vary in the time direction. Taking zoom noise as an example, the frequency components of the noise vary with time between the start and the end of a zoom operation. The noise time change information storage unit 808 stores this tendency to vary over time as fluctuation[n] for each frequency. FIG. 10, FIG. 11, and FIG. 12 are graphs showing the variation tendency of the frequency component at a particular frequency n = n'.

FIG. 10 shows the case in which the frequency component at the frequency n = n' tends to increase as the zoom operation time elapses. In the example shown in FIG. 10, the value of the variation tendency fluctuation[n] is set to 0.

FIG. 11 shows the case in which the frequency component shows no tendency to vary with the zoom operation time. In the example shown in FIG. 11, the value of the variation tendency fluctuation[n] is set to 1.

FIG. 12 shows the case in which the frequency component tends to decrease as the zoom operation time elapses. In the example shown in FIG. 12, the value of the variation tendency fluctuation[n] is set to 2.

The noise time change information storage unit 808 supplies the variation tendency value fluctuation[n] to the noise reduction unit 809 as the noise time change tendency information 808s. The noise reduction unit 809 switches its noise reduction processing in accordance with the zoom control information 801s from the signal input control unit 801, the output 805s of the time change control unit 805, and the noise time change tendency information 808s (fluctuation[n]) from the noise time change information storage unit 808.

FIG. 9 is a schematic block diagram of the noise reduction unit 809. The noise reduction unit 809 consists of an operation switching control unit 901, a multiplier 902, an attenuation factor calculation unit 903, and a subtractor 904.

The attenuation factor calculation unit 903 consists of a table that outputs, for the calculation result of the time change control unit 805, an attenuation factor K[n] (where K[n] ≤ 1.0) with the characteristic shown in FIG. 13. This table basically has a slope opposite to that of the subtraction coefficient γ[n] in the subtraction coefficient calculation unit 806. In FIG. 13, the interval over which K[n] takes its minimum value, K[n] = 0.1, coincides with the interval in FIG. 6 over which γ[n] = 1.0. The interval in FIG. 13 over which K[n] = 1.0 coincides with the interval in FIG. 6 over which γ[n] = 0. The minimum value K[n] = 0.1 is determined experimentally, and the figure given here is only an example. The attenuation factor calculation unit 903 supplies the attenuation factor K[n] corresponding to the output 805s of the time change control unit 805 to the multiplier 902.
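An attenuation table with the opposite slope to the subtraction coefficient could be sketched as below. Only the end values (minimum 0.1 where γ = 1.0, and 1.0 where γ = 0) come from the text; the breakpoints `low` and `high`, the linear ramp between them, and the function name are hypothetical, since FIG. 13 is not reproduced here.

```python
def attenuation_factor(output_lpf, low=1.0, high=4.0, k_min=0.1):
    """Map a smoothed S/N ratio to an attenuation factor K <= 1.0.

    low and high are hypothetical breakpoints, assumed to coincide with the
    gamma = 1.0 and gamma = 0 intervals of the subtraction coefficient table.
    k_min = 0.1 is the example minimum given in the text.
    """
    if output_lpf <= low:
        return k_min        # noise dominates: attenuate the bin strongly
    if output_lpf >= high:
        return 1.0          # desired sound dominates: leave the bin untouched
    # Assumed linear ramp between the two plateaus.
    return k_min + (1.0 - k_min) * (output_lpf - low) / (high - low)


# The multiplier 902 would then compute sound_fft[n] * K[n] for each bin.
sound_fft = [2.0, 2.0, 2.0]
k = [attenuation_factor(x) for x in (0.5, 2.5, 4.0)]
attenuated = [s * kn for s, kn in zip(sound_fft, k)]
```

Unlike subtraction, this multiplication can never drive a bin below zero, whatever the current noise level is.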

The multiplier 902 multiplies the frequency amplitude component 802s from the Fourier transform unit 802 by the attenuation factor K[n]. That is, the output of the multiplier 902 is given by

sound_fft[n] × K[n]

where n = 0, 1, 2, ..., 511.

The subtractor 904 subtracts the output 807s of the multiplier 807 from the frequency amplitude component 802s from the Fourier transform unit 802. The output of the subtractor 904 is given by

sound_fft[n] - sub_profile[n]

where n = 0, 1, 2, ..., 511.

The operation switching control unit 901 selects the output of either the multiplier 902 or the subtractor 904 in accordance with the zoom control information from the signal input control unit 801 and the value of the frequency variation tendency information fluctuation[n] from the noise time change information storage unit 808, and outputs the selected signal to the inverse Fourier transform unit 810.

When fluctuation[n] = 0, the operation switching control unit 901 connects the output of the multiplier 902 to the output signal line 809s from the start of the zoom operation until a certain time within the zoom operation has elapsed. After that time has elapsed, it connects the output of the subtractor 904 to the output signal line 809s.

When fluctuation[n] = 1, the operation switching control unit 901 connects the output of the subtractor 904 to the output signal line 809s throughout the zoom operation.

When fluctuation[n] = 2, the operation switching control unit 901 connects the output of the subtractor 904 to the output signal line 809s from the start of the zoom operation until a certain time within the zoom operation has elapsed. After that time has elapsed, it connects the output of the multiplier 902 to the output signal line 809s.
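The three switching rules can be summarized as a small selection function. The function and parameter names are assumptions, as is the choice to treat the instant exactly at the switching time as already elapsed; the text only says that the switch happens after a certain time within the zoom operation.

```python
def select_output(fluctuation, elapsed_s, switch_time_s,
                  multiplier_out, subtractor_out):
    """Choose the multiplier 902 or subtractor 904 output for one bin.

    fluctuation   -- 0 (noise rises over time), 1 (no trend), 2 (noise falls)
    elapsed_s     -- time since the zoom operation started
    switch_time_s -- hypothetical name for the "certain time" in the text
    """
    before_switch = elapsed_s < switch_time_s   # boundary handling is assumed
    if fluctuation == 0:
        return multiplier_out if before_switch else subtractor_out
    if fluctuation == 1:
        return subtractor_out
    if fluctuation == 2:
        return subtractor_out if before_switch else multiplier_out
    raise ValueError("fluctuation must be 0, 1, or 2")


# Labels stand in for the two candidate amplitudes of a single bin.
early = select_output(0, 0.1, 0.5, "multiplier", "subtractor")
late = select_output(0, 0.6, 0.5, "multiplier", "subtractor")
```

In each case the attenuation path is used for the part of the zoom operation in which the noise level at that frequency is the lower one.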

The inverse Fourier transform unit 810 extends the 512-point noise-reduced spectrum output from the noise reduction unit 809 to 1024 points in the same manner as the inverse Fourier transform unit 209, and performs an inverse Fourier transform using the phase information from the Fourier transform unit 802. This yields audio data in which the zoom noise has been reduced in the frequency domain. The inverse Fourier transform unit 810 supplies the audio data obtained in this way to the signal output control unit 811.
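
As a rough illustration of this step, the following Python sketch rebuilds a full spectrum from N magnitudes and phases and transforms it back to the time domain. It assumes that the extension is done by conjugate-symmetric mirroring and that the missing Nyquist bin is set to zero; the method actually used by the inverse Fourier transform unit 209 is not restated here, so both points are assumptions. The function names are hypothetical, and the naive inverse DFT is used only to keep the sketch self-contained (a real implementation would use an FFT).

```python
import cmath


def extend_spectrum(magnitude, phase):
    """Build a conjugate-symmetric 2N-point spectrum from N magnitudes and phases."""
    n = len(magnitude)
    half = [m * cmath.exp(1j * p) for m, p in zip(magnitude, phase)]
    full = [0j] * (2 * n)
    full[:n] = half
    # Bin n (the Nyquist bin) is left at zero: an assumption of this sketch.
    for k in range(1, n):
        full[2 * n - k] = half[k].conjugate()
    return full


def inverse_dft(spectrum):
    """Naive inverse DFT returning the real part of each time-domain sample."""
    size = len(spectrum)
    return [
        sum(c * cmath.exp(2j * cmath.pi * k * t / size)
            for k, c in enumerate(spectrum)).real / size
        for t in range(size)
    ]
```

With 512 magnitudes and phases, `extend_spectrum` returns the 1024 points mentioned above; the mirroring makes the time-domain output real-valued as long as bin 0 is itself real.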

Like the signal output control unit 210, the signal output control unit 811 joins the audio data from the inverse Fourier transform unit 810 frame by frame and writes it back to the audio data area of the memory 708 through the memory bus 709.

In this embodiment as well, the noise reduction processing unit 705 reads the audio data recorded while the zoom is not operating from the memory 708 and writes it back to the memory 708, but this audio data may instead bypass the noise reduction processing unit 705. In that case, the signal path leading directly from the signal input control unit 801 to the signal output control unit 811 in the noise reduction processing unit 705 becomes unnecessary.

The following explains why the calculation method of the noise reduction unit 809 is switched according to the per-frequency time fluctuation tendency of the zoom noise from the noise time change information storage unit 808 and the zoom control information from the signal input control unit 801.

FIG. 14 shows the relationship between the frequency component of the zoom noise and time when fluctuation[n'] = 2 at a certain frequency n = n', that is, when the noise tends to decrease. The noise profile stored in the noise profile storage unit 803 is obtained by peak-holding, in the time direction, the frequency component of the zoom noise that varies with time, so profile[n'] takes the value shown in FIG. 14. When the noise interval is t1 to t3, profile[n'] remains a fixed value even though the frequency component of the noise decreases with time. Therefore, when the output sound_fft[n] - sub_profile[n] of the subtractor 904 is connected to the output signal line 809s, the sound quality deteriorates significantly, particularly in the interval t2 to t3, if the desired sound is superimposed on the noise.
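
The effect described above can be made concrete with a small Python sketch. It computes a peak-hold profile from a sequence of noise frames and then the amount by which that fixed profile exceeds the actual noise in each frame. The noise values are invented for illustration and the function names are hypothetical; only the peak-hold definition of the profile comes from the description.

```python
def peak_hold_profile(noise_frames):
    """Per-bin maximum over time of the noise amplitude (peak hold)."""
    return [max(bin_values) for bin_values in zip(*noise_frames)]


def excess_over_noise(noise_frames, profile):
    """Amount by which the fixed profile exceeds the actual noise, per frame and bin."""
    return [[p - a for p, a in zip(profile, frame)] for frame in noise_frames]


# Hypothetical single-bin noise amplitudes that decay over the interval t1 to t3.
noise_frames = [[8.0], [6.0], [4.0], [2.0]]
profile = peak_hold_profile(noise_frames)
excess = excess_over_noise(noise_frames, profile)
print(profile)  # [8.0]
print(excess)   # [[0.0], [2.0], [4.0], [6.0]]
```

The growing excess in the later frames is what a subtraction based on the fixed profile would remove from the desired sound, which is why the latter part of the interval suffers most.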

Therefore, in this embodiment, when the fluctuation tendency is fluctuation[n'] = 2, the arithmetic switching control unit 901 connects the output sound_fft[n] × K[n] of the multiplier 902 to the output signal line 809s in the interval t2 to t3. That is, selecting the noise reduction processing performed by the multiplier 902 makes it possible to control the amount of noise reduction appropriately and to suppress degradation of the desired sound.

The arithmetic switching control unit 901 determines the times t1, t2, and t3 from the zoom control information supplied from the signal input control unit 801, and switches between the outputs of the multiplier 902 and the subtractor 904.

FIG. 15 shows an example of switching by the arithmetic switching control unit 901 when fluctuation[n] = 0. FIG. 16 shows an example of switching by the arithmetic switching control unit 901 when fluctuation[n] = 1. FIG. 17 shows an example of switching by the arithmetic switching control unit 901 when fluctuation[n] = 2.

As the zoom control information, rotation information of the drive unit (motor) during zooming and position information of the drive unit may be supplied from the control unit 702 to the noise reduction unit 809 via the signal input control unit 801. In this case, the fluctuation tendency information stored in the noise time change information storage unit 808 represents the tendency of the noise frequency component to vary with the motor position, as shown in FIG. 18 for the case of fluctuation[n'] = 2 (decreasing tendency), for example.

In FIG. 18, p1, p2, and p3 are motor positions related to zoom. p1 indicates the wide-angle side of the zoom operation of the imaging apparatus, and p3 indicates the telephoto side. That is, at the frequency shown in FIG. 18, the frequency component of the noise becomes smaller the further the position moves toward the telephoto side. The signal input control unit 801 supplies the position information p1 to p3 and the zoom rotation direction information to the noise reduction unit 809. Based on this information and the fluctuation tendency information from the noise time change information storage unit 808, the noise reduction unit 809 switches between the output of the multiplier 902 and the output of the subtractor 904.
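
One possible reading of how the position-based tendency and the rotation direction combine is sketched below in Python. The description does not spell out this mapping, so the whole function is an assumption: it supposes that a tendency recorded from the wide-angle side (p1) toward the telephoto side (p3) is experienced in reverse when the lens is driven toward the wide-angle side. The encoding 0 = increasing and 1 = no clear tendency is likewise assumed (only 2 = decreasing is stated above), and all names are hypothetical.

```python
INCREASING, FLAT, DECREASING = 0, 1, 2  # 0 and 1 are assumed meanings


def trend_over_time(trend_wide_to_tele, zoom_direction):
    """Convert a position-based tendency into the tendency seen over time.

    trend_wide_to_tele: tendency of the noise component as the motor
                        position moves from p1 (wide) to p3 (tele)
    zoom_direction:     "tele" or "wide", the direction the lens is driven
    """
    if trend_wide_to_tele not in (INCREASING, FLAT, DECREASING):
        raise ValueError("trend must be 0, 1 or 2")
    if zoom_direction not in ("tele", "wide"):
        raise ValueError("zoom_direction must be 'tele' or 'wide'")
    if zoom_direction == "tele" or trend_wide_to_tele == FLAT:
        return trend_wide_to_tele
    # Driving toward the wide-angle side traverses the positions in reverse.
    return DECREASING if trend_wide_to_tele == INCREASING else INCREASING
```

Under this reading, the frequency shown in FIG. 18 would behave as a decreasing component when zooming toward the telephoto side and as an increasing component when zooming toward the wide-angle side.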

Claims

1. A speech processing apparatus comprising:
audio input means;
Fourier transform means for converting a time-domain audio signal input by the audio input means into a frequency-domain audio signal spectrum;
noise profile storage means for storing a noise profile indicating a frequency amplitude component of noise to be reduced;
frequency component dividing means for obtaining, for each frequency, the ratio between the audio signal spectrum obtained by the Fourier transform means and the noise profile;
time change control means for smoothing, for each frequency, the ratio obtained by the frequency component dividing means;
subtraction coefficient calculating means for calculating a subtraction coefficient according to the output of the time change control means;
multiplying means for multiplying the noise profile by the subtraction coefficient;
noise reduction means for subtracting the output of the multiplying means from the audio signal spectrum;
inverse Fourier transform means for restoring the output of the noise reduction means to a time-domain audio signal; and
audio output means for outputting the audio signal restored by the inverse Fourier transform means.

2. The speech processing apparatus according to claim 1, further comprising change information storage means for storing trend information indicating a tendency of change in the frequency component of the noise,
wherein the noise reduction means includes:
a subtractor for subtracting the output of the multiplying means from the audio signal spectrum;
a multiplier for adjusting the intensity of the audio signal spectrum according to the ratio for each frequency smoothed by the time change control means; and
switching means for switching between the output of the subtractor and the output of the multiplier according to the trend information.

3. The speech processing apparatus according to claim 2, wherein, when the trend information indicates an increasing tendency, the switching means selects the output of the multiplier for a predetermined period and then switches to the output of the subtractor.

4. The speech processing apparatus according to claim 2, wherein, when the trend information indicates a decreasing tendency, the switching means selects the output of the subtractor for a predetermined period and then switches to the output of the multiplier.

5. The speech processing apparatus according to claim 2, wherein the multiplier multiplies the audio signal spectrum by a coefficient that decreases as the ratio for each frequency smoothed by the time change control means decreases.

6. The speech processing apparatus according to claim 2, further comprising noise time holding means for holding noise time information including the start time and the end time of the noise,
wherein the trend information indicates a tendency of the frequency component of the noise to change in the time direction, and
the switching means switches between the outputs of the subtractor and the multiplier according to the noise time information and the trend information.

7. The speech processing apparatus according to claim 6, further comprising means for holding position information of the noise,
wherein the trend information indicates a tendency of the frequency component of the noise to change with position, and
the switching means switches between the outputs of the subtractor and the multiplier according to the noise time information and the trend information.

8. The speech processing apparatus according to claim 1, further comprising imaging means including a zoom lens,
wherein the noise is noise generated by driving the zoom lens.