JPS6044680B2

JPS6044680B2 - Audio information filter

Info

Publication number: JPS6044680B2
Application number: JP56048173A
Authority: JP
Inventors: 利之坂井; 博斉藤; 正宏浜田; 英樹藤恵; 良二鈴木
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1981-03-30
Filing date: 1981-03-30
Publication date: 1985-10-04
Also published as: JPS57161799A

Description

DETAILED DESCRIPTION OF THE INVENTION: The present invention relates to an audio information filter. Its purpose is to provide an audio information filter that can accurately remove noise components from noisy speech and reproduce clean, easily intelligible audio information.

In general, noise is difficult to avoid in speech uttered far from the microphone or in an environment with loud ambient noise.

Even in such noisy environments, however, there is strong demand to record clean speech, to raise the recognition rate of speech recognition devices as far as possible, and so on. This requires removing as much noise as possible from noise-superimposed speech, but few techniques meet this requirement adequately. In particular, it has been difficult to remove only the noise effectively when (i) the noise is periodic, (ii) the noise is sudden, or (iii) the noise energy exceeds the speech energy in every band where the main speech energy lies.

The present invention analyzes the input signal into band energies with an n-channel bank of bandpass filters and divides them into a spectral time series at fixed time intervals (called frames). It relies on the fact that the statistical occurrence probability of the resulting grid-like (checkerboard) energy pattern differs between an input consisting of speech alone and one in which noise is superimposed on the speech.

FIG. 1 is a conceptual diagram of such a spectral time series, as obtained when practicing the invention. In the figure, $i$ is the channel-number parameter and $k$ is the frame-number parameter; they run up to $n$ and $K$, respectively. One square of the grid (called a cell), at channel $i$ and frame $k$, holds the short-time average energy in frame $k$ of the band signal analyzed by the channel-$i$ bandpass filter, after quantization by predetermined energy thresholds. FIG. 1 is therefore an energy matrix of $K$ rows by $n$ columns, and this pattern characterizes the input signal as a whole.

Take a relatively small region consisting of $N$ consecutive channels and $M$ consecutive frames (called a unit mesh). If this region is regarded as representing local features of the input signal, the pattern of quantized energies appearing in the unit mesh (the unit pattern) captures those features locally. Specifically, with the unit mesh starting at band $i$, the mesh is moved one frame at a time in the frame direction; the cumulative frequency of each unit pattern that occurs is counted and divided by the total number of occurrences, giving the occurrence probability density function $P_i(\gamma)$. Here $\gamma$ is an index identifying the type of unit pattern; when the unit mesh consists of $N$ channels and $M$ frames, $\gamma = 1, 2, \ldots, L^{N \times M}$, where $L$ is the total number of quantization levels. The resulting set of occurrence probability density functions is $\{P_i(\gamma) \mid i = 1, 2, \ldots, n - N + 1\}$.

During actual noise removal, the unit patterns for each channel and frame of the noise-superimposed speech signal are measured in turn. For each unit pattern obtained, its index is used to look up the standard $P_i(\gamma)$ written in memory in advance. If $P_i(\gamma)$ is smaller than a predetermined probability-density threshold $\theta_i$, the pattern is treated as an illegal pattern and the band signal of the corresponding unit mesh is attenuated by an attenuator; if it is larger than $\theta_i$, the pattern is treated as a legal pattern and the band signal is passed to the summing output circuit without attenuation. FIG. 2 is a block diagram of an apparatus that carries out this processing.
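The statistics described above can be illustrated with a minimal Python sketch. It assumes the band energies are already available as a frames-by-channels matrix `levels[k][i]`, that the $L$ quantization levels are produced by $L - 1$ ascending thresholds, and that $\gamma$ and the channel offset $i$ are 0-based (the text numbers them from 1). All function names are hypothetical; this is not the patent's circuit, only an illustration of the counting and the legal/illegal test.

```python
from bisect import bisect_right
from collections import Counter


def short_time_energy(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (W_i(k) for one band).
    A trailing partial frame is dropped (assumption)."""
    return [sum(x * x for x in samples[s:s + frame_len]) / frame_len
            for s in range(0, len(samples) - frame_len + 1, frame_len)]


def quantize(energy, thresholds):
    """Map an energy to a level 0..L-1 using L-1 ascending thresholds (assumed layout)."""
    return bisect_right(thresholds, energy)


def pattern_index(levels, i, k, N, M, L):
    """Encode the N-channel x M-frame unit mesh starting at channel i, frame k
    as a base-L number, giving a 0-based gamma in 0 .. L**(N*M) - 1."""
    gamma = 0
    for dk in range(M):
        for di in range(N):
            gamma = gamma * L + levels[k + dk][i + di]
    return gamma


def train_densities(levels, N, M, L):
    """Estimate P_i(gamma) from clean speech for each channel offset i = 0 .. n-N,
    sliding the mesh one frame at a time and normalizing the counts."""
    K, n = len(levels), len(levels[0])
    densities = []
    for i in range(n - N + 1):
        counts = Counter(pattern_index(levels, i, k, N, M, L)
                         for k in range(K - M + 1))
        total = sum(counts.values())
        densities.append({g: c / total for g, c in counts.items()})
    return densities


def is_legal(densities, i, gamma, theta):
    """Legal when P_i(gamma) exceeds theta_i; unseen patterns have density 0."""
    return densities[i].get(gamma, 0.0) > theta[i]
```

Because `bisect_right` is used, an energy exactly equal to a threshold falls into the upper level; the source does not specify how ties are handled.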

The input signal $x(t)$ is applied at input terminal 100 and is first split by bandpass filters 20i ($i = 1, 2, \ldots, n$) into frequency-band signals $U_i(t)$ ($i = 1, 2, \ldots, n$). Each band signal $U_i(t)$ is fed to a delay circuit 30i ($i = 1, 2, \ldots, n$) with a delay of $\gamma_i$ seconds and, at the same time, to an average-energy measuring circuit 60i ($i = 1, 2, \ldots, n$). The average-energy measuring circuit 60i measures, for each frame, the short-time average energy $W_i(k)$ of the band signal $U_i(t)$, converts it to a multi-level value by comparison with predetermined quantization thresholds $\lambda_l$ ($l = 1, 2, \ldots, L$), and outputs the resulting per-band multi-level time-series pattern to control circuit 700. The thresholds $\lambda_l$ may be normalized for each unit mesh. Control circuit 700 receives this pattern, reads $P_i(\gamma)$ from storage circuit 900 according to the corresponding pattern index $\gamma$, and controls the attenuation of attenuators 40i ($i = 1, 2, \ldots, n$) according to the predetermined probability-density thresholds $\theta_i$. The response times of all parts must be set so that each delayed band signal $U_i(t)$ from delay circuit 30i reaches attenuator 40i exactly when the corresponding control decision is applied.

A further question is which band signal $U_i(t)$, and which frames, should be attenuated when the $P_i(\gamma)$ of a unit pattern arising at some channel and frame is at or below $\theta_i$. A practical answer is to attenuate only the signal portion corresponding to the single cell at the center of the unit mesh concerned. Suppose noise causes an anomaly at channel $i$, frame $k$. Then, as shown in FIG. 3, the signal $U_i(t)$ may be attenuated in the portions corresponding to cell $(i, k)$ and to the cells within the surrounding $N \times M$ range. Besides cell $(i, k)$ itself being attenuated because of its own anomaly, cell $(i, k)$ is also part of unit meshes such as B, C, D and E in the figure, so those unit patterns may well become illegal patterns too. The cells marked b, c, d and e may therefore also be attenuated, and in the end all of the signal $U_i(t)$ in the portion corresponding to the unit mesh marked A may be attenuated. The output signals $Z_i(t)$ ($i = 1, 2, \ldots, n$) of the attenuators 40i controlled in this way are summed by summing output circuit 500, which delivers the output signal $y(t)$ at output terminal 800.

In the above embodiment, control circuit 700 compares the occurrence probability density of each unit pattern, looked up from the standard occurrence probability densities $P_i(\gamma)$ written to storage circuit 900 in advance, with the threshold $\theta_i$, and controls attenuators 40i according to the result. Alternatively, all legal patterns determined in advance may be stored in storage circuit 900, and control circuit 700 may additionally approximate each actual pattern by a legal pattern in order to remove noise.
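The control step of FIG. 2 and the spreading effect of FIG. 3 can be sketched as follows. The source states that only the center cell of an illegal unit mesh is attenuated; everything else here is an assumption: $N$ and $M$ are taken to be odd so the mesh has a single center cell, the attenuation factor `atten` is a hypothetical constant, and the hypothetical layout `band_frames[c][k]` holds the samples of band $c$ in frame $k$, already aligned by the delay circuits. `pattern_index` repeats the base-$L$ encoding (0-based) so the block runs on its own.

```python
def pattern_index(levels, i, k, N, M, L):
    """Base-L encoding of the N x M unit mesh starting at channel i, frame k."""
    gamma = 0
    for dk in range(M):
        for di in range(N):
            gamma = gamma * L + levels[k + dk][i + di]
    return gamma


def cell_gains(levels, densities, theta, N, M, L, atten=0.1):
    """Return gains[k][c]: 1.0 for kept cells, `atten` for any cell that is the
    center of at least one illegal unit mesh (N and M assumed odd)."""
    K, n = len(levels), len(levels[0])
    gains = [[1.0] * n for _ in range(K)]
    for i in range(n - N + 1):
        for k in range(K - M + 1):
            gamma = pattern_index(levels, i, k, N, M, L)
            if densities[i].get(gamma, 0.0) <= theta[i]:  # illegal pattern
                gains[k + M // 2][i + N // 2] = atten
    return gains


def filter_output(band_frames, gains):
    """Attenuate each band frame by its gain (Z_i) and sum the bands (y)."""
    n, K = len(band_frames), len(band_frames[0])
    y = []
    for k in range(K):
        for s in range(len(band_frames[0][k])):
            y.append(sum(gains[k][c] * band_frames[c][k][s] for c in range(n)))
    return y


# FIG. 3 illustration: one anomalous cell in a 5 x 5 grid with a 3 x 3 mesh.
levels = [[0] * 5 for _ in range(5)]
levels[2][2] = 1                 # anomaly at channel 2, frame 2
densities = [{0: 1.0}] * 3       # clean speech only ever showed the all-zero pattern
gains = cell_gains(levels, densities, [0.5] * 3, 3, 3, 2)
```

In the example, every mesh containing the anomalous cell becomes illegal, so the attenuated cells form the $3 \times 3$ block around it, matching the $N \times M$ spread described for FIG. 3.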

Alternatively, control circuit 700 may compare, in real time, the actual patterns with the characteristic changes in pattern occurrence probability under superimposed noise stored in storage circuit 900, determine the presence or absence of noise from the difference between the two, and then remove the noise. As described above, the present invention can selectively pick out, from a noise-superimposed signal, the signal portions whose energy patterns match those that occur frequently when speech alone is measured (legal patterns), while discarding the signal portions whose energy patterns have become abnormal because of superimposed noise (illegal patterns). Noise components can therefore be removed accurately, realizing an audio information filter that reproduces clean, easily intelligible speech.
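The noise-detection variant can be sketched in simplified form. The source compares stored "change characteristics" of pattern occurrence with the observed patterns but does not specify the statistic. The sketch below swaps in a simpler one: the fraction of illegal unit patterns over a sliding window of frames, compared with a threshold. The window length, `rate_threshold` and both function names are hypothetical. Frames flagged as noisy receive the per-cell gains, and clean frames pass through unchanged, which is one possible form of "different processing".

```python
from collections import deque


def detect_noisy_frames(illegal_flags, window, rate_threshold):
    """illegal_flags[k] is True when the unit pattern at frame k was judged illegal.
    Returns one bool per frame: True when the fraction of illegal patterns over
    the most recent `window` frames exceeds `rate_threshold`."""
    recent = deque(maxlen=window)  # oldest flag drops out automatically
    noisy = []
    for flag in illegal_flags:
        recent.append(flag)
        noisy.append(sum(recent) / len(recent) > rate_threshold)
    return noisy


def select_gains(gains, noisy):
    """Apply per-cell gains only in noise-superimposed frames; pass clean frames."""
    return [row if flag else [1.0] * len(row) for row, flag in zip(gains, noisy)]


flags = [False, False, True, True, True, False]
noisy = detect_noisy_frames(flags, window=3, rate_threshold=0.5)
```

Because of the sliding window, the flag lags the onset of noise by a frame or two and persists briefly after it ends; a real implementation would have to tune `window` against that delay.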

[Brief explanation of drawings]

FIG. 1 is a conceptual diagram of the spectral time series measured when practicing the present invention, FIG. 2 is a block diagram of one embodiment of the audio information filter of the present invention, and FIG. 3 is a diagram explaining its operation.
100 ... input terminal; 20i ($i = 1, \ldots, n$) ... bandpass filters; 30i ($i = 1, \ldots, n$) ... delay circuits; 40i ($i = 1, \ldots, n$) ... attenuators; 500 ... summing output circuit; 60i ($i = 1, \ldots, n$) ... average-energy measuring circuits; 700 ... control circuit; 800 ... output terminal; 900 ... unit-pattern occurrence-probability-density storage circuit.

Claims

[Scope of Claims] 1. An audio information filter comprising: band-dividing means for dividing an input signal into a plurality of frequency bands; delay means for delaying each band signal produced by the band-dividing means by an arbitrary time; attenuating means for attenuating each band signal by an arbitrary amount; summing output means for summing the band signals attenuated by the attenuating means and producing the final output; average-energy measuring means for measuring the average energy of each band signal at fixed time intervals; and control means capable of controlling the amount of attenuation of the attenuating means at fixed time intervals; wherein the occurrence probability density of each pattern formed by a multidimensional time series, spanning one or more time intervals, of the quantized average energy obtained at fixed time intervals from one or more band signals is computed in advance for long-duration speech containing no noise; and wherein, for a noise-superimposed input signal, when the occurrence probability density of the same pattern as each multidimensional time-series pattern obtained by the same method is smaller than a threshold predetermined for each band, the pattern is regarded as an illegal pattern and the band signal in the band and time interval corresponding to that multidimensional time series is attenuated by the attenuating means, and when it is larger than the threshold, the pattern is regarded as a legal pattern and the band signal is input to the summing output means without attenuation.
2. The audio information filter according to claim 1, wherein each pattern formed by a multidimensional time series, spanning one or more time intervals, of the average energy obtained and quantized at fixed time intervals from one or more band signals produced in advance by the band-dividing means is analyzed; from all patterns that have appeared, a set of patterns (legal patterns) that can approximate every appearing pattern with a sufficiently good degree of approximation in terms of occurrence probability is extracted; for a noise-superimposed input signal, every pattern generated from each multidimensional time series obtained by the same method is approximated by the legal patterns; and the signal portion corresponding to each interval, consisting of a frequency band and a time interval, of the legal pattern after approximation is controlled by the attenuating means.
3. The audio information filter according to claim 2, wherein how the occurrence frequency distribution of legal patterns changes when a noise-superimposed signal is input is determined in advance; the temporal transition of the occurrence frequency distribution is measured successively to determine whether noise is superimposed; and noise-superimposed sections and non-noise-superimposed sections are subjected to different processing according to that determination.
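The legal-pattern approximation of claim 2 can be sketched as follows, assuming patterns are tuples of quantized levels. Three choices are assumptions, not taken from the claim: legal patterns are the most frequent ones until their cumulative share reaches a hypothetical `coverage`, "approximation" means the legal pattern at smallest L1 distance, and each quantization step is assumed to correspond to a fixed energy ratio `step`. Each cell is attenuated down to its approximated level and never amplified, since the apparatus has only attenuators.

```python
from collections import Counter


def extract_legal_patterns(patterns, coverage):
    """Keep the most frequent clean-speech patterns until they account for at
    least `coverage` of all occurrences (integer counts avoid float drift)."""
    counts = Counter(patterns)
    total = len(patterns)
    legal, kept = [], 0
    for pat, c in counts.most_common():
        legal.append(pat)
        kept += c
        if kept >= coverage * total:
            break
    return legal


def nearest_legal(pattern, legal):
    """Approximate an observed pattern by the legal pattern at minimum L1 distance
    (ties go to the earlier, i.e. more frequent, legal pattern)."""
    return min(legal, key=lambda p: sum(abs(a - b) for a, b in zip(pattern, p)))


def cell_attenuation(pattern, approx, step=0.5):
    """Per-cell gain lowering each cell to the approximating legal level;
    cells already at or below it keep gain 1.0."""
    return [step ** (a - b) if a > b else 1.0 for a, b in zip(pattern, approx)]


clean = [(0, 0)] * 6 + [(0, 1)] * 3 + [(2, 2)]
legal = extract_legal_patterns(clean, coverage=0.85)
approx = nearest_legal((2, 1), legal)
gains = cell_attenuation((2, 1), approx)
```

With 85% coverage the rare pattern `(2, 2)` is excluded, so the noisy pattern `(2, 1)` maps to `(0, 1)` and only its first cell is attenuated.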